How OpenAudioMc handles near-perfect music and voice synchronization 🕒 🔛🔊

OpenAudioMc is often used for in-game events where music plays an important role, if not a leading one. Usually, this is some sort of show that is supposed to run in sync with what is happening in-game.

That means that we want to trigger effects or commands without delay and keep response times quick and snappy.

We have to accept that with the magic that is the internet, we will always have at least a few milliseconds of delay due to the communication itself, and on top of that, we have our own handling time.

Just like everything involving time, we need a reliable clock to use as our base. This, however, is where our problems start popping up. There is never a guarantee that the User is in the same time zone as their host, or even that their clocks are in perfect sync. So we built our own solution!

To stop the endless debate over whether it’s 1 AM or not, we set up a central time server and use it not only to push the correct time to all Users but also to eliminate the problem of fluctuations in ping.

We do this by having our server push its UNIX timestamp to all clients, exactly every 5 seconds. When a User receives the server’s timestamp, it starts to do some calculations. First, we want to see how much we differ from the server. We do this by subtracting our own timestamp from the server’s, which leaves us with the offset in milliseconds: the amount we have to add to our local clock to land on the server’s clock.
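As a rough sketch of that first step (not the actual OpenAudioMc code): the function names and sample numbers below are made up for illustration, and I'm assuming timestamps in milliseconds. The sign convention is that the offset is what you add to the local clock to get the server's clock, which is how it is used later on.

```python
import time


def local_now_ms() -> int:
    """Current local UNIX time in milliseconds."""
    return time.time_ns() // 1_000_000


def clock_offset_ms(server_timestamp_ms: int, local_timestamp_ms: int) -> int:
    """Offset to add to the local clock to reach the server's clock.

    At this point the value still includes handling time and ping.
    """
    return server_timestamp_ms - local_timestamp_ms


# Made-up example: the local clock runs 1.2 seconds ahead of the server.
offset = clock_offset_ms(server_timestamp_ms=1_000_000, local_timestamp_ms=1_001_200)
print(offset)  # -1200
```

In a real client you would call `clock_offset_ms(server_timestamp, local_now_ms())` the moment a time push arrives.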

Although this is way better than assuming the server’s time, we still aren’t accurate. All we know is what the timestamp of the server could be. Our magic value still contains other traces of dirt, like the handling time and the unpredictable ping.

To smooth our measurements out we keep track of our last 10 results. When our 11th timing comes in we start the next step.

We go over all our queued measurements, subtract each measurement from its next neighbor in the queue, and then subtract 5000 from the result to account for the five-second interval. What we are left with is an array containing how far the time between consecutive server pushes deviated from five seconds. In a perfect world, these would all be zeros, because the server sends its timing exactly every 5 seconds and there is no noise or difference in that. But, just like with all internet traffic, the timestamps we receive don’t arrive right away. There will always be some minor fluctuations due to ping, and we just calculated what those were. What a coincidence.

All we have to do is average our 10 numbers and we are left with the estimated ping over the last 50 seconds. We store this value along with our original offset and update both when we have new data.
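Here is a minimal sketch of that smoothing step. I'm assuming each queued "measurement" is the local arrival time (in milliseconds) of a server push; the names and sample arrival times are hypothetical and only there to show the arithmetic.

```python
from collections import deque

PUSH_INTERVAL_MS = 5000  # the server pushes its time every 5 seconds
WINDOW = 11              # 11 timings give 10 gaps between neighbors


def interval_deviations(timings_ms):
    """How far each gap between neighboring timings is from 5000 ms."""
    timings = list(timings_ms)
    return [later - earlier - PUSH_INTERVAL_MS
            for earlier, later in zip(timings, timings[1:])]


def average_deviation(timings_ms) -> float:
    """Average of the deviations, used as the ping estimate in the text."""
    deviations = interval_deviations(timings_ms)
    if not deviations:  # fewer than two timings: nothing to compare yet
        return 0.0
    return sum(deviations) / len(deviations)


# Hypothetical local arrival times (ms) of 11 consecutive pushes.
# deque(maxlen=...) drops the oldest timing once a new one comes in.
arrivals = deque(maxlen=WINDOW)
for t in [0, 5010, 10030, 15020, 20060, 25050, 30090, 35100, 40120, 45150, 50200]:
    arrivals.append(t)

print(interval_deviations(arrivals))  # [10, 20, -10, 40, -10, 40, 10, 20, 30, 50]
print(average_deviation(arrivals))    # 20.0
```

Using a bounded `deque` is just one convenient way to keep a rolling window; a plain list that you trim by hand would work as well.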

With all that said and done, we can now calculate the central time whenever we need it. We simply take the local timestamp of the User, add the offset, and then subtract our average ping. Just like magic, we have our central time without any delay!
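Written out, that calculation is a one-liner. This sketch follows the description above literally (local time, plus offset, minus average ping); the function name and the numbers are made up, and everything is assumed to be in milliseconds.

```python
def central_time_ms(local_ms: int, offset_ms: float, average_ping_ms: float) -> float:
    """Central time as described in the text: local + offset - average ping."""
    return local_ms + offset_ms - average_ping_ms


# Made-up values: local clock 1.2 s ahead of the server, 20 ms average ping.
print(central_time_ms(local_ms=1_001_200, offset_ms=-1200, average_ping_ms=20))  # 999980
```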

Now that we can all finally agree on what time it is (I know, real childhood fights), we can start using it to do what we want: play music!

Let’s say that our host (or Minecraft Server) sends a command to start a song. It would send the source, properties and the central timestamp at which the command was originally executed. So our client would receive something like

[Image: JSON packet sent by the server]

The User receives the packet and quickly notices that it is supposed to use the create media logic. Finally, it says, “Something I was trained for”.

The first thing we want to do is check how late we are to the party. We calculate the current central timestamp, which is 1567071550, and subtract the start instant from the packet. We are left with 325, which means that we received the command 325 milliseconds late. We can account for this by loading the sound and starting it not from the beginning but 325 milliseconds in. This means that the sound for this User might start a tiny bit later, but is still in sync with everyone else. Like starting later but with a small boost to catch up.
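The catch-up arithmetic can be sketched like this. The function name is hypothetical, the start instant below is simply chosen so that the difference matches the 325 from the example, and clamping to zero is my own addition for the case where the estimate says the command arrived "early".

```python
def playback_start_offset_ms(central_now: int, start_instant: int) -> int:
    """How far into the track playback should begin to stay in sync."""
    # Never seek to a negative position if the clocks disagree slightly.
    return max(0, central_now - start_instant)


central_now = 1567071550
start_instant = 1567071225  # hypothetical value, picked to give the 325 from the text
print(playback_start_offset_ms(central_now, start_instant))  # 325
```

The client would then start the sound at that position instead of at zero.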

This same principle is used for voice chat, to keep everyone’s clock in sync. If you want to read more about how the voice chat is built from the ground up, keep an eye out, since there will be a follow-up article on that topic.

And that’s it!

After we tried multiple solutions, this one proved to be the most reliable across the board, regardless of time zone or link speed. The final solution was quite simple for the big problem that it solved, and it was a lot of fun to develop.

Thank you for reading this first blog-post-thingy-ish about the inner workings of OpenAudioMc, let me know what you think!

Software developer for the Dutch VPRO