Networking in Unity

The ever-changing standards of network support in Unity have made me more apprehensive about working on this feature than any other. For two prior projects, I used UNet, but that's been deprecated for a while. For a game jam, we used Photon, and for Play the Knave, we had a custom solution implemented as a .NET plugin. Granted, there we only really used localhost connections to let the game and KinectDaemon talk to each other and upload game files to the server.

Before jumping in, I spun my wheels for a bit researching current solutions. Netcode for GameObjects doesn't seem quite ready, but I had heard good things about Mirror as a spiritual successor to UNet. After a couple days of debugging issues I couldn't resolve, though - and not being that deep into integration - I jumped ship and am now trying out Fish Networking. Special thanks to this Google Sheet for helping me navigate the options.

So far, it's been promising! I've been able to get two editors running and connected to each other, with player movement properly broadcast. At the moment, I'm testing how well it works with a VR headset. Good news - the Quest 2 connects to an Editor session and we can see each other's movement! Bad and funny news - the Quest player can locally control both their own avatar and the other player's. There's also a bit of a strobing issue on the Editor side; the VR player's movement is shown on the networked player, but the position data intermittently pops in and out of place. I haven't dug too deeply into the configuration to find the cause, but it's nice to make rapid progress. Soon, I'll need to start experimenting with dedicated servers and headless builds so a server instance can run on Solaire. I've also got my eye on the plugin Dissonance (a name-homage to Discord?) for voice chat, which supports Fish Networking as a transport.
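If I had to guess, the double-control bug is a missing ownership check on input - every client is driving every player object. A minimal sketch of the guard, assuming FishNet's NetworkBehaviour (PlayerRig and ApplyHmdPose are names invented for illustration):

    using FishNet.Object;
    using UnityEngine;

    public class PlayerRig : NetworkBehaviour
    {
        void Update()
        {
            // Without this guard, every client applies its local tracking
            // to every player object - the double-control symptom above.
            if (!IsOwner)
                return;

            ApplyHmdPose();
        }

        void ApplyHmdPose()
        {
            // Read local HMD/controller tracking and move this rig (placeholder).
        }
    }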

Prototypes and tests

Prior to any of this in Unity, I spent some time over winter break experimenting with peer-based networking in the browser. Different ball game - as you'll notice when every Unity networking plugin lists caveats and workarounds for WebGL builds - but nothing too crazy. These tests were basically "how do I get 2D player data shared between multiple instances using divs and some client-side JS?" I built a few different versions, all with similar client-side JS but different middleware:

  • PHP middleware, MariaDB storage (lol)
  • PHP middleware, PHP session storage (forcing all clients to access the same session with a server-side predetermined ID)
  • Node middleware using socket.io
  • Node middleware using PeerJS
  • Node middleware using geckos.io

I treated these as a warmup for (and distraction from) Unity-side networking. There's something very satisfying about VS Code, a terminal, and a browser for your dev env. And let us never forget the incredibly powerful browser DevTools 🙏🏼. It did bring up some interesting questions:

  • For small groups (<10 players) in collaborative/social VR, do we want peer-to-peer networking or a client-server arrangement?

Peer networking is appealing - a server is still needed to match players together, but afterward something like WebRTC handles communication directly between clients. The downside: in a full mesh, the connection count grows roughly quadratically with the player count, and there's no central authority for logging or handling abuse.

  • If we use client-server, do we want to allow players to be hosts, or should we use a dedicated server?

The default for Mirror and Fish is client-server, and out of the box a player can act as both host and client. There's also support for headless builds (Unity dedicated servers), which will be helpful for persistence.
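As a rough sketch of how a build might pick its role at startup - assuming FishNet's InstanceFinder managers and Unity's batch-mode flag; ServerBootstrap is a name of my own invention:

    using FishNet;
    using UnityEngine;

    public class ServerBootstrap : MonoBehaviour
    {
        void Start()
        {
            // Builds launched with -batchmode report isBatchMode, which is
            // how a headless instance on Solaire would run.
            if (Application.isBatchMode)
                InstanceFinder.ServerManager.StartConnection();
            else
                InstanceFinder.ClientManager.StartConnection();
        }
    }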

  • With networked play, how should we approach performance recording?

Currently, the project records one player's movement and voice to disk during a performance. This works fairly well on Quest 2, which is the most critical platform right now (I'm not as worried about disk performance on desktop VR), but I don't know how well it will scale with multiple players recording at once. Granted, these recordings have stayed fairly small, and there's room for optimization in audio compression and delta transform tracking. Additionally, the recording system for movement data depends on tagged components with unique ids; these ids will need to be synced across the network before recording begins, so that player 1's set of ids is the same on all ends of the network and nothing crosses over. A rough sketch of that sync follows.
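Here it is - a hedged sketch assuming FishNet's buffered observers RPC; RecordingIdSync and the table layout are invented for illustration:

    using System.Collections.Generic;
    using FishNet.Object;

    public class RecordingIdSync : NetworkBehaviour
    {
        // ownerClientId -> that player's tagged-component ids,
        // identical on every client once synced.
        readonly Dictionary<int, List<int>> _idsByPlayer =
            new Dictionary<int, List<int>>();

        [Server]
        public void PublishIds(int ownerClientId, List<int> componentIds)
        {
            _idsByPlayer[ownerClientId] = componentIds;
            ReceiveIds(ownerClientId, componentIds);
        }

        [ObserversRpc(BufferLast = true)]
        void ReceiveIds(int ownerClientId, List<int> componentIds)
        {
            // BufferLast replays the last call to late joiners, so every
            // client holds the same table before recording begins.
            _idsByPlayer[ownerClientId] = componentIds;
        }
    }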

Those are all reasonable problems with solutions I can more or less imagine before committing to code. The trickier one is below:

Timing time

  • Given that the network introduces lag, and synchronization is important for performances, how do we reconcile that lag with the performance?

By default, each client (besides the host-client) has a bit of lag. Here's a scenario:

  • Players P1 and P2, where P1 is the host,
  • P1's lag: 0 ms and P2's lag: 25 ms (a generous measure on a good day for Spectrum)
  • P1 picks a 2-player scene and clicks start. A 3-second countdown begins before the lines start playing.

The "Start scene" action can be decorated as a remote procedure call so that all connected clients run the same code with the same data. But whether they should run the command at the same actual time is less clear.

  1. There's always a 3-second countdown before a scene begins.
  2. P1 clicks start.
  3. Each connected player knows their latency.

What about offsetting a player's start time by their latency? After the start-scene command arrives, P2 could begin the scene after 2.975 seconds instead of 3, so that both scenes begin at the same wall-clock moment. That splits the difference: each player sees the other's data arrive 25 ms late. Offset by twice the latency instead (2.95 seconds) and P2's lines and movement reach P1 right on time, but P1's data becomes twice as late for P2: P2 started 25 ms before P1, and P1's data still spends 25 ms in transit, 50 ms in all. Flip the offset onto P1 and the problem reverses. And if both players simply wait the full 3 seconds, P2 seemingly gets P1's data right on time, while P1 feels P2's lag doubled. Someone always wears the delay; the offset just decides who, and how much.
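For the even-split policy, the client-side change is small. A sketch, assuming FishNet's TimeManager exposes a round-trip estimate (I haven't verified the exact property); CountdownOffset is a hypothetical name:

    using FishNet;
    using UnityEngine;

    public static class CountdownOffset
    {
        const float CountdownSeconds = 3f;

        public static float LocalCountdown()
        {
            // Half the round trip approximates the start command's one-way
            // transit; subtracting it lines the scene starts up in
            // wall-clock time, splitting the lag evenly between players.
            float oneWaySeconds =
                InstanceFinder.TimeManager.RoundTripTime / 2f / 1000f;
            return Mathf.Max(0f, CountdownSeconds - oneWaySeconds);
        }
    }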

I'm amazed whenever I play social VR games that handle this problem well, VRChat being at the top of the list. I need to revisit Gaffer on Games for further reading.

In Record Time

We can record performance data, but how should that work in the networked version?

  1. Each client records their own performance and just that.
  2. Each client records performances of all players.
  3. The server records all performances.

Option 1 is quite appealing - locally the data is all on time, and consolidating the files after the scene ends should give a reproduction fairly faithful to the actual timing. The lag will still be visible - P1 will physically react 25 ms too late to P2's gesture - but it's a close match to how things "should be" in a finished recording.

Option 2 can help reveal differences between experiences for debugging, if it doesn't put too much strain on the Quest 2's disk I/O.

Option 3 is a nice choice as well, if the server-that-is-also-a-client can keep up with the Quests - though presumably, if #2 works, this should too. It's also a good fit for a dedicated server and for keeping a log of player behavior.
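For completeness, here's what the option-1 consolidation step might look like; Frame is a hypothetical stand-in for the real recording format:

    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical stand-in for one sample of recorded movement data.
    public struct Frame
    {
        public float Time;      // seconds since the scene started
        public int PlayerId;    // whose local recording this came from
        public int ComponentId; // the tagged component's synced id
    }

    public static class RecordingMerge
    {
        // Each local file is already time-ordered, so a plain sort over the
        // concatenation works for small recordings; a k-way merge would be
        // cheaper at scale.
        public static List<Frame> Consolidate(IEnumerable<IEnumerable<Frame>> perPlayer)
        {
            return perPlayer.SelectMany(frames => frames)
                            .OrderBy(frame => frame.Time)
                            .ToList();
        }
    }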

Knowing me, I will most likely ignore the recording challenges until networking works the way I like, then return to recording and break many things in the process - the fixes for which will in turn break things on the networking side, and so it oscillates until a steady state is reached. Ah, agile software development. It's too ambitious to expect all of these issues to be worked out before the semester starts again in a couple of weeks, but hopefully we'll have enough in motion to build on and flesh out by then.