VRChat and VR performances

As you can probably tell from my last few posts, I've been digging into Unity code to handle UI accessibility between desktop and XR platforms. Although there have been some great strides forward, there's nothing like forward progress to remind you of how much remains ahead. With that in mind, there's something that's been on my mind for a while and a tweet I recently saw really captured it:

That thread is worth a look-through. I've always been a sucker for game games. You know what I mean: games that are really a platform for players and creators to make insane stuff: LittleBigPlanet, Second Life, Roblox, Garry's Mod, etc. VRChat is no exception.

A few years ago, late one night, I hosted a public room called "late nite computering" in Bigscreen Beta. A friend and I were just screen sharing dumb gifs and webms when a guy joined and chatted about his work in VR. That guy turned out to be Ron Millar, whose name I recognized from my old Blizzard game manuals. I gushed about reading the WarCraft II manuals over and over and poring over the artwork, and got to hear anecdotes about his time there and about Chris Metzen. That was a childhood meet-your-hero moment come to life that I'm still not completely over.

Gushing aside, he told us about the work he had been doing on VRChat, and talked about how exciting the platform development was at that time (and still is). We spoke about the social VR experiences we'd had thus far: Bigscreen felt, and still feels, truly comfortable. It can feel immensely cozy having spatialized audio with head and hand gestures from the folks sharing a space with you. IMO, Bigscreen's advantage was having fixed seating with teleportation, so the distance between players is a feature. Ron talked about what VRChat had to offer: all of the above, plus highly-customized avatars, player-created games and content, live events, concerts, and more.

Temptations

As a platform, VRChat solves quite a lot of problems that I'm anticipating with my own VR project:

  • Network connectivity.
  • So many avatar choices.
  • Interactive props!
  • Cross-platform support: desktop mode (KBM and gamepad), PC VR, Quest, and more. The more platform support, the better.
  • Voice chat between players. With spatialized audio!
  • Hosted rooms, sharing contexts and instances.
  • Social features: friends, invites, avatar/voice muting.
  • Content creation and sharing.
  • Plugin support for lots of nice Unity things: IK solvers, rich text, video playback (even with subtitle files!)
  • Honestly decent documentation!

So rather than try to reinvent enough wheels to build a semitruck, it's very tempting to consider VRChat as the platform for shared virtual performances. A self-contained solo-dev project simply won't be able to beat it in terms of audience, never mind the features listed above.

There's always a but...

However, there are always terms and conditions. For starters, the use of Unity components in building a VRChat world is restricted to a whitelist. Makes sense - arbitrary component support from Unity would probably be a nightmare. So, the nice karaoke text player component I made recently? The one that can make super smooth text like this?

Top: karaoke text component with smooth letter fill. Bottom: karaoke text component with discrete letter fill.

That's probably not an option. Which is sad, because it's made using the TextMeshPro component, which is supported, but it's also using its own shader and script to control the shader's uniforms, which doesn't look like it's supported. I might be wrong, but that seems to be the case.

There are also the future plans for this project. I've been working on making performance recordings, which essentially contain layered movement and audio data from each player (or from the same player multiple times) synchronized for playback. I've done this before with Kinect data, but our synchronization setup was painful and prone to drift - even for a single performance's video and audio!

For the movement data in VR, this would actually be fewer points than needed to record Kinect data (20 or 25 joint positions and rotations per player), since it primarily comes down to the avatar's root transform, plus the head and hand transforms. Saving these values and replaying them on a rig set up with FinalIK is within reach. (OK, stuff like that is always a bit of a pain, especially when it comes to setting up a generalizable prefab that can basically be spawned in and told to replay the movement.) Even for desktop players, having the player's root position and rotation saved will allow the IK system to approximate the other bits: locomotion, orientation, head gaze. Granted, the VR players will steal the show with more natural-looking movement, but it's still exciting to think about! And it just doesn't seem possible with VRChat.
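To make the "few transforms per frame" idea concrete, here's a minimal sketch of what one player's recorded track could look like. It's in Python purely for brevity (the real thing would live in Unity), and every name here (Pose, Frame, Track, record, sample) is made up for illustration. It also assumes the simplest possible replay, holding the most recent recorded frame, with no interpolation between samples.

```python
from bisect import bisect_right
from dataclasses import dataclass, field

Vec3 = tuple[float, float, float]
Quat = tuple[float, float, float, float]  # (x, y, z, w)


@dataclass
class Pose:
    position: Vec3
    rotation: Quat


@dataclass
class Frame:
    time: float  # seconds since the recording started
    root: Pose
    head: Pose
    left_hand: Pose
    right_hand: Pose


@dataclass
class Track:
    """One player's recorded movement, kept in time order."""
    frames: list[Frame] = field(default_factory=list)

    def record(self, frame: Frame) -> None:
        # Reject out-of-order frames so lookups by time stay valid.
        if self.frames and frame.time < self.frames[-1].time:
            raise ValueError("frames must be recorded in time order")
        self.frames.append(frame)

    def sample(self, t: float) -> Frame:
        """Return the latest frame at or before t (hold-last, no interpolation)."""
        if not self.frames:
            raise ValueError("empty track")
        times = [f.time for f in self.frames]
        i = bisect_right(times, t) - 1
        # Before the first frame, clamp to the first frame.
        return self.frames[max(i, 0)]
```

A layered performance would then be a collection of these tracks sampled against one shared playback clock, with the sampled head and hand poses fed to the IK rig as targets. Smoother playback would need interpolation between neighboring frames, which this sketch leaves out.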

Capturing captivating performances

Like FinalIK, AVPro is another external tool supported by VRChat. In VRChat's case, I believe it's mainly used for audio/video playback. But it also has components for capturing audio and video. I've used these before to record the screen performance for Play the Knave. From the look of it, AV capture isn't on the supported list for VRChat. So if players want to record a performance in-game, they have to do it themselves. Granted, OBS, Radeon Record, and ShadowPlay have all been excellent when I've used them. But there are a few reasons why we want to go with a game-based replay file (i.e., record the players' movements and voice), as opposed to a screen-based capture approach:

  1. Screen recordings from first-person VR perspectives are not especially compelling. It feels great to play a game in VR, but it feels less than great to watch someone else's first-person VR experience, especially as a recording. This is why some VR streamers make the extra effort to set up external cameras, use depth sensing or green screens to filter out the background, and synchronize an in-game camera's position with the external camera to make more compelling screen recordings. Beat Saber is great for this, and I believe Serious Sam VR: The Last Hope supports this as well, to name a couple.
  2. To develop the previous point: a screen recording's perspective is quite limited. Although AI+ML can do insanely cool things with video, it sure would be nice to have a customizable camera position for a scene replay.
  3. To develop the point even further: a screen recording's audio is also finite. Yes, the game may have spatialized audio – players at your side sound like they're at your side – but if we want to move the camera around for a replay, we have to make the audio respond in turn.
  4. Here's a new point: file size! A screen recording produces a video, which is subject to encoding choices (resolution, audio/video bitrate, framerate, etc.) for optimal file size. We can't get around paying the cost of recording audio, but certainly a performance's lossless[1] movement data will be considerably smaller than a high-resolution screen recording. Remember: we just have to save a few positions and rotations per frame per player, rather than w × h pixels per frame.
  5. Another benefit based on file size: performance! Much kinder to the player's machine (or how about the server instead?) to task them with recording the movement data rather than record and encode a video.
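For a rough back-of-the-envelope check on the file size point, here's a small Python sketch. The numbers are my assumptions, not measurements: four transforms per player (root, head, two hands), each stored as a position (3 floats) plus a quaternion rotation (4 floats) in 32-bit floats, compared against uncompressed 24-bit pixels. A real video is encoded and far smaller than raw pixels, so read this as an illustration of "a few transforms versus w × h pixels", not a benchmark.

```python
FLOAT_BYTES = 4             # assuming 32-bit floats
TRANSFORMS_PER_PLAYER = 4   # root, head, left hand, right hand
FLOATS_PER_TRANSFORM = 3 + 4  # position (x, y, z) + quaternion (x, y, z, w)


def movement_bytes_per_frame(players: int) -> int:
    """Uncompressed movement data for one frame across all players."""
    return players * TRANSFORMS_PER_PLAYER * FLOATS_PER_TRANSFORM * FLOAT_BYTES


def movement_bytes_per_minute(players: int, tracking_hz: int) -> int:
    """Uncompressed movement data for one minute at a given sample rate."""
    return movement_bytes_per_frame(players) * tracking_hz * 60


def raw_pixel_bytes_per_frame(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Uncompressed size of a single video frame."""
    return width * height * bytes_per_pixel


print(movement_bytes_per_frame(1))            # 112
print(movement_bytes_per_minute(1, 90))       # 604800
print(raw_pixel_bytes_per_frame(1920, 1080))  # 6220800
```

Under those assumptions, a full minute of one player's movement sampled at 90 Hz (about 0.6 MB) is roughly a tenth the size of a single uncompressed 1080p frame (about 6.2 MB), before any compression is applied to either side.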

Points 4 and 5 actually lend themselves well to the idea of running a server for multiplayer. You've gotta send movement and voice data between players anyway, so why not have the server just write that down as it happens? Ethics of data collection aside for a moment, this practice can be in the best interest of social VR server admins. You can review these performances to verify reports of abusive players and hold them accountable. I'm trying and failing to find a rant about this issue from someone I follow (I wanna say Doc Ok?). But the issue is an old one that repeatedly appears across multiple platforms:

I'll acknowledge that moderation is a tricky thing. But in the shared spaces that VR can provide, it's probably in your best interest to assume that everything you say and do can be subject to playback at a later time. As a kid, the advice frequently given in school was along the lines of "don't write anything you wouldn't want your mom to see on the front page of tomorrow's newspaper." So that's how old I am. Incidentally, VRChat does have a reporting process in place and they highly recommend including - you guessed it! - video recordings of the behavior being reported.

This post is getting longer than I expected, and I have more to say on this part in particular, so I'll wrap this up and shuffle the follow-up on game recordings to a new post.

In short

Overall, I'm keeping my eye on VRChat as a choice of platform when it comes to <secret VR collaborative performance project>. The component restrictions most likely make it a non-option for my specific needs, but it's hard to argue with a successful platform and its community. Still, it's nice to see so much that does work in one system. Inspiration, at the very least.

[1] Lossless movement data insofar as the hardware's tracking frequency dictates.