Oculus have released a dedicated Unity plugin which aims to automatically detect speech in an audio stream and transform it into virtual character lip movements.

It seems that the substantial investment in research and development at Oculus over the last couple of years is leading to some welcome, if somewhat unexpected, developments coming out of their labs. At Unity’s 2016 Vision VR/Summit, the company unveiled a lip sync plugin dedicated to producing lifelike avatar mouth animations, generated by analysing an audio stream.

The new plugin for the Unity engine analyses a canned or live audio stream, such as live voice chat captured from a microphone, and converts it into potentially realistic lip animations for an in-world avatar.
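
Whether the source is a pre-recorded clip or a live microphone feed, that analysis presumably operates on short slices of audio rather than the whole stream at once. As a rough, engine-agnostic sketch (in Python, purely illustrative, and nothing to do with the plugin’s actual API), the chunking might look like this, where the 10 ms frame length is an assumption:

```python
# Illustrative only: split an audio stream into short fixed-length frames so each
# slice can be analysed in turn. The 10 ms frame size is an assumption, not a plugin detail.

SAMPLE_RATE = 16_000               # assumed capture rate, in samples per second
FRAME_SIZE = SAMPLE_RATE // 100    # 10 ms of samples per analysis frame

def frames(samples):
    """Yield consecutive fixed-length frames from a mono list of samples."""
    for start in range(0, len(samples) - FRAME_SIZE + 1, FRAME_SIZE):
        yield samples[start:start + FRAME_SIZE]

# A canned clip and a live microphone buffer look the same once chunked:
clip = [0.0] * SAMPLE_RATE         # one second of silence standing in for real audio
for frame in frames(clip):
    pass                           # each frame would be handed off to the analysis step
```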

The system processes the audio stream and creates a set of values called ‘visemes’. For those not familiar with the term (and I certainly wasn’t until just now), a viseme is a “gesture or expression of the lips and face that corresponds to a particular speech sound”, according to the documentation accompanying the plugin release. The practical upshot of incorporating the functionality, then, is that your game-world counterparts, whether human or AI controlled, will look as if they’re actually saying what you’re hearing. At least in theory. The plugin’s documentation further elaborates:

Our system currently maps to 15 separate viseme targets: sil, PP, FF, TH, DD, kk, CH, SS, nn, RR, aa, E, ih, oh, and ou. These visemes correspond to expressions typically made by people producing the speech sound by which they’re referred, e.g., the viseme sil corresponds to a silent/neutral expression, PP appears to be pronouncing the first syllable in “popcorn,” FF the first syllable of “fish,” and so forth.
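
In practice, each frame of analysis presumably yields a weight for each of those 15 targets, which an avatar rig can then blend into its mouth blendshapes. Below is a minimal, engine-agnostic sketch of that idea in Python; the `VisemeDriver` class, the `set_blendshape` callback and the smoothing factor are all illustrative assumptions rather than the plugin’s actual interface:

```python
# Illustrative only: blend per-frame viseme weights into an avatar's mouth shapes.
# set_blendshape() and the smoothing factor are assumptions, not the plugin's real API.

VISEMES = ["sil", "PP", "FF", "TH", "DD", "kk", "CH", "SS",
           "nn", "RR", "aa", "E", "ih", "oh", "ou"]  # the 15 targets named in the docs

class VisemeDriver:
    """Smooths incoming viseme weights and forwards them to blendshape targets."""

    def __init__(self, set_blendshape, smoothing=0.5):
        self.set_blendshape = set_blendshape      # callback: (viseme name, weight 0..1)
        self.smoothing = smoothing                # 0 = snap instantly, closer to 1 = slower
        self.current = {v: 0.0 for v in VISEMES}

    def update(self, frame_weights):
        """Apply one frame of analysis output (a dict of viseme name -> weight 0..1)."""
        for v in VISEMES:
            target = frame_weights.get(v, 0.0)
            # Exponential smoothing so the mouth doesn't pop between frames.
            self.current[v] += (target - self.current[v]) * (1.0 - self.smoothing)
            self.set_blendshape(v, self.current[v])

# Example: one frame that looks mostly like the "PP" (popcorn) shape.
driver = VisemeDriver(set_blendshape=lambda name, w: print(f"{name}: {w:.2f}"))
driver.update({"PP": 0.8, "sil": 0.2})
```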

Whilst there are multiple use cases for this technology, coming alongside the recent release of Oculus Social for the Gear VR platform it’s hard not to think that the company’s focus on ‘selling’ the idea of useful human interactivity in virtual reality is pretty strong, given how tangential this release is to core virtual reality technologies.

You can grab the new plugin for Unity 5 right here. And if any developers out there have a chance to try it out, we’d love to see some examples of it in action.

  • Ashley Pinnick

    I definitely would love to be able to add lip sync to my VR game, if I get it working nicely I’ll send some screen capture!

  • Full_Name

    Since Oculus seems to focus on front-facing cameras, you would think they could also use that as input to capture the user’s mouth. Shouldn’t be too hard figuring out where it is, considering it is right below the HMD…

  • Sven Viking

    (I realise this is an old post, but it was advertised at the bottom of another article.)

    I see they got Elizabeth Holmes to model for (and voice?) their avatar.