Les Borsai of Dysonics, a specialist in spatial audio, ponders whether the time for spatial 3D audio has finally come with the advent of consumer-ready virtual reality, and what more needs to be done to keep it evolving.

Guest Writer Les Borsai

Les Borsai is VP of Business Development at Dysonics, a leader in VR audio capture, creation, and playback solutions. Previously, he held executive positions at Avalon Attractions (now Live Nation) and later at Bill Silva Management and his own Modern Artist Management.

It’s no secret that audio has been the red-headed stepchild of the VR industry. That video has been the child hogging all the attention is obvious to even casual observers. True immersion engages all senses, not just the visual. Industry professionals serious about true VR integration need to step up their audio game. Consumers deserve the full experience. Anything less is a glorified tech-demo posing as the greatest revolution in entertainment and the enterprise since the introduction of color film.

We are not the only ones who have been advocating for audio. Forbes and Engadget have raised their voices in the past several weeks, highlighting VR audio offerings at CES and other high-profile events. Wired has also weighed in, explaining why spatial audio is such a big deal for Google Cardboard.

We applaud the mainstream tech press for catching on to what Road to VR and others in the VR specialist media have known for quite some time. But we wonder what else can be done to move VR audio technology forward in a real and tangible way.

Several suggestions come to mind:

1) Better VR audio capture

Immersion starts with being able to grasp the truest representation of the experience. Imagine the revenue possibilities for concerts and other live environments.

Dysonics’ 360 Degree ‘RondoMic’ and GoPro 360 Camera Array

This is not a specialized technology for sound connoisseurs. This is technology for everyone who wants to experience sound as a vital, living thing. We’re talking about a device that captures true spherical audio for pristine reproduction over headphones. You’ll hear sounds change relative to your head movement (from left to right, up to down, and near to far) exactly as you’d hear them if you were there live. This interactive audio element becomes a game-changer for live-captured immersive content, adding a critical layer of contextual awareness and realism.

The incorporation of motion restores the natural dynamics of sound, giving your brain a crystal-clear context map that helps you pinpoint and interact with sound sources all around you. These positional audio cues that lock onto the visuals are vital in extending the overall virtual illusion and result in hauntingly lifelike and compelling VR content.

2) Better VR audio editing tools

We need good VR audio tools. Tools that are powerful, yet easy to use: what Photoshop was to digital images, what Final Cut Pro was to digital video. Content creators need a streamlined stack that will take them from raw capture to finished product. This is the only way we’re going to get content that’s more compelling and fully interactive.

Most content creators just don’t have the skill set or time to focus on audio. That’s why the company that wins in this space needs to craft a solution that is modular and easy to use. The complete stack would look something like this: an 8-channel spherical capture solution for VR, plus post-processing tools that allow content creators to pull apart original audio, placing sounds around a virtual space with customizable 3D spatialization and motion-tracking control.

See Also: Oculus Rift ‘Crescent Bay’ is Designed for Audiophiles – Here’s Why that’s Important for VR

3) Better VR audio for games

Gaming and VR are two juggernaut markets that will see enormous overlap in 2016. Big advancements in VR audio will come with the creation of plugins for the major gaming engines (Unity or Unreal, for example), giving developers the power to deliver immersive audio to the next generation of VR games.

Audio realism is critical to gaming. Imagine playing a VR shooter and hearing an enemy helicopter behind you, overhead in the distance. You look up and to the right to find the sound becoming more and more centralized (just as it would in real life), allowing you to quickly pinpoint the helicopter’s location. You take cover as bullets whiz by you left and right in amazing surround-sound detail.

Even the most subtle of audio cues (gained from the incorporation of motion and lifelike spatialization) allow you to interact with sound sources all around you. So by activating arguably our most important sense, overall immersion goes up considerably, as does natural reaction time.
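For readers who want to see the head-tracking idea in concrete terms, here is a minimal sketch of the helicopter example. It is not how Dysonics or any game engine plugin does it; the function names, the angle convention (degrees, positive to the listener’s right) and the crude constant-power stereo pan are all assumptions made for illustration. A real spatializer would use HRTF filtering, which this sketch does not attempt, so it cannot tell front from back.

```python
import math


def relative_azimuth(source_az_deg, head_yaw_deg):
    """Angle of the source relative to where the listener is facing.

    Hypothetical convention: degrees, positive = to the right,
    result wrapped into the range (-180, 180].
    """
    diff = (source_az_deg - head_yaw_deg + 180.0) % 360.0 - 180.0
    return 180.0 if diff == -180.0 else diff


def constant_power_gains(rel_az_deg):
    """Left/right gains from a simple constant-power pan.

    Angles beyond +/-90 degrees are clamped, so this is only a rough
    stand-in for real binaural rendering.
    """
    clamped = max(-90.0, min(90.0, rel_az_deg))
    theta = (clamped + 90.0) / 180.0 * (math.pi / 2.0)  # 0 .. pi/2
    return math.cos(theta), math.sin(theta)


if __name__ == "__main__":
    helicopter_az = 120.0  # behind and to the right of the starting view
    for head_yaw in (0.0, 60.0, 120.0):
        rel = relative_azimuth(helicopter_az, head_yaw)
        left, right = constant_power_gains(rel)
        print(f"yaw={head_yaw:5.1f}  relative={rel:6.1f}  L={left:.3f}  R={right:.3f}")
```

As the listener’s yaw approaches the helicopter’s direction, the relative angle shrinks toward zero and the left and right gains even out, which is the “more and more centralized” effect described above.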

The Opportunity

In a recent whitepaper, Deloitte predicts that VR is poised to have its first billion-dollar year. Growth will be attributed to use across multiple applications, with gaming providing a core activity driven by tens of millions. The numeric breakdown is $750 million in hardware sales, with the remainder in content.

This is no minor milestone. It’s time for VR audio to shine.

  • I couldn’t agree more. Binaural audio is crucial to the sense of presence and I look forward to improvements in the workflow for this. Do you think this will require a new file format based on MP4 but with the spatial/directional sound data embedded?

    • Rainabba

      Adrian, to be clear, I do not represent Dysonics, but they were kind enough to use one of our productions as a test-bed and to give us an opportunity to explore, so I’m speaking based on my interpretation of conversations with the founder, not inside knowledge. That said, my understanding is that what gets delivered to the device is a proprietary, encoded stream that contains all the channels and meta-data needed to allow the SDK to render a binaural image (approximately anyway) on the fly based on positional information fed to the SDK. The signal is of no use without the SDK, unlike the approaches taken by Dolby and DTS where you can decode individual channels and they are processed from there by a positional engine. The big upside (and I say this as a true fan of audio, high-res, binaural, etc.) is that Dysonics in theory does the best job of rendering the equivalent of real binaural in 360 degrees, rather than a totally artificial (if well done) positional approximation like Dolby Headphone or DTS Headphone-X. We’re looking at all 3 (and more really) and each has advantages and weaknesses depending on your workflow and needs.

      • Mathew Scheiner

        But couldn’t it be said that if sound sources were mixed in the 11 or 12 channel format of DTS Headphone-X beforehand, their placement would be precise?

        • Rainabba

          Only as precise as the audio rendering engine. True binaural recordings are already infinitely more precise though. Think about it like this: 3D graphics have come to the point where, with enough time and resources, we can render scenes that cannot be differentiated from the real world until certain elements are introduced (like human faces). What if you didn’t have all the time/resources in the world (like on mobile) AND you had to have human faces? You can get close with real-world textures (audio samples), but a single photo will look INFINITELY more realistic. In the same way that we see very subtle things visually that are hard to even quantify, realism and in particular positional cues in audio are very subtle, so the approach Dysonics takes is more akin to a stereo photo than a 3D rendered scene. They do have to fill the gaps as you move between the pairs, but let’s assume they do that subjectively as well as DTS or Dolby; the rest of the time they are virtually perfect (photo vs rendering). The thing is, that’s really important when you want to capture the real world. If you’re a producer making a Hollywood blockbuster, that may not be what you want or need at all AND (not talking headphones now), when you’re mixing true surround for a theater, positional rendering becomes WAY less important and that’s why DTS/Dolby have been so successful. The popularity of headphones and now the undeniable rise of VR put a new emphasis on binaural (or at least fake HRTF representations) audio and it’s a lot harder :)

          • Mathew Scheiner

            Excellent point!

          • Malcolm Hawksford

            Very interesting discussion. To communicate a 3D soundfield requires each participant to employ listener-specific HRTFs together with the ability for individual head rotation; this is critical for VR-style headphone listening. The following paper may be of interest as it describes how a symmetrical microphone array can be configured to embed multiple, user-specific HRTFs where each selected listening axis is set electronically.

  • REP

    Well…when I’m in Oculus Cinema, it better give me that true surround theater sound…or else I’m going to be disappointed with whatever spatial technology they incorporated into VR.

  • DonGateley

    This requires more than just capturing binaural audio. It is capturing the full first order (2D or 3D) sound field at the mic so as to obtain a binaural stream oriented properly in space from real time processing of that full sound field stream. That is very challenging both in terms of the amount of data that must be captured and conveyed with the image as well as the real-time processing thereof.

  • yag

    Well Oculus didn’t wait for this guy to work on VR audio…

  • Paul Doornbusch

    It’s not like high quality VR audio has not existed for a good 15+ years – I’ve been working on ambisonic and wave field synthesis (WFS) VR audio for almost 20 years and it works very well (refer to Place-HAMPI if you want an example currently being shown).

    The popularity of personal VR headsets means that binaural rendering will probably be more important in the future. However, the capture side of the problem is well understood and best solved with various ambisonic microphone arrays such as the standard Soundfield units, the upcoming Sennheiser one and the Eigenmike. It is silly and sad that people think they need to reinvent this.

    More accurate binaural rendering will make a positive difference, even though there is good research showing that we re-learn synthesised HRTFs. There are new headphones on Kickstarter (OssicX I think) that claim to adjust the HRTF to individual listeners – that sort of thing will probably work well in a couple of years and it may be a genuinely significant advance (I don’t know).

    I do know that old school (e.g. 5.1 / 7.1 / 10.2 / N.M) film audio production techniques are not suitable for VR audio production. You need to use object-spatialisation and to separate the rendering from the production process. Atmos and the Barco system have taken the tip from ambisonics and WFS in this regard, but have not really gone far enough.