Oculus Audio SDK Impressions – CES 2015

Oculus’ latest feature prototype VR headset, code-named Crescent Bay, was out in force at this year’s CES, blowing minds left, right and centre. But whilst debate rages over precisely which optics and display it contains, Crescent Bay’s audio has been somewhat sidelined. Here’s why Oculus’ work on building a dedicated audio pipeline with high-end hardware matters.

‘VR Audio’ – Oculus’ Latest Initiative to Improve Immersion

Oculus’ objectives at CES 2015 were fairly clear: 1) get the new Oculus Rift ‘Crescent Bay’ prototype onto as many people’s heads as possible, and 2) talk up VR Audio, the company’s term for its 3D positional audio pipeline. It seems to have been ‘mission accomplished’ on both counts, judging by the sheer amount of mainstream press the company has received this year – all talking about 3D positional audio and incredible experiences with Oculus’ latest hardware.

But whilst Crescent Bay’s newly integrated headphones have drawn some superficial attention from the media and community at large, it may not be immediately obvious just how seriously Oculus is taking what you, the player, hear whilst in virtual reality. And if you’ve yet to experience truly effective positional audio in VR (or in any gaming experience, for that matter), you may not realise how heavily attaining ‘presence’, the industry term for psychological immersion, relies upon it.

The Hardware

Oculus’ public mission to provide top-notch VR Audio began with the announcement at Oculus Connect that the company would license University of Maryland startup VisiSonics’ RealSpace 3D Audio engine for inclusion in its SDK. This means that every developer creating experiences for the Oculus Rift will have immediate access to a set of APIs that let them take advantage of 3D positional audio without needing to seek out proprietary solutions – in theory, lowering the barrier to entry for great 3D sound in games.


The Crescent Bay demos given at CES this year were updated to feature this new positional audio. VR Audio itself will probably make an appearance in a beta SDK release within the next few months, with a full release arriving before the consumer Rift ships. VR demos featuring RealSpace 3D audio are, however, out there to try right now (see link below).

See Also: A Preview of Oculus’ Newly Licensed Audio Tech Reveals Stunning 3D Sound

Crescent Bay, Oculus’ newest publicly announced feature prototype, also revealed at Oculus Connect in September, embodies the VR Audio mission by including a pair of integrated headphones – a design that initially looked like a slightly incongruous afterthought. As it turns out, however, the headphones are the final link in a chain of steps Oculus has taken to provide the purest and most accurate audio possible.

Example of a USB digital-to-analogue converter (DAC) in use

Feeding those custom-designed, integrated headphone drivers is a series of stages designed to cut electrical noise and keep the audio as clean as possible. Soundcards are bypassed entirely: an external, inline DAC (digital-to-analogue converter) takes pure digital audio data from the PC and converts it into an analogue signal ready for amplification. That analogue audio then passes through a dedicated amplifier stage before finally reaching the HMD’s integrated, custom-tailored headphones.

The above methods are well known to those seeking a cut above the standard in sound reproduction from electronic devices. Far from being high-end audiophile snake oil, bypassing the on-board DACs and amplifiers found in everyday consumer electronics – which are generally designed and built with cost, not fidelity, in mind – can yield genuinely impressive improvements in quality.

The Crescent Bay prototype, sporting new custom-tailored earphones

The aim here, Oculus tells us, is to provide the cleanest, flattest response possible from the audio hardware pipeline; the headset’s audio hardware becomes a blank canvas on which to convey a developer’s chosen audio design. At the same time, the audio path is predictable, so the algorithms used to calculate and position audio cues in virtual space should also behave as expected. Oculus has essentially produced a reference design – a standard target against which developers can create.

The pipeline also lets Oculus bypass headaches caused by individual soundcard drivers, many of which apply troublesome default equalisation (‘enhanced’ bass or treble), not to mention unwanted DSP (digital signal processing, such as added echo) effects often found lurking in the audio stacks of less discerning soundcard manufacturers. In many ways, this approach is similar to ‘direct mode’, Oculus’ attempt to cut out the middleman in the Windows GPU driver stack and deliver the lowest possible latency to the HMD’s display.

All this means that, in theory, should Oculus carry this design through to the eventual consumer product, every user who buys an Oculus Rift at retail has the best possible chance of enjoying top-quality, effective 3D positional audio right out of the box.

A Few Words on HRTFs

As with most of the incredible things our brains do for us, the way we perceive the world through sound is taken for granted. But the subtle reflections and distortions a sound undergoes on its way to our ears provide us with critical spatial information.

An example of HRTF and ITD in action. [Credit: music.columbia.edu]
HRTF (head-related transfer function) is a somewhat unhelpful-sounding acronym that describes how a sound is filtered by your head, torso and outer ears before it reaches your eardrums; your brain uses these changes to judge the direction of, and distance to, a sound’s source. Together with ITD (inter-aural time difference: the delay between a sound reaching each of our ears), these cues let our brains build a surprisingly accurate aural map of the world. Unsurprisingly, emulating these cues can be extremely effective at convincing the brain it’s somewhere it’s not.
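To give a feel for the ITD cue described above, here’s a minimal sketch using Woodworth’s classic spherical-head approximation. This is an illustration only – it is not part of Oculus’ SDK or RealSpace 3D – and the head radius is an assumed average:

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, dry air at ~20°C
HEAD_RADIUS = 0.0875     # m, assumed average adult head radius

def itd_seconds(azimuth_deg: float) -> float:
    """Woodworth's spherical-head approximation of the inter-aural
    time difference for a distant source.
    azimuth_deg: 0 = straight ahead, 90 = directly to one side."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

# A source directly to one side produces the largest delay,
# roughly two-thirds of a millisecond:
print(round(itd_seconds(90) * 1000, 2))  # → 0.66 (ms)
```

Delays this small are imperceptible as echoes, yet the brain resolves them effortlessly into direction – which is why a spatialiser that gets them even slightly wrong can break the illusion.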


No two HRTFs are created equal, however. As everyone’s head shape is subtly different, your brain is attuned specifically to your own – which raises interesting questions for VR Audio’s implementation. Will there be a calibration step that lets you provide a 3D model of your head to ensure audio reflections and occlusion are calculated accurately? I suspect not, but it’s likely that calibrating HRTFs for each user will become important in the future, as every other aspect of VR grows more and more realistic.

From my own experience, I’ve come closer to achieving presence in VR demos using spatialised 3D audio than in any other, so I’m heartened and impressed by the efforts Oculus is making to ensure its use is not only supported but positively encouraged in future virtual reality content.

By combining high-quality, custom components at every stage of Crescent Bay’s audio pipeline, Oculus has found a way to ensure its vision for compelling, and ultimately presence-enhancing, 3D positional audio can be delivered to the consumer. Assuming these measures make their way into the consumer version, the Oculus Rift could wind up being the best-sounding device in the household.

This article may contain affiliate links. If you click an affiliate link and buy a product we may receive a small commission which helps support the publication. See here for more information.

  • leoc

    Latency has to be a big concern as well. Things like Rocksmith illustrate that audio latency on PC can be a real problem: it can be difficult or impossible to eliminate a noticeable delay between striking the guitar strings and hearing a note play through your speakers or headphones. That kind of lag would really hurt VR positional audio too, especially when the user moves his head.

  • Neuromute

    Those integrated headphones look like the Sennheiser PX-100

  • Don Gateley

    This also implies that so long as the actual transducer in the integrated ‘phones is linear (i.e. no harmonic or IM distortion), which is not difficult to achieve with today’s magnet and materials technology, then with DSP they can make the frequency response of the pipeline all the way to a representative eardrum whatever they choose to design or emulate. For example they could be made to sound identical to a $350 AKG K702 just by some good measurement and the inclusion of a DSP kernel to convolve within the pipeline. (I’ve done and am doing measurement based work that fully demonstrates the efficacy of this.)

    Sounding like an existing device, however, is hardly optimal and not what they probably will do. Since they can give the entire path to the eardrum an arbitrary response, why not find the one (or several) that the most critical ears find the most ideal. That’s not an easy research task in and of itself but probably worthwhile in the scheme of things and certainly of great academic interest.

    There is even more that can be done with this but I’ll stop here.

  • japes98

    The guys over at Earmark Labs (www.earmarklabs.com) are doing some interesting stuff with personalized HRTFs.


    Great article! HRTFs are a great option for 3D spatial audio. To be honest, though, I was extremely disappointed by Crescent Bay’s audio. Spatial cues can be obliterated from a signal that’s using even the best HRTFs if the dB and EQ aren’t quite right. As a result, external room noise is a big factor. It was fairly quiet in the room where they demoed Crescent Bay at Oculus Connect, but those headphones had sad grip strength. I was hoping it was just because it was a new prototype at the time, but they don’t seem to have fixed it yet as of CES. When I demoed Crescent Bay, I had to press the headphones firmly against my ears to hear the spatial cues at all, and even then they were faint compared to what they could be. The hardware needs to provide more than just flat response. Noise cancellation needs to be their end game.

    • Don Gateley

      The response that you got when you held them tight can be achieved nearly identically with DSP while they are in normal contact. If that’s the sound they want delivered. My take on your experience is that they simply weren’t loud enough to compete with whatever cues remained incident from the room.

      I agree for all kinds of reasons that external sound must be isolated or cancelled but when dealing with things that are so sensitive to differential frequency response it is going to be very hard to get sufficient and consistent feedback control across the whole spectrum of sound and fits of interest. They are going to have to close off the room from both your eyes and your ears and do so physically not electronically if sound levels are to remain comfortable.

      Nothing that simply sits on your ear will do it for the difficult job of coordinating visual with auditory stimulus to achieve the immersion we and they want. The job of interpolating within an HRTF space using position data is trivial compared to the physical and empirical problems of ‘phone and coupling design with all the variables involved.

      In-ear buds can provide a much easier and more consistent solution but not everyone likes those things in their ears (and of course they would not be suitable for any demo room unless they were use-once and toss.)