The Basics of Surround Sound Part 4

What is 2-Channel Immersive Audio?

When it comes to surround sound audio, the question has always been “What do I do with all of these speakers?” Even in a professional production environment, finding the space for a baseline 5.1 speaker system can be a major pain, and more than likely leaves the hope of a surround sound mix/playback system dead in the water for the surround-curious.

That is, until virtual reality and gaming began taking over the world and became the biggest customers for immersive mixes.

Looking for ways to bring more realistic and immersive sound to their respective visual experiences, VR and game sound designers have been at the forefront of binaural surround mixes, which use nothing more than regular stereo headphones to convey a 360° immersive mix.

Examples of a binaural mic setup.

How Binaural Audio Works

Let’s start our understanding of binaural by just stopping and listening to where you are right now. So go ahead, just stop everything, and listen.

Unless you’re in the country or an isolated studio, you must be hearing something, right? For example, as I write this, I’m in Berlin hearing a baby cry outside, a phone ringing on the floor below, and someone dropping something onto the floor above me.

Being able to tell where each of these sounds is coming from is called “localization.” It’s our ability to judge the direction and distance of sounds in our environment using just our two ears. This gives humans and animals the ability to tell if a wild boar is charging us from the rear or if a bobcat is ready to drop down on us from high up in the trees.

Binaural recording makes use of the differences in amplitude and arrival time between our two ears by using two mics (often omni) spaced at ear distance, often with a dummy head or other human-like acoustic barrier placed between them. When the recording is played back on headphones, the original directional cues are preserved, giving us an increased sense of “immersion” and the feeling of actually being there.
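To put a rough number on the time cue, here is a minimal sketch using Woodworth's spherical-head approximation, which estimates the interaural time difference (ITD) for a distant source. The head radius and speed of sound are assumed typical values rather than measurements, and the function name is just for illustration. The amplitude cue is not modeled here.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 C (assumed)
HEAD_RADIUS = 0.0875    # m, a commonly quoted average head radius (assumed)


def woodworth_itd(azimuth_deg, head_radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Approximate interaural time difference, in seconds, for a far source.

    azimuth_deg: 0 is straight ahead, 90 is directly to one side.
    This simple form is only meant for azimuths between 0 and 90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (theta + math.sin(theta))


for az in (0, 30, 60, 90):
    print(f"{az:3d} deg -> {woodworth_itd(az) * 1e6:6.1f} microseconds")
```

Even with the source fully to one side, the difference is well under a millisecond, which is the scale of timing a pair of ear-spaced mics captures.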

Ambisonics

Ambisonic technology differs from traditional binaural recording in that it uses a closely spaced array of multiple mic capsules to pick up these amplitude and time differences, recording them onto 4, 8, or 16 tracks. These tracks are then played back through a hardware or, more likely, software matrix that mathematically decodes them for a multichannel or 2-channel binaural playback environment (i.e., your stereo headphones).

The idea behind the ambisonic recording format is that a single multi-mic array can pick up the overall sound and depth of an acoustic recording in a way that can still be altered after the fact. On playback, a matrix plug-in can adjust the overall sonic width, depth, orientation, and clarity to best fit the program material and the playback configuration.
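As a rough illustration of what "altered after the fact" can mean, the sketch below encodes a single sample into four first-order channels (W, X, Y, Z, using the traditional B-format weighting), rotates the whole sound field, and then decodes it to two channels with a pair of virtual cardioid mics. The function names are hypothetical, and the decode is a plain stereo matrix, not a true binaural (HRTF-based) decode.

```python
import math


def encode_first_order(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).

    Azimuth is measured counterclockwise, so +90 degrees is hard left.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(az) * math.cos(el)
    y = sample * math.sin(az) * math.cos(el)
    z = sample * math.sin(el)
    return w, x, y, z


def rotate_yaw(bformat, angle_deg):
    """Rotate the whole sound field about the vertical axis."""
    w, x, y, z = bformat
    a = math.radians(angle_deg)
    return (w,
            x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a),
            z)


def virtual_cardioid(bformat, mic_azimuth_deg):
    """Output of a virtual cardioid mic aimed at mic_azimuth_deg (horizontal)."""
    w, x, y, _ = bformat
    a = math.radians(mic_azimuth_deg)
    return 0.5 * (math.sqrt(2.0) * w + x * math.cos(a) + y * math.sin(a))


def decode_stereo(bformat):
    """Simple two-channel decode: virtual cardioids aimed hard left and right."""
    return virtual_cardioid(bformat, 90.0), virtual_cardioid(bformat, -90.0)


hard_left = encode_first_order(1.0, 90.0)
print(decode_stereo(hard_left))                     # source sits in the left channel
print(decode_stereo(rotate_yaw(hard_left, -90.0)))  # same recording, turned to the front
```

The point of the design is that the rotation happens on the recorded channels themselves, so the orientation can be changed long after the session without touching the original array.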

An example of a mic setup for ambisonic recording.

HRTF

Now, here’s where the going gets just a bit more complicated. HRTF (head-related transfer function) is a response that characterizes how your ear receives a sound from a point in space. As sound reaches your ears, the size and shape of your head, ears, and ear canals, the density of your head, and your nasal and oral cavities all transform the sound and affect how it is perceived, introducing small delays and boosting some frequencies while attenuating others.

These delays and small changes in EQ work together to create cues that allow our brains to interpret where a sound in our environment is coming from. These complex interactions between our ears and brain can be captured mathematically as a "head-related impulse response," or HRIR.

By applying mathematical filtering, these impulse responses can be applied to the left and right headphone channels in a way that tricks your brain into perceiving sounds as coming from multiple directions in your natural 360° listening environment.
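In practice, this filtering is usually a convolution: the mono source is convolved with one impulse response per ear. The following is a minimal sketch of that idea. The two "HRIRs" are toy placeholders I made up (a pure delay and level drop for the far ear), not measured data, and the function names are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def binauralize(mono, hrir_left, hrir_right):
    """Filter one mono signal with a left-ear and a right-ear impulse response."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses for a source on the listener's left (assumed values):
# the left ear hears it immediately at full level, while the right ear hears
# it three samples later and quieter.
TOY_HRIR_LEFT = [1.0, 0.0, 0.0, 0.0]
TOY_HRIR_RIGHT = [0.0, 0.0, 0.0, 0.5]

click = [1.0, 0.0, 0.0]
left, right = binauralize(click, TOY_HRIR_LEFT, TOY_HRIR_RIGHT)
print(left)
print(right)
```

A real HRIR is typically a few hundred samples long and also shapes the frequency content, so a production tool would likely use FFT-based convolution rather than this direct loop.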

Lastly, through the addition of reverb, sounds can be placed at various, and changing, distances. This allows for dynamic changes in sound, panning, and distance as the listener’s headphone position changes in real time, all adding up to an interesting and enhanced listening experience.
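One common simplification of the distance cue is that the direct sound gets quieter as the source moves away (roughly as 1/distance), while the room's reverberant level stays about the same, so a far source sounds "wetter." The sketch below assumes that model; the reference distance, reverb level, and function names are illustrative choices, not values from any particular plug-in.

```python
def distance_gains(distance_m, reference_m=1.0, reverb_gain=0.2):
    """Return (direct_gain, reverb_gain) for a source at distance_m.

    The direct sound follows a 1/distance law relative to reference_m and is
    clamped so it never exceeds unity. The reverb level is held constant.
    """
    d = max(distance_m, reference_m)
    return reference_m / d, reverb_gain


def mix_at_distance(dry, wet, distance_m):
    """Blend a dry signal with its reverberated version for a given distance."""
    direct, reverb = distance_gains(distance_m)
    return [direct * d + reverb * w for d, w in zip(dry, wet)]


for dist in (1.0, 2.0, 4.0, 8.0):
    direct, reverb = distance_gains(dist)
    print(f"{dist:4.1f} m: direct {direct:.3f}, reverb {reverb:.3f}")
```

Recomputing these gains as the listener or source moves is one simple way to get the changing sense of distance described above.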

What Does All of This Mean?

If all of this sounds very different from our traditional idea of recording and mixing sound and/or music on a DAW over speakers, you’d be right. This is a brave, new technology that makes use of physical, perceptual, and technological studies that are simultaneously cutting-edge and 100 years old. As you might expect, the various applications for the field of immersive audio are viewed as being the greatest in gaming and 360° video production … while some of us see possible uses in immersive music production, as well.

As immersive audio is still very much in its infancy, it’ll be interesting to see what the coming years will bring to the table for this new technology. As I see it, here are a few of the possible advantages and drawbacks to this emerging field:

The technology, and the way HRTF plug-ins integrate into a workstation, is just barely walking up to the starting gate, let alone in the race. It’s all so new that many of the software plug-ins are difficult to understand and use within a DAW session. Quite simply, much of the technology is being driven by people in tech and not by audio pros. As a result, there is often a deep disconnect between the developers and the users. Not a good thing.
Most of the folks who most want to get into the field of immersive audio come from a tech or visual background and therefore have a limited understanding of professional audio production. However, this is also an opportunity for students and pros alike who are interested in pushing the technological boundaries of mixing immersive sound with vision.
Much of immersive audio requires a true blend of artistry, plus production and tech chops. Many pros in the audio field have a very limited understanding of how to truly make use of binaural and/or ambisonic production techniques, let alone HRTF mixing technology and techniques. Those who have a true understanding of how to mix the new with the old are a very rare breed, indeed.
An HRTF mix simply won’t translate easily over speakers. I say easily, because I’m not truly convinced that it’s impossible, just difficult to achieve. Suffice it to say, for most situations, we’ll be looking at two separate mixes: one for traditional stereo and one for immersive stereo over headphones. A messy situation, to say the least.
HRTF mixes generally don’t sound good yet. Like I said, the idea of doing an HRTF mix from within a DAW is in its infancy. There’s absolutely no magic bullet for pulling off a successful HRTF mix. So artistry, skill, and passion will almost certainly be the most important ingredients in making a killer 2-channel immersive mix.

What Does All of This Mean?

If all of this leaves you scratching your head in relative confusion, you’re not alone. However, there’s a lot of money in the gaming and VR world that’s currently being thrown at these technologies. So much so that it will eventually have an impact on the pro audio community. Just how great an impact is yet to be seen — and heard.

— David Miles Huber

David Miles Huber is a four-time Grammy-nominated producer and author of the industry-standard text “Modern Recording Techniques.” His latest music and collaborations can be heard at davidmileshuber.com.