Downmixing 5.1 to Stereo
The holiday record that I’m preparing for iTrax was recorded by Shawn Murphy at 96 kHz/24-bits. He mixed the project to stereo at 88.2 and then downconverted to 44.1 for the CDs. The mastering was done on the 44.1 CD version. Then he mixed the project in 5.1 at 96 kHz/24-bits. It was also mastered and transferred to a hard drive. It’s been waiting for someone to offer it as a download and Jim Self, the producer of the project, found me. But I want to offer it at 96 kHz/24-bits in stereo as well…or at least at 88.2 kHz/24-bits.
Jim checked with Shawn and was told that there isn’t a high-resolution stereo mix. So I have to create one. The options are as follows: go back to the original Pro Tools multitrack masters and remix the project to stereo at 96 kHz/24-bits. This would open up a whole can of worms…matching the CD mix, artistic choices, additional costs, and the mastering would have to be redone. Another choice would be to upsample the 44.1 mastered CD into a high-resolution version. This happens more than you might imagine and despite what others may tell you it’s a completely useless exercise. The fidelity doesn’t change after the original recording. The third and final method would be to downmix the 5.1 masters into stereo. That’s the road we’ve decided to go down.
What is downmixing? It’s the process of taking a multichannel track and creating a stereo (or mono) version of that track by reallocating the CENTER, LEFT and RIGHT SURROUND, and LFE channels to the LEFT and RIGHT FRONT channels. It’s a process that happens all the time in your optical disc player. When I prepare a project for commercial release, I have to include downmix “coefficients” for the Dolby encoded audio just in case the 5.1 mix has to be played back through a 2-channel stereo system. Once these coefficients are dialed into the metadata of the associated file, the player will automatically detect them and mix and distribute the audio channels according to these parameters.
The CENTER channel won’t have a center speaker in a stereo rig, so it gets divided equally and sent to the LEFT and RIGHT FRONT speakers. But because it’s now coming out of two speakers instead of one, the level sent to each has to be attenuated by 3 dB. The 3 dB drop is necessary to keep the perceived volume the same. This attenuation is built into traditional PAN POTs on recording consoles as well…we want the same volume as we pan a signal between the left and right. The LEFT and RIGHT SURROUND channels are reduced by about 10 dB and combined into the corresponding LEFT or RIGHT FRONT channel. And finally, the LFE is sent to the stereo channels and attenuated by a few dB. These parameters are adjusted to personal taste. The amount of information in the LEFT and RIGHT SURROUNDs does affect how much you want directed to the front channels but the numbers I’ve mentioned are pretty typical.
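For readers who want to see the arithmetic, here is a minimal sketch in Python of the downmix described above. It is only an illustration, not the tool used on this project. The channel order (front left, front right, center, LFE, left surround, right surround), the function names, and the -6 dB LFE figure are assumptions made for the example; the text above only says the LFE comes down "a few dB."

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)


# Attenuation values taken from the discussion above, except LFE_DB,
# which is a hypothetical stand-in for "a few dB".
CENTER_DB = -3.0
SURROUND_DB = -10.0
LFE_DB = -6.0  # assumption for illustration only


def downmix_frame(fl, fr, c, lfe, ls, rs,
                  center_db=CENTER_DB,
                  surround_db=SURROUND_DB,
                  lfe_db=LFE_DB):
    """Fold one 5.1 sample frame into a (left, right) stereo pair.

    Assumed channel order: front left, front right, center, LFE,
    left surround, right surround.
    """
    g_c = db_to_gain(center_db)
    g_s = db_to_gain(surround_db)
    g_lfe = db_to_gain(lfe_db)
    # Center and LFE go equally to both sides; each surround goes
    # only to the front channel on its own side.
    left = fl + g_c * c + g_s * ls + g_lfe * lfe
    right = fr + g_c * c + g_s * rs + g_lfe * lfe
    return left, right


def downmix(frames, **coefficients):
    """Apply downmix_frame to a sequence of 6-channel frames."""
    return [downmix_frame(*frame, **coefficients) for frame in frames]
```

A -3 dB coefficient works out to a multiplier of about 0.71 and -10 dB to about 0.32. Because several channels are summed into each output, the result can exceed full scale, so a real downmix would normally leave headroom or scale the output afterward.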
So that’s what I’m doing to the tracks of the “Tis The Season TUBA Jolly” project. The stereo mix will be exactly the same timbre as the 5.1 surround mix and maintain the same balance of instruments. And I get to hum along with my favorite Christmas music once again.
That is an awful lot of work, but I’m glad you take the time and effort. I’m sure it will be appreciated by all that hear it.
A related issue is “how good are conversions?” If the original had been mastered at 88.2, I would naturally prefer it over it being converted to 96/24. This is a reason why I prefer 88.2 being the cutoff for HD-Audio; but I’m a follower of JAS now.
This does matter. Consider Transparent’s release of Haydn: http://transparentrecordings.downloadsnow.net/haydn-in-america Does their comment make sense:
“Provenance: Haydn in America was originally recorded to 8824 PCM (8824 is our short hand for 88.2kHz and 24-bit sampling). The 8824 WAV files are the original digital file generation sent to us. The DSF and FLAC files are considered second generation and made from conversions using our Blue Coast conversion methods. DSF and FLAC will offer the convenience of metadata that the WAV files will not.
“After several blindfold tests, it is our opinion that the 8824 wav files sound the best, followed by DSF and after that the FLAC 8824. The difference is minimal. We suggest you purchase files for your best performing home DAC. The DAC will make more difference than the file type.”
I purchased the FLAC 8824 files, then converted them to AIFF.
I have no problem with 88.2 kHz/24-bit PCM as a capture and release format. The only reason to choose these specs is that the project is headed for a CD release…which is not important to me. I would prefer to capture at 96 kHz/24-bits and then release FLAC files with the metadata. The FLAC algorithm is lossless, so the result is not a “second generation” copy. It is a losslessly encoded, metadata-rich file. You can always decode it back to AIFF or WAV if you prefer.
Downconverting to DSD in a DSF file format is a bad idea. You’d be throwing away information as compared to the PCM version…but there’s a sound to DSD that some people like. I concur with the statement that the DAC is more important than the format…as long as you stay with PCM.
Mark, as always your engineering makes absolute complete sense. And Shawn Murphy is one of my favorite engineers, so this is a project I’ll be watching for.
I’m glad you mentioned that in the downmixing process there’s some latitude for taste and interaction with each individual mix. However, I can’t imagine a time when downmixing the rear channels at -10dB would even be conceivable without substantially altering the effect of the total mix. Maybe that’s desirable from time to time, but not with anything I can think of! Several years ago I mixed a 5.1 master of classical music for video, and the mastering engineer applied the coefficients for me (thank you very much), reducing the rear channels by merely 3dB. The end result was a regrettably dry mix because ambiance and reverb were no longer in the “sweet spot” of balance. (Maybe I should’ve had them louder in the surround mix.) If 3dB could do that, I can only imagine that 10 would be akin to dumping them altogether! So I’m curious: where have you encountered rear channels mixed so aggressively to justify so much reduction?
In my experience, it depends on the individual tracks and the amount of musical material in the rear speakers. I’ve set up coefficients for downmixes of the Allman Brothers, Bad Company, and others (live concert DVDs) and the -10 dB works. It means that the audience cheering and applause doesn’t drown out the music.
In the case of a studio recording like the Christmas project, Shawn prepared a “surround light” style 5.1 mix. The only thing in the rear channels is room ambiance…no tubas. If I had moved those two channels into the front speakers at full level, the reverberation would be much too high. This was all digital reverb…not the actual sound of the room. I tried to match the sound of the stereo recording but using real 96 kHz/24-bit audio.
Thanks, Mark, that makes sense and is good to know!
The process on how the music was recorded and mastered in the first place is what is confusing to me.
Why is everything downgraded BEFORE the mastering and mixing is done?
Why not just master and mix everything at the highest quality possible,
then after that it can be downconverted to match the needs of CD and streaming codecs?
I really don’t know much about these things, but I do try to follow what is happening in the HRA industry.
It just baffles me why so many people couldn’t care less about the fidelity and quality of their own music,
as well as the studios and producers responsible for the process that makes the product from beginning to end.
I asked the very same question. Why did they capture all of the sessions at 96/24, mix and master the 5.1 version at 96/24, but follow a completely different signal path for the stereo CD? I can only imagine that the mastering engineer had a preferred piece of equipment or capability that wouldn’t work at full 96 kHz/24-bits or 88.2. Their procedures were surprising.