Level Expectations
I downloaded a few files from Pentatone Records a couple of weeks ago. The label, based in the Netherlands, has a great reputation for capturing and releasing classical music using DSD. Their titles are also available as downloads from their site. I’ll put up some spectrograms in a future post but I thought I would share some thoughts about audio levels and how relative and absolute amplitudes are handled.
The subject came up because the 96 kHz/24-bit and 44.1 kHz/24-bit downloads of a Mozart album were very different with regard to levels. The standard-resolution file (the 44.1 kHz one) was 6 dB hotter than the high-resolution file at 96 kHz. Additionally, neither one came close to using the entire bit depth or dynamic range available in the 24-bit PCM format. The original was captured using DSD 64 and then converted to PCM.
When questioned about the level difference and the process of production from DSD to PCM, the representative from Pentatone responded:
“We always take the original masters used to produce the SA-CD [NOTE: the DSD 64 files] versions and the CD layer. NOTHING is changed to retain the best possible quality. For the FLAC version we use the Weiss Saracon as sample rate converter, known to be the best around, to convert from DSD to 96kHz 24bit stereo and surround.
The 96kHz versions are converted to FLAC files to reduce server space and download time. This is a lossless process. When all files are available they get packed with extra metadata and MP3 files in a .ZIP container. I hope this answers your question and I hope you will enjoy the downloaded files.”
The sound of the tracks was very nice and the spectra, as you would expect, didn’t extend past 22-23 kHz due to the noise shaping and filtering applied during the conversion. But the levels were obviously lower than I expected.
The maximum level of the files that I downloaded peaked at around -7 dB. Why wouldn’t the mastering engineer adjust the levels to use the largest number of PCM bits? Could it be that the native DSD files were simply transferred without concern for the overall level? I could imagine that this was the case if they didn’t have the ability to adjust the levels of the native DSD stuff.
The response from Pentatone didn’t specify how or whether they did any editing or other processing to the original recording. Perhaps the tracks are kept in the DSD format without any level or EQ applied.
In the world of PCM, we can use a process called “normalize” to make sure that the highest amplitude sample just reaches the 24-bit limit. It’s a two-pass system: the first pass scans the multitude of samples looking for the max value, and the second applies the gain. If that value is 3.85 dB from the 24-bit limit then every sample in the file will be increased by 3.85 dB. The relative amplitudes are maintained but the overall level is raised. This is standard operating procedure for commercially released recordings.
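As a rough illustration of the two-pass idea, here is a minimal Python sketch. It assumes the samples are floating-point values scaled so that full scale is ±1.0; the function names `peak_dbfs` and `normalize` are made up for this example and do not come from any particular mastering tool.

```python
import math

def peak_dbfs(samples):
    """Peak level in dB relative to full scale (assumed to be 1.0)."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

def normalize(samples, target_dbfs=0.0):
    """Scale every sample by one gain so the peak lands on target_dbfs."""
    # Pass 1: scan all samples for the maximum absolute value.
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # digital silence, nothing to scale
    # Pass 2: apply the same gain to every sample.
    gain = 10 ** (target_dbfs / 20) / peak
    return [s * gain for s in samples]

# A made-up signal whose loudest sample sits 3.85 dB below full scale.
quiet = [0.0, 0.3, -10 ** (-3.85 / 20), 0.1]
loud = normalize(quiet)
print(round(peak_dbfs(quiet), 2), round(peak_dbfs(loud), 2))
```

Because every sample is multiplied by the same factor, the ratios between samples are unchanged; only the overall level moves.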
But Pentatone didn’t do this AND the 44.1 version was louder than the 96 version, which is very curious. I still don’t know why.
I first noticed this problem when I finally got the raw DSD files from my SACDs. When you play these DSD files back converted to PCM the levels are about 6 dB lower. I understand that DSD has a max level spec that is 6 dB below that of PCM. So when converting from DSD to PCM you need to add compensation for this. Problem is you can’t add 6 dB compensation uniformly because some DSD recordings have not observed the 6 dB spec, and you run the risk of clipping if you just automatically add 6 dB. I guess the mastering engineers at Pentatone applied too much compensation to the PCM conversions. Whether they did that intentionally, I don’t know.
But the levels of the two PCM files at 44.1 and 96 kHz were different. Something was missed by their engineers.
Thanks for the write-up and observations as usual, Mark.
As per the Scarlet Book specifications on SACD (DSD), the nominal maximum amplitude is supposed to be 6 dB below PCM’s maximum of 0 dBFS. Of course not all SACDs follow this specification.
By default, Saracon does a +6dB gain during the conversion. I suspect that when Pentatone did the transcoding, they must have left it at default +6dB on the 16/44 version and turned it off (0dB) for the 24/96 version… Perhaps this is what happens when a company is DSD-centric and can’t be bothered to do a 2 step conversion to maximize the PCM signal.
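The clipping risk mentioned above can be sketched in a few lines of Python. This is not a description of how Saracon or any other converter works internally; `safe_gain_db` is a hypothetical helper that simply caps a requested make-up gain at the headroom that is actually available, and the peak values in the example are invented.

```python
def safe_gain_db(peak_dbfs, requested_db=6.0):
    """Largest gain in dB, up to requested_db, that keeps the peak at or below 0 dBFS."""
    headroom_db = -peak_dbfs  # distance from the current peak to full scale
    return max(0.0, min(requested_db, headroom_db))

# A transfer that respected the 6 dB convention can take the full compensation.
print(safe_gain_db(-7.0))
# A hotter transfer peaking at -3 dBFS would clip with +6 dB, so the gain is capped.
print(safe_gain_db(-3.0))
```

Measuring the peak first and then choosing the gain is the same two-step approach as normalizing, just with a ceiling on how much gain is allowed.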
On a somewhat related note; it’s unfortunate that Channel Classics sells Saracon-converted DSD64 to 24/192 PCM at a higher price when all the frequencies above 40kHz have been filtered out.
Thanks for the clarification and additional information. This seems like a reasonable explanation. It makes no sense to me to record in DSD and then convert to PCM…isn’t the whole reason for DSD to avoid the “problems” associated with PCM?
It is very rare indeed that anyone would make two masters, one in DSD and one in PCM. It happens on rare occasions, but Pentatone is invested in DSD because they believe it best suits their needs, most especially since Pentatone is a multi-channel based label. They chose SACD rather than DVD-Audio back in those days, and have been proven correct in that choice. Beyond multi-channel, DSD is believed by many to sound better, and this is the view held by Polyhymnia, which does the recording for Pentatone.
With that being the case, how else would you suggest they provide their music to both those who can and cannot playback DSD? Like I said at the beginning, two completely separate masters is pretty much out of the question due to economy. The master file is in DSD. Everything comes from that master. If they didn’t provide PCM versions, then the distribution would be limited to those with SACD or DSD playback capabilities.
The same can be said for many a label that records in DSD. They believe DSD offers the best representation of their work, and the DSD files are made available to those who can play them back. Then PCM conversions are offered for everyone else.
I think your point would only be valid if they recorded in DSD, but solely released in PCM. That isn’t the case, though. When I download from Pentatone, I always buy the DSD master file. And it sounds glorious. Some of the best classical high resolution audio you can ever buy.
Recording engineers choose to record using PCM or DSD, you’re absolutely right. But the post production process is where things favor PCM or analog because there are no tools to do many of the required things in native DSD. So Pentatone and others have to choose another format for reverberation, editing, fading, EQ adjustments, mastering, and more. My understanding is that most use DXD to accomplish the post production stages, which is a high sample rate format of PCM. In fact, 2L records at 352.8 kHz and then does all of the work in PCM prior to downconverting to PCM and DSD. I think Morten at 2L has made the right choice in terms of delivering to DSD and PCM customers. A DSD 64 master specs out a little better than a standard CD but not by much.
I understand why a company would issue DSD versions of their masters…there’s more buzz and money there. But there isn’t more fidelity as compared to a high-resolution PCM file at 96 kHz/24-bits.
Yes, this is a problem indeed. The 24/96 and 24/192 files offered by Channel Classics are identical. Buying the 192 kHz version is a waste of money and space. Even if the actual recording had captured no frequencies above 48 kHz, you could still take advantage of the improved impulse response. Sadly, though, the filtering not only eliminates captured frequencies, it effectively eliminates any advantage in impulse response. A fact often overlooked.
Mark, while I agree with you that “standard operating procedure for commercially released recordings” is to normalize to the 24 (or 16) bit limit, you need to include the scope of the normalization. For much of popular music where the dynamic range varies from very loud to even louder and the peaks are at jet engine loud, you can get away with normalizing each track individually, but for genres such as classical the difference in dynamic range between movements is an essential part of the music. So to preserve the dynamics of the entire composition you must normalize across all the tracks for that composition. Additionally, dynamic range differences between different compositions must also be preserved. For example, this past weekend I attended a concert where both Chopin’s Piano Concerto No. 1 and Janacek’s Sinfonietta were performed. The Chopin concerto loudness peaks at about fortissimo (ff), and the Janacek Sinfonietta loudness peaks easily exceed triple forte (fff). At the concert, I expected to and did hear the dynamic differences between these two pieces. Likewise, if I play recordings of these two pieces consecutively, I expect to hear the dynamic differences between them. In order to preserve these dynamic differences, the Chopin concerto would have to be normalized to a lower peak level than the Janacek Sinfonietta.
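To make the point about scope concrete, here is a small Python sketch comparing per-track normalization with a single gain shared across a whole composition. The peak values are invented for illustration and the function names are hypothetical.

```python
def album_gain_db(track_peaks_dbfs, target_dbfs=0.0):
    """One shared gain: bring the loudest track's peak to the target."""
    return target_dbfs - max(track_peaks_dbfs)

def per_track_gains_db(track_peaks_dbfs, target_dbfs=0.0):
    """A separate gain per track: every track's peak lands on the target."""
    return [target_dbfs - p for p in track_peaks_dbfs]

# Invented peak levels (dBFS) for three movements of one work.
peaks = [-12.0, -4.0, -15.0]

shared = album_gain_db(peaks)
album_result = [p + shared for p in peaks]
track_result = [p + g for p, g in zip(peaks, per_track_gains_db(peaks))]

print(album_result)  # differences between movements are preserved
print(track_result)  # every movement now peaks at the same level
```

With the shared gain, the quietest movement still peaks 11 dB below the loudest one, exactly as before. With per-track gains that difference is erased.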
It is possible that Pentatone is normalizing across their entire classical music catalog. I have a downloaded Pentatone sampler that includes both 96/24 5.0 FLAC and MP3 2.0 files. The following is a peak analysis for each channel of each track in this sampler.
Peak level per channel (dB)

       FLAC 5.0 96/24 channels                         MP3 2.0 channels
Track       1       2       3       4       5     Max       1       2     Max
    1  -12.36  -13.22  -22.17  -19.18  -20.54  -12.36  -12.79  -12.34  -12.34
    2   -3.85   -6.98  -10.96  -13.45  -13.19   -3.85   -4.56   -6.42   -4.56
    3  -16.79  -14.78  -24.70  -22.79  -23.95  -14.78  -17.30  -14.65  -14.65
    4  -11.01  -12.33  -11.83  -15.02  -16.32  -11.01  -10.14  -11.32  -10.14
    5  -12.53  -10.96  -12.32  -16.97  -15.92  -10.96  -11.56  -11.74  -11.56
    6   -4.48   -6.88   -3.92   -7.83   -9.21   -3.92   -3.71   -3.94   -3.71
    7   -5.54   -6.94   -5.16  -13.59  -13.61   -5.16   -7.38   -6.77   -6.77
    8   -4.88   -5.53   -5.79   -9.33   -9.90   -4.88   -3.94   -6.11   -3.94
    9   -5.11   -3.27   -2.52   -9.57   -9.20   -2.52   -4.08   -3.88   -3.88
   10   -2.96   -3.35   -7.91   -3.02   -4.00   -2.96   -0.64   -1.69   -0.64
   11   -2.98   -5.92   -2.91   -6.11   -7.56   -2.91   -2.86   -3.56   -2.86
   12   -3.85   -2.24   -3.16  -11.06  -11.85   -2.24   -1.87   -1.81   -1.81
   13   -9.27  -10.10   -8.36  -15.84  -18.84   -8.36   -8.35   -7.14   -7.14
   14   -1.35   -0.61   -0.30   -4.62   -1.76   -0.30   -0.33   -0.13   -0.13
   15  -16.47  -14.69  -18.15  -19.28  -24.09  -14.69  -16.01  -15.85  -15.85

Track Titles
01 Schumann Waldszenen, Eintritt
02 Schumann Waldszenen, Jaeger auf der Lauer
03 Schumann Waldszenen, Einsame Blumen
04 Tchaikovsky Symphony No 3, Scherzo
05 Bach Concerto for oboe and violin, Adagio
06 Corelli Concerto Op 6-4, Allegro
07 Mendelssohn Piano Trio No 1, Scherzo
08 Tchaikovsky Symphony No 2, Andantino marziale
09 Beethoven Sonata No 21, Rondo
10 Korngold Violin Concerto Op 35, Finale
11 Blake Clarinet Concerto Op 329a, Round Dance
12 Rachmaninov Prelude Op 3 No 2 in C-sharp minor
13 Schubert Piano Quintet Trout , Andante
14 Shostakovich Symphony No 1, Allegro
15 Saint-Saens Symphony No 2, Adagio
The peaks range from -14.78 dB to -0.30 dB for the FLAC 5.0 files and -14.65 dB to -0.13 dB for the MP3 2.0 files. This does seem to be consistent with normalizing across the entire catalog. For this download the MP3 2.0 peaks are consistent with the FLAC 5.0 peaks. Also, the dynamic range for these recordings is about 70 dB, so 16 bits would have been enough.
On listening, most of these 5.0 tracks are very engaging with a very wide soundstage that wraps around on the sides in a manner that is similar to what you hear from an orchestra floor seat in a good concert hall. They are good examples of the benefits of 5.0 over 2.0.
For the Mozart download, I don’t know why the 44.1/24 peaks are 7dB higher than the 96/24 peaks. It may have been unintentional. I find -7dB to be about right for a Mozart concerto when compared to a Shostakovich symphony, so my question is why they raised the level on the 44.1 file.
Sorry about the table format in the reply. I would be happy to provide the spreadsheet or a PDF.
You’re absolutely correct and I can accept the necessity to structure the gain across an entire album…but what does that mean for individual downloads? We can’t adjust levels for individual tracks as measured against a louder piece that may not be heard during that listening session.
Hi Mark,
I was reading a post on LinkedIn about a new DSD recording available ( A new record! 48 tracks of DXD (352.8 kHz / 24bit) used to record Mahler’s First Symphony with the Oslo Philharmonic http://ow.ly/udhot ). When asked about the reason for such a high DSD sample rate, the answer from the Software Products Manager was
“…you are right that in terms of frequency content, the question of whether we can hear, or even feel above 20kHz is still debated. But, what is very much helped by higher sample-rate productions is the frequency response. Using higher and higher sample rates means that more representative transients can be captured digitally.
Have a look at this diagram here:
http://www.merging.com/resources/img/products/pmx/dsdresponse_big.png
you can see that as you move up in terms of FS, you develop a clearer and clearer representation of a transient sound.”
But, I am afraid that, as usual, this can only be part of the truth… Mark, what are your thoughts about that?
Thank you in advance for your comments and, just in case, sorry if I posted this matter in the wrong thread.
Best regards,
Jorge
Jorge, thanks for the comments. First, the DXD format is NOT DSD but actually PCM at a very high sample rate. I honestly don’t understand the quote. It makes no sense at all. I should write about how ultra high (or even regular high) sample rates do nothing to give you more transients. The move to 352.8 kHz is a numbers game AND is made so that the resulting recordings can be easily downconverted to DSD 64 and PCM.
“In the world of PCM, we can use a process called “normalize” to make sure that the highest amplitude sample just reaches the 24-bit limit. It’s a two-pass system that scans the multitude of samples looking for the max value. If that value is 3.85 dB from the 24-bit limit then every sample in the file will be increased by 3.85 dB. The relative amplitudes are maintained but the overall level is raised. This is standard operating procedure for commercially released recordings.”
I would disagree with the assumption that this is standard operating procedure. The matter of peak level versus sound level impression is important. Across all the formats (CD, SA-CD, stereo and surround, and files in stereo and surround), I would keep in mind a certain consistent sound level setting (volume) for the user. That means considering the program source material when deciding what level to master to.
Perhaps in pop everybody wants everything as loud as possible and therefore normalizing might be the rule, but for classical I would want peak levels to be clean and contain as much dynamics as I would like within what I could get.
So a big Wagner opera scene with huge sound levels and dynamics would actually be mastered to peak 0 in order to keep the quality and, as much as possible, the volume impression. To listen to that with a “normal” volume setting and then play a normalized harpsichord recording would be really a shock. Therefore a harpsichord recording should not be mastered to peak 0. (or normalized when changing from DSD SA-CD master file to pcm file).
For the relative levels of individual tracks we take into account the peak levels of the whole (SA)CD or (SA)CD’s if it is a box set. That automatically rules out normalizing each individual track, as when you would play the whole program (not unheard of in classical music…) it would give you strange surprises.
We always convert the DSD files to 32 bit PCM and then produce the PCM files from that for the download files, using the peak level that works best for the desired playback impression. If you normalize a 24 bit file it does nothing to improve the dynamic range and the potential quality improvement the increase in dynamic range could offer.
32 bit DSD/PCM converted files contain all the dynamic information of the original DSD file with headroom. Levelling to anything within peak -10 would keep enough dynamic range to contain all the inherent audio qualities that are within the PCM file.
In other words a PCM file in 24 bit, compared to a 16 bit version that has 6 dB higher peak level, still contains (high resolution!!) tons more dynamic information way down than the 16 bit. So for sound quality reasons I might agree that to achieve the most from a 16 bit 44.1 file format (as for CD) using the available dynamic range is important, this is not such a big deal with the 24 bit files.
I’m not sure we’re on the same page here. I never said that normalizing should be applied to each individual track…although on commercial tunes it does happen and worse. As a mastering engineer of many classical projects for my own label and others, I find it common and necessary to normalize the entire record to take advantage of the 16 or 24 bit words. I don’t know of any engineer that would leave a peak less than the maximum of the system to “match” the levels of other recordings that might be at other levels or consist of different material. That’s a guessing game, at best. Maybe working through a boxed set makes sense too. We’re trying to create an even program level throughout. Very clear.
You’re correct that dynamic range is at its best in the source recordings. This is the place to use 32-bits (if you have converters that go that far). Using 32-bits after something is recorded doesn’t help. The dynamic range “in band” of a DSD 64 track doesn’t exceed 24-bits of PCM.
You lost me at the last paragraph when you said that “a PCM file in 24 bit, compared to a 16 bit version that has 6 dB higher peak level, still contains (high resolution!!) tons more dynamic information way down than the 16 bit.” Maybe you can explain this. The only difference between a 24-bit and 16-bit sound file is the potential for the 24-bit file to have a lower noise floor (or greater signal to noise ratio). I don’t grasp the concept of “more dynamic information way down”. Each bit of both systems is still only giving you roughly 6 dB of dynamic range.
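The “roughly 6 dB per bit” rule can be checked with a few lines of Python. This is a sketch that assumes ideal quantization and ignores dither, noise shaping, and real converter noise, so the figures are theoretical ceilings only; the function names are made up for this example.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB for each bit of word length

def range_db(bits):
    """Theoretical span between full scale and one quantization step."""
    return bits * DB_PER_BIT

def range_below_peak_db(bits, peak_dbfs):
    """Span left between a file's peak and one quantization step."""
    return range_db(bits) + peak_dbfs

def bits_for_range(db):
    """Smallest word length whose theoretical range covers db."""
    return math.ceil(db / DB_PER_BIT)

print(round(range_db(16), 1), round(range_db(24), 1))
print(round(range_below_peak_db(24, -6.0), 1))
print(bits_for_range(70))
```

Under these assumptions a 16-bit file tops out near 96 dB and a 24-bit file near 144 dB, so a 24-bit file peaking at -6 dB still has far more room beneath its peak than a 16-bit file peaking at 0 dB. The extra room is a lower noise floor, not additional information in the music itself.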
To focus on the last comment (it is midnight here in Moscow, and luckily they return us an hour): My point exactly about the 32 bit conversion format from DSD, it will contain all the dynamic range from the original DSD file and more, and can then be used to create the “chosen” peak level limit, with respect to playback level of the source material (i.e. high full scale dynamics of Wagner or a low scale and lower dynamic range of harpsichord) to a 24 bit PCM file containing still the potential dynamics for any playback system capable of the best.
When you then consider the 16 bit file you lose 8 bits of added dynamic range, where 6 dB equals approximately 1 bit of word length. Within the 16 bit (even with good noise shaped dither) you would want to use the whole dynamic range for the quality of your original high res recording. (Or as much as possible as I still would not want to normalize a harpsichord recording for CD, as the perceived loudness would be way too high).
Those 8 added bits are the difference when using a 24 bit word length PCM file. With them, the available dynamic range no longer limits the quality of a file levelled from the 32 bit conversion to a peak of -6 dB (or even a bit lower). High resolution means there is no need to be panicky about peaking at 0 dB, which allows a more equal loudness across different types of acoustical music (realise that the mike pre-amplification for a harpsichord recording may be up to 15 dB higher than for a Wagner recording with the same microphones to achieve the same loudness level, let alone the same peak level, where the Wagner will still peak higher).
I wanted to explain that we do take into account the limits within the formats to deliver the best quality for each format. PCM 16 bit is the lowest in that sense. Levelling for sound quality reasons is then much more important than for a 24 bit file. With the added bonus (sound quality wise) of the sampling rate of 96K allowing for a wider frequency response, which for many reasons is important to the sound quality. That is another discussion though and not at its place here.