Filters: Part II
Thanksgiving is behind us…it was a great day here in SoCal in spite of an hour delay on Pacific Coast Highway heading up to Ventura due to a traffic accident and a closed PCH near Trancas due to a water main failure on the way back. I guess I should stick to the Freeways.
I started talking about audio filters earlier this week. There are all sorts of audio filters and they’re useful at various stages in the audio production and reproduction path. I started the discussion by talking about the use of High Pass Filters during a live session to remove unwanted low frequencies. There are other analog filter types used during recording and mixdown sessions but I thought I would jump to the conversion side of the equation and talk about the need for LPF (Low Pass Filters) when converting analog to digital.
The Nyquist-Shannon Theorem deals with the conversion of a continuous signal into a numeric sequence. The following is Shannon’s version of the theorem:
If a function x(t) contains no frequencies higher than B cps, it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart.
In essence, the theorem says that a signal is “completely determined” when the sampling frequency is at least two times the highest frequency it contains. The key here is the “completely determined” part. Yes, there are benefits to using sample rates that are higher…even much higher…than 2 times the highest frequency that you want to capture…but they aren’t absolutely necessary. It does mean, however, that you absolutely cannot present the conversion system with frequencies that are higher than the Nyquist Frequency (the sampling frequency divided by 2). Doing so results in aliasing…a bad thing in digitizing audio.
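To make the aliasing point concrete, here is a minimal sketch in Python. The sample rate (48 kHz) and the offending tone (30 kHz) are assumed values chosen for illustration, and the helper names are hypothetical. It shows that a tone above the Nyquist Frequency produces exactly the same samples as a lower "alias" tone, so the converter cannot tell them apart.

```python
import math

def sample_tone(freq_hz, sample_rate_hz, num_samples):
    """Return num_samples samples of a unit-amplitude cosine at freq_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(num_samples)]

def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency in [0, fs/2] that yields the same cosine samples as freq_hz."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

fs = 48_000          # assumed sample rate, Nyquist Frequency is 24 kHz
too_high = 30_000    # assumed input tone above the Nyquist Frequency

alias = alias_frequency(too_high, fs)
above = sample_tone(too_high, fs, 16)
below = sample_tone(alias, fs, 16)

# The two sample sequences agree to within floating-point rounding.
largest_difference = max(abs(a - b) for a, b in zip(above, below))
print(alias, largest_difference)
```

Once the samples are taken there is no way to recover which of the two tones was present, which is why the low pass filter has to come before the converter.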
I had a push pull with a person over at Gibson’s forum about this very issue. Craig Anderton wrote a piece on the world of high-resolution audio. His focus was on the benefits of making new recordings using higher sample rates. The comment is quoted below:
“There’s a theoretical reason 96Khz works so well. You need to sample any given waveform at minimum five times for truly accurate reproduction: Start (at the initial attack or zero amplitude); at peak positive amplitude; at the zero-cross point, at maximum negative amplitude, and end (final zero-cross). If the highest we can hear is 20,000 hertz, five samples of each wave (“hertz”) = 100,000 samples or 100KHz. Not many can hear 20,000 hertz, so 96K is usually considered close enough. The actual figure of 96Khz is used due to the mathematical nature of binary expansion.”
This completely misses the key point of the theorem. I used to believe something similar but revised my position after reading the Nyquist material more carefully. After I pointed this out to the writer, he responded:
“What’s known as the Nyquist-Shannon Theorem describes the MINIMUM number of samples to provide an INTERPOLATION of the analog signal (and a simple sinusoidal signal at that). For accurate REPRODUCTION, a greater sample rate is necessary. Some purists would argue that five samples would be grossly inadequate for a complex modulated envelope.”
His capitalizing of MINIMUM ignores the “completely determined” statement in the theorem. The fact is that a sampling rate of 2x the highest frequency can absolutely reconstruct a complex musical sound. I didn’t want to get into a debate with him so I left the issue hanging.
But there’s an important requirement as we move from the theoretical to the real world. We need to have ideal LPFs with very steep slopes that maintain correct phase. Is this possible? Yes, it is.
To be continued…
As I read the opening of Anderton’s reply, I sensed the good Baron JBJ Fourier spinning at 2f rpm in his grave. Anderton’s “. . and a simple sinusoidal signal at that” evokes the presence of overtones, so his illustrative waveform contains higher frequencies that analysis would capture, and Fourier could pass the sampling baton to Nyquist. Dr Aix, our Doctor Mirabilis, thanks for your site. I open it each dawn to wake my sleepy cerebri. Cheers, James Marchment in Darkest Australia
Hi, Mark!
The sampling theorem states that if a function contains no frequencies higher than B cps it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart. I look at that in the context of Fourier’s theorem, which states that a periodic function, if sufficiently continuous, can be expressed as the sum of a series of sine or cosine terms… . The square wave is a complex function. I figure it is a good substitute for the complex output of a musical instrument. Fourier’s theorem says that a square wave consists of an infinite set of odd order harmonics. My electronics instructor told me up to the 9th harmonic was enough to approximate a square wave. Let’s say we want to pass a 5,000 cps square wave through the sampling theorem. If the sample rate is 10,000 cps I think the output will look like a sine wave. If the sampling frequency is 90,000 cps, I bet it will approximate a square wave. That’s because the 5 kHz square wave is really the sum of the following sine waves: 5 kHz, 15 kHz, 25 kHz, 35 kHz, and 45 kHz. We must successfully sample all of those sine waves to accurately make the 5 kHz square wave. And isn’t that the goal of high resolution audio? To approximate complex musical events to create accurate output? Take a second look at what Craig is saying…I think it supports our cause.
Thanks for the comment. Your analysis is correct. We need to be able to accurately capture and reproduce frequencies up to our hearing threshold…including complex waveforms like square waves. Craig, I believe, has got it…it’s the Tonto commenter that is messing with the facts.
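The square wave example can be checked with a few lines of Python. This is only a sketch: the function name is hypothetical, and it assumes the strict form of the theorem, in which a component must lie below half the sample rate to be captured. It lists which odd harmonics of a 5 kHz square wave survive at several sample rates.

```python
def square_wave_harmonics(fundamental_hz, sample_rate_hz):
    """Odd harmonics of an ideal square wave that lie strictly below fs/2."""
    nyquist = sample_rate_hz / 2
    harmonics = []
    k = 1
    while k * fundamental_hz < nyquist:
        harmonics.append(k * fundamental_hz)
        k += 2  # a square wave contains only odd-order harmonics
    return harmonics

at_10k = square_wave_harmonics(5_000, 10_000)
at_90k = square_wave_harmonics(5_000, 90_000)
at_96k = square_wave_harmonics(5_000, 96_000)

print(at_10k)  # → []
print(at_90k)  # → [5000, 15000, 25000, 35000]
print(at_96k)  # → [5000, 15000, 25000, 35000, 45000]
```

Under that strict reading, a component sitting exactly at half the sample rate (5 kHz at a 10 kHz rate, or 45 kHz at a 90 kHz rate) is not guaranteed to be captured, so the sample rate has to be a little above twice the highest harmonic you want to keep.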
This is becoming annoying. Mark read this.
Apologies, I mean Alex S read this.
Also these words of Anderton’s: “The actual figure of 96Khz is used due to the mathematical nature of binary expansion.”
[a] What’s a sampling frequency got to do with binary expansion?
[b] Even if it did, binary expansion would lead us to sampling frequencies of 65.5 kHz or 131 kHz! Not 96.
Actually, these are not Anderton’s words…they are from one of his readers’ comments, which is why I pushed back.
The Nyquist theorem says that at least two samples per period are needed to accurately reconstruct a signal, as mentioned above. Exactly two samples per period occur only at the Nyquist frequency, which is half of the sampling rate (fs/2). However, there is no complex signal at this frequency, as the signal has to be bandlimited according to the Nyquist theorem – here we are at your filter topic, Mark. There is no complex signal at exactly the Nyquist frequency, only a sine wave, as all frequencies above fs/2 are filtered, or rather have to be filtered, in order to avoid aliasing. For all signals below fs/2, more than two samples, often many more than two, are given per period. The signal will be 100% reconstructed by convolving the samples with a sinc function. Therefore, in order to decide what sampling frequency and word length are necessary to capture and reconstruct the entire frequency spectrum, waveforms and dynamics of a music signal, frequency bandwidth and dynamic range are the parameters we have to look at. There is a lot of misunderstanding and there are many wrong statements in the industry, and further among music lovers at the end of the chain, simply from not knowing and/or understanding sampling theory, e.g. the Nyquist-Shannon theorem.
Craig Anderton is right: you need more than two samples to reconstruct a complex sound, and with only two samples only a sine wave can be reconstructed. He just does not arrive at the simple point that Nyquist describes exactly this, by saying that the very highest frequency in the music signal must be sampled at least twice (which means all others below fs/2 are sampled more than twice), but the very highest frequency (overtone / harmonic) is a sine wave.
Given these facts, 24/96 is a perfect frame to capture the entire spectrum music provides. We can record ultrasonics up to 48 kHz (and there is almost nothing above this point) with a dynamic range of 144 dB. Transients are also no problem to reconstruct accurately. Transients are shaped by very high frequency elements in the music signal. There is a relation between the frequency spectrum and the rise time (e.g. transients) of a signal. In other words, if we see transients, we also see the corresponding high frequencies in a spectrum analyzer. Thus, we have to check: does the chosen sample rate allow us to capture these frequencies according to the Nyquist theorem?
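The sinc reconstruction mentioned above can be sketched as follows. This is an illustrative example, not a description of how any particular DAC works: the sample rate, tone frequency, and record length are assumed values, and the function names are hypothetical. Each sample weights a shifted sinc function and the results are summed (Whittaker-Shannon interpolation). The tone gets only 3.2 samples per period, yet the value halfway between two samples is recovered closely.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, sample_rate_hz, t_seconds):
    """Sum of sample-weighted, shifted sinc functions evaluated at t_seconds."""
    position = t_seconds * sample_rate_hz  # time expressed in sample periods
    return sum(s * sinc(position - n) for n, s in enumerate(samples))

fs = 48_000        # assumed sample rate
tone_hz = 15_000   # assumed tone: 3.2 samples per period
num_samples = 4001 # finite record, so the result is approximate

samples = [math.sin(2 * math.pi * tone_hz * n / fs) for n in range(num_samples)]

# Evaluate halfway between two samples near the middle of the record.
t = (num_samples // 2 + 0.5) / fs
estimate = reconstruct(samples, fs, t)
exact = math.sin(2 * math.pi * tone_hz * t)
error = abs(estimate - exact)
print(estimate, exact, error)
```

The small remaining error comes from using a finite number of samples. The theorem itself assumes the sum runs over all samples, in which case the reconstruction of a bandlimited signal is exact.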
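The 24/96 figures above follow from two short formulas, sketched here in Python with hypothetical helper names. The dynamic range is the ideal ratio between full scale and one quantization step, 20·log10(2^bits), and the highest recordable frequency is half the sample rate. Real converters fall short of the ideal dynamic range figure.

```python
import math

def dynamic_range_db(bits):
    """Ideal ratio of full scale to one quantization step, in decibels."""
    return 20 * math.log10(2 ** bits)

def nyquist_hz(sample_rate_hz):
    """Highest frequency the sample rate can represent (fs / 2)."""
    return sample_rate_hz / 2

print(round(dynamic_range_db(24), 1))  # → 144.5
print(round(dynamic_range_db(16), 1))  # → 96.3
print(nyquist_hz(96_000))              # → 48000.0
```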
This is very well put…thanks very much for the additional clarification. The essence for everyone to understand is that Nyquist-Shannon is dealing with the highest partials of a sound (that was allowed through the LPF at the ADC stage), which turns out to be a sine wave NOT a complex wave.
Once again the confusion arises because of the assumption that more samples per waveform equals more accurate interpolation (and therefore reconstruction) of the waveform. There’s just a complete misunderstanding (well actually no understanding) of the mathematics behind waveform reconstruction. I recommend this article to get a better understanding of how audio sampling works.
Thanks Dave…good read.