Beyond Dither: Noise Shaping And Emphasis
In dithering a signal, we have seen that we trade an obtrusive distortion for a low-level hiss, while the total power of the conversion error stays largely the same. Noise shaping makes another trade to further improve fidelity: it moves the quantization error to less noticeable parts of the spectrum, although the total noise power is not reduced. Noise shaping and dither used together redistribute both the dither and the quantization noise, leaving our most sensitive range of hearing largely free from the noise.
No discussion of noise shaping would be complete without a brief look at psychoacoustics. Human auditory perception is not linear. We hear signals of the same amplitude but different frequencies at different subjective volumes. Human hearing is most sensitive between 500 Hz and 5,000 Hz (5 kHz). This range is also where human speech and many musical instruments are centered.
The principle of noise shaping suggests that if we shift much of the dither to the high end of the spectrum, where human hearing is least sensitive, then the most sensitive range of our hearing, where much of our desired program material also resides, will have a much lower noise floor than with flat dither. This lower noise floor can come much closer to that of a 20-bit system, for example. When we listen to the resulting program, we perceive it to sound very much like an actual 20-bit recording and do not notice the increased noise floor at the higher frequencies.
A popular bit-reduction scheme based on noise shaping is Sony's Super Bit Mapping. There are many other noise-shaping curves and several pieces of equipment that perform the task. Some curves work better for some music, while others excel with different program material.
Pre- and de-emphasis are old news in vinyl record cutting and FM radio broadcasts. Normally, the energy in the treble range of recorded music is lower than in the midrange or low end. This is exploited to improve the perceived dynamic range by boosting (pre-emphasizing) high frequencies to more fully occupy the available space. On playback, the signal is de-emphasized, restoring the original frequency response of the program and also reducing high-frequency noise and distortion from the recording chain.
Pre-emphasis is supported in the CD standard, and was used early on to combat poor converters. It is, for the most part, unused today. Some modern recording techniques leave no high-frequency headroom for the process, and a mastering house may forget to set the appropriate flag when preparing the master. The pre-emphasis curve specified in the Red Book standard requires 9 dB of headroom at 15 kHz, which makes it difficult to use on many current pop recordings. This is unfortunate, as the potential benefits of some kind of pre-emphasis are significant and relatively easy to attain.
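To put a number on that headroom figure, here is a rough sketch of the boost such a curve applies. It assumes the commonly cited CD emphasis time constants of 50 and 15 microseconds, which are not stated in this article, and models the curve as a simple analog first-order shelf. The function and constant names are made up for illustration.

```python
import math

# Assumed time constants for the CD emphasis curve (not given in the article).
TAU_BOOST = 50e-6   # seconds; the boost begins near 1/(2*pi*50us), about 3.18 kHz
TAU_LEVEL = 15e-6   # seconds; the boost levels off near 1/(2*pi*15us), about 10.6 kHz

def preemphasis_gain_db(freq_hz):
    """Gain in dB of the analog first-order shelf (1 + jw*t1) / (1 + jw*t2)."""
    w = 2.0 * math.pi * freq_hz
    boost = math.hypot(1.0, w * TAU_BOOST)   # magnitude of the numerator
    level = math.hypot(1.0, w * TAU_LEVEL)   # magnitude of the denominator
    return 20.0 * math.log10(boost / level)

# Print the boost at a few spot frequencies across the audio band.
for f in (1_000, 5_000, 10_000, 15_000, 20_000):
    print(f"{f:>6} Hz: {preemphasis_gain_db(f):5.2f} dB")
```

Under these assumptions the boost at 15 kHz comes out a little under 9 dB, in line with the headroom requirement quoted above. A program that already peaks near full scale in the top octave has nowhere to put that boost.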
What Does The Future Hold for Resolution?
So what do we really need to provide a transparent audio channel? How many bits? How high should the sampling frequency be? First we need to agree on what constitutes transparency in reproduction. This is the toughest part yet, because there are so many differing opinions. For the purposes of this discussion, let's say that a 120 dB dynamic range is what we are looking for (we'll deal with frequency response later). This means that when you turn on your system, the music level can go from the threshold of hearing up to 120 dB. This is louder than most rock concerts, and getting close to the threshold of pain (past my threshold, quite honestly!). A system able to play back music without distortion or compression at 120 dB with a uniform frequency response would be quite a system indeed, one that most people couldn't have, nor would they want to listen at 120 dB even if they did! Besides, at that level, the non-linearity of our hearing mechanism makes greater levels seem pointless. So 120 dB it is.
We've already established that dither is absolutely critical to the transparent reproduction of sound using a PCM digital audio system. Measuring the noise floor of a dithered 16-bit signal (assuming 120 dB to be full scale) clearly shows the noise to be within our aural acuity. Even an 18-bit word with flat dither measures within our ability to hear. A 20-bit word places the noise below the average person's ability to hear it. So is it that simple? Is 20 bits the answer? Well, not necessarily. We need to remember that some people have a slightly lower threshold of hearing than the average. We also have to remember that over successive DSP calculations, the dither required to prevent truncation distortion in each process (whether it be mixing, gain change, EQ, compression, etc.) adds a bit more noise to the signal on each operation. The most sensitive listener may be able to hear the noise on a project that has undergone several DSP processes, which is not uncommon in today's practice.
So should we jump up to 24 bits to be safe? There are several problems with this approach. Most important is that the majority of DSP systems in use are 24-bit, and maintaining the transparency of a 24-bit word in a 24-bit DSP system performing non-trivial operations (EQ, gain change, etc.) is next to impossible. If you remember, several years back there was an outcry about needing 24-bit internal resolution to perform DSP on 16-bit audio files destined for CD. Well, the same rules apply to 24-bit data. You should work in at least a 32-bit DSP environment, and preferably more. Some processors are 40, 48, or even 64 bit, but the vast majority are 24 bit, as are most transfer channels. This creates a significant roadblock for the time being. The other important factor is that it is doubtful that analog-to-digital converters will attain a 144 dB signal-to-noise ratio in our lifetimes, so a 24-bit converter will be kept busy conveying its own input noise. There are also data requirements to worry about, not just in storage, but in data rates if we want to be able to consider surround information. Lastly, delivering 24-bit data to the consumer will likely result in truncation by the D/A converters of many players. Someone recently observed (I think it was Stephen St. Croix) that making the road wider to accommodate an inferior driver is not as good as having the driver simply learn to drive properly!
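The arithmetic behind these word-length figures can be sketched in a few lines. This is a back-of-the-envelope illustration, not a measurement: it uses the familiar rule of thumb of roughly 6 dB per bit, ignores the extra noise that flat dither itself contributes, and assumes that every redithered DSP pass adds the same amount of uncorrelated noise as the original conversion. The function names are made up for illustration.

```python
import math

TARGET_DB = 120.0  # the article's working figure for a transparent channel

def dynamic_range_db(bits):
    """Rule-of-thumb range of a `bits`-bit word: 20*log10(2**bits), about 6.02 dB per bit."""
    return 20.0 * math.log10(2.0 ** bits)

def noise_rise_db(stages):
    """Noise-floor rise when `stages` equal, uncorrelated noise sources add in power."""
    return 10.0 * math.log10(stages)

for bits in (16, 18, 20, 24):
    range_db = dynamic_range_db(bits)
    print(f"{bits} bits: {range_db:6.1f} dB, margin vs target {range_db - TARGET_DB:+6.1f} dB")

# Simplifying assumption: the original conversion plus three redithered DSP
# passes each contribute the same noise power, giving four equal sources.
print(f"4 equal noise stages raise the floor by {noise_rise_db(4):.1f} dB")
```

Under these assumptions a 20-bit word only just reaches the 120 dB target, and four equal noise contributions cost about one bit's worth of range, which is why a heavily processed 20-bit project can slip back into audibility for a sensitive listener.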
So if 20 bits is close but may not be enough in some situations, and 24 bits is too much, what do we do? Remember all of that discussion on noise shaping and pre-emphasis? Noise shaping alone could probably solve our problem in a 20-bit system, but noise shaping and pre-emphasis together, when the noise shaper is designed to take advantage of the characteristics of the pre-emphasis curve, can deliver the equivalent of 24-bit performance. There are some who even predict that an almost 23-bit subjective resolution is attainable with an 88.2 or 96 kHz, 16-bit carrier using complementary noise shaping and pre-emphasis. One advantage of the higher sample rates is that dither can be placed even further up the spectrum, making noise shapers even more effective than in 44.1 or 48 kHz systems.
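For readers who want to see the mechanism, here is a minimal sketch of the simplest possible noise shaper: first-order error feedback with triangular (TPDF) dither. It is not the psychoacoustically weighted curve used by Super Bit Mapping or any other commercial product, and the function names, the 32-sample averaging window, and the test tone are all assumptions chosen for illustration.

```python
import math
import random

def requantize(samples, bits=16, shape=True, seed=0):
    """Reduce float samples in [-1, 1) to `bits`-bit integers with TPDF dither.

    With shape=True the previous sample's total error is subtracted from the
    next input (first-order error feedback), which pushes the error energy
    toward high frequencies.
    """
    rng = random.Random(seed)
    scale = 2 ** (bits - 1)
    lo, hi = -scale, scale - 1
    prev_error = 0.0
    out = []
    for x in samples:
        target = x * scale - (prev_error if shape else 0.0)
        dither = rng.random() - rng.random()      # triangular PDF, +/- 1 LSB peak
        q = int(math.floor(target + dither + 0.5))
        q = max(lo, min(hi, q))                   # guard against overflow
        prev_error = q - target                   # fed-back error includes the dither
        out.append(q)
    return out

def lowband_error_power(samples, quantized, bits=16, block=32):
    """Mean square of the error after averaging over `block` samples (a crude low-pass)."""
    scale = 2 ** (bits - 1)
    err = [q / scale - x for x, q in zip(samples, quantized)]
    means = [sum(err[i:i + block]) / block
             for i in range(0, len(err) - block + 1, block)]
    return sum(m * m for m in means) / len(means)

# One second of a 1 kHz tone at 44.1 kHz, well below full scale.
RATE = 44100
tone = [0.5 * math.sin(2.0 * math.pi * 1000.0 * n / RATE) for n in range(RATE)]

flat_power = lowband_error_power(tone, requantize(tone, shape=False))
shaped_power = lowband_error_power(tone, requantize(tone, shape=True))
print(f"low-band error, flat dither:  {10 * math.log10(flat_power):.1f} dB")
print(f"low-band error, noise shaped: {10 * math.log10(shaped_power):.1f} dB")
```

The shaped version should show noticeably less error in the crude low band, paid for with more error near the top of the spectrum. A higher sample rate gives the shaper more room above the audible band to put that error, which is the advantage described above.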
It is important to mention that as DSP architecture improves and
transmission speeds and storage space increase, these concerns
will likely not restrict us in the immediate future, but understanding
the principles and potential limitations will still allow us to
most effectively utilize our available resources. DVD-Video and
the soon-to-be-released DVD-Audio certainly support 24-bit audio,
and well they should, but we may still be called on to save storage
space and processing overhead (the DVD's bit budget) on some
large-scale DVD projects, and an understanding of how and where
to cut back without negatively affecting quality will separate
the successful from the frustrated. This makes some of these seemingly
unimportant questions very important and relevant. Another powerful
weapon in this war will be the MLP lossless packing from Meridian,
which we will discuss later.
This brings us to the question of sample rates, but you'll have to wait until the third installment for that as well!
Copyright 1997 Jay Frigoletto