It seems that everywhere you look these days there is a discussion of more bits and higher sample rates in digital audio systems. Most people agree that the Red Book CD standard of 16-bit, 44.1 kHz PCM audio is a bottleneck to transparent sound reproduction. However, there is much disagreement about just how to accomplish this transparency. There is also a plethora of "pop knowledge" floating around, and uninformed, however well-meaning, opinions clouding the issues at hand. The questions include: How many bits do we need? How high should the sampling rate be? Should we use a PCM carrier at all? Do we need surround? Is data compression acceptable? What about dither, noise shaping, and emphasis? Well, let's see if we can explore some of these issues and clear up some misconceptions.
I suppose we should decide whether to use PCM at all before we settle on the sampling rate and quantization resolution of a PCM system. Some people suggest that bitstream recording offers several advantages. Sony and Philips are making a stand on Sony's DSD (Direct Stream Digital) process as the preferred method of delivering high-quality sound to consumers. The basic advantages attributed to bitstream recording (basic bitstream theory, not specifically Sony's DSD method) are that the simple, oversampled 1-bit converters bypass the need for digital filtering in the AD and DA stages. It is also said to provide superior archiving, which is what Sony's DSD was originally designed for; Sony has used it in-house as a proprietary archiving format. This seems to me a pretty difficult sell, as all audio systems and DSP methods are designed to handle PCM at this time, and switching over to bitstream would require the cooperation of a bunch of companies who aren't particularly well known for agreeing on new standards. Also, the storage requirements make anything more than a stereo signal very difficult in terms of data rate and available space. For these reasons, whether bitstream delivers superior sound or not, I doubt it will catch on as the new high-definition digital audio carrier.
This leaves us with PCM. Before arguing for future standards, let's explore some misconceptions about our current one. Many people assume that PCM cannot resolve detail smaller than the LSB (least significant bit) or time intervals shorter than the sampling period. The commonly reported 96 dB dynamic range of PCM would imply that signals below that level are truncated, and in early systems (and sadly, still in many systems today) they indeed were. The way to combat this is with dither. Dither is an often misunderstood buzzword. Most people hear it mentioned when talking about "dithering down" from 20-bit or 24-bit AD converters to the 16-bit CD standard, but it was around long before marketing people realized its value as a buzzword, and before 20-bit converters were available. Dither is a small amount of random noise added to a signal to prevent truncation of signals below the LSB, thus increasing the perceived dynamic range of a system. Undithered signals contain truncation distortion that manifests itself as artificial harmonics, resulting in the "grainy" or "granular" sound often attributed to poorly implemented digital audio systems.
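To make this concrete, here is a minimal sketch (not from the original article) of requantizing audio to 16 bits with and without triangular (TPDF) dither, a textbook choice with a peak amplitude of one LSB; the function name and test-signal parameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1997)

def requantize_16bit(x, dither=True):
    """Requantize float samples in [-1.0, 1.0) to 16-bit integer codes.

    With dither=True, TPDF (triangular) dither of +/-1 LSB peak is added
    before rounding; with dither=False the signal is simply rounded,
    which truncates anything below the LSB.
    """
    lsb = 1.0 / 32768.0  # one 16-bit quantization step
    if dither:
        # the difference of two uniform variables has a triangular PDF
        # spanning +/-1 LSB
        x = x + (rng.random(x.shape) - rng.random(x.shape)) * lsb
    return np.clip(np.round(x * 32768.0), -32768, 32767).astype(np.int16)

# a -110 dB, 1 kHz sine: peak amplitude ~3.2e-6, about a tenth of one LSB
t = np.arange(44100) / 44100.0
quiet = 10 ** (-110 / 20) * np.sin(2 * np.pi * 1000.0 * t)

truncated = requantize_16bit(quiet, dither=False)
dithered = requantize_16bit(quiet, dither=True)
```

Run this way, the undithered path rounds every sample of the -110 dB tone to zero, while the dithered path preserves it as a sparse pattern of +/-1 codes buried in low-level noise, which is exactly the trade described above.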
Imagine a sine wave whose amplitude is smaller than a single quantization step. If the tone is centered on a quantization level, it will be converted to a single DC value and thus wiped out. However, if it is centered between two levels, it will be converted into a square wave of the original frequency. A square wave is composed of undesirable odd harmonics, and these harmonics will now be contained in your signal; it would have been better if the wave had simply been obliterated, as in the first scenario. If you add dither to the signal, it will cross several steps instead of just one, reducing the artificial harmonics and replacing them with a small amount of wideband noise. When measured, the total power of the distortion is largely unchanged, but we have traded a very unpleasant distortion for a low-level hiss, and preserved our original low-level signal. It has been asserted that we can hear a -110 dB signal in a well-designed 16-bit system that was conventionally thought to be limited to reproducing -96 dB signals. While many people understand how dither can improve the perceived dynamic range of a system, fewer understand the effect it can have on time resolution. The time resolution of a truncated 16-bit signal is limited to the sampling period divided by the number of quantization levels, which works out to roughly 346 picoseconds for Red Book CD audio. Application of correct dither can have a similar impact on the temporal resolution as it does on the dynamic resolution. Thus, we can squeeze more out of the existing CD standard than is commonly thought if we use the correct dither not only in the AD and DA steps, but in all the intermediate DSP processes. This is not to say that we don't need to upgrade the existing standard, as we most certainly do. It does, however, begin to call into question just how much more we need in terms of bit depth and sample rates. It seems that proper practice may deliver superior quality from less than some are proposing.
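The arithmetic behind that temporal-resolution figure, taken as the sampling period divided by the number of quantization levels, is easy to verify:

```python
sample_rate = 44_100                 # Red Book sampling rate, Hz
bits = 16                            # Red Book word length

sampling_period = 1.0 / sample_rate  # ~22.68 microseconds per sample
levels = 2 ** bits                   # 65,536 quantization levels

time_resolution = sampling_period / levels
print(f"{time_resolution * 1e12:.0f} picoseconds")  # -> 346 picoseconds
```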
Copyright 1997 Jay Frigoletto