DVD, 5.1, and Codec Primer

WHAT DO ALL THESE LETTERS MEAN?
A Primer on Surround Sound, DVD, & A/V Compression Codecs

With the proliferation of surround systems in consumers' homes and the success of the DVD format, there is increased interest in surround sound for music releases. Although much material is available in various surround formats, and attempts have been made to bring multichannel sound to the masses, it has never met with general consumer acceptance. Even the material that is currently available is not well known to the consumer. Consumers were not ready for Quad and it failed, and not enough homes had surround systems when Pro-Logic mixes started to be released. The other problem with the passive matrix surround formats is that the fidelity for music leaves something to be desired in all but the main left and right channels. The 21st century could change all that. Never has familiarity with surround, or the installed base of multichannel consumer audio systems, been greater. Add to that the impending release of DVD-A and SACD in surround, and the time has never been better to introduce multichannel music to the masses.

With all these acronyms flying around it’s easy to get confused. There is a lot of information to digest before tackling surround production. Before any of us jumps headfirst into surround sound, I think it’s important to briefly define some of the terms, concepts, acronyms, and abbreviations that you will come into contact with. They are not in alphabetical order but are loosely grouped by subject.

DVD Basics

DVD: Commonly referred to as Digital Versatile Disc. It was originally conceived as Digital Video Disc, but that moniker was soon abandoned to embrace all the other media and data applications. In reality, the letters don’t officially stand for anything anymore, but Digital Versatile Disc is the most common reference. The term DVD encompasses several subcategories, as you will see below. It is a format that can hold a variety of data for media and computing applications, and its different versions have storage capacities between 7 and 27 times greater than CD. A single sided, single layer disc (DVD-5) can hold 4.7 GB; 1 side, 2 layer (DVD-9) 8.5 GB; 2 side, 1 layer (DVD-10) 9.4 GB; 2 side, 2 layer (DVD-18) 17 GB.

DVD-V: DVD-Video. This is the first DVD format to be released and is mainly for movies, although stereo PCM audio can be recorded to it at 24 bit resolution and 96 kHz sample rate. It also supports 16 bit 48 kHz PCM audio, but not the CD standard 44.1 kHz sample rate. The video is encoded using MPEG-2 compression. The audio can be PCM if mono, stereo, or Dolby Pro-Logic, and is typically Dolby Digital if in 5.1 surround. It can alternatively support DTS and MPEG surround audio as well. You’ll have to see below to catch up with all those acronyms!

DVD-A: DVD-Audio. This is primarily for audio, although there is provision for graphics and video content as well. It supports PCM stereo audio up to 24 bit, 192 kHz, and 5.1 surround audio in combinations of resolutions and sample rates from 16 to 24 bits and from 44.1 kHz to 96 kHz sampling. Due to bandwidth limitations, it can only deliver a full 6 channels of 24/96 audio when using MLP, Meridian Lossless Packing.

DVD-ROM: This is similar to CD-ROM in that it is a "read only" format and you can put nearly any data you wish on it, typically personal computer software.

DVD-R: This is a recordable "write once" format. Most computer DVD-R drives won’t make DVD-V discs.

DVD-RAM and DVD-RW: These are competing formats for the rewritable market.

DLT: Digital Linear Tape. This is the standard format for delivering a DVD master to the pressing plant. The tape format is somewhat similar to 8mm Exabyte, but it can hold up to 70 GB. It can also be used as a computer data backup format.

UDF: Universal Disk Format. All DVDs are recorded using this file system. It can be thought of as similar to the ISO-9660 format for CD-ROMs in that it allows a wide variety of data to be written on the disc and read by a variety of platforms. It allows properly equipped PCs with DVD drives to read DVD-A and DVD-V discs in addition to DVD-ROMs. What separates a DVD-V from a DVD-A or one of the other DVD formats is the manager and the files written on the disc. You can put a DVD-V manager and directory and a DVD-A manager and directory on the same disc; the DVD-V player will read only the files that correspond to it, while the DVD-A player will read only its own unique files. This allows a DVD-A to include a DVD-V compatible Dolby Digital version of the program to provide some backwards compatibility with legacy DVD hardware.

SAMG: Simple Audio Manager. This is similar to the TOC (Table Of Contents) on a CD. Every DVD-A has a SAMG which can list up to 314 tracks for simple track based navigation.

AMG: Audio Manager. More complex discs use this manager which supports grouping the tracks into different playlists among other things. Players that support DVD-A's additional capabilities will typically ignore the SAMG and use the AMG instead.

VMG: Video Manager. This is the manager that DVD-V players refer to.

AUDIO_TS: This is the main directory that contains all of the data for a DVD-A disc.

VIDEO_TS: This is the directory containing DVD-V data.

AOB: Audio Object. This is the basic unit for audio data containing PCM and optionally AC-3 tracks.

VOB: Video Object. This is the basic unit for video data containing MPEG-2 video and some additional data.

Codec Basics

Codec stands for compressor/decompressor. A codec compresses media or data before storage, reducing the space it occupies and the time it takes to be read, and decompresses it again at the user’s end, restoring it to a usable state. Some can compress an audio, video, image, or data file by a huge factor, and some by just a little bit. The two basic kinds are "lossy" and "lossless".

Lossless compression gives you back exactly what you put into it after decompression. It should be bit for bit identical. Examples are MLP and zip files on a computer. These schemes tend not to offer the large savings that lossy compression formats can.

Lossy compression schemes have to estimate certain things when decompressing, so your data doesn’t come back exactly the same. The goal is for you not to be able to tell the difference, but your data is no longer 100% bit for bit accurate, and on close scrutiny the difference is usually discernible. In JPEG image compression, you can see what appear to be little squares if you look closely. With MPEG video, you can sometimes see motion artifacts. With MP3 audio files, you can hear a degradation of your audio quality. The more a codec compresses, the more likely you are to experience losses of quality. However, some methods, when used appropriately and conservatively, can provide excellent results that even avid listeners would be hard pressed to tell apart from the original.

Many audio codecs use what is called "perceptual coding". The idea is that the codec eliminates information that the human ear is unlikely to hear, based on known psychoacoustic principles. Certain low level sounds are masked by louder sounds, so these codecs make the largest reductions in the areas that the average person is least able to hear. Obviously, some of these sound better than others. Dolby AC-3, DTS, and MP3 are just a few examples of this kind of compression.

Lossy compression schemes are much more common for media, as the savings are much larger and the loss of quality can be quite difficult to detect when used appropriately. Lossless compression is necessary for data because executable code will not run unless it is bit for bit accurate. Audiophiles also prefer lossless compression for music because there is no degradation in quality from the original.
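The lossless/lossy distinction can be seen in a few lines of Python. This is only a rough sketch: it uses zlib (the DEFLATE method found in zip files) as the lossless codec, and the "lossy" step is a crude toy that throws away the low bits of each byte. It is not a perceptual codec like AC-3 or MP3, and the test data is a hypothetical stand-in.

```python
import zlib

# Hypothetical stand-in for a data file: a short, repetitive byte string.
original = b"surround sound " * 200

# Lossless: compress, decompress, and compare with the input.
packed = zlib.compress(original)
restored = zlib.decompress(packed)
lossless_ok = restored == original      # bit for bit identical

# Toy "lossy" step (assumption, for illustration only): discard the
# low 4 bits of every byte before packing. The detail cannot be recovered.
coarse = bytes(b & 0xF0 for b in original)
lossy_packed = zlib.compress(coarse)
lossy_restored = zlib.decompress(lossy_packed)
lossy_ok = lossy_restored == original   # no longer identical

print("original bytes:", len(original))
print("lossless packed bytes:", len(packed), "identical:", lossless_ok)
print("toy lossy packed bytes:", len(lossy_packed), "identical:", lossy_ok)
```

The point is the comparison at the end of each branch: the lossless path returns the same bytes, while the lossy path returns something close but different.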

AC-3: This is one of Dolby’s digital audio codecs. It’s usually referred to as Dolby Digital in consumer circles, but AC-3 by itself is the codec used by many Dolby products including Dolby Digital, DolbyFax, Dolby SR-D, and others. There is also an older codec called, you guessed it, AC-2.

AC-2: Audio Coding 2. This lower quality Dolby codec is a predecessor to AC-3 and is typically not used in professional production applications.

MPEG: Moving Picture Experts Group. This encompasses a wide range of standards for audio and video compression. MP3 and MPEG-2 AAC are some audio components of it, while the original MPEG is common for web based video and MPEG-2 is the standard video compression for DVD.

MP3: This has been made popular by people downloading music from the internet without paying for it. Some erroneously call it MPEG 3, but it actually has nothing to do with MPEG 3. It is, in fact, a part of MPEG 1, more specifically, MPEG audio layer 3.

AAC: Advanced Audio Coding, or MPEG-2 AAC. This newer algorithm is a substantial improvement over MP3 and seems a likely successor for web based audio downloading. But for the present, MP3 is still king in that domain. AAC is also part of the new MPEG-4 standard (beginning to be called MP4 for convenience, although this is not the technically correct term).

WMA: Windows Media Audio. This is another contender for downloading and streaming, and with Microsoft behind it, WMA will certainly have some exposure.

DTS: This is a format from a company called Digital Theater Systems, hence the DTS name. It is one of the better sounding lossy compression methods. It can support higher bit depths than CD, and the compression is lighter as compared to MP3, or even Dolby’s high quality AC-3 algorithms. This, of course, means that it takes a little more space and bandwidth than some, but it is still a significant savings.

ATRAC: Adaptive Transform Acoustic Coding. This was developed by Sony for the MiniDisc format. It reduces the data to about 1/5 of its original size. The signal is divided into three subbands, which are transformed into the frequency domain and grouped into nonuniform bands according to psychoacoustics. These are then quantized on the basis of dynamic sensitivity and masking.

ATRAC-3: see above. This codec is now independently available for web based applications.

MPEG-2: There are both audio and video specifications in the MPEG-2 standard. The audio portion was originally selected as the standard audio codec for European DVD releases, though now all regions are free to use any desired method in the overall spec. It can handle 7.1 audio as well as 5.1 audio, giving it one feature that Dolby Digital and DTS lack. MPEG-2 video is made up of three kinds of frames: I-frames, P-frames, and B-frames. I-frames contain all of the pixel information for a video frame, while P-frames are predicted from the nearest previous I-frame or P-frame. B-frames are bi-directional and reference both a previous and a subsequent frame when calculating the compressed frame data. I-frames are your backbone, while P- and B-frames rely on the assumption that not much changes from frame to frame except in cases of excessive motion. The two types of MPEG-2 video encoding are described in the following entries.

CBR: Constant Bit Rate. This chooses a bit rate and remains at that rate for the entire MPEG-2 encoding of the video.

VBR: Variable Bit Rate. This method results in smaller files and better quality, although it takes longer. You need to make an initial pass to allow the encoder to see where the most difficult passages are, and then during the final encoding, it devotes more resources to those passages while not wasting as many large I-frames on more static images.
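The difference between CBR and two-pass VBR can be sketched as a bit budgeting problem. The following is a simplified illustration, not how any particular MPEG-2 encoder works: the per-scene complexity scores, the 5000 kbps average, and the rate limits are all hypothetical values chosen for the example.

```python
def allocate_vbr(complexities, avg_kbps, min_kbps, max_kbps):
    """Second pass: share the bit budget in proportion to the
    complexity measured in the first pass, within rate limits."""
    mean_c = sum(complexities) / len(complexities)
    rates = []
    for c in complexities:
        rate = avg_kbps * c / mean_c
        rates.append(max(min_kbps, min(max_kbps, rate)))
    return rates

# Hypothetical first-pass results: one complexity score per scene
# (1.0 = static image, 3.0 = heavy motion).
complexities = [1.0, 2.0, 3.0, 2.0]

# CBR spends the same rate on every scene.
cbr = [5000] * len(complexities)

# VBR spends the same total, but where it is needed.
vbr = allocate_vbr(complexities, avg_kbps=5000, min_kbps=2000, max_kbps=9800)

for scene, (c, v) in enumerate(zip(cbr, vbr)):
    print(f"scene {scene}: CBR {c} kbps, VBR {v:.0f} kbps")
```

In this example both methods use the same total number of bits; VBR simply moves bits from the static scene to the busy one. If a scene hit the minimum or maximum limit, a real encoder would have to redistribute the difference, which this sketch does not attempt.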

MLP: Meridian Lossless Packing. Up to this point, we have been talking about lossy compression schemes. MLP is lossless, meaning that what you put in is exactly the same as what you get back. This has been adopted as one of the standards for DVD-A. You can use PCM if you like, but to be able to fit a 24/96 5.1 surround mix (that’s 6 full channels of 24 bit 96 kHz audio) into the available bandwidth for DVD-A, MLP is necessary. MLP compresses at roughly a 1.85:1 ratio, or not quite a reduction in half.
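The bandwidth argument is simple arithmetic. The sketch below assumes a DVD-A audio ceiling of 9.6 Mbps, which is the commonly cited figure but is not stated in this article, and uses the rough 1.85:1 MLP ratio quoted above; real MLP ratios vary with the program material.

```python
def pcm_rate_mbps(channels, bits, sample_rate_hz):
    """Raw PCM data rate in megabits per second."""
    return channels * bits * sample_rate_hz / 1_000_000

DVD_A_MAX_MBPS = 9.6   # assumption: commonly cited DVD-A audio limit
MLP_RATIO = 1.85       # rough figure from the text; content dependent

raw_surround = pcm_rate_mbps(6, 24, 96_000)    # 6 channels of 24/96
packed_surround = raw_surround / MLP_RATIO     # after MLP
raw_stereo_192 = pcm_rate_mbps(2, 24, 192_000) # stereo 24/192

print(f"24/96 5.1 raw PCM: {raw_surround:.3f} Mbps")
print(f"24/96 5.1 with MLP: {packed_surround:.2f} Mbps")
print(f"24/192 stereo raw PCM: {raw_stereo_192:.3f} Mbps")
print("raw surround fits:", raw_surround <= DVD_A_MAX_MBPS)
print("MLP surround fits:", packed_surround <= DVD_A_MAX_MBPS)
```

Under these assumptions the raw six channel stream is well over the limit, while the MLP packed stream and the raw 24/192 stereo stream both fit.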

HDCD: High Definition Compatible Digital. This is a technique developed by Pacific Microsonics, which was recently bought by Microsoft. There have been about 4,000 titles produced. It is encoded onto a standard CD that will play in a standard player. It will even play without the decoder, but you won’t realize the advantages, and may even have slightly worse than usual performance, although the compatibility is a big plus. It is just for stereo work, and when decoded, provides 20 bit sound and proprietary filtering that improves the sound considerably as compared to a plain CD.

SBM: Super Bit Mapping. This is a wordlength reduction technique developed by Sony. It uses noise shaped dither to reduce 20 and 24 bit files to the 16 bit CD standard while avoiding truncation distortion. It preserves some of the subjective resolution of the 20 or 24 bit source on the CD. Some people erroneously think of this as a codec, even though no decoder is necessary. This and other dither based wordlength reduction techniques are not codecs at all; they apply a small amount of broadband noise during signal processing to avoid truncation distortion. Moving some of the dither energy to parts of the spectrum where the ear is less sensitive helps to preserve some of the low level detail from the higher resolution source. A more in depth discussion of this can be found in the first two Tech Talk installments.

UV-22: This is Apogee’s wordlength reducer. It moves most of the dithering energy into the very high part of the spectrum in what is sometimes called a "near-Nyquist" implementation. It accomplishes the same thing as SBM, to reduce wordlength while avoiding truncation distortion and preserving subjective resolution.

POW-R: Psychoacoustically Optimized Wordlength Reduction. This was developed through the cooperation of audio manufacturers Lake DSP, Weiss, Millennia Media, and Z-Systems under the name POW-R Consortium. It is regarded as one of the finest examples of wordlength reduction. There are three POW-R types: the first is a near-Nyquist implementation, the second a 5th order noise shaper, and the third a 9th order noise shaper.

TPDF: Triangular Probability Density Function. White noise is the most commonly used "flat" dither. TPDF describes white noise dither with a peak amplitude of plus or minus 1 LSB (Least Significant Bit).
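To make the idea of wordlength reduction with flat TPDF dither concrete, here is a minimal sketch in Python. It shows only plain, unshaped dither; the noise shaping used by SBM, UV-22, and POW-R is proprietary and is not attempted here. The function names and the test value are hypothetical.

```python
import math
import random

def tpdf_dither_lsb(rng):
    """Sum of two uniform values in [-0.5, 0.5): a triangular
    distribution with a peak of plus or minus 1 LSB."""
    return (rng.random() - 0.5) + (rng.random() - 0.5)

def reduce_24_to_16(sample_24, rng):
    """Requantize one signed 24 bit sample to signed 16 bits,
    adding TPDF dither (in 16 bit LSBs) before rounding."""
    scaled = sample_24 / 256.0   # one 16 bit LSB = 256 24 bit LSBs
    dithered = scaled + tpdf_dither_lsb(rng)
    rounded = math.floor(dithered + 0.5)
    return max(-32768, min(32767, rounded))

rng = random.Random(0)   # fixed seed so the run is repeatable
quarter_lsb = 64         # a 24 bit value worth 1/4 of a 16 bit LSB
n = 20_000

truncated = quarter_lsb >> 8   # plain truncation loses it entirely
average = sum(reduce_24_to_16(quarter_lsb, rng) for _ in range(n)) / n

print("truncated value:", truncated)
print("average of dithered values:", round(average, 3))
```

A signal smaller than one 16 bit step disappears under truncation, but with dither it survives as the average of the noisy output, which should come out close to 0.25 here. That is the sense in which dither preserves low level detail at the cost of a small amount of noise.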

General 5.1, Audio, and Movies

5.1: Spoken as "five point one". This refers to a surround sound format consisting of 5 full range channels and one LFE channel (see the next entry). It consists of Left, Center, Right, Left Surround, Right Surround, and LFE. The next, somewhat less common, discrete surround format is 7.1. This is reminiscent of, but not exactly the same as, the old 70mm soundtracks. The current 7.1 has L, C, R, LS, RS, and LFE like 5.1, but adds 2 more speakers behind the screen between the center and left, and between the center and right. These are referred to as LC and RC, or Left Center and Right Center. On large screens, this allows better tracking of dialog and effects, and more creative options for the mixers and director. The original 70mm 6 channel soundtrack had five across the front like 7.1, but no subwoofer and only mono surround. Some people have also suggested 10.2 systems that have stereo LFE and more channels around the room, and possibly a height channel above the audience. This is not a standard yet. It’s worth mentioning, however, that some IMAX films currently do use a height channel.

LFE: Low Frequency Effects. Typically you use a subwoofer in this application, the "point one" of 5.1. The channel does not necessarily have to be band limited, as in the option to use it as a height channel in DVD-A, but when used as an LFE, it is of course for low frequency information. It is a discrete channel, not a crossover network to derive the low frequency information from the main program. However, in most end user systems, bass management is in use, which actually does divert the sub bass from the other 5 channels to the subwoofer along with the discrete LFE channel information.

ITU: International Telecommunication Union. In surround sound, the ITU spec is referred to when talking about setting up speakers for 5.1. The ITU’s guidelines are widely used and a very good place to start. To briefly summarize, they state that all speakers should be an equal distance from the listening position, with the center straight ahead, the Left and Right 30 degrees to either side of center (60 degrees apart across the front), and the surrounds at roughly 110 degrees. While on the subject of setting up your speakers, there are guidelines for reference levels as well. Although there is a small room spec that calls for a 79 dB reference, the most common is 85 dB for each of the 5 main speakers, and 89 dB for the subwoofer. Some Dolby certified film stages alter this slightly and reference the surrounds to 82 dB. You would use pink noise and measure using C weighting and slow response. This is by no means a complete guide to setting up the monitoring in your room, but it covers the basics to get you started.
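The angles above translate directly into floor positions. The following sketch computes them for a hypothetical 2 metre listening radius; the coordinate convention (listener at the origin, +y straight ahead, +x to the right) is an assumption made for the example.

```python
import math

# Angles from the ITU summary above, in degrees from straight ahead.
# Negative is to the listener's left.
ITU_ANGLES_DEG = {"C": 0, "L": -30, "R": 30, "LS": -110, "RS": 110}

def speaker_positions(radius_m):
    """Return {name: (x, y)} in metres, all at the same distance
    from the listener, who sits at (0, 0) facing +y."""
    positions = {}
    for name, angle in ITU_ANGLES_DEG.items():
        a = math.radians(angle)
        positions[name] = (radius_m * math.sin(a), radius_m * math.cos(a))
    return positions

layout = speaker_positions(2.0)   # hypothetical 2 m radius
for name, (x, y) in layout.items():
    print(f"{name:>2}: x = {x:+.2f} m, y = {y:+.2f} m")
```

One handy check when measuring a room: because Left and Right are 60 degrees apart, the distance between them equals the distance from each to the listener, forming an equilateral triangle.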

THX: Tomlinson Holman eXperiment. Tomlinson Holman and Lucasfilm developed these guidelines for standardization of cinema sound systems and theater acoustics. A standard for home playback systems was developed later, along with a professional small room standard. The idea is that if your mix room and the playback theater are both THX certified, the sound of the mix stage and the movie theater will closely match, and moviegoers will experience the sound as intended by the mixers and director.

TMH: Tomlinson Holman’s new company working on surround sound issues.

PCM: Pulse Code Modulation. This is the sampling technique that is most familiar to us, as it is the basis for the CD and for most uncompressed digital audio that we come into contact with. For the CD, we take 44,100 samples of the audio every second (44.1 kHz sample rate) and quantize each one to one of the 65,536 steps available with 16 bits of resolution. You definitely want to read the other three installments of "Tech Talk" if you want more information about this.
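The CD numbers can be worked through in a few lines. This is a minimal sketch assuming a stereo CD; the quantize helper is a hypothetical illustration of mapping a level to a 16 bit step, without dither.

```python
SAMPLE_RATE_HZ = 44_100
BITS = 16
CHANNELS = 2   # assumption: stereo, as on a standard CD

steps = 2 ** BITS                              # number of quantization steps
bit_rate = SAMPLE_RATE_HZ * BITS * CHANNELS    # bits per second
bytes_per_minute = bit_rate // 8 * 60

def quantize(x, bits=BITS):
    """Map a level in the range -1.0 to 1.0 to the nearest signed
    integer step, clamping at full scale."""
    full_scale = 2 ** (bits - 1)
    code = int(round(x * full_scale))
    return max(-full_scale, min(full_scale - 1, code))

print("steps:", steps)
print("bit rate:", bit_rate, "bits per second")
print("bytes per minute:", bytes_per_minute)
print("half scale maps to step", quantize(0.5))
```

The stereo bit rate works out to 1,411,200 bits per second, or a little over 10 MB per minute of audio.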

DSP: Digital Signal Processing. Every time you process digital audio in any way, perhaps by using EQ, dynamic range compression, or even volume adjustments, you are performing mathematical computations. The particular algorithms, dither practices involved, and other issues greatly affect the quality of the processing and resultant sound. Also, it takes computer power to do these calculations, so DSP ability is limited by how much horsepower your computer or dedicated DSP processor has.

DSD: Direct Stream Digital. This is another digital recording technique developed by Sony, originally for their internal archival purposes. It is a 1-bit recording process using a 2.8224 MHz sampling rate, and it is the process used for SACD. Loosely speaking, the one bit records whether the waveform is rising or falling, as opposed to defining its exact position as in a multi-bit system like PCM; more precisely, the bitstream comes from a delta-sigma modulator, and the density of ones follows the signal level. There is an 8 bit version in use for professional applications that allows proper dithering during signal processing, but this is unimportant to the end user, who is delivered a finished 1-bit master of excellent audio quality. Completely different DSP methods are required for DSD as opposed to PCM.
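Two things about DSD are easy to check numerically: how its sampling rate relates to the CD rate, and how a stream of single bits can carry a level at all. The modulator below is a toy first-order delta-sigma design, assumed here purely for illustration; production DSD converters use higher-order designs.

```python
DSD_RATE_HZ = 2_822_400
CD_RATE_HZ = 44_100

oversampling = DSD_RATE_HZ // CD_RATE_HZ   # multiple of the CD rate
dsd_bits_per_second = DSD_RATE_HZ * 1      # 1 bit per sample, per channel
pcm_24_96_bits_per_second = 96_000 * 24    # per channel, for comparison

def delta_sigma_1bit(samples):
    """Toy first-order delta-sigma modulator: input levels in
    the range -1.0 to 1.0, output a list of +1 / -1 bits."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback   # accumulate the running error
        bit = 1 if integrator >= 0 else -1
        feedback = float(bit)
        bits.append(bit)
    return bits

# A steady level of 0.5 becomes a bit pattern whose average is about 0.5.
bits = delta_sigma_1bit([0.5] * 4000)
mean_level = sum(bits) / len(bits)

print("DSD rate is", oversampling, "times the CD sample rate")
print("DSD bits per second per channel:", dsd_bits_per_second)
print("24/96 PCM bits per second per channel:", pcm_24_96_bits_per_second)
print("average of the 1-bit stream:", round(mean_level, 3))
```

No single bit says where the waveform is; the level is recovered by averaging (low-pass filtering) the stream, which is why DSD needs different processing methods from PCM.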

SACD: Super Audio CD. Developed jointly by Sony and Philips, this format competes with DVD-A as the next generation of better than CD quality audio delivery to consumers. While DVD has advantages in flexibility, more extras, and easy surround sound implementation, some feel that SACD, using the DSD recording process, sounds better than even 24/96 PCM audio. How it compares to 192 kHz audio is less clear at present, though some feel that PCM, by its nature, will always be at a disadvantage. There is clearly no consensus on this point, however. PCM advocates point out many disadvantages to the DSD system as compared to PCM, so we’ll have to wait and see. It is clear, however, that both DVD-A and SACD are a major leap forward from CD. It should be mentioned that SACD also has provision for including some visual extras, but not to the extent and elegance of the DVD implementation. 5.1 surround sound has been included in the goals for the SACD format, as has a Red Book compliant layer to provide backwards compatibility with CD players. Unfortunately, these two options are unavailable at the moment and are proving difficult to provide practically, though I imagine that eventually they will work out the kinks in one way or another.

Dolby Surround: This is primarily a consumer format that adds a mono, band limited surround channel that is matrixed into the stereo audio and passively decoded at playback. The playback is compatible with stereo if you don’t have a decoder.

Dolby Pro-Logic: This is another matrix approach to surround, but it adds a center channel to the matrix for a total of 4 channels. They are not fully discrete, nor are the extra channels full bandwidth. This is similar to the "Dolby Stereo" or "Dolby SR" film release formats.

Dolby SR: To music recordists and mixers it means the "Spectral Recording" noise reduction system for analog tape. To film mixers it means an LCRS surround mix matrixed into the stereo signal, and using Dolby SR noise reduction.

Dolby Stereo: An earlier 4 channel matrixed surround format using Dolby A type noise reduction.

LCRS: Left, Center, Right, Surround. The 4 channels in the passive matrix systems of Dolby Pro-Logic, Dolby Stereo, and Dolby SR.

LT RT: Left Total, Right Total. This describes the encoded LCRS surround matrix recorded to two channels, to distinguish it from a conventional stereo (Left, Right) mix recorded on two channels.
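A simplified sketch shows why matrixed channels are "not fully discrete". The coefficients and signs below are assumptions based on the commonly described form of the matrix (center and surround mixed in at about 0.707, surround in opposite polarity in the two channels); the real encoder also applies a 90 degree phase shift, band limiting, and noise reduction to the surround, none of which is modeled here.

```python
import math

G = math.sqrt(0.5)   # about 0.707, i.e. 3 dB down

def encode_lt_rt(l, c, r, s):
    """Fold LCRS into two channels (simplified, see assumptions above)."""
    lt = l + G * c + G * s
    rt = r + G * c - G * s
    return lt, rt

def passive_decode(lt, rt):
    """Passive decode: the sum gives center, the difference gives surround."""
    return {
        "L": lt,
        "R": rt,
        "C": G * (lt + rt),
        "S": G * (lt - rt),
    }

# Feed a signal into the center channel only and see where it ends up.
center_only = passive_decode(*encode_lt_rt(l=0.0, c=1.0, r=0.0, s=0.0))
# Feed a signal into the left channel only.
left_only = passive_decode(*encode_lt_rt(l=1.0, c=0.0, r=0.0, s=0.0))

print("center only:", {k: round(v, 3) for k, v in center_only.items()})
print("left only:  ", {k: round(v, 3) for k, v in left_only.items()})
```

In this model a center-only signal comes back at full level in the center but also appears in Left and Right about 3 dB down, and a left-only signal leaks into both center and surround. Pro-Logic decoders add steering logic to reduce this crosstalk, which a purely passive decode cannot do.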

Dolby SR-D: This 35mm film release format has both the analog Dolby SR soundtrack and a Dolby Digital 5.1 soundtrack. If the digital soundtrack becomes unreadable or is damaged, the playback will automatically switch to the analog soundtrack. When the digital track is readable again, it will switch back. Also, in theaters that don’t support digital sound, the analog track can be read by standard Dolby SR equipped rooms.

Dolby Digital: The 5.1 format that uses AC-3 compression. On DVD releases, you can have mono, stereo, or 5.1 soundtracks with AC-3 compression recorded on the "Dolby Digital" portion of the soundtrack. The data rate is selectable: AC-3 allows up to 640kbps, film prints carry 320kbps, and 384kbps or 448kbps is common in DVD-V releases.

Dolby EX: This is a 6.1 system that has the same channels as a 5.1 system, but adds a center rear channel. Star Wars: The Phantom Menace was the first to use this. Some consumer receivers are offering this option now, with the center rear channel matrixed into the stereo surrounds.

DTS: Digital Theater Systems. Above, we mentioned DTS in the sense of it being a codec. As a codec, it compresses 6 (5.1) channels of 20 bit 48 kHz audio into a data rate of 1.4Mbps for roughly a 3:1 compression ratio. It also supports a 754kbps rate for DVD. In the theater, it is actually encoded onto a CD that plays back in sync with the film and is decoded into a very high quality 5.1 soundtrack. Its debut was with the release of Jurassic Park. The third thing that you need to know about DTS is that they have a large catalog of music CDs that play back on regular CD players with digital outs. When the digital output is fed into a DTS decoder, which many DTS equipped consumer receivers and processors include, you get a high quality 5.1 music format. Many people don’t realize that there is a viable 5.1 music format available today with a catalog of a couple hundred popular CDs available.
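The compression ratio depends on what you count as the uncompressed source. The sketch below uses the 1.4Mbps and 754kbps figures quoted above and works the ratio two ways: counting all six channels as full bandwidth, and counting only the five main channels (an assumption that treats the band limited LFE as negligible).

```python
def pcm_mbps(channels, bits, rate_hz):
    """Raw PCM data rate in megabits per second."""
    return channels * bits * rate_hz / 1_000_000

DTS_FULL_MBPS = 1.4     # figure from the text
DTS_HALF_MBPS = 0.754   # figure from the text

six_full = pcm_mbps(6, 20, 48_000)    # all six channels at full bandwidth
five_full = pcm_mbps(5, 20, 48_000)   # assumption: LFE treated as negligible

ratio_six = six_full / DTS_FULL_MBPS
ratio_five = five_full / DTS_FULL_MBPS
ratio_half_rate = six_full / DTS_HALF_MBPS

print(f"raw PCM, six channels: {six_full:.2f} Mbps")
print(f"ratio at 1.4 Mbps, six channels: {ratio_six:.1f}:1")
print(f"ratio at 1.4 Mbps, five channels: {ratio_five:.1f}:1")
print(f"ratio at 754 kbps, six channels: {ratio_half_rate:.1f}:1")
```

Either way the full rate ratio lands in the region of 3:1 to 4:1, which is much gentler than the ratios typical of MP3 or AC-3 at their usual data rates.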

DTS-ES: This version of DTS is a 6.1 system with a center rear channel, similar to Dolby EX.

SDDS: Sony Dynamic Digital Sound. This is a high quality film release format that, like Dolby SR-D, has the analog stereo and digital surround soundtracks on the 35mm print. But that’s where the similarity ends. How it is placed on the print is different, the codec is different, and Sony’s SDDS supports up to 8 (7.1) discrete channels. The other difference between SDDS and the other two popular competing digital surround formats, Dolby SR-D and DTS, is that there is no consumer version of SDDS.

There are many issues facing an audio professional who is considering getting into surround production, either for music, film, DVD, internet, or multi-media. Whether you are recording, mixing, editing, or mastering, there is a lot of information that you need to be comfortable with before you can succeed in surround sound. Although this collection is a good start, it is by no means an exhaustive list or in-depth manual. Hopefully it will give you a well-rounded introduction and good foundation on which to build the pursuit of your goals. Above all, remember to be creative and enjoy yourself.