Compression Refresher
Apr 1, 2004 12:00 PM, By Steve Mullen
Understanding MPEG-2 encoding.
There are two types of compression technology: intra-frame compression, which compresses each frame to an individual packet of information; and inter-frame compression, which is much more complex and requires memory of previous frames.
At a recent NAB, Sony proclaimed it was an “MPEG-2 World.” Perhaps Sony was a bit early. And perhaps it should have added at least a footnote on MPEG-4 and Windows Media 9. But on the whole, its slogan is proving prescient. MPEG-2 is used as the delivery codec for DBS and digital cable. ATSC also employs MPEG-2. And how can we fail to take note of two new wonders: the $50 DVD player and the $350 DVD recorder?
The real news is the rapid utilization of MPEG compression in video acquisition. Sony began with Beta SX, which it evolved to IMX. Both use MPEG-2. At NAB 2003 Sony introduced HDCAM SR, based on MPEG-4. And, at the low end, Sony has been selling two MPEG-2 formats: MicroMV and “DVDcam.”
Panasonic and Hitachi have also been selling MPEG-2 based DVD camcorders. With its adoption of SD memory cards as a recording medium, Panasonic has introduced a consumer MPEG-2 camcorder; JVC has introduced its pair of HDV-format HD camcorders; and now Sanyo has a solid-state MPEG-4 camcorder that shoots 640×480 at 30fps.
From an abstract level, we can group all compression technologies into two broad classes: formats that use inter-frame compression and those that use intra-frame compression. When intra-frame compression is employed, each frame is compressed to an individual packet of information. To retrieve the image, only this single packet is required. Because each frame is processed individually, the technology required to implement a codec is relatively simple. The fundamental variation between all intra-frame formats is the data rate utilized: DV, DVCAM, and DVCPRO at 25Mbps; DVCPRO50 at 50Mbps; DVCPRO HD at 100Mbps; and HDCAM at 140Mbps. Low rates allow only 4:1:1 sampling while higher rates allow better chroma sampling. (See “Uncompressed Digital Video” in the March 2004 issue.) With these two parameters (data rate and chroma sampling) set, the compression ratio naturally falls out.
The simplicity of the intra-frame codec makes it possible to both compress and decompress intra-frame formats on a PC. In fact, today's realtime NLEs typically decompress and render multiple streams simultaneously. Codec simplicity also makes it possible for chips to be manufactured at a very low cost, although manufacturers charge dramatically more for their high data-rate formats. Some additional cost is due to the complex head assemblies used in wide-bandwidth camcorders and VCRs. A high data rate is achieved by writing two parallel tracks of data to tape. Another source of extra cost is incurred when writing density is doubled to achieve another two-fold increase in data rate. Nevertheless, there is amazingly little fundamental difference between DVCAM and HDCAM or DVCPRO and DVCPRO HD.
The technology employed in inter-frame compression is considerably more complex. This complexity comes from three factors. First, both encoding and decoding involve memory of previous frames. Second, pixels are intelligently grouped into objects — no simple task. And, third, the motion vector of objects is calculated and tracked over time. While the first factor is responsible for some degree of data reduction (compression), it is the latter two factors that lead to dramatic bandwidth reductions. If you want a simple way to understand the difference between MPEG-1, MPEG-2, and MPEG-4, consider that each generation incorporates more “smarts.”
Thankfully, the intricacies of encoding and decoding — the deep math — don't need to be understood by those of us in the video industry. There are, however, two considerations we do need to grasp. First, the greater the bit reduction we want, the more complex the codec circuitry will be, and thus the greater cost. (And, the larger the image, the more bits there are to reduce!) Fortunately, over an incredibly short time, any level of complexity becomes dramatically cheaper. An exotic MPEG-4 codec will, in a short period, be only a commodity part. We should expect, therefore, that within a few years, all the interest we have now in MPEG-2 products will be replaced by a fascination with MPEG-4 products.
Second, chip complexity fundamentally influences chip size, heat generation, and power consumption. These three factors determine whether a codec can be practically employed in a camcorder. Once again, technology will enable manufacturers to shrink die size, which in turn will reduce power consumption and heat generation. Within a few short years, very powerful MPEG-2 and MPEG-4 inter-frame codecs will be usable in small, low-cost HD camcorders.
Inter-frame compression's most primitive element is an “I” frame. This frame comes from one video frame and has been intra-frame MPEG-compressed, that is, compressed without reference to any other frame. It really is like a single compressed DV frame, except that it offers slightly better picture quality.
You've probably already noticed that while DV quality is typically expressed as a compression ratio, MPEG-2 quality is defined in megabits per second. Before moving on, let's sort this out. Because DV, et al., are recorded to tape, the bit rate has to be constant. To allow for different quality levels, two SD bit rates were chosen: 25Mbps (a 5:1 compression ratio) and 50Mbps (a 3.3:1 ratio). To handle HD, two much higher bit rates were chosen: 100Mbps and 140Mbps. MPEG-2 was initially carried by optical disc. Here, there was no need for a constant bit rate, so the level of quality was dialed in at the time of encoding. Also, initially, only SD-sized images were encoded. Thus it became easy to compare quality by looking at the specified bit rate. Such quality estimates have never been really valid because, for example, a sophisticated two-pass MPEG-2 encoder can yield great picture quality at low bit rates. Nevertheless, for SD, a bit rate of 9Mbps is considered to carry a very high-quality image. For HD, a rate of 19Mbps is the norm. (See the “Future of the DVD” sidebar.)
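The two SD compression ratios quoted above can be checked with a little arithmetic. The sketch below is only a rough illustration. It assumes a 720×480 active picture, 8-bit samples, 29.97 frames per second, and 4:2:2 sampling for the 50Mbps format (none of which are stated in this article), and it treats the nominal 25Mbps and 50Mbps figures as pure video payload.

```python
from fractions import Fraction

# Assumed SD raster and sample depth; illustrative values, not from the article.
WIDTH, HEIGHT = 720, 480
FRAME_RATE = Fraction(30000, 1001)  # about 29.97 frames per second
BITS_PER_SAMPLE = 8

# Average samples per pixel: one luma sample plus two subsampled chroma samples.
SAMPLES_PER_PIXEL = {
    "4:1:1": Fraction(3, 2),  # each chroma channel at 1/4 horizontal resolution
    "4:2:2": Fraction(2, 1),  # each chroma channel at 1/2 horizontal resolution
}


def uncompressed_mbps(sampling: str) -> float:
    """Uncompressed video data rate in megabits per second."""
    bits_per_frame = WIDTH * HEIGHT * BITS_PER_SAMPLE * SAMPLES_PER_PIXEL[sampling]
    return float(bits_per_frame * FRAME_RATE / 1_000_000)


def compression_ratio(sampling: str, recorded_mbps: float) -> float:
    """Ratio of the uncompressed rate to the recorded rate."""
    return uncompressed_mbps(sampling) / recorded_mbps


if __name__ == "__main__":
    for label, sampling, rate in [("25Mbps, 4:1:1", "4:1:1", 25),
                                  ("50Mbps, 4:2:2", "4:2:2", 50)]:
        print(f"{label}: about {compression_ratio(sampling, rate):.1f}:1")
```

Under these assumptions the results come out at roughly 5.0:1 and 3.3:1, in line with the figures above.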
Once an I-frame has been encoded, “P” and “B” frames can be generated. Each P-frame is generated from the previous I- or P-frame and a subsequent video frame. Simply put, a P-frame carries the difference between the earlier video frame and a subsequent video frame. However, it makes more sense to see a P-frame as containing the information needed to recreate a video frame in conjunction with the information in the previous I- or P-frame. In most but not all cases, a P-frame is much smaller than an I-frame. Thus, a sequence of IPP frames carries the information for three video frames using far fewer data bits than if intra-frame (DV) compression were used.
A B-frame is even more efficient than a P-frame. A B-frame contains the small amount of information needed to reconstitute a video frame when combined with information from a past I- or P-frame and/or a future I- or P-frame. Utilizing B-frames, therefore, results in a significant reduction of data. You would be justified in asking how, during playback, a future I- or P-frame can be employed to recreate a video frame when it isn't yet available. During the encoding process, encoded frames are held in a buffer and output in an order such that a B-frame is always preceded by the I- and/or P-frames it requires. For example, if the sequence of frames is IBBP (where both B-frames depend on the initial I-frame and final P-frame), the output sequence will be IPBB.
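The reordering just described can be sketched in a few lines. This is a simplified model rather than a real encoder: frames are represented only by their type letters, and the function name is made up for illustration. Each run of B-frames is held back until the I- or P-frame that follows it in display order has been output.

```python
def display_to_coded_order(display: str) -> str:
    """Reorder frame types from display order to output (coded) order.

    B-frames are held in a buffer until the next I- or P-frame, which they
    depend on, has been emitted. Hypothetical helper for illustration only.
    """
    coded = []
    pending_b = []
    for frame in display:
        if frame == "B":
            pending_b.append(frame)   # hold until the future anchor is out
        else:                         # I- or P-frame (an anchor)
            coded.append(frame)
            coded.extend(pending_b)   # the held B-frames can now follow
            pending_b.clear()
    coded.extend(pending_b)           # trailing B-frames with no future anchor
    return "".join(coded)


if __name__ == "__main__":
    print(display_to_coded_order("IBBP"))  # prints IPBB
```

With the IBBP example from the text, the function returns IPBB, so the decoder always has both reference frames in hand before it meets a B-frame.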
Once an I-frame is encoded there will be some number of B- and/or P-frames encoded before another I-frame is encoded. For example, IBBPBB. After these frames, a new I-frame is encoded, yielding a sequence: IBBPBB IBBPBB IBBPBB. Each such sequence is called a GOP (Group of Pictures), and the number of frames it contains is the GOP length. (Note: It is possible to have video encoded to only I-frames, although the result will have a very high data rate.)
A complete change of scene in the video stream poses a difficult problem for MPEG-2 encoding. Ideally, every scene change should generate an I-frame. However, if this I-frame came very soon after an earlier I-frame, the amount of data output from the encoder would spike — possibly beyond that which can be handled by the recording system.
A compromise is a GOP that is allowed to vary slightly in length. When a GOP of length 15 is used, the GOP length may be allowed to vary between nine frames and 15 frames. The vast majority of GOPs will be 15 frames long, but shorter GOPs may be output when there are abrupt changes in the incoming video.
GOP lengths can be divided into two categories: short or long. Long GOPs typically carry one-half second of playback video, e.g., 15 frames. Using long GOPs maximizes codec efficiency. It is more difficult, however, to edit long GOP video. Therefore, a short GOP can be used. The HDV format records a six-frame GOP from the CCD.
Let's look more closely at the sequence: IBBPBB I. In this sequence, the final two B-frames require information from the previous P-frame and the following I-frame. Where dependencies cross GOP boundaries, the encoder is said to be generating Open GOPs. Unfortunately, if the GOP containing that following I-frame is deleted during editing, the two B-frames will lose the information necessary to recreate the fifth and sixth video frames in the first GOP. Therefore, if editing is to be performed, Closed GOP encoding is used, in which the last B-frames have only a backward dependency.
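The difference between Open and Closed GOPs can be made concrete by listing which frames each frame depends on. The sketch below is a toy model under stated assumptions: frames are type letters in display order, the stream starts with an I-frame, a P-frame references the nearest earlier I- or P-frame, and a B-frame references the nearest I- or P-frame on each side. The function name and the `closed_gop` flag are hypothetical.

```python
def frame_references(display: str, closed_gop: bool) -> dict:
    """Map each frame index to the indices of the frames it depends on."""
    anchors = [i for i, f in enumerate(display) if f in "IP"]
    refs = {}
    for i, frame in enumerate(display):
        if frame == "I":
            refs[i] = []                       # I-frames stand alone
            continue
        previous = max(a for a in anchors if a < i)
        refs[i] = [previous]                   # backward dependency
        if frame == "B":
            later = [a for a in anchors if a > i]
            if later:
                following = later[0]
                crosses_gop = display[following] == "I"
                # A closed GOP forbids B-frames from reaching into the next GOP.
                if not (closed_gop and crosses_gop):
                    refs[i].append(following)  # forward dependency
    return refs


if __name__ == "__main__":
    stream = "IBBPBBI"  # one GOP followed by the next GOP's I-frame
    print(frame_references(stream, closed_gop=False)[4])  # prints [3, 6]
    print(frame_references(stream, closed_gop=True)[4])   # prints [3]
```

In the open case, frames 4 and 5 (the fifth and sixth video frames) depend on index 6, the I-frame of the next GOP, so deleting that GOP orphans them. In the closed case they depend only on the P-frame at index 3.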
Closed GOPs make editing possible, but not easy. Consider splicing two GOPs, IBBPBB and IBBPBB, where the outgoing GOP is cut after its second B-frame and the incoming GOP is entered at a B-frame that follows its P-frame. Cutting the outgoing GOP after the B results in the loss of the P-frame that's needed to recreate the third video frame. Likewise, cutting the incoming GOP at B causes the loss of the P-frame that's needed to recreate what will be the fourth video frame. Editing MPEG-2, therefore, requires the capability to generate replacement GOPs that meet three requirements: they must carry the required number of video frames; they must have valid lengths; and they must not overload the recording system with excessive data. A realtime MPEG-2 NLE can generate and preview the necessary GOPs on-the-fly.
Equally important is the ability to edit encoded audio so that a cut results in neither a gap nor a glitch. Gaps of up to a second are all too common when you try editing MPEG-2 with standalone DVD recorders, as well as with DVD camcorders.
As mentioned earlier, when recording is made to tape, the data rate must remain constant, as both the tape speed and head speed are constant. This requires an encoder to generate a fixed data rate no matter the image variations within the incoming series of video frames. When an encoder accomplishes this, it is said to be using Constant Bit Rate (CBR) encoding. In reality, even when using CBR encoding, it is necessary to have a buffer to smooth the data rate even more prior to recording. When CBR encoding is not required, Variable Bit Rate (VBR) can be used. Over time, VBR encoding is more efficient.
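A toy model shows why a smoothing buffer is needed even with CBR encoding. The frame sizes below are invented for illustration (a large I-frame, a mid-sized P-frame, small B-frames) and do not come from any real encoder. The recorder is assumed to drain a fixed number of bits per frame period.

```python
def buffer_occupancy(frame_bits, drain_bits_per_frame):
    """Track how full a smoothing buffer is after each frame period.

    Each frame's bits enter the buffer, then the recorder removes a fixed
    number of bits. The level never drops below zero (an empty buffer).
    """
    level = 0
    levels = []
    for bits in frame_bits:
        level += bits
        level = max(0, level - drain_bits_per_frame)
        levels.append(level)
    return levels


# Hypothetical sizes for one IBBPBB GOP, in bits.
SIZES = {"I": 300_000, "P": 120_000, "B": 60_000}
gop = [SIZES[f] for f in "IBBPBB"]

# Constant output rate chosen so that one GOP exactly empties the buffer.
drain = sum(gop) // len(gop)  # 110,000 bits per frame period

if __name__ == "__main__":
    print(buffer_occupancy(gop, drain))
    # prints [190000, 140000, 90000, 100000, 50000, 0]
```

The buffer absorbs the burst from the I-frame and empties over the rest of the GOP, so the tape sees a steady rate even though individual frames differ in size by a factor of five.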
Encoded video frames are assembled into a Video Elementary Stream. A header several bytes long precedes each stream of data bytes. This header defines the nature of the data that will follow, for example, its frame size, frame rate, and aspect ratio. You may notice when working with MPEG-2 that it can take a while before an image becomes available. During your wait, software is searching the byte-stream, examining header information. This is very different from DV, where the parameters are mainly stored at the head of the file.
Digital audio can be encoded using several technologies. One common method is to use “MPEG-1 Layer II” encoding — a precursor to MP3 audio. AC-3 (Dolby Digital 2.0 stereo or 5.1 surround sound) and DTS are other encoding schemes. And of course, the digital data can be sent as PCM data. From any source of digital audio, several Audio Elementary Streams (each with the appropriate header bytes) can be generated.
Elementary audio and video streams can be packetized using two techniques. Where the streams will be carried on a highly reliable medium such as optical disc, a Program Stream is utilized. DVDs, therefore, use Program Stream packets. For this reason, the vast majority of software MPEG-2 codecs and players play back only Program Streams.
Broadcast and tape-based delivery systems are prone to dropouts. The solution is a more robust packet that enables recovery from dropouts. Specifically, a Transport Stream consists of a sequence of fixed-size transport packets of 188 bytes. Each packet comprises 184 bytes of payload plus a 4-byte header. While most software DVD players cannot read Transport Streams, it is a simple task to enhance them to do so. Hopefully, we will soon see both the QuickTime and Windows players upgraded.
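The fixed packet layout makes a Transport Stream easy to inspect. The sketch below reads the 4-byte header of one 188-byte packet. The field positions (a 0x47 sync byte, a 13-bit packet identifier, a 4-bit continuity counter) are taken from the MPEG-2 Systems standard rather than from this article, and the function and dictionary key names are made up for illustration. It ignores the optional adaptation field, which can occupy part of the 184 bytes after the header.

```python
TS_PACKET_SIZE = 188   # bytes per transport packet
TS_HEADER_SIZE = 4     # bytes of header at the start of each packet
SYNC_BYTE = 0x47       # first byte of every transport packet


def parse_ts_header(packet: bytes) -> dict:
    """Decode a few fields from the 4-byte header of one transport packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a transport packet must be exactly 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing 0x47 sync byte; stream is not aligned")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        "transport_error": bool(b1 & 0x80),      # set when the packet is damaged
        "payload_unit_start": bool(b1 & 0x40),   # a new unit begins in this packet
        "pid": ((b1 & 0x1F) << 8) | b2,          # 13-bit packet identifier
        "continuity_counter": b3 & 0x0F,         # 4-bit per-PID counter
        "bytes_after_header": TS_PACKET_SIZE - TS_HEADER_SIZE,
    }


if __name__ == "__main__":
    # A synthetic packet: PID 0x0011, payload-unit-start set, counter 7.
    sample = bytes([0x47, 0x40, 0x11, 0x17]) + bytes(184)
    print(parse_ts_header(sample))
```

Because every packet starts with the same sync byte at a fixed spacing, a player can regain alignment after a dropout, and a gap in a stream's continuity counter signals that packets were lost.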
It is a straightforward task to convert a Program Stream to a Transport Stream. When doing so, it is not necessary to perform decoding and encoding, thus the process is a rapid one. Likewise, it is a simple task to enhance an encoder to selectively output Program and Transport Streams. One hopes the various encoders for both Macs and PCs will be enhanced soon. Of course, to handle HD, the encoders also need to be upgraded to work with high-resolution images (up to 1920×1080) and high temporal rate (up to 60fps) video.
Earlier I mentioned that while MPEG-2 was the current inter-frame codec of interest, many other types of inter-frame encoding will follow. While this may inflict economic hardship if successive products come to market in too quick a succession, each product generation brings with it further efficiencies, and those efficiencies will make RAM-based HD recording possible.
For the most part, if you come to an understanding of MPEG-2 encoding and decoding, your understanding will migrate to the next generations of inter-frame codecs. Bon voyage.
To read the results of Barry Braverman's MPEG-2 encoder shootout, see MPEG-2 Encoder Shootout.
Sidebar
Future of the DVD
Tests seem to indicate that high HD quality can be achieved at 9Mbps using Windows Media 9 encoding. This means any device capable of recording and playing a red-laser DVD can handle HD using WM9 encoding. By the time you read this, V Inc. should be selling the Bravo D3. This $349 DVD player can play back high-definition video recorded to DVD-R/RW and DVD+R/RW media. HD video is encoded on a PC using WM9 and then burned using any DVD burner. This development is so significant that unless Apple licenses the WM9 codec for Final Cut Pro, as Avid has done, it stands the risk of being hurt in the new HDV marketplace.
Sony, Panasonic, and JVC support a 19Mbps MPEG-2 HD data rate recorded to optical disc using their Blu-ray (blue-laser) technology. (Not the same blue-laser technology used by Sony for its XDCAM format.) A single layer supports up to 27GB of user recording. Toshiba and NEC are promoting an alternate blue-laser, MPEG-4 and WM9-based HD recording technology called HD-DVD. The Toshiba/NEC technology grafts a blue laser to a red laser and so, it is claimed, makes HD-DVD devices very cheap. A single layer supports up to 20GB of user recording. MPEG-4 allows much lower data rates (7Mbps to 12Mbps) to be employed so the lower disc capacity doesn't cause user recording time to suffer. However, unless MPEG-2 HD broadcasts are transcoded to MPEG-4 (which requires costly components), a two-hour movie cannot be recorded in a set-top recorder. Betamax redux?
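The trade-off between disc capacity and data rate can be estimated with simple arithmetic. This is a rough sketch only: it assumes 1GB means one billion bytes and counts the video data rate alone, ignoring audio, multiplexing overhead, and file-system overhead, so real recording times would be somewhat shorter. The function name is made up for illustration.

```python
def recording_hours(capacity_gb: float, video_mbps: float) -> float:
    """Approximate hours of video that fit on a disc at a given data rate.

    Assumes 1GB = 1,000,000,000 bytes and counts video bits only.
    """
    capacity_bits = capacity_gb * 1_000_000_000 * 8
    seconds = capacity_bits / (video_mbps * 1_000_000)
    return seconds / 3600


if __name__ == "__main__":
    print(f"27GB at 19Mbps: {recording_hours(27, 19):.1f} hours")  # 3.2
    print(f"20GB at 12Mbps: {recording_hours(20, 12):.1f} hours")  # 3.7
    print(f"20GB at 7Mbps:  {recording_hours(20, 7):.1f} hours")   # 6.3
```

Under these assumptions, the smaller disc at MPEG-4 rates holds at least as much video as the larger disc at 19Mbps.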
feedback
To comment on this article, email the Video Systems editorial staff at vsfeedback@primediabusiness.com.