introduction multimedia


codecs

Back to the everyday reality of the technology that surrounds us. What can we expect to become of networked multimedia? Let one thing be clear

compression is the key to effective delivery

There can be no misunderstanding about this, although you may wonder why you need to bother with compression (and decompression). The answer is simple. You need to be aware of the size of what you put on the web and the demands that imposes on the network. Consider the table, taken from  [Codecs], below.

  media                                           uncompressed   compressed
  voice (8k samples/sec, 8 bits/sample)           64 kbps        2-4 kbps
  slow motion video (10 fps, 176x120, 8 bits)     5.07 Mbps      8-16 kbps
  audio conference (8k samples/sec, 8 bits)       64 kbps        16-64 kbps
  video conference (15 fps, 352x240, 8 bits)      30.4 Mbps      64-768 kbps
  audio, stereo (44.1k samples/sec, 16 bits)      1.5 Mbps       128 kbps-1.5 Mbps
  video (15 fps, 352x240, 8 bits)                 30.4 Mbps      384 kbps
  video, CDROM (30 fps, 352x240, 8 bits)          60.8 Mbps      1.5-4 Mbps
  video, broadcast (30 fps, 720x480, 8 bits)      248.8 Mbps     3-8 Mbps
  HDTV (59.9 fps, 1280x720, 8 bits)               1.3 Gbps       20 Mbps
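The uncompressed figures in the table follow directly from sampling rate times sample size, and for video from frame rate times resolution; the video rows work out if one assumes three colour components of 8 bits each. A quick sketch to verify this (the function names are my own):

```python
def audio_bitrate(samples_per_sec, bits_per_sample, channels=1):
    """Uncompressed audio bitrate in bits per second."""
    return samples_per_sec * bits_per_sample * channels

def video_bitrate(fps, width, height, bits_per_sample, components=3):
    """Uncompressed video bitrate in bits per second,
    assuming three colour components per pixel."""
    return fps * width * height * bits_per_sample * components

# voice: 8k samples/sec at 8 bits -> 64 kbps
print(audio_bitrate(8000, 8))                 # 64000
# stereo audio: 44.1k samples/sec, 16 bits, 2 channels -> ~1.4 Mbps
print(audio_bitrate(44100, 16, channels=2))   # 1411200
# CDROM video: 30 fps, 352x240, 8 bits x 3 components -> 60.8 Mbps
print(video_bitrate(30, 352, 240, 8))         # 60825600
```

Running the same computation for the broadcast row (30 fps, 720x480) gives 248.8 Mbps, matching the table.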

You'll see that, bearing the various types of connection in mind

(phone: 56 kbps, ISDN: 64-128 kbps, cable: 0.5-1 Mbps, DSL: 0.5-2 Mbps)

you must be careful to select a media type that is suitable for your target audience. And then again, choosing the right compression scheme may make the difference between being able to deliver and not being able to do so. Fortunately,

images, video and audio are amenable to compression

Why this is so is explained in  [Codecs]. Compression is feasible because of, on the one hand, the statistical redundancy in the signal and, on the other hand, the irrelevance of particular information from a perceptual perspective. Redundancy comes about through both spatial correlation, between neighbouring pixels, and temporal correlation, between successive frames.

statistical redundancy in signal


  • spatial correlation -- neighbour samples in single frame
  • temporal correlation -- between segments (frames)

irrelevant information


  • from perceptual point of view
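Both ideas can be made concrete with a toy example of my own: in a smooth scanline, neighbouring samples are highly correlated, so the differences between them are small and cheap to code (statistical redundancy); coarsely quantizing those differences discards detail the eye barely notices (perceptual irrelevance), at the price of being lossy.

```python
# a smooth scanline: neighbouring samples differ only slightly
scanline = [100, 101, 103, 104, 104, 105, 107, 108]

# delta coding: keep the first sample, then differences to the previous one
deltas = [scanline[0]] + [b - a for a, b in zip(scanline, scanline[1:])]
print(deltas)   # [100, 1, 2, 1, 0, 1, 2, 1] -- small values, fewer bits needed

def reconstruct(deltas):
    """Running sum restores the samples from the deltas."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out

assert reconstruct(deltas) == scanline   # delta coding alone is lossless

# quantize the deltas to multiples of 2: smaller symbol set, but lossy
quantized = [deltas[0]] + [2 * round(d / 2) for d in deltas[1:]]
lossy = reconstruct(quantized)           # close to, but not equal to, the original
```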

B. Vasudev & W. Li, Memory management: Codecs


The actual process of encoding and decoding may be depicted as follows:

codec = (en)coder + decoder



  signal  -> source coder   ->  channel coder    (encoding)
  
  signal  <- source decoder <-  channel decoder  (decoding)
  

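The division of labour in the diagram can be illustrated with a toy pipeline (all names and choices are my own, not part of any standard): the source coder removes redundancy from the signal, while the channel coder adds controlled redundancy back to protect against transmission errors.

```python
def source_encode(samples):
    """Run-length code the signal: removes redundancy in repeated samples."""
    out = []
    for s in samples:
        if out and out[-1][0] == s:
            out[-1][1] += 1
        else:
            out.append([s, 1])
    return out

def source_decode(pairs):
    return [s for s, n in pairs for _ in range(n)]

def channel_encode(pairs):
    """Trivial repetition code: transmit everything twice for error protection."""
    return pairs + pairs

def channel_decode(received):
    return received[: len(received) // 2]

signal = [7, 7, 7, 8, 8, 9]
received = channel_encode(source_encode(signal))
assert source_decode(channel_decode(received)) == signal
```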
Of course, the coded signal must be transmitted across some channel, but this is outside the scope of the coding and decoding issue. With this diagram in mind, we can specify the codec design problem:

codec design problem


From a systems design viewpoint, one can restate the codec design problem as a bit rate minimization problem, meeting (among others) constraints concerning:

  • specified levels of signal quality,
  • implementation complexity, and
  • communication delay (start coding -- end decoding).

compression methods

As explained in  [Codecs], there is a large variety of compression (and corresponding decompression) methods, including model-based methods, as for example the object-based MPEG-4 method that will be discussed later, and waveform-based methods, for which we generally make a distinction between lossless and lossy methods. Huffman coding is an example of a lossless method, and methods based on Fourier transforms are generally lossy. Lossy means that actual data is lost, so that after decompression there may be a loss of (perceptual) quality.
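Huffman coding assigns short codewords to frequent symbols and longer ones to rare symbols, and can be sketched in a few lines (this is a minimal illustration of the principle, not a production codec):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a prefix code: frequent symbols get shorter codewords."""
    freq = Counter(text)
    # heap entries: (frequency, unique tiebreaker, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes("abracadabra")
encoded = "".join(codes[s] for s in "abracadabra")
# 'a' occurs 5 times out of 11 and gets a 1-bit codeword;
# the whole string takes 23 bits instead of 88 bits of 8-bit ASCII
print(len(encoded))   # 23
```

Since no codeword is a prefix of another, the bit string can be decoded unambiguously, which is what makes the method lossless.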

model-based


waveform-based


Leaving a more detailed description of compression methods to the diligent students' own research, it should come as no surprise that when selecting a compression method, there are a number of tradeoffs, with respect to coding efficiency, the complexity of the coder and decoder, and the signal quality.

tradeoffs

  • coding efficiency -- compression ratio
  • coder complexity -- memory, power requirements, ops/sec
  • signal quality -- bit error probability, signal/noise, ...
issues in compression selection

In practice this means that when we select a particular coder-decoder scheme, we must consider whether we can guarantee a sufficient level of signal quality, and to what extent we are willing to accept loss of information, that is lossy output. Another issue in selecting a method of compression is whether the (compressed) signal can be decoded in real time. For particular applications, such as conferencing, we should be worried about communication delay. And, with regard to the many existing codecs and the variety of platforms, we may desire the possibility of interoperability, to achieve, for example, the exchange of media objects between tools, as is already common for image processing tools.

compression standards

Given the importance of codecs it should come as no surprise that much effort has been put in developing standards. Without going into details, we list a number of these standards below.

standard-based codecs

  • JPEG -- ISO/IEC 10918-1, ITU-T T.81
  • MPEG
    • ISO 11172 (up to 1.5 Mbps) -- MPEG-1
    • ISO 13818, ITU-T H.262 -- MPEG-2
  • H.320 -- for ISDN-like environments
  • ITU-T H.261 -- p x 64 standard (rate p x 64 kbps, p = 1..30)
  • H.324 -- video conferencing for GSTN, 26 kbps
In the last decade of the previous millennium great progress has been made in finding efficient encodings for audio and video. I assume that most of you have heard of MP3 (the infamous audio format), and at least some of you should be familiar with MPEG-2 video encoding (which is used for DVDs).

Now, from a somewhat more abstract perspective, we can, again following  [Codecs], make a distinction between a pixel-based approach (coding the raw signal so to speak) and an object-based approach, that uses segmentation and a more advanced scheme of description.

pixel-based standards

  • MPEG-1, MPEG-2, H.320, H.324

object-based codec(s)

  • MPEG-4 -- segmentation-based DFD (Displaced Frame Difference)
As will be explained in more detail when discussing the MPEG-4 standard in section 3.2, there are a number of advantages with an object-based approach. There is, however, also a price to pay. Usually (object) segmentation does not come for free, but requires additional effort in the phase of authoring and coding.

MPEG-1

To conclude this section on codecs, let's look in somewhat more detail at what is involved in coding and decoding a video signal according to the MPEG-1 standard.

MPEG-1 video compression uses both intra-frame analysis, for the compression of individual frames (which are like images), and inter-frame analysis, to detect redundant blocks or invariants between frames.

The MPEG-1 encoded signal itself is a sequence of so-called I, P and B frames.

MPEG-1



    IBBPBBIBBPBBI... 
    IBBPBBPBBPBBI...
  

frames


  • I: intra-frames -- independently coded images
  • P: predicted from the closest preceding I or P frame
  • B: interpolated from the two closest I or P frames (one before, one after)
Finally, decoding takes place as outlined below.

decoding


  • first I, then P, and finally B
When an error occurs, a safeguard is provided by the I frames, which stand on their own.
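Because a B frame depends on a reference frame that is displayed after it, the order in which frames are decoded differs from the order in which they are displayed: each B frame must wait until the following I or P frame is available. A small sketch of this reordering (the function is my own illustration, not part of the standard):

```python
def decode_order(display_sequence):
    """Reorder a GOP pattern for decoding: each B frame is held back
    until its following I or P reference has been decoded."""
    order = []
    pending_b = []
    for frame in display_sequence:
        if frame == "B":
            pending_b.append(frame)   # hold back until the next reference
        else:
            order.append(frame)       # I or P: decode immediately
            order.extend(pending_b)   # now the held-back B frames can follow
            pending_b = []
    return "".join(order)

print(decode_order("IBBPBBPBBI"))  # -> "IPBBPBBIBB"
```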

Subsequent standards were developed to accommodate more complex signals and greater functionality.

alternatives to MPEG-1


  • MPEG-2 -- higher pixel resolution and data rate
  • MPEG-3 -- intended to support HDTV (later merged into MPEG-2)
  • MPEG-4 -- object-based, ...
  • MPEG-7 -- content description
We will elaborate on MPEG-4 in the next section, and briefly discuss MPEG-7 at the end of this chapter.

research directions -- digital video formats

In the online version you will find a brief overview of digital video technology, written by Andy Tanenbaum, as well as some examples of videos of our university, encoded at various bitrates for different viewers.

What is the situation? For traditional television, there are three standards. The American (US) standard, NTSC, is adopted in North America, South America and Japan. The European standard, PAL, which seems to be technically superior, is adopted by the rest of the world, except France and the Eastern European countries, which have adopted the other European standard, SECAM. An overview of the technical properties of these standards, with permission taken from Tanenbaum's account, is given below.

  system       spatial resolution   frame rate   data rate
  NTSC         704 x 480            30 fps       243 Mbps
  PAL/SECAM    720 x 576            25 fps       249 Mbps

Obviously, real-time distribution of a signal of more than 200 Mbps is not possible with the internet connections available today. Even with compression on the fly, the signal would require 25 Mbps, or 36 Mbps with audio. Storing the signal on disk is hardly an alternative either, considering that one hour would require 12 gigabytes.
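The arithmetic behind the storage estimate is straightforward; a quick sketch, using the 25 Mbps compressed rate quoted above:

```python
def gigabytes_per_hour(mbps):
    """Disk space for one hour of video at a given bitrate (Mbit/s)."""
    bits = mbps * 1_000_000 * 3600     # bits in one hour
    return bits / 8 / 1_000_000_000    # bits -> bytes -> gigabytes

# compressed video at 25 Mbps: about 11 GB per hour, in line with
# the "one hour would require 12 gigabytes" estimate (with audio)
print(round(gigabytes_per_hour(25), 1))   # 11.2
```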

When looking at the differences between streaming video (that is transmitted real-time) and storing video on disk, we may observe the following tradeoffs:

  item            streaming                        downloaded
  bandwidth       equal to the display rate        may be arbitrarily small
  disk storage    none                             the entire file must be stored
  startup delay   almost none                      equal to the download time
  resolution      depends on available bandwidth   depends on available disk storage

So, what are our options? Apart from the quite successful MPEG encodings, which have found their way into the DVD, there are a number of proprietary formats used for transmitting video over the internet:

formats


Quicktime, introduced by Apple in the early 1990s, for local viewing; RealVideo, streaming video from RealNetworks; and Windows Media, a proprietary encoding scheme from Microsoft.

Examples of these formats, encoded for various bitrates are available at Video at VU.

Apparently, there is some need for digital video on the internet, for example as propaganda for attracting students, for looking at news items at a time that suits you, and (now that digital video cameras are becoming affordable) for sharing details of your family life.

Is digital video all there is? Certainly not! In the next section, we will deal with standards that allow for incorporating (streaming) digital video as an element in a compound multimedia presentation, possibly synchronized with other items, including synthetic graphics. Online, you will find some examples of digital video that are used as texture maps in 3D space. These examples are based on the technology presented in section 7-3, and use the streaming video codec from Real Networks that is integrated as a rich media extension in the blaxxun Contact 3D VRML plugin.



eliens@cs.vu.nl

draft version 1 (16/5/2003)