introduction multimedia


codecs

Back to the everyday reality of the technology that surrounds us. What can we expect to become of networked multimedia? Let one thing be clear

compression is the key to effective delivery

There can be no misunderstanding about this, although you may wonder why you need to bother with compression (and decompression). The answer is simple. You need to be aware of the size of what you put on the web and the demands that imposes on the network. Consider the table, taken from  [Codecs], below.

  media                                           uncompressed   compressed
  voice (8k samples/sec, 8 bits/sample)           64 kbps        2-4 kbps
  slow motion video (10 fps, 176x120, 8 bits)     5.07 Mbps      8-16 kbps
  audio conference (8k samples/sec, 8 bits)       64 kbps        16-64 kbps
  video conference (15 fps, 352x240, 8 bits)      30.4 Mbps      64-768 kbps
  audio, stereo (44.1k samples/sec, 16 bits)      1.5 Mbps       128 kbps-1.5 Mbps
  video (15 fps, 352x240, 8 bits)                 30.4 Mbps      384 kbps
  video, CDROM (30 fps, 352x240, 8 bits)          60.8 Mbps      1.5-4 Mbps
  video, broadcast (30 fps, 720x480, 8 bits)      248.8 Mbps     3-8 Mbps
  HDTV (59.9 fps, 1280x720, 8 bits)               1.3 Gbps       20 Mbps
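The uncompressed figures in the table follow directly from sampling rate times sample size, and for video from frame rate times resolution; the video rows work out if one assumes three colour components of 8 bits each. A quick sketch to verify this (the function names are my own):

```python
def audio_bitrate(samples_per_sec, bits_per_sample, channels=1):
    """Uncompressed audio bitrate in bits per second."""
    return samples_per_sec * bits_per_sample * channels

def video_bitrate(fps, width, height, bits_per_sample, components=3):
    """Uncompressed video bitrate in bits per second,
    assuming three colour components per pixel."""
    return fps * width * height * bits_per_sample * components

# voice: 8k samples/sec at 8 bits -> 64 kbps
print(audio_bitrate(8000, 8))                 # 64000
# stereo audio: 44.1k samples/sec, 16 bits, 2 channels -> ~1.4 Mbps
print(audio_bitrate(44100, 16, channels=2))   # 1411200
# CDROM video: 30 fps, 352x240, 8 bits x 3 components -> 60.8 Mbps
print(video_bitrate(30, 352, 240, 8))         # 60825600
```

Running the same computation for the broadcast row (30 fps, 720x480) gives 248.8 Mbps, matching the table.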

You'll see that, bearing the various types of connection in mind

(phone: 56 kbps, ISDN: 64-128 kbps, cable: 0.5-1 Mbps, DSL: 0.5-2 Mbps)

you must be careful to select a media type that is suitable for your target audience. And then again, choosing the right compression scheme may make the difference between being able to deliver and not being able to do so. Fortunately,

images, video and audio are amenable to compression

Why this is so is explained in  [Codecs]. Compression is feasible because of, on the one hand, the statistical redundancy in the signal and, on the other hand, the irrelevance of particular information from a perceptual perspective. Redundancy comes about through both spatial correlation, between neighbouring pixels, and temporal correlation, between successive frames.

statistical redundancy in signal


  • spatial correlation -- neighbour samples in single frame
  • temporal correlation -- between segments (frames)

irrelevant information


  • from perceptual point of view
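Both ideas can be made concrete with a toy example of my own: in a smooth scanline, neighbouring samples are highly correlated, so the differences between them are small and cheap to code (statistical redundancy); coarsely quantizing those differences discards detail the eye barely notices (perceptual irrelevance), at the price of being lossy.

```python
# a smooth scanline: neighbouring samples differ only slightly
scanline = [100, 101, 103, 104, 104, 105, 107, 108]

# delta coding: keep the first sample, then differences to the previous one
deltas = [scanline[0]] + [b - a for a, b in zip(scanline, scanline[1:])]
print(deltas)   # [100, 1, 2, 1, 0, 1, 2, 1] -- small values, fewer bits needed

def reconstruct(deltas):
    """Running sum restores the samples from the deltas."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out

assert reconstruct(deltas) == scanline   # delta coding alone is lossless

# quantize the deltas to multiples of 2: smaller symbol set, but lossy
quantized = [deltas[0]] + [2 * round(d / 2) for d in deltas[1:]]
lossy = reconstruct(quantized)           # close to, but not equal to, the original
```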

B. Vasudev & W. Li, Memory management: Codecs


The actual process of encoding and decoding may be depicted as follows:

codec = (en)coder + decoder



  signal  -> source coder   ->  channel coder    (encoding)
  
  signal  <- source decoder <-  channel decoder  (decoding)
  

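The division of labour in the diagram can be illustrated with a toy pipeline (all names and choices are my own, not part of any standard): the source coder removes redundancy from the signal, while the channel coder adds controlled redundancy back to protect against transmission errors.

```python
def source_encode(samples):
    """Run-length code the signal: removes redundancy in repeated samples."""
    out = []
    for s in samples:
        if out and out[-1][0] == s:
            out[-1][1] += 1
        else:
            out.append([s, 1])
    return out

def source_decode(pairs):
    return [s for s, n in pairs for _ in range(n)]

def channel_encode(pairs):
    """Trivial repetition code: transmit everything twice for error protection."""
    return pairs + pairs

def channel_decode(received):
    return received[: len(received) // 2]

signal = [7, 7, 7, 8, 8, 9]
received = channel_encode(source_encode(signal))
assert source_decode(channel_decode(received)) == signal
```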
Of course, the coded signal must be transmitted across some channel, but this is outside the scope of the coding and decoding issue. With this diagram in mind, we can specify the codec design problem:

codec design problem


From a systems design viewpoint, one can restate the codec design problem as a bit rate minimization problem, meeting (among others) constraints concerning:

  • specified levels of signal quality,
  • implementation complexity, and
  • communication delay (start coding -- end decoding).

compression methods

As explained in  [Codecs], there is a large variety of compression (and corresponding decompression) methods, including model-based methods, as for example the object-based MPEG-4 method that will be discussed later, and waveform-based methods, for which we generally make a distinction between lossless and lossy methods. Huffman coding is an example of a lossless method, and methods based on Fourier transforms are generally lossy. Lossy means that actual data is lost, so that after decompression there may be a loss of (perceptual) quality.
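Huffman coding assigns short codewords to frequent symbols and longer ones to rare symbols, and can be sketched in a few lines (this is a minimal illustration of the principle, not a production codec):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a prefix code: frequent symbols get shorter codewords."""
    freq = Counter(text)
    # heap entries: (frequency, unique tiebreaker, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes("abracadabra")
encoded = "".join(codes[s] for s in "abracadabra")
# 'a' occurs 5 times out of 11 and gets a 1-bit codeword;
# the whole string takes 23 bits instead of 88 bits of 8-bit ASCII
print(len(encoded))   # 23
```

Since no codeword is a prefix of another, the bit string can be decoded unambiguously, which is what makes the method lossless.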

model-based


waveform-based


Leaving a more detailed description of compression methods to the diligent students' own research, it should come as no surprise that when selecting a compression method, there are a number of tradeoffs, with respect to coding efficiency, the complexity of the coder and decoder, and the signal quality.

tradeoffs

  • coding efficiency -- compression ratio
  • coder complexity -- memory, power requirements, ops/sec
  • signal quality -- bit error probability, signal/noise, ...
issues in compression selection

In practice this means that when we select a particular coder-decoder scheme, we must consider whether we can guarantee a sufficient level of signal quality, and to what extent we are willing to accept loss of information, that is lossy output. Another issue in selecting a method of compression is whether the (compressed) signal can be decoded in real time. For particular applications, such as conferencing, we should be worried about communication delay. And, with regard to the many existing codecs and the variety of platforms, we may desire the possibility of interoperability, to achieve, for example, the exchange of media objects between tools, as is already common for image processing tools.

compression standards

Given the importance of codecs it should come as no surprise that much effort has been put in developing standards. Without going into details, we list a number of these standards below.

standard-based codecs

  • JPEG -- ISO/IEC 10918-1, ITU-T T.81
  • MPEG
    • ISO 11172 (up to 1.5 Mbps) -- MPEG-1
    • ISO 13818, ITU-T H.262 -- MPEG-2
  • H.320 -- for ISDN-like environments
  • ITU-T H.261 -- p x 64 standard (rate p x 64 kbps, p = 1..30)
  • H.324 -- video conferencing for GSTN, 26 kbps
In the last decade of the previous millennium great progress has been made in finding efficient encodings for audio and video. I assume that most of you have heard of MP3 (the infamous audio format), and at least some of you should be familiar with MPEG-2 video encoding (which is used for DVDs).

Now, from a somewhat more abstract perspective, we can, again following  [Codecs], make a distinction between a pixel-based approach (coding the raw signal so to speak) and an object-based approach, that uses segmentation and a more advanced scheme of description.

pixel-based standards

  • MPEG-1, MPEG-2, H.320, H.324

object-based codec(s)

  • MPEG-4 -- segmentation-based DFD (Displaced Frame Difference)
As will be explained in more detail when discussing the MPEG-4 standard in section 3.2, there are a number of advantages with an object-based approach. There is, however, also a price to pay. Usually (object) segmentation does not come for free, but requires additional effort in the phase of authoring and coding.

MPEG-1

To conclude this section on codecs, let's look in somewhat more detail at what is involved in coding and decoding a video signal according to the MPEG-1 standard.

MPEG-1 video compression uses both intra-frame analysis, for the compression of individual frames (which are like images), and inter-frame analysis, to detect redundant blocks or invariants between frames.

The MPEG-1 encoded signal itself is a sequence of so-called I, P and B frames.

MPEG-1



    IBBPBBIBBPBBI... 
    IBBPBBPBBPBBI...
  

frames


  • I: intra-frames -- independently coded images
  • P: predicted from the closest preceding I or P frame
  • B: interpolated from the two closest I or P frames (one before, one after)
Finally, decoding takes place as outlined below.

decoding


  • first I, then P, and finally B
When an error occurs, a safeguard is provided by the I frames, which stand on their own.
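Because a B frame depends on a reference frame that is displayed after it, the order in which frames are decoded differs from the order in which they are displayed: each B frame must wait until the following I or P frame is available. A small sketch of this reordering (the function is my own illustration, not part of the standard):

```python
def decode_order(display_sequence):
    """Reorder a GOP pattern for decoding: each B frame is held back
    until its following I or P reference has been decoded."""
    order = []
    pending_b = []
    for frame in display_sequence:
        if frame == "B":
            pending_b.append(frame)   # hold back until the next reference
        else:
            order.append(frame)       # I or P: decode immediately
            order.extend(pending_b)   # now the held-back B frames can follow
            pending_b = []
    return "".join(order)

print(decode_order("IBBPBBPBBI"))  # -> "IPBBPBBIBB"
```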

Subsequent standards were developed to accommodate more complex signals and greater functionality.

alternatives to MPEG-1


  • MPEG-2 -- higher pixel resolution and data rate
  • MPEG-3 -- intended to support HDTV (later merged into MPEG-2)
  • MPEG-4 -- object-based, ...
  • MPEG-7 -- content description
We will elaborate on MPEG-4 in the next section, and briefly discuss MPEG-7 at the end of this chapter.

research directions -- digital video formats

In the online version you will find a brief overview of digital video technology, written by Andy Tanenbaum, as well as some examples of videos of our university, encoded at various bitrates for different viewers.

What is the situation? For traditional television, there are three standards. The American (US) standard, NTSC, is adopted in North America, South America and Japan. The European standard, PAL, which seems to be technically superior, is adopted by the rest of the world, except France and the Eastern European countries, which have adopted the other European standard, SECAM. An overview of the technical properties of these standards, with permission taken from Tanenbaum's account, is given below.

  system       spatial resolution   frame rate   data rate
  NTSC         704 x 480            30 fps       243 Mbps
  PAL/SECAM    720 x 576            25 fps       249 Mbps

Obviously, real-time distribution of a signal of more than 200 Mbps is not possible with the internet connections available today. Even with compression on the fly, the signal would require 25 Mbps, or 36 Mbps with audio. Storing the signal on disk is hardly an alternative either, considering that one hour would require 12 gigabytes.
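The arithmetic behind the storage estimate is straightforward; a quick sketch, using the 25 Mbps compressed rate quoted above:

```python
def gigabytes_per_hour(mbps):
    """Disk space for one hour of video at a given bitrate (Mbit/s)."""
    bits = mbps * 1_000_000 * 3600     # bits in one hour
    return bits / 8 / 1_000_000_000    # bits -> bytes -> gigabytes

# compressed video at 25 Mbps: about 11 GB per hour, in line with
# the "one hour would require 12 gigabytes" estimate (with audio)
print(round(gigabytes_per_hour(25), 1))   # 11.2
```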

When looking at the differences between streaming video (that is transmitted real-time) and storing video on disk, we may observe the following tradeoffs:

  item            streaming                        downloaded
  bandwidth       equal to the display rate        may be arbitrarily small
  disk storage    none                             the entire file must be stored
  startup delay   almost none                      equal to the download time
  resolution      depends on available bandwidth   depends on available disk storage

So, what are our options? Apart from the quite successful MPEG encodings, which have found their way into the DVD, there are a number of proprietary formats used for transmitting video over the internet:

formats


Quicktime, introduced by Apple in the early 1990s, for local viewing; RealVideo, streaming video from RealNetworks; and Windows Media, a proprietary encoding scheme from Microsoft.

Examples of these formats, encoded for various bitrates are available at Video at VU.

Apparently, there is some need for digital video on the internet, for example as propaganda for attracting students, for looking at news items at a time that suits you, and (now that digital video cameras are becoming affordable) for sharing details of your family life.

Is digital video all there is? Certainly not! In the next section, we will deal with standards that allow for incorporating (streaming) digital video as an element in a compound multimedia presentation, possibly synchronized with other items, including synthetic graphics. Online, you will find some examples of digital video that are used as texture maps in 3D space. These examples are based on the technology presented in section 7-3, and use the streaming video codec from Real Networks that is integrated as a rich media extension in the blaxxun Contact 3D VRML plugin.



eliens@cs.vu.nl

draft version 1 (16/5/2003)