topical media & game development


adapted with permission from A. Tanenbaum video technology


Internet Video

Showing a movie over the Internet is not easy. Since we are a university, we are not content with explanations like "It can't be done." We encourage our students to question everything, so it is only fair that we start out explaining why showing a movie over the Internet is so hard. In this document we will try to explain a little bit about digital video technology and politics, to show what the problems are and why the solutions are so complicated. The reader is presumed to (almost) have a bachelor's degree in computer science or engineering or the equivalent industrial experience, so we will not define basic terms like "pixel" and "Mbps." It goes without saying that this story is highly simplified and many details have been omitted.

A Brief Introduction to Video Standards

Television and movies work by painting a series of still images on the screen at a rate of 24-30 frames/sec. At this speed, the eye perceives them as continuous motion rather than as a slide show because it takes a few tens of milliseconds for an image on the retina to fade out. This effect is known as "persistence of vision."  Audio is transmitted as a separate channel, unrelated to the video, so it must be synchronized with the video later.  The audio signal must be presented at a high rate. A gap of even 1 msec is noticeable.

Primitive black-and-white televisions were invented by Philo Farnsworth and Vladimir Zworykin in the late 1920s. They were involved in a huge patent fight for years over priority (and the royalties paid by all television set manufacturers). Color television is produced by projecting three monochrome images onto the screen at once, one each in red, green, and blue. In the early 1950s, the U.S. government set up the National Television System Committee to standardize the signal so that all stations and all sets would be compatible. The NTSC standard is still used in the U.S., as well as in the rest of North America, South America, and Japan. The NTSC system works by painting the screen 30 times a second with a frame consisting of 525 scan lines, only 480 of which are visible; the rest are too high or too low and fall outside the defined picture area. The entire signal, including the audio, falls within a 6 MHz bandwidth broadcast channel.

Several years later, television came to Western Europe. The Europeans decided that 30 frames/sec was overkill (Hollywood films use only 24 frames/sec). Instead they invented a system with better spatial resolution (625 scan lines of which 576 are visible) and a lower frame rate (25 frames/sec). When it came to the color encoding system, a split developed, with the Germans choosing the PAL (Phase Alternating Line) system and the French choosing the SECAM (SEquentiel Couleur A Memoire) system. The reason for the split was the desire of the French to protect their domestic television set manufacturers from foreign competition. France is the only country in Western Europe to use SECAM; the rest use PAL. Asia, Africa, and most of the rest of the world use PAL, which is technically the best of the systems.

Eastern Europe got television later still. The then-Communist controlled governments chose the SECAM system in order to prevent East Germans from watching West German (i.e., PAL) television. This is how television standards are developed.

Video Compression

The digital versions of NTSC and PAL/SECAM have the following characteristics (Note: B = Byte and b = bit):

System      Spatial resolution   Frame rate (frames/sec)   Data rate
NTSC        704 x 480            30                        243 Mbps
PAL/SECAM   720 x 576            25                        249 Mbps
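
The data rates in the table follow directly from the frame geometry. The sketch below reproduces them, assuming 24 bits per pixel (the text does not state the pixel depth; 24 is the value that matches the table) and 1 Mbps = 1,000,000 bits/sec. The function name is purely illustrative.

```python
def raw_video_mbps(width, height, frames_per_sec, bits_per_pixel=24):
    """Uncompressed video data rate in Mbps (1 Mbps = 10**6 bits/sec).

    bits_per_pixel=24 is an assumption chosen to match the table.
    """
    return width * height * frames_per_sec * bits_per_pixel / 1e6


print(round(raw_video_mbps(704, 480, 30)))  # NTSC      -> 243
print(round(raw_video_mbps(720, 576, 25)))  # PAL/SECAM -> 249
```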

It should be clear to anyone with knowledge of the current Internet that sending a real-time data stream in excess of 200 Mbps over the net is not going to fly for quite a few years. In fact, just recording such a high-bandwidth signal on tape inside a camcorder is also impossible. For this reason, all digital camcorders using the MiniDV system do compression inside the camera on the fly. Each image is compressed individually in a way similar to how JPEG images are compressed. First the luminance (intensity) and chrominance (color) signals are separated, and for each one the image is divided into 8 x 8 pixel blocks. These blocks are then run through a kind of Fourier transform (the discrete cosine transform) to get them into frequency space. The high-frequency coefficients are then discarded, which affects the sharpness. The more coefficients that are thrown away, the worse the sharpness but the better the compression. The remaining coefficients are encoded using Huffman encoding. At the receiving end, the reverse process takes place. The net result of this compression is that the video signal can be reduced to about 25 Mbps. When the audio, time code, and certain other information are added, the gross output rate is about 36 Mbps. This is the data rate that is written onto the DV tape and read into a computer using the IEEE 1394 (FireWire) interface. Even with this compression, a 1-hour movie occupies about 12 gigabytes of disk space.
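
The transform-and-discard step described above can be sketched in a few lines. This is a minimal illustration rather than the DV or JPEG algorithm itself: it assumes an orthonormal 8 x 8 discrete cosine transform and a simple "keep only coefficients with u + v below a threshold" rule, and it leaves out the luminance/chrominance split and the Huffman stage. All function names are made up for the example.

```python
import math

N = 8  # block edge length, as in the text


def _scale(k):
    """Normalization factor that makes the transform orthonormal."""
    return math.sqrt((1.0 if k == 0 else 2.0) / N)


def _basis(i, k):
    """Cosine basis value for sample position i and frequency index k."""
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct2(block):
    """Forward 2-D DCT: N x N pixel values -> N x N frequency coefficients."""
    return [[_scale(u) * _scale(v)
             * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                   for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse 2-D DCT: frequency coefficients back to pixel values."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v]
                 * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def drop_high_frequencies(coeffs, keep):
    """Zero every coefficient with u + v >= keep (smaller keep = more loss)."""
    return [[c if u + v < keep else 0.0 for v, c in enumerate(row)]
            for u, row in enumerate(coeffs)]


# Usage: truncate a smooth diagonal ramp and measure the damage.
ramp = [[float(8 * (x + y)) for y in range(N)] for x in range(N)]
approx = idct2(drop_high_frequencies(dct2(ramp), keep=3))
worst = max(abs(approx[x][y] - ramp[x][y])
            for x in range(N) for y in range(N))
print(f"kept 6 of 64 coefficients, worst pixel error {worst:.2f}")
```

Lowering `keep` discards more coefficients and increases the error, which is the sharpness-versus-compression trade-off the text describes. Production codecs typically quantize high-frequency coefficients coarsely rather than zeroing them outright.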

A better system is to observe that many video frames are nearly identical to their predecessors. Thus by comparing a frame to its immediate predecessor and only specifying the 8 x 8 blocks that differ from the corresponding ones in the previous frame, even more compression can be achieved. An even better approach, when a block fails to match exactly, is to see whether a nearby block matches. This situation occurs whenever the camera pans across the scene. These comparisons take too long to do in real time, so at present camcorders use only intraframe compression and not interframe compression (but no doubt will one day). For a movie that has been stored on a computer in DV format, however, the computer can do the interframe compression. Even if it takes 10 hours to compress 1 hour of video, the compression is only done once but the movie will be viewed many times (unless it is a very bad movie). An international standard for this interframe compression has been developed. It is called MPEG after the modestly named committee that wrote the standard: the Moving Picture Experts Group.
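
The two interframe ideas above (send only the blocks that changed, and look nearby when a block has merely moved) can be sketched as follows. This is a toy model, not the MPEG algorithm: frames are plain lists of pixel rows, matches must be exact, and the search radius is an arbitrary assumption. All names are hypothetical.

```python
def get_block(frame, top, left, block=8):
    """Extract the block x block square whose top-left corner is (top, left)."""
    return [row[left:left + block] for row in frame[top:top + block]]


def changed_blocks(prev, cur, block=8):
    """Top-left corners of the blocks in cur that differ from prev."""
    changed = []
    for top in range(0, len(cur), block):
        for left in range(0, len(cur[0]), block):
            if get_block(prev, top, left, block) != get_block(cur, top, left, block):
                changed.append((top, left))
    return changed


def find_nearby_match(prev, cur, top, left, block=8, radius=2):
    """Search prev within +/- radius pixels for an exact copy of cur's block.

    Returns the offset (dy, dx) of the first match found, or None.
    """
    target = get_block(cur, top, left, block)
    height, width = len(prev), len(prev[0])
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            t, l = top + dy, left + dx
            if 0 <= t <= height - block and 0 <= l <= width - block:
                if get_block(prev, t, l, block) == target:
                    return (dy, dx)
    return None


# Usage: a 16 x 16 frame with unique pixel values, then a one-pixel pan
# that shifts the whole picture to the right.
prev = [[r * 16 + c for c in range(16)] for r in range(16)]
cur = [[-1] + row[:-1] for row in prev]

print(changed_blocks(prev, cur))            # [(0, 0), (0, 8), (8, 0), (8, 8)]
print(find_nearby_match(prev, cur, 0, 8))   # (0, -1)
```

After the pan every block differs from its predecessor, yet the block at (0, 8) is found intact one pixel to the left in the previous frame, so only a small offset needs to be sent for it. Real encoders accept approximate matches and transmit the residual difference, which this sketch does not attempt.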

Actually, MPEG is a family of standards. MPEG-1 uses an image size equal to about one quarter of the standard video frame to gain more compression at the expense of lower resolution. MPEG-1 for NTSC is 320 x 240 pixels; PAL/SECAM is 352 x 288 pixels. MPEG-2 is intended for high-quality television. It is the standard used on DVDs. Unfortunately, it uses so much bandwidth that it is not a serious candidate for use over the Internet yet.

Digital Audio

Audio requires less bandwidth than video, but the ear is much more sensitive than the eye, so slight distortions are much more easily detectable. True audiophiles argue that using a $1000 cable to connect the turntable to the amplifier gives appreciably better sound than using a $50 cable. Nobody argues this for video. CD-quality stereo sound requires sending two audio tracks sampled at 44,100 Hz with a 16-bit sample. This rate requires a 1.4 Mbps channel just for the audio. Thus it is not possible to send even a pure CD-quality audio signal over a 56 kbps modem. It is not even close. Part of the MPEG standard specifies an audio compression scheme that reduces the audio bandwidth needed by a factor of 10. This scheme, called MP3, is widely used on the Internet for exchanging audio files.
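
The audio figures above are easy to check. The sketch below works through the arithmetic, taking the text's factor-of-10 reduction at face value; the function name is illustrative.

```python
def pcm_bits_per_sec(sample_rate_hz, bits_per_sample, channels):
    """Data rate of uncompressed (PCM) audio in bits/sec."""
    return sample_rate_hz * bits_per_sample * channels


cd = pcm_bits_per_sec(44_100, 16, 2)
print(cd)           # 1411200 bits/sec, i.e. about 1.4 Mbps
print(cd / 56_000)  # 25.2: a 56 kbps modem is about 25 times too slow
print(cd / 10)      # 141120.0 bits/sec after a factor-of-10 reduction
```

Note that even the compressed rate of roughly 141 kbps is still more than a 56 kbps modem can carry in real time.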

Streaming vs. Downloading

There are two possible modes of acquiring video over the Internet:
  • Watching it in real-time as it streams in over the wire
  • Storing the video as a file on the hard disk and then playing it later off the disk

In the first mode, the Internet bandwidth must match the display rate.  If the display rate requires, for example, 512 kbps, the network connection must be able to supply that bandwidth continuously.  Usually the receiver buffers a few seconds of video before starting to avoid having to stop the video if the network is ever too slow, but the buffer can hold only so much video.  If it ever empties, the film stops.
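
The role of the receiver's buffer can be illustrated with a small simulation. It is a deliberately crude model under stated assumptions: time advances in one-second steps, the network delivers a given number of kilobits in each step, and playback starts once a chosen number of seconds of video has been buffered. The function and parameter names are hypothetical.

```python
def simulate_playback(network_kbps, display_kbps, prebuffer_sec):
    """Count the seconds in which playback stalls because the buffer ran dry.

    network_kbps  -- throughput measured in each successive second
    display_kbps  -- rate at which the player consumes video
    prebuffer_sec -- seconds of video to collect before playback starts
    """
    buffered = 0.0   # kilobits currently held in the buffer
    playing = False
    stalls = 0
    for rate in network_kbps:
        buffered += rate
        if not playing:
            # Still filling the initial buffer; nothing is shown yet.
            if buffered >= prebuffer_sec * display_kbps:
                playing = True
            continue
        if buffered >= display_kbps:
            buffered -= display_kbps   # one second of video is displayed
        else:
            stalls += 1                # buffer is empty: the film stops
    return stalls


# A 2-second buffer rides out a 2-second outage, but not a 3-second one.
print(simulate_playback([512, 512, 0, 0], 512, 2))     # 0
print(simulate_playback([512, 512, 0, 0, 0], 512, 2))  # 1
```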

In the second mode, the entire video file is downloaded to the local hard disk in advance. Then it is played off the hard disk. Assuming the disk has high enough bandwidth, it is possible to display a high-quality movie even over a slow connection. The properties of streaming vs. downloaded video are summarized below:

Item                           Streaming video                  Downloaded video
Bandwidth required             Equal to the display rate        May be arbitrarily small
Disk storage required          None                             The entire file must be stored
Startup delay before viewing   Almost none                      Equal to the download time
Resolution                     Depends on available bandwidth   Depends on available disk storage
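
The "startup delay" row can be made concrete with a quick calculation. The sketch assumes the file size is simply the display rate times the running time (no container overhead) and reuses the 512 kbps display rate and 56 kbps modem mentioned in the text; the function name is illustrative.

```python
def download_delay_sec(duration_sec, display_kbps, link_kbps):
    """Seconds needed to download a video before it can be played.

    Assumes file size = display rate x duration, with no overhead.
    """
    file_kbits = duration_sec * display_kbps
    return file_kbits / link_kbps


hours = download_delay_sec(3600, 512, 56) / 3600
print(f"{hours:.1f} hours to fetch a 1-hour, 512 kbps movie at 56 kbps")
```

When the link rate equals the display rate, the delay equals the running time of the movie, which is the break-even point between the two modes.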

Video Formats

The first commercial digital video system for personal computers, Quick Time, was introduced by Apple in the early 1990s. It was intended for producing and viewing video locally, reading and writing the hard disk. A few years later, Real Networks devised a scheme, Real Video, intended to transmit video over the Internet. When it became clear that video was going to become important, Microsoft invented its own proprietary encoding scheme, Windows Media. Thus we now have four competing and incompatible systems out there: MPEG, Quick Time, Real Video, and Windows Media, in addition to the NTSC, PAL, SECAM split, giving 12 combinations in all. Fortunately, NTSC vs. PAL affects only the image size, so different versions are not needed for the U.S. and Europe. However, each of the schemes is capable of encoding at various bit rates, depending on whether the target is a 56 kbps modem, a 112 kbps U.S. ISDN line, a 128 kbps European ISDN line, a LAN, a cable modem, or ADSL. For all of them, more compression means a smaller file. A smaller file requires less bandwidth when streamed and less delay and disk storage when downloaded in advance and played later.

All in all, it should now be clear why sending video over the Internet is not easy at present. Thus for our movies, we have provided multiple formats and multiple compression choices. It is recommended that you choose your favorite system (Quick Time, Real Video, or Windows Media) and try the highest bandwidth version we have supplied. If that leads to errors and dropped frames, try the next highest rate and so on, until you find one that works well. Alternatively, just give up on streaming and download one of the files via FTP for playback later.
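
The trial-and-error procedure recommended above amounts to a simple search from the highest rate downward. The sketch below is only an illustration: the list of available rates and the `plays_cleanly` test are hypothetical stand-ins for whatever versions are actually supplied and for the viewer's own judgment about dropped frames.

```python
def pick_version(rates_kbps, plays_cleanly):
    """Try encodings from the highest rate down; return the first that works.

    Returns None if no version streams acceptably, in which case the
    fallback is to download a file for later playback.
    """
    for rate in sorted(rates_kbps, reverse=True):
        if plays_cleanly(rate):
            return rate
    return None


# Hypothetical example: versions encoded for the line speeds named in the
# text, tried over a link that sustains about 120 kbps.
link_kbps = 120
print(pick_version([56, 112, 128], lambda rate: rate <= link_kbps))  # 112
```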

see Video at VU




(C) Æliens 04/09/2009

You may not copy or print any of this material without explicit permission of the author or the publisher. In case of other copyright issues, contact the author.