topical media & game development
adapted with permission from A. Tanenbaum video technology
Internet Video
Showing a movie over the Internet is not easy. Since we are
a university, we are not content with explanations like "It can't be done."
We encourage our students to question everything, so it is only fair that
we start out explaining why showing a movie over the Internet is so hard.
In this document we describe a little of digital video technology and
politics to show what the problems are and why the solutions
are so complicated. The reader is presumed to (almost) have a bachelor's
degree in computer science or engineering or the equivalent industrial
experience, so we will not define basic terms like "pixel" and "Mbps."
It goes without saying that this story is highly simplified and many details
have been omitted.
A Brief Introduction to Video Standards
Television and movies work by painting a series of still images on the
screen at a rate of 24-30 frames/sec. At this speed, the eye perceives
them as continuous motion rather than as a slide show because it takes
a few tens of milliseconds for an image on the retina to fade out. This
effect is known as "persistence of vision." Audio is transmitted
as a separate channel, unrelated to the video, so it must be synchronized
with the video later. The audio signal must be presented at a high
rate. A gap of even 1 msec is noticeable.
Primitive black-and-white televisions were invented by Philo Farnsworth
and Vladimir Zworykin in the late 1920s. They were involved in a huge patent
fight for years over priority (and the royalties paid by all television
set manufacturers). Color television is produced by projecting three
monochrome images onto the screen simultaneously, one each in
red, green, and blue. In the early 1950s, the U.S. government set up the
National Television System Committee to standardize the signal so that
all stations and all sets would be compatible. The NTSC standard is still
used in the U.S., as well as in the rest of North America, South America,
and Japan.
The NTSC system works by painting the screen 30 times a second with
a frame consisting of 525 scan lines, only 480 of which are visible; the
rest are too high or too low and fall outside the defined picture area.
The entire signal, including the audio, falls within a 6 MHz bandwidth broadcast
channel.
Several years later, television came to Western Europe. The Europeans
decided that 30 frames/sec was overkill (Hollywood films use only 24 frames/sec).
Instead they invented a system with better spatial resolution (625 scan
lines of which 576 are visible) and a lower frame rate (25 frames/sec).
When it came to the color encoding system, a split developed, with the
Germans choosing the PAL (Phase Alternating Line) system and the French
choosing the SECAM (SEquentiel Couleur Avec Memoire) system. The reason
for the split was the desire of the French to protect their domestic television
set manufacturers from foreign competition. France is the only country
in Western Europe to use SECAM; the rest use PAL. Asia, Africa, and most
of the rest of the world use PAL, which is technically the best of the
systems.
Eastern Europe got television later still. The then-Communist controlled
governments chose the SECAM system in order to prevent East Germans from
watching West German (i.e., PAL) television. This is how television standards
are developed.
Video Compression
The digital versions of NTSC and PAL/SECAM have the following characteristics
(Note: B = Byte and b = bit):
System | Spatial resolution | Frame rate | Data rate |
NTSC | 704 x 480 | 30 frames/sec | 243 Mbps |
PAL/SECAM | 720 x 576 | 25 frames/sec | 249 Mbps |
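The data rates in the table can be reproduced with a few lines of arithmetic. The sketch below assumes 24 bits per pixel (8 bits each for red, green, and blue); that figure is not stated above, but it is the value consistent with the table. The function name is purely illustrative.

```python
def uncompressed_mbps(width, height, frames_per_sec, bits_per_pixel=24):
    """Raw video data rate in Mbps (1 Mbps = 1,000,000 bits/sec).

    bits_per_pixel=24 is an assumption: 8 bits each for R, G, and B.
    """
    return width * height * frames_per_sec * bits_per_pixel / 1_000_000


ntsc_mbps = uncompressed_mbps(704, 480, 30)  # roughly 243 Mbps
pal_mbps = uncompressed_mbps(720, 576, 25)   # roughly 249 Mbps
print(round(ntsc_mbps), round(pal_mbps))
```

PAL/SECAM has more pixels per frame but fewer frames per second, which is why the two rates end up so close.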
It should be clear to anyone with knowledge of the current Internet
that sending a real-time data stream in excess of 200 Mbps over the net
is not going to fly for quite a few years. In fact, just recording
such a high-bandwidth signal on tape inside a camcorder is also impossible.
For this reason, all digital camcorders using the MiniDV system do compression
inside the camera on the fly. Each image is compressed individually
in a way similar to how JPEG images are compressed. First the luminance
(intensity) and chrominance (color) signals are separated, and for each
one the image is divided into 8 x 8 pixel blocks. These blocks are then
run through a kind of Fourier transform (the discrete cosine transform) to get them into frequency space.
The high frequency coefficients are then discarded, which affects the sharpness.
The more coefficients that are thrown away, the worse the sharpness but
the better the compression. The remaining coefficients are encoded using
Huffman encoding. At the receiving end, the reverse process takes
place. The net result of this compression is that the video signal can
be reduced to about 25 Mbps. When the audio, time code and certain other
information are added, the gross output rate is about 36 Mbps. This is the
data rate that is written onto the DV tape and read into a computer using
the IEEE 1394 (FireWire) interface. Even with this compression, a
1-hour movie occupies 12 gigabytes of disk space.
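The transform-and-discard step described above can be sketched as follows. JPEG-style codecs use the discrete cosine transform (DCT) as their "kind of Fourier transform". This is a deliberately simplified illustration, not the actual DV algorithm: real codecs quantize coefficients rather than zeroing them outright, and the rule used here for what counts as "high frequency" (index sum u + v at or above a cutoff) is an assumption made for the example. All function names are hypothetical.

```python
import math

N = 8  # block size used in the text


def _alpha(k):
    # Scale factors that make the transform orthonormal.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(i, k):
    # Cosine basis function of frequency k sampled at pixel position i.
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct2(block):
    """Forward 2-D DCT of an N x N block: pixels -> frequency coefficients."""
    return [[_alpha(u) * _alpha(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse 2-D DCT: frequency coefficients -> pixels."""
    return [[sum(_alpha(u) * _alpha(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def discard_high_frequencies(coeffs, keep):
    """Zero every coefficient with u + v >= keep (assumed cutoff rule)."""
    return [[c if u + v < keep else 0.0 for v, c in enumerate(row)]
            for u, row in enumerate(coeffs)]


# A smooth ramp block: most of its energy sits in the low frequencies.
block = [[float(8 * x + y) for y in range(N)] for x in range(N)]
kept = discard_high_frequencies(dct2(block), keep=3)
approx = idct2(kept)
worst = max(abs(approx[x][y] - block[x][y]) for x in range(N) for y in range(N))
print("coefficients kept:", sum(1 for row in kept for c in row if c != 0.0))
print("largest pixel error:", round(worst, 2))
```

Lowering `keep` throws away more coefficients, which makes the block cheaper to encode and the reconstruction blurrier, matching the trade-off described in the text.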
A better system is to observe that many video frames are nearly identical
to their predecessors. Thus by comparing a frame to its immediate predecessor
and only specifying the 8 x 8 blocks that differ from the corresponding
ones in the previous frame, even more compression can be achieved.
An even better approach, when a block fails to match exactly, is to see
whether a nearby block matches. This situation occurs whenever the camera
pans across the scene. These comparisons take too long to do in real
time, so camcorders at present use only intraframe compression, not interframe
compression (but no doubt will one day). For a movie that
has been stored on a computer in DV format, however, the computer can do
the interframe compression. Even if it takes 10 hours to compress 1 hour
of video, the compression is only done once but the movie will be viewed
many times (unless it is a very bad movie). An international standard
for this interframe compression has been developed. It is called MPEG after
the modestly named committee that wrote the standard: the Moving Picture
Experts Group.
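The two interframe ideas above (send only the blocks that changed, and for a changed block first look for a match at a nearby position in the previous frame) can be sketched as follows. This is a toy illustration under stated assumptions, not the MPEG algorithm: frames are plain lists of pixel rows, blocks must match exactly (real encoders accept close matches and encode the residual), and every name here is hypothetical.

```python
def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame (a list of pixel rows)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def find_match(prev, block, top, left, size, search):
    """Look for an identical block in prev within +/- search pixels."""
    height, width = len(prev), len(prev[0])
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            inside = 0 <= t <= height - size and 0 <= l <= width - size
            if inside and get_block(prev, t, l, size) == block:
                return (dy, dx)  # offset of the matching block in prev
    return None


def encode_frame(prev, curr, size=8, search=2):
    """Describe curr relative to prev; unchanged blocks produce no output."""
    out = []
    for top in range(0, len(curr) - size + 1, size):
        for left in range(0, len(curr[0]) - size + 1, size):
            block = get_block(curr, top, left, size)
            if block == get_block(prev, top, left, size):
                continue  # identical to its predecessor: nothing to send
            vector = find_match(prev, block, top, left, size, search)
            if vector is not None:
                out.append((top, left, "move", vector))  # cheap: just an offset
            else:
                out.append((top, left, "raw", block))    # expensive: full block
    return out


# Simulate a one-pixel camera pan on a tiny 8 x 8 frame with 4 x 4 blocks.
prev = [[16 * r + c for c in range(8)] for r in range(8)]
curr = [row[1:] + [999] for row in prev]  # shift left, new content enters at right
for top, left, kind, _ in encode_frame(prev, curr, size=4, search=1):
    print(top, left, kind)
```

The nested search over every offset is what makes interframe compression slow: the work grows with the square of the search radius for every changed block.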
Actually, MPEG is a family of standards. MPEG-1 uses an image size equal
to about one quarter of the standard video frame to gain more compression
at the expense of lower resolution. MPEG-1 for NTSC is 352 x 240 pixels;
PAL/SECAM is 352 x 288 pixels. MPEG-2 is intended for high-quality
television. It is the standard used on DVDs. Unfortunately, it uses so
much bandwidth that it is not a serious candidate for use over the Internet
yet.
Digital Audio
Audio requires less bandwidth than video, but the ear is much more sensitive
than the eye, so slight distortions are much more easily detectable. True
audiophiles argue that using a $1000 cable to connect the turntable to
the amplifier gives appreciably better sound than using a $50 cable. Nobody
argues this for video. CD quality stereo sound requires sending two audio
tracks sampled at 44,100 Hz with a 16-bit sample. This rate requires a
1.4 Mbps channel just for the audio. Thus it is not possible to even send
a pure CD-quality audio signal over a 56 kbps modem. It is not even close.
Part of the MPEG standard specifies an audio compression scheme that reduces
the audio bandwidth needed by a factor of 10. This scheme, called MP3,
is widely used on the Internet for exchanging audio files.
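The audio figures above follow directly from the sampling parameters. The short calculation below uses only numbers given in the text; the factor-of-10 MP3 reduction is taken at face value, so the compressed rate is an approximation rather than a property of any particular encoder setting.

```python
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
BITS_PER_SAMPLE = 16
CHANNELS = 2              # stereo
MODEM_BPS = 56_000        # 56 kbps modem

# Uncompressed CD-quality rate in bits per second.
cd_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS

# Rate after the factor-of-10 reduction quoted in the text.
mp3_bps = cd_bps / 10

print("CD audio:", cd_bps / 1e6, "Mbps")
print("times too fast for the modem:", cd_bps / MODEM_BPS)
print("after 10x compression:", mp3_bps / 1e3, "kbps")
```

Even after a tenfold reduction the stream is still faster than a 56 kbps modem can carry in real time, which is one reason such audio was typically downloaded first and played afterwards.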
Streaming vs. Downloading
There are two possible modes of acquiring video over the Internet:
- Watching it in real-time as it streams in over the wire
- Storing the video as a file on the hard disk and then playing it later off the disk
In the first mode, the Internet bandwidth must match the display rate.
If the display rate requires, for example, 512 kbps, the network connection
must be able to supply that bandwidth continuously. Usually the receiver
buffers a few seconds of video before starting to avoid having to stop
the video if the network is ever too slow, but the buffer can hold only
so much video. If it ever empties, the film stops.
In the second mode, the entire video file is downloaded to the local
hard disk in advance. Then it is played off the hard disk. Assuming the
disk has a high enough bandwidth, then it is possible to display a high
quality movie even over a slow connection. The properties of streaming
vs. downloaded video are summarized below:
Item | Streaming video | Downloaded video |
Bandwidth required | Equal to the display rate | May be arbitrarily small |
Disk storage required | None | The entire file must be stored |
Startup delay before viewing | Almost none | Equal to the download time |
Resolution | Depends on available bandwidth | Depends on available disk storage |
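The difference between the two modes can be made concrete with a small simulation. The sketch below is a simplified model, not a description of any real player: time advances in one-second steps, the network delivers a given number of kilobits in each step, and playback stalls for any second in which the buffer holds less than one second of video. The function names and the decimal unit convention (1 MB = 8,000 kbit) are assumptions made for the example.

```python
def count_stalled_seconds(display_kbps, supplied_kbits_per_sec, prebuffer_sec):
    """Streaming mode: count the seconds in which the film stops."""
    buffered = 0.0
    playing = False
    stalls = 0
    for supplied in supplied_kbits_per_sec:
        buffered += supplied
        # Playback starts once the initial buffer has been filled.
        if not playing and buffered >= prebuffer_sec * display_kbps:
            playing = True
        if playing:
            if buffered >= display_kbps:
                buffered -= display_kbps  # one second of video is shown
            else:
                stalls += 1               # buffer ran dry: the film stops
    return stalls


def download_seconds(file_megabytes, link_kbps):
    """Download mode: the startup delay is the time to fetch the whole file."""
    file_kbits = file_megabytes * 8 * 1000
    return file_kbits / link_kbps


# A 512 kbps stream over a link that drops out for three seconds.
network = [512, 512, 0, 0, 0, 512]
print("stalled seconds:", count_stalled_seconds(512, network, prebuffer_sec=2))

# The same kind of content fetched in advance over a 56 kbps modem.
print("download time (s):", download_seconds(7, 56))
```

A larger `prebuffer_sec` rides out longer dropouts at the cost of a longer wait before the film starts, which is the trade-off summarized in the table.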
Video Formats
The first commercial digital video system for personal computers, QuickTime,
was introduced by Apple in the early 1990s. It was intended for producing
and viewing video locally, reading and writing the hard disk. A few years
later, RealNetworks devised a scheme, RealVideo, intended to transmit
video over the Internet. When it became clear that video was going to become
important, Microsoft invented its own proprietary encoding scheme, Windows
Media. Thus we now have four competing and incompatible systems out
there: MPEG, QuickTime, RealVideo, and Windows Media, in addition to
the NTSC, PAL, SECAM split, giving 12 combinations in all. Fortunately,
NTSC vs. PAL affects only the image size, so different versions are not
needed for the U.S. and Europe. However, each of the schemes is capable
of encoding at various bit rates, depending on whether the target is a
56 kbps modem, a 112 kbps U.S. ISDN line, a 128 kbps European ISDN line,
a LAN, a cable modem, or ADSL. For all of them, more compression
means a smaller file. A smaller file requires less bandwidth when streamed
and less delay and disk storage when downloaded in advance and played
later.
All in all, it should now be clear why sending video over the Internet
is not easy at present. Thus for our movies, we have provided multiple
formats and multiple compression choices. It is recommended that you choose
your favorite system (QuickTime, RealVideo, or Windows Media) and
try the highest bandwidth version we have supplied. If that leads
to errors and dropped frames, try the next highest rate and so on, until you
find one that works well. Alternatively, just give up on streaming and
download one of the files via FTP for playback later.
(C) Æliens
04/09/2009
You may not copy or print any of this material without explicit permission of the author or the publisher.
In case of other copyright issues, contact the author.