codecs and standards
learning objectives
After reading this chapter you should be able to
demonstrate the necessity of compression,
discuss criteria for the selection of codecs
and mention some of the alternatives,
characterize the MPEG-4 and SMIL standards,
explain the difference between MPEG-4 and MPEG-2,
and speculate about the feasibility of a semantic
multimedia web.

Without compression and decompression, digital information delivery would be virtually impossible. In this chapter we will take a more detailed look at compression and decompression. It contains the information you may need to decide on a suitable compression and decompression scheme (codec) for your future multimedia productions.
We will also discuss the standards that may govern the future (multimedia) Web, including MPEG-4, SMIL and RM3D. We will explore to what extent these standards allow us to realize the optimal multimedia platform, that is one that embodies digital convergence in its full potential. Finally, we will investigate how these ideas may ultimately lead to a (multimedia) semantic web.
compression is the key to effective delivery

media | uncompressed | compressed |
voice, 8k samples/sec, 8 bits/sample | 64 kbps | 2-4 kbps |
slow motion video, 10 fps, 176x120, 8 bits | 5.07 Mbps | 8-16 kbps |
audio conference, 8k samples/sec, 8 bits | 64 kbps | 16-64 kbps |
video conference, 15 fps, 352x240, 8 bits | 30.4 Mbps | 64-768 kbps |
audio (stereo), 44.1k samples/sec, 16 bits | 1.5 Mbps | 128 kbps-1.5 Mbps |
video, 15 fps, 352x240, 8 bits | 30.4 Mbps | 384 kbps |
video (CDROM), 30 fps, 352x240, 8 bits | 60.8 Mbps | 1.5-4 Mbps |
video (broadcast), 30 fps, 720x480, 8 bits | 248.8 Mbps | 3-8 Mbps |
HDTV, 59.9 fps, 1280x720, 8 bits | 1.3 Gbps | 20 Mbps |

(phone: 56 Kb/s, ISDN: 64-128 Kb/s, cable: 0.5-1 Mb/s, DSL: 0.5-2 Mb/s)
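The uncompressed figures in the table follow directly from spatial resolution, frame rate and sample depth (8 bits per sample over three colour channels for video); a small Python sketch of the arithmetic:

```python
def uncompressed_bps(width, height, fps, bits_per_sample, channels=3):
    """Raw video bitrate: pixels/frame x bits/pixel x frames/second."""
    return width * height * channels * bits_per_sample * fps

# the video (CDROM) row: 30 fps, 352x240, 8 bits per sample
raw = uncompressed_bps(352, 240, 30, 8)   # 60_825_600 bps, i.e. ~60.8 Mbps
ratio = raw / 1_500_000                   # ~40:1 against the 1.5 Mbps target
```

Worked against the other rows, the same formula reproduces the 5.07 Mbps (slow motion) and 248.8 Mbps (broadcast) entries, a handy sanity check when budgeting bandwidth.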
images, video and audio are amenable to compression
statistical redundancy in signal
- spatial correlation -- neighbour samples in single frame
- temporal correlation -- between segments (frames)
irrelevant information
- from perceptual point of view
B. Vasudev & W. Li, Memory management: Codecs

codec = (en)coder + decoder
signal -> source coder -> channel coder (encoding)
signal <- source decoder <- channel decoder (decoding)

codec design problem
From a systems design viewpoint, one can restate
the codec design problem as a bit rate minimization problem,
meeting (among others) constraints concerning:
- specified levels of signal quality,
- implementation complexity, and
- communication delay (start coding -- end decoding).
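Stated this way, codec selection is a small constrained optimization problem. The sketch below is only illustrative: the operating points and field names are hypothetical, not taken from any real codec.

```python
def pick_codec(candidates, min_quality, max_complexity, max_delay_ms):
    """Bit-rate minimization under constraints: keep the operating
    points that meet the quality, complexity and delay constraints,
    then return the one with the lowest bit rate."""
    feasible = [c for c in candidates
                if c["quality"] >= min_quality
                and c["complexity"] <= max_complexity
                and c["delay_ms"] <= max_delay_ms]
    return min(feasible, key=lambda c: c["kbps"])

# hypothetical operating points
points = [
    {"name": "A", "kbps": 384, "quality": 0.80, "complexity": 2, "delay_ms": 150},
    {"name": "B", "kbps": 256, "quality": 0.75, "complexity": 3, "delay_ms": 300},
    {"name": "C", "kbps": 128, "quality": 0.60, "complexity": 1, "delay_ms": 100},
]
best = pick_codec(points, min_quality=0.7, max_complexity=3, max_delay_ms=400)
# best is point "B": the cheapest bit rate that still satisfies all constraints
```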

tradeoffs
- resilience to transmission errors
- degradations in decoder output -- lossless or lossy
- data representation -- browsing & inspection
- data modalities -- audio & video
- transcoding to other formats -- interoperability
- coding efficiency -- compression ratio
- coder complexity -- processor and memory requirements
- signal quality -- bit error probability, signal/noise ratio

- pixel-based -- MPEG-1, MPEG-2, H.320, H.324
- object-based -- MPEG-4

MPEG-1 video compression uses both intra-frame analysis, for the compression of individual frames (which are like still images), and inter-frame analysis, to detect redundant blocks or invariants between frames.

frames
- I: intra-frames -- independently coded images
- P: predicted frames -- computed from the closest preceding I or P frame
- B: bidirectional frames -- computed from the two closest I or P frames
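These dependencies can be made concrete with a small sketch that, for a hypothetical GOP given in display order, lists which anchor frames each frame is predicted from:

```python
def references(frames):
    """For each frame of a GOP in display order (e.g. "IBBPBBP"),
    return the indices of the frames it is predicted from:
    I-frames stand alone, P-frames use the previous I/P anchor,
    B-frames use the surrounding two anchors."""
    anchors = [i for i, t in enumerate(frames) if t in "IP"]
    refs = []
    for i, t in enumerate(frames):
        if t == "I":
            refs.append([])
        elif t == "P":
            refs.append([max(a for a in anchors if a < i)])
        else:  # B
            refs.append([max(a for a in anchors if a < i),
                         min(a for a in anchors if a > i)])
    return refs

references("IBBPBBP")
# → [[], [0, 3], [0, 3], [0], [3, 6], [3, 6], [3]]
```

Since a B-frame depends on a *later* anchor, the encoder emits that anchor first: decoding order therefore differs from display order.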
GigaPort
- optical network technologies - models for network architecture, optical network components and light path provisioning.
- high performance routing and switching - new routing technologies and transport protocols, with a focus on scalability and stability robustness when using data-intensive applications with a high bandwidth demand.
- management and monitoring - incident response in hybrid networks (IP and optical combined) and technologies for network performance monitoring, measuring and reporting.
- grids and access - models, interfaces and protocols for user access to network and grid facilities.
- test methodology - effective testing methods and designing tests for new technologies and network components.

system | spatial resolution | frame rate | bit rate |
NTSC | 704 x 480 | 30 | 243 Mbps |
PAL/SECAM | 720 x 576 | 25 | 249 Mbps |

item | streaming | downloaded |
bandwidth | equal to the display rate | may be arbitrarily small |
disk storage | none | the entire file must be stored |
startup delay | almost none | equal to the download time |
resolution | depends on available bandwidth | depends on available disk storage |
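The "startup delay equal to the download time" entry is easy to quantify; a sketch, assuming the whole file must arrive before playback starts:

```python
def download_startup_delay(duration_s, display_rate_bps, bandwidth_bps):
    """Seconds before playback can begin when the whole file
    (duration x display rate bits) must be fetched first."""
    return duration_s * display_rate_bps / bandwidth_bps

# a 10-minute clip encoded at 1.5 Mbps, fetched over a 0.5 Mbps cable link
delay = download_startup_delay(600, 1_500_000, 500_000)  # 1800 s: half an hour
```

Streaming avoids this delay precisely by pinning the bandwidth requirement to the display rate.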

formats
QuickTime, introduced by Apple in the early 1990s, for local viewing;
RealVideo, streaming video from RealNetworks; and
Windows Media, a proprietary encoding scheme from Microsoft.
Examples of these formats, encoded for various bitrates
are available at Video at VU.
standards
- XML -- eXtensible Markup Language (SGML)
- MPEG-4 -- coding audio-visual information
- SMIL -- Synchronized Multimedia Integration Language
- RM3D -- (Web3D) Rich Media 3D (extensions of X3D/VRML)

"Perhaps the most immediate need for MPEG-4 is defensive.
It supplies tools with which to create uniform (and top-quality)
audio and video encoders on the Internet,
preempting what may become an unmanageable tangle
of proprietary formats."
MPEG-4
a toolbox of advanced compression algorithms for audiovisual information
scalability
- bitrate -- switching to lower bitrates
- bandwidth -- dynamically discard data
- encoder and decoder complexity -- signal quality
audiovisual information
- still images, video, audio, text
- (synthetic) talking heads and synthesized speech
- synthetic graphics and 3D scenes
- streamed data applied to media objects
- user interaction -- e.g. changes of viewpoint
example
Imagine a talking figure standing next to a desk
and a projection screen, explaining the contents of
a video that is being projected
on the screen, while pointing at a globe that stands on the desk.
The user watching that scene decides to
change viewpoint to get a better look at the globe ...
media objects
- media objects -- units of aural, visual or audiovisual content
- composition -- to create compound media objects (audiovisual scene)
- transport -- multiplex and synchronize data associated with media objects
- interaction -- feedback from users' interaction with audiovisual scene
composition
- placing media objects anywhere in a given coordinate system
- applying transforms to change the appearance of a media object
- applying streamed data to media objects
- modifying the user's viewpoint
transport
The data streams (Elementary Streams)
that result from the coding process can be transmitted
or stored separately, and need
to be composed so as to create the actual
multimedia presentation at the receiver's side.
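As a toy stand-in for what the delivery layer does, the sketch below round-robins access units from separately stored elementary streams into one packet sequence (the stream names and unit labels are made up):

```python
def multiplex(streams):
    """Round-robin multiplex of elementary streams into a single
    sequence of (stream_id, access_unit) packets."""
    iterators = {sid: iter(units) for sid, units in streams.items()}
    packets = []
    while iterators:
        for sid in list(iterators):
            try:
                packets.append((sid, next(iterators[sid])))
            except StopIteration:
                del iterators[sid]  # this stream is exhausted
    return packets

multiplex({"video": ["v0", "v1"], "audio": ["a0"]})
# → [('video', 'v0'), ('audio', 'a0'), ('video', 'v1')]
```

A real multiplexer would of course also carry timestamps so the receiver can re-synchronize the streams.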
scenegraph
- BIFS (Binary Format for Scenes) -- describes spatio-temporal arrangements of (media) objects in the scene
- OD (Object Descriptor) -- defines the relationship between the elementary streams associated with an object
- event routing -- to handle user interaction
DMIF
Delivery Multimedia Integration Framework
(a) scene graph | (b) sprites

benefits
- end-users -- interactive media across all platforms and networks
- providers -- transparent information for transport optimization
- authors -- reusable content, protection and flexibility
managing intellectual property
XMT
- XMT contains a subset of X3D
- SMIL is mapped (incompletely) to XMT
SMIL
TV-like multimedia presentations
parallel and sequential
Authoring a SMIL presentation comes down, basically, to
naming the media components for text, images, audio and video with URLs, and to scheduling their presentation either in parallel or in sequence.
presentation characteristics
- The presentation is composed from several components that are accessible via URLs, e.g. files stored on a Web server.
- The components have different media types, such as audio, video, image or text. The begin and end times of different components are specified relative to events in other media components. For example, in a slide show, a particular slide is displayed when the narrator in the audio starts talking about it.
- Familiar looking control buttons such as stop, fast-forward and rewind allow the user to interrupt the presentation and to move forwards or backwards to another point in the presentation.
- Additional functions are "random access", i.e. the presentation can be started anywhere, and "slow motion", i.e. the presentation is played slower than at its original speed.
- The user can follow hyperlinks embedded in the presentation.

applications
- Photos taken with a digital camera can be coordinated with a commentary
- Training courses can be devised integrating voice and images.
- A Web site showing the items for sale might show photos of the product range in turn on the screen, coupled with a voice talking about each item as it appears.
- Slide presentations on the Web written in HTML might be timed so that bullet points come up in sequence at specified time intervals, changing color as they become the focus of attention.
- On-screen controls might be used to stop and start music.

example
<par>
  <a href="#Story"> <img src="button1.jpg"/> </a>
  <a href="#Weather"> <img src="button2.jpg"/> </a>
  <excl>
    <par id="Story" begin="0s">
      <video src="video1.mpg"/>
      <text src="captions.html"/>
    </par>
    <par id="Weather">
      <img src="weather.jpg"/>
      <audio src="weather-rpt.mp3"/>
    </par>
  </excl>
</par>

history
Experience from both the CD-ROM community and from the Web multimedia community suggested that it would be beneficial to adopt a declarative format for expressing media synchronization on the Web as an alternative and complementary approach to scripting languages.
Following a workshop in October 1996, W3C established a first working group on synchronized multimedia in March 1997. This group focused on the design of a declarative language and the work gave rise to SMIL 1.0 becoming a W3C Recommendation in June 1998.
SMIL 2.0 Modules
- The Animation Modules
- The Content Control Modules
- The Layout Modules
- The Linking Modules
- The Media Object Modules
- The Metainformation Module
- The Structure Module
- The Timing and Synchronization Module
- The Time Manipulations Module
- The Transition Effects Module

module-based reuse
- SMIL modules could be used to provide lightweight multimedia functionality on mobile phones, and to integrate timing into profiles such as the WAP forum's WML language, or XHTML Basic.
- SMIL timing, content control, and media objects could be used to coordinate broadcast and Web content in an enhanced-TV application.
- SMIL Animation is being used to integrate animation into W3C's Scalable Vector Graphics language (SVG).
- Several SMIL modules are being considered as part of a textual representation for MPEG-4.

www.web3d.org
- VRML 1.0 -- static 3D worlds
- VRML 2.0 or VRML97 -- dynamic behaviors
- VRML200x -- extensions
- X3D -- XML syntax
- RM3D -- Rich Media in 3D
groups.yahoo.com/group/rm3d/
The Web3D Rich Media Working Group was formed to develop a Rich Media standard format (RM3D) for use in next-generation media devices. It is a highly active group with participants from a broad range of companies including 3Dlabs, ATI, Eyematic, OpenWorlds, Out of the Blue Design, Shout Interactive, Sony, Uma, and others.
RM3D
The Web3D Consortium initiative is fueled by a clear need for a standard high performance Rich Media format. Bringing together content creators with successful graphics hardware and software experts to define RM3D will ensure that the new standard addresses authoring and delivery of a new breed of interactive applications.
requirements
- rich media -- audio, video, images, 2D & 3D graphics
(with support for temporal behavior, streaming and synchronisation)
- applicability -- specific application areas, as determined by
commercial needs and experience of working group members
- interoperability -- VRML97, X3D, MPEG-4, XML (DOM access)
- object model -- common model for representation of objects and capabilities
- extensibility -- integration of new objects (defined in Java or C++), scripting capabilities and declarative content
- high-quality realtime rendering -- realtime interactive media experiences
- platform adaptability -- query function for programmatic behavior selection
- predictable behavior -- well-defined order of execution
- high precision number systems -- greater than single-precision IEEE floating point numbers
- minimal size -- download and memory footprint
SMIL is closer to the author
and RM3D is closer to the implementer.
working draft
Since there are three vastly different proposals for this section (the time model), the original VRML97 text
is kept. Once the issues concerning time-dependent nodes are resolved, this section can be
modified appropriately.
time model
- MPEG-4 -- spring metaphor
- SMIL -- cascading time
- RM3D/VRML -- event routing
MPEG-4 -- spring metaphor
- duration -- minimal, maximal, optimal
SMIL -- cascading time
- time container -- speed, accelerate, decelerate, reverse, synchronize
<seq speed="2.0">
  <video src="movie1.mpg" dur="10s"/>
  <video src="movie2.mpg" dur="10s"/>
  <img src="img1.jpg" begin="2s" dur="10s">
    <animateMotion from="-100,0" to="0,0" dur="10s"/>
  </img>
  <video src="movie4.mpg" dur="10s"/>
</seq>
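In SMIL's cascading time model, a container's speed attribute divides the durations of its children, and nested containers multiply their factors; a minimal sketch:

```python
def effective_dur(dur_s, *speeds):
    """Duration of an element as seen by the outermost timeline,
    after applying the speed factors of its enclosing containers."""
    factor = 1.0
    for s in speeds:
        factor *= s
    return dur_s / factor

# each 10 s clip inside <seq speed="2.0"> occupies 5 s of parent time
effective_dur(10, 2.0)        # 5.0
# nested inside a further speed="0.5" container it would take 10 s again
effective_dur(10, 2.0, 0.5)   # 10.0
```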
RM3D/VRML -- event routing
- TimeSensor -- isActive, start, end, cycleTime, fraction, loop
web content
- 1st generation -- hand-coded HTML pages
- 2nd generation -- templates with content and style
- 3rd generation -- rich markup with metadata (XML)

The semantic web will bring structure to the meaningful content of web pages.
meta data
Metadata is data about data.
Specifically, the term refers to data used to identify, describe, or locate information resources,
whether these resources are physical or electronic. While structured metadata processed by computers
is relatively new, the basic concept of metadata has been used for many years in helping manage
and use large collections of information. Library card catalogs are a familiar example of such
metadata.

Dublin Core example
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/">
<rdf:Description rdf:about="http://www.dlib.org/dlib/may98/miller/05miller.html">
<dc:title>An Introduction to the Resource Description Framework</dc:title>
<dc:creator>Eric J. Miller</dc:creator>
<dc:description>The Resource Description Framework (RDF) is an
infrastructure that enables the encoding, exchange and reuse of
structured metadata. RDF is an application of XML that imposes needed
structural constraints to provide unambiguous methods of expressing
semantics. RDF additionally provides a means for publishing both
human-readable and machine-processable vocabularies designed to
encourage the reuse and extension of metadata semantics among
disparate information communities. The structural constraints RDF
imposes to support the consistent encoding and exchange of
standardized metadata provide for the interchangeability of separate
packages of metadata defined by different resource description
communities.</dc:description>
<dc:publisher>Corporation for National Research Initiatives</dc:publisher>
<dc:subject>
<rdf:Bag>
<rdf:li>machine-readable catalog record formats</rdf:li>
<rdf:li>applications of computer file organization and
access methods</rdf:li>
</rdf:Bag>
</dc:subject>
<dc:rights>Copyright © 1998 Eric Miller</dc:rights>
<dc:type>Electronic Document</dc:type>
<dc:format>text/html</dc:format>
<dc:language>en</dc:language>
<dcterms:isPartOf rdf:resource="http://www.dlib.org/dlib/may98/05contents.html"/>
</rdf:Description>
</rdf:RDF>
Dublin Core
- title -- name given to the resource
- creator -- entity primarily responsible for making the content of the resource
- subject -- topic of the content of the resource
- description -- an account of the content of the resource
- publisher -- entity responsible for making the resource available
- contributor -- entity responsible for making contributions to the content of the resource
- date -- date of an event in the lifecycle of the resource
- type -- nature or genre of the content of the resource
- format -- physical or digital manifestation of the resource
- identifier -- unambiguous reference to the resource within a given context
- source -- reference to a resource from which the present resource is derived
- language -- language of the intellectual content of the resource
- relation -- reference to a related resource
- coverage -- extent or scope of the content of the resource
- rights -- information about rights held in and over the resource
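A record with these elements can be serialized to RDF/XML along the lines of the example above; a sketch using Python's standard library (the helper name and the restriction to simple literal-valued elements are mine):

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

def dc_record(about, **elements):
    """Serialize simple Dublin Core elements for one resource
    as an RDF/XML fragment."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("dc", DC)
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description",
                         {f"{{{RDF}}}about": about})
    for name, value in elements.items():
        ET.SubElement(desc, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

xml = dc_record("http://www.dlib.org/dlib/may98/miller/05miller.html",
                title="An Introduction to the Resource Description Framework",
                creator="Eric J. Miller")
```

Container values such as the rdf:Bag under dc:subject in the example above would need extra handling, which this sketch deliberately omits.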


information repository
The Web is becoming a universal repository of human knowledge
and culture, allowing the sharing of ideas and information
on a scale never seen before.

browsing & navigation
To satisfy an information need,
the user may navigate the hyperspace of web links
in search of information of interest.
However, since this hyperspace is vast and largely unknown,
such a navigation task is usually inefficient.

information agent
- gather information
- filter and select

presentation agent
- access information
- find suitable mode of presentation

PERSONAS
PERsonal and SOcial NAvigation through information spaceS
investigating a new approach to navigation through information spaces, based on a personalised and social navigational paradigm.

Agneta & Frida
The AGNETA & FRIDA system seeks to integrate web browsing and narrative
into a joint mode. Below the browser window (on the desktop) are placed two
female characters, sitting in their living-room chairs, watching the browser during the
session (more or less like watching television). Agneta and Frida (mother and
daughter) physically react to, comment on, make ironic remarks about and develop
stories around the information presented in the browser (primarily to each other),
but are also sensitive to what the navigator is doing and to possible malfunctions of the
browser or server.

Agneta & Frida
In this way they seek to attach emotional, comical or
anecdotal connotations to the information and happenings in the browsing session. Through an activity slider, the navigator can
decide how active she wants the characters to be, depending on the purpose of the browsing session (serious information
seeking, wayfinding, exploration or entertainment browsing).

game as social system
actors | rule(s) | resource(s) |
players | events | game space |
roles | evaluation | situation |
goals | facilitator(s) | context |

criteria
- relevance -- what is our message?
- identity -- who are we?
- impact -- why would anybody be interested?

climate star
- climate strategies -- (1) emission reduction, (2) adaptation
- climate systems -- (3) feedback monitoring, (4) investment in research, (5) climate response
- energy and CO2 -- (6) investment in efficiency, (7) investment in green technology, (8) government rules
- regional development -- (9) campaign for awareness, (10) securing food and water
- adaptation measures -- (11) public space, (12) water management, (13) use of natural resources
- international relations -- (14) CO2 emission trade, (15) European negotiations, (16) international covenants

simulation parameters
- people -- how is the policy judged by the people?
- profit -- what is the influence on the (national) economy?
- planet -- what are the effects for the environment?

game play, model-based simulation, exploration

game elements
- game cycle -- turns in subsequent rounds (G)
- simulation(s) -- based on (world) climate model (W)
- exploration -- by means of interactive video (E)

argument(s)
- topic-centered -- common beliefs, use of logic, examples
- viewer-centered -- patriotism, religious or romantic sentimentality
- speaker-centered -- the makers are well-informed, sincere and trustworthy

concepts

technology

projects & further reading
As a project, you may think of implementing, for
example, JPEG compression, following [Fundamentals],
or a SMIL-based application for cultural heritage.
You may further explore the technical issues
of authoring DV material, using any of the
Adobe tools mentioned in appendix E.
For further reading I advise you to take a look
at the respective specifications of MPEG-4
and SMIL,
and to compare the functionality of MPEG-4 and SMIL-based presentation
environments.
An invaluable book dealing with the many technical
aspects of compression and standards is [Fundamentals].

- costume designs -- photographed from Die Russische Avantgarde und die Bühne 1890-1930
- theatre scene design, also from Die Russische Avantgarde und die Bühne 1890-1930
- dance -- Erica Russell, [Animovie]
- MPEG-4 -- bit rates, from [MPEG-4].
- MPEG-4 -- scene positioning, from [MPEG-4].
- MPEG-4 -- up and downstream data, from [MPEG-4].
- MPEG-4 -- left: scene graph; right: sprites, from [MPEG-4].
- MPEG-4 -- syntax, from [MPEG-4].
- MIT Media Lab web site.
- student work -- multimedia authoring I, dutch windmill.
- student work -- multimedia authoring I, Schröder house.
- student work -- multimedia authoring I, train station.
- animation -- Joan Gratz, from [Animovie].
- animation -- Joan Gratz, from [Animovie].
- animation -- Joan Gratz, from [Animovie].
- animation -- Joan Gratz, from [Animovie].
- Agneta and Frida example.
- diagram (Clima Futura) game elements
- signs -- people, [Signs], p. 246, 247.
