standards
"Perhaps the most immediate need for MPEG-4 is defensive. It supplies tools with which to create uniform (and top-quality) audio and video encoders on the Internet, preempting what may become an unmanageable tangle of proprietary formats."
Indeed, if we are looking for a general characterization, it
would be that MPEG-4 is primarily a standard for the
efficient, object-based encoding of audiovisual content,
and, moreover, one that is suitable for a variety of
display devices and networks, including low-bitrate
mobile networks.
MPEG-4 supports scalability on a variety of levels:
Imagine a talking figure standing next to a desk
and a projection screen, explaining the contents of
a video that is being projected
on the screen, while pointing at a globe that stands on the desk.
The user watching that scene decides to
change viewpoint to get a better look at the globe ...
How would you describe such a scene?
How would you encode it?
And how would you approach decoding
and user interaction?
The data streams (Elementary Streams)
that result from the coding process can be transmitted
or stored separately, and need
to be composed so as to create the actual
multimedia presentation at the receiver's side.
At a system level, MPEG-4 offers the following
functionalities to achieve this:
a delivery framework (DMIF) that allows for transparent interaction
with resources, irrespective of whether these are available from local
storage, come from broadcast, or must be obtained from
some remote site.
Transparency with respect to the network type is
also supported.
Quality of Service is only supported to the
extent that it is possible to indicate needs for
bandwidth and transmission rate.
It is, however, the responsibility of the network provider to
realize any of this.
benefits
of media objects
www.eetimes.com/story/OEG20010220S0065
MPEG-4 is "a big standard," said Tim Schaaff, vice president of engineering for Apple Computer Inc.'s Interactive Media Group. "It's got tons of tools inside." Its success, he said, will depend on the industry's willingness to home in on a small subset, winnowing from a number of profiles and levels designed for streaming a slew of digital multimedia types -- audio, several types of video, still images, and 2-D and 3-D graphics.
Some may find it too ambitious.
But, then again, what it offers is clearly worthwhile.
MPEG-4's chief features include highly efficient compression, error resilience, bandwidth scalability ranging from 5 kbits to 20 Mbits/second, network and transport-protocol independence, content security and object-based interactivity, or the ability to pluck a lone image -- say, the carrot Bugs Bunny is about to chomp -- out of a video scene and move it around independently.
And, not altogether unimportant,
it may offer significant commercial benefits.
Broadband service providers, such as cable and DSL companies, are right behind wireless in sizing up MPEG-4, largely because its low bit rate could help them add channels in their broadband pipes while incorporating interactive features in the content. Possibilities include multiple video streams, clickable video, real-time 3-D animation and interactive advertising.
The SMIL language is an XML application, resembling HTML.
SMIL presentations can be written using a simple text-editor
or any of the more advanced tools, such as GRINS.
There is a variety of SMIL players.
The most well-known is perhaps the RealNetworks G8 player,
which allows for incorporating RealAudio and RealVideo
in SMIL presentations.
parallel and sequential
Authoring a SMIL presentation basically comes down to
naming media components for text, images, audio and video with URLs, and scheduling their presentation either in parallel or in sequence.
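A minimal SMIL presentation along these lines, a photo album with spoken comments, might be sketched as follows (the region layout and media file names are invented for illustration):

```xml
<smil>
  <head>
    <layout>
      <root-layout width="320" height="240"/>
      <region id="image" left="0" top="0" width="320" height="240"/>
    </layout>
  </head>
  <body>
    <seq>
      <!-- first slide: image and spoken comment in parallel -->
      <par>
        <img src="photo1.jpg" region="image" dur="10s"/>
        <audio src="comment1.mp3"/>
      </par>
      <!-- then, in sequence, the next slide -->
      <par>
        <img src="photo2.jpg" region="image" dur="10s"/>
        <audio src="comment2.mp3"/>
      </par>
    </seq>
  </body>
</smil>
```

The seq container plays its children one after another, while each par container plays its children simultaneously.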
Quoting the SMIL 2.0 working draft, we can characterize
SMIL presentations as follows:
presentation characteristics
applications example
history
Experience from both the CD-ROM community and from the Web multimedia community suggested that it would be beneficial to adopt a declarative format for expressing media synchronization on the Web as an alternative and complementary approach to scripting languages.
In summary,
SMIL 2.0 proposes a declarative format
to describe the temporal behavior of a multimedia presentation,
associate hyperlinks with media objects, describe the form of the
presentation on a screen, and specify interactivity
in multimedia presentations.
Now, why such a fuss about a "declarative format"?
Isn't scripting more exciting?
And aren't the tools more powerful?
OK, OK. I don't want to go into that right now.
Let's just consider a declarative format
to be more elegant. OK?
SMIL 2.0 Modules
module-based reuse
The Web3D Rich Media Working Group was formed to develop a Rich Media standard format (RM3D) for use in next-generation media devices. It is a highly active group with participants from a broad range of companies including 3Dlabs, ATI, Eyematic, OpenWorlds, Out of the Blue Design, Shout Interactive, Sony, Uma, and others.
In particular:
RM3D
The Web3D Consortium initiative is fueled by a clear need for a standard high performance Rich Media format. Bringing together content creators with successful graphics hardware and software experts to define RM3D will ensure that the new standard addresses authoring and delivery of a new breed of interactive applications.
The working group is active in a number of areas including,
for example, multitexturing and the integration of video
and other streaming media in 3D worlds.
requirements
SMIL is closer to the author
and RM3D is closer to the implementer.
MPEG-4, in this respect, is even further away from the
author, since its chief focus is on compression
and delivery across a network.
working draft
Since there are three vastly different proposals for this section (time model), the original <RM3D> 97 text
is kept. Once the issues concerning time-dependent nodes are resolved, this section can be
modified appropriately.
Now, what are the options?
Each of the standards discussed so far
provides us with a particular solution to timing.
Summarizing, we have a time model based on a spring metaphor in MPEG-4,
the notion of cascading time in SMIL (inspired by
cascading stylesheets for HTML) and timing based on the
routing of events in RM3D/VRML.
time model
MPEG-4 -- spring metaphor
SMIL -- cascading time
RM3D/VRML -- event routing

draft version 1 (16/5/2003)
Depending on network resources and platform capabilities,
the 'right' level of signal quality can be determined
by dynamically selecting the optimal codec.
media objects
For 3D-scene description, MPEG-4 builds on concepts
taken from VRML (Virtual Reality Modeling Language,
discussed in chapter 7).
In addition, MPEG-4 defines a set of functionalities
for the delivery of streamed data, DMIF, which stands for
Delivery Multimedia Integration Framework.
authoring
In effect, although MPEG-4 is primarily concerned
with efficient encoding
and scalable transport and delivery,
the object-based approach also has clear
advantages from an authoring perspective.
syntax
When discussing RM3D, we will further establish
what the relations between MPEG-4,
SMIL and RM3D are,
and in particular where there is disagreement,
for example with respect to the timing model
underlying animations and the temporal control of
media objects.
the press
unfocused ambition
SMIL
Where HTML has become successful as a means to write simple hypertext
content,
the SMIL language is meant to become a vehicle of choice
for writing synchronized hypermedia.
The working draft mentions a number of possible applications,
for example a photo album with spoken comments,
multimedia training courses, product demos with explanatory
text, timed slide presentations, and online music with controls.
Notice that there are two parallel (PAR)
tags, and one exclusive (EXCL) tag.
The exclusive tag has been introduced in SMIL 2.0
to allow for making an exclusive choice, so that only
one of the items can be selected at a particular time.
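The fragment the text refers to is not reproduced here, but a structure with two parallel tags and one exclusive tag could be sketched as follows (the region names, ids and media files are invented for illustration):

```xml
<par>
  <video src="demo.mpg" region="main"/>
  <par>
    <img id="btnEN" src="en.gif" region="buttons"/>
    <img id="btnNL" src="nl.gif" region="buttons"/>
  </par>
  <excl>
    <!-- only one audio track is active at a time; activating a
         button starts that track and stops the other one -->
    <audio src="english.mp3" begin="btnEN.activateEvent"/>
    <audio src="dutch.mp3"   begin="btnNL.activateEvent"/>
  </excl>
</par>
```

Both audio items are children of the excl container, so starting one of them automatically ends the other.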
The SMIL 2.0 working draft defines a number of elements
and attributes to control presentation, synchronization
and interactivity, extending the functionality of SMIL 1.0.
The SMIL 2.0 working draft is, at the moment of writing,
being finalized.
It specifies a number of language profiles
to promote the reuse of SMIL modules.
It also improves on the accessibility features of SMIL 1.0,
allowing, for example,
captions to be replaced by audio descriptions.
RM3D
In 1997, VRML2 was accepted as a standard, offering rich means
to create 3D worlds with dynamic behavior and user interaction.
VRML97 (which is the same as VRML2) was, however, not the success
it was expected to be, due to (among other things)
incompatibility between browsers,
incomplete implementations of the standard,
and high performance requirements.
requirements
The RM3D group aims at interoperability with other
standards.
In particular, an XML syntax is being defined in parallel
(including interfaces for the DOM).
And there is mutual interest and exchange of ideas between the
MPEG-4 and RM3D working groups.
Notice that extensibility also requires the definition of
a declarative format, so that the content author need
not bother with programmatic issues.
timing model
The spring metaphor amounts to the ability
to shrink or stretch a media object within given bounds
(minimum, maximum)
to cope with, for example, network delays.
Media objects, in SMIL, are stored in some sort of container
whose timing properties can be manipulated.
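For instance, such a container can be given an explicit duration and be repeated, overriding the implicit timing of the media objects it holds (the file names are illustrative only):

```xml
<par dur="20s" repeatCount="2">
  <!-- both items begin 2 seconds after the container starts;
       the video is cut off at 12 seconds -->
  <video src="clip.mpg" begin="2s" end="12s"/>
  <audio src="narration.mp3" begin="2s"/>
</par>
```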
When a TimeSensor starts to emit time events,
it also sends out an event notifying other objects
that it has become active.
Depending on its so-called cycleInterval,
it sends out the fraction of the cycle it has covered
since it started.
This fraction may be sent to one of the standard
interpolators, or to a script, so that some value can be set,
such as for example an orientation,
depending on the fraction of the time interval that has passed.
When the TimeSensor is made to loop,
this is done repeatedly.
Although time in VRML is absolute,
the frequency with which fraction events are emitted depends
on the implementation and processor speed.
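A minimal VRML97 fragment illustrating this style of event routing might read as follows (a sketch; the node names are made up):

```vrml
DEF Clock TimeSensor {
  cycleInterval 4        # one full cycle takes 4 seconds
  loop TRUE              # keep emitting fraction events repeatedly
}
DEF Spinner OrientationInterpolator {
  key      [ 0, 0.5, 1 ]
  keyValue [ 0 1 0 0,  0 1 0 3.14159,  0 1 0 6.28318 ]
}
DEF Globe Transform {
  children [ Shape { geometry Sphere {} } ]
}
# the fraction emitted by the TimeSensor drives the interpolator,
# whose output value in turn sets the orientation of the globe
ROUTE Clock.fraction_changed TO Spinner.set_fraction
ROUTE Spinner.value_changed   TO Globe.set_rotation
```

Each time the sensor emits a fraction, the interpolator maps it to an axis-angle rotation, which is routed on to the Transform, making the globe spin once per cycle.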
research directions -- meta standards
eliens@cs.vu.nl