topical media & game development

...



1

MPEG-4

The MPEG standards (in particular MPEG-1 and MPEG-2) have been a great success, as testified by the popularity of mp3 (MPEG-1 Audio Layer III) and DVD video (MPEG-2).

Now, what can we expect from MPEG-4? Will MPEG-4 provide multimedia for our time, as claimed in  [Time]? The author, Rob Koenen, is senior consultant at the Dutch KPN telecom research lab, an active member of the MPEG-4 working group and editor of the MPEG-4 standard document.

"Perhaps the most immediate need for MPEG-4 is defensive. It supplies tools with which to create uniform (and top-quality) audio and video encoders on the Internet, preempting what may become an unmanageable tangle of proprietary formats."

Indeed, if we are looking for a general characterization it would be that MPEG-4 is primarily

MPEG-4


a toolbox of advanced compression algorithms for audiovisual information

and, moreover, one that is suitable for a variety of display devices and networks, including low bitrate mobile networks. MPEG-4 supports scalability on a variety of levels:

scalability

Depending on network resources and platform capabilities, the 'right' level of signal quality can be determined dynamically, by selecting the optimal codec.
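As an illustration of this kind of dynamic selection, here is a minimal Python sketch. The quality ladder, the level names and the thresholds are all hypothetical; they are not taken from the MPEG-4 standard and only show the idea of matching quality to network and platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityLevel:
    name: str        # hypothetical label, not an MPEG-4 profile name
    min_kbps: int    # bandwidth needed to sustain this level
    min_pixels: int  # smallest display (width * height) worth serving at this level


# Hypothetical ladder, ordered from best to worst quality.
LEVELS = [
    QualityLevel("high", 2000, 640 * 480),
    QualityLevel("medium", 500, 320 * 240),
    QualityLevel("low", 64, 0),
]


def select_level(available_kbps: int, display_pixels: int) -> QualityLevel:
    """Return the best level that both the network and the display can support."""
    for level in LEVELS:
        if available_kbps >= level.min_kbps and display_pixels >= level.min_pixels:
            return level
    return LEVELS[-1]  # nothing fits: fall back to the lowest level
```

A receiver would call `select_level` again whenever its bandwidth estimate changes, so that the level follows the available resources.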

...



2

media objects

It is fair to say that MPEG-4 is a rather ambitious standard. It aims at offering support for a great variety of audiovisual information, including still images, video, audio, text, (synthetic) talking heads and synthesized speech, synthetic graphics and 3D scenes, streamed data applied to media objects, and user interaction -- e.g. changes of viewpoint.

audiovisual information


Let's give an example, taken from the MPEG-4 standard document.

example


Imagine a talking figure standing next to a desk and a projection screen, explaining the contents of a video that is being projected on the screen, and pointing at a globe that stands on the desk. The user who is watching that scene decides to change viewpoint to get a better look at the globe ...

How would you describe such a scene? How would you encode it? And how would you approach decoding and user interaction?

The solution lies in defining media objects and a suitable notion of composition of media objects.

media objects


  • media objects -- units of aural, visual or audiovisual content
  • composition -- to create compound media objects (audiovisual scene)
  • transport -- multiplex and synchronize data associated with media objects
  • interaction -- feedback from users' interaction with audiovisual scene

For 3D-scene description, MPEG-4 builds on concepts taken from VRML (Virtual Reality Modeling Language, discussed in chapter 7).

Composition basically amounts to building a scene graph, that is, a tree-like structure that specifies the relationship between the various simple and compound media objects. Composition allows for placing media objects anywhere in a given coordinate system, applying transforms to change the appearance of a media object, applying streamed data to media objects, and modifying the user's viewpoint.

composition


  • placing media objects anywhere in a given coordinate system
  • applying transforms to change the appearance of a media object
  • applying streamed data to media objects
  • modifying the user's viewpoint
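
To make the notion of composition concrete, the following Python sketch builds a small scene graph for the example scene and computes where each object ends up in the scene's coordinate system. It is only an illustration: the `Node` class and `world_positions` function are hypothetical names, only translations are modelled, and nothing here reflects the actual BIFS encoding.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    name: str
    translation: Vec3 = (0.0, 0.0, 0.0)  # offset relative to the parent node
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child and return it, so that nested objects can be chained."""
        self.children.append(child)
        return child


def world_positions(node: Node, origin: Vec3 = (0.0, 0.0, 0.0)) -> Iterator[Tuple[str, Vec3]]:
    """Walk the tree depth-first, accumulating translations from the root down."""
    pos = tuple(o + t for o, t in zip(origin, node.translation))
    yield node.name, pos
    for child in node.children:
        yield from world_positions(child, pos)


scene = Node("scene")
desk = scene.add(Node("desk", (2.0, 0.0, 0.0)))
desk.add(Node("globe", (0.0, 1.0, 0.0)))  # the globe stands on the desk
scene.add(Node("talking figure"))
scene.add(Node("projection screen", (-2.0, 1.5, -1.0)))

positions = dict(world_positions(scene))
```

Because the globe is a child of the desk, changing `desk.translation` moves the globe along with it. That is the practical benefit of describing a scene as a tree of compound objects rather than as a flat list.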

So, when we have a multimedia presentation or audiovisual scene, we need to get it across some network and deliver it to the end-user, or as phrased in  [MPEG-4]:

transport


The data streams (Elementary Streams) that result from the coding process can be transmitted or stored separately and need to be composed so as to create the actual multimedia presentation at the receiver's side.

At a system level, MPEG-4 offers the following functionalities to achieve this:

scenegraph


  • BIFS (Binary Format for Scenes) -- describes spatio-temporal arrangements of (media) objects in the scene
  • OD (Object Descriptor) -- defines the relationship between the elementary streams associated with an object
  • event routing -- to handle user interaction
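
The relation between media objects, their elementary streams and the multiplexed transport can be sketched in a few lines of Python. Everything below is an assumption made for illustration: the stream names, the tuple layout and the `multiplex` and `demultiplex` helpers are hypothetical and do not follow the binary formats defined by the standard.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

# (timestamp in ms, stream id, payload): a stand-in for a timestamped unit of stream data.
Unit = Tuple[int, str, bytes]

# Hypothetical object descriptors: media object name -> ids of its elementary streams.
object_descriptors: Dict[str, List[str]] = {
    "talking figure": ["figure-video", "figure-audio"],
    "globe": ["globe-mesh"],
}

elementary_streams: Dict[str, List[Unit]] = {
    "figure-video": [(0, "figure-video", b"v0"), (40, "figure-video", b"v1")],
    "figure-audio": [(0, "figure-audio", b"a0"), (20, "figure-audio", b"a1")],
    "globe-mesh": [(10, "globe-mesh", b"m0")],
}


def multiplex(streams: Iterable[List[Unit]]) -> List[Unit]:
    """Interleave streams (each already sorted by timestamp) into one time-ordered sequence."""
    return list(heapq.merge(*streams))


def demultiplex(muxed: Iterable[Unit], obj: str) -> List[Unit]:
    """Recover the units that belong to one media object, using its descriptor."""
    wanted = set(object_descriptors[obj])
    return [unit for unit in muxed if unit[1] in wanted]


muxed = multiplex(elementary_streams.values())
figure_units = demultiplex(muxed, "talking figure")
```

The descriptor table is what lets the receiver know that the video and audio streams together make up the talking figure, while the timestamps keep them synchronized after they have travelled in a single multiplexed sequence.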

...



3

In addition, MPEG-4 defines a set of functionalities for the delivery of streamed data, DMIF, which stands for

DMIF


Delivery Multimedia Integration Framework

that allows for transparent interaction with resources, irrespective of whether these are available from local storage, come from broadcast, or must be obtained from some remote site. Transparency with respect to network type is also supported. Quality of Service is supported only to the extent that it is possible to indicate needs for bandwidth and transmission rate. It is, however, the responsibility of the network provider to realize any of this.
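
The kind of transparency meant here can be illustrated with a small Python sketch in which the application reads data through one interface, whatever its origin. The class and function names are hypothetical and this is not the DMIF application interface; the "remote" source is simulated by a plain function instead of a real network call.

```python
import io
import os
import tempfile
from abc import ABC, abstractmethod
from typing import BinaryIO, Callable


class DeliverySource(ABC):
    """Common interface: the application reads bytes without knowing their origin."""

    @abstractmethod
    def open(self) -> BinaryIO:
        ...


class LocalFileSource(DeliverySource):
    def __init__(self, path: str) -> None:
        self.path = path

    def open(self) -> BinaryIO:
        return open(self.path, "rb")


class RemoteSource(DeliverySource):
    """Wraps any fetch function; a real one would talk to a network."""

    def __init__(self, fetch: Callable[[], bytes]) -> None:
        self.fetch = fetch

    def open(self) -> BinaryIO:
        return io.BytesIO(self.fetch())


def read_header(source: DeliverySource, size: int = 4) -> bytes:
    """Application code: identical for local and remote sources."""
    with source.open() as stream:
        return stream.read(size)


# Usage: the same bytes, once from local storage and once from a simulated remote site.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"demo-bytes")
local = LocalFileSource(tmp.name)
remote = RemoteSource(lambda: b"demo-bytes")
same = read_header(local) == read_header(remote)
os.remove(tmp.name)
```

The design choice is that `read_header` never inspects where the data comes from; adding a broadcast source would mean adding one more subclass, not changing the application.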

...


(a) scene graph (b) sprites

4

authoring

What MPEG-4 offers may be summarized as follows

benefits


  • end-users -- interactive media across all platforms and networks
  • providers -- transparent information for transport optimization
  • authors -- reusable content, protection and flexibility

In effect, although MPEG-4 is primarily concerned with efficient encoding and scalable transport and delivery, the object-based approach also has clear advantages from an authoring perspective.

One advantage is the possibility of reuse. For example, one and the same background can be reused for multiple presentations or plays, so you could imagine that even an amateur game might be 'located' at the centre court of Roland Garros or Wimbledon.

Another, perhaps not so obvious, advantage is that provisions have been made for

managing intellectual property

of media objects.

And finally, media objects may potentially be annotated with meta-information to facilitate information retrieval.

...



5

syntax

In addition to the binary formats, MPEG-4 also specifies a textual format, called XMT, which stands for eXtensible MPEG-4 Textual format.

XMT


  • XMT contains a subset of X3D
  • SMIL is mapped (incompletely) to XMT

When discussing RM3D, which is of interest from a historical perspective, we will further establish how MPEG-4, SMIL and RM3D relate to one another, and in particular where they disagree, for example with respect to the timing model underlying animations and the temporal control of media objects.

...



6

example(s) -- structured audio

The Machine Listening Group of the MIT Media Lab is developing a suite of tools for structured audio, which means transmitting sound by describing it rather than compressing it. It is claimed that tools based on the MPEG-4 standard will be the future platform for computer music, audio for gaming, streaming Internet radio, and other multimedia applications. The structured audio project is part of a more encompassing research effort of the Music, Mind and Machine Group of the MIT Media Lab, which envisages a new future of audio technologies and interactive applications that will change the way music is conceived, created, transmitted and experienced.
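
The idea of describing sound rather than compressing it can be made tangible with a toy example in Python. This is a sketch under simplifying assumptions: the "description" is just a list of sine tones, the names `Score` and `render` are hypothetical, and the standard's own description formats are not used.

```python
import math
from typing import List, Tuple

SAMPLE_RATE = 8000  # samples per second; an arbitrary choice for this sketch

# The "description": one (frequency in Hz, duration in seconds) pair per note.
Score = List[Tuple[float, float]]


def render(score: Score, sample_rate: int = SAMPLE_RATE) -> List[float]:
    """Synthesize the described sound at the receiver as a sequence of sine tones."""
    samples: List[float] = []
    for freq, duration in score:
        count = int(duration * sample_rate)
        samples.extend(
            math.sin(2 * math.pi * freq * n / sample_rate) for n in range(count)
        )
    return samples


score: Score = [(440.0, 0.5), (660.0, 0.5)]  # what would be transmitted
samples = render(score)                      # what the receiver reconstructs
```

Here two pairs of numbers are sent, and the receiver turns them into one second of audio samples. The trade-off is that the receiver must be able to run the synthesis itself.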

(C) Æliens 18/6/2009

You may not copy or print any of this material without explicit permission of the author or the publisher. In case of other copyright issues, contact the author.