topical media & game development
[figure 1: Animation in front of television news in ViP]

merging video and 3D
In June 2003, I was approached by a theatre production company to advise on the use
of "VR in theatre".
As described in more detail in
section 9.3,
I explored what technology was available to realize such VR-augmented theatre.
These explorations resulted in the development of the ViP system,
which I once announced as follows:
www.virtualpoetry.tv
The ViP system enhances your party with innovative multimedia presentations.
It supports multiple webcams and
digital video cameras,
mixed with video and images,
enhanced by 3D animations and text,
in an integrated fashion.
For your party, we create a ViP presentation,
with your content and special effects, to entertain your audience.

In the course of time, I continued working on the system and,
although I do not use it for parties but rather for enlivening my lectures,
it does include many of the features of a VJ system,
such as the drumpad described in section 3.2.
The major challenge, when I started its development,
was to find an effective way to map live video
from a low/medium resolution camera as textures onto 3D geometry.
I started by looking at the ARToolkit, but at the time I was
not satisfied with its frame rate.
Then, after some first explorations, I discovered that mapping video on 3D
was a new (to some extent still experimental) built-in feature of the DirectX 9 SDK,
in the form of the VMR9 (video mixing renderer) filter.
the Video Mixing Renderer filter
The VMR filter is a compound class that handles connections,
mixing and compositing, as well as synchronization and presentation,
in an integrated fashion.
But before discussing the VMR9 in more detail, let's look first at
how a single media stream is processed by the filter graph,
as depicted in the figure below.

[figure 2: processing of a single media stream by the filter graph]
Basically, the process consists of the phases of parsing, decoding
and rendering.
For each of these phases, depending on the source, format and display
requirements respectively, a different filter may be used.
Synchronization can be determined either by the renderer,
by pulling new frames in, or by the parser, as in the case of live capture,
by pushing data onto the stream, possibly causing the loss of data when decoding
cannot keep up with the incoming stream.
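To make this concrete, the following is a minimal sketch of how such a filter graph may be assembled with DirectShow (error handling omitted, and the file name is just a placeholder): the graph builder picks the appropriate parser, decoder and renderer filters automatically, and the renderer then pulls frames from the upstream filters once the graph runs.

#include <dshow.h>
#pragma comment(lib, "strmiids.lib")

// build and run a filter graph for a single media stream;
// RenderFile lets DirectShow insert the parsing, decoding and rendering filters
int play_clip(const wchar_t* file) {        // e.g. L"clip.avi" (placeholder)
    CoInitialize(NULL);
    IGraphBuilder* graph = NULL;
    IMediaControl* control = NULL;
    CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER,
                     IID_IGraphBuilder, (void**) &graph);
    graph->QueryInterface(IID_IMediaControl, (void**) &control);
    graph->RenderFile(file, NULL);          // parsing -> decoding -> rendering chain
    control->Run();                         // the renderer starts pulling frames
    // ... wait for completion ...
    control->Release(); graph->Release();
    CoUninitialize();
    return 0;
}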
The VMR was originally introduced to allow for the mixing of multiple video streams,
and it allows for user-defined compositor and allocator/presenter
components.
[figure 3: (a) VMR filter, (b) multiple VMRs]

Before the VMR9, images could be obtained from the video stream by intercepting the stream and copying
frames to a texture surface.
The VMR9, however, renders the frames directly onto Direct3D surfaces, with (obviously) less overhead.
Although the VMR9 supports multiple video pins
for combining multiple video streams, it does not allow
for independent seeking in, or access to, these streams.
To do this you must deploy multiple video mixing renderers
that are connected to a common allocator/presenter component, as depicted on the
right of the figure above (b).
When using the VMR9 with Direct3D, the rendering of 3D scenes is driven by the rate at which the video frames
are processed.
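In terms of the DirectX 9 API, this setup amounts to putting each VMR9 instance into renderless mode and advising it of a user-supplied allocator/presenter. The sketch below is indicative only; the allocator argument is assumed to be an object of your own class implementing IVMRSurfaceAllocator9 and IVMRImagePresenter9, and error handling is omitted.

#include <dshow.h>
#include <vmr9.h>

// connect a VMR9 filter (CLSID_VideoMixingRenderer9) to a custom allocator/presenter
HRESULT attach_allocator(IBaseFilter* vmr9, IVMRSurfaceAllocator9* allocator, DWORD_PTR id) {
    IVMRFilterConfig9* config = NULL;
    IVMRSurfaceAllocatorNotify9* notify = NULL;
    vmr9->QueryInterface(IID_IVMRFilterConfig9, (void**) &config);
    config->SetRenderingMode(VMR9Mode_Renderless);     // we do the presentation ourselves
    config->Release();
    vmr9->QueryInterface(IID_IVMRSurfaceAllocatorNotify9, (void**) &notify);
    notify->AdviseSurfaceAllocator(id, allocator);      // VMR9 -> our allocator/presenter
    allocator->AdviseNotify(notify);                    // our allocator/presenter -> VMR9
    notify->Release();
    return S_OK;
}

For the multi-stream case of figure 3(b), the idea is to call this for each video mixing renderer with the same allocator object but a distinct id, so that all streams are presented by one common component.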
[figure 4: Lecture on digital dossier for Abramovic, in ViP]

the ViP system
In developing the ViP system, I proceeded from the requirement
to project live video capture in 3D space.
As noted previously, this means that incoming video drives the
rendering of 3D scenes and that, hence,
capture speed determines the rendering frame rate.
I started by adapting the simple allocator/presenter example
from the DirectX 9 SDK, and developed a scene management system that
could handle incoming textures from the video stream.
The scene class interface, which allows for (one-time) initialization,
time-dependent compositing, restoring device settings and rendering textures, looks as follows:
class scene {
public:
   virtual int init(IDirect3DDevice9*);     // initialize scene (once)
   virtual int compose(float time);         // compose (in the case of an animation)
   virtual int restore(IDirect3DDevice9*);  // restore device settings
   virtual int render(IDirect3DDevice9* device, IDirect3DTexture9* texture);  // display the (sub) scene with the current video texture
protected:
   ...
};
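
To give an impression of how this interface is used, here is a minimal, illustrative subclass (not taken from the ViP source) that simply draws the incoming video frame on a screen-aligned quad; the vertex layout and the 640x480 extent are assumptions.

struct quad_vertex { float x, y, z, rhw; float u, v; };
#define QUAD_FVF (D3DFVF_XYZRHW | D3DFVF_TEX1)

class videoquad : public scene {              // illustrative only
public:
   virtual int render(IDirect3DDevice9* device, IDirect3DTexture9* texture) {
      quad_vertex quad[4] = {
         {   0.0f,   0.0f, 0.0f, 1.0f, 0.0f, 0.0f },   // top left
         { 640.0f,   0.0f, 0.0f, 1.0f, 1.0f, 0.0f },   // top right
         {   0.0f, 480.0f, 0.0f, 1.0f, 0.0f, 1.0f },   // bottom left
         { 640.0f, 480.0f, 0.0f, 1.0f, 1.0f, 1.0f }    // bottom right
      };
      device->SetTexture(0, texture);                  // the current video frame
      device->SetFVF(QUAD_FVF);
      device->DrawPrimitiveUP(D3DPT_TRIANGLESTRIP, 2, quad, sizeof(quad_vertex));
      return 0;
   }
};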

The scene graph itself was constructed as a tree, using both arrays of (sub) scenes
and a dictionary for named scenes, and it is traversed each time
a video texture comes in.
The requirements the scene management system had to satisfy are further
indicated in section 9.3.
Leaving further details aside, observe that for the simple case
of one incoming video stream, passing the texture as a parameter suffices.
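A compound scene along these lines might look as follows; the class and member names are illustrative rather than the actual ViP code, but they show the combination of an ordered array of (sub) scenes and a dictionary of named scenes, traversed whenever a new video texture arrives.

#include <vector>
#include <map>
#include <string>

class compound_scene : public scene {          // illustrative sketch
public:
   void add(scene* s) { children.push_back(s); }
   void add(const std::string& name, scene* s) { named[name] = s; children.push_back(s); }
   scene* find(const std::string& name) {      // look up a named (sub) scene
      std::map<std::string, scene*>::iterator it = named.find(name);
      return (it != named.end()) ? it->second : NULL;
   }
   virtual int render(IDirect3DDevice9* device, IDirect3DTexture9* texture) {
      for (size_t i = 0; i < children.size(); ++i)
         children[i]->render(device, texture);  // pass the video texture down the tree
      return 0;
   }
protected:
   std::vector<scene*> children;                // ordered (sub) scenes
   std::map<std::string, scene*> named;         // dictionary of named scenes
};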
Later on, I adapted the GamePlayer, which uses multiple
video mixing renderers, and the need then arose for a different
way of indexing and accessing the textures from the video stream(s).
So, since it is good practice in object-oriented software engineering
to suppress parameters by using object instance variables, the interface
for the scene class changed into:
class scene {
public:
   virtual int load();      // initialize scene (once)
   virtual int compose();   // compose (in the case of an animation)
   virtual int restore();   // restore device settings
   virtual int render();    // display the (sub) scene
protected:
   ...
};

Adopting the scene class as the unifying interface for all 3D objects
and compound scenes proved to be a convenient way to
control the complexity of the ViP application.
However, for manipulating the textures and allocating shader effects to scenes,
I needed global data structures (dictionaries) to access these items
by name whenever needed.
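Such by-name dictionaries can be as simple as the sketch below; the names, and the use of ID3DXEffect for the shader effects, are assumptions on my part rather than the actual ViP data structures.

#include <map>
#include <string>
#include <d3d9.h>
#include <d3dx9.h>

// global registries: video textures and shader effects, accessible by name
std::map<std::string, IDirect3DTexture9*> textures;
std::map<std::string, ID3DXEffect*> effects;

IDirect3DTexture9* find_texture(const std::string& name) {
   std::map<std::string, IDirect3DTexture9*>::iterator it = textures.find(name);
   return (it != textures.end()) ? it->second : NULL;
}

ID3DXEffect* find_effect(const std::string& name) {
   std::map<std::string, ID3DXEffect*>::iterator it = effects.find(name);
   return (it != effects.end()) ? it->second : NULL;
}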
As a final remark, which is actually concerned more with the
software engineering of such systems than with their functionality per se:
to be able to deal with the multiple variant libraries
that existed in the various releases of DirectX 9,
it was necessary to develop the ViP library and its components
as a collection of DLLs, to avoid the name and linking clashes
that would otherwise occur.
[figure 5: installation (left) and reality of TV news (right)]

example(s) -- reality of TV news
The Reality of TV news project by Peter Frucht
uses an interesting mix of technology:
- live video capture from an external USB 2.0 TV card
- live audio capture from the soundcard (line in)
- display of live audio and video with Java3D (had to be invented)
- autonomous 3D objects with a specified lifetime
- collision behaviour (had to be invented)
- changing of texture, material and sound characteristics at runtime
- dual-screen display with each screen rotated toward the other by 45 degrees about the Y-axis
- 3D sound
In the project, as phrased by Peter Frucht,
the permanent flow of alternating adverts and news reports is captured live and displayed in a 3D virtual-reality installation. The currently captured audio and video data is displayed on the surface of 3D shapes as short loops. The stream enters the 3D universe piece by piece (like water drops); in this way it is displaced in time and space - news reports and advertising will partly be displayed at the same time. By colliding with each other the 3D shapes exchange video material. This re-editing mixes the short loops together, so that for instance some pieces of advertising will appear while the newsreader speaks.
The software was developed by
Martin Bouma, Anthony Augustin and Peter Frucht himself,
with JDK 1.5, Java3D 1.31 and the Java Media Framework 2.1.1e.
The primary technological background of the artist, Peter Frucht,
was the book CodeArt, [CodeArt], by his former professor at the
Media Art School in Cologne, Germany.
The book is unfortunately only available in German,
and should be translated into English!
research directions -- augmented reality
In the theatre production that motivated the development of the ViP system,
the idea was to have wearable LCD-projection glasses, with a head-mounted low-resolution camera.
This setup is common in augmented reality applications,
where for example a historic site is enriched with graphics and text,
overlaid on the (video rendered) view of the site.
Since realtime image analysis is generally not feasible,
either positioning and orientation information must be used,
or simplified markers indicating the significant spots in the scene,
to determine what information to use as an overlay and how it should be displayed.
The ARToolkit is an advanced, freely available
toolkit that uses fast marker recognition to determine the viewpoint
of a spectator.
The information that is returned on the recognition of a marker
includes both position and orientation,
which may be used by the application to draw the overlay graphics
in accordance with the spectator's viewpoint.
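A typical detection step with the classic ARToolkit C API looks roughly as follows; the pattern id, width and centre are assumed to have been set up during initialization, and the threshold value is illustrative.

#include <AR/ar.h>

// detect markers in a captured frame and obtain the transformation (position
// and orientation) of one known marker relative to the camera
int detect(ARUint8* frame, int patt_id, double patt_width,
           double patt_center[2], double patt_trans[3][4]) {
   ARMarkerInfo* marker_info;
   int marker_num;
   if (arDetectMarker(frame, 100, &marker_info, &marker_num) < 0)
      return -1;                                  // detection failed
   for (int i = 0; i < marker_num; i++) {
      if (marker_info[i].id == patt_id) {
         arGetTransMat(&marker_info[i], patt_center, patt_width, patt_trans);
         return 1;                                // marker found; patt_trans holds the viewpoint
      }
   }
   return 0;                                      // marker not visible in this frame
}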
Augmented reality is likely to become a hot thing.
In April 2005 it was featured on
BBC World, with a tour through Basel.