content annotation
learning objectives
After reading this chapter you should be able to
explain the difference between content and meta information,
identify relevant content parameters for audio,
characterize the requirements for video libraries,
define an annotation logic for video,
and discuss feature extraction for samples of musical material.

Current technology does not allow us to extract information
automatically from arbitrary media objects.
In such cases, at least for the time being,
we need to support search by annotating content
with what is commonly referred to as meta-information.
In this chapter, we will look at two more media types,
in particular audio and video.
Studying audio, we will learn how we may combine
feature extraction and meta-information to define a
data model that allows for search.
Studying video, on the other hand,
will indicate the complexity of devising a
knowledge representation scheme that captures
the content of video fragments.
Concluding this chapter, we will discuss an architecture
for feature extraction for arbitrary media objects.
audio databases
- audio signals -- compression, discrete representation
- musical patterns -- similarity-based retrieval

audio data model
- meta-data -- describing content
- features -- using feature extraction

example
singers -- (Opera,Role,Person)
score -- ...
transcript -- ...

signal-based content
- audio data -- amplitude x(t) over time
- wave -- period T, frequency f = 1/T
- velocity -- v = f·λ, with wavelength λ
- amplitude -- a
windowing
- break signal up in small windows of time
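Windowing can be sketched as cutting the sampled signal into fixed-size, possibly overlapping frames. The window size and step below are illustrative choices, not values prescribed by the text:

```python
# Sketch: break a sampled audio signal into small windows of time.
# Window size and overlap are illustrative, not prescribed.

def windows(signal, size, step):
    """Return successive windows of `size` samples, advancing by `step`."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

# A 10-sample signal cut into windows of 4 samples with 50% overlap:
frames = windows(list(range(10)), size=4, step=2)
# frames == [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Features such as loudness or pitch are then computed per window, giving a sequence of feature values over time.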

feature extraction
- intensity -- power per unit area, in watts/m²
- loudness -- in decibels
- pitch -- from frequency and amplitude
- brightness -- amount of distortion
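As a minimal example of one such feature, loudness per window can be computed as RMS energy expressed in decibels. The reference amplitude of 1.0 is an arbitrary choice for illustration:

```python
import math

# Sketch: loudness of a window as RMS energy in decibels,
# relative to a reference amplitude (1.0 here, an arbitrary choice).

def loudness_db(window, ref=1.0):
    rms = math.sqrt(sum(x * x for x in window) / len(window))
    return 20 * math.log10(rms / ref)

# A constant signal at half the reference amplitude:
round(loudness_db([0.5, 0.5, 0.5, 0.5]), 1)  # -> -6.0
```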

video annotation
- what are the interesting aspects?
- how do we represent this information?
video content
video v, frame f
f has associated objects and activities
objects and activities have properties

property
property: name = value
object schema
(fd,fi) -- frame-dependent and frame-independent properties
object instance: (oid,os,ip) -- object identifier, object schema, instantiated properties

example
frame | objects | frame-dependent properties |
1 | Jane | has(briefcase), at(path) |
- | house | door(closed) |
- | briefcase | |
2 | Jane | has(briefcase), at(door) |
- | Dennis | at(door) |
- | house | door(open) |
- | briefcase | |

frame-independent properties
object | frame-independent properties | value |
Jane | age | 35 |
| height | 170cm |
house | address | ... |
| color | brown |
briefcase | color | black |
| size | 40 x 31 |
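The two tables above can be represented directly as nested dictionaries: frame-independent properties are stored once per object, frame-dependent properties once per frame. All names and values below come from the example; the function name is an illustrative addition:

```python
# Sketch of the annotation model: frame-independent properties stored once
# per object, frame-dependent properties recorded per frame.

frame_independent = {
    "Jane": {"age": 35, "height": "170cm"},
    "briefcase": {"color": "black"},
}

# frame number -> object -> frame-dependent properties (name = value)
frame_dependent = {
    1: {"Jane": {"has": "briefcase", "at": "path"},
        "house": {"door": "closed"},
        "briefcase": {}},
    2: {"Jane": {"has": "briefcase", "at": "door"},
        "Dennis": {"at": "door"},
        "house": {"door": "open"},
        "briefcase": {}},
}

def objects_in_frame(f):
    """The objects associated with frame f."""
    return sorted(frame_dependent.get(f, {}))

objects_in_frame(2)  # -> ['Dennis', 'Jane', 'briefcase', 'house']
```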

activity
- activity name -- id
- statements -- role = v
example
{ giver : Person, receiver : Person, item : Object }
giver = Jane, receiver = Dennis, item = briefcase
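An activity schema can thus be seen as a set of typed roles, and an activity occurrence as a set of statements role = v binding each role. A minimal sketch of this check, using the example's names:

```python
# Sketch: an activity schema as roles with expected types, and a check
# that a set of statements binds exactly the roles of the schema.

exchange_schema = {"giver": "Person", "receiver": "Person", "item": "Object"}

def instantiates(schema, statements):
    """True if the statements bind exactly the roles of the schema."""
    return set(statements) == set(schema)

instantiates(exchange_schema,
             {"giver": "Jane", "receiver": "Dennis", "item": "briefcase"})  # -> True
```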

video libraries
- which videos are in the library
- what constitutes the content of each video
- what is the location of a particular video

query language for video libraries
- segment retrieval -- exchange of briefcase
- object retrieval -- all people in v:[s,e]
- activity retrieval -- all activities in v:[s,e]
- property-based retrieval -- find all videos with object oid

VideoSQL
SELECT -- v:[s,e]
FROM -- video:<source><V>
WHERE -- term IN funcall

example
SELECT vid:[s,e]
FROM video:VidLib
WHERE (vid,s,e) IN VideoWithObject(Dennis) AND
object IN ObjectsInVideo(vid,s,e) AND
object != Dennis AND
typeof(object) = Person
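The query above can be paraphrased as: find all persons other than Dennis appearing in segments where Dennis appears. A sketch of its evaluation over a toy annotation table; the function names follow the query, but the segment data and type table are invented for illustration:

```python
# Sketch: evaluating the VideoSQL query against a toy annotation table.
# The segment data and type table are illustrative.

# (video, start, end) -> objects appearing in that segment
segments = {
    ("vid", 1, 2): {"Jane", "Dennis", "house", "briefcase"},
}
types = {"Jane": "Person", "Dennis": "Person",
         "house": "Object", "briefcase": "Object"}

def video_with_object(obj):
    """Segments (vid, s, e) in which obj appears."""
    return [seg for seg, objs in segments.items() if obj in objs]

def objects_in_video(seg):
    """Objects appearing in the given segment."""
    return segments.get(seg, set())

# all persons other than Dennis in segments where Dennis appears
result = sorted(o for seg in video_with_object("Dennis")
                  for o in objects_in_video(seg)
                  if o != "Dennis" and types[o] == "Person")
# result == ['Jane']
```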

To improve library access, the Informedia
Digital Video Library uses automatic processing to derive
descriptors for video.
A new extension to the video processing extracts geographic
references from these descriptors.
The operational library interface shows the geographic entities addressed in a story, highlighting the regions discussed in the video through a map display synchronized with the video display.

The map can also serve as a query mechanism, allowing users to search the terabyte library for stories taking place in a selected area of interest.

questions
- what -- content-related
- when -- position on time-continuum
- where -- geographic location

More recently, it has been recognized that
the process of spatialization -- where a spatial,
map-like structure is applied to data that has no inherent
or obvious one -- can provide an interpretable
structure to other types of data.

atlas of cyberspace
We present a wide range of spatializations that have
employed a variety of graphical techniques and visual metaphors
so as to provide striking and powerful images that extend
from two-dimensional 'maps' to three-dimensional immersive landscapes.

feature grammar
detector song; ## to get the filename
detector lyrics; ## extracts lyrics
detector melody; ## extracts melody
detector check; ## to walk the tree
atom str name;
atom str text;
atom str note;
midi: song;
song: file lyrics melody check;
file: name;
lyrics: text*;
melody: note*;
event('twinkle',2,time=384, note_on:[chan=2,pitch=72,vol=111]).
event('twinkle',2,time=768, note_off:[chan=2,pitch=72,vol=100]).
melody detector
int melodyDetector(tree *pt, list *tks) {
    char *_result;
    void *q = _query;                        /* handle to the query engine */
    int idq = query_eval(q, "X:melody(X)");  /* evaluate the melody query */
    /* add a note token to the token stream for each result */
    while ((_result = query_result(q, idq))) {
        putAtom(tks, "note", _result);
    }
    return SUCCESS;
}
prediction techniques
- social-based -- dependent on (group) rating of item(s)
- information-based -- dependent on features of item(s)
- hybrid methods -- combining predictors

definition(s)
- rating -- a value representing a user's interest
- recommendation -- item(s) that might be of interest to the user
- regret -- a function to measure the accuracy of recommendations
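As a minimal sketch of these definitions, a social-based predictor may estimate a user's rating of an item as the mean rating given by other users, with regret measured as the mean absolute error of the predictions. The data and names below are invented for illustration:

```python
# Sketch: social-based prediction as the mean of other users' ratings,
# and regret as mean absolute prediction error. Data is illustrative.

ratings = {("ann", "song1"): 4, ("bob", "song1"): 2, ("cat", "song1"): 3}

def predict(user, item):
    """Predicted rating: mean rating of the item by other users."""
    others = [r for (u, i), r in ratings.items() if i == item and u != user]
    return sum(others) / len(others)

def regret(predictions, actual):
    """Mean absolute error of the predictions."""
    return sum(abs(p - a) for p, a in zip(predictions, actual)) / len(actual)

predict("dan", "song1")  # -> 3.0
```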

guided tour(s)
- automated (viewpoint) navigation in virtual space,
- an animation explaining, for example, the construction of an artwork, or
- the (narrative) presentation of a sequence of concept nodes.

concepts

technology

projects & further reading
As a project, think of implementing musical similarity matching,
or developing an application retrieving video fragments
using a simple annotation logic.
You may further explore the
construction of media repositories, and finding a
balance between automatic indexing, content search and
meta information.
For further reading I advise you to google
recent research on video analysis,
and the online material
on search engines.

- works from [Design]
- faces -- from www.alterfin.org, an interesting site with many surprising interactive toys in flash, javascript and html.
- mouth -- Annika Karlson Rixon, entitled A slight Acquaintance, taken from a theme article about the body in art and science, the Volkskrant, 24/03/05.
- story -- page from the comic book version of City of Glass, [Glass], drawn in an almost traditional style.
- story -- frame from [Glass].
- story -- frame from [Glass].
- story -- frame from [Glass].
- white on white -- typographical joke.
- modern art -- city of light (1968-69), Mario Merz, taken from [Modern].
- modern art -- Marocco (1972), Krijn Griezen, taken from [Modern].
- modern art -- Indestructible Object (1958), Man Ray, Blue, Green, Red I (1964-65), Ellsworth Kelly, Great American Nude (1960), T. Wesselman, taken from [Modern].
- signs -- sports, [Signs], p. 272, 273.
