6. content annotation

video annotation requires a logical approach to storytelling


learning objectives

After reading this chapter you should be able to explain the difference between content and meta-information, to name relevant content parameters for audio, to characterize the requirements for video libraries, to define an annotation logic for video, and to discuss feature extraction from samples of musical material.

Current technology does not allow us to extract information automatically from arbitrary media objects. For such media, at least for the time being, we need to assist search by annotating content with what is commonly referred to as meta-information.

In this chapter, we will look at two more media types, in particular audio and video. Studying audio, we will learn how we may combine feature extraction and meta-information to define a data model that allows for search. Studying video, on the other hand, will indicate the complexity of devising a knowledge representation scheme that captures the content of video fragments.

Concluding this chapter, we will discuss an architecture for feature extraction for arbitrary media objects.

...



audio databases


audio data model


example



   singers -- (Opera,Role,Person)
   score -- ...
   transcript -- ...
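
As an illustration, here is a minimal sketch (in Python) of such relational meta-data for an opera audio database; the relation name follows the example above, but the sample tuples are hypothetical, and the elided score and transcript relations are left out:

   singers = [
       # (opera, role, person)
       ("La Traviata", "Violetta", "Maria Callas"),
       ("La Traviata", "Alfredo", "Giuseppe di Stefano"),
   ]

   def find_singers(opera):
       """Return all (role, person) pairs recorded for a given opera."""
       return [(role, person) for (o, role, person) in singers if o == opera]

   print(find_singers("La Traviata"))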
  

signal-based content


windowing


feature extraction
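
To make windowing and per-window feature extraction concrete, here is a minimal sketch in Python (using numpy); the function names and window parameters are my own assumptions, not a fixed recipe:

   import numpy as np

   def windows(signal, size=1024, step=512):
       """Split a sampled signal into overlapping windows of fixed size."""
       for start in range(0, len(signal) - size + 1, step):
           yield signal[start:start + size]

   def features(window, rate=44100):
       """Extract simple signal-based features from a single window."""
       volume = np.sqrt(np.mean(window ** 2))      # RMS amplitude
       spectrum = np.abs(np.fft.rfft(window))
       freqs = np.fft.rfftfreq(len(window), 1.0 / rate)
       pitch = freqs[np.argmax(spectrum)]          # dominant frequency
       return volume, pitch

   # hypothetical usage: a one-second 440 Hz test tone
   rate = 44100
   t = np.linspace(0, 1, rate, endpoint=False)
   tone = np.sin(2 * np.pi * 440 * t)
   for w in windows(tone):
       volume, pitch = features(w, rate)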


...



...



...



video annotation


video content



  video v, frame f 
  f has associated objects and activities 
  objects and activities have properties
  

property


  property: name = value 
  

object schema


   (fd,fi) -- frame-dependent and frame-independent properties 
  

object instance: (oid,os,ip) -- an object identifier oid, an object schema os, and a set ip of frame-independent property values

example


  frame  objects    frame-dependent properties
  1      Jane       has(briefcase), at(path)
         house      door(closed)
         briefcase
  2      Jane       has(briefcase), at(door)
         Dennis     at(door)
         house      door(open)
         briefcase

frame-independent properties


  object     frame-independent property   value
  Jane       age                          35
             height                       170cm
  house      address                      ...
             color                        brown
  briefcase  color                        black
             size                         40 x 31

activity

example


   { giver : Person, receiver : Person, item : Object } 
   giver = Jane, receiver = Dennis, item = briefcase 
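
The model above is easily mimicked in code. Below is a minimal sketch in Python of the Jane/briefcase example; the data structures and function names are my own, chosen only to show how frame-dependent and frame-independent properties may be stored and queried:

   # frame-independent properties, per object
   frame_independent = {
       "Jane":      {"age": 35, "height": "170cm"},
       "house":     {"color": "brown"},
       "briefcase": {"color": "black", "size": "40 x 31"},
   }

   # per frame: object -> frame-dependent properties
   frames = {
       1: {"Jane": ["has(briefcase)", "at(path)"],
           "house": ["door(closed)"], "briefcase": []},
       2: {"Jane": ["has(briefcase)", "at(door)"],
           "Dennis": ["at(door)"], "house": ["door(open)"],
           "briefcase": []},
   }

   def frames_with_object(obj):
       """Return the frames in which a given object occurs."""
       return [f for f, objects in frames.items() if obj in objects]

   print(frames_with_object("Dennis"))   # -> [2]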
  

...



video libraries



  which videos are in the library 
  what constitutes the content of each video
  what is the location of a particular video
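
A video library must thus record catalogue information next to content. A minimal sketch in Python, with hypothetical names and entries:

   library = {
       "soap-episode-1": {
           "content":  ["Jane", "Dennis", "house", "briefcase"],
           "location": "/videos/soap/episode-1.mpg",
       },
   }

   def videos_with(obj):
       """Which videos in the library feature a given object?"""
       return [vid for vid, info in library.items() if obj in info["content"]]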
  

query language for video libraries


VideoSQL



  SELECT -- v:[s,e] 
  FROM -- video:<source><V> 
  WHERE -- term IN funcall 
  

example



  SELECT  vid:[s,e]
  FROM video:VidLib
  WHERE (vid,s,e) IN VideoWithObject(Dennis) AND
  	object IN ObjectsInVideo(vid,s,e) AND
  	object != Dennis AND
  	typeof(object) = Person
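
In words: retrieve those segments [s,e] of videos vid in the VidLib library in which Dennis occurs together with at least one other object of type Person.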
  

...



...



To improve library access, the Informedia Digital Video Library uses automatic processing to derive descriptors for video. A new extension to the video processing extracts geographic references from these descriptors.

The operational library interface shows the geographic entities addressed in a story, highlighting the regions discussed in the video through a map display synchronized with the video display.

The map can also serve as a query mechanism, allowing users to search the terabyte library for stories taking place in a selected area of interest.

questions


More recently, it has been recognized that the process of spatialization -- where a map-like spatial structure is applied to data that has no inherent or obvious one -- can provide an interpretable structure to other types of data.

atlas of cyberspace


We present a wide range of spatializations that employ a variety of graphical techniques and visual metaphors to provide striking and powerful images, extending from two-dimensional 'maps' to three-dimensional immersive landscapes.

...



...



feature grammar



  
  detector song; ## to get the filename
  detector lyrics; ## extracts lyrics
  detector melody; ## extracts melody
  detector check;  ## to walk the tree
  
  atom str name;
  atom str text;
  atom str note;  
  
  midi: song;
  
  song: file lyrics melody check;
  
  file: name;
  
  lyrics: text*;
  melody: note*;
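
In words: a midi object is a song; the detectors decompose a song into a file (identified by its name), lyrics (a sequence of text atoms) and a melody (a sequence of note atoms), after which the check detector walks the resulting parse tree.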
  
  


Internally, a MIDI file such as 'twinkle' is represented by note events, stored as facts of the form:

  event('twinkle',2,time=384, note_on:[chan=2,pitch=72,vol=111]).
  event('twinkle',2,time=768, note_off:[chan=2,pitch=72,vol=100]).
  

melody detector



  /* The melody detector queries the fact base for melody notes and
     adds each note as an atom to the token stream. */
  int melodyDetector(tree *pt, list *tks) {
    char* _result;
    void* q = _query;   /* handle to the embedded query interface */
    int idq = 0;

    /* evaluate the query X:melody(X) against the extracted facts */
    idq = query_eval(q, "X:melody(X)");

    /* each result is a note; hand it to the parser as a token */
    while ((_result = query_result(q, idq))) {
      putAtom(tks, "note", _result);
    }
    return SUCCESS;
  }
  

...



...



prediction techniques


definition(s)


guided tour(s)


...



6. content annotation

concepts


technology


projects & further reading

As a project, consider implementing musical similarity matching, or developing an application that retrieves video fragments using a simple annotation logic.
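
As a starting point for the first project, here is a minimal sketch in Python of melodic similarity matching, comparing melodies represented as lists of MIDI pitches by their edit distance; this representation is an assumption, one of many possible:

   def edit_distance(a, b):
       """Dynamic-programming edit distance between two sequences."""
       d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
            for i in range(len(a) + 1)]
       for i in range(1, len(a) + 1):
           for j in range(1, len(b) + 1):
               d[i][j] = min(d[i - 1][j] + 1,                      # deletion
                             d[i][j - 1] + 1,                      # insertion
                             d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
       return d[len(a)][len(b)]

   twinkle = [60, 60, 67, 67, 69, 69, 67]   # C C G G A A G
   variant = [60, 60, 67, 67, 69, 67, 67]
   print(edit_distance(twinkle, variant))   # small distance = similar melodies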

You may further explore the construction of media repositories, and how to strike a balance between automatic indexing, content-based search and meta-information.

For further reading, I advise you to google recent research on video analysis, and to consult the online material on search engines.

the artwork

  1. works from  [Design]
  2. faces -- from www.alterfin.org, an interesting site with many surprising interactive toys in flash, javascript and html.
  3. mouth -- Annika Karlson Rixon, entitled A slight Acquaintance, taken from a theme article about the body in art and science, the Volkskrant, 24/03/05.
  4. story -- page from the comic book version of City of Glass, [Glass], drawn in an almost traditional style.
  5. story -- frame from  [Glass].
  6. story -- frame from  [Glass].
  7. story -- frame from  [Glass].
  8. white on white -- typographical joke.
  9. modern art -- city of light (1968-69), Mario Merz, taken from  [Modern].
  10. modern art -- Marocco (1972), Krijn Griezen, taken from  [Modern].
  11. modern art -- Indestructible Object (1958), Man Ray, Blue, Green, Red I (1964-65), Ellsworth Kelly, Great American Nude (1960), T. Wesselman, taken from [Modern].
  12. signs -- sports,  [Signs], p. 272, 273.