6. content annotation

video annotation requires a logical approach to storytelling


learning objectives

After reading this chapter you should be able to explain the difference between content and meta-information, to name relevant content parameters for audio, to characterize the requirements for video libraries, to define an annotation logic for video, and to discuss feature extraction from samples of musical material.

Current technology does not allow us to extract information automatically from arbitrary media objects. For such media, at least for the time being, we need to assist search by annotating content with what is commonly referred to as meta-information.

In this chapter, we will look at two more media types, in particular audio and video. Studying audio, we will learn how we may combine feature extraction and meta-information to define a data model that allows for search. Studying video, on the other hand, will indicate the complexity of devising a knowledge representation scheme that captures the content of video fragments.

Concluding this chapter, we will discuss an architecture for feature extraction for arbitrary media objects.

...



audio databases


audio data model


example



   singers -- (Opera,Role,Person)
   score -- ...
   transcript -- ...
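
As an illustration, here is a minimal sketch (in Python) of such relational meta-data for an opera audio database; the relation name follows the example above, but the sample tuples are hypothetical, and the elided score and transcript relations are left out:

   singers = [
       # (opera, role, person)
       ("La Traviata", "Violetta", "Maria Callas"),
       ("La Traviata", "Alfredo", "Giuseppe di Stefano"),
   ]

   def find_singers(opera):
       """Return all (role, person) pairs recorded for a given opera."""
       return [(role, person) for (o, role, person) in singers if o == opera]

   print(find_singers("La Traviata"))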
  

signal-based content


windowing


feature extraction
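
To make windowing and per-window feature extraction concrete, here is a minimal sketch in Python (using numpy); the function names and window parameters are my own assumptions, not a fixed recipe:

   import numpy as np

   def windows(signal, size=1024, step=512):
       """Split a sampled signal into overlapping windows of fixed size."""
       for start in range(0, len(signal) - size + 1, step):
           yield signal[start:start + size]

   def features(window, rate=44100):
       """Extract simple signal-based features from a single window."""
       volume = np.sqrt(np.mean(window ** 2))      # RMS amplitude
       spectrum = np.abs(np.fft.rfft(window))
       freqs = np.fft.rfftfreq(len(window), 1.0 / rate)
       pitch = freqs[np.argmax(spectrum)]          # dominant frequency
       return volume, pitch

   # hypothetical usage: a one-second 440 Hz test tone
   rate = 44100
   t = np.linspace(0, 1, rate, endpoint=False)
   tone = np.sin(2 * np.pi * 440 * t)
   for w in windows(tone):
       volume, pitch = features(w, rate)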


...



...



...



video annotation


video content



  video v, frame f 
  f has associated objects and activities 
  objects and activities have properties
  

property


  property: name = value 
  

object schema


   (fd,fi) -- frame-dependent and frame-independent properties 
  

object instance: (oid,os,ip) -- an object identifier oid, an object schema os, and a set ip of frame-independent property values

example


  frame  objects    frame-dependent properties
  1      Jane       has(briefcase), at(path)
         house      door(closed)
         briefcase
  2      Jane       has(briefcase), at(door)
         Dennis     at(door)
         house      door(open)
         briefcase

frame-independent properties


  object     frame-independent property   value
  Jane       age                          35
             height                       170cm
  house      address                      ...
             color                        brown
  briefcase  color                        black
             size                         40 x 31

activity

example


   { giver : Person, receiver : Person, item : Object } 
   giver = Jane, receiver = Dennis, item = briefcase 
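
The model above is easily mimicked in code. Below is a minimal sketch in Python of the Jane/briefcase example; the data structures and function names are my own, chosen only to show how frame-dependent and frame-independent properties may be stored and queried:

   # frame-independent properties, per object
   frame_independent = {
       "Jane":      {"age": 35, "height": "170cm"},
       "house":     {"color": "brown"},
       "briefcase": {"color": "black", "size": "40 x 31"},
   }

   # per frame: object -> frame-dependent properties
   frames = {
       1: {"Jane": ["has(briefcase)", "at(path)"],
           "house": ["door(closed)"], "briefcase": []},
       2: {"Jane": ["has(briefcase)", "at(door)"],
           "Dennis": ["at(door)"], "house": ["door(open)"],
           "briefcase": []},
   }

   def frames_with_object(obj):
       """Return the frames in which a given object occurs."""
       return [f for f, objects in frames.items() if obj in objects]

   print(frames_with_object("Dennis"))   # -> [2]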
  

...



video libraries



  which videos are in the library 
  what constitutes the content of each video
  what is the location of a particular video
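
A video library must thus record catalogue information next to content. A minimal sketch in Python, with hypothetical names and entries:

   library = {
       "soap-episode-1": {
           "content":  ["Jane", "Dennis", "house", "briefcase"],
           "location": "/videos/soap/episode-1.mpg",
       },
   }

   def videos_with(obj):
       """Which videos in the library feature a given object?"""
       return [vid for vid, info in library.items() if obj in info["content"]]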
  

query language for video libraries


VideoSQL



  SELECT -- v:[s,e] 
  FROM -- video:<source><V> 
  WHERE -- term IN funcall 
  

example



  SELECT  vid:[s,e]
  FROM video:VidLib
  WHERE (vid,s,e) IN VideoWithObject(Dennis) AND
  	object IN ObjectsInVideo(vid,s,e) AND
  	object != Dennis AND
  	typeof(object) = Person
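
In words: retrieve those segments [s,e] of videos vid in the VidLib library in which Dennis occurs together with at least one other object of type Person.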
  

...



...



To improve library access, the Informedia Digital Video Library uses automatic processing to derive descriptors for video. A new extension to the video processing extracts geographic references from these descriptors.

The operational library interface shows the geographic entities addressed in a story, highlighting the regions discussed in the video through a map display synchronized with the video display.

The map can also serve as a query mechanism, allowing users to search the terabyte library for stories taking place in a selected area of interest.

questions


More recently, it has been recognized that the process of spatialization -- where a map-like spatial structure is applied to data that has no inherent or obvious one -- can provide an interpretable structure to other types of data.

atlas of cyberspace


We present a wide range of spatializations that employ a variety of graphical techniques and visual metaphors to provide striking and powerful images, extending from two-dimensional 'maps' to three-dimensional immersive landscapes.

...



...



feature grammar



  
  detector song; ## to get the filename
  detector lyrics; ## extracts lyrics
  detector melody; ## extracts melody
  detector check;  ## to walk the tree
  
  atom str name;
  atom str text;
  atom str note;  
  
  midi: song;
  
  song: file lyrics melody check;
  
  file: name;
  
  lyrics: text*;
  melody: note*;
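
In words: a midi object is a song; the detectors decompose a song into a file (identified by its name), lyrics (a sequence of text atoms) and a melody (a sequence of note atoms), after which the check detector walks the resulting parse tree.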
  
  


Internally, a MIDI file such as 'twinkle' is represented by note events, stored as facts of the form:

  event('twinkle',2,time=384, note_on:[chan=2,pitch=72,vol=111]).
  event('twinkle',2,time=768, note_off:[chan=2,pitch=72,vol=100]).
  

melody detector



  /* The melody detector queries the fact base for melody notes and
     adds each note as an atom to the token stream. */
  int melodyDetector(tree *pt, list *tks) {
    char* _result;
    void* q = _query;   /* handle to the embedded query interface */
    int idq = 0;

    /* evaluate the query X:melody(X) against the extracted facts */
    idq = query_eval(q, "X:melody(X)");

    /* each result is a note; hand it to the parser as a token */
    while ((_result = query_result(q, idq))) {
      putAtom(tks, "note", _result);
    }
    return SUCCESS;
  }
  

...



...



prediction techniques


definition(s)


guided tour(s)


...



6. content annotation

concepts


technology


projects & further reading

As a project, consider implementing musical similarity matching, or developing an application that retrieves video fragments using a simple annotation logic.
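
As a starting point for the first project, here is a minimal sketch in Python of melodic similarity matching, comparing melodies represented as lists of MIDI pitches by their edit distance; this representation is an assumption, one of many possible:

   def edit_distance(a, b):
       """Dynamic-programming edit distance between two sequences."""
       d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
            for i in range(len(a) + 1)]
       for i in range(1, len(a) + 1):
           for j in range(1, len(b) + 1):
               d[i][j] = min(d[i - 1][j] + 1,                      # deletion
                             d[i][j - 1] + 1,                      # insertion
                             d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
       return d[len(a)][len(b)]

   twinkle = [60, 60, 67, 67, 69, 69, 67]   # C C G G A A G
   variant = [60, 60, 67, 67, 69, 67, 67]
   print(edit_distance(twinkle, variant))   # small distance = similar melodies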

You may further explore the construction of media repositories, and how to strike a balance between automatic indexing, content-based search and meta-information.

For further reading, I advise you to google recent research on video analysis, and to consult the online material on search engines.

the artwork

  1. works from  [Design]
  2. faces -- from www.alterfin.org, an interesting site with many surprising interactive toys in flash, javascript and html.
  3. mouth -- Annika Karlson Rixon, entitled A slight Acquaintance, taken from a theme article about the body in art and science, the Volkskrant, 24/03/05.
  4. story -- page from the comic book version of City of Glass, [Glass], drawn in an almost traditional style.
  5. story -- frame from  [Glass].
  6. story -- frame from  [Glass].
  7. story -- frame from  [Glass].
  8. white on white -- typographical joke.
  9. modern art -- city of light (1968-69), Mario Merz, taken from  [Modern].
  10. modern art -- Marocco (1972), Krijn Griezen, taken from  [Modern].
  11. modern art -- Indestructible Object (1958), Man Ray, Blue, Green, Red I (1964-65), Ellsworth Kelly, Great American Nude (1960), T. Wesselman, taken from [Modern].
  12. signs -- sports,  [Signs], p. 272, 273.