introduction multimedia


5. content annotation




Current technology does not allow us to extract information automatically from arbitrary media objects. In such cases, at least for the time being, we need to assist search by annotating content with what is commonly referred to as meta-information.

In this chapter, we will look at two more media types, in particular audio and video. Studying audio, we will learn how feature extraction and meta-information may be combined to define a data model that allows for search. Studying video, on the other hand, will show the complexity of devising a knowledge representation scheme that captures the content of video fragments.

Concluding this chapter, we will discuss an architecture for feature extraction from arbitrary media objects. As an example, look at the (simple) feature grammar below, specifying the structure of a hypothetical community.


  detector world;    ## finds the name of the world
  detector people;   ## checks name, eliminates institutes
  detector company;  ## looks if there are at least two persons

  atom str name;

  community: world people company;
  world: name;
  people: person*;
  person: name;
A community consists of people, and is a community only if it allows the people to be in each other's company.

A community has a name. The actual purpose of this grammar is to select the persons that belong to a particular community from the input, which consists of names of potential community members. Note that the grammar specifies three detectors. These detectors correspond to functions that are invoked when expanding the corresponding non-terminal in the grammar. An example of a detector function is the personDetector function partially specified below.

  int personDetector(tree *pt, list *tks) {
      ...
      q = query_query("kit=pl src=check.pl");

      while (t = next_token(tks)) {
          sprintf(buf, "person(%s)", t);
          query_eval(q, buf);
          if (query_result(q, 0))
              /* put name(person) on token stream */
              putAtom(tks, "name", t);
      }
      ...
  }
The personDetector function checks, for each token on the input token stream tks, whether the token corresponds to the name of a person belonging to the community. The check is performed by an embedded logic component that contains the information needed to establish whether a person is a member of the community. Note that the query for a single token may result in adding multiple names to the token stream.
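To make the mechanism concrete, here is a minimal, self-contained sketch of the token-stream interface that personDetector relies on. The real ACOI list type, next_token and putAtom are not shown in the text, so the struct below and the membership test (a fixed table standing in for the embedded Prolog query) are assumptions, not the actual implementation.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TOKENS 32

/* hypothetical typed atom, e.g. name("alice") */
typedef struct {
    char type[16];
    char value[32];
} atom_t;

/* hypothetical token stream: a list of atoms with a read cursor */
typedef struct {
    atom_t items[MAX_TOKENS];
    int count;    /* number of atoms on the stream */
    int cursor;   /* next atom to be consumed */
} list;

static const char *next_token(list *tks) {
    return tks->cursor < tks->count ? tks->items[tks->cursor++].value : NULL;
}

static void putAtom(list *tks, const char *type, const char *value) {
    atom_t *a = &tks->items[tks->count++];
    strcpy(a->type, type);
    strcpy(a->value, value);
}

/* stand-in for the embedded logic component consulted via query_eval */
static int is_person(const char *t) {
    return strcmp(t, "alice") == 0 || strcmp(t, "sebastiaan") == 0;
}

/* simplified personDetector: consume the input tokens and re-emit
   those that denote persons as name atoms, as in the fragment above */
static int personDetector(list *tks) {
    int end = tks->count;  /* do not re-consume the atoms we append */
    while (tks->cursor < end) {
        const char *t = next_token(tks);
        if (is_person(t))
            putAtom(tks, "name", t);  /* put name(person) on token stream */
    }
    return 1;
}
```

Running this on an input stream containing alice, cwi and sebastiaan appends two name atoms: cwi is rejected, mirroring the "eliminates institutes" remark in the grammar.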

The companyDetector differs from the personDetector in that it needs to inspect the complete parse tree to see whether the (implicit) company predicate is satisfied.
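One way to read the (implicit) company predicate is "the parse tree contains at least two persons". The sketch below illustrates that reading; the tree type and its labels are invented for the example, since the actual ACOI tree structure is not given in the text.

```c
#include <string.h>

/* hypothetical parse-tree node; the real ACOI tree type is not shown */
typedef struct tree {
    const char *label;           /* grammar symbol, e.g. "person" */
    struct tree *children[8];
    int nchildren;
} tree;

/* count the nodes labelled "person" anywhere below (and including) pt */
static int countPersons(const tree *pt) {
    int n = (strcmp(pt->label, "person") == 0);
    for (int i = 0; i < pt->nchildren; i++)
        n += countPersons(pt->children[i]);
    return n;
}

/* companyDetector succeeds only if the complete parse tree contains at
   least two persons -- one reading of the implicit company predicate */
static int companyDetector(const tree *pt) {
    return countPersons(pt) >= 2;
}
```

Unlike personDetector, which looks at one token at a time, this detector can only decide after the whole people subtree has been built, which is why it is placed last in the community production.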

When parsing succeeds and the company predicate is satisfied, a given input may result in a sequence of updates of the underlying database, as illustrated below.


  V0 := newoid();
  V1 := newoid();
    community_world.insert(oid(V0),oid(V1));
      world_name.insert(oid(V1),"casa");
    community_people.insert(oid(V0),oid(V1));
  V2 := newoid();
      people_person.insert(oid(V1),oid(V2));
        person_name.insert(oid(V2),"alice");
  V3 := newoid();
      people_person.insert(oid(V1),oid(V3));
        person_name.insert(oid(V3),"sebastiaan");
      ...
  
Evidently, the updates correspond to assigning appropriate values to the attributes of a structured object, reflecting the properties of the given community.
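Such an update sequence can be generated mechanically from the parse result. The sketch below mimics the pattern: newoid hands out fresh object identifiers and each "insert" is logged as a line of text. The logging, the storeCommunity helper, and the choice to give every structured object its own identifier are assumptions for illustration, not the actual ACOI implementation.

```c
#include <stdio.h>
#include <string.h>

static int next_oid = 0;      /* newoid hands out fresh identifiers */
static char updates[2048];    /* generated update sequence, as text */

static int newoid(void) { return next_oid++; }

/* log an object-to-object insert, e.g. community_world */
static void insert_oid(const char *table, int from, int to) {
    char line[80];
    sprintf(line, "%s.insert(oid(V%d),oid(V%d));\n", table, from, to);
    strcat(updates, line);
}

/* log an object-to-string insert, e.g. world_name */
static void insert_str(const char *table, int obj, const char *s) {
    char line[96];
    sprintf(line, "%s.insert(oid(V%d),\"%s\");\n", table, obj, s);
    strcat(updates, line);
}

/* emit the updates for community: world people company, giving every
   structured object -- community, world, people, person -- a fresh oid */
static void storeCommunity(const char *world, const char **members, int n) {
    int community = newoid();
    int w = newoid();
    insert_oid("community_world", community, w);
    insert_str("world_name", w, world);
    int people = newoid();
    insert_oid("community_people", community, people);
    for (int i = 0; i < n; i++) {
        int person = newoid();
        insert_oid("people_person", people, person);
        insert_str("person_name", person, members[i]);
    }
}
```

Calling storeCommunity("casa", ...) with two members produces a sequence of the same shape as the fragment above, with five distinct object identifiers.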

The overall architecture of the ACOI framework is depicted in slide acoi. Taking a feature grammar specification, such as the simple community grammar, as a point of reference, we see that it is related to an actual feature detector (possibly containing an embedded logic component) that is invoked by the Feature Detector Engine when an appropriate media object is presented for indexing.
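The dispatch step of such an engine can be pictured as a table binding each detector declared in the grammar to a function, looked up by non-terminal name when that non-terminal is expanded during parsing. All names in the sketch below are illustrative; the actual ACOI interfaces are not given in the text.

```c
#include <stddef.h>
#include <string.h>

/* a detector takes the parse tree so far and the token stream, and
   returns non-zero on success (types left opaque in this sketch) */
typedef int (*detector_fn)(void *parse_tree, void *tokens);

/* trivial stubs standing in for the real detector functions */
static int worldDetector(void *pt, void *tks)   { (void)pt; (void)tks; return 1; }
static int peopleDetector(void *pt, void *tks)  { (void)pt; (void)tks; return 1; }
static int companyDetector(void *pt, void *tks) { (void)pt; (void)tks; return 1; }

/* one binding per "detector X;" declaration in the feature grammar */
static struct {
    const char *nonterminal;
    detector_fn fn;
} bindings[] = {
    { "world",   worldDetector },
    { "people",  peopleDetector },
    { "company", companyDetector },
};

/* invoke the detector bound to a non-terminal; 0 means no detector */
static int invoke_detector(const char *nt, void *pt, void *tks) {
    for (size_t i = 0; i < sizeof bindings / sizeof bindings[0]; i++)
        if (strcmp(bindings[i].nonterminal, nt) == 0)
            return bindings[i].fn(pt, tks);
    return 0;
}
```

Non-terminals without a detector, such as person or name, are expanded by the parser alone; only the declared detectors trigger a call-out.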


questions

5. content annotation

concepts


technology




eliens@cs.vu.nl

draft version 1 (16/5/2003)