introduction multimedia


5. content annotation




Current technology does not allow us to extract information automatically from arbitrary media objects. In such cases, at least for the time being, we need to assist search by annotating content with what is commonly referred to as meta-information.

In this chapter, we will look at two more media types, in particular audio and video. Studying audio, we will learn how feature extraction and meta-information may be combined to define a data model that allows for search. Studying video, on the other hand, will show the complexity of devising a knowledge representation scheme that captures the content of video fragments.

Concluding this chapter, we will discuss an architecture for feature extraction from arbitrary media objects. As an example, look at the (simple) feature grammar below, specifying the structure of a hypothetical community.


  detector world;    ## finds the name of the world
  detector people;   ## checks name, eliminates institutes
  detector company;  ## looks if there are at least two persons

  atom str name;

  community: world people company;
  world: name;
  people: person*;
  person: name;
A community consists of people, and is a community only if it allows the people to be in each other's company.

A community has a name. The actual purpose of this grammar is to select the persons that belong to a particular community from the input, which consists of names of potential community members. Note that the grammar specifies three detectors. These detectors correspond to functions that are invoked when expanding the corresponding non-terminal in the grammar. An example of a detector function is the personDetector function partially specified below.

  int personDetector(tree *pt, list *tks) {
      ...
      q = query_query("kit=pl src=check.pl");

      while (t = next_token(tks)) {
          sprintf(buf, "person(%s)", t);
          query_eval(q, buf);
          if (query_result(q, 0))
              /* put name(person) on token stream */
              putAtom(tks, "name", t);
      }
      ...
  }
The personDetector function checks, for each token on the input token stream tks, whether the token corresponds to the name of a person belonging to the community. The check is performed by an embedded logic component that contains the information needed to establish whether a person is a member of the community. Note that the query for a single token may result in adding multiple names to the token stream.
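To make the mechanism concrete, here is a minimal, self-contained sketch of the token-stream interface that personDetector relies on. The real ACOI list type, next_token and putAtom are not shown in the text, so the struct below and the membership test (a fixed table standing in for the embedded Prolog query) are assumptions, not the actual implementation.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TOKENS 32

/* hypothetical typed atom, e.g. name("alice") */
typedef struct {
    char type[16];
    char value[32];
} atom_t;

/* hypothetical token stream: a list of atoms with a read cursor */
typedef struct {
    atom_t items[MAX_TOKENS];
    int count;    /* number of atoms on the stream */
    int cursor;   /* next atom to be consumed */
} list;

static const char *next_token(list *tks) {
    return tks->cursor < tks->count ? tks->items[tks->cursor++].value : NULL;
}

static void putAtom(list *tks, const char *type, const char *value) {
    atom_t *a = &tks->items[tks->count++];
    strcpy(a->type, type);
    strcpy(a->value, value);
}

/* stand-in for the embedded logic component consulted via query_eval */
static int is_person(const char *t) {
    return strcmp(t, "alice") == 0 || strcmp(t, "sebastiaan") == 0;
}

/* simplified personDetector: consume the input tokens and re-emit
   those that denote persons as name atoms, as in the fragment above */
static int personDetector(list *tks) {
    int end = tks->count;  /* do not re-consume the atoms we append */
    while (tks->cursor < end) {
        const char *t = next_token(tks);
        if (is_person(t))
            putAtom(tks, "name", t);  /* put name(person) on token stream */
    }
    return 1;
}
```

Running this on an input stream containing alice, cwi and sebastiaan appends two name atoms: cwi is rejected, mirroring the "eliminates institutes" remark in the grammar.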

The companyDetector differs from the personDetector in that it needs to inspect the complete parse tree to see whether the (implicit) company predicate is satisfied.
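One way to read the (implicit) company predicate is "the parse tree contains at least two persons". The sketch below illustrates that reading; the tree type and its labels are invented for the example, since the actual ACOI tree structure is not given in the text.

```c
#include <string.h>

/* hypothetical parse-tree node; the real ACOI tree type is not shown */
typedef struct tree {
    const char *label;           /* grammar symbol, e.g. "person" */
    struct tree *children[8];
    int nchildren;
} tree;

/* count the nodes labelled "person" anywhere below (and including) pt */
static int countPersons(const tree *pt) {
    int n = (strcmp(pt->label, "person") == 0);
    for (int i = 0; i < pt->nchildren; i++)
        n += countPersons(pt->children[i]);
    return n;
}

/* companyDetector succeeds only if the complete parse tree contains at
   least two persons -- one reading of the implicit company predicate */
static int companyDetector(const tree *pt) {
    return countPersons(pt) >= 2;
}
```

Unlike personDetector, which looks at one token at a time, this detector can only decide after the whole people subtree has been built, which is why it is placed last in the community production.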

When parsing succeeds and the company predicate is satisfied, a given input may result in a sequence of updates of the underlying database, as illustrated below.


  V0 := newoid();
  V1 := newoid();
    community_world.insert(oid(V0),oid(V1));
      world_name.insert(oid(V1),"casa");
    community_people.insert(oid(V0),oid(V1));
  V2 := newoid();
      people_person.insert(oid(V1),oid(V2));
        person_name.insert(oid(V2),"alice");
  V3 := newoid();
      people_person.insert(oid(V1),oid(V3));
        person_name.insert(oid(V3),"sebastiaan");
      ...
  
Evidently, the updates correspond to assigning appropriate values to the attributes of a structured object, reflecting the properties of the given community.
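Such an update sequence can be generated mechanically from the parse result. The sketch below mimics the pattern: newoid hands out fresh object identifiers and each "insert" is logged as a line of text. The logging, the storeCommunity helper, and the choice to give every structured object its own identifier are assumptions for illustration, not the actual ACOI implementation.

```c
#include <stdio.h>
#include <string.h>

static int next_oid = 0;      /* newoid hands out fresh identifiers */
static char updates[2048];    /* generated update sequence, as text */

static int newoid(void) { return next_oid++; }

/* log an object-to-object insert, e.g. community_world */
static void insert_oid(const char *table, int from, int to) {
    char line[80];
    sprintf(line, "%s.insert(oid(V%d),oid(V%d));\n", table, from, to);
    strcat(updates, line);
}

/* log an object-to-string insert, e.g. world_name */
static void insert_str(const char *table, int obj, const char *s) {
    char line[96];
    sprintf(line, "%s.insert(oid(V%d),\"%s\");\n", table, obj, s);
    strcat(updates, line);
}

/* emit the updates for community: world people company, giving every
   structured object -- community, world, people, person -- a fresh oid */
static void storeCommunity(const char *world, const char **members, int n) {
    int community = newoid();
    int w = newoid();
    insert_oid("community_world", community, w);
    insert_str("world_name", w, world);
    int people = newoid();
    insert_oid("community_people", community, people);
    for (int i = 0; i < n; i++) {
        int person = newoid();
        insert_oid("people_person", people, person);
        insert_str("person_name", person, members[i]);
    }
}
```

Calling storeCommunity("casa", ...) with two members produces a sequence of the same shape as the fragment above, with five distinct object identifiers.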

The overall architecture of the ACOI framework is depicted in slide acoi. Taking a feature grammar specification, such as the simple community grammar, as a point of reference, we see that it is related to an actual feature detector (possibly containing an embedded logic component) that is invoked by the Feature Detector Engine when an appropriate media object is presented for indexing.
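The dispatch step of such an engine can be pictured as a table binding each detector declared in the grammar to a function, looked up by non-terminal name when that non-terminal is expanded during parsing. All names in the sketch below are illustrative; the actual ACOI interfaces are not given in the text.

```c
#include <stddef.h>
#include <string.h>

/* a detector takes the parse tree so far and the token stream, and
   returns non-zero on success (types left opaque in this sketch) */
typedef int (*detector_fn)(void *parse_tree, void *tokens);

/* trivial stubs standing in for the real detector functions */
static int worldDetector(void *pt, void *tks)   { (void)pt; (void)tks; return 1; }
static int peopleDetector(void *pt, void *tks)  { (void)pt; (void)tks; return 1; }
static int companyDetector(void *pt, void *tks) { (void)pt; (void)tks; return 1; }

/* one binding per "detector X;" declaration in the feature grammar */
static struct {
    const char *nonterminal;
    detector_fn fn;
} bindings[] = {
    { "world",   worldDetector },
    { "people",  peopleDetector },
    { "company", companyDetector },
};

/* invoke the detector bound to a non-terminal; 0 means no detector */
static int invoke_detector(const char *nt, void *pt, void *tks) {
    for (size_t i = 0; i < sizeof bindings / sizeof bindings[0]; i++)
        if (strcmp(bindings[i].nonterminal, nt) == 0)
            return bindings[i].fn(pt, tks);
    return 0;
}
```

Non-terminals without a detector, such as person or name, are expanded by the parser alone; only the declared detectors trigger a call-out.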


questions

5. content annotation

concepts


technology




eliens@cs.vu.nl

draft version 1 (16/5/2003)