learning objectives

After reading this chapter you should be able to describe scenarios for information retrieval, to explain how content analysis for images can be done, to characterize similarity metrics, to define the notions of recall and precision, and to give an example of frequency tables, as used in text search.

Searching for information on the web is cumbersome. Given our experiences today, we may not even want to think about searching for multimedia information on the (multimedia) web.

Nevertheless, in this chapter we will briefly sketch one of the possible scenarios indicating the need for multimedia search. In fact, once we have the ability to search for multimedia information, many scenarios could be thought of.

As a start, we will look at two media types, images and documents. We will study search for images, because it teaches us important lessons about content analysis of media objects and what we may consider as being similar. Perhaps surprisingly, we will study text documents because, due to our familiarity with this media type, text documents allow us to determine what we may understand by effective search.

...

Amsterdam Drugport

Amsterdam is an international centre of traffic and trade. It is renowned for its culture and liberal attitude, and attracts tourists of all ages, including young tourists drawn by the availability of soft drugs. Soft drugs may be obtained at so-called coffeeshops, and the possession of limited amounts of soft drugs is tolerated by the authorities.

The European Community, however, has expressed its concern that Amsterdam is the centre of an international criminal drug operation. Combining national and international police units, a team is formed to start an exhaustive investigation, under the code name Amsterdam Drugport.

information

video surveillance -- monitoring
telephone wiretaps -- audio recording
photography -- archive
documents -- investigations
transactions -- structured data
geographic information -- locations, routes

media types

images -- photos
video -- surveillance
audio -- interviews, phone tracks
documents -- forensic, reports
handwriting -- notes
structured data -- transactions

retrieval

image query -- all images with this person
audio query -- identity of speaker
text query -- all transactions with BANK Inc.
video query -- all segments with victim
complex queries -- convicted murderers with BANK transactions
heterogeneous queries -- photograph + murderer + transaction
complex heterogeneous queries -- in contact with + murderer + transaction

...

information retrieval

Information retrieval, according to [IR], deals with the representation, storage, organisation of, and access to information items.

To see what is involved, imagine that we have a (user) query like:

find me the pages containing information on ...

information retrieval models

boolean or set-theoretic models
vector or algebraic models
probabilistic models

vector models

attribute term weighting scheme improves performance
partial matching strategy allows retrieval of approximate material
metric distance allows for sorting according to degree of similarity
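
To make the three properties above concrete, the following is a minimal sketch of a vector model. It assumes tf-idf term weighting and cosine similarity, which are common choices but are not prescribed by the text; the documents, the query and all function names are made up for the illustration.

```python
import math
from collections import Counter

def build_index(docs):
    """Tokenise the documents and count in how many documents each term occurs."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return tokenized, df

def weight(tokens, df, n):
    """Term weighting: term frequency times inverse document frequency."""
    tf = Counter(tokens)
    return {t: tf[t] * math.log(n / df[t]) for t in tf if t in df}

def cosine(u, v):
    """Cosine of the angle between two sparse weight vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(query, docs):
    """Return (similarity, document index) pairs, most similar first."""
    tokenized, df = build_index(docs)
    n = len(docs)
    q = weight(query.lower().split(), df, n)
    scored = [(cosine(q, weight(tokens, df, n)), i) for i, tokens in enumerate(tokenized)]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))

# Made-up documents in the spirit of the Amsterdam Drugport scenario.
docs = [
    "police report on bank transactions",
    "surveillance video of the harbour",
    "bank transactions and phone records",
]
ranking = rank("bank video", docs)
```

No document contains both query terms, yet every document receives a non-zero score (partial matching), and the rarer term "video" counts for more than the more frequent term "bank" (term weighting).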

image query

obtaining descriptive information
establishing similarity

content-based description

objects in image
shape descriptor -- shape/region of object
property description -- cells in image

shape

bounding box -- (XLB,XUB,YLB,YUB)

property

property -- name=value

example


  shape descriptor: XLB=10; XUB=60; YLB=3; YUB=50   (rectangle)  
  property descriptor: pixel(14,7): R=5; G=1; B=3

definitions

image grid: $(m * n)$ cells of equal size
cell property: (Name, Value, Method)

example


  property: (bwcolor,{b,w},bwalgo)
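
As an illustration of the (Name, Value, Method) triple, the sketch below represents a cell property as a small record and applies it to a 2 x 2 image grid. The thresholding rule inside `bwalgo` is an assumption; the text only names the method, it does not define it.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B)
Cell = List[Pixel]                    # all pixels that fall into one grid cell

@dataclass(frozen=True)
class CellProperty:
    name: str                          # e.g. "bwcolor"
    values: FrozenSet[str]             # admissible values, e.g. {"b", "w"}
    method: Callable[[Cell], str]      # computes the value for one cell

def bwalgo(cell: Cell) -> str:
    """Assumed rule: 'w' if the mean brightness is at least 128, otherwise 'b'."""
    mean = sum(sum(p) / 3 for p in cell) / len(cell)
    return "w" if mean >= 128 else "b"

bwcolor = CellProperty("bwcolor", frozenset({"b", "w"}), bwalgo)

def describe(grid: List[List[Cell]], prop: CellProperty) -> List[List[str]]:
    """Apply the property's method to every cell of an m x n grid."""
    result = [[prop.method(cell) for cell in row] for row in grid]
    if not all(v in prop.values for row in result for v in row):
        raise ValueError("method produced a value outside the declared value set")
    return result

dark, light = [(5, 1, 3)] * 4, [(250, 240, 245)] * 4
grid = [[dark, light], [light, dark]]
labels = describe(grid, bwcolor)
```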

...

similarity-based retrieval

How do we determine whether the content of a segment (of a segmented image) is similar to another image (or set of images)?

solutions

metric approach -- distance between two image objects
transformation approach -- relative to specification

metric approach

a function $d: X \times X \to [0,1]$ is a distance measure if, for all x, y, z in X:


  d(x,y) = d(y,x)
  d(x,y) <= d(x,z) + d(z,y)
  d(x,x) = 0
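
The three conditions can be tested mechanically on a finite sample. The sketch below does so for two candidate functions on grey values; both functions and the sample are made up. Note that such a test can only refute a candidate, it cannot prove that the conditions hold for all of X.

```python
from itertools import product

def is_distance_measure(d, sample, tol=1e-12):
    """Check range, symmetry, triangle inequality and d(x,x) = 0 on a finite sample."""
    for x, y, z in product(sample, repeat=3):
        if not (0.0 <= d(x, y) <= 1.0):
            return False
        if abs(d(x, y) - d(y, x)) > tol:          # symmetry
            return False
        if d(x, y) > d(x, z) + d(z, y) + tol:     # triangle inequality
            return False
    return all(abs(d(x, x)) <= tol for x in sample)

def grey_distance(a, b):
    """Normalised absolute difference between two grey values in 0..255."""
    return abs(a - b) / 255.0

def squared_grey_distance(a, b):
    """Squaring keeps symmetry but breaks the triangle inequality."""
    return grey_distance(a, b) ** 2

sample = [0, 64, 128, 255]
```

With this sample, `grey_distance` passes, whereas `squared_grey_distance` fails: going from 0 to 255 via 128 is cheaper than going directly.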

pixel properties

objects with pixel properties $p_1,...,p_n$
pixels: $(x,y,v_1,...,v_n)$
object contains w x h (n+2)-tuples

complexity

a set of points in k-dimensional space for k = n + 2

feature extraction

maps object into s-dimensional space
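
A grey-value histogram is perhaps the simplest example of such a map: it reduces w x h pixel tuples to a vector with s entries, one per bin. The sketch below assumes a single pixel property (n = 1, a grey value in 0..255) and uses half the L1 distance between histograms; both choices are illustrative assumptions.

```python
def make_image(values, width=4):
    """Turn a flat list of grey values into (x, y, v) pixel tuples."""
    return [(i % width, i // width, v) for i, v in enumerate(values)]

def grey_histogram(pixels, bins=4):
    """Feature extraction: map (x, y, v) tuples to a normalised histogram with `bins` entries."""
    counts = [0] * bins
    for _x, _y, v in pixels:
        counts[v * bins // 256] += 1
    return [c / len(pixels) for c in counts]

def histogram_distance(h1, h2):
    """Half the L1 distance between two normalised histograms; the result lies in [0, 1]."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))

image_a = make_image([0, 10, 20, 200, 210, 220, 230, 240])
image_b = make_image([5, 15, 25, 35, 205, 215, 225, 235])
image_c = make_image([100] * 8)
ha, hb, hc = (grey_histogram(img) for img in (image_a, image_b, image_c))
```

Images a and b have similar proportions of dark and light pixels and end up close together (distance 0.125), while the uniformly grey image c is at the maximal distance 1 from a. The positions x and y are ignored, which is exactly the information loss one accepts in exchange for a low-dimensional description.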

...

transformation approach

Given two objects o1 and o2, the level of dissimilarity is proportional to the (minimum) cost of transforming object o1 into object o2 or vice versa

transformation operators


    $to_1,...,to_r$  -- translation, rotation, scaling

cost

$cost(TS) = \sum_{i=1}^{r} cost(to_{i})$

distance

$d(o,o') = \min \{ cost(TS) \mid TS \in TSeq(o,o') \}$

advantages

user-defined similarity -- choice of transformation operators
user-defined cost-function
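
The minimum over all transformation sequences can be computed with a shortest-path search when the operators are discrete. The sketch below is one way to do this under strong simplifying assumptions: an object is reduced to a state (x, y, angle), the operators are unit translations and quarter turns, and the costs (1 and 2) are arbitrary user choices. Scaling is left out.

```python
import heapq

# Assumed operators on a state (x, y, angle), each with a user-defined cost.
OPERATORS = [
    ("translate", lambda s: (s[0] + 1, s[1], s[2]), 1.0),
    ("translate", lambda s: (s[0] - 1, s[1], s[2]), 1.0),
    ("translate", lambda s: (s[0], s[1] + 1, s[2]), 1.0),
    ("translate", lambda s: (s[0], s[1] - 1, s[2]), 1.0),
    ("rotate", lambda s: (s[0], s[1], (s[2] + 90) % 360), 2.0),
    ("rotate", lambda s: (s[0], s[1], (s[2] - 90) % 360), 2.0),
]

def transformation_distance(o1, o2, operators=OPERATORS):
    """Minimum total cost of an operator sequence turning o1 into o2 (uniform-cost search)."""
    best = {o1: 0.0}
    queue = [(0.0, o1)]
    while queue:
        cost, state = heapq.heappop(queue)
        if state == o2:
            return cost
        if cost > best.get(state, float("inf")):
            continue                                  # stale queue entry
        for _name, apply, op_cost in operators:
            nxt = apply(state)
            new_cost = cost + op_cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt))
    return float("inf")

d = transformation_distance((0, 0, 0), (2, 1, 90))
```

Here the cheapest sequence consists of three translations and one rotation, so d is 5.0. Changing the cost of an operator changes what counts as similar, which is the point of the user-defined cost function.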

operations


   rotate(image-id,dir,angle)
   segment(image-id, predicate)
   edit(image-id, edit-op)

...

image repository

storage -- unsegmented images
description -- limited set of features
index -- feature-based index
retrieval -- distance between feature vectors

mission

Our goal is to study aspects of the deployment and architecture of virtual environments as an interface to (intelligent) multimedia information systems ...

...

query

document database + string matching

problems

synonymy -- topic T does not occur literally in document D
polysemy -- some words may have many meanings

...

effective search

precision -- how many answers are correct
recall -- how many of the right documents are returned

precision and recall


  precision = ( returned and relevant ) / returned  
  recall = ( returned and relevant ) / relevant

anomalies

return all documents: perfect recall, low precision
return 'nothing': 'perfect' precision, low recall
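
The two ratios and both anomalies can be reproduced with plain sets. In the sketch below the collection and the set of relevant documents are made up, and the precision of an empty answer is taken to be 1 by convention, since the ratio itself is undefined in that case.

```python
def precision_recall(returned, relevant):
    """Compute (precision, recall) for a returned set against the set of relevant documents."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    # Convention (an assumption): an empty answer contains no wrong documents.
    precision = len(hits) / len(returned) if returned else 1.0
    recall = len(hits) / len(relevant) if relevant else 1.0
    return precision, recall

collection = {f"d{i}" for i in range(10)}
relevant = {"d1", "d2", "d3", "d4"}

typical = precision_recall({"d1", "d2", "d7"}, relevant)
everything = precision_recall(collection, relevant)    # perfect recall, low precision
nothing = precision_recall(set(), relevant)            # 'perfect' precision, no recall
```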

example

term/document	d0	d1	d2
snacks	1	0	0
drinks	1	0	3
rock-roll	0	1	1

complexity

compare term frequencies per document -- O(M*N)
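
A frequency table such as the one above can be built directly from the text of the documents. In the sketch below the three document texts are invented so that they reproduce the table; the scoring function, which simply adds up the frequencies of the query terms, is an assumption made for the illustration.

```python
from collections import Counter

# Invented texts, chosen so that the counts match the example table.
docs = {
    "d0": "snacks drinks",
    "d1": "rock-roll",
    "d2": "drinks drinks drinks rock-roll",
}

def frequency_table(docs):
    """Return {term: {doc_id: count}} for every term in the collection."""
    counts = {doc_id: Counter(text.split()) for doc_id, text in docs.items()}
    terms = sorted(set().union(*counts.values()))
    return {t: {d: counts[d][t] for d in docs} for t in terms}

def score(query, table, doc_ids):
    """Add up the frequencies of the query terms per document."""
    return {d: sum(table.get(t, {}).get(d, 0) for t in query.split()) for d in doc_ids}

table = frequency_table(docs)
scores = score("drinks rock-roll", table, docs)
```

The table has one entry for each of the M terms in each of the N documents, which is where the O(M*N) cost of comparing term frequencies per document comes from.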

reduction

stop list -- irrelevant words
word stems -- reduce different words to relevant part
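
Both reductions shrink the number of terms M. The sketch below uses a tiny illustrative stop list and a deliberately naive suffix stripper; neither is a real stop list or a real stemming algorithm, they only show the idea.

```python
STOP_LIST = {"the", "a", "of", "and", "in", "on", "to"}   # illustrative, far from complete
SUFFIXES = ("ing", "ed", "es", "s")                       # naive suffix stripping

def stem(word):
    """Strip the first matching suffix, provided at least three letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Lower-case the text, drop stop words and reduce the remaining words to stems."""
    return [stem(w) for w in text.lower().split() if w not in STOP_LIST]

terms = index_terms("The drinks and the snacks of the drinking tourists")
```

Nine words are reduced to four index terms, and "drinks" and "drinking" now count as occurrences of the same term.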

...

user-oriented measures

coverage ratio -- fraction of known documents
novelty ratio -- fraction of new (relevant) documents
relative recall -- fraction of expected documents
recall effort -- fraction of examined documents
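
The following sketch computes the four measures for a made-up answer set. It assumes the usual set-based reading of these measures: coverage and novelty compare the answer with the relevant documents the user already knew, while relative recall and recall effort are taken relative to the number of relevant documents the user expected to find. Empty denominators are not handled.

```python
def coverage_ratio(answer, known_relevant):
    """Fraction of the relevant documents known to the user that were retrieved."""
    return len(answer & known_relevant) / len(known_relevant)

def novelty_ratio(answer, relevant, known_relevant):
    """Fraction of the retrieved relevant documents that are new to the user."""
    retrieved_relevant = answer & relevant
    return len(retrieved_relevant - known_relevant) / len(retrieved_relevant)

def relative_recall(answer, relevant, expected):
    """Relevant documents found, relative to the number the user expected to find."""
    return len(answer & relevant) / expected

def recall_effort(expected, examined):
    """Relevant documents expected, relative to the number of documents examined."""
    return expected / examined

relevant = {"d1", "d2", "d3", "d4", "d5"}
known = {"d1", "d2", "d3"}              # relevant documents the user already knew
answer = {"d1", "d2", "d4", "d9"}

coverage = coverage_ratio(answer, known)
novelty = novelty_ratio(answer, relevant, known)
rel_recall = relative_recall(answer, relevant, expected=4)
effort = recall_effort(expected=4, examined=8)
```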

...


(a) context	(b) self-reflection

Aesthetics

intentions -- motives of the artist
expression -- where form takes over
representation -- the relation of art to reality

...

5. information retrieval

(*) What is meant by the complementarity of authoring and retrieval? Sketch a possible scenario of (multimedia) information retrieval and indicate how this may be implemented. Discuss the issues that arise in accessing multimedia information and how content annotation may be deployed.

concepts

technology

projects & further reading

As a project, you may implement simple image analysis algorithms that, for example, extract a color histogram, or detect the presence of a horizon-like edge.

You may further explore scenarios for information retrieval in the cultural heritage domain, and compare this with other applications of multimedia information retrieval, for example monitoring in hospitals.

For further reading I suggest that you familiarize yourself with common techniques in information retrieval as described in [IR], and perhaps devote some time to studying image analysis, [Image].

the artwork

artworks -- ..., Miro, Dali, photographed from Kunstsammlung Nordrhein-Westfalen, see artwork 2.
left: Miro, from [Kunst]; right: Karel Appel
match of the day (1) -- Geert Mul
match of the day (2) -- Geert Mul
match of the day (3) -- Geert Mul
mario ware -- taken from gammo/veronica.
baten kaitos -- eternal ways and the lost ocean, taken from gammo/veronica.
idem.
PANORAMA -- screenshots from field test.
signs -- people, [Signs], p. 252, 253.

6 content annotation

video annotation requires a logical approach to story telling

content annotation

audio

video

feature extraction

expert recommendation(s)

learning objectives

After reading this chapter you should be able to explain the difference between content and meta information, to mention relevant content parameters for audio, to characterize the requirements for video libraries, to define an annotation logic for video, and to discuss feature extraction in samples of musical material.

Current technology does not allow us to extract information automatically from arbitrary media objects. In these cases, at least for the time being, we need to assist search by annotating content with what is commonly referred to as meta-information.

In this chapter, we will look at two more media types, in particular audio and video. Studying audio, we will learn how we may combine feature extraction and meta-information to define a data model that allows for search. Studying video, on the other hand, will indicate the complexity of devising a knowledge representation scheme that captures the content of video fragments.

Concluding this chapter, we will discuss an architecture for feature extraction for arbitrary media objects.

...

audio databases

audio signals -- compression, discrete representation
musical patterns -- similarity-based retrieval

audio data model

meta-data -- describing content
features -- using feature extraction

example


   singers -- (Opera,Role,Person)
   score -- ...
   transcript -- ...

signal-based content

audio data -- $\Phi(x)$ over time x
wave -- period T, frequency $f = 1/T$
velocity -- $v = w/T = w * f$ , with $w$ wavelength
amplitude -- a

windowing

break signal up in small windows of time

feature extraction

intensity -- watts/ $m^2$
loudness -- in decibels
pitch -- from frequency and amplitude
brightness -- amount of distortion
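
As a small illustration of windowing followed by feature extraction, the sketch below generates a sine wave, cuts it into windows and computes two features per window. It assumes a level in decibels relative to a full-scale amplitude of 1.0 (a signal level, not perceived loudness) and a crude pitch estimate that counts upward zero crossings; real systems use more robust methods. The sample rate and the tone are arbitrary.

```python
import math

def sine_wave(freq_hz, amplitude, sample_rate, seconds):
    """Sample a pure tone: period T = 1 / freq_hz."""
    n = int(sample_rate * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def windows(signal, size):
    """Break the signal up into consecutive, non-overlapping windows."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def rms(window):
    """Root mean square of the samples in one window."""
    return math.sqrt(sum(x * x for x in window) / len(window))

def level_db(window, reference=1.0):
    """Level in decibels relative to `reference` (an assumed full-scale amplitude)."""
    return 20 * math.log10(rms(window) / reference)

def zero_crossing_pitch(window, sample_rate):
    """Crude pitch estimate in Hz: upward zero crossings per second."""
    ups = sum(1 for a, b in zip(window, window[1:]) if a < 0 <= b)
    return ups * sample_rate / len(window)

rate = 8000
signal = sine_wave(440.0, 0.5, rate, 1.0)
frames = windows(signal, 800)             # ten windows of 0.1 s each
level = level_db(frames[0])
pitch = zero_crossing_pitch(frames[0], rate)
```

For the first window the level is about -9 dB and the pitch estimate lies within a few percent of 440 Hz; the estimate is quantised because only whole crossings inside the window are counted.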

...

video annotation

what are the interesting aspects?
how do we represent this information?

video content


  video v, frame f 
  f has associated objects and activities 
  objects and activities have properties

property


  property: name = value

object schema


   (fd,fi) -- frame-dependent and frame-independent properties

object instance: (oid,os,ip)

object-id -- oid
object-schema -- os = (fd,fi)
set of statements -- ip: name = v and name = v IN f

example

frame	objects	frame-dependent properties
1	Jane	has(briefcase), at(path)
-	house	door(closed)
-	briefcase
2	Jane	has(briefcase), at(door)
-	Dennis	at(door)
-	house	door(open)
-	briefcase

frame-independent properties

object	frame-independent properties	value
Jane	age	35
	height	170cm
house	address	...
	color	brown
briefcase	color	black
	size	40 x 31

activity

activity name -- id
statements -- role = v

example


   { giver : Person, receiver : Person, item : Object } 
   giver = Jane, receiver = Dennis, item = briefcase
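
The object and activity descriptions above translate almost directly into data structures. The sketch below encodes the two example frames, assuming that frame-dependent statements are stored per frame number and that the activity is called "give"; the text does not name it.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class ObjectInstance:
    oid: str
    frame_independent: Dict[str, str] = field(default_factory=dict)     # name = v
    frame_dependent: Dict[int, Set[str]] = field(default_factory=dict)  # name = v IN f

@dataclass
class Activity:
    name: str                 # "give" below is an assumed name
    roles: Dict[str, str]     # role = v

jane = ObjectInstance("Jane", {"age": "35", "height": "170cm"},
                      {1: {"has(briefcase)", "at(path)"}, 2: {"has(briefcase)", "at(door)"}})
dennis = ObjectInstance("Dennis", {}, {2: {"at(door)"}})
house = ObjectInstance("house", {"color": "brown"}, {1: {"door(closed)"}, 2: {"door(open)"}})
briefcase = ObjectInstance("briefcase", {"color": "black", "size": "40 x 31"}, {1: set(), 2: set()})
objects = [jane, dennis, house, briefcase]

give = Activity("give", {"giver": "Jane", "receiver": "Dennis", "item": "briefcase"})

def objects_in_frame(objects, frame):
    """Object ids that are annotated for the given frame."""
    return sorted(o.oid for o in objects if frame in o.frame_dependent)

def holds(obj, statement, frame):
    """Does the frame-dependent statement hold for the object in this frame?"""
    return statement in obj.frame_dependent.get(frame, set())
```

Separating the two kinds of properties avoids repeating facts such as Jane's age for every frame, at the price of two lookups when a query mixes both.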

...

video libraries


  which videos are in the library 
  what constitutes the content of each video
  what is the location of a particular video

query language for video libraries

segment retrieval -- exchange of briefcase
object retrieval -- all people in v:[s,e]
activity retrieval -- all activities in v:[s,e]
property-based -- find all videos with object oid

VideoSQL


  SELECT -- v:[s,e] 
  FROM -- video:<source><V> 
  WHERE -- term IN funcall

example


  SELECT  vid:[s,e]
  FROM video:VidLib
  WHERE (vid,s,e) IN VideoWithObject(Dennis) AND
  	object IN ObjectsInVideo(vid,s,e) AND
  	object != Dennis AND
  	typeof(object) = Person
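
To see what this query computes, it helps to emulate it over a small annotation store. In the sketch below the store and the type table are hypothetical; only the function names `VideoWithObject`, `ObjectsInVideo` and `typeof` are taken from the query.

```python
# Hypothetical annotations: (video, start, end) -> object ids occurring in that segment.
ANNOTATIONS = {
    ("v1", 0, 120): {"Jane", "Dennis", "briefcase"},
    ("v1", 121, 300): {"Jane", "house"},
    ("v2", 0, 90): {"Dennis", "house"},
}
TYPES = {"Jane": "Person", "Dennis": "Person", "briefcase": "Object", "house": "Object"}

def VideoWithObject(oid):
    """All segments (vid, s, e) in which the object occurs."""
    return [seg for seg, objs in ANNOTATIONS.items() if oid in objs]

def ObjectsInVideo(vid, s, e):
    """All objects annotated for the segment vid:[s,e]."""
    return sorted(ANNOTATIONS.get((vid, s, e), set()))

def typeof(oid):
    return TYPES[oid]

def segments_with_company(oid):
    """Segments in which `oid` occurs together with at least one other person."""
    result = []
    for vid, s, e in VideoWithObject(oid):
        others = [o for o in ObjectsInVideo(vid, s, e)
                  if o != oid and typeof(o) == "Person"]
        if others:
            result.append((vid, s, e))
    return result

answer = segments_with_company("Dennis")
```

Only the first segment of v1 qualifies: in v2 Dennis appears, but without another person.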

...

To improve library access, the Informedia Digital Video Library uses automatic processing to derive descriptors for video. A new extension to the video processing extracts geographic references from these descriptors.

The operational library interface shows the geographic entities addressed in a story, highlighting the regions discussed in the video through a map display synchronized with the video display.

The map can also serve as a query mechanism, allowing users to search the terabyte library for stories taking place in a selected area of interest.

questions

what -- content-related
when -- position on time-continuum
where -- geographic location

More recently, it has been recognized that the process of spatialization -- where a spatial map-like structure is applied to data where no inherent or obvious one does exist -- can provide an interpretable structure to other types of data.

atlas of cyberspace

We present a wide range of spatializations that have employed a variety of graphical techniques and visual metaphors so as to provide striking and powerful images that extend from two-dimensional 'maps' to three-dimensional immersive landscapes.

...

feature grammar


  
  detector song; ## to get the filename
  detector lyrics; ## extracts lyrics
  detector melody; ## extracts melody
  detector check;  ## to walk the tree
  
  atom str name;
  atom str text;
  atom str note;  
  
  midi: song;
  
  song: file lyrics melody check;
  
  file: name;
  
  lyrics: text*;
  melody: note*;


  event('twinkle',2,time=384, note_on:[chan=2,pitch=72,vol=111]).
  event('twinkle',2,time=768, note_off:[chan=2,pitch=72,vol=100]).

melody detector


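  int melodyDetector(tree *pt, list *tks) {
    char buf[1024];          /* scratch buffer, unused in this fragment */
    char *_result;
    void *q = _query;        /* query handle, defined elsewhere */
    int idq = 0;

    /* evaluate the query and emit every binding of X as a 'note' atom */
    idq = query_eval(q, "X:melody(X)");
    while ((_result = query_result(q, idq))) {
      putAtom(tks, "note", _result);
    }
    return SUCCESS;
  }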
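
What the melody query has to deliver can be sketched independently of the detector machinery. The code below takes the two events shown earlier, rewritten as tuples, and pairs each note_on with the matching note_off. The tuple layout is an assumption, the meaning of the second field is not specified in the text, and the note names follow the convention that MIDI pitch 60 is C4.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# (song, second field of the event fact, time, kind, arguments)
events = [
    ("twinkle", 2, 384, "note_on", {"chan": 2, "pitch": 72, "vol": 111}),
    ("twinkle", 2, 768, "note_off", {"chan": 2, "pitch": 72, "vol": 100}),
]

def note_name(pitch):
    """MIDI pitch number to note name, assuming that 60 is C4."""
    return f"{NOTE_NAMES[pitch % 12]}{pitch // 12 - 1}"

def melody(events):
    """Return (note name, duration in ticks) pairs, in order of note_off events."""
    open_notes, notes = {}, []
    for _song, _second, time, kind, args in events:
        key = (args["chan"], args["pitch"])
        if kind == "note_on":
            open_notes[key] = time
        elif kind == "note_off" and key in open_notes:
            notes.append((note_name(args["pitch"]), time - open_notes.pop(key)))
    return notes

notes = melody(events)
```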

...

prediction techniques

social-based -- dependent on (group) rating of item(s)
information-based -- dependent on features of item(s)
hybrid methods -- combining predictors

definition(s)

rating -- a value representing a user's interest
recommendation -- item(s) that might be of interest to the user
regret -- a function to measure the accuracy of recommendations
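
The three kinds of predictors and the notion of regret can be illustrated with a handful of ratings. Everything in the sketch below is an assumption made for the example: ratings are values in [0, 1], the social predictor averages the ratings of other users, the information-based predictor averages the user's own ratings for items that share a feature, the hybrid averages whatever is available, and regret is the absolute prediction error.

```python
RATINGS = {                                   # user -> {item: rating}
    "ann": {"nightwatch": 0.9, "guernica": 0.4},
    "bob": {"nightwatch": 0.7, "sunflowers": 0.8},
    "eve": {"sunflowers": 0.6},
}
FEATURES = {                                  # item -> set of feature values
    "nightwatch": {"rembrandt", "group"},
    "guernica": {"picasso", "group"},
    "sunflowers": {"van gogh", "still life"},
    "self-portrait": {"rembrandt", "portrait"},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else None

def social(user, item):
    """Social-based: average rating that other users gave to the item."""
    return mean(r[item] for u, r in RATINGS.items() if u != user and item in r)

def information(user, item):
    """Information-based: average of the user's own ratings for items sharing a feature."""
    own = RATINGS.get(user, {})
    return mean(r for other, r in own.items()
                if other != item and FEATURES[other] & FEATURES[item])

def hybrid(user, item):
    """Hybrid: average of the predictors that produce a value, or None if neither does."""
    return mean(p for p in (social(user, item), information(user, item)) if p is not None)

def regret(predicted, actual):
    """Absolute difference between the predicted and the actual rating."""
    return abs(predicted - actual)
```

For example, `social("ann", "sunflowers")` is about 0.7, while `information("ann", "self-portrait")` is 0.9 because Ann rated another Rembrandt highly; nobody has rated the self-portrait, so the social predictor has nothing to say about it.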

guided tour(s)

automated (viewpoint) navigation in virtual space,
an animation explaining, for example, the construction of an artwork, or
the (narrative) presentation of a sequence of concept nodes.

...

6. content annotation

(*) How can video information be made accessible? Discuss the requirements for supporting video queries.

concepts

technology

projects & further reading

As a project, think of implementing musical similarity matching, or developing an application retrieving video fragments using a simple annotation logic.

You may further explore the construction of media repositories, and finding a balance between automatic indexing, content search and meta information.

For further reading I advise you to search for recent research on video analysis, and for the online material on search engines.

the artwork

works from [Design]
faces -- from www.alterfin.org, an interesting site with many surprising interactive toys in flash, javascript and html.
mouth -- Annika Karlson Rixon, entitled A slight Acquaintance, taken from a theme article about the body in art and science, the Volkskrant, 24/03/05.
story -- page from the comic book version of City of Glass, [Glass], drawn in an almost traditional style.
story -- frame from [Glass].
story -- frame from [Glass].
story -- frame from [Glass].
white on white -- typographical joke.
modern art -- city of light (1968-69), Mario Merz, taken from [Modern].
modern art -- Marocco (1972), Krijn Griezen, taken from [Modern].
modern art -- Indestructible Object (1958), Man Ray, Blue, Green, Red I (1964-65), Ellsworth Kelly, Great American Nude (1960), T. Wesselman, taken from [Modern].
signs -- sports, [Signs], p. 272, 273.

7 information system architecture

effective retrieval requires visual interfaces

information system architecture

architectural issues

media abstractions

networked multimedia

living in a virtual economy

learning objectives

After reading this chapter you should be able to discuss the considerations that play a role in developing a multimedia information system, characterize an abstract multimedia data format, give examples of multimedia content queries, define the notion of virtual resources, and discuss the requirements for networked virtual environments.

From a system development perspective, a multimedia information system may be considered as a multimedia database, providing storage and retrieval facilities for media objects. Yet, rather than a solution this presents us with a problem, since there are many options to provide such storage facilities and equally many to support retrieval.

In this chapter, we will study the architectural issues involved in developing multimedia information systems, and we will introduce the notion of media abstraction to provide for a uniform approach to arbitrary media objects.

Finally, we will discuss the additional problems that networked multimedia confront us with.

...

issues

multimedia storage and retrieval -- homegrown, third-party and legacy sources
information architecture -- common format, native format, hybrid
media abstraction -- unified indexes, query relaxation

content organisation

autonomy -- index per media type
uniformity -- unified index
hybrid -- media indexes + unified index

Principle of Uniformity

... from a semantical point of view the content of a multimedia source is independent of the source itself, so we may use statements as meta data to provide a description of media objects.

use statements as meta data

$md(o)$ -- metadata associated with media object o

tradeoffs

metadata can be stored using standard relational and OO structures
manipulating metadata is easy
feature extraction is (!) straightforward
is it?

software architecture

a database of media objects, supporting
operations on media objects, and offering
logical views on media objects

information retrieval cycle

specification of the user's information need
translation into query operations
search and retrieval of media objects
ranking according to likelihood or relevance
presentation of results and user feedback
resulting in a possibly modified query

despite high interactivity, access is difficult;
quick response is and will remain important!

...

media abstraction

state -- smallest chunk of media data
feature -- any object in a state
attributes -- characteristics of objects
feature extraction map -- to identify content
relations -- to capture state-dependent information
(inter)relations between 'states' or chunks

example -- image database


  states: { pic1.gif,...,picn.gif } 
  features: names of people 
  extraction: find people in pictures 
  relations: left-of, ...

example -- video database


  states:  set of frames 
  features:  persons and objects
  extraction:  gives features per frame 
  relations:  frame-dependent and frame-independent information
  inter-state relation:  specifies sequences of frames
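
A media abstraction can be written down as a record with exactly these components. The sketch below does so for the image database example, assuming that the feature extraction map is a function from states to sets of features and that relations are stored as named sets of tuples; a lookup table stands in for real image analysis.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Set, Tuple

@dataclass
class MediaAbstraction:
    states: Set[str]                                    # smallest chunks of media data
    features: Set[str]                                  # objects that may occur in a state
    extract: Callable[[str], FrozenSet[str]]            # feature extraction map
    relations: Dict[str, Set[Tuple[str, str, str]]]     # name -> {(feature, feature, state)}

# Hypothetical content; a lookup table replaces actual image analysis.
CONTENT = {
    "pic1.gif": frozenset({"Jane", "Dennis"}),
    "pic2.gif": frozenset({"Jane"}),
}

images = MediaAbstraction(
    states=set(CONTENT),
    features={"Jane", "Dennis"},
    extract=lambda state: CONTENT.get(state, frozenset()),
    relations={"left-of": {("Jane", "Dennis", "pic1.gif")}},
)

def states_with(m, feature):
    """All states of the abstraction in which the feature occurs."""
    return sorted(s for s in m.states if feature in m.extract(s))
```

A video database fits the same record: the states become frames, and an additional inter-state relation would order the frames into sequences.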

simple multimedia database

a finite set $M$ of media abstractions

structured multimedia database

equivalence relations -- to deal with synonymy
partial ordering -- to deal with inheritance
query relaxation -- to please the user
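
How the three ingredients may work together is sketched below. The synonym table, the is-a hierarchy and the index are made up, and relaxation is interpreted here as generalising the query term by one level in the hierarchy per step, which is only one of several possible readings.

```python
SYNONYMS = {"automobile": "car", "auto": "car"}                      # equivalence, by representative
IS_A = {"car": "vehicle", "truck": "vehicle", "vehicle": "object"}   # partial order: child -> parent
INDEX = {"car": {"pic1.gif"}, "truck": {"pic2.gif"}, "vehicle": {"pic3.gif"}}

def canonical(term):
    """Map a term to the representative of its equivalence class."""
    return SYNONYMS.get(term, term)

def descendants(term):
    """The term itself plus everything below it in the hierarchy."""
    children = [c for c, parent in IS_A.items() if parent == term]
    return {term}.union(*(descendants(c) for c in children))

def search(term, relax=0):
    """Look up a feature; each relaxation step generalises the term by one level."""
    term = canonical(term)
    for _ in range(relax):
        term = IS_A.get(term, term)
    hits = set()
    for t in descendants(term):
        hits |= INDEX.get(t, set())
    return hits

strict = search("automobile")
relaxed = search("automobile", relax=1)
```

The strict query finds only the picture indexed under "car"; after one relaxation step the user also gets the truck and the generic vehicle.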

...

SMDS -- functions


  Type: object $\mapsto$ type
  ObjectWithFeatures: $f \mapsto \{ o \mid o \text{ contains } f \}$
  ObjectWithFeaturesAndAttributes: $(f,a,v) \mapsto \{ o \mid o \text{ contains } f \text{ with } a=v \}$
  FeaturesInObject: $o \mapsto \{ f \mid o \text{ contains } f \}$
  FeaturesAndAttributesInObject: $o \mapsto \{ (f,a,v) \mid o \text{ contains } f \text{ with } a=v \}$
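
Each of these functions is a simple lookup once the content of the media objects is available. The sketch below implements them over a small in-memory table; the table and its contents are hypothetical, the function names are those listed above.

```python
# Hypothetical content: object -> (type, {feature: {attribute: value}})
DB = {
    "pic1.gif": ("Image", {"Jane": {"hair": "red"}, "Dennis": {"hair": "grey"}}),
    "pic2.gif": ("Image", {"Jane": {"hair": "red"}}),
    "clip1.mpg": ("Video", {"Dennis": {"hair": "grey"}}),
}

def Type(o):
    return DB[o][0]

def ObjectWithFeatures(f):
    """All objects that contain feature f."""
    return {o for o, (_type, feats) in DB.items() if f in feats}

def ObjectWithFeaturesAndAttributes(f, a, v):
    """All objects that contain feature f with attribute a equal to v."""
    return {o for o, (_type, feats) in DB.items() if feats.get(f, {}).get(a) == v}

def FeaturesInObject(o):
    """All features contained in object o."""
    return set(DB[o][1])

def FeaturesAndAttributesInObject(o):
    """All (feature, attribute, value) triples for object o."""
    return {(f, a, v) for f, attrs in DB[o][1].items() for a, v in attrs.items()}
```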

SMDS-SQL

SELECT -- media entities

m -- if m is not a continuous media object
$m:[i,j]$ -- m is continuous, $i,j$ integers (segments)
$m.a$ -- m is media entity, a is attribute

FROM

<media><source><M>

WHERE

term IN funcall

example


    SELECT M
    FROM   smds source1 M
    WHERE  Type(M) = Image AND
  	 M IN ObjectWithFeature("Dennis") AND
  	 M IN ObjectWithFeature("Jane") AND
  	 left("Jane","Dennis",M)

hybrid representations: HM-SQL

express queries in specialized language
perform operations (joins) between SMDS and non-SMDS data

differences

function calls are annotated with media source
queries to non-SMDS data may be embedded

example HM-SQL


   SELECT M
   FROM smds video1, videodb video2
   WHERE M IN smds:ObjectWithFeature("Dennis") AND
         M IN videodb:VideoWithObject("Dennis")

digital libraries

Digital libraries are constructed -- collected and organized -- by a community of users. Their functional capabilities support the information needs and uses of this community. Digital libraries are an extension, enhancement and integration of a variety of information institutions as physical places where resources are selected, collected, organized, preserved and accessed in support of a user community.

... federated structures that provide humans both intellectual and physical access to the huge and growing worldwide networks of information encoded in multimedia digital formats.

digital libraries (5S)

streams: (content) -- from text to multimedia content
structures: (data) -- from database to hypertext networks
spaces: (information) -- from vector space to virtual reality
scenarios: (procedures) -- from service to stories
societies: (stakeholders) -- from authors to libraries


   D-Lib Forum -- www.dlib.org
   Informedia -- www.informedia.cs.cmu.edu

...

networked multimedia

real-time transmission of continuous media information (audio, video)
substantial volumes of data (despite compression)
distribution-oriented -- e.g. audio/video broadcast

network criteria

throughput -- bitrates, burstiness
transmission delay -- including signal propagation time
delay variation -- jitter
error rate -- data alteration, loss

multicasting and broadcasting capabilities
document caching
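
The first four criteria can be measured from a packet trace. The sketch below uses a made-up trace of received packets and assumes simple definitions: throughput as bits received per second of trace, jitter as the mean absolute difference between consecutive delays, and error rate as the fraction of packets lost.

```python
# Made-up trace of received packets: (sequence number, sent at [s], received at [s], size [bytes]).
# Packet 3 was sent but never arrived.
received = [
    (0, 0.00, 0.05, 1000),
    (1, 0.02, 0.08, 1000),
    (2, 0.04, 0.09, 1000),
    (4, 0.08, 0.15, 1000),
]

def delays(packets):
    """Transmission delay per packet, in seconds."""
    return [recv - sent for _seq, sent, recv, _size in packets]

def throughput_bps(packets):
    """Bits received divided by the time between the first send and the last arrival."""
    duration = max(p[2] for p in packets) - min(p[1] for p in packets)
    return 8 * sum(p[3] for p in packets) / duration

def jitter(packets):
    """Delay variation: mean absolute difference between consecutive delays."""
    d = delays(packets)
    return sum(abs(b - a) for a, b in zip(d, d[1:])) / (len(d) - 1)

def loss_rate(packets, sent_count):
    """Fraction of the packets sent that did not arrive."""
    return 1 - len(packets) / sent_count
```

For continuous media, the jitter figure matters at least as much as the mean delay, since it determines how much buffering the receiver needs for smooth playback.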

Quality of Service

Quality of Service is a concept based on the statement that not all applications need the same performance from the network over which they run. Thus, applications may indicate their specific requirements to the network, before they actually start transmitting information data.

QoS requirements

hard requirements
guidance for optimizing internal resources
criteria for acceptance

...

virtual objects

$VO = \{ (O_i,Q_i,C_i) \mid 1 \le i \le k \}$

where

$C_1,...,C_k$ -- mutually exclusive conditions
$Q_1,...,Q_k$ -- queries
$O_1,...,O_k$ -- objects
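
One plausible reading of this definition is that the query Q_i is evaluated against the object O_i whenever condition C_i holds. Under that assumption, which the text itself does not spell out, resolving a virtual object may be sketched as follows; the sources, the queries and the bandwidth condition are all made up.

```python
# Hypothetical sources: a full-rate video and a version keeping every tenth frame.
SOURCES = {
    "video_hi": [f"frame{i}" for i in range(0, 100)],
    "video_lo": [f"frame{i}" for i in range(0, 100, 10)],
}

# A virtual object as a list of (object, query, condition) triples.
virtual_video = [
    ("video_hi", lambda src: SOURCES[src][:3], lambda ctx: ctx["bandwidth_kbps"] >= 1000),
    ("video_lo", lambda src: SOURCES[src][:3], lambda ctx: ctx["bandwidth_kbps"] < 1000),
]

def resolve(virtual_object, context):
    """Return the object and query result for the single condition that holds in `context`."""
    matching = [(o, q) for o, q, c in virtual_object if c(context)]
    if len(matching) != 1:
        raise ValueError("exactly one condition must hold (conditions are mutually exclusive)")
    obj, query = matching[0]
    return obj, query(obj)

slow = resolve(virtual_video, {"bandwidth_kbps": 256})
fast = resolve(virtual_video, {"bandwidth_kbps": 2000})
```

The client always asks for the same virtual video; which concrete object it gets depends on the condition that happens to hold.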

...

networked virtual environments

shared sense of space -- room, building, terrain
shared sense of presence -- avatar (body and motion)
shared sense of time -- real-time interaction and behavior

a way to communicate -- by gesture, voice or text
a way to share ... -- interaction through objects

challenges

network bandwidth -- limited resource
heterogeneity -- multiple platforms
distributed interaction -- network delays
resource management -- real-time interaction and shared objects
failure management -- stop, ..., degradation
scalability -- wrt. number of participants

manage dynamic shared state

...

the Java Media Framework, and
the DLP+X3D platform

Java Media Framework

The Java Media APIs meet the increasing demand for multimedia in the enterprise by providing a unified, non-proprietary, platform-neutral solution. This set of APIs supports the integration of audio and video clips, animated presentations, 2D fonts, graphics, and images, as well as speech input/output and 3D models. By providing standard players and integrating these supporting technologies, the Java Media APIs enable developers to produce and distribute compelling, media-rich content.

recommender economy

cross sale -- users who bought A also bought B
up sale -- if you buy A and B together ...

recommender model


  U = user
  I = item
  B = behavior
  R = recommendation
  F = feature

observations -- U * I * B
recommendations -- U * I


  B = [ time = 20sec, rating = r ]
  F = [ artist = rembrandt, topic = portrait ]
  R = [ artist(rembrandt) = r, topic(portrait) = r ]


  A = [  p_{1}, p_2 , ... ]
  where p_{k} = [ f_1 = v_1, f_2 = v_2, ... ]

with as an example


   A_{nightwatch} = [ artist=rembrandt, topic=group ]
   A_{guernica} = [ artist=picasso, topic=group ]

...

users, artworks and properties

distance metric


       d(x,y) = d(y,x)
       d(x,y) <= d(x,z) + d(z,y)
       d(x,x) = 0

dimension(s)

positive vs negative
individual vs community/collaborative
feature-based vs item-based

interpretation(s)

neutral interpretation -- use d(s_{n}, a_{k}) < d(s_{n}, s_{n+1} )
positive interpretation -- increase w(feature(a_{k}))
negative interpretation -- decrease w(feature(s_{n+1}))
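
The interplay between the distance and the feature weights can be made concrete as follows. The sketch assumes a user who is at artwork s_n, is offered s_{n+1} and chooses a_k instead; it uses a weighted count of differing features as distance and a fixed step for the weight updates. The third artwork, the initial weights and the update rule are assumptions made for the illustration.

```python
ARTWORKS = {
    "nightwatch": {"artist": "rembrandt", "topic": "group"},
    "guernica": {"artist": "picasso", "topic": "group"},
    "self-portrait": {"artist": "rembrandt", "topic": "portrait"},   # added for the example
}

def distance(x, y, w):
    """Weighted fraction of features on which x and y differ; symmetric, 0 for x == y."""
    differing = sum(w[f] for f in w if ARTWORKS[x][f] != ARTWORKS[y][f])
    return differing / sum(w.values())

def update(w, current, suggested, chosen, step=0.5, floor=0.1):
    """Raise the weight of features shared only with the chosen item, lower the others."""
    w = dict(w)
    for f in w:
        with_chosen = ARTWORKS[current][f] == ARTWORKS[chosen][f]
        with_suggested = ARTWORKS[current][f] == ARTWORKS[suggested][f]
        if with_chosen and not with_suggested:        # positive interpretation
            w[f] += step
        elif with_suggested and not with_chosen:      # negative interpretation
            w[f] = max(w[f] - step, floor)
    return w

weights = {"artist": 1.0, "topic": 1.0}
before = (distance("nightwatch", "self-portrait", weights),
          distance("nightwatch", "guernica", weights))
weights = update(weights, current="nightwatch", suggested="guernica", chosen="self-portrait")
after = (distance("nightwatch", "self-portrait", weights),
         distance("nightwatch", "guernica", weights))
```

Before the update both alternatives are equally far from the Nightwatch. After the user prefers the self-portrait, the artist feature weighs more, so that d(s_n, a_k) < d(s_n, s_{n+1}), in line with the neutral interpretation.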

...

7. information system architecture

(*) What are the issues in designing a (multimedia) information system architecture? Discuss the tradeoffs involved.

concepts

technology

projects & further reading

As a project, you may implement a multi-player game in which you may exchange pictures and videos, for example pictures and videos of celebrities.

Further, you may explore the development of a data format for text, images and video with appropriate presentation parameters, including positioning on the screen and intermediate transitions.

For further reading you may study information system architecture patterns, and explore the technical issues of constructing server-based advanced multimedia applications in [Fundamentals].

the artwork

examples of dutch design, from [Flat].
idem.
screenshots -- from splinter cell: chaos theory, taken from Veronica/Gammo, a television program about games.
screenshots -- respectively Tekken 5, Sims 2, and Super Monkey Ball, taken from insidegamer.nl.
screenshots -- from Unreal Tournament, see section 7.3.
idem.
idem.
resonance -- exhibition and performances, Montevideo, april 2005.
CHIP -- property diagram connecting users.
signs -- sports, [Signs], p. 274, 275.

Æliens

2009
