information retrieval
learning objectives
After reading this chapter you should be able to
describe scenarios for information retrieval,
explain how content analysis for images can be done,
characterize similarity metrics,
define the notions of recall and precision,
and give an example of frequency tables,
as used in text search.

Searching for information on the web is cumbersome. Given our experiences today, we may not even want to think about searching for multimedia information on the (multimedia) web.
Nevertheless, in this chapter we will briefly sketch one of the possible scenarios indicating the need for multimedia search. In fact, once we have the ability to search for multimedia information, many scenarios could be thought of.
As a start, we will look at two media types, images and documents. We will study search for images, because it teaches us important lessons about content analysis of media objects and what we may consider as being similar. Perhaps surprisingly, we will study text documents because, due to our familiarity with this media type, text documents allow us to determine what we may understand by effective search.

Amsterdam Drugport
Amsterdam is an international centre of traffic and trade. It is renowned for its culture and liberal attitude, and attracts tourists of various ages, including young tourists who are attracted by the availability of soft drugs. Soft drugs may be obtained at so-called coffeeshops, and the possession of limited amounts of soft drugs is tolerated by the authorities.
The European Community, however, has expressed its concern that Amsterdam is the centre of an international criminal drug operation. Combining national and international police units, a team is formed to start an exhaustive investigation, under the code name Amsterdam Drugport.
information
- video surveillance -- monitoring
- telephone wiretaps -- audio recording
- photography -- archive
- documents -- investigations
- transactions -- structured data
- geographic information -- locations, routes

media types
- images -- photos
- video -- surveillance
- audio -- interviews, phone tracks
- documents -- forensic, reports
- handwriting -- notes
- structured data -- transactions

retrieval
- image query -- all images with this person
- audio query -- identity of speaker
- text query -- all transactions with BANK Inc.
- video query -- all segments with victim
- complex queries -- convicted murderers with BANK transactions
- heterogeneous queries -- photograph + murderer + transaction
- complex heterogeneous queries -- in contact with + murderer + transaction

information retrieval
Information retrieval, according to [IR], deals with the representation, storage, organisation of, and access to information items.
To see what is involved, imagine that we have a (user) query like:
find me the pages containing information on ...
information retrieval models
- boolean or set-theoretic models
- vector or algebraic models
- probabilistic models

vector models
- attribute term weighting scheme improves performance
- partial matching strategy allows retrieval of approximate material
- metric distance allows for sorting according to degree of similarity
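
The three properties of vector models listed above can be made concrete with a small sketch. It assumes tf-idf as the term weighting scheme and cosine similarity as the ranking measure; both are common choices, not something prescribed here, and the function names are illustrative only.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document, weighted by tf * idf."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, docs):
    """Sort documents by decreasing similarity to the query (partial matches included)."""
    vectors = tf_idf_vectors(docs)
    q = Counter(query.lower().split())  # query terms weighted by raw count
    scored = [(cosine(q, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scored, reverse=True)
```

A document that contains only some of the query terms still receives a non-zero score, which is what partial matching amounts to, and the scores give the sort order.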

image query
- obtaining descriptive information
- establishing similarity
content-based description
- objects in image
- shape descriptor -- shape/region of object
- property description -- cells in image

shape
- bounding box -- (XLB,XUB,YLB,YUB)
property
example
shape descriptor: XLB=10; XUB=60; YLB=3; YUB=50 (rectangle)
property descriptor: pixel(14,7): R=5; G=1; B=3

definitions
- image grid: cells of equal size
- cell property: (Name, Value, Method)
example
property: (bwcolor,{b,w},bwalgo)
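
As an illustration of the definitions above, the following sketch splits an image into cells of equal size and applies a cell property given as a (Name, Value, Method) triple. The grayscale representation, the threshold of 128 and the body of bwalgo are assumptions made for the example; only the names bwcolor, {b,w} and bwalgo come from the text.

```python
def split_into_cells(image, cell_h, cell_w):
    """Split a 2D list of gray values into cells; returns {(row, col): cell}."""
    cells = {}
    for top in range(0, len(image), cell_h):
        for left in range(0, len(image[0]), cell_w):
            cell = [row[left:left + cell_w] for row in image[top:top + cell_h]]
            cells[(top // cell_h, left // cell_w)] = cell
    return cells

def bwalgo(cell, threshold=128):
    """Assumed method: 'b' if the mean gray value is below the threshold, else 'w'."""
    pixels = [p for row in cell for p in row]
    return "b" if sum(pixels) / len(pixels) < threshold else "w"

# cell property as a (Name, Value, Method) triple
BWCOLOR = ("bwcolor", {"b", "w"}, bwalgo)

def describe(image, cell_h, cell_w, prop):
    """Apply the property's method to every cell of the image grid."""
    name, domain, method = prop
    description = {}
    for pos, cell in split_into_cells(image, cell_h, cell_w).items():
        value = method(cell)
        if value not in domain:
            raise ValueError(f"{name}: {value!r} is not in {domain}")
        description[pos] = (name, value)
    return description
```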

similarity-based retrieval
How do we determine whether the content of a segment
(of a segmented image) is similar to another image (or set
of images)?
solutions
- metric approach -- distance between two image objects
- transformation approach -- relative to specification

metric approach
d is a distance measure if:
d(x,y) = d(y,x)
d(x,y) <= d(x,z) + d(z,y)
d(x,x) = 0
pixel properties
- objects with pixel properties
- pixels:
- object contains w x h (n+2)-tuples
complexity
a set of points in k-dimensional space for k = n + 2
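
A minimal sketch of the metric approach, assuming that two objects cover the same pixel positions and that the Euclidean distance over the n pixel properties is used. The dictionary representation is an assumption made for the example. Note that every comparison touches all w x h pixels, which is why feature extraction into a lower-dimensional space is attractive.

```python
import math

def distance(obj_a, obj_b):
    """Euclidean distance between two objects of equal size.

    Each object maps a pixel position (x, y) to a tuple of n property values,
    so together a pixel is an (n+2)-tuple.
    """
    if obj_a.keys() != obj_b.keys():
        raise ValueError("objects must cover the same pixel positions")
    total = 0.0
    for pos in obj_a:
        for pa, pb in zip(obj_a[pos], obj_b[pos]):
            total += (pa - pb) ** 2
    return math.sqrt(total)
```

On sample objects you may check the three conditions for a distance measure directly: symmetry, the triangle inequality, and d(x,x) = 0.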

feature extraction
- maps object into s-dimensional space
transformation approach
Given two objects o1 and o2,
the level of dissimilarity is proportional
to the (minimum) cost of transforming object o1
into object o2 or vice versa

transformation operators
-- translation, rotation, scaling

cost
-
distance
-
advantages
- user-defined similarity -- choice of transformation operators
- user-defined cost-function
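
To give an impression of the transformation approach, here is a sketch for bounding boxes (XLB,XUB,YLB,YUB). It assumes that one box is turned into the other by a single translation of the centre and a single scaling of width and height, and it charges user-defined unit costs for each. The particular cost formula is an assumption, and no claim is made that it yields the minimum cost over all operator sequences.

```python
import math

def transformation_cost(box_a, box_b, translate_cost=1.0, scale_cost=1.0):
    """Cost of translating and scaling box_a = (xlb, xub, ylb, yub) into box_b."""
    def center_and_size(box):
        xlb, xub, ylb, yub = box
        return ((xlb + xub) / 2, (ylb + yub) / 2), (xub - xlb, yub - ylb)

    (cxa, cya), (wa, ha) = center_and_size(box_a)
    (cxb, cyb), (wb, hb) = center_and_size(box_b)
    # translation: how far the centre has to move
    shift = math.hypot(cxb - cxa, cyb - cya)
    # scaling: log-ratios, so that growing and shrinking cost the same
    stretch = abs(math.log(wb / wa)) + abs(math.log(hb / ha))
    return translate_cost * shift + scale_cost * stretch
```

Changing translate_cost and scale_cost changes what counts as similar: with translate_cost=0, for instance, position is ignored and only the shape of the box matters.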

operations
rotate(image-id,dir,angle)
segment(image-id, predicate)
edit(image-id, edit-op)
image repository
- storage -- unsegmented images
- description -- limited set of features
- index -- feature-based index
- retrieval -- distance between feature vectors
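
The image repository outlined above may be sketched as follows, assuming a coarse color histogram as the (limited) feature set and the sum of absolute differences as the distance between feature vectors. The class and function names are hypothetical, and the index is a plain dictionary rather than a real feature-based index structure.

```python
def color_histogram(pixels, bins=4):
    """Normalized histogram over bins**3 color buckets; pixels are (r, g, b) in 0..255."""
    hist = [0] * bins ** 3
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1
    total = len(pixels)  # assumes a non-empty image
    return [count / total for count in hist]

def l1_distance(u, v):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

class ImageRepository:
    """Stores one feature vector per image and retrieves by feature distance."""

    def __init__(self):
        self.index = {}

    def add(self, image_id, pixels):
        self.index[image_id] = color_histogram(pixels)

    def query(self, pixels, k=3):
        """Return the ids of the k images closest to the query image."""
        q = color_histogram(pixels)
        ranked = sorted(self.index, key=lambda i: l1_distance(q, self.index[i]))
        return ranked[:k]
```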

mission
Our goal is to study aspects of the deployment and architecture of virtual environments as an interface to (intelligent) multimedia information systems ...

query
- document database + string matching
problems
- synonymy -- topic T does not occur literally in document D
- polysemy -- some words may have many meanings

effective search
- precision -- how many answers are correct
- recall -- how many of the right documents are returned

precision and recall
precision = ( returned and relevant ) / returned
recall = ( returned and relevant ) / relevant
anomalies
- return all documents: perfect recall, low precision
- return 'nothing': 'perfect' precision, low recall
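
The two definitions and the anomalies can be checked with a few lines of code. Treating the precision of an empty answer as 1.0 is a convention assumed here, which is why 'perfect' is in quotes above.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a set of returned and a set of relevant documents."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    # convention (assumption): an empty answer contains no wrong documents
    precision = len(hits) / len(returned) if returned else 1.0
    recall = len(hits) / len(relevant) if relevant else 1.0
    return precision, recall
```

With ten documents of which two are relevant, returning everything gives precision 0.2 and recall 1.0, while returning nothing gives precision 1.0 and recall 0.0.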

example
term/document | d0 | d1 | d2 |
snacks | 1 | 0 | 0 |
drinks | 1 | 0 | 3 |
rock-roll | 0 | 1 | 1 |

complexity
compare term frequencies per document -- O(M*N)
reduction
- stop list -- irrelevant words
- word stems -- reduce different words to relevant part
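
A frequency table with both reductions applied may be built as sketched below. The stop list and the suffix-stripping stemmer are deliberately crude assumptions made for the example; a real system would use a proper stemming algorithm.

```python
from collections import Counter

# illustrative stop list only (assumption)
STOP_LIST = {"a", "an", "and", "the", "of", "to", "in"}

def stem(word):
    """Very crude stemmer: strip a few common suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def frequency_table(docs):
    """Map each term to {doc_id: count}; docs maps doc_id to its text."""
    table = {}
    for doc_id, text in docs.items():
        terms = [stem(w) for w in text.lower().split() if w not in STOP_LIST]
        for term, count in Counter(terms).items():
            table.setdefault(term, {})[doc_id] = count
    return table
```

Both reductions shrink the number of terms M, and with it the O(M*N) cost of comparing term frequencies over N documents.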
user-oriented measures
- coverage ratio -- fraction of known documents
- novelty ratio -- fraction of new (relevant) documents
- relative recall -- fraction of expected documents
- recall effort -- fraction of examined documents
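
Two of these measures are sketched below. The precise definitions are an assumption, following a common formulation: coverage is the fraction of the relevant documents known to the user that is actually returned, and novelty is the fraction of the relevant documents returned that the user did not know before.

```python
def coverage_and_novelty(returned, relevant, known):
    """Return (coverage ratio, novelty ratio) for one user and one query."""
    returned, relevant, known = set(returned), set(relevant), set(known)
    known_relevant = relevant & known            # relevant documents the user knows
    found_known = returned & known_relevant      # ... of which these were returned
    found_new = (returned & relevant) - known    # relevant, returned, and new to the user
    coverage = len(found_known) / len(known_relevant) if known_relevant else 0.0
    found = len(found_known) + len(found_new)
    novelty = len(found_new) / found if found else 0.0
    return coverage, novelty
```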

[figure: (a) context, (b) self-reflection]

Aesthetics
- intentions -- motives of the artist
- expression -- where form takes over
- representation -- the relation of art to reality

concepts

technology

projects & further reading
As a project, you may implement simple
image analysis algorithms that, for example, extract
a color histogram, or detect the presence of
a horizon-like edge.
You may further explore
scenarios for information retrieval in the
cultural heritage domain,
and compare these with other
applications of multimedia information retrieval,
for example monitoring in hospitals.
For further reading I suggest that you
familiarize yourself with common techniques
in information retrieval as described in [IR],
and perhaps devote some time to studying image analysis, [Image].

- artworks -- ..., Miro, Dali, photographed from
Kunstsammlung Nordrhein-Westfalen, see artwork 2.
- left Miro from [Kunst], right: Karel Appel
- match of the day (1) -- Geert Mul
- match of the day (2) -- Geert Mul
- match of the day (3) -- Geert Mul
- mario ware -- taken from gammo/veronica.
- baten kaitos -- eternal ways and the lost ocean, taken from gammo/veronica.
- idem.
- PANORAMA -- screenshots from field test.
- signs -- people, [Signs], p. 252, 253.

video annotation requires a logical approach to story telling

content annotation
learning objectives
After reading this chapter you should be able to
explain the difference between content and meta information,
mention relevant content parameters for audio,
characterize the requirements for video libraries,
define an annotation logic for video,
and discuss feature extraction in samples of musical material.

Current technology does not allow us to extract information
automatically from arbitrary media objects.
In these cases, at least for the time being,
we need to assist search by annotating content
with what is commonly referred to as meta-information.
In this chapter, we will look at two more media types,
in particular audio and video.
Studying audio, we will learn how we may combine
feature extraction and meta-information to define a
data model that allows for search.
Studying video, on the other hand,
will indicate the complexity of devising a
knowledge representation scheme that captures
the content of video fragments.
Concluding this chapter, we will discuss an architecture
for feature extraction for arbitrary media objects.
audio databases
- audio signals -- compression, discrete representation
- musical patterns -- similarity-based retrieval

audio data model
- meta-data -- describing content
- features -- using feature extraction

example
singers -- (Opera,Role,Person)
score -- ...
transcript -- ...

signal-based content
- audio data -- over time x
- wave -- period T, frequency f = 1/T
- velocity -- v = w/T = w * f, with wavelength w
- amplitude -- a
windowing
- break signal up in small windows of time

feature extraction
- intensity -- watts/m^2
- loudness -- in decibels
- pitch -- from frequency and amplitude
- brightness -- amount of distortion
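
Windowing and one of the features above may be sketched as follows. The sketch assumes non-overlapping windows and expresses the level of each window in decibels relative to a reference amplitude of 1.0; both choices, and the function names, are assumptions made for the example.

```python
import math

def windows(samples, size):
    """Break the signal into consecutive windows of equal size (incomplete tail dropped)."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]

def rms(window):
    """Root mean square amplitude of one window."""
    return math.sqrt(sum(s * s for s in window) / len(window))

def level_db(window, reference=1.0):
    """Level of a window in decibels relative to the reference amplitude."""
    value = rms(window)
    if value == 0:
        return float("-inf")  # silence
    return 20 * math.log10(value / reference)
```

Applying level_db to every window yields a feature sequence over time that is far smaller than the raw signal and can be stored alongside the meta-data.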

video annotation
- what are the interesting aspects?
- how do we represent this information?
video content
video v, frame f
f has associated objects and activities
objects and activities have properties

property
property: name = value
object schema
(fd,fi) -- frame-dependent and frame-independent properties
object instance: (oid,os,ip)

example
frame | objects | frame-dependent properties |
1 | Jane | has(briefcase), at(path) |
- | house | door(closed) |
- | briefcase | |
2 | Jane | has(briefcase), at(door) |
- | Dennis | at(door) |
- | house | door(open) |
- | briefcase | |

frame-independent properties
object | frame-independent properties | value |
Jane | age | 35 |
| height | 170cm |
house | address | ... |
| color | brown |
briefcase | color | black |
| size | 40 x 31 |

activity
- activity name -- id
- statements -- role = v
example
{ giver : Person, receiver : Person, item : Object }
giver = Jane, receiver = Dennis, item = briefcase
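
The notions introduced above (object instances with frame-dependent and frame-independent properties, and activities with roles) can be captured in a small data model. The class and field names below are assumptions made for the example; the sample values come from the tables above.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectInstance:
    oid: str
    independent: dict = field(default_factory=dict)  # frame-independent properties
    dependent: dict = field(default_factory=dict)    # frame number -> set of properties

@dataclass
class Activity:
    name: str
    roles: dict  # role -> value

jane = ObjectInstance(
    "Jane",
    {"age": 35, "height": "170cm"},
    {1: {"has(briefcase)", "at(path)"}, 2: {"has(briefcase)", "at(door)"}},
)
dennis = ObjectInstance("Dennis", {}, {2: {"at(door)"}})
# the activity name 'exchange' is hypothetical
exchange = Activity("exchange", {"giver": "Jane", "receiver": "Dennis", "item": "briefcase"})

def frames_where(obj, prop):
    """Frames in which the object has the given frame-dependent property."""
    return sorted(f for f, props in obj.dependent.items() if prop in props)
```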

video libraries
which videos are in the library
what constitutes the content of each video
what is the location of a particular video

query language for video libraries
- segment retrievals -- exchange of briefcase
- object retrievals -- all people in v:[s,e]
- activity retrieval -- all activities in v:[s,e]
- property-based -- find all videos with object oid

VideoSQL
SELECT -- v:[s,e]
FROM -- video:<source><V>
WHERE -- term IN funcall

example
SELECT vid:[s,e]
FROM video:VidLib
WHERE (vid,s,e) IN VideoWithObject(Dennis) AND
object IN ObjectsInVideo(vid,s,e) AND
object != Dennis AND
typeof(object) = Person

To improve library access, the Informedia
Digital Video Library uses automatic processing to derive
descriptors for video.
A new extension to the video processing extracts geographic
references from these descriptors.
The operational library interface shows the geographic entities addressed in a story, highlighting the regions discussed in the video through a map display synchronized with the video display.

The map can also serve as a query mechanism, allowing users to search the terabyte library for stories taking place in a selected area of interest.

questions
- what -- content-related
- when -- position on time-continuum
- where -- geographic location

More recently, it has been recognized that
the process of spatialization -- where a spatial
map-like structure is applied to data where no inherent
or obvious one does exist -- can provide an interpretable
structure to other types of data.

atlas of cyberspace
We present a wide range of spatializations that have
employed a variety of graphical techniques and visual metaphors
so as to provide striking and powerful images that extend
from two-dimensional 'maps' to three-dimensional immersive landscapes.

feature grammar
detector song; ## to get the filename
detector lyrics; ## extracts lyrics
detector melody; ## extracts melody
detector check; ## to walk the tree
atom str name;
atom str text;
atom str note;
midi: song;
song: file lyrics melody check;
file: name;
lyrics: text*;
melody: note*;
event('twinkle',2,time=384, note_on:[chan=2,pitch=72,vol=111]).
event('twinkle',2,time=768, note_off:[chan=2,pitch=72,vol=100]).
melody detector
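int melodyDetector(tree *pt, list *tks ){
/* _query: query handle, assumed to be provided by the surrounding detector framework */
void* q = _query;
char* _result;
int idq = 0;
/* ask for every X for which melody(X) holds */
idq = query_eval(q,"X:melody(X)");
/* add each answer as a 'note' atom to the token list */
while ((_result = query_result(q,idq)) ) {
putAtom(tks,"note",_result);
}
return SUCCESS;
}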
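
What the melody detector delivers, a sequence of notes, may be illustrated independently of the detector framework. The sketch below assumes events given as (song, track, time, kind, parameters) tuples modelled on the event facts above. It is not the query mechanism used by the detector itself.

```python
def melody(events, track):
    """Pitches of all note_on events on the given track, ordered by time."""
    notes = [(time, params["pitch"])
             for _song, trk, time, kind, params in events
             if trk == track and kind == "note_on"]
    return [pitch for _time, pitch in sorted(notes)]
```

The resulting pitch sequence is the kind of feature on which similarity-based retrieval of musical patterns may operate.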
prediction techniques
- social-based -- dependent on (group) rating of item(s)
- information-based -- dependent on features of item(s)
- hybrid methods -- combining predictors

definition(s)
- rating -- a value representing a user's interest
- recommendation -- item(s) that might be of interest to the user
- regret -- a function to measure the accuracy of recommendations

guided tour(s)
- automated (viewpoint) navigation in virtual space,
- an animation explaining, for example, the construction of an artwork, or
- the (narrative) presentation of a sequence of concept nodes.

concepts

technology

projects & further reading
As a project, think of implementing musical similarity matching,
or developing an application retrieving video fragments
using a simple annotation logic.
You may further explore the
construction of media repositories, and finding a
balance between automatic indexing, content search and
meta information.
For further reading I advise you to google
recent research on video analysis,
and the online material
on search engines.

- works from [Design]
- faces -- from www.alterfin.org, an interesting site with many surprising interactive toys in flash, javascript and html.
- mouth -- Annika Karlson Rixon, entitled A slight Acquaintance, taken from a theme article about the body in art and science, the Volkskrant, 24/03/05.
- story -- page from the comic book version of City of Glass, [Glass], drawn in an almost traditional style.
- story -- frame from [Glass].
- story -- frame from [Glass].
- story -- frame from [Glass].
- white on white -- typographical joke.
- modern art -- city of light (1968-69), Mario Merz, taken from [Modern].
- modern art -- Marocco (1972), Krijn Griezen, taken from [Modern].
- modern art -- Indestructible Object (1958), Man Ray, Blue, Green, Red I (1964-65), Ellsworth Kelly, Great American Nude (1960), T. Wesselman, taken from [Modern].
- signs -- sports, [Signs], p. 272, 273.

effective retrieval requires visual interfaces

information system architecture
learning objectives
After reading this chapter you should be able to
discuss the considerations that play a role in
developing a multimedia information system,
characterize an abstract multimedia data format,
give examples of multimedia content queries,
define the notion of virtual resources,
and discuss the requirements for networked virtual
environments.

From a system development perspective, a multimedia
information system may be considered as a multimedia
database,
providing storage and retrieval facilities for media objects.
Yet, rather than a solution this presents us with a
problem,
since there are many options to provide such storage facilities
and equally many to support retrieval.
In this chapter, we will study the architectural issues
involved in developing multimedia information systems,
and we will introduce the notion of media abstraction
to provide for a uniform approach to arbitrary media objects.
Finally, we will discuss the additional problems
that networked multimedia confront us with.
issues
- multimedia storage and retrieval -- homegrown, third-party and legacy sources
- information architecture -- common format, native format, hybrid
- media abstraction -- unified indexes, query relaxation

content organisation
- autonomy -- index per media type
- uniformity -- unified index
- hybrid -- media indexes + unified index

Principle of Uniformity
... from a semantical point of view the content of a multimedia source is independent of the source itself, so we may
use statements as meta data to provide a description
of media objects.
- -- metadata associated with media object o

tradeoffs
- metadata can be stored using standard relational and OO structures
- manipulating metadata is easy
- feature extraction is (!) straightforward
is it?

software architecture
- a database of media objects, supporting
- operations on media objects, and offering
- logical views on media objects

information retrieval cycle
- specification of the user's information need
- translation into query operations
- search and retrieval of media objects
- ranking according to likelihood or relevance
- presentation of results and user feedback
- resulting in a possibly modified query

- despite high interactivity, access is difficult;
- quick response is and will remain important!

media abstraction
- state -- smallest chunk of media data
- feature -- any object in a state
- attributes -- characteristics of objects
- feature extraction map -- to identify content
- relations -- to capture state-dependent information
- (inter)relations between 'states' or chunks

example -- image database
states: { pic1.gif,...,picn.gif }
features: names of people
extraction: find people in pictures
relations: left-of, ...

example -- video database
states: set of frames
features: persons and objects
extraction: gives features per frame
relations: frame-dependent and frame-independent information
inter-state relation: specifies sequences of frames

simple multimedia database
- a finite set of media abstractions
structured multimedia database
- equivalence relations -- to deal with synonymy
- partial ordering -- to deal with inheritance
- query relaxation -- to please the user
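
How the equivalence relation and the partial ordering support query relaxation may be sketched as follows. The representation (a dictionary of equivalent terms and a dictionary mapping each term to a more general one) and the sample terms are assumptions made for the example.

```python
def relaxations(term, equivalents, parents):
    """The term itself, then its equivalents, then ever more general terms."""
    order, visited = [], set()
    current = term
    while current is not None and current not in visited:
        visited.add(current)
        for candidate in [current, *sorted(equivalents.get(current, ()))]:
            if candidate not in order:
                order.append(candidate)
        current = parents.get(current)  # move up in the partial ordering
    return order

def relaxed_query(term, index, equivalents, parents):
    """Return the first relaxation with a non-empty answer, together with that answer."""
    for candidate in relaxations(term, equivalents, parents):
        if index.get(candidate):
            return candidate, index[candidate]
    return None, []
```

A query for a term that does not occur literally is thus answered with material for an equivalent or a more general term, rather than with nothing.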

SMDS -- functions
Type: object type
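ObjectWithFeatures: objects o that contain feature f
ObjectWithFeaturesAndAttributes: objects o that contain f with attribute a = v
FeaturesInObject: features f that object o contains
FeaturesAndAttributesInObject: features f that o contains, with their attributes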

SMDS-SQL
SELECT -- media entities
- m -- if m is not a continuous media object
- m:[i,j] -- m is continuous, i and j are integers (segments)
- m.a -- m is media entity, a is attribute
FROM
WHERE

example
SELECT M
FROM smds source1 M
WHERE Type(M) = Image AND
M IN ObjectWithFeature("Dennis") AND
M IN ObjectWithFeature("Jane") AND
left("Jane","Dennis",M)
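
The example query may be evaluated over a toy media abstraction as sketched below. The contents of the source and the Python counterparts of ObjectWithFeature and left are hypothetical; they only mirror the names used in the query.

```python
# toy source (hypothetical): state -> type, features, and left-of relation
SOURCE1 = {
    "pic1.gif": {"type": "Image", "features": {"Jane", "Dennis"},
                 "left": {("Jane", "Dennis")}},
    "pic2.gif": {"type": "Image", "features": {"Jane", "Dennis"},
                 "left": {("Dennis", "Jane")}},
    "pic3.gif": {"type": "Image", "features": {"Jane"}, "left": set()},
}

def object_with_feature(source, feature):
    """All media objects in the source that contain the feature."""
    return {m for m, data in source.items() if feature in data["features"]}

def left(a, b, m, source):
    """True if a is left of b in media object m."""
    return (a, b) in source[m]["left"]

def jane_left_of_dennis(source):
    """Images that contain both Dennis and Jane, with Jane left of Dennis."""
    candidates = object_with_feature(source, "Dennis") & object_with_feature(source, "Jane")
    return sorted(m for m in candidates
                  if source[m]["type"] == "Image" and left("Jane", "Dennis", m, source))
```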

hybrid representations: HM-SQL
- express queries in specialized language
- perform operations (joins) between SMDS and non-SMDS data
differences
- function calls are annotated with media source
- queries to non-SMDS data may be embedded

example HM-SQL
SELECT M
FROM smds video1, videodb video2
WHERE M IN smds:ObjectWithFeature("Dennis") AND
M IN videodb:VideoWithObject("Dennis")

digital libraries
Digital libraries are constructed -- collected and organized --
by a community of users.
Their functional capabilities support the information needs
and uses of this community.
Digital libraries are an extension, enhancement and integration of
a variety of information institutions as physical places
where resources are selected, collected, organized, preserved
and accessed in support of a user community.

... federated structures that provide humans both
intellectual and physical access to the huge and growing
worldwide networks of information encoded in multimedia
digital formats.

digital libraries (5S)
- streams: (content) -- from text to multimedia content
- structures: (data) -- from database to hypertext networks
- spaces: (information) -- from vector space to virtual reality
- scenarios: (procedures) -- from service to stories
- societies: (stakeholders) -- from authors to libraries

networked multimedia
- real-time transmission of continuous media information (audio, video)
- substantial volumes of data (despite compression)
- distribution-oriented -- e.g. audio/video broadcast

network criteria
- throughput -- bitrates, burstiness
- transmission delay -- including signal propagation time
- delay variation -- jitter
- error rate -- data alteration, loss
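
The four criteria can be estimated from a simple packet trace, as sketched below. The trace format is hypothetical, and taking jitter as the mean absolute difference between consecutive delays is an assumption; other definitions are in use.

```python
def network_stats(arrived, sent_count):
    """Estimate network criteria from a trace.

    arrived: list of (send_ms, receive_ms, size_bytes) for packets that arrived,
             in sending order; sent_count: number of packets sent in total.
    """
    delays = [recv - send for send, recv, _ in arrived]
    mean_delay = sum(delays) / len(delays)
    # delay variation: mean absolute difference between consecutive delays
    diffs = [abs(b - a) for a, b in zip(delays, delays[1:])]
    jitter = sum(diffs) / len(diffs) if diffs else 0.0
    duration_ms = max(r for _, r, _ in arrived) - min(s for s, _, _ in arrived)
    bits = 8 * sum(size for _, _, size in arrived)
    return {
        "throughput_kbit_s": bits / duration_ms,  # bits per ms equals kbit per second
        "mean_delay_ms": mean_delay,
        "jitter_ms": jitter,
        "loss_rate": 1 - len(arrived) / sent_count,
    }
```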

- multicasting and broadcasting capabilities
- document caching

Quality of Service
Quality of Service is a concept
based on the statement that not all applications need
the same performance from the network over which they run.
Thus, applications may indicate their specific requirements
to the network, before they actually start transmitting
information data.
QoS requirements
- hard requirements
- guidance for optimizing internal resources
- criteria for acceptance

virtual objects
-
where
- -- mutually exclusive conditions
- -- queries
- -- objects

networked virtual environments
- shared sense of space -- room, building, terrain
- shared sense of presence -- avatar (body and motion)
- shared sense of time -- real-time interaction and behavior

- a way to communicate -- by gesture, voice or text
- a way to share ... -- interaction through objects

challenges
- network bandwidth -- limited resource
- heterogeneity -- multiple platforms
- distributed interaction -- network delays
- resource management -- real-time interaction and shared objects
- failure management -- stop, ..., degradation
- scalability -- wrt. number of participants

manage dynamic shared state
- the Java Media Framework, and
- the DLP+X3D platform
Java Media Framework
The Java Media APIs meet the increasing demand for multimedia in the enterprise by providing a unified, non-proprietary, platform-neutral solution. This set of APIs supports the integration of audio and video clips, animated presentations, 2D fonts, graphics, and images, as well as speech input/output and 3D models. By providing standard players and integrating these supporting technologies, the Java Media APIs enable developers to produce and distribute compelling, media-rich content.

recommender economy
- cross sale -- users who bought A also bought B
- up sale -- if you buy A and B together ...

recommender model
U = user
I = item
B = behavior
R = recommendation
F = feature

- observations -- U * I * B
- recommendations -- U * I

B = [ time = 20sec, rating = r ]
F = [ artist = rembrandt, topic = portrait ]
R = [ artist(rembrandt) = r, topic(portrait) = r ]

A = [ p_{1}, p_2 , ... ]
where p_{k} = [ f_1 = v_1, f_2 = v_2, ... ]
with as an example
A_{nightwatch} = [ artist=rembrandt, topic=group ]
A_{guernica} = [ artist=picasso, topic=group ]
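
Given artworks described by feature-value lists as above, a simple information-based comparison counts the features on which two artworks differ. The normalization by the number of features is an assumption made for the example. When all artworks carry the same set of features this amounts to a normalized Hamming distance.

```python
def feature_distance(a, b):
    """Fraction of features on which two artworks differ (0.0 means identical)."""
    features = set(a) | set(b)
    if not features:
        return 0.0
    differing = sum(1 for f in features if a.get(f) != b.get(f))
    return differing / len(features)

nightwatch = {"artist": "rembrandt", "topic": "group"}
guernica = {"artist": "picasso", "topic": "group"}
```

For the two examples, the artworks agree on topic and differ on artist, so feature_distance(nightwatch, guernica) is 0.5.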

[figure: users, artworks and properties]

distance metric
d(x,y) = d(y,x)
d(x,y) <= d(x,z) + d(z,y)
d(x,x) = 0

dimension(s)
- positive vs negative
- individual vs community/collaborative
- feature-based vs item-based

interpretation(s)
- neutral interpretation -- use d(s_{n}, a_{k}) < d(s_{n}, s_{n+1} )
- positive interpretation -- increase w(feature(a_{k}))
- negative interpretation -- decrease w(feature(s_{n+1}))

concepts

technology

projects & further reading
As a project, you may implement a multi-player
game in which you may exchange pictures and videos,
for example pictures and videos of celebrities.
Further you may explore the development of a data format
for text, images and video with appropriate presentation
parameters, including positioning on the screen
and intermediate transitions.
For further reading you may study
information system architecture patterns,
and explore the technical issues of constructing
server based advanced multimedia applications
in [Fundamentals].

- examples of dutch design, from [Flat].
- idem.
- screenshots -- from splinter cell: chaos theory,
taken from
Veronica/Gammo,
a television program about games.
- screenshots -- respectively Tekken 5,
Sims 2,
and Super Monkey Ball,
taken from insidegamer.nl.
- screenshots -- from Unreal Tournament,
see section 7.3.
- idem.
- idem.
- resonance -- exhibition and performances,
Montevideo, april 2005.
- CHIP -- property diagram connecting users.
- signs -- sports, [Signs], p. 274, 275.
