information retrieval
learning objectives
After reading this chapter you should be able to
describe scenarios for information retrieval,
explain how content analysis for images can be done,
characterize similarity metrics,
define the notions of recall and precision,
and give an example of frequency tables,
as used in text search.

Searching for information on the web is cumbersome. Given our experiences today, we may not even want to think about searching for multimedia information on the (multimedia) web.
Nevertheless, in this chapter we will briefly sketch one of the possible scenarios indicating the need for multimedia search. In fact, once we have the ability to search for multimedia information, many scenarios could be thought of.
As a start, we will look at two media types, images and documents. We will study search for images, because it teaches us important lessons about content analysis of media objects and what we may consider as being similar. Perhaps surprisingly, we will study text documents because, due to our familiarity with this media type, text documents allow us to determine what we may understand by effective search.

Amsterdam Drugport
Amsterdam is an international centre of traffic and trade. It is renowned for its culture and liberal attitude, and attracts tourists of all ages, including young tourists who are drawn by the availability of soft drugs. Soft drugs may be obtained at so-called coffeeshops, and the possession of limited amounts of soft drugs is tolerated by the authorities.
The European Community, however, has expressed its concern that Amsterdam is the centre of an international criminal drug operation. Combining national and international police units, a team is formed to start an exhaustive investigation, under the code name Amsterdam Drugport.
information
- video surveillance -- monitoring
- telephone wiretaps -- audio recording
- photography -- archive
- documents -- investigations
- transactions -- structured data
- geographic information -- locations, routes

media types
- images -- photos
- video -- surveillance
- audio -- interviews, phone tracks
- documents -- forensic, reports
- handwriting -- notes
- structured data -- transactions

retrieval
- image query -- all images with this person
- audio query -- identity of speaker
- text query -- all transactions with BANK Inc.
- video query -- all segments with victim
- complex queries -- convicted murderers with BANK transactions
- heterogeneous queries -- photograph + murderer + transaction
- complex heterogeneous queries -- in contact with + murderer + transaction

information retrieval
Information retrieval, according to [IR], deals with the representation, storage, organisation of, and access to information items.
To see what is involved, imagine that we have a (user) query like:
find me the pages containing information on ...
information retrieval models
- boolean or set-theoretic models
- vector or algebraic models
- probabilistic models
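To make the first of these models concrete, the following is a minimal sketch of boolean (set-theoretic) retrieval. The toy documents and the helper name build_index are assumptions made for illustration; they do not come from [IR].

```python
# Hypothetical toy collection: document id -> text.
docs = {
    "d0": "bank transfer to suspect account",
    "d1": "surveillance video of harbour",
    "d2": "suspect seen at bank on video",
}

def build_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index

index = build_index(docs)

# In the boolean model, query operators become set operations.
both = index.get("bank", set()) & index.get("suspect", set())   # bank AND suspect
either = index.get("bank", set()) | index.get("video", set())   # bank OR video
without = index.get("video", set()) - index.get("bank", set())  # video AND NOT bank
```

A document either satisfies such a query or it does not; there is no notion of a partial match or of ranking, which is what the vector models add.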

vector models
- attribute term weighting scheme improves performance
- partial matching strategy allows retrieval of approximate material
- metric distance allows for sorting according to degree of similarity
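As a rough illustration of partial matching and ranking in a vector model, the sketch below represents documents and a query as term-count vectors and sorts the documents by cosine similarity. Using raw counts as weights and cosine as the similarity measure are assumptions of this sketch, chosen because they are common; the counts are borrowed from the snacks/drinks/rock-roll frequency table used in the text search example of this chapter.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equally long term vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)

# Vector components: (snacks, drinks, rock-roll).
doc_vectors = {
    "d0": [1, 1, 0],
    "d1": [0, 0, 1],
    "d2": [0, 3, 1],
}
query = [0, 1, 1]  # hypothetical query: "drinks rock-roll"

# Sort document ids by decreasing similarity to the query.
ranking = sorted(doc_vectors,
                 key=lambda d: cosine(query, doc_vectors[d]),
                 reverse=True)
```

Every document shares at least one term with the query, so all three are returned, but d2, which contains both query terms, is ranked first.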

image query
- obtaining descriptive information
- establishing similarity
content-based description
- objects in image
- shape descriptor -- shape/region of object
- property description -- cells in image

shape
- bounding box -- (XLB,XUB,YLB,YUB)
property
example
shape descriptor: XLB=10; XUB=60; YLB=3; YUB=50 (rectangle)
property descriptor: pixel(14,7): R=5; G=1; B=3

definitions
- image grid: cells of equal size
- cell property: (Name, Value, Method)
example
property: (bwcolor,{b,w},bwalgo)
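The definitions above can be sketched as follows. The image is assumed to be a list of rows of gray values (0..255) whose dimensions are multiples of the cell size, and the body of bwalgo (a mean-intensity threshold) is a guess made for illustration; the source only gives its name.

```python
def bwalgo(cell_pixels, threshold=128):
    """Hypothetical method: 'b' if the mean gray value is below threshold, else 'w'."""
    mean = sum(cell_pixels) / len(cell_pixels)
    return "b" if mean < threshold else "w"

def grid_property(image, cell_size, name, method):
    """Divide the image into square cells of equal size and compute one
    (Name, Value, Method) triple per cell, keyed by (cell_row, cell_col)."""
    height, width = len(image), len(image[0])
    cells = {}
    for top in range(0, height, cell_size):
        for left in range(0, width, cell_size):
            pixels = [image[y][x]
                      for y in range(top, top + cell_size)
                      for x in range(left, left + cell_size)]
            key = (top // cell_size, left // cell_size)
            cells[key] = (name, method(pixels), method.__name__)
    return cells

# A 4 x 4 image with dark and light quadrants, split into 2 x 2 cells.
image = [
    [0,   0,   255, 255],
    [0,   0,   255, 255],
    [255, 255, 0,   0],
    [255, 255, 0,   0],
]
cells = grid_property(image, 2, "bwcolor", bwalgo)
```

Each cell then carries a triple such as (bwcolor, b, bwalgo), which records not only the value but also how it was obtained.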

similarity-based retrieval
How do we determine whether the content of a segment
(of a segmented image) is similar to another image (or set
of images)?
solutions
- metric approach -- distance between two image objects
- transformation approach -- relative to specification

metric approach
d is a distance measure if:
d(x,y) = d(y,x)
d(x,y) <= d(x,z) + d(z,y)
d(x,x) = 0
pixel properties
- objects with pixel properties
- pixels:
- object contains w x h (n+2)-tuples
complexity
a set of points in k-dimensional space for k = n + 2
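As a small check of the metric approach, the sketch below treats pixels as (x, y, R, G, B) tuples, so n = 3 and k = 5, and tests symmetry, the triangle inequality and d(x,x) = 0 on a handful of sample points. The Euclidean distance is one possible choice of d, assumed here; the sample points and the helper name satisfies_axioms are made up for illustration, and testing a few samples is of course no proof.

```python
import math
from itertools import product

def euclidean(p, q):
    """Euclidean distance between two points of the same dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def squared(p, q):
    """Squared Euclidean distance; symmetric, but not a distance measure."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def satisfies_axioms(d, points, eps=1e-9):
    """Check the three conditions on every combination of sample points."""
    for x, y, z in product(points, repeat=3):
        if abs(d(x, y) - d(y, x)) > eps:          # symmetry
            return False
        if d(x, y) > d(x, z) + d(z, y) + eps:     # triangle inequality
            return False
    return all(d(x, x) == 0 for x in points)      # d(x,x) = 0

# (x, y, R, G, B) tuples; the first one is the pixel(14,7) example.
pixels = [(14, 7, 5, 1, 3), (15, 7, 5, 2, 3), (60, 50, 0, 0, 0)]
line = [(0,), (1,), (2,)]

euclidean_ok = satisfies_axioms(euclidean, pixels)
squared_ok = satisfies_axioms(squared, line)  # fails: 4 > 1 + 1
```

The contrast with the squared distance shows why the triangle inequality matters: without it, being close to a common neighbour says nothing about two objects being close to each other.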

feature extraction
- maps object into s-dimensional space
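One simple feature extractor, given here only as a sketch, is a coarse color histogram: it maps an image of any size to a vector of fixed length s = 3 * bins. The pixel representation (a list of (R, G, B) tuples with values 0..255) and the choice of one histogram per channel are assumptions of this example.

```python
def color_histogram(pixels, bins=4):
    """Map a list of (R, G, B) pixels to a normalised vector of length 3 * bins."""
    if not pixels:
        raise ValueError("cannot extract features from an empty image")
    hist = [0] * (3 * bins)
    width = 256 // bins  # number of intensity values per bin
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            # Clamp so that the highest values land in the last bin.
            bin_index = min(value // width, bins - 1)
            hist[channel * bins + bin_index] += 1
    total = len(pixels)
    return [count / total for count in hist]

# Two red pixels, one blue and one green pixel.
features = color_histogram([(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 255, 0)])
```

Instead of comparing w x h tuples per object, two images can now be compared through two vectors of 12 numbers each, at the price of losing all spatial information.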
transformation approach
Given two objects o1 and o2,
the level of dissimilarity is proportional
to the (minimum) cost of transforming object o1
into object o2 or vice versa

transformation operators
-- translation, rotation, scaling

cost
-
distance
-
advantages
- user-defined similarity -- choice of transformation operators
- user-defined cost-function
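The idea can be sketched for the simplest kind of object, a bounding box (XLB,XUB,YLB,YUB). The sketch below prices one fixed route from o1 to o2, a single translation of the centre followed by a single scaling per axis, and therefore gives an upper bound on the minimum cost rather than a search over all operator sequences. Rotation is left out, the weights and the logarithmic scaling cost are made-up choices that stand in for a user-defined cost function, and boxes are assumed to have positive width and height.

```python
import math

def center_and_size(box):
    """Return (centre x, centre y, width, height) of a box (XLB, XUB, YLB, YUB)."""
    xlb, xub, ylb, yub = box
    return (xlb + xub) / 2, (ylb + yub) / 2, xub - xlb, yub - ylb

def transformation_cost(o1, o2, translate_weight=1.0, scale_weight=10.0):
    """Cost of moving o1 onto o2 by one translation and one scaling per axis."""
    cx1, cy1, w1, h1 = center_and_size(o1)
    cx2, cy2, w2, h2 = center_and_size(o2)
    translation = math.hypot(cx2 - cx1, cy2 - cy1)
    # Log ratios make doubling and halving equally expensive.
    scaling = abs(math.log(w2 / w1)) + abs(math.log(h2 / h1))
    return translate_weight * translation + scale_weight * scaling

a = (10, 60, 3, 50)    # the rectangle from the shape descriptor example
b = (20, 70, 3, 50)    # the same rectangle, shifted 10 units to the right
c = (10, 110, 3, 50)   # twice as wide as a

cost_ab = transformation_cost(a, b)
cost_ac = transformation_cost(a, c)
```

Changing the weights changes which objects count as similar: with a high scale_weight, a shifted copy is closer to the original than a resized one.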

operations
rotate(image-id,dir,angle)
segment(image-id, predicate)
edit(image-id, edit-op)
image repository
- storage -- unsegmented images
- description -- limited set of features
- index -- feature-based index
- retrieval -- distance between feature vectors
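Retrieval from such a repository can be sketched as a nearest-neighbour search over stored feature vectors. The image ids and three-component vectors below are invented, and the linear scan stands in for a real feature-based index, which would avoid comparing the query with every stored vector.

```python
import math

# Hypothetical repository: image id -> feature vector.
repository = {
    "img-001": [0.9, 0.1, 0.0],
    "img-002": [0.1, 0.8, 0.1],
    "img-003": [0.8, 0.2, 0.0],
}

def retrieve(query_vector, repository, k=2):
    """Return the ids of the k images whose feature vectors are closest to the query."""
    scored = sorted(
        (math.dist(query_vector, vector), image_id)
        for image_id, vector in repository.items()
    )
    return [image_id for _, image_id in scored[:k]]

nearest = retrieve([1.0, 0.0, 0.0], repository)
```

Note that the images themselves are never touched at query time; only the limited set of features stored in the description is compared.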

mission
Our goal is to study aspects of the deployment and architecture of virtual environments as an interface to (intelligent) multimedia information systems ...

query
- document database + string matching
problems
- synonymy -- topic T does not occur literally in document D
- polysemy -- some words may have many meanings

effective search
- precision -- what fraction of the returned answers is correct
- recall -- what fraction of the right documents is returned

precision and recall
precision = ( returned and relevant ) / returned
recall = ( returned and relevant ) / relevant
anomalies
- return all documents: perfect recall, low precision
- return 'nothing': 'perfect' precision, low recall
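The two measures and both anomalies can be reproduced with a few lines of set arithmetic. The sketch assumes that documents are identified by ids, and it adopts the convention that an empty answer has precision 1.0, which is what makes returning nothing look 'perfect'.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a set of returned and a set of relevant ids."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    # Convention (an assumption): nothing returned means nothing wrong.
    precision = hits / len(returned) if returned else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall

collection = {f"d{i}" for i in range(10)}  # hypothetical collection d0..d9
relevant = {"d1", "d2"}

everything = precision_recall(collection, relevant)  # (0.2, 1.0)
nothing = precision_recall(set(), relevant)          # (1.0, 0.0)
mixed = precision_recall({"d1", "d5"}, relevant)     # (0.5, 0.5)
```

Neither measure is meaningful on its own, which is why they are always reported together.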

example
term/document | d0 | d1 | d2 |
snacks | 1 | 0 | 0 |
drinks | 1 | 0 | 3 |
rock-roll | 0 | 1 | 1 |
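A frequency table like this one can be computed by counting terms per document. The document texts below are invented so that the counts match the table; only the counts themselves come from the example.

```python
from collections import Counter

# Hypothetical document texts that yield the counts in the table.
docs = {
    "d0": "snacks drinks",
    "d1": "rock-roll",
    "d2": "drinks rock-roll drinks drinks",
}
terms = ["snacks", "drinks", "rock-roll"]

def frequency_table(docs, terms):
    """Return term -> list of occurrence counts, one entry per document."""
    counts = {doc_id: Counter(text.split()) for doc_id, text in docs.items()}
    return {term: [counts[doc_id][term] for doc_id in docs] for term in terms}

table = frequency_table(docs, terms)
for term, row in table.items():
    print(term, "|", " | ".join(str(n) for n in row), "|")
```

With M terms and N documents the table has M * N entries, which is where the cost of comparing term frequencies per document comes from.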

complexity
compare term frequencies per document -- O(M*N)
reduction
- stop list -- irrelevant words
- word stems -- reduce different words to relevant part
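Both reductions shrink the number of terms M. The sketch below uses a tiny made-up stop list and a deliberately naive suffix stripper; it is not a real stemming algorithm, only an illustration of the idea.

```python
# Hypothetical stop list; real lists contain a few hundred words.
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is"}
SUFFIXES = ("ing", "ed", "s")

def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def reduce_terms(text):
    """Lowercase the text, drop stop words and reduce the rest to stems."""
    return [stem(word) for word in text.lower().split() if word not in STOP_WORDS]

reduced = reduce_terms("The searching of documents and searched document")
```

Seven words are reduced to two distinct terms, search and document, so the frequency table needs two rows instead of seven.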
user-oriented measures
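- coverage ratio -- fraction of the relevant documents known to the user that are retrieved
- novelty ratio -- fraction of the retrieved relevant documents that are new to the user
- relative recall -- fraction of the relevant documents the user expected to find that are actually found
- recall effort -- ratio of the number of relevant documents the user expected to find to the number of documents examined to find them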
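The first two of these measures can be sketched with sets of document ids. The sketch reads coverage as the share of the relevant documents the user already knew that the system returned, and novelty as the share of the returned relevant documents that the user did not know yet; this reading, and all ids below, are assumptions of the example.

```python
def coverage_ratio(returned, known_relevant):
    """Fraction of the relevant documents known to the user that were returned."""
    known = set(known_relevant)
    if not known:
        return 0.0
    return len(set(returned) & known) / len(known)

def novelty_ratio(returned, relevant, known_relevant):
    """Fraction of the returned relevant documents that were new to the user."""
    found = set(returned) & set(relevant)
    if not found:
        return 0.0
    return len(found - set(known_relevant)) / len(found)

relevant = {"d1", "d2", "d3", "d4"}   # all relevant documents
known = {"d1", "d2"}                  # relevant documents the user already knew
returned = {"d1", "d3", "d4", "d9"}   # the answer set of the system

coverage = coverage_ratio(returned, known)
novelty = novelty_ratio(returned, relevant, known)
```

Here the system finds only half of what the user already knew, while two of the three relevant documents it returns are new to the user.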

figure: (a) context, (b) self-reflection

Aesthetics
- intentions -- motives of the artist
- expression -- where form takes over
- representation -- the relation of art to reality

projects & further reading
As a project, you may implement simple
image analysis algorithms that, for example, extract
a color histogram, or detect the presence of
a horizon-like edge.
You may further explore
scenarios for information retrieval in the
cultural heritage domain,
and compare these with other
applications of multimedia information retrieval,
for example monitoring in hospitals.
For further reading I suggest that you familiarize yourself
with common techniques
in information retrieval as described in [IR],
and perhaps devote some time to studying image analysis, [Image].

- artworks -- ..., Miro, Dali, photographed from
Kunstsammlung Nordrhein-Westfalen, see artwork 2.
- left Miro from [Kunst], right: Karel Appel
- match of the day (1) -- Geert Mul
- match of the day (2) -- Geert Mul
- match of the day (3) -- Geert Mul
- mario ware -- taken from gammo/veronica.
- baten kaitos -- eternal ways and the lost ocean, taken from gammo/veronica.
- idem.
- PANORAMA -- screenshots from field test.
- signs -- people, [Signs], p. 252, 253.
