5

information retrieval

information retrieval is usually an afterthought

learning objectives

After reading this chapter you should be able to describe scenarios for information retrieval, explain how content analysis for images can be done, characterize similarity metrics, define the notions of recall and precision, and give an example of frequency tables, as used in text search.

Searching for information on the web is cumbersome. Given our experiences today, we may not even want to think about searching for multimedia information on the (multimedia) web.

Nevertheless, in this chapter we will briefly sketch one of the possible scenarios indicating the need for multimedia search. In fact, once we have the ability to search for multimedia information, many scenarios could be thought of.

As a start, we will look at two media types, images and documents. We will study search for images, because it teaches us important lessons about content analysis of media objects and what we may consider as being similar. Perhaps surprisingly, we will study text documents because, due to our familiarity with this media type, text documents allow us to determine what we may understand by effective search.

...



Amsterdam Drugport


Amsterdam is an international centre of traffic and trade. It is renowned for its culture and liberal attitude, and attracts tourists of all ages, including young tourists who are drawn by the availability of soft drugs. Soft drugs may be obtained at so-called coffeeshops, and the possession of limited amounts of soft drugs is tolerated by the authorities.

The European Community, however, has expressed its concern that Amsterdam is the centre of an international criminal drug operation. Combining national and international police units, a team is formed to start an exhaustive investigation, under the code name Amsterdam Drugport.

information

media types


retrieval


...



information retrieval


Information retrieval, according to [IR], deals with the representation, storage, organisation of, and access to information items.

To see what is involved, imagine that we have a (user) query like:

find me the pages containing information on ...

information retrieval models


vector models


image query


content-based description


shape


property


example



  shape descriptor: XLB=10; XUB=60; YLB=3; YUB=50   (rectangle)  
  property descriptor: pixel(14,7): R=5; G=1; B=3 
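
The two descriptors in this example can be held in small records. The following is a minimal sketch in Python, not a prescribed representation: the class names `ShapeDescriptor` and `PropertyDescriptor` and the helper `contains` are hypothetical, and treating the bounds as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShapeDescriptor:
    """Bounding rectangle of a region, given by lower and upper bounds."""
    xlb: int
    xub: int
    ylb: int
    yub: int

    def contains(self, x: int, y: int) -> bool:
        # Assumption: the bounds are inclusive at both ends.
        return self.xlb <= x <= self.xub and self.ylb <= y <= self.yub


@dataclass(frozen=True)
class PropertyDescriptor:
    """Color values recorded for a single pixel."""
    x: int
    y: int
    r: int
    g: int
    b: int


# The values from the example above.
shape = ShapeDescriptor(xlb=10, xub=60, ylb=3, yub=50)
prop = PropertyDescriptor(x=14, y=7, r=5, g=1, b=3)
print(shape.contains(prop.x, prop.y))
```

With these values the described pixel lies inside the described rectangle, so the two descriptors can refer to the same region.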
  

definitions


example



  property: (bwcolor,{b,w},bwalgo) 
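
One way to read this triple is as a property name, the set of values the property may take, and a method that computes the value. That reading, the class `Property`, and the body of `bwalgo` below (a plain intensity threshold on 8-bit channels) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

RGB = Tuple[int, int, int]


def bwalgo(pixel: RGB) -> str:
    """Hypothetical classifier: 'b' for dark pixels, 'w' for light ones."""
    r, g, b = pixel
    # Assumption: 8-bit channels, threshold at the midpoint of mean intensity.
    return "w" if (r + g + b) / 3 >= 128 else "b"


@dataclass(frozen=True)
class Property:
    name: str                      # e.g. "bwcolor"
    values: FrozenSet[str]         # legal values, e.g. {"b", "w"}
    method: Callable[[RGB], str]   # how the value is computed

    def evaluate(self, pixel: RGB) -> str:
        value = self.method(pixel)
        if value not in self.values:
            raise ValueError(f"{value!r} is not a legal value of {self.name}")
        return value


bwcolor = Property("bwcolor", frozenset({"b", "w"}), bwalgo)
print(bwcolor.evaluate((5, 1, 3)))
```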
  

...



similarity-based retrieval


How do we determine whether the content of a segment (of a segmented image) is similar to another image (or set of images)?

solutions

metric approach


a function d: X x X -> [0,1] is a distance measure if, for all x, y, z in X:


  d(x,y) = d(y,x)
  d(x,y) <= d(x,z) + d(z,y)
  d(x,x) = 0
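
The three conditions can be checked mechanically on a sample of points. The sketch below assumes 8-bit RGB colors and uses Euclidean distance scaled into [0,1] as the candidate measure; `color_distance` and `satisfies_axioms` are hypothetical names. A check on a finite sample can refute a candidate measure, but it cannot prove the conditions hold for all points.

```python
import itertools
import math

# Largest possible Euclidean distance between two 8-bit RGB colors.
MAX_DIST = math.sqrt(3 * 255 ** 2)


def color_distance(x, y):
    """Euclidean distance between two RGB colors, scaled into [0, 1]."""
    return math.dist(x, y) / MAX_DIST


def satisfies_axioms(d, points, eps=1e-9):
    """Check range, symmetry, triangle inequality and d(x,x) = 0 on a sample."""
    pts = list(points)
    for x in pts:
        if abs(d(x, x)) > eps:
            return False
    for x, y, z in itertools.product(pts, repeat=3):
        if not (-eps <= d(x, y) <= 1 + eps):
            return False
        if abs(d(x, y) - d(y, x)) > eps:
            return False
        if d(x, y) > d(x, z) + d(z, y) + eps:
            return False
    return True


sample = [(0, 0, 0), (255, 255, 255), (5, 1, 3), (200, 10, 40)]
print(satisfies_axioms(color_distance, sample))
```

By contrast, squared difference on numbers fails the triangle inequality: for 0, 0.5 and 1, the direct distance is 1 while the detour through 0.5 costs only 0.5.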
  

pixel properties


complexity


a set of points in k-dimensional space for k = n + 2
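
As a small illustration, assume that the two extra dimensions are the x and y coordinates of a pixel, and that each pixel carries n numeric property values. The function name `image_to_points` is hypothetical.

```python
def image_to_points(pixels):
    """Map (x, y, properties) triples to points in (n + 2)-dimensional space.

    Assumption: every pixel carries the same number n of numeric properties.
    """
    return [(float(x), float(y), *map(float, props)) for x, y, props in pixels]


# One pixel at (14, 7) with n = 3 properties (R, G, B) gives a 5-dimensional point.
points = image_to_points([(14, 7, (5, 1, 3))])
print(points, len(points[0]))
```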

feature extraction


...



transformation approach


Given two objects o1 and o2, the level of dissimilarity is proportional to the (minimum) cost of transforming object o1 into object o2, or vice versa.

transformation operators



    to_1,...,to_r  -- translation, rotation, scaling
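
To make the idea of a transformation cost concrete, the sketch below prices one fixed translate-then-scale sequence that maps one axis-aligned rectangle onto another. The `Rect` class, the unit costs, and the cost formula are all assumptions chosen for illustration. Rotation is left out, and a real system would search for the cheapest sequence over all operators rather than price a single one.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    cx: float  # center x
    cy: float  # center y
    w: float   # width, assumed > 0
    h: float   # height, assumed > 0


# Hypothetical unit costs per operator.
TRANSLATE_COST = 1.0  # per unit of distance moved
SCALE_COST = 10.0     # per unit of |log(scale factor)|, per axis


def transformation_cost(o1: Rect, o2: Rect) -> float:
    """Cost of translating o1's center onto o2's and scaling it to o2's size."""
    translate = TRANSLATE_COST * math.hypot(o2.cx - o1.cx, o2.cy - o1.cy)
    scale = SCALE_COST * (
        abs(math.log(o2.w / o1.w)) + abs(math.log(o2.h / o1.h))
    )
    return translate + scale


a = Rect(0, 0, 10, 20)
b = Rect(3, 4, 10, 20)   # same size, moved by 5 units
c = Rect(0, 0, 20, 20)   # same place, twice as wide
print(transformation_cost(a, b), transformation_cost(a, c))
```

Because the logarithm of a scale factor only changes sign when the direction is reversed, this particular cost is the same for transforming o1 into o2 as for o2 into o1.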
  

cost


distance


advantages


operations



   rotate(image-id,dir,angle)
   segment(image-id, predicate)
   edit(image-id, edit-op)
  

...



image repository


mission


Our goal is to study aspects of the deployment and architecture of virtual environments as an interface to (intelligent) multimedia information systems ...

...



query


problems


...



effective search


precision and recall



  precision = ( returned and relevant ) / returned  
  recall = ( returned and relevant ) / relevant 
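
Both measures follow directly from set intersection. In this minimal sketch the function name `precision_recall` and the document identifiers are made up, and returning 0.0 when a set is empty is a convention assumed here, not part of the definitions.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a result set and a relevant set."""
    returned_set, relevant_set = set(returned), set(relevant)
    hits = len(returned_set & relevant_set)  # returned and relevant
    # Assumption: an empty denominator yields 0.0 instead of an error.
    precision = hits / len(returned_set) if returned_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


p, r = precision_recall(
    returned=["d0", "d1", "d2", "d3"],
    relevant=["d1", "d3", "d5"],
)
print(p, r)
```

Here two of the four returned documents are relevant, so precision is 2/4, and two of the three relevant documents are returned, so recall is 2/3.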
  

anomalies


example


  term/document   d0   d1   d2
  snacks           1    0    0
  drinks           1    0    3
  rock-roll        0    1    1

complexity


compare term frequencies per document -- O(M*N)
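
The comparison can be sketched using the frequencies from the example table. The code assumes M terms and N documents, represents a query as a 0/1 vector over the terms, and uses cosine similarity as the comparison. Cosine similarity is a choice made here for illustration, and `rank` is a hypothetical name. Each query touches every table entry once, which gives the O(M*N) cost.

```python
import math

TERMS = ["snacks", "drinks", "rock-roll"]

# One column of the frequency table per document, in TERMS order.
FREQ = {
    "d0": [1, 1, 0],
    "d1": [0, 0, 1],
    "d2": [0, 3, 1],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query_terms):
    """Score every document against the query: N documents times M terms."""
    q = [1 if term in query_terms else 0 for term in TERMS]
    scores = {doc: cosine(q, column) for doc, column in FREQ.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for doc, score in rank({"drinks"}):
    print(doc, round(score, 3))
```

For the query "drinks", d2 ranks first because the term dominates that document, d0 follows, and d1, which does not contain the term, scores zero.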

reduction


...



...



user-oriented measures


...


(a) context   (b) self-reflection

Aesthetics


...



5. information retrieval

concepts


technology


projects & further reading

As a project, you may implement simple image analysis algorithms that, for example, extract a color histogram, or detect the presence of a horizon-like edge.
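
As a possible starting point for the histogram part of such a project, the sketch below works on a plain list of 8-bit (R, G, B) tuples, so that no image library is needed. The function names and the default of four bins per channel are assumptions. Histogram intersection is one common way to compare two histograms and is used here only as an example.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Normalized histogram over coarse RGB bins for 8-bit pixel tuples."""
    width = 256 // bins_per_channel
    top = bins_per_channel - 1

    def bin_of(value):
        # Clamp so that values near 255 stay in the last bin.
        return min(value // width, top)

    counts = Counter((bin_of(r), bin_of(g), bin_of(b)) for r, g, b in pixels)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the mass that two normalized histograms share."""
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in h1.keys() | h2.keys())


mostly_red = [(255, 0, 0)] * 3 + [(0, 0, 255)]
mostly_blue = [(0, 0, 255)] * 3 + [(255, 0, 0)]
h_red = color_histogram(mostly_red)
h_blue = color_histogram(mostly_blue)
print(h_red, histogram_intersection(h_red, h_blue))
```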

You may further explore scenarios for information retrieval in the cultural heritage domain, and compare these with other applications of multimedia information retrieval, for example monitoring in hospitals.

For further reading, I suggest that you familiarize yourself with common techniques in information retrieval as described in [IR], and perhaps devote some time to studying image analysis, [Image].

the artwork

  1. artworks -- ..., Miro, Dali, photographed from Kunstsammlung Nordrhein-Westfalen, see artwork 2.
  2. left: Miro, from [Kunst]; right: Karel Appel
  3. match of the day (1) -- Geert Mul
  4. match of the day (2) -- Geert Mul
  5. match of the day (3) -- Geert Mul
  6. mario ware -- taken from gammo/veronica.
  7. baten kaitos -- eternal ways and the lost ocean, taken from gammo/veronica.
  8. idem.
  9. PANORAMA -- screenshots from field test.
  10. signs -- people,  [Signs], p. 252, 253.