topical media & game development


research directions -- information retrieval models

Information retrieval research has quite a long history, with a focus on indexing text and developing efficient search algorithms. Nowadays, partly due to the widespread use of the web, research in information retrieval includes modeling, classification and clustering, system architectures, user interfaces, information visualisation, filtering, descriptive languages, etcetera. See [IR].

information retrieval

Information retrieval, according to [IR], deals with the representation, storage, organisation of, and access to information items.

To see what is involved, imagine that we have a (user) query like:

find me the pages containing information on ...

Then the goal of the information retrieval system is to retrieve information that is useful or relevant to the user, in other words: information that satisfies the user's information need.

Given an information repository, which may consist of web pages but also multimedia objects, the information retrieval system must extract syntactic and semantic information from these (information) items and use this to match the user's information need.

Effective information retrieval is determined by, on the one hand, the user task and, on the other hand, the logical view of the documents or media objects that constitute the information repository. As user tasks, we may distinguish between retrieval (by query) and browsing (by navigation). To obtain the relevant information in retrieval we generally apply filtering, which may also be regarded as a ranking based on the attributes considered most relevant.

The logical view of text documents generally amounts to a set of index terms characterizing the document. To find relevant index terms, we may apply operations to the document, such as the elimination of stop words or text stemming. As you may easily see, full text provides the most complete logical view, whereas a small set of categories provides the most concise logical view. Generally, the user task will determine whether semantic richness or efficiency of search is considered more important when deciding on the obvious tradeoffs involved.
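To make the notion of a logical view concrete, here is a minimal sketch of how index terms might be derived from a text document. The stop-word list and the suffix-stripping function naive_stem are hypothetical stand-ins chosen for illustration; they are not taken from [IR], and a real system would use a much larger stop-word list and a proper stemming algorithm.

```python
import re

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "on", "for"}


def naive_stem(word):
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Return the set of index terms that forms the logical view of a document."""
    words = re.findall(r"[a-z]+", text.lower())
    return {naive_stem(w) for w in words if w not in STOP_WORDS}


terms = index_terms("The indexing of documents and the searching of indexes")
print(sorted(terms))
```

Note how the nine words of the example sentence collapse into three index terms: the logical view becomes more concise, at the cost of losing the original wording.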

information retrieval models

In [IR], a great variety of information retrieval models is described. For your understanding, an information retrieval model makes explicit how index terms are represented and how the index terms characterizing an information item are matched with a query.

When we limit ourselves to the classic models for search and filtering, we may distinguish between:

information retrieval models

boolean or set-theoretic models
vector or algebraic models
probabilistic models

Boolean models typically allow for yes/no answers only. They have a set-theoretic basis, and include models based on fuzzy logic, which allow for somewhat more refined answers.
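The set-theoretic basis of boolean models can be illustrated with a small sketch. The repository DOCS, the membership degrees in MEMBERSHIP and all function names are hypothetical, made up for this example. The fuzzy variant assumes the standard min/max interpretation of fuzzy intersection and union, which is only one of several possible choices.

```python
# Hypothetical toy repository: document id -> set of index terms.
DOCS = {
    "d1": {"information", "retrieval", "index"},
    "d2": {"multimedia", "retrieval", "video"},
    "d3": {"game", "development", "video"},
}


def docs_with(term):
    """Set of document ids whose index terms include the given term."""
    return {doc_id for doc_id, terms in DOCS.items() if term in terms}


# Boolean queries map directly onto set operations: a document is either in
# the answer set or it is not.
retrieval_and_video = docs_with("retrieval") & docs_with("video")   # AND
retrieval_or_game = docs_with("retrieval") | docs_with("game")      # OR
retrieval_not_video = docs_with("retrieval") - docs_with("video")   # AND NOT

# Fuzzy variant: each document belongs to a term to some degree in [0, 1].
# These degrees are invented for illustration.
MEMBERSHIP = {
    "d1": {"retrieval": 0.9, "video": 0.0},
    "d2": {"retrieval": 0.6, "video": 0.8},
}


def fuzzy_and(doc_id, term_a, term_b):
    """Degree to which a document matches both terms (min interpretation)."""
    degrees = MEMBERSHIP[doc_id]
    return min(degrees.get(term_a, 0.0), degrees.get(term_b, 0.0))


def fuzzy_or(doc_id, term_a, term_b):
    """Degree to which a document matches either term (max interpretation)."""
    degrees = MEMBERSHIP[doc_id]
    return max(degrees.get(term_a, 0.0), degrees.get(term_b, 0.0))
```

With these definitions, the boolean query for retrieval but not video yields only d1, whereas the fuzzy conjunction assigns d2 a degree of match rather than a plain yes or no.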

Vector models use algebraic operations on vectors of attribute terms to determine possible matches. The attributes that make up a vector must in principle be orthogonal. Attributes may be given a weight, or even be ignored. Much research has been done on how to find an optimal selection of attributes for a given information repository.
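As an illustration of such algebraic operations, the sketch below computes the cosine of the angle between a query vector and a document vector. Cosine similarity is one common choice of matching function, assumed here for the sake of the example. The function name, the dictionary representation of vectors and the weights parameter are likewise assumptions of this sketch. Setting a weight to 0.0 ignores the corresponding attribute.

```python
import math


def cosine_similarity(query, doc, weights=None):
    """Cosine of the angle between two attribute vectors, optionally weighted.

    Vectors are dictionaries mapping attribute names to values; attributes
    missing from a vector count as 0.0.
    """
    weights = weights or {}
    dot = norm_q = norm_d = 0.0
    for attribute in set(query) | set(doc):
        w = weights.get(attribute, 1.0)  # default weight 1.0; 0.0 ignores it
        q = w * query.get(attribute, 0.0)
        d = w * doc.get(attribute, 0.0)
        dot += q * d
        norm_q += q * q
        norm_d += d * d
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0  # an all-zero vector matches nothing
    return dot / math.sqrt(norm_q * norm_d)


query = {"retrieval": 1.0, "video": 1.0}
doc = {"retrieval": 1.0, "game": 1.0}

print(cosine_similarity(query, doc))                  # all attributes count equally
print(cosine_similarity(query, doc, {"video": 0.0}))  # the video attribute is ignored
```

Ignoring the video attribute raises the similarity in this example, since the query and the document then disagree on one attribute fewer.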

Probabilistic models include general inference networks, and belief networks based on Bayesian logic.

Although it is somewhat premature to compare these models with respect to their effectiveness in actual information retrieval tasks, there is, according to [IR], a general consensus that vector models will outperform the probabilistic models on general collections of text documents. How they will perform for arbitrary collections of multimedia objects might be an altogether different question!

Nevertheless, in the sections to follow we will focus primarily on generalized vector representations of multimedia objects. So, let's conclude by listing the advantages of vector models.

vector models

attribute term weighting scheme improves performance
partial matching strategy allows retrieval of approximate material
metric distance allows for sorting according to degree of similarity
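The three advantages listed above can be seen together in one small sketch. It assumes tf-idf as the weighting scheme and Euclidean distance as the metric. Both are common choices, picked here for illustration only, and the repository DOCS and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy repository: document id -> list of index terms.
DOCS = {
    "d1": ["retrieval", "retrieval", "index"],
    "d2": ["retrieval", "video"],
    "d3": ["game", "video"],
}


def tf_idf_vectors(docs):
    """Weight each term by its frequency in the document times log(N / df)."""
    n = len(docs)
    df = Counter(term for terms in docs.values() for term in set(terms))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = {
        doc_id: {term: tf * idf[term] for term, tf in Counter(terms).items()}
        for doc_id, terms in docs.items()
    }
    return vectors, idf


def distance(u, v):
    """Euclidean distance between two sparse vectors stored as dictionaries."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))


def rank(query_terms, docs):
    """Return document ids sorted from nearest to farthest from the query."""
    vectors, idf = tf_idf_vectors(docs)
    # Query terms unknown to the repository get weight 0.0.
    query = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query_terms).items()}
    return sorted(vectors, key=lambda doc_id: distance(query, vectors[doc_id]))


print(rank(["retrieval", "index"], DOCS))
```

For the query on retrieval and index, document d2 contains only one of the two query terms, yet it is still retrieved and ranked ahead of d3, which contains neither. This is what partial matching and sorting by degree of similarity amount to in practice.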

Reading the following sections, you will come to understand how to adopt an attribute weighting scheme, how to apply partial matching and how to define a suitable distance metric.

So, let me finish by posing a research issue: How can you improve a particular information retrieval model or matching scheme by using a suitable method of knowledge representation and reasoning? To give you a point of departure, look at the logic-based multimedia information retrieval system proposed in [Dolores].


You may not copy or print any of this material without explicit permission of the author or the publisher. In case of other copyright issues, contact the author.
