topical media & game development
research directions -- user-oriented measures
Even though the reductions proposed may limit
the size of the frequency tables,
the resulting tables may still be
of considerable size.
One way to reduce the size further, as discussed
in [MMDBMS], is to apply
latent semantic indexing,
which comes down to clustering the document database,
and limiting ourselves to the most relevant words only,
where relevance is determined by the ratio of occurrence
over the total number of words.
In effect, the less often a word occurs, the more discriminating
it might be.
Alternatively, the choice of what words are
considered relevant may be determined by
taking into account the area of application
or the interest of a particular group of users.
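
As an illustration of the frequency-ratio criterion described above, the
following is a minimal sketch, not the method of [MMDBMS]. It assumes that
documents are plain strings split on whitespace, and that "most relevant"
simply means the k words with the lowest ratio of occurrences to total words.
The names relative_frequencies and most_discriminating are hypothetical.

```python
from collections import Counter


def relative_frequencies(documents):
    """Map each word to its number of occurrences divided by the total word count."""
    counts = Counter(word for doc in documents for word in doc.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def most_discriminating(documents, k):
    """Return the k words with the lowest relative frequency.

    Assumption: rarer words are treated as more discriminating;
    ties are broken alphabetically to keep the result deterministic.
    """
    freqs = relative_frequencies(documents)
    return sorted(freqs, key=lambda w: (freqs[w], w))[:k]


docs = ["the cat sat on the mat", "the dog sat on the log"]
freqs = relative_frequencies(docs)
rare = most_discriminating(docs, 3)
```

In practice one would likely also discard words that occur only by accident
(for example misspellings), since a purely frequency-based cut-off cannot
distinguish these from genuinely discriminating terms.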

user-oriented measures
Observe that, when evaluating a particular information
retrieval system, the notions of precision and recall
as introduced before are rather system-oriented measures,
based on the assumption of a user-independent notion of
relevance.
However, as stated in [IR],
different users might have a different interpretation
of which documents are relevant.
In [IR], some user-oriented measures are briefly discussed
that to some extent cope with this problem.
user-oriented measures
- coverage ratio -- fraction of known documents
- novelty ratio -- fraction of new (relevant) documents
- relative recall -- fraction of expected documents
- recall effort -- fraction of examined documents

Consider a reference collection,
an example information request
and a retrieval strategy to be evaluated.
Then the coverage ratio
may be defined as the fraction of the documents
known to be relevant, or more precisely the number
of (known) relevant documents retrieved divided by the
total number of documents known to be relevant by the user.
The novelty ratio may then be defined as the
fraction of the documents retrieved which were not known
to be relevant by the user, or more precisely
the number of relevant documents retrieved that were not known
to the user divided by the total number of relevant documents
retrieved.
The relative recall is obtained by dividing
the number of relevant documents found by the number
of relevant documents the user expected to find.
Finally, recall effort may be characterized as
the ratio of the number of relevant documents
expected to the total number of documents that
have to be examined to retrieve these documents.
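
The four definitions above can be sketched in code as follows. This is only
an illustration under stated assumptions: documents are represented by
identifiers in Python sets, the set known_relevant is a subset of relevant,
all denominators are non-zero, and the function and parameter names are
hypothetical rather than taken from [IR].

```python
def coverage_ratio(retrieved, known_relevant):
    """Known relevant documents retrieved / all documents known to be relevant."""
    return len(retrieved & known_relevant) / len(known_relevant)


def novelty_ratio(retrieved, relevant, known_relevant):
    """Relevant retrieved documents not known to the user / all relevant retrieved."""
    relevant_retrieved = retrieved & relevant
    return len(relevant_retrieved - known_relevant) / len(relevant_retrieved)


def relative_recall(retrieved, relevant, expected_count):
    """Relevant documents found / relevant documents the user expected to find."""
    return len(retrieved & relevant) / expected_count


def recall_effort(expected_count, examined_count):
    """Relevant documents expected / documents examined to retrieve them."""
    return expected_count / examined_count


# Hypothetical example data: document identifiers only.
retrieved = {"d1", "d2", "d3", "d4", "d5"}
relevant = {"d1", "d2", "d3", "d7", "d8"}
known_relevant = {"d1", "d2", "d7", "d8"}  # relevant documents the user already knew

coverage = coverage_ratio(retrieved, known_relevant)          # 2 / 4
novelty = novelty_ratio(retrieved, relevant, known_relevant)  # 1 / 3
rel_recall = relative_recall(retrieved, relevant, expected_count=4)  # 3 / 4
effort = recall_effort(expected_count=4, examined_count=10)   # 4 / 10
```

Note that the values of known_relevant and expected_count come from the user,
not from the system, which is what makes these measures user-oriented.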
Notice that these measures all have a clearly 'subjective'
element, in that, although they may be
generalized to a particular group of users,
they will very likely not generalize to all
groups of users.
In effect, this may lead to different retrieval
strategies for different categories of users,
taking into account level of expertise and familiarity
with the information repository.
(C) Æliens
04/09/2009
You may not copy or print any of this material without explicit permission of the author or the publisher.
In case of other copyright issues, contact the author.