image query
content-based description
shape
property
example
definitions
example
similarity-based retrieval
How do we determine whether the content of a segment (of a segmented image) is similar to another image (or set of images)?
Think of, for example, the problem of finding all photos
that match a particular face.
metric approach
pixel properties
complexity
a set of points in k-dimensional space for k = n + 2
feature extraction
transformation approach
Given two objects o1 and o2,
the level of dissimilarity is proportional
to the (minimum) cost of transforming object o1
into object o2 or vice versa
transformation operators cost distance advantages
operations
image repository
draft version 1 (16/5/2003)
solutions
As we will see later, the transformation approach in
some way subsumes the metric approach, since we can formulate
a distance measure for the transformation approach as well.
metric approach
Leaving the details for your further research, it
is not hard to see that even if the absolute value
of a distance has no meaning, relative distances do.
So, when an image contains a face with dark sunglasses,
it will be closer to (an image of) a face with
dark sunglasses than a face without sunglasses,
other things being equal.
It is also not hard to see that a pixel-wise
approach is, computationally, quite complex.
An object is considered as a set of points in k-dimensional space, for k = n + 2.
In other words, establishing similarity between two images
(that is, calculating the distance)
requires a number of comparisons equal to n + 2 times the number of pixels.
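To make the cost of the pixel-wise approach concrete, here is a minimal sketch. It assumes that each image is a grid of pixels carrying n numeric properties (for example the three color channels), that both images have the same size, and that the distance is Euclidean. The names `to_points` and `pixelwise_distance` are invented for this illustration.

```python
import math


def to_points(image):
    """Flatten a grid of pixel-property tuples into (x, y, p1, ..., pn) points."""
    return [(x, y, *props)
            for y, row in enumerate(image)
            for x, props in enumerate(row)]


def pixelwise_distance(image_a, image_b):
    """Return (Euclidean distance, number of coordinate comparisons).

    Assumption: both images contain the same number of pixels.
    """
    points_a, points_b = to_points(image_a), to_points(image_b)
    if len(points_a) != len(points_b):
        raise ValueError("images must have the same number of pixels")
    total = 0.0
    comparisons = 0
    for point_a, point_b in zip(points_a, points_b):
        # Each point has k = n + 2 coordinates: x, y and n properties.
        for coord_a, coord_b in zip(point_a, point_b):
            total += (coord_a - coord_b) ** 2
            comparisons += 1
    return math.sqrt(total), comparisons


# Two 2x2 images with n = 3 properties (red, green, blue) per pixel.
black = [[(0, 0, 0), (0, 0, 0)],
         [(0, 0, 0), (0, 0, 0)]]
almost_black = [[(3, 4, 0), (0, 0, 0)],
                [(0, 0, 0), (0, 0, 0)]]

distance, comparisons = pixelwise_distance(black, almost_black)
```

For these 2x2 images the loop performs (3 + 2) x 4 = 20 comparisons, which grows quickly for realistic image sizes.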
feature extraction
For example, one of the features could indicate
whether or not it was a face with dark sunglasses.
So, instead of calculating the distance by
establishing color differences between regions
of the images where sunglasses may be found,
we may limit ourselves to considering a binary value,
yes or no, to see whether the face has sunglasses.
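As a small sketch of this idea, suppose each face is summarized by a short vector of binary features. The feature names (dark sunglasses, beard, smiling) and the choice of a Euclidean distance are assumptions made for this example only.

```python
import math


def feature_distance(features_a, features_b):
    """Euclidean distance between two equally long feature vectors."""
    if len(features_a) != len(features_b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(features_a, features_b)))


# Hypothetical features: (has_dark_sunglasses, has_beard, is_smiling)
query = (1, 0, 1)
face_with_sunglasses = (1, 0, 0)
face_without_sunglasses = (0, 0, 0)

d_with = feature_distance(query, face_with_sunglasses)
d_without = feature_distance(query, face_without_sunglasses)
```

Here the comparison needs only three values per face, and the query is closer to the face with sunglasses than to the face without, other things being equal.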
transformation approach
Now, this principle might be applied to any representation
of an object or image, including feature vectors.
Yet, on the level of images, we may think of the
following operations:
Moreover, we can attach a cost to each of these
operations and calculate the cost of
a transformation sequence TS by summing the costs
of the individual operations.
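The summation can be sketched in a few lines. The operation names and their costs below are placeholders chosen for illustration; they are not taken from the text.

```python
# Hypothetical operators with user-defined costs (illustrative values only).
OPERATION_COSTS = {"translate": 1.0, "rotate": 2.0, "scale": 1.5}


def sequence_cost(sequence, costs=OPERATION_COSTS):
    """Cost of a transformation sequence: the sum of its operation costs."""
    return sum(costs[operation] for operation in sequence)


def dissimilarity(candidate_sequences, costs=OPERATION_COSTS):
    """Minimum cost over the candidate sequences that turn o1 into o2."""
    return min(sequence_cost(sequence, costs) for sequence in candidate_sequences)


ts = ["rotate", "scale", "translate"]
candidates = [ts, ["rotate", "rotate"], ["scale", "translate"]]
```

With these values, `sequence_cost(ts)` is 4.5 and `dissimilarity(candidates)` is 2.5, the cost of the cheapest candidate.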
Based on the cost function we can define a distance metric,
which we call for obvious reasons the edit distance,
to establish similarity between objects.
An obvious advantage of the edit distance
over the pixel-wise distance metric is that we may
have a rich choice of transformation operators
to which we can attach a (user-defined) cost at will.
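To illustrate the principle, the following sketch computes an edit distance with user-defined costs. As a simplifying assumption it works on sequences of symbols (strings) rather than images, and it uses the familiar insert, delete and substitute operators with a standard dynamic-programming table. It is a stand-in for the idea, not an image-level implementation.

```python
def edit_distance(source, target,
                  insert_cost=1.0, delete_cost=1.0, substitute_cost=1.0):
    """Minimum total cost of transforming source into target."""
    rows, cols = len(source) + 1, len(target) + 1
    # cost[i][j] is the cheapest way to turn source[:i] into target[:j].
    cost = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = i * delete_cost
    for j in range(1, cols):
        cost[0][j] = j * insert_cost
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            cost[i][j] = min(
                cost[i - 1][j] + delete_cost,        # drop a source symbol
                cost[i][j - 1] + insert_cost,        # add a target symbol
                cost[i - 1][j - 1] + (0.0 if same else substitute_cost),
            )
    return cost[-1][-1]


unit_costs = edit_distance("kitten", "sitting")
costly_substitution = edit_distance("kitten", "sitting", substitute_cost=3.0)
```

Changing the costs changes the distance: once a substitution is more expensive than a deletion followed by an insertion, the cheapest sequence avoids substitutions altogether. When insertion and deletion cost the same, the distance is the same in both directions.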
research directions -- multimedia repositories
And, indeed, this seems to be what most image
databases provide.
Note that the actual encoding is not of importance.
The same type of information can be encoded using
either XML, relational tables or object databases.
What is of importance is the functionality that is
offered to the user, in terms of storage and retrieval
as well as presentation facilities.
Obviously, the underlying multimedia repository
must provide adequate retrieval facilities
and must also be able to deliver the desired objects
in a format suitable for the representation and
possibly incorporation in such an environment.
Actually, at this stage, I have only some vague ideas
about how to make this vision come true.
Look, however, at chapter 7
and appendix platform
for some initial ideas.
eliens@cs.vu.nl