In the following
we will first give an overview of musical
search facilities on the web and
then we will discuss similarity-based
retrieval of musical patterns in somewhat more depth
in the section on
research directions,
In section 6.3, we will have a closer look
at feature extraction for arbitrary media types.

1
musical similarity
Although intuitively obvious, how can we characterize musical
simularity?
And perhaps more importantly, how can we compute
the extent to which one piece of music or a melody line is
similar to another piece of music or melody line.
As concerns musical content, at least for most genres,
it appears that
According to [Concepts], we should focus primarily on melody,
since
"It is melody that makes music memorable:
we are likely to recall a tune long after we have
forgotten its text."
Other features, content-based as well as descriptive,
may however be used as additional filters in the proces
of retrieval.
Melodic searching and matching has been explored mainly
in the context of bibliographic tools and for the
analysis of (monophonic) repertories [Similarity].
As described in section [Web],
many of these efforts have been made available to
the general public through the Web.
Challenges for the near future are, however, to provide
for melodic similarity matching on polyphonic works,
and retrieval over very large databases of musical fragments.
In this section we will look in somewhat more detail
at the problem of melodic similarity matching.
In particular, we will discuss representational issues,
matching algorithms and additional analysis tools
that may be used for musical information retrieval.
melodic similarity
Consider the musical fragment Twinkle, twinkle little star
(known in the Dutch tradition as "Altijd is Kortjakje ziek"),
which has been used by Mozart for a series of variations [Mozart].
Now, imagine how you would approach establishing the similarity between
the original theme and these variations.
As a matter of fact, we discovered that exactly this problem
had been tackled in the study reported in [Compare], which we will discuss
later.
Before that, we may reflect on what we mean by the concept of a melody.
In the aforementioned variations the original melody is disguised
by, for example, decorations and accompaniments.
In some variations, the melody is distributed among
the various parts (the left and right hand).
In other variations, the melody is only implied by the harmonic structure.
Nevertheless, for the human ear there seems to be,
as it is called in [Concepts],
a 'prototypical' melody that is present in each of the variations.
When we restrict ourselves to pitch-based comparisons,
melodic similarity may be established by
comparing profiles of pitch-direction (up, down, repeat) or
pitch contours (which may be depicted graphically).
Also, given a suitable representation,
we may compare pitch-event strings (assuming a normalized
pitch representation such as position within a scale)
or intervallic contours (which gives the distance between
notes in for example semitones).
Following [Concepts], we may observe however that the more general
the system of representation, the longer the (query) string
will need to be to produce meaningful discriminations.
As further discussed in [Concepts], recent studies in
musical perception indicate that pitch-information without
durational values does not suffice.
representational issues
Given a set of musical fragments, we may envisage
several reductions to arrive at the (hypothetical)
prototypical melody.
Such reductions must provide for the elimination of
confounds such as rests, repeated notes and grace notes,
and result in, for example, a pitch-string (in a suitable representation),
a duration profile, and (possibly) accented note profiles
and harmonic reinforcement profiles (which capture notes that
are emphasized by harmonic changes).
Unfortunately, as observed in [Concepts],
the problem of which reductions to apply
is rather elusive, since it depends to a great extent
on the goals of the query and the repertory at hand.
As concerns the representation of pitch information,
there is a choice between a base-7 representation,
which corresponds with the position relative to the
tonic in the major or minor scales,
a base-12 representation, which corresponds
with a division in twelve semitones as in
the chromatic scale,
and more elaborate encodings,
which also reflect notational differences in identical notes
that arise through the use of accidentals.
For MIDI applications, a base-12 notation is most suitable,
since the MIDI note information is given in semitone steps.
In addition to relative pitch information, octave information is also
important, to establish the rising and falling of melodic contour.
When we restrict ourselves to directional profiles
(up, down, repeat),
we may include information concerning the slope, or degree of change,
the relation of the current pitch to the original pitch,
possible repetitions,
recurrence of pitches after intervening pitches,
and possible segmentations in the melody.
In addition, however, to support relevant comparisons
it seems important to have information on the rhythmic and harmonic
structure as well.

2
example(s) -- napster
Wasn't it always your dream to have all your music free?
Napster was the answer.
(But not for long.)
Napster is, as we learn in the
WikiPedia,
an online music service which was originally a file sharing service created by Shawn Fanning. Napster was the first widely-used peer-to-peer music sharing service, and it made a major impact on how people, especially college students, used the Internet. Its technology allowed music fans to easily share MP3 format song files with each other, thus leading to the music industry's accusations of massive copyright violations. The service was named Napster after Fanning's nickname.
However, Napster has been forced to become commercial.
So the question is:
is there life after napster?
Well, there is at least open source!
research directions -- musical similarity matching
An altogether different approach at establishing melodic similarity
is proposed in [Compare]. This approach has been followed
in the Meldex system [Meldex], discussed in
section [Web].
This is a rather technical section, that may be skipped on first reading.
The approach is different in that it relies on
a (computer science) theory of finite sequence comparison,
instead of musical considerations.
The general approach is, as explained in [Compare],
to search for an optimal correspondence between elements
of two sequences, based on a distance metric or measure
of dissimilarity,
also known more informally as the edit-distance,
which amounts to the (minimal) number of transformations that need to be applied
to the first sequence in order to obtain the second one.
Typical transformations include deletion,
insertion and replacement.
In the musical domain, we may also apply transformations
such as consolidation (the replacement of several elements
by one element) and fragmentation (which is the reverse of
consolidation).
The metric is even more generally applicable by associating a weight
with each of the transformations.
Elements of the musical sequences used in [Compare] are
pitch-duration pairs, encoded in base-12 pitch information
and durations as multiples of 1/16th notes.
The actual algorithms for determining the dissimilarity
between two sequences uses dynamic programming techniques.
The algorithm has been generalized to look for matching phrases,
or subsequences, within a sequence.
The complexity of the algorithm is ,
provided that a limit is imposed on the number of notes
involved in consolidation and fragmentation.
Nevertheless, as indicated in experiments for the
Meldex database, the resulting complexity is still
forbidding when large databases are involved.
The Meldex system offers apart from the (approximate)
dynamic programming algorithm also a state matching algorithm
that is less flexible, but significantly faster.
The Meldex experiments involved a database
of 9400 songs, that were used to investigate six
musical search criteria:
(1) exact interval and rhythm,
(2) exact contour and rhythm,
(3) exact interval,
(4) exact contour,
(5) approximate interval and rhythm, and
(6) approximate contour and rhythm.
Their results indicate that the number of notes needed
to return a reasonable number of songs scales
logarithmically with database size [Meldex].
It must be noted that the Meldex database contained
a full (monophonic) transcription of the songs.
An obvious solution to manage the complexity of
searching over a large database would seem to be the storage
of prototypical themes or melodies instead of complete songs.
indexing and analysis
There are several tools available that may assist us in creating
a proper index of musical information.
One of these tools is the Humdrum system,
which offers facilities
for metric and harmonic analysis,
that have proven their worth in several musicological investigations [Humdrum].
Another tool that seems to be suitable for our purposes,
moreover since it uses a simple pitch-duration, or piano-roll,
encoding of musical material, is
the system for metric and harmonic analysis described in [Meter].
Their system derives a metrical structure, encoded
as hierarchical levels of equally spaced beats,
based on preference-rules which determine the overall
likelihood of the resulting metrical structure.
Harmonic analysis further results in (another level of)
chord spans labelled with roots,
which is also determined by preference rules that take
into account the previously derived metrical structure.
As we have observed before, metrical and harmonic analysis
may be used to eliminate confounding information
with regard to the 'prototypical' melodic structure.
(C) Æliens
04/09/2009
You may not copy or print any of this material without explicit permission of the author or the publisher.
In case of other copyright issues, contact the author.