Content-based retrieval -- related work

contents abstract intro web ACOI detector query retrieval conclusions References

The ACOI architecture, as sketched in the previous sections, allows for musical feature detection in an integrated framework. We aim at providing content-based retrieval facilities in the musical domain for a large category of users over a very large database of musical works.

As concerns musical content, at least for most genres, it appears that we should focus primarily on melody, since, as phrased in [Concepts]:

"It is melody that makes music memorable: we are likely to recall a tune long after we have forgotten its text."

Other features, content-based as well as descriptive, may however be used as additional filters in the proces of retrieval.

Melodic searching and matching has been explored mainly in the context of bibliographic tools and for the analysis of (monophonic) repertories [Similarity]. As described in section Web, many of these efforts have been made available to the general public through the Web. Challenges for the near future are, however, to provide for melodic similarity matching on polyphonic works, and retrieval over very large databases of musical fragments.

In this section we will look in somewhat more detail at the problem of melodic similarity matching. In particular, we will discuss representational issues, matching algorithms and additional analysis tools that may be used for musical information retrieval.

slide: Twinkle, twinkle ...

Melodic similarity

Consider the musical fragment depicted in slide Kortjakje, Twinkle, twinkle little star (known in the Dutch tradition as "Altijd is Kortjakje ziek"), which has been used by Mozart for a series of variations [Mozart]. Now, imagine how you would approach establishing the similarity between the original theme and these variations. As a matter of fact, we discovered that exactly this problem had been tackled in the study reported in [Compare], which we will discuss later. Before that, we may reflect on what we mean by the concept of a melody. In the aforementioned variations the original melody is disguised by, for example, decorations and accompaniments. In some variations, the melody is distributed among the various parts (the left and right hand). In other variations, the melody is only implied by the harmonic structure. Nevertheless, for the human ear there seems to be, as it is called in [Concepts], a 'prototypical' melody that is present in each of the variations.

When we restrict ourselves to pitch-based comparisons, melodic similarity may be established by comparing profiles of pitch-direction (up, down, repeat) or pitch contours (which may be depicted graphically). Also, given a suitable representation, we may compare pitch-event strings (assuming a normalized pitch representation such as position within a scale) or intervallic contours (which gives the distance between notes in for example semitones). Following [Concepts], we may observe however that the more general the system of representation, the longer the (query) string will need to be to produce meaningful discriminations. As further discussed in [Concepts], recent studies in musical perception indicate that pitch-information without durational values does not suffice.

Representational issues

Given a set of musical fragments, we may envisage several reductions to arrive at the (hypothetical) prototypical melody. Such reductions must provide for the elimination of confounds such as rests, repeated notes and grace notes, and result in, for example, a pitch-string (in a suitable representation), a duration profile, and (possibly) accented note profiles and harmonic reinforcement profiles (which capture notes that are emphasized by harmonic changes). Unfortunately, as observed in [Concepts], the problem of which reductions to apply is rather elusive, since it depends to a great extent on the goals of the query and the repertory at hand.

As concerns the representation of pitch information, there is a choice between a base-7 representation, which corresponds with the position relative to the tonic in the major or minor scales, a base-12 representation, which corresponds with a division in twelve semitones as in the chromatic scale, and more elaborate encodings, which also reflect notational differences in identical notes that arise through the use of accidentals. For MIDI applications, a base-12 notation is most suitable, since the MIDI note information is given in semitone steps. In addition to relative pitch information, octave information is also important, to establish the rising and falling of melodic contour.

When we restrict ourselves to directional profiles (up, down, repeat), we may include information concerning the slope, or degree of change, the relation of the current pitch to the original pitch, possible repetitions, recurrence of pitches after intervening pitches, and possible segmentations in the melody. In addition, however, to support relevant comparisons it seems important to have information on the rhythmic and harmonic structure as well.

Similarity matching

An altogether different approach at establishing melodic similarity is proposed in [Compare]. This approach has been followed in the Meldex system [Meldex], discussed in section Web. The approach is different in that it relies on a (computer science) theory of finite sequence comparison, instead of musical considerations. The general approach is, as explained in [Compare], to search for an optimal correspondence between elements of two sequences, based on a distance metric or measure of dissimilarity, also known more informally as the edit-distance, which amounts to the (minimal) number of transformations that need to be applied to the first sequence in order to obtain the second one. Typical transformations include deletion, insertion and replacement. In the musical domain, we may also apply transformations such as consolidation (the replacement of several elements by one element) and fragmentation (which is the reverse of consolidation). The metric is even more generally applicable by associating a weight with each of the transformations. Elements of the musical sequences used in [Compare] are pitch-duration pairs, encoded in base-12 pitch information and durations as multiples of 1/16th notes.

The matching algorithm can be summarized by the following recurrence relation for the dissimilarity metric. Given two sequences

A = a₁,…,a_m and

B = b₁,…,b_n and

B = b₁,…,b_n, we define the distance as

d_ij = min {

d_i−1,j + w (a_i,0)

d_i,j−1 + w (0,b_j)

d_i−1,j−1 + w (a_i,b_j)

d_i−k,j−1 + w (a_i−k+1,…,a_i,b_j). 2 \leqslant k \leqslant i

d_{i−1,j−k+1} + w (a_i,b_j−k+1,…b_j) 2 \leqslant k \leqslant j

with

d_ij = min {

d_i−1,j + w (a_i,0)

d_i,j−1 + w (0,b_j)

d_i−1,j−1 + w (a_i,b_j)

d_i−k,j−1 + w (a_i−k+1,…,a_i,b_j). 2 \leqslant k \leqslant i

d_{i−1,j−k+1} + w (a_i,b_j−k+1,…b_j) 2 \leqslant k \leqslant j

and

d_ij = min {

d_i−1,j + w (a_i,0)

d_i,j−1 + w (0,b_j)

d_i−1,j−1 + w (a_i,b_j)

d_i−k,j−1 + w (a_i−k+1,…,a_i,b_j). 2 \leqslant k \leqslant i

d_{i−1,j−k+1} + w (a_i,b_j−k+1,…b_j) 2 \leqslant k \leqslant j

The weigths

d_ij = min {

d_i−1,j + w (a_i,0)

d_i,j−1 + w (0,b_j)

d_i−1,j−1 + w (a_i,b_j)

d_i−k,j−1 + w (a_i−k+1,…,a_i,b_j). 2 \leqslant k \leqslant i

d_{i−1,j−k+1} + w (a_i,b_j−k+1,…b_j) 2 \leqslant k \leqslant j

are determined by the degree of dissonance and the length of the notes involved. The actual algorithms for determining the dissimilarity between two sequences uses dynamic programming techniques. The algorithm has been generalized to look for matching phrases, or subsequences, within a sequence. The complexity of the algorithm is

d_ij = min {

d_i−1,j + w (a_i,0)

d_i,j−1 + w (0,b_j)

d_i−1,j−1 + w (a_i,b_j)

d_i−k,j−1 + w (a_i−k+1,…,a_i,b_j). 2 \leqslant k \leqslant i

d_{i−1,j−k+1} + w (a_i,b_j−k+1,…b_j) 2 \leqslant k \leqslant j

, provided that a limit is imposed on the number of notes involved in consolidation and fragmentation.

Nevertheless, as indicated in experiments for the Meldex database, the resulting complexity is still forbidding when large databases are involved. The Meldex system offers apart from the (approximate) dynamic programming algorithm also a state matching algorithm that is less flexible, but significantly faster. The Meldex experiments involved a database of 9400 songs, that were used to investigate six musical search criteria: (1) exact interval and rhythm, (2) exact contour and rhythm, (3) exact interval, (4) exact contour, (5) approximate interval and rhythm, and (6) approximate contour and rhythm. Their results indicate that the number of notes needed to return a reasonable number of songs scales logarithmically with database size [Meldex]. It must be noted that the Meldex database contained a full (monophonic) transcription of the songs. An obvious solution to manage the complexity of searching over a large database would seem to be the storage of prototypical themes or melodies instead of complete songs.

Indexing and analysis

There are several tools available that may assist us in creating a proper index of musical information. One of these tools is the Humdrum system, which offers facilities for metric and harmonic analysis, that have proven their worth in several musicological investigations [Humdrum]. Another tool that seems to be suitable for our purposes, moreover since it uses a simple pitch-duration, or piano-roll, encoding of musical material, is the system for metric and harmonic analysis described in [Meter]. Their system derives a metrical structure, encoded as hierarchical levels of equally spaced beats, based on preference-rules which determine the overall likelihood of the resulting metrical structure. Harmonic analysis further results in (another level of) chord spans labelled with roots, which is also determined by preference rules that take into account the previously derived metrical structure. As we have observed before, metrical and harmonic analysis may be used to eliminate confounding information with regard to the 'prototypical' melodic structure.

Discussion

Sofar our review of related research indicates that we should rely on approximate matching for determining musical similarity in response to a query. Nevertheless, it is clear that we should strive for reducing the actual information stored, In addition, we should aim at an hierarchically organized index of musical information, for example based on melodic contours of thematic fragments, to reduce the actual search involving comparisons by approximate matching.

Since it is not our intention to develop any new technology, we will very likely rely on the software that has been developed in the projects mentioned, in particular the Humdrum suite of tools, the metric and harmonic analysis tools described in [Meter], and the matching software developed for the Meldex system (that has been kindly made available to us by R. McNabb). Together with the feature detection architecture described in section Detector, this should give us the technology with which to create a powerful information retrieval system in the musical domain. In addition we wish to explore the usage of timbre (i.e. the distribution of instrument usage) and genre (as indicated by contextual information, and possibly statistical analysis of the musical information in the database), to improve on the effectiveness of searching for particular songs or works.