topical media & game development


object-oriented programming

The DejaVU experience -- jamming (on) the Web

The hush library was originally developed to have an easy-to-use and flexible GUI library for the Software Engineering practicum at the Vrije Universiteit. New components and extensions were created by students and research assistants, including components for (Csound-based) music, video, (OpenGL-based) VRML and MIDI. Since the Web was then in its early stages, we also built a Web browser and created a number of experimental extensions to enhance the functionality of the Web with new media and communication facilities. See slide dv-research.



slide: The DejaVU experience

Our approach was simple but effective. First we created the components that provided the desired functionality, then we provided a script interface for these components, and finally we provided new (HTML-like) tags for the syntactic description of the new functionality. We used stylesheets, written in Tcl, to separate the syntactic description from its operational realization. As the Web matured, we did not pursue this line of research any further. Nevertheless, since this work still represents a valid approach, we will discuss one of our favorite extensions, an extension that allows for jamming (on) the Web.

Jamming (on) the Web

Compared to textual and graphical material, the capabilities of the Web for musical information are rather poor. The embedding of music, or sound in general, rarely goes beyond links to raw audio and MIDI files or to streamed audio connections. To display a musical work, HTML authors have to use images containing the score. All of these solutions are very low level as they basically regard music as being just sound (or a picture in the case of a score).

True score files are usually a few orders of magnitude smaller than the corresponding audio files, and the audio signal can be synthesized at the client side at any appropriate sample rate. Additionally, a high-level description of music gives the browser far more information than the raw samples do. In previous work we proposed transmitting musical scores (instead of the raw samples) across the Internet and adding sound synthesis functionality to Web browsers, see [Music]. We also proposed the use of generic SGML to encode structured documents, see [Animate].
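To make the size argument concrete, the following back-of-the-envelope calculation compares one minute of uncompressed audio with a textual score of the same length. The sample rate, sample size, note density and bytes-per-note figures are illustrative assumptions, not measurements taken from our system.

```python
# Rough size comparison: raw audio versus a textual score.
# Every figure below is an illustrative assumption.

SAMPLE_RATE = 44_100      # samples per second (CD quality)
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 2              # stereo
SECONDS = 60              # one minute of music

NOTES_PER_MINUTE = 240    # assumed: four notes per second on average
BYTES_PER_NOTE = 40       # assumed: size of one marked-up note element

audio_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS * SECONDS
score_bytes = NOTES_PER_MINUTE * BYTES_PER_NOTE
ratio = audio_bytes / score_bytes

print(f"raw audio: {audio_bytes} bytes")
print(f"score:     {score_bytes} bytes")
print(f"ratio:     about {ratio:.0f} to 1")
```

Under these assumptions the score is roughly three orders of magnitude smaller than the raw samples. A denser score or compressed audio would narrow the gap.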

In this section, we describe an experimental framework that offers many of the ingredients for true networked music support including facilities for editing, displaying and playing musical scores as well as facilities for high-level exchange of musical material and real-time collaborative work involving music and sound. Our approach is based on traditional music notation and on MIDI for playing facilities. The framework builds upon the work done in the DejaVU project at the Software Engineering section of the Vrije Universiteit, which resulted in a suite of components for developing distributed Web-aware hypermedia applications.



slide: The score in a plugin

Scores on the Web

The most ambitious markup language for the dissemination of music on the Web is probably the Standard Music Description Language, described in [SMDL]. SMDL expresses a musical work in terms of four basic domains. The logical domain -- the primary focus of SMDL -- is, according to the standard, describable as `the composer's intentions with respect to pitches, rhythms, harmonies, dynamics, tempi, articulations, accents, etc.'. The central element of the logical domain, the cantus element, is an abstract, one-dimensional finite coordinate space onto which musical and non-musical events can be scheduled. This allows for the inclusion of any dependent time sequences (such as automated lighting information) in a musical work. The standard uses HyTime hyperlinking, see [HyTime], to specify the relations with information from the other three domains: the gestural domain, describing any number of particular performances (e.g. MIDI files or digital audio) of the work; the visual domain, describing any number of scores (printable/displayable versions) of the work; and the analytical domain, comprising any number of theoretical analyses or commentaries about the information in the three other domains. The addressing power of HyTime makes it possible to link directly into information expressed in other formats, including MIDI files, digital audio recordings or specific score notations, without modification.

Our approach is more modest: we deploy a much simpler SGML representation, primarily geared to encoding printable/displayable versions of the score (i.e. SMDL's visual domain). However, the format is sufficiently rich to generate a playable MIDI representation as well. Information that is usually added by performers (represented in SMDL's gestural domain), such as explicit interpretations of tempi, articulations and accents, is not supported in the current version.


    <SCORE>
      <TITLE>Corrente</TITLE>
      <COMPOSER>Antonio Vivaldi</COMPOSER>
      <STAFF>
        <MEASURE Sig="3,4" Key=F Clef=Gclef>
          <NOTE Pos="1,3" Stem=down>d6 4 0
          <REST Pos="3,6">C6 8 0
          <NOTE Pos="4,6" Stem=up>a5 8 0
          <NOTETUPLE Stem=down>
            <NOTE Pos="5,6">f5 8 0</NOTE>
            <NOTE Pos="6,6">a5 8 0</NOTE>
          </NOTETUPLE>
        </MEASURE>
        ...
      </STAFF>
    </SCORE>
  

To support display and editing of SGML scores on the Web, we developed the Amuse score editor as a plugin for our Web browser (see slide jamming-plugin). The editor has a graphical user interface and does not require any SGML knowledge from the user. Above is a fragment of an example score file, for which the associated style sheet with a CSS1-like syntax is shown below. Both documents can be edited by the graphical score editor plugin. Changes in the style sheet are dynamically reflected in the display of the score. A significant enlargement of the page-width parameter, for example, will allow for more measures on a single staff, and will result in a redraw of the complete score.
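As a rough illustration of how a playable representation might be derived from such a score, the sketch below reads the note and rest elements of the example measure and schedules them one after another. It is not the Amuse implementation. We assume that the first field of an element is a pitch name with an octave number (with c4 as middle C), that the second field is the note value (4 for a quarter note, 8 for an eighth note), and that the third field can be ignored. The key signature is not applied, and the tick resolution is likewise an assumption.

```python
import re

# Measure copied from the example score; some end tags are omitted there.
FRAGMENT = """
<MEASURE Sig="3,4" Key=F Clef=Gclef>
  <NOTE Pos="1,3" Stem=down>d6 4 0
  <REST Pos="3,6">C6 8 0
  <NOTE Pos="4,6" Stem=up>a5 8 0
  <NOTETUPLE Stem=down>
    <NOTE Pos="5,6">f5 8 0</NOTE>
    <NOTE Pos="6,6">a5 8 0</NOTE>
  </NOTETUPLE>
</MEASURE>
"""

SEMITONE = {"c": 0, "d": 2, "e": 4, "f": 5, "g": 7, "a": 9, "b": 11}
TICKS_PER_QUARTER = 480  # assumed time resolution

# Matches NOTE and REST start tags (not NOTETUPLE) plus the fields after them.
EVENT = re.compile(r"<(NOTE|REST)\b[^>]*>\s*([A-Ga-g])(\d)\s+(\d+)\s+\d+")


def to_events(sgml):
    """Return (kind, pitch_or_None, start_tick, length_ticks) tuples."""
    events, now = [], 0
    for kind, letter, octave, value in EVENT.findall(sgml):
        length = TICKS_PER_QUARTER * 4 // int(value)
        pitch = None
        if kind == "NOTE":
            # Assumed octave convention: c4 is middle C (MIDI note 60).
            pitch = 12 * (int(octave) + 1) + SEMITONE[letter.lower()]
        events.append((kind, pitch, now, length))
        now += length
    return events


events = to_events(FRAGMENT)
for event in events:
    print(event)
```

Under these assumptions the five events add up to three quarter notes, which agrees with the 3/4 signature of the measure.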



    SCORE {
      margin-left : 30;
      margin-right : 30;
      margin-top : 80;
      margin-bottom : 20;
      page-height : 1000;
      page-width : 920;
    }
    TITLE {
      title-align : Center;
      title-font : -*-Times-Bold-R-Normal--*-240-*;
    }
    COMPOSER {
      composer-align : Center;
      composer-font : -*-Times-*-R-Normal--*-180-*;
    }
  

slide: An associated style sheet
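To show how such a style sheet could drive the layout, the sketch below parses the rules above into a dictionary and derives the width that is available for a staff. The parser is a minimal stand-in written for this example, not the one used by the editor, and the fixed measure width passed to `measures_per_staff` is a hypothetical figure chosen only to show the effect of enlarging page-width.

```python
import re

STYLE = """
SCORE {
  margin-left : 30;
  margin-right : 30;
  margin-top : 80;
  margin-bottom : 20;
  page-height : 1000;
  page-width : 920;
}
TITLE {
  title-align : Center;
  title-font : -*-Times-Bold-R-Normal--*-240-*;
}
"""

RULE = re.compile(r"(\w+)\s*\{([^}]*)\}")


def parse_style(text):
    """Map each element name to a dictionary of its properties."""
    sheet = {}
    for selector, body in RULE.findall(text):
        props = {}
        for declaration in body.split(";"):
            if ":" in declaration:
                name, value = declaration.split(":", 1)
                props[name.strip()] = value.strip()
        sheet[selector] = props
    return sheet


def usable_width(sheet):
    """Page width minus the left and right margins."""
    score = sheet["SCORE"]
    return (int(score["page-width"])
            - int(score["margin-left"])
            - int(score["margin-right"]))


def measures_per_staff(sheet, measure_width=200):
    # measure_width is a hypothetical constant; real measures vary in width.
    return usable_width(sheet) // measure_width


sheet = parse_style(STYLE)
before = measures_per_staff(sheet)
sheet["SCORE"]["page-width"] = "1320"   # enlarge the page
after = measures_per_staff(sheet)
print(before, after)
```

With these numbers, widening the page makes room for more measures on each staff, which is why a change of this parameter triggers a redraw of the complete score.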

Playing on the Web

The playback facilities of our framework are centered around the MIDI server. After registering as a MIDI client, the score editor is able to send the generated MIDI version of the score to the separate MIDI server. The MIDI server builds upon a socket-level client/server library and a class library that provides the basic functionality for MIDI devices, MIDI clients and the MIDI server. Because the audio device is usually an exclusive resource, connecting through a single MIDI server lets several client applications have simultaneous access to a single MIDI output device. The functionality of the MIDI server comprises:

  • registering and unregistering MIDI devices,
  • routing MIDI data between clients and MIDI devices, and
  • administration and security checks.

When a MIDI device is registered, the server hands out a cookie that a client may use to request a virtual connection with that device. The cookie also prevents unauthorized clients from accessing a MIDI output device.
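The cookie mechanism can be sketched in a few lines. The class below is an in-memory stand-in with hypothetical method names. It uses neither sockets nor real MIDI hardware, and it is not the interface of our class library. It only shows how registering a device yields a cookie and how a client needs that cookie to obtain a virtual connection.

```python
import secrets


class MidiServer:
    """Toy stand-in for the MIDI server (assumed names, no networking)."""

    def __init__(self):
        self._devices = {}      # cookie -> function that delivers a message
        self._connections = {}  # client name -> cookie

    def register_device(self, deliver):
        """Register a device and hand out the cookie that protects it."""
        cookie = secrets.token_hex(8)
        self._devices[cookie] = deliver
        return cookie

    def unregister_device(self, cookie):
        self._devices.pop(cookie, None)
        self._connections = {client: c
                             for client, c in self._connections.items()
                             if c != cookie}

    def connect(self, client, cookie):
        """Set up a virtual connection; refuse clients without a valid cookie."""
        if cookie not in self._devices:
            raise PermissionError("unknown cookie")
        self._connections[client] = cookie

    def send(self, client, message):
        """Route MIDI data from a client to the device it is connected to."""
        cookie = self._connections.get(client)
        if cookie is None:
            raise PermissionError("client has no virtual connection")
        self._devices[cookie](message)


received = []                      # plays the part of a MIDI output device
server = MidiServer()
cookie = server.register_device(received.append)
server.connect("score-editor", cookie)
server.send("score-editor", bytes([0x90, 60, 100]))  # note-on, middle C
print(received)
```

A client that presents a cookie the server never issued is refused when it calls `connect`, so it cannot reach the device.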

Collective improvisation

We developed the keyboard applet, depicted in slide midi-keyboard, as an alternative input device to be able to send `live' MIDI data to our server. Since multiple applications can have access to the MIDI server, a user can have a score edit session running, and simultaneously be playing a keyboard applet.

To engage in a jam session, the keyboard applet connects to the JamServer instead of the MIDI server. The JamServer acts as the central point of a jam session, keeping track of all clients engaged in the session.



slide: The jam server

To start a jam session, all jam clients connect to a single JamServer and send it their MIDI data. The JamServer is connected to one or more MIDI servers, as depicted in slide jam-server. By having the JamServer separate from the MIDI server itself, the latter is relieved from the burden of jam session management. Every connected MIDI device will receive all the MIDI data submitted by the jam clients. This data is relayed to these devices by the MIDI server(s), through the virtual MIDI data stream that is created when registering as a jam client.
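The division of labor between the two servers can be sketched as follows. The classes and method names are hypothetical, and plain lists stand in for MIDI devices. The sketch only shows that the JamServer keeps track of the session while the MIDI server relays every submitted message to every registered device.

```python
class FakeMidiServer:
    """Stand-in for a MIDI server: relays data to its registered devices."""

    def __init__(self):
        self.devices = []  # each device is a list that collects messages

    def relay(self, message):
        for device in self.devices:
            device.append(message)


class JamServer:
    """Keeps track of the jam clients and forwards their MIDI data."""

    def __init__(self, midi_servers):
        self.midi_servers = list(midi_servers)
        self.clients = set()

    def join(self, client):
        self.clients.add(client)

    def leave(self, client):
        self.clients.discard(client)

    def submit(self, client, message):
        if client not in self.clients:
            raise ValueError(f"{client} is not part of the session")
        for server in self.midi_servers:
            server.relay(message)


synth_a, synth_c = [], []          # MIDI-out devices on two machines
midi = FakeMidiServer()
midi.devices += [synth_a, synth_c]

jam = JamServer([midi])
jam.join("keyboard@A")
jam.join("keyboard@C")
jam.submit("keyboard@A", bytes([0x90, 62, 90]))
jam.submit("keyboard@C", bytes([0x90, 65, 90]))
print(synth_a == synth_c, len(synth_a))
```

Session management stays entirely in `JamServer`, so the MIDI server needs no knowledge of who is jamming.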

In slide jam-server we see three jam clients connected to a single JamServer (on machine B). The MIDI server is running on the same machine as the JamServer. Both the clients on machine A and C have registered a MIDI-out device (a software sound synthesis MIDI program developed for Solaris) with the MIDI server on B. The user on A has additionally registered a MIDI-in device (the keyboard). Using the keyboard, the user on A can contribute to the jamming. The score editor on C is directly connected to the MIDI server and is not engaged in the jam session. The MIDI server will redirect MIDI requests from the score editor only to the MIDI device on C.

Measurements

To give an indication of the speed and response times of our system, we have used a special jam client, jamping, that measures the average delay between sending a MIDI message to the JamServer and receiving the same message on a connected MIDI device. For a 486DX2-66 PC running Linux, with one client and both servers local, this resulted in a round-trip delay of 5.5 milliseconds. A similar setup on a Sparc-5 running Solaris resulted in 2.6 milliseconds. A similar configuration with the JamServer on a LAN gave an average round-trip delay of 3.5 milliseconds. However, with a server in Amsterdam and a client in Sweden, we obtained an average round-trip delay of 87 milliseconds, with a peak of 1.6 seconds. Clearly, the length and variability of round-trip delays may be a prohibitive factor for jamming on a global scale.
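A measurement of this kind can be approximated with a small loopback experiment. The sketch below is not jamping itself. It sends a three-byte note-on message over a local socket pair to a thread that echoes it back, and reports the average and peak delay. The echo thread stands in for the path through the JamServer and MIDI server, so the numbers it produces say nothing about our system.

```python
import socket
import threading
import time


def echo(sock):
    """Send every received byte straight back until the peer closes."""
    while True:
        data = sock.recv(3)
        if not data:
            break
        sock.sendall(data)


def measure(count=100):
    """Return (average, peak) round-trip delay in milliseconds."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=echo, args=(server,), daemon=True)
    worker.start()

    note_on = bytes([0x90, 60, 100])
    delays = []
    for _ in range(count):
        start = time.perf_counter()
        client.sendall(note_on)
        received = b""
        while len(received) < len(note_on):
            received += client.recv(len(note_on) - len(received))
        delays.append((time.perf_counter() - start) * 1000.0)

    client.close()            # makes the echo thread see end-of-stream
    worker.join(timeout=1.0)
    server.close()
    return sum(delays) / len(delays), max(delays)


average, peak = measure()
print(f"average {average:.3f} ms, peak {peak:.3f} ms")
```

Reporting the peak next to the average matters for jamming, since a single late note is audible even when the average delay is low.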

Architecture of the Web components

The software described so far was developed for our SGML-based Web browser as an extension to the hush class library, see [Animate].


slide: Web components

Slide jam-browser gives an overview of the basic Web-related components of the hush library. The browser provides the top-level user interface for all Web components, including a viewer, a scrollbar, navigation buttons (back, forward, home, reload) and an entry box to enter URLs. The netclient, web and MIMEviewer components form the conceptual base of our approach to connecting to the Web:

  • MIMEviewer -- a widget for the inline display of several MIME types, such as HTML, VRML and Amuse score formats.
  • web -- an extension of the MIMEviewer with history and caching.
  • netclient -- the interface to the Internet, supporting several protocols.

The MIMEviewer component provides an abstract interface to viewers for several MIME types. The web widget only knows about the (abstract) MIMEviewer class while the actual functionality is implemented in several concrete viewer classes, one per MIME type. Specific viewers for new MIME types can be plugged dynamically into the MIMEviewer object.

When the MIMEviewer gets the instruction to display a document of a certain MIME type, it changes its role and becomes a viewer for that particular MIME type. This dynamic role-switching idiom is discussed in more detail in chapter 2. As a result, the addition of new viewers can be done without changing the web widget.

The netclient component builds the bridge between the local web widget and the World Wide Web by providing an abstract and uniform interface to network (file) access and transport protocols. In the realization of the netclient components we have employed the dynamic role-switching idiom in the same way as in the implementation of the MIMEviewer components.

The web object creates a MIMEviewer object and tells it which role it should play (e.g. SGML, Amuse or VRMLviewer). This role can be changed during the lifetime of a single MIMEviewer object by calling a method to change its role. A browser typically uses a single MIMEviewer object that changes its role according to the type of data that should be displayed. The SGMLviewer is the default viewer. It displays generic SGML documents by using a style sheet for each document type, and by default a style sheet for HTML is used. Since our generic SGMLviewer is better suited to textual documents and does not offer editing support, we developed a separate viewer/editor to process our Amuse/SGML score files.
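The role-switching behavior can be illustrated with a small sketch. The hush library is written in C++; the Python classes below are hypothetical stand-ins that only show the idea of one object delegating to whichever concrete viewer matches the current MIME type. The method names and the MIME type used for Amuse scores are assumptions.

```python
class SGMLViewer:
    def display(self, data):
        return "SGML: " + data


class AmuseViewer:
    def display(self, data):
        return "Amuse: " + data


class VRMLViewer:
    def display(self, data):
        return "VRML: " + data


class MIMEviewer:
    """A single object whose behavior depends on its current role."""

    def __init__(self):
        self._viewers = {}   # MIME type -> concrete viewer
        self._role = None

    def plug_in(self, mime_type, viewer):
        """Add a viewer for a new MIME type at run time."""
        self._viewers[mime_type] = viewer

    def become(self, mime_type):
        """Switch role to the viewer registered for this MIME type."""
        self._role = self._viewers[mime_type]

    def display(self, data):
        return self._role.display(data)


viewer = MIMEviewer()
viewer.plug_in("text/html", SGMLViewer())
viewer.plug_in("application/x-amuse", AmuseViewer())  # assumed MIME type

viewer.become("text/html")
first = viewer.display("<H1>Corrente</H1>")

viewer.become("application/x-amuse")
second = viewer.display("<SCORE>...</SCORE>")

# A viewer for a new type is added without touching the caller.
viewer.plug_in("model/vrml", VRMLViewer())
viewer.become("model/vrml")
third = viewer.display("#VRML V2.0 utf8")

print(first, second, third, sep="\n")
```

The caller holds a single `MIMEviewer` throughout, which is why new viewers can be added without changing the web widget.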

Since the MIMEviewer provides no network functionality at all, it generates events whenever it needs to retrieve data pointed to by a URL. Such events are generated in response to user interaction (e.g. clicking an anchor) or to fetch inline data during the parsing process. These events are typically handled by the web component, which plays a central role in our approach because it combines the functionality of the MIMEviewer and the netclient components. Additionally, the web component adds a history and caching mechanism to the MIMEviewer. The web component behaves like the standard widgets of the hush framework, and can conveniently be used as part of an application's GUI. Because the web widget has both a C++ class interface and a script interface, it is easy to create, or extend, applications with Web functionality.
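The interplay between the three components can be summarized in a sketch. All class names, method names and URLs below are hypothetical, and the network is replaced by a dictionary. The sketch shows a viewer that raises an event instead of fetching data itself, and a web object that handles the event by consulting its cache, calling the netclient when needed and recording the history.

```python
class FakeNetClient:
    """Stand-in for the netclient: looks documents up in a dictionary."""

    def __init__(self, documents):
        self.documents = documents
        self.requests = 0

    def fetch(self, url):
        self.requests += 1
        return self.documents[url]   # (MIME type, data)


class FakeViewer:
    """Stand-in for the MIMEviewer: no network code, only events."""

    def __init__(self):
        self.on_url = None           # event handler, set by the web object
        self.shown = None

    def display(self, mime_type, data):
        self.shown = (mime_type, data)

    def click(self, url):
        self.on_url(url)             # e.g. the user clicks an anchor


class Web:
    """Combines viewer and netclient, adding history and caching."""

    def __init__(self, viewer, netclient):
        self.viewer, self.netclient = viewer, netclient
        self.history, self.cache = [], {}
        viewer.on_url = self.load

    def load(self, url):
        if url not in self.cache:
            self.cache[url] = self.netclient.fetch(url)
        self.history.append(url)
        self.viewer.display(*self.cache[url])

    def back(self):
        if len(self.history) > 1:
            self.history.pop()               # drop the current page
            self.load(self.history.pop())    # reload the previous one


net = FakeNetClient({
    "http://example.org/index.html": ("text/html", "<A HREF=score>score</A>"),
    "http://example.org/score": ("application/x-amuse", "<SCORE>...</SCORE>"),
})
viewer = FakeViewer()
web = Web(viewer, net)

web.load("http://example.org/index.html")
viewer.click("http://example.org/score")
web.back()
print(viewer.shown[0], net.requests)
```

Going back is served from the cache, so the netclient is asked for each document only once.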



(C) Æliens 04/09/2009
