Active Web Documents -- an example

As an example, we illustrate how to deploy the SGML-related components used to parse and display SGML-encoded documents. The SGMLviewer is based on SP, James Clark's conforming SGML parser [SP]. Web applications benefit from the use of such a parser in a number of ways, as the following sections on extending existing document types and creating new ones show.

Extending Document Types

We start by illustrating how to extend the default HTML document type definition. The document instance given below is written in HTML but additionally employs two extensions for active documents. The first is a video tag, used to display inline video fragments. The second is an applet tag, used to embed small applications written in a scripting language; we use Tcl in the examples. In the example below, the applet displays some notes, which are played when the user clicks on the image.

The first line of the example defines the type of the document (demo). This type is specified in a separate document type definition (DTD), which we describe below. The parser automatically retrieves the DTD from the Web if its location is specified by a URL. The next line illustrates an entity declaration, an SGML mechanism used here as a primitive macro facility to define the title of the document. The title entity is referenced twice, in the title and h1 elements, and is expanded by the parser. The third line specifies the style sheet needed to display the contents of the document; the use of style sheets is explained later.

The video element requires a src attribute giving the location of the video file. Finally, the applet tag embeds inline script code defining a small musical applet. The image and the note files are located in the directory specified in the data attribute of the applet tag.
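The macro-like behaviour of entities described above can be sketched in a few lines. This is only an illustration in Python, not the SP parser's implementation: the function name expand_entities, the entity value and the markup fragment are all assumptions made for the example.

```python
import re

def expand_entities(text, entities):
    """Replace each &name; reference with its declared replacement text."""
    def substitute(match):
        name = match.group(1)
        # Leave undeclared references untouched instead of guessing a value.
        return entities.get(name, match.group(0))
    return re.sub(r"&([A-Za-z][A-Za-z0-9.-]*);", substitute, text)

# Hypothetical entity value and markup, chosen only for illustration.
entities = {"title": "Active Documents Demo"}
instance = "<title>&title;</title><h1>&title;</h1>"
expanded = expand_entities(instance, entities)
print(expanded)
```

Declaring the title once and referencing it twice keeps the title and h1 elements consistent, which is the reason the entity is used as a macro here.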

Whether active document elements are to be defined by a new, specific tag or by the more general applet mechanism is a matter of taste. A new tag requires modification of the DTD and style sheet but describes the element in a more declarative way, which gives the application more freedom in displaying the contents. For example, a browser may decide to display a text alternative if the local platform does not support video.

Extending the DTD

Recall that a document type definition defines the structure of a document by describing the elements and attributes that can be interspersed as tags with the document content. The following DTD extends the (draft) HTML 3.0 DTD [HTML3] with the video and applet elements required for the example above.

The first line extends the special parameter entity defined by the HTML DTD with the video and applet elements, enabling the use of video and applet tags wherever image or anchor tags are allowed. Because SGML uses the first declaration of an entity and ignores any later one, the second definition of special (in the HTML DTD) is ignored by the parser. In the second line, the SGML parameter entity mechanism is used to include the draft HTML 3.0 document type definition, which is referenced by a formal public identifier.

The demo element consists of the sub-elements defined by the HTML DTD. The two O characters in the element declaration indicate that both the start tag and the end tag of the demo element may be omitted, since the parser can derive where the document begins and ends. In contrast, the start tag of a video or applet element is mandatory, while the end tag is optional. The video and applet elements have optional (implied), mandatory (required) and defaulted attributes (e.g. the lang attribute defaults to Tcl).
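The rule that lets the extending DTD override the special entity is that the first declaration of an entity is the one that counts. A minimal Python sketch of that rule follows; the function name declare_entities and the replacement texts are hypothetical, and the real HTML 3.0 definition of special is not reproduced here.

```python
def declare_entities(declarations):
    """Build an entity table in which the first declaration of a name wins."""
    table = {}
    for name, value in declarations:
        # SGML binds an entity to its first declaration; later ones are ignored.
        table.setdefault(name, value)
    return table

# Hypothetical replacement texts, used only to show the ordering effect.
declarations = [
    ("special", "A | IMG | VIDEO | APPLET"),  # declared first, by the extending DTD
    ("special", "A | IMG"),                   # declared later, by the included HTML DTD
]
table = declare_entities(declarations)
print(table["special"])
```

This is why the extension must appear before the line that includes the HTML DTD: reversing the order would leave the original definition in force.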

The information contained in the DTD is used by the parser to generate a complete and validated document instance. Note that this task could equally be performed by an HTTP server, which would significantly simplify the design and implementation of web clients. There are therefore strong arguments for adding SGML functionality to servers as well [SperGold94].

Style sheet

Style sheets define how the various elements should be processed. The hush browser (see the browser figure) defines a default style for HTML elements. However, these styles can be redefined and extended by a document instance using a special processing instruction, written as <?stylesheet url>. The browser retrieves the style sheet from the URL specified in the processing instruction and uses it to display the contents of the document. The example specifies the URL of a style sheet that describes how to process the new video tag. Recall that processing instructions are application dependent, so the parser passes the text of a processing instruction directly to the application. At the moment, we use an experimental style sheet language based on Tcl. An example of a style sheet fragment that specifies how the title tag should be processed is given below. While the style sheet mechanism needs some refinement, our approach supports the extension of existing document types and allows for extensive experimentation with the (many) new tags proposed for the HTML 3.0 standard.
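Since the parser hands the text of a processing instruction to the application unchanged, recognising the style sheet instruction is the application's job. The following Python sketch shows one way this could be done; it is not the hush browser's code, and the function name parse_stylesheet_pi and the URL are assumptions.

```python
def parse_stylesheet_pi(pi_text):
    """Return the style sheet location from a processing instruction, or None.

    pi_text is assumed to be the text between "<?" and ">" as passed on
    by the parser.
    """
    parts = pi_text.split(None, 1)
    if len(parts) == 2 and parts[0] == "stylesheet":
        return parts[1].strip()
    return None  # not a style sheet instruction; the application may ignore it

# Hypothetical URL, used only for illustration.
print(parse_stylesheet_pi("stylesheet http://example.org/demo.style"))
print(parse_stylesheet_pi("pagebreak"))
```

Returning None for anything else reflects the application-dependent nature of processing instructions: an application acts only on those it understands.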

Creating New Document Types

The document instance of the previous example was very similar to a plain HTML document, which made it worthwhile to reuse the original HTML DTD and style sheet. However, for some applications HTML is not suited at all and a completely new document structure is needed. The next example shows a document instance of a simple musical application. The first line defines (the filename of) a new document type definition. The next line contains an entity definition describing a G7 chord. An SGML processing instruction is used to specify the filename of the style sheet. The root of the document hierarchy is the score element, consisting of several chords. Chords are built from notes, which are described by single characters. The first two chords use the entity defined before, specifying the notes of a G7 chord. The third occurrence of chord describes a C major chord.

A new DTD

The DTD corresponding to the simple musical document given above defines the structural elements and their attributes. When the application processes the document, the parser fills in the default duration for all notes, expands the entity references and adds the missing end tags for the notes and chords.

The DTD defines three structural elements: a score containing several chords, a chord containing several notes and a note consisting of data. Both chord and note have three attributes: id, name and duration. The first two are optional and the last has a default value of 4. Note the difference between the id attribute, which has to be a unique identifier, and the name attribute, which can be an arbitrary string.
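The effect of these attribute declarations can be sketched as follows. The Python code below only mimics what a validating parser does with the attributes; the function name complete_attributes and the sample attribute values are hypothetical, and only the default duration of 4 is taken from the text.

```python
DEFAULT_DURATION = "4"  # default value stated in the text

def complete_attributes(elements):
    """Fill in the default duration and check that id values are unique."""
    seen_ids = set()
    completed = []
    for name, attrs in elements:
        attrs = dict(attrs)  # copy so the caller's data is not modified
        attrs.setdefault("duration", DEFAULT_DURATION)
        element_id = attrs.get("id")
        if element_id is not None:
            if element_id in seen_ids:
                raise ValueError(f"duplicate id: {element_id}")
            seen_ids.add(element_id)
        completed.append((name, attrs))
    return completed

# Hypothetical attribute values, used only for illustration.
elements = [
    ("chord", {"id": "c1", "name": "G7"}),
    ("note", {"name": "G7"}),     # a repeated name is allowed
    ("note", {"duration": "8"}),  # an explicit value overrides the default
]
completed = complete_attributes(elements)
for name, attrs in completed:
    print(name, attrs)
```

The sketch makes the difference between the two optional attributes concrete: a repeated name passes unchanged, whereas a repeated id is rejected.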

Style sheet

Playing the document in a web browser does not necessarily involve displaying the data visually. The simple style sheet shown below simply collects notes and chords and plays them using the play command. Note that most of the timing relations are implicit in the document. For example, the notes within a single chord are to be played in parallel, and the chords themselves are to be played sequentially. However, this is not explicitly defined by the document instance or the DTD and can only be derived intuitively from the element names. Even in the style sheet below, these timing relations remain implicit.

The procedures corresponding to the open and close tags build a string representation of the score. At the opening of the score element, the string is initialized with a command setting the tempo to 120 beats per minute. During parsing, the string is extended with the parsed notes. After the last chord has been parsed, the resulting string is: t120 (g<b<f< r)(g<b<f< r)(c<e<g< r). This string is played once the score end tag has been encountered. The Tcl command play used to play the notes is provided by the hymne extension of hush [OssEl94,OssEl95].
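The way open and close handlers accumulate the play string can be sketched as follows. The sketch is written in Python rather than the Tcl-based style sheet language, and it is not the original style sheet: the class and method names are hypothetical, and the per-note suffix and the chord terminator are simply read off the result string quoted above, without any claim about their meaning in hymne.

```python
class ScoreBuilder:
    """Collects notes and chords into a single play string."""

    def __init__(self):
        self.text = ""

    def open_score(self):
        self.text = "t120 "       # tempo command: 120 beats per minute

    def open_chord(self):
        self.text += "("

    def note(self, pitch):
        self.text += pitch + "<"  # assumed suffix, copied from the quoted string

    def close_chord(self):
        self.text += " r)"        # assumed terminator, copied from the quoted string

    def close_score(self):
        return self.text          # the point at which the string would be played


def build(chords):
    """Call the handlers in the order a parser would report the elements."""
    builder = ScoreBuilder()
    builder.open_score()
    for chord in chords:
        builder.open_chord()
        for pitch in chord:
            builder.note(pitch)
        builder.close_chord()
    return builder.close_score()


G7 = "gbf"  # notes of the entity as they appear in the quoted string
result = build([G7, G7, "ceg"])
print(result)
```

Nothing is played until close_score is reached, which mirrors the description above: the handlers only build the string, and playback is a single step at the end of the score.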