Multivalent Documents:
A New Model for Networked Documents

Thomas A. Phelps and Robert Wilensky
University of California, Berkeley
Department of Electrical Engineering and Computer Science, Computer Science Division
{phelps,wilensky}@cs.berkeley.edu

Multivalent Document Model

Still today, in the midst of the enormous technological attention paid to the World Wide Web, most of its HTML pages would lose little, principally hyperlinks and forms, if reduced to paper. Even in an attempt to reproduce the capabilities of paper, a state-of-the-art Netscapeism-enhanced HTML 3.0 is not sufficient for specialized documents, e.g., ones in topological mathematics. New WWW developments such as Java look promising, but they operate independently of the rest of the document, in the case of Java as islands of interactivity in a sea of HTML. This makes something as seemingly straightforward as an outline viewer difficult or at least laborious to implement, as Java has no access to the HTML parser and layout engine.

We believe that there is much to be gained by fundamentally rethinking digital documents, especially documents with complex content, as "Multivalent Documents" [1, 2]. In contrast to monolithic document formats with concomitant complicated editors, multivalent documents comprise multiple "layers" of distinct but intimately related content. Small, dynamically-loaded program-objects, or "behaviors", activate the content and work in concert with each other and with layers of content to support arbitrarily-specialized document classes. The "type graph" binds together the disparate layers and behaviors of a multivalent document, pieces which for a single document can potentially come over the network from various geographic locations, to present the user with a single unified conceptual document. The multivalent documents paradigm embraces the digitalness of digital documents, especially their potential for radical customization by both author and reader; indeed, a prime example of the power of the paradigm is the "deep" user annotations that can manipulate the content of the document. The prototype multivalent implementation is written in Java and operates on the World Wide Web, but gives Java control of the entire document, both content and behavior, at a fine-grained level.

In relation to today's Web, there is no need to push increasingly-specialized new tags through the standards process, beyond a base typing convention, as the code needed to interpret, render and subsequently manipulate them is dynamically fused in with first-class status. In the multivalent model, Web browsers are transformed into caches of frequently-used code. However, Web indexes take on increasing importance, as URLs are no longer direct pointers to self-contained documents, but rather conceptual names (similar to Kahn-Wilensky handles) whose pieces must be found before the document can be reconstituted. For simply replicating existing documents, such indexes are less important as all the pieces can be stored together at a single server, but for more aggressive applications, say for an ongoing discussion of an important document conducted as collections of user annotations or "commentaries" on the document, a Web index becomes essential.

The Multivalent Documents model is a mechanism for composing document content and program objects. It is a higher-level protocol than CORBA, and more fine-grained than OpenDoc/OLE or Java as it is currently used with HTML. In operation, the user would ask a meta-indexer for document pieces related to a conceptual document name. Each layer of content and each program behavior is typed as though it were a function in a strictly-typed programming language. The type graph determines valid arrangements (default or customized organizations can be set by a distinguished meta-layer), and the user can dynamically rearrange pieces through the user interface.
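The function-like typing of layers and behaviors might be sketched as follows. This is an illustration only, not the prototype's code: the class `TypeGraphSketch`, the `Piece` type, the `missing` method and the type names ("image", "text", "charBoxes") are all hypothetical. Each piece provides one type and requires zero or more others, and an arrangement is treated as valid when every requirement is provided by some piece.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class TypeGraphSketch {
    /** A layer or behavior, typed like a function: it requires some types and provides one. */
    static final class Piece {
        final String name;
        final String provides;
        final Set<String> requires;

        Piece(String name, String provides, String... requires) {
            this.name = name;
            this.provides = provides;
            this.requires = new LinkedHashSet<>(Arrays.asList(requires));
        }
    }

    /** Lists every requirement that no piece in the arrangement provides. */
    static List<String> missing(List<Piece> pieces) {
        Set<String> provided = new HashSet<>();
        for (Piece p : pieces) {
            provided.add(p.provides);
        }
        List<String> problems = new ArrayList<>();
        for (Piece p : pieces) {
            for (String r : p.requires) {
                if (!provided.contains(r)) {
                    problems.add(p.name + " needs " + r);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical arrangement in which the character-box mapping layer is absent.
        List<Piece> doc = Arrays.asList(
            new Piece("scan", "image"),
            new Piece("ocr", "text"),
            new Piece("selectAndPaste", "selection", "image", "text", "charBoxes"));
        System.out.println(missing(doc));
    }
}
```

An empty result would mean the arrangement is well typed in this simplified sense; a real type graph would presumably also handle ordering, defaults from the meta-layer, and conflicts between pieces.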

In short, the multivalent model adds another dimension, "depth", to current conceptions, a dimension that carries the extended content of a document in a rawer form so that the receiver can subsequently manipulate it in pursuit of quicker insight into, or greater understanding of, the message being communicated. Writing simple layers with existing editors, authors "build up" to a document of arbitrary complexity. Behaviors can be added and specialized as needed to achieve full expression of the content. (Of course, generally useful behaviors can be reused across documents.) As well, general purpose behaviors can be replaced with superior versions, perhaps available commercially, at a fine-grained level. Like the author, the reader of a document constitutes it from its parts -- possibly, perhaps usually, taking advantage of additional layers and behaviors written by authors other than those involved with the initial document (as with distributed user annotations).

Current Implementation

It is instructive to examine a concrete instance of the paradigm. The first area of application of the multivalent model is to a corpus of 100,000 scanned page images from Berkeley's Digital Library project [3]. In transforming paper documents into digital form, scanning is often the first step. Furthermore, it is important to maintain and perhaps primarily work with the image of documents with historical value -- illuminated manuscripts and Picasso's sketchbooks, for example -- for in general the image is the only digital representation of a document that preserves its full information content including exact page layout, illustrations and special characters. Moreover, as a technical exercise, a scanned page is the most recalcitrant form of a digital document: it is just a two-dimensional array of pixels, little different from a picture of a sunset as far as a Web browser is concerned.

In the multivalent document model, however, the image becomes just one layer of the document. Optical character recognition (OCR) is run on all pages to produce the ASCII equivalents of the characters, and a third layer maps between characters and their bounding boxes on the screen. Incremental layers are added to these core three to produce a hybrid image/OCR document with greater functionality than is possible with either image or OCR alone. To see the demonstration, click here with a Java-compliant Web browser, at this time Netscape 2.0. Space constraints preclude a full explanation of the prototype in this document; click on the "help" button for a guided tour of functionality including "OCR select and paste", in which a region of the image is highlighted and the corresponding text characters are copied out; alternative-text select and paste, in which regions with, say, bibliographic entries or mathematical equations can retrieve alternative representations that may be more appropriate for the task, such as a BibTeX database entry for the former or Mathematica code or TeX markup for the latter; searching; hyperlinks; definitions; and "table sorting", in which a click on any column of the table sorts the table by rearranging the pixels in the image.
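To make the role of the three core layers concrete, "OCR select and paste" could be approximated as below. This is a minimal sketch under stated assumptions rather than the prototype's implementation: the `OcrSelectSketch` class, its `Box` type and the `select` method are hypothetical, and the mapping layer is assumed to hold exactly one bounding box per OCR character, in reading order.

```java
import java.util.Arrays;
import java.util.List;

final class OcrSelectSketch {
    /** Axis-aligned box in image pixel coordinates. */
    static final class Box {
        final int x, y, w, h;

        Box(int x, int y, int w, int h) {
            this.x = x;
            this.y = y;
            this.w = w;
            this.h = h;
        }

        boolean intersects(Box o) {
            return x < o.x + o.w && o.x < x + w && y < o.y + o.h && o.y < y + h;
        }
    }

    /** Copies out, in reading order, the OCR characters whose boxes overlap the selection. */
    static String select(String ocrText, List<Box> charBoxes, Box selection) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < ocrText.length(); i++) {
            if (charBoxes.get(i).intersects(selection)) {
                out.append(ocrText.charAt(i));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Web";                          // OCR layer
        List<Box> boxes = Arrays.asList(              // mapping layer: one box per character
            new Box(0, 0, 10, 12), new Box(10, 0, 8, 12), new Box(18, 0, 8, 12));
        Box dragged = new Box(11, 2, 5, 5);           // region highlighted on the image layer
        System.out.println(select(text, boxes, dragged));
    }
}
```

The image layer itself is never consulted here; the selection is resolved entirely through the mapping layer, which is what lets the same idea serve alternative-text layers such as BibTeX entries.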

Although a number of these features are novel, especially considering that we are operating on a scanned page image, the key point is that the multivalent model makes it easy to extend content and functionality incrementally. This content and functionality can be added by author or reader-author to the degree needed for that particular document, and at a later time, so that, for instance, an investment in a translation into Swahili can be made if and only if the need arises.

In general, the multivalent paradigm has nothing to do with images, and indeed an "adaptor" or "driver" is under development for HTML, and TeX DVI is under consideration. Besides being immediately useful in itself, HTML would serve as a tractable proof of concept for SGML; likewise, DVI for appearance-based formats such as PostScript. The hope is to spark the general community or interested companies to work on these more involved document description formats. We are currently implementing the type graph and support for user annotations.

API Requirements for Servers and Behaviors

How does an author write a Multivalent Document? At its most abstract level, the Multivalent model comprises four pieces: layers of content, active behaviors, the type graph, and servers of content and behaviors. Of these, layers are kept simple and use existing formats (preserving the investment in editing tools); complexity is built up from multiple simple layers. The type graph is a document-independent piece of the system infrastructure. The key new APIs for the Multivalent author come with the server and behaviors. These APIs are under active development; here we present current thinking.

Although the multivalent framework can work with unenhanced servers, cooperative servers will be running a database and versioning software. The database is used to respond efficiently to queries for all relevant layers and behaviors of a given document, or all relevant behaviors that support or can supplement ones directly declared for the document. It returns a "type descriptor", or summary of the object, that the type graph can use, with the full content of the object supplied on demand. Versioning software is important for handling changes in document layers: with layers mapped to other layers, versioning can help recover the proper mapping. More details can be found in [2].
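The query-then-fetch pattern described here, a summary first and full content on demand, might look roughly like the following. Everything in it is an assumption made for illustration, since the server API is still under development: `ServerQuerySketch`, `TypeDescriptor`, `Server`, `register`, `query` and the document name are hypothetical, and an in-memory map stands in for the database.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

final class ServerQuerySketch {
    /** Summary of a layer or behavior; the full content is loaded only when asked for. */
    static final class TypeDescriptor {
        final String name;
        final String type;
        private final Supplier<String> loader;
        private String content;

        TypeDescriptor(String name, String type, Supplier<String> loader) {
            this.name = name;
            this.type = type;
            this.loader = loader;
        }

        String content() {
            if (content == null) {
                content = loader.get(); // fetched once, on first demand
            }
            return content;
        }
    }

    /** Stand-in for a cooperative server's database of document pieces. */
    static final class Server {
        private final Map<String, List<TypeDescriptor>> byDocument = new HashMap<>();

        void register(String documentName, TypeDescriptor d) {
            byDocument.computeIfAbsent(documentName, k -> new ArrayList<>()).add(d);
        }

        /** Returns descriptors only; no content is transferred by the query itself. */
        List<TypeDescriptor> query(String documentName) {
            return byDocument.getOrDefault(documentName, Collections.<TypeDescriptor>emptyList());
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        server.register("report-17", new TypeDescriptor("ocr", "text", () -> "recognized text"));
        for (TypeDescriptor d : server.query("report-17")) {
            System.out.println(d.name + " : " + d.type); // the type graph needs only this much
        }
    }
}
```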

To introduce a new data format in a layer, the programmer writes a driver that provides a base set of manipulations common to all documents. These manipulations include insertion and suppression of content, extraction of content, and layout, among others. Behaviors are written against this abstraction so that they need be written only once and can operate on everything from scanned page images to HTML.
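One way to picture the driver abstraction is as a small interface that each format implements and that behaviors program against. The sketch below is hypothetical throughout (the `Driver` interface, its method names, `PlainTextDriver` and `replaceFirst` are invented for illustration), models content as a linear character sequence, and leaves out layout entirely.

```java
final class DriverSketch {
    /** Hypothetical base manipulations a format driver exposes; layout is omitted here. */
    interface Driver {
        int length();
        String extract(int start, int end);
        void insert(int pos, String content);
        void suppress(int start, int end);
    }

    /** One possible driver: a plain-text layer backed by a StringBuilder. */
    static final class PlainTextDriver implements Driver {
        private final StringBuilder text;

        PlainTextDriver(String initial) {
            text = new StringBuilder(initial);
        }

        public int length() { return text.length(); }
        public String extract(int start, int end) { return text.substring(start, end); }
        public void insert(int pos, String content) { text.insert(pos, content); }
        public void suppress(int start, int end) { text.delete(start, end); }
    }

    /** A behavior written once against Driver: replaces the first occurrence of a phrase. */
    static boolean replaceFirst(Driver doc, String from, String to) {
        int at = doc.extract(0, doc.length()).indexOf(from);
        if (at < 0) {
            return false;
        }
        doc.suppress(at, at + from.length());
        doc.insert(at, to);
        return true;
    }

    public static void main(String[] args) {
        Driver doc = new PlainTextDriver("scanned page image");
        replaceFirst(doc, "image", "layer");
        System.out.println(doc.extract(0, doc.length()));
    }
}
```

Because `replaceFirst` touches only the interface, a driver for scanned images or HTML could be substituted without changing the behavior, which is the point of the abstraction.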

For most common documents, existing behaviors can be reused satisfactorily. For specialized documents with novel requirements, the author has the opportunity to introduce a new behavior that may either introduce new functionality or replace existing functionality with an improved version. Behaviors are written in Java and therefore require a skilled programmer. Once written, however, they can be marshalled as black boxes by authors.

In general, behaviors can use the information contained in several layers and the services of other behaviors, interact with the user, and manipulate the appearance of the document. Thus in writing a behavior, the programmer must declare required layers and behaviors for use by the type graph. The behavior must declare its presentation in the user interface (though the type graph determines whether behaviors are activated or not and mediates conflicts for keyboard and mouse events) as well as provide online help. For those behaviors that affect a region of a document, the behavior must report its bounding box, so this area can be drawn on the document to aid the user. Because multiple behaviors will be active simultaneously on the same region, a behavior must be able to pass along unrelated events to behaviors of lower priority. For behaviors such as table sorting that transform the layout of a document, the behavior must transform the coordinates of the event on the way down to other behaviors and, likewise, transform the appearance of the document on the way up.
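The event-passing requirements can be illustrated with a small priority chain. The sketch is an assumption-laden simplification, not the behavior API: `BehaviorChainSketch`, `Event`, `Behavior`, `RowRemap`, `Link` and `dispatch` are hypothetical names, only the downward path of an event is shown, and the upward transformation of appearance is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BehaviorChainSketch {
    static final class Event {
        final String kind;
        final int x, y;

        Event(String kind, int x, int y) {
            this.kind = kind;
            this.x = x;
            this.y = y;
        }
    }

    interface Behavior {
        /** Returns null if the event was consumed, else the event to hand to lower priorities. */
        Event handle(Event e, List<String> log);
    }

    /** Sorted-table behavior: maps a click on a displayed row back to the original row. */
    static final class RowRemap implements Behavior {
        final int rowHeight;
        final int[] displayedToOriginal;

        RowRemap(int rowHeight, int... displayedToOriginal) {
            this.rowHeight = rowHeight;
            this.displayedToOriginal = displayedToOriginal;
        }

        public Event handle(Event e, List<String> log) {
            if (e.y < 0 || e.y / rowHeight >= displayedToOriginal.length) {
                return e; // outside the table: pass along untouched
            }
            int original = displayedToOriginal[e.y / rowHeight];
            return new Event(e.kind, e.x, original * rowHeight + e.y % rowHeight);
        }
    }

    /** Hyperlink behavior with a bounding box; consumes clicks that land inside it. */
    static final class Link implements Behavior {
        final int x, y, w, h;
        final String target;

        Link(int x, int y, int w, int h, String target) {
            this.x = x;
            this.y = y;
            this.w = w;
            this.h = h;
            this.target = target;
        }

        public Event handle(Event e, List<String> log) {
            boolean inside = e.x >= x && e.x < x + w && e.y >= y && e.y < y + h;
            if (!inside || !e.kind.equals("click")) {
                return e; // unrelated event: leave it for lower-priority behaviors
            }
            log.add("follow " + target);
            return null;
        }
    }

    /** Offers the event to each behavior in priority order until one consumes it. */
    static Event dispatch(List<Behavior> byPriority, Event e, List<String> log) {
        for (Behavior b : byPriority) {
            e = b.handle(e, log);
            if (e == null) {
                return null;
            }
        }
        return e;
    }

    public static void main(String[] args) {
        // Rows are 10 pixels tall; displayed row 0 shows what was originally row 2.
        List<Behavior> chain = Arrays.<Behavior>asList(
            new RowRemap(10, 2, 0, 1), new Link(0, 20, 50, 10, "ref3"));
        List<String> log = new ArrayList<>();
        dispatch(chain, new Event("click", 5, 3), log);
        System.out.println(log);
    }
}
```

The link's bounding box is stated in original page coordinates, so it keeps working after the table is sorted only because the higher-priority behavior rewrites the event before passing it down.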

Applications

The Multivalent Documents paradigm would serve well as the next-generation general document model for the Web. Furthermore, we have identified a number of domains ideally suited to a layered model of content such as the following.

User Annotations. Today I can comment on your pages by including a link on my page pointing to you. Despite its limitations, this is widely done and in fact some pages are simply indexes to other sites. We believe that a high quality user annotation facility that can merge documents and expressive annotations from a variety of sources will be an area of immense popularity. As previously mentioned, we believe the Multivalent Documents paradigm can support this especially well, and add novel functionality at the client side. Annotations in the multivalent framework can be the familiar text note or graphic markup of the page, but more sophisticated annotations utilize knowledge of the structure of the document. In the case of the prototype on scanned page images, given additional server support for annotation sets from software such as Stanford's ComMentor system or work done by the W3C Annotations working group, the reader could add his or her own hyperlinks or natural language translation as alternative text, to give just two illustrations, on the document itself. Consider a more complex document such as the Talmud, with its elaborate page layout in which the central text sits in the center of the page and Rabbinic commentaries surround it; the reader is instructed to read each commentary as if it were the only one on the page. We can support this with a structure layer that maps the locations and authors of various commentaries, and an incremental behavior can literally erase all commentaries but the desired one.
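The commentary-erasing behavior could be sketched as follows, purely as an illustration. The names `CommentaryFilterSketch`, `Region` and `showOnly` are hypothetical; the page image is simplified to a grid of integer pixel values; and the structure layer is assumed to list only the commentary regions, so the central text is never touched.

```java
import java.util.Arrays;
import java.util.List;

final class CommentaryFilterSketch {
    /** One entry of the structure layer: where a commentary sits and who wrote it. */
    static final class Region {
        final String author;
        final int x, y, w, h;

        Region(String author, int x, int y, int w, int h) {
            this.author = author;
            this.x = x;
            this.y = y;
            this.w = w;
            this.h = h;
        }
    }

    /** Returns a copy of the page in which every commentary not by keep is painted over. */
    static int[][] showOnly(int[][] page, List<Region> structure, String keep, int background) {
        int[][] out = new int[page.length][];
        for (int r = 0; r < page.length; r++) {
            out[r] = page[r].clone(); // the image layer itself is left unmodified
        }
        for (Region g : structure) {
            if (g.author.equals(keep)) {
                continue;
            }
            for (int r = g.y; r < g.y + g.h; r++) {
                for (int c = g.x; c < g.x + g.w; c++) {
                    out[r][c] = background;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] page = {
            {1, 1, 5, 2, 2},
            {1, 1, 5, 2, 2},
        };
        // Illustrative structure layer: two commentaries flanking one column of central text.
        List<Region> structure = Arrays.asList(
            new Region("Rashi", 0, 0, 2, 2), new Region("Tosafot", 3, 0, 2, 2));
        for (int[] row : showOnly(page, structure, "Rashi", 0)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```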

Geographic Information Systems. Layered content is an explicit part of GIS models. A layer comprises a specific type of attribute for a given geographic region, attributes such as political boundaries, bodies of water, roads, electrical wiring, and so on. The user selects the desired attributes and then overlays the layers graphically to compose a customized map. Not only does the multivalent model resonate with this data model, but the fundamentally networked basis of multivalent documents means that dynamic data like temperature readings and precipitation measurements can be incorporated without special accommodation.

Video Subtitling. For the foreseeable future video is a large data stream that taxes both storage and networks. Were videos to be subtitled as films are -- by imprinting the text on the images -- we would have to create a fresh copy of the video for each language or imprint multiple languages on the same copy. By placing the video and various subtitles in various layers and introducing a mapping layer between the two, a single video can be combined with any number of languages at small incremental cost for each language. Moreover, with film-style subtitling, the subtitles obscure the image, an image that may be drastically shrunk to reduce resource demand, and the viewer is forced to read the text before it vanishes, even if the visual is especially captivating at the time. By maintaining separate streams for video and text, the text could be placed below the image or the running dialog text could be scrolled in a window that allows the reader to lag the dialog for some time.
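A mapping layer between video and subtitle layers might, under simple assumptions, be no more than a table of timed cues that index into per-language lists of lines. In the sketch below, `SubtitleLayersSketch`, `Cue` and `subtitleAt` are hypothetical names, times are in milliseconds, and the cues are assumed not to overlap.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class SubtitleLayersSketch {
    /** Mapping-layer entry: when a cue ends and which subtitle line it refers to. */
    static final class Cue {
        final long endMs;
        final int line;

        Cue(long endMs, int line) {
            this.endMs = endMs;
            this.line = line;
        }
    }

    /** Looks up the line to show at time tMs; timing is keyed by cue start time. */
    static String subtitleAt(NavigableMap<Long, Cue> timing, List<String> lines, long tMs) {
        Map.Entry<Long, Cue> e = timing.floorEntry(tMs);
        if (e == null || tMs >= e.getValue().endMs) {
            return ""; // before the first cue, or in a gap between cues
        }
        return lines.get(e.getValue().line);
    }

    public static void main(String[] args) {
        // One mapping layer shared by every language.
        NavigableMap<Long, Cue> timing = new TreeMap<>();
        timing.put(1000L, new Cue(3000L, 0));
        timing.put(4000L, new Cue(6000L, 1));

        // One small text layer per language; adding a language never touches the video.
        List<String> english = Arrays.asList("Hello.", "Goodbye.");
        List<String> french = Arrays.asList("Bonjour.", "Au revoir.");

        System.out.println(subtitleAt(timing, english, 1500L));
        System.out.println(subtitleAt(timing, french, 4500L));
    }
}
```

Since the text arrives as data rather than pixels, the viewer's client is free to place it below the image or keep earlier lines in a scrolling window, as suggested above.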

References