Jigsaw, An Object-Oriented Web server

Anselm Baird-Smith <abaird@w3.org>,
World Wide Web Consortium,
April 1995

Introduction

Most HTTP servers today are tuned to serve files. This intimate relationship between HTTP servers and their underlying file system has always shaped their design in significant ways: the mapping from a URL to its target resource generally relies on the file system hierarchy, and configuration information is generally understood as global to the whole server or, in some cases, per directory. No server today allows you to fully configure it on a per-resource basis.

In designing Jigsaw, the new experimental server from the World Wide Web Consortium (written in Java), we took the opportunity to start from scratch and test a new approach: each exported resource is a full object. This paper briefly explains Jigsaw's Resource concept and its configuration process.

Resources

Jigsaw is made of two modules. The server engine deals with incoming connections, decodes requests and presents them to the resource module, which performs them and produces either reply or error messages. The connection between these modules is described by an interface, so that either one can be replaced if needed. We focus here on our sample implementation of the resource module.

What are resources

In the HTTP specification, a resource is described as "A network data object or service which can be identified by a URI". The Jigsaw design takes these words literally: each resource is an object inheriting from the Resource class. As such, resources have an associated behavior and some internal state.

An HTTPResource is a resource tuned to handle the HTTP protocol: it inherits from the Resource class the ability to be saved to and restored from persistent storage, along with the ability to describe and access its internal state. At this level of the class hierarchy, resources are able to handle an HTTP request by generating an appropriate Reply object, which means they can be plugged into the server engine. The HTTPResource class defines the following attributes:

Content language
The language tag of the resource, describing the natural language(s) in which the content of the resource is written
Content length
The length of the content of the resource
Content type
The type of the content of the resource (e.g. text/html)
Expires
The date at which the resource expires
Icon
Any icon associated with this resource
Last modified
The date of the last modification of the resource
Quality
A floating point number in the range [0,1], indicating the quality of the content of the resource
Title
The title of the resource
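The idea that a resource can "describe and access its internal state" can be sketched in Java as follows. This is a minimal illustration, not Jigsaw's actual API: the class names `Attribute`, `SketchResource` and `SketchHTTPResource`, and the choice of `Long` milliseconds for dates, are assumptions. Each class declares its attributes, subclasses extend the inherited list, and values are type-checked against that self-description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical attribute descriptor: a name plus the Java type of its value.
final class Attribute {
    final String name;
    final Class<?> type;

    Attribute(String name, Class<?> type) {
        this.name = name;
        this.type = type;
    }
}

// Hypothetical base class: a resource that can describe its attributes and
// get or set their values by name (persistence is left out of this sketch).
class SketchResource {
    private final Map<String, Object> values = new HashMap<>();

    // Subclasses extend this list to declare further attributes.
    List<Attribute> describe() {
        return new ArrayList<>();
    }

    // Returns null when the attribute has no value: every attribute is optional.
    Object get(String name) {
        return values.get(name);
    }

    // Only declared attributes can be set, and only with values of the declared type.
    void set(String name, Object value) {
        for (Attribute a : describe()) {
            if (a.name.equals(name)) {
                if (value != null && !a.type.isInstance(value)) {
                    throw new IllegalArgumentException(name + " expects a " + a.type.getSimpleName());
                }
                values.put(name, value);
                return;
            }
        }
        throw new IllegalArgumentException("unknown attribute: " + name);
    }
}

// The HTTP-level attribute set listed above, declared on top of the base class.
class SketchHTTPResource extends SketchResource {
    @Override
    List<Attribute> describe() {
        List<Attribute> attrs = super.describe();
        attrs.add(new Attribute("content-language", String.class));
        attrs.add(new Attribute("content-length", Integer.class));
        attrs.add(new Attribute("content-type", String.class));
        attrs.add(new Attribute("expires", Long.class));        // assumed: epoch millis
        attrs.add(new Attribute("icon", String.class));
        attrs.add(new Attribute("last-modified", Long.class));  // assumed: epoch millis
        attrs.add(new Attribute("quality", Double.class));
        attrs.add(new Attribute("title", String.class));
        return attrs;
    }
}
```

Because a generic tool only needs `describe()`, `get()` and `set()`, it can handle any subclass without knowing its concrete type.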

None of these attributes is mandatory: resources that generate their content dynamically, for example, might not have a predefined content length or content type, and a resource whose content is negotiated will generate its quality attribute dynamically (typically by using the quality attribute of the selected variant).
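A negotiated resource's dynamic quality attribute can be sketched as follows. This assumes a deliberately simple negotiation rule (keep the variants whose content type the client accepts, then pick the one with the highest quality); `Variant` and `NegotiatedResource` are hypothetical names, and Jigsaw's actual negotiation algorithm may differ.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical variant: a content type plus a source quality in [0,1].
final class Variant {
    final String contentType;
    final double quality;

    Variant(String contentType, double quality) {
        if (quality < 0 || quality > 1) {
            throw new IllegalArgumentException("quality must be in [0,1]");
        }
        this.contentType = contentType;
        this.quality = quality;
    }
}

// Hypothetical negotiated resource: it has no fixed quality of its own and
// reports the quality of whichever variant it selects for a request.
final class NegotiatedResource {
    private final List<Variant> variants;

    NegotiatedResource(List<Variant> variants) {
        this.variants = variants;
    }

    // Pick the acceptable variant with the highest quality (empty if none fits).
    Optional<Variant> select(Set<String> acceptedTypes) {
        return variants.stream()
                .filter(v -> acceptedTypes.contains(v.contentType))
                .max(Comparator.comparingDouble(v -> v.quality));
    }

    // The quality attribute is derived from the selected variant (0 if none).
    double quality(Set<String> acceptedTypes) {
        return select(acceptedTypes).map(v -> v.quality).orElse(0.0);
    }
}
```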

Jigsaw resource class hierarchy

Although Jigsaw is object oriented, we still want it to provide at least the basic functionality of a Web server: serving files from a file system hierarchy. This is handled by two particular subclasses of HTTPResource: the DirectoryResource class handles file system directories, and the FileResource class handles files.

The DirectoryResource class inherits from the ContainerResource class the ability to manage a set of child resources. The ContainerResource class also specifies how resources are looked up. The basic algorithm is the following: parse the requested URL into a sequence of components, then look up the first of them in the server's root ContainerResource instance to obtain a target resource. If there are no more components to look up, this target is the target of the URL; otherwise, the target should itself be a ContainerResource, to which the lookup of the remaining components is delegated. This lookup takes time proportional to the number of components in the requested URL; however, one can define one's own ContainerResource subclass and override the lookup algorithm altogether.
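The component-by-component lookup described above can be sketched in Java as follows. `LookupResource` and `LookupContainer` are hypothetical, stripped-down stand-ins for Resource and ContainerResource, assumed only for illustration; each container resolves one component in a hash map and delegates the rest, which gives the cost proportional to the number of URL components.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal resource: only a name, enough to illustrate lookup.
class LookupResource {
    final String name;

    LookupResource(String name) {
        this.name = name;
    }
}

// Hypothetical container: resolves the first URL component among its
// children and delegates the remaining components to that child.
class LookupContainer extends LookupResource {
    private final Map<String, LookupResource> children = new HashMap<>();

    LookupContainer(String name) {
        super(name);
    }

    void addChild(LookupResource child) {
        children.put(child.name, child);
    }

    // Split a URL path such as "/pub/foo.html" into ["pub", "foo.html"].
    static Deque<String> parse(String path) {
        Deque<String> parts = new ArrayDeque<>();
        for (String p : path.split("/")) {
            if (!p.isEmpty()) parts.add(p);
        }
        return parts;
    }

    // Consumes `components`; one map lookup per component.
    LookupResource lookup(Deque<String> components) {
        String first = components.poll();
        if (first == null) return this;                         // empty path: this container
        LookupResource target = children.get(first);
        if (target == null) return null;                        // no such child
        if (components.isEmpty()) return target;                // last component: done
        if (!(target instanceof LookupContainer)) return null;  // cannot descend into a leaf
        return ((LookupContainer) target).lookup(components);   // delegate the rest
    }
}
```

A subclass that wants a different scheme (for example, matching on a prefix) only needs to override `lookup`.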

The DirectoryResource maintains a ResourceStore to keep track of its children. This resource store manages the pickled version of each child, which is unpickled on demand when it is looked up. An optional ResourceStoreManager instance keeps track of accesses to loaded resource stores, in order to remove from memory those that have not been used for a while.
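On-demand unpickling and the unloading of idle stores can be sketched as below. The names `SketchStore` and `SketchStoreManager`, the use of a `String` as the pickled form, and the least-recently-used eviction policy (via `LinkedHashMap` in access order) are all assumptions; the text only says that stores not used "for a while" are removed from memory.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical store: keeps each child in pickled form and only unpickles it
// the first time it is looked up.
class SketchStore {
    private final Map<String, String> pickled = new HashMap<>();
    private final Map<String, Object> loaded = new HashMap<>();
    private final Function<String, Object> unpickler;
    int unpickleCount = 0;  // how many times unpickling actually happened

    SketchStore(Function<String, Object> unpickler) {
        this.unpickler = unpickler;
    }

    void put(String name, String pickledForm) {
        pickled.put(name, pickledForm);
    }

    Object lookup(String name) {
        Object r = loaded.get(name);
        if (r == null && pickled.containsKey(name)) {
            r = unpickler.apply(pickled.get(name));  // unpickle on demand
            unpickleCount++;
            loaded.put(name, r);
        }
        return r;
    }

    // Drop in-memory instances; the pickled forms remain, so they can be reloaded.
    void unload() {
        loaded.clear();
    }
}

// Hypothetical manager: keeps at most `capacity` stores loaded and unloads
// the least recently used one when that limit is exceeded.
class SketchStoreManager {
    private final LinkedHashMap<String, SketchStore> active;

    SketchStoreManager(int capacity) {
        // accessOrder = true: iteration order is least recently used first.
        active = new LinkedHashMap<String, SketchStore>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, SketchStore> eldest) {
                if (size() > capacity) {
                    eldest.getValue().unload();
                    return true;
                }
                return false;
            }
        };
    }

    // Record an access to a store (e.g. before looking up one of its children).
    void touch(String key, SketchStore store) {
        active.put(key, store);
    }

    boolean isActive(String key) {
        return active.containsKey(key);
    }
}
```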

Most children of a directory resource instance will be FileResource instances. The file resource class defines the head, get and (optionally) put methods. For the sake of efficiency, the file resource can cache the content-length and last-modified attributes (which it gets from the file system), so that If-Modified-Since requests can be handled without any disk access.
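The caching idea behind cheap If-Modified-Since handling is sketched below. `SketchFileResource` is a hypothetical name, and the file system call is abstracted as a `LongSupplier` (an assumption) so that the number of disk accesses is visible. Only the first request touches the "disk"; later conditional requests compare against the cached value.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: a file resource that caches its last-modified time
// (read once from the file system) to answer conditional GETs from memory.
class SketchFileResource {
    private final LongSupplier statLastModified;  // stands in for a disk access
    private long cachedLastModified = -1;          // -1 means "not cached yet"

    SketchFileResource(LongSupplier statLastModified) {
        this.statLastModified = statLastModified;
    }

    private long lastModified() {
        if (cachedLastModified < 0) {
            cachedLastModified = statLastModified.getAsLong();
        }
        return cachedLastModified;
    }

    // Returns an HTTP status: 304 if the client's copy is still current, else 200.
    int get(Long ifModifiedSince) {
        if (ifModifiedSince != null && lastModified() <= ifModifiedSince) {
            return 304;  // Not Modified: no content needs to be sent
        }
        return 200;
    }
}
```

A real implementation also has to notice when the file changes on disk and refresh the cached value; how Jigsaw does this is not described here, so the sketch leaves it out.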

These two classes provide 90 percent of what a Web server is expected to do: serve files from some underlying file system. Jigsaw provides more resources: the PostableResource class is a basic class for handling the HTML form's POST method, the CGIResource class handles CGI scripts, etc. Of course, the server maintainer can define their own resource subclasses and install instances of them as children of, for example, some DirectoryResource instance. This results in a very flexible server extension API. Again, Jigsaw provides some basic packages of classes to help here: the form package, for example, provides a high-level API to manage HTML forms, allowing you to register fields of given types, etc.
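To give a feel for a "register fields of given types" API, here is a hypothetical `SketchForm` class. It is not the form package's real interface, which is not shown in this paper. Fields are registered with a parser, and an `application/x-www-form-urlencoded` POST body is decoded into typed values.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical form handler: fields are registered with a parser that turns
// the raw text submitted by the browser into a typed value.
final class SketchForm {
    private final Map<String, Function<String, Object>> fields = new LinkedHashMap<>();

    // Register a field; returns this form so registrations can be chained.
    SketchForm field(String name, Function<String, Object> parser) {
        fields.put(name, parser);
        return this;
    }

    // Parse an application/x-www-form-urlencoded POST body into typed values.
    Map<String, Object> parse(String body) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String key = decode(eq < 0 ? pair : pair.substring(0, eq));
            String raw = eq < 0 ? "" : decode(pair.substring(eq + 1));
            Function<String, Object> parser = fields.get(key);
            if (parser != null) {
                out.put(key, parser.apply(raw));  // unregistered fields are ignored
            }
        }
        return out;
    }

    // Undo URL encoding ("+" becomes a space, "%41" becomes "A", ...).
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);  // UTF-8 is always supported
        }
    }
}
```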

Resource filters

We have briefly described the basic design of Jigsaw and explained how it allows for common server features. However, at this point, one might wonder how authentication is handled.

Jigsaw defines another important subclass of Resource: the FilteredResource class. Both the ContainerResource class and the FileResource class inherit from this class, which provides the ability to filter requests to some particular target or set of targets. A ResourceFilter is itself a resource (i.e. it has its own class, and it defines its own set of attributes). Instances of ResourceFilter are attached to a particular FilteredResource instance and filter all access to it: at the lookup stage, the filter's ingoingFilter method is called with the request as its only parameter, and on the way back, its outgoingFilter method is (optionally) called with both the request and the reply generated by the target.
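The ingoing/outgoing protocol can be sketched as below. The types `SketchRequest`, `SketchReply`, `SketchFilter` and `SketchFilteredResource` are hypothetical simplifications, and two choices are assumptions rather than documented Jigsaw behavior: an ingoing filter may short-circuit by returning its own reply, and outgoing filters run in reverse attachment order. For simplicity the sketch also runs the ingoing filters when the request is handled rather than during lookup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical request/reply types, reduced to what the sketch needs.
class SketchRequest {
    final String path;
    SketchRequest(String path) { this.path = path; }
}

class SketchReply {
    final int status;
    final String body;
    SketchReply(int status, String body) { this.status = status; this.body = body; }
}

// Hypothetical filter contract: ingoingFilter returns null to let the request
// through, or a reply to stop it; outgoingFilter may rewrite the reply.
interface SketchFilter {
    SketchReply ingoingFilter(SketchRequest request);

    default SketchReply outgoingFilter(SketchRequest request, SketchReply reply) {
        return reply;
    }
}

abstract class SketchFilteredResource {
    private final List<SketchFilter> filters = new ArrayList<>();

    void addFilter(SketchFilter f) { filters.add(f); }

    // Subclasses produce the actual reply.
    protected abstract SketchReply perform(SketchRequest request);

    SketchReply handle(SketchRequest request) {
        // On the way in, each filter sees the request, in attachment order.
        for (SketchFilter f : filters) {
            SketchReply early = f.ingoingFilter(request);
            if (early != null) return early;
        }
        SketchReply reply = perform(request);
        // On the way back, filters see the request and the reply, in reverse order.
        for (int i = filters.size() - 1; i >= 0; i--) {
            reply = filters.get(i).outgoingFilter(request, reply);
        }
        return reply;
    }
}
```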

Given this, authentication is implemented as a special filter, whose ingoingFilter method authenticates the given request using whatever algorithm it wants. Jigsaw provides a GenericAuthFilter class that can authenticate requests by IP address, by the Basic authentication scheme or by the Digest authentication scheme, or by a combination of IP address checking with one of the two schemes.
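The ingoing check of such a filter might look like the sketch below. `SketchAuthFilter` is a hypothetical name, not GenericAuthFilter's real code. Digest authentication is omitted, and the rule that a request must pass both checks when both an IP list and a user table are configured is an assumption. The Basic scheme sends `base64(user:password)` in the `Authorization` header.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of an authentication filter's ingoing check.
// Assumption: when both an IP list and a user table are configured,
// a request must pass both checks. Digest authentication is omitted.
class SketchAuthFilter {
    private final Set<String> allowedIps;     // empty: no IP restriction
    private final Map<String, String> users;  // user -> password; empty: no Basic check

    SketchAuthFilter(Set<String> allowedIps, Map<String, String> users) {
        this.allowedIps = allowedIps;
        this.users = users;
    }

    // Returns null to let the request through, otherwise an HTTP status code.
    Integer ingoingFilter(String clientIp, String authorization) {
        if (!allowedIps.isEmpty() && !allowedIps.contains(clientIp)) return 403;
        if (!users.isEmpty() && !basicOk(authorization)) return 401;
        return null;
    }

    // Check an "Authorization: Basic <base64(user:password)>" header value.
    private boolean basicOk(String authorization) {
        if (authorization == null || !authorization.startsWith("Basic ")) return false;
        String decoded;
        try {
            decoded = new String(Base64.getDecoder().decode(authorization.substring(6).trim()),
                                 StandardCharsets.UTF_8);
        } catch (IllegalArgumentException malformed) {
            return false;  // not valid base64
        }
        int colon = decoded.indexOf(':');
        if (colon < 0) return false;
        String expected = users.get(decoded.substring(0, colon));
        return expected != null && expected.equals(decoded.substring(colon + 1));
    }
}
```

A complete 401 reply would also carry a `WWW-Authenticate` header naming the realm; the sketch only returns the status code.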

The concept of filters allows for much more than just authentication. Jigsaw also provides a DebugFilter (which prints requests and replies for some given target) and an AccessLimitFilter (which limits the number of simultaneous requests to some targets). Logging can be implemented as a filter too, providing a powerful means of getting detailed logging information for only a subspace of the whole information space exported by the server. A PICSFilter has also been integrated into Jigsaw, allowing it to serve PICS rating labels with documents.
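An access-limiting filter maps naturally onto a counting semaphore. In the hypothetical `SketchAccessLimitFilter` below, the ingoing step claims a slot and the outgoing step releases it. Rejecting an over-limit request instead of queueing it is an assumption; the text does not say which behavior AccessLimitFilter uses.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical sketch of an access-limiting filter: at most `limit` requests
// may be inside the filtered target at the same time.
class SketchAccessLimitFilter {
    private final Semaphore slots;

    SketchAccessLimitFilter(int limit) {
        slots = new Semaphore(limit);
    }

    // Ingoing: claim a slot without blocking; false means "too busy".
    boolean ingoingFilter() {
        return slots.tryAcquire();
    }

    // Outgoing: give the slot back once the reply has been produced.
    void outgoingFilter() {
        slots.release();
    }

    // Run a target under the limit, releasing the slot even if the target throws.
    <T> T guard(Supplier<T> target, T busyReply) {
        if (!ingoingFilter()) return busyReply;
        try {
            return target.get();
        } finally {
            outgoingFilter();
        }
    }

    int available() {
        return slots.availablePermits();
    }
}
```

The `try`/`finally` in `guard` reflects the main design constraint: the outgoing step must run for every request that got past the ingoing step, or the slot is lost for good.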

Configuration

Mapping files to resources, editing resource attributes

At this point, one might wonder how all these objects are created. The configuration process was probably one of the most challenging problems in Jigsaw (it is responsible for at least three complete redesigns of Jigsaw). The main problem is the following: server administrators are used to configuring their whole server through a single, centralized configuration file, whereas in Jigsaw each resource might need some specific information. There are two central pieces to the solution we have tested:

The first piece allows for the editing of resource attributes, which is one part of the configuration process. The current version of Jigsaw comes with a generic form-based editor that lets you edit any resource on the server. This generic editor is not very user-friendly; it is mainly a proof of concept. Note that it provides very fine-grained configuration: one can, for example, specify that the content type of the file /pub/foo.html is text/plain rather than text/html.
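A generic editor works for any resource because it relies only on each resource's self-description. The sketch below shows this under stated assumptions. `Editable`, `GenericEditor` and `EditableFile` are hypothetical names, attribute values are simplified to strings, and the HTML layout is invented for illustration. `render` builds a form from the attribute names, and `apply` writes submitted fields back. The usage in the test mirrors the /pub/foo.html example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: anything that can list its attributes and get/set them by name.
interface Editable {
    List<String> attributeNames();
    String get(String name);
    void set(String name, String value);
}

// Hypothetical generic editor: it only uses the self-description, so the
// same code can edit every kind of resource.
final class GenericEditor {
    static String render(Editable r) {
        StringBuilder sb = new StringBuilder("<form method=\"POST\">\n");
        for (String name : r.attributeNames()) {
            String v = r.get(name);
            sb.append("  ").append(name).append(": <input name=\"").append(name)
              .append("\" value=\"").append(v == null ? "" : escape(v)).append("\">\n");
        }
        return sb.append("</form>").toString();
    }

    // Apply submitted form fields; unknown field names are ignored.
    static void apply(Editable r, Map<String, String> form) {
        for (String name : r.attributeNames()) {
            if (form.containsKey(name)) r.set(name, form.get(name));
        }
    }

    // Minimal escaping for use inside a double-quoted HTML attribute.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("\"", "&quot;").replace("<", "&lt;");
    }
}

// Example editable resource: a file whose content type can be overridden.
class EditableFile implements Editable {
    private final Map<String, String> attrs = new LinkedHashMap<>();

    EditableFile(String contentType) {
        attrs.put("content-type", contentType);
        attrs.put("title", null);  // declared but unset
    }

    public List<String> attributeNames() { return new ArrayList<>(attrs.keySet()); }
    public String get(String name) { return attrs.get(name); }
    public void set(String name, String value) { attrs.put(name, value); }
}
```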

The second piece tackles the creation of resource instances associated with directories or files. One trivial way to cope with this would be to ask the server administrator to declare each file of the system, along with the required information (e.g. the class of the resource to be created for the file, the file's attributes, etc.). This, of course, is not acceptable. By default (this can be turned off), the DirectoryResource class implements lookup as follows: it first looks in its resource store for the given child; if this fails, it asks the resource indexer for a default resource instance to export the file. The sample resource indexer can be configured to create instances of particular resource subclasses based on the file's extension or, if the file is a directory, on its name.
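The two-step lookup (resource store first, then the indexer) can be sketched as follows. This is a simplified illustration. `SketchIndexer` and `SketchDirectory` are hypothetical names. Resources are represented by class-name strings instead of real instances. The default class names `FileResource` and `DirectoryResource` for unmapped entries are assumptions. The sets of files and subdirectories stand in for reading the actual directory.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical indexer: picks a resource class name from a file's extension,
// or from a directory's name.
class SketchIndexer {
    private final Map<String, String> byExtension = new HashMap<>();
    private final Map<String, String> byDirectoryName = new HashMap<>();

    void mapExtension(String ext, String className) { byExtension.put(ext, className); }
    void mapDirectory(String name, String className) { byDirectoryName.put(name, className); }

    String classFor(String name, boolean isDirectory) {
        if (isDirectory) return byDirectoryName.getOrDefault(name, "DirectoryResource");
        int dot = name.lastIndexOf('.');
        String ext = dot < 0 ? "" : name.substring(dot + 1);
        return byExtension.getOrDefault(ext, "FileResource");
    }
}

// Hypothetical directory: the store maps child names to the class of the
// resource exporting them (a stand-in for real resource instances).
class SketchDirectory {
    final Map<String, String> store = new HashMap<>();
    private final SketchIndexer indexer;
    private final Set<String> files;        // stands in for the files on disk
    private final Set<String> directories;  // stands in for the subdirectories

    SketchDirectory(SketchIndexer indexer, Set<String> files, Set<String> directories) {
        this.indexer = indexer;
        this.files = files;
        this.directories = directories;
    }

    String lookup(String name) {
        String known = store.get(name);                     // 1. already in the store?
        if (known != null) return known;
        boolean isDir = directories.contains(name);
        if (!isDir && !files.contains(name)) return null;   // nothing on disk to export
        String created = indexer.classFor(name, isDir);     // 2. ask the indexer
        store.put(name, created);                           // remember it for next time
        return created;
    }
}
```

Entries already in the store always win, which is what lets an administrator override the indexer's default for a single file.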

Global server configuration

Jigsaw's configuration is not limited to its information space: one would also like to configure the resource indexer, the authentication realms, or global server properties (such as its port number). One nice thing about the Jigsaw design is that if you represent this configuration information as instances of subclasses of the Resource class, it inherits persistence, caching and the ability to be edited (which is really what you want).

As a result, the current Jigsaw release provides form-based editing of the following pieces:

You can, in fact, install and configure Jigsaw without having a text editor.

Conclusion

We have briefly described the Jigsaw architecture and shown how an object-based server can implement the basic functionality expected from a Web server, and more. We believe that the most important characteristics of our design are:

Persistence of resource instances
Web objects should be thought of as persistent objects right from the beginning: they have to persist across server invocations (i.e. the server should be able to pickle and unpickle them as needed).
Editing of resource instances
Jigsaw is configured by editing resource attributes. Jigsaw's design emphasizes this by having each resource embed a description of itself.

Other considerations have also played an important role in Jigsaw's design, in particular the ability to unpickle resources only on demand (so that the server does not start by unpickling its whole information space), and the caching mechanism for managing the number of unpickled resources at a given time. To conclude, we would like to emphasize that, given this design, Jigsaw's configuration no longer lives in one single configuration file but is spread across the various resource instances.

Jigsaw is still in its alpha stage; it will be available to W3C members by May and to the public by June.


Anselm Baird-Smith
$Id: Position.html,v 1.1 1996/04/08 14:03:35 abaird Exp $

-----