Anselm Baird-Smith <abaird@w3.org>,
World Wide Web Consortium,
April 1995.
Most HTTP servers today are tuned to serve files. This intimate relationship between HTTP servers and their underlying file system has always affected their design in significant ways: the mapping from a URL to its target resource generally relies on the file system hierarchy, and configuration information is generally understood as global to the whole server or, in some cases, per directory. No server today allows you to fully configure it on a per-resource basis.
In designing Jigsaw, the new experimental server from the World Wide Web Consortium (written in Java), we took the opportunity of starting from scratch to test a new approach: each exported resource is a full object. This paper briefly explains the Resource concept of Jigsaw and its configuration process.
Jigsaw is made of two modules: the server engine deals with incoming connections, decoding requests and presenting them to the resource module, which performs them and produces either reply or error messages. The connection between these modules is described by an interface, so that either of them can be replaced if needed. We focus here on our sample implementation of the resource module.
In the HTTP specification, a resource is described as "A network data object or service which can be identified by a URI". The Jigsaw design takes these words literally: each resource is an object inheriting from the Resource class. As such, each resource has an associated behavior (defined by its class, a subclass of Resource) and some internal state. At this basic level, no particular requirements are made on the behavior of a resource: it can define whatever methods it wants.
An HTTPResource is a resource tuned to handle the HTTP protocol: it inherits from the Resource class the ability to be saved to and restored from persistent storage, along with the ability to describe and access its internal state. At this level of the class hierarchy, resources are able to handle an HTTP request by generating an appropriate Reply object: they can be plugged into the server engine. The set of attributes defined by the HTTPResource class includes, among others, a content type (for example text/html) and a quality value in the range [0,1], indicating the quality of the content of the resource.
None of these attributes is mandatory: resources that generate their content dynamically, for example, might not have a predefined content-length or content-type; a resource whose content is negotiated will dynamically generate its quality attribute (typically by using the quality attribute of the selected variant).
Although Jigsaw is object oriented, we still want it to provide at least the basic functionality of a Web server: serving files from a file system hierarchy. This is handled by two particular subclasses of HTTPResource: the DirectoryResource class handles file system directories, and the FileResource class handles files.
The DirectoryResource class inherits from the ContainerResource class the ability to manage a set of child resources. This last class specifies how resources are looked up. The basic algorithm is the following: parse the requested URL into a sequence of components, then take the first of them and look it up in the server's root ContainerResource instance to obtain a target resource. If there are no more components to look up, then this target is the target for the URL; otherwise, the target should itself be a ContainerResource, to which we delegate the lookup of the remaining components. This lookup process takes a time proportional to the number of components in the requested URL; however, one can define a custom ContainerResource subclass and override the lookup algorithm altogether.
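As an illustration only, the lookup algorithm described above could be sketched in Java as follows. This is not Jigsaw's actual code: the class names `SketchResource`, `SketchContainer` and `LookupSketch`, the `lookup` signature and the in-memory child map are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Resource class.
class SketchResource {
    final String name;
    SketchResource(String name) { this.name = name; }
}

// Hypothetical stand-in for the ContainerResource class.
class SketchContainer extends SketchResource {
    private final Map<String, SketchResource> children = new HashMap<>();

    SketchContainer(String name) { super(name); }

    void addChild(SketchResource child) { children.put(child.name, child); }

    // Resolve components[index..] relative to this container.
    // Returns null when no resource matches.
    SketchResource lookup(String[] components, int index) {
        if (index == components.length) {
            return this; // nothing left to resolve
        }
        SketchResource target = children.get(components[index]);
        if (target == null) {
            return null; // unknown child
        }
        if (index + 1 == components.length) {
            return target; // no more components: this is the URL's target
        }
        if (target instanceof SketchContainer) {
            // Delegate the remaining components to the child container.
            return ((SketchContainer) target).lookup(components, index + 1);
        }
        return null; // components remain, but the target has no children
    }
}

class LookupSketch {
    // Split a URL path such as "/pub/foo.html" into its components.
    static String[] parse(String path) {
        String trimmed = path.replaceAll("^/+|/+$", "");
        return trimmed.isEmpty() ? new String[0] : trimmed.split("/+");
    }

    static SketchResource resolve(SketchContainer root, String path) {
        return root.lookup(parse(path), 0);
    }
}
```

Each step consumes one component, which is why the cost grows with the number of components in the URL. A subclass that overrides `lookup` could, for instance, resolve the whole remaining path in a single step.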
The DirectoryResource maintains a ResourceStore to keep track of its children. This resource store manages the pickled version of each child, which is unpickled on demand when looked up. An optional ResourceStoreManager instance keeps track of accesses to loaded resource stores, in order to remove from memory the ones that have not been used for a while.
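The on-demand unpickling described above can be pictured with the following minimal sketch. It assumes standard Java object serialization as the pickling format and a hypothetical `SketchStore` class with `save`, `lookup` and `unload` methods; none of these names or choices come from Jigsaw itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical resource store: keeps every child in pickled form and
// only unpickles a child the first time it is looked up.
class SketchStore {
    private final Map<String, byte[]> pickled = new HashMap<>();
    private final Map<String, Object> loaded = new HashMap<>();
    int unpickleCount = 0; // exposed only to make the behavior observable

    void save(String name, Serializable resource) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(resource);
        } catch (IOException e) {
            throw new IllegalStateException("cannot pickle " + name, e);
        }
        pickled.put(name, bytes.toByteArray());
        loaded.remove(name); // force a fresh unpickle on next lookup
    }

    // Returns the unpickled child, or null if the store has no such child.
    Object lookup(String name) {
        Object resource = loaded.get(name);
        if (resource != null) {
            return resource; // already in memory
        }
        byte[] data = pickled.get(name);
        if (data == null) {
            return null;
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            resource = in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("cannot unpickle " + name, e);
        }
        unpickleCount++;
        loaded.put(name, resource);
        return resource;
    }

    // What a store manager might call on a store that has been idle.
    void unload() { loaded.clear(); }
}
```

A store manager in this sketch would simply record the time of the last `lookup` on each store and call `unload` on those that have been idle for too long; the children remain available in pickled form.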
Most children of a directory resource instance will be FileResource instances. The file resource class defines the head, get and optionally the put methods. For the sake of efficiency, the file resource (optionally) caches the content-length and last-modified attributes (which it gets from the file system), so that if-modified-since requests can be handled without any disk access.
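To make the benefit of this caching concrete, here is a small sketch of how a conditional request could be answered from cached attributes alone. The `CachedFileSketch` class and its `statusFor` method are hypothetical; the sketch also assumes that dates are compared at one-second resolution, since HTTP dates carry no finer precision.

```java
// Hypothetical file resource that caches file system metadata.
class CachedFileSketch {
    private final long contentLength;      // cached size in bytes
    private final long lastModifiedMillis; // cached modification time

    CachedFileSketch(long contentLength, long lastModifiedMillis) {
        this.contentLength = contentLength;
        this.lastModifiedMillis = lastModifiedMillis;
    }

    long getContentLength() { return contentLength; }

    // Decide the status of a GET carrying an If-Modified-Since date.
    // A negative argument means the header was absent.
    // No disk access is needed: only cached attributes are consulted.
    int statusFor(long ifModifiedSinceMillis) {
        if (ifModifiedSinceMillis >= 0
                && lastModifiedMillis / 1000 <= ifModifiedSinceMillis / 1000) {
            return 304; // Not Modified
        }
        return 200; // OK: the content must be sent
    }
}
```

In a real server the two cached values would be read once from the file system, for example with `java.io.File.length()` and `java.io.File.lastModified()`, and refreshed when the file changes.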
These two classes cover 90 percent of what a Web server is expected to do: serve files from some underlying file system. Jigsaw provides more resources: the PostableResource class is a base class for handling the POST method of HTML forms, the CGIResource class handles CGI scripts, etc. Of course, the server maintainer can define new resource subclasses and install instances of them as children of, for example, some DirectoryResource instance. This results in a very flexible server extension API. Here again, Jigsaw provides some basic packages of classes to help: the form package, for example, provides a high-level API to manage HTML forms, allowing you to register fields of given types, etc.
We have briefly described the basic design of Jigsaw and explained how it allows for common server features. However, at this point, one might wonder how authentication is handled.
Jigsaw defines another important subclass of Resource: the FilteredResource class. Both the ContainerResource class and the FileResource class inherit from this new class, which provides the ability to filter requests to some particular target, or set of targets. A ResourceFilter is itself a resource (i.e. it has its own class, and it defines its own set of attributes). Instances of ResourceFilter are attached to a particular FilteredResource instance, and filter all accesses to it: at the lookup stage, the filter's ingoingFilter method is called with the request as its only parameter, and on the way back, its outgoingFilter method is (optionally) called with both the request and the reply generated by the target.
Given this, authentication is implemented as a special filter, whose ingoingFilter method authenticates the given request, using whatever algorithm it wants. Jigsaw provides one GenericAuthFilter class that can authenticate requests by IP address, using the Basic authentication scheme or the Digest authentication scheme, or a mix of the first with one of the others.
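The filter mechanism and its use for authentication can be sketched as follows. Only the method names ingoingFilter and outgoingFilter and their parameters come from the description above; the return types, the convention that a non-null reply from ingoingFilter short-circuits the request, the 403 status and every class name ending in `Sketch` are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class RequestSketch {
    final String clientAddress;
    final String path;
    RequestSketch(String clientAddress, String path) {
        this.clientAddress = clientAddress;
        this.path = path;
    }
}

class ReplySketch {
    final int status;
    final String body;
    ReplySketch(int status, String body) { this.status = status; this.body = body; }
}

// Hypothetical filter contract.
interface FilterSketch {
    // Assumption: return a reply to stop the request, or null to let it through.
    ReplySketch ingoingFilter(RequestSketch request);
    // May return the reply unchanged or a modified one.
    ReplySketch outgoingFilter(RequestSketch request, ReplySketch reply);
}

// Hypothetical filtered resource: runs its filters around the target.
abstract class FilteredResourceSketch {
    private final List<FilterSketch> filters = new ArrayList<>();

    void addFilter(FilterSketch filter) { filters.add(filter); }

    abstract ReplySketch handle(RequestSketch request);

    ReplySketch perform(RequestSketch request) {
        for (FilterSketch filter : filters) {
            ReplySketch early = filter.ingoingFilter(request);
            if (early != null) {
                return early; // the filter rejected or answered the request
            }
        }
        ReplySketch reply = handle(request);
        // On the way back, filters see the reply in reverse order.
        for (int i = filters.size() - 1; i >= 0; i--) {
            reply = filters.get(i).outgoingFilter(request, reply);
        }
        return reply;
    }
}

// Authentication by IP address, expressed as a filter.
class IpAuthFilterSketch implements FilterSketch {
    private final Set<String> allowed;
    IpAuthFilterSketch(Set<String> allowed) { this.allowed = allowed; }

    public ReplySketch ingoingFilter(RequestSketch request) {
        return allowed.contains(request.clientAddress)
            ? null
            : new ReplySketch(403, "Forbidden");
    }

    public ReplySketch outgoingFilter(RequestSketch request, ReplySketch reply) {
        return reply; // nothing to do on the way back
    }
}

// A trivial target used to exercise the filter.
class HelloResourceSketch extends FilteredResourceSketch {
    ReplySketch handle(RequestSketch request) {
        return new ReplySketch(200, "hello");
    }
}
```

Because the filter is attached to one resource, the same approach protects a single file or, when attached to a container, everything below it.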
The concept of filters allows for much more than just authentication. Also provided are a DebugFilter (that prints requests and replies to some given target) and an AccessLimitFilter (that limits the number of simultaneous requests to some targets). Logging can be implemented as filters too, resulting in a powerful means of getting detailed logging information for only a subspace of the whole information space exported by the server. A PICSFilter has also been integrated into Jigsaw, allowing it to serve PICS rating labels with documents.
At this point, one might wonder how all these objects are created. The configuration process for Jigsaw was probably one of the most challenging problems (it is responsible for at least three complete redesigns of Jigsaw). The main problem is the following: server administrators are used to configuring their whole server through one single centralized configuration file, while for Jigsaw each resource might need some specific information. There are two central pieces to the current solution we have tested:
The first piece allows for the editing of resource attributes, which is one part of the configuration process. The current version of Jigsaw comes with a generic form-based editor that allows you to edit any resource on the server. This generic editor is not very user-friendly; it is mainly a proof of concept. It should be noted here that this provides for very fine-grained configuration: one is able, for example, to say that the content type of the file /pub/foo.html is text/plain and not text/html.
The second piece tackles the creation of resource instances associated with directories or files. One trivial way to cope with this would be to ask the server administrator to declare each file of the system, along with the required information (e.g. the class of the resource to be created for the file, the file's attributes, etc.). This, of course, is not acceptable. By default (this can be turned off) the DirectoryResource class implements lookup in the following way: it first looks in its resource store for the given child; if this fails, it then goes to the resource indexer and asks it for a default resource instance to export the file. The sample resource indexer can be configured to create particular instances of resource subclasses based on the file's extension or, if the file is a directory, on its name.
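The two-step lookup described above (resource store first, then resource indexer) might look like the following sketch. The classes `IndexerSketch`, `IndexedDirectorySketch` and `IndexedResourceSketch`, the registration methods and the choice to remember an indexed resource in the store are all assumptions; resource classes are represented by plain names to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical resource: a file name plus the name of the class exporting it.
class IndexedResourceSketch {
    final String fileName;
    final String resourceClass;
    IndexedResourceSketch(String fileName, String resourceClass) {
        this.fileName = fileName;
        this.resourceClass = resourceClass;
    }
}

// Hypothetical resource indexer configured by extension and directory name.
class IndexerSketch {
    private final Map<String, String> classByExtension = new HashMap<>();
    private final Map<String, String> classByDirectoryName = new HashMap<>();

    void registerExtension(String extension, String resourceClass) {
        classByExtension.put(extension, resourceClass);
    }

    void registerDirectory(String name, String resourceClass) {
        classByDirectoryName.put(name, resourceClass);
    }

    // Returns a default resource for the file, or null if no rule applies.
    IndexedResourceSketch createDefault(String fileName, boolean isDirectory) {
        String resourceClass;
        if (isDirectory) {
            resourceClass = classByDirectoryName.get(fileName);
        } else {
            int dot = fileName.lastIndexOf('.');
            resourceClass = dot < 0
                ? null
                : classByExtension.get(fileName.substring(dot + 1));
        }
        return resourceClass == null
            ? null
            : new IndexedResourceSketch(fileName, resourceClass);
    }
}

// Hypothetical directory resource: store first, indexer as a fallback.
class IndexedDirectorySketch {
    private final Map<String, IndexedResourceSketch> store = new HashMap<>();
    private final IndexerSketch indexer;

    IndexedDirectorySketch(IndexerSketch indexer) { this.indexer = indexer; }

    // Explicit, per-resource configuration goes straight to the store.
    void install(IndexedResourceSketch resource) {
        store.put(resource.fileName, resource);
    }

    IndexedResourceSketch lookup(String fileName, boolean isDirectory) {
        IndexedResourceSketch resource = store.get(fileName);
        if (resource != null) {
            return resource;
        }
        resource = indexer.createDefault(fileName, isDirectory);
        if (resource != null) {
            store.put(fileName, resource); // assumption: remember the result
        }
        return resource;
    }
}
```

With this arrangement the administrator declares only the exceptions: a child installed explicitly in the store always wins over the default the indexer would have produced.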
Jigsaw's configuration is not only made of its information space configuration: one would also like to be able to configure the resource indexer, the authentication realms or the global server properties (such as its port number, etc.). One nice thing about the Jigsaw design is that if you represent this configuration information as instances of subclasses of the Resource class, then they inherit persistence, caching and the ability to be edited (which is really what you want).
Because of this, the current Jigsaw release provides form-based editing of the following pieces:
You can, in fact, install and configure Jigsaw without having a text editor.
We have briefly described the Jigsaw architecture, and have shown how an object-based server can implement the basic functionality expected from a Web server, and more. We believe that the most important characteristics of our design are:
Other design considerations have played an important role in Jigsaw's design, in particular the ability to unpickle resources only on demand (so that the server does not start by unpickling its whole information space), and the caching mechanism for managing the number of unpickled resources at a given time. To conclude, we would like to emphasize that, given this design, Jigsaw's configuration no longer lives in one single configuration file, but is rather spread across the various resource instances.
Jigsaw is still in its alpha stage, and will be available to members by May, and to the public by June.
Anselm Baird-Smith
$Id: Position.html,v 1.1 1996/04/08 14:03:35 abaird Exp $