Building Specialized Web Services -- Where to Put the API

Leslie L. Daigle
Sandro Mazzucato
Bunyip Information Systems, Inc.
March 28, 1996.

Before considering how to design an API to facilitate the construction of finely-tuned Web applications, we must first analyze the kind of interaction that these applications are meant to support. This analysis leads to some suggestions about which part of the interaction stream can be articulated in an API, and about the necessary elements of such an API.

Sophisticated Information Transactions

When the Internet first emerged from the original research projects that provided the necessary technology, it was best perceived as a network link between isolated islands of machinery (sites). Information access was done on a point-to-point basis (by ftp, mail, news). Users had to be able to identify the specific sites and addresses to which information was to be sent, from which the latest update could be obtained, etc. There was no coherent view of the information space spanned by the resources accessible through the net; its main purpose was transmission.

Since then, various applications have been designed to provide some sense of the whole information space. First there was Archie [emta92], which provided an index view of files available globally through anonymous FTP. Most recently, the World Wide Web has achieved great popularity as a technology for hooking together parts of islands. The WWW hides the point-and-shoot aspect of the Internet through presentation of a document facade -- the user no longer needs to know what site to contact for information, as that knowledge is embedded in the linking constructs. This level of abstraction has led to a transaction model that sees users starting from one web page, selecting a link, and causing some computation at a remote server (a search, some other program, or, most commonly, retrieval of a pre-constructed file).

This model, while an improvement, is still limiting. It provides one view of the Internet information space -- that set out by designers of web pages. The Internet information space of the future must permit the customization of views of its content. Individual users must be able to tailor their interactions with the Internet to support their work tasks. In the same way that retrieving a file has become transparent to users who click on a WWW link anchor, searching for information must be supportable as a component part of a composite task.

In order to support web activities with increasingly complex information resources (multimedia objects, structured documents, specialized databases, etc.), the next generation of web services will need to be interoperable at a high level of information activity abstraction. This may be fairly evident in terms of developing information servers and indexes that can interact with one another, or that provide a uniform face to the viewing public (e.g., through standard WWW browsers). However, an information activity is composed of both information resources and needs. It is therefore not enough to make resources more sophisticated and interoperable; we also need to be able to specify more complex, independent client information processing tasks. Note that this client may be a human user, or another software program.

The interaction model supported by this position has three primary components: an information need, an information processing task or activity, and information resources. That is, a client's information need can be satisfied by some processing task which will draw on specific resources. In the current Internet environment, the bulk of the processing associated with satisfying a particular need is embedded in software applications (such as WWW browsers), and these applications access information resources directly through a fixed set of communications protocol standards. The potential danger is that this puts a barrier between clients and resources, because the language of communication is generalized to support so many possible applications that it cannot support any one deeply.

Rather than falling into this dog-bone shaped distribution of information activities, we need standard mechanisms for encapsulating client information processing tasks. In this way, client information processing can be delegated to a task defined outside the realm of a particular application. Some activities are client-centred, not resource-centred, and it is not appropriate for these to be buried in information servers. Thus, it must be possible to create a new layer of information processing objects, acting as agents for clients, and these objects must be built from universally supported construction standards in order to ensure accessibility by all applications.

The next step is to build a better balance between representation of user interests and representation of information resources. The latter is taking shape through a variety of mechanisms, not the least of which is the URI infrastructure (URLs, URNs, and URCs) [bern94]. The URI work focuses primarily on identifying resources; more work is needed to be able to identify information service capabilities. This is starting to take shape in work such as Stanford's Infomaster project [gedd95], which provides a virtual information system that supports conceptual and notational diversity in accessing the underlying resources. Representation of user interests is being explored in the work on Uniform Resource Agents (URAs) [daig95]. These are agent objects which encapsulate Internet resource activities. A fully-instantiated URA carries out a task delegated by an invoker (human or otherwise). The nature of the task is determined by the agent that the invoker instantiates; that originating object encapsulates some knowledge of the existence of relevant Internet resources and of the information required in order to access them. When it contacts these resources, the agent is the bearer of the representation of the invoker's interests and background. In this way, URAs can be used to allow invokers to carry out high-level Internet resource activities while insulating them from the details of Internet protocols, etc.
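To make the shape of such an agent object concrete, here is a minimal Python sketch. It is not the URA specification described in [daig95]; the names Resource, Agent, invoke, and required are hypothetical, and the two resources are stand-ins that return canned strings instead of speaking any Internet protocol. The sketch only illustrates the division of labour described above: the agent holds the knowledge of which resources exist and how to reach them, and the invoker supplies nothing but the details of its need.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Resource:
    """What the agent knows about one resource: its name and how to query it."""
    name: str
    query: Callable[[Dict[str, str]], List[str]]   # stand-in for a protocol exchange


@dataclass
class Agent:
    """A task definition plus the resources it knows how to use (hypothetical)."""
    task: str
    resources: List[Resource]
    required: List[str] = field(default_factory=list)   # details the invoker must supply

    def invoke(self, **details):
        """Carry out the task on the invoker's behalf and merge the results."""
        missing = [key for key in self.required if key not in details]
        if missing:
            raise ValueError("missing invoker details: " + ", ".join(missing))
        results = []
        for resource in self.resources:
            # The agent, not the invoker, deals with each resource's access details.
            results.extend(resource.query(details))
        return results


# Two made-up resources standing in for real Internet services.
index_a = Resource("index-a", lambda d: [f"a:{d['keyword']}"])
index_b = Resource("index-b", lambda d: [f"b:{d['keyword']}"])
finder = Agent("find software", [index_a, index_b], required=["keyword"])
```

An invoker, human or program, would then call finder.invoke(keyword="gopher") without knowing which resources were consulted or how.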

Where this puts the API...

If the future of sophisticated information interactions lies with a model that distributes computation between the client and server, then we must consider mechanisms to encapsulate the Web-specific communications at both those points.

Specialized client applications for today's Web would benefit from:

* Standard APIs for libraries handling URI (URL/URN, etc.) resolution. That is, the Internet interactions necessary to resolve a URI should be completely transparent to the developer of the client application.
* High-level parsers of standard document formats. That is, non-rendering applications are obstructed by HTML markup code that is designed specifically for indicating presentation features. For example, regular expressions that extract data from HTML pages break as soon as a change is made to the visual presentation of the page -- a change that is barely distinguishable visually can rupture the HTML structure for which the regular expression was designed. Nevertheless, elements of this markup can be used to indicate specific components of the document.
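As a minimal sketch of the second point, the following Python fragment uses the standard library's html.parser module to pull link targets and anchor text out of a page by element, not by regular expression. The class name LinkExtractor, the helper extract_links, and the sample markup are illustrative assumptions and not part of any proposed API. The point is only that extraction keyed to document components is unaffected by purely presentational changes.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, anchor text) pairs, ignoring presentational markup."""

    def __init__(self):
        super().__init__()
        self.links = []        # finished (href, text) pairs
        self._href = None      # href of the anchor currently open, if any
        self._text = []        # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return every (href, text) pair found in the given markup."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Two hypothetical renderings of the same link; only the presentation differs.
plain = '<p><a href="ftp://example.org/pub/README">README</a></p>'
styled = '<p><b><a href="ftp://example.org/pub/README"><i>README</i></a></b></p>'
```

Both extract_links(plain) and extract_links(styled) yield the same single pair, whereas a regular expression written against the first rendering would have to be revised for the second.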

These are fairly straightforward extensions that would make life much easier for application developers today. However, they become crucial if Web interactions become more sophisticated (complex), as they must if more extensive information services are to be provided by servers.
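Returning to the first item in the list, transparent URI resolution could be offered to client developers as a single call that hides the choice of protocol. The Python sketch below is one possible shape under stated assumptions: the names Resolver, register, resolve, and UnsupportedScheme are hypothetical, and the handlers are stand-ins that return a labelled byte string where a real handler would carry out an HTTP or FTP exchange or consult a URN resolution service.

```python
from urllib.parse import urlsplit


class UnsupportedScheme(Exception):
    """Raised when no handler is registered for a URI's scheme."""


class Resolver:
    """Dispatch a URI to a scheme-specific handler behind a single call."""

    def __init__(self):
        self._handlers = {}    # scheme name -> callable(uri) -> bytes

    def register(self, scheme, handler):
        self._handlers[scheme.lower()] = handler

    def resolve(self, uri):
        scheme = urlsplit(uri).scheme   # urlsplit lowercases the scheme
        try:
            handler = self._handlers[scheme]
        except KeyError:
            raise UnsupportedScheme(scheme or "(none)") from None
        return handler(uri)


# Stand-in handlers; real ones would perform the network interaction.
resolver = Resolver()
resolver.register("http", lambda uri: b"fetched over HTTP: " + uri.encode())
resolver.register("urn", lambda uri: b"resolved by name: " + uri.encode())
```

The client application calls resolver.resolve(uri) in the same way for a locator or a name; support for a new scheme is added by registering a handler, with no change to the calling code.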

Many of the systems currently hooked up to the net involve CGI scripts that access specialized software (e.g., database service software). This restricts the information to a single presentation format, and makes it difficult to have flexible access to the data (e.g., attempting to capture the full range of possible Archie searches through a forms interface in a Web page). What would be useful in terms of extensions to HTTP is a richer set of directives (beyond ``get'' and ``post''; e.g., ``search'' and others), with better parametrization capabilities (e.g., requested return formats other than HTML -- perhaps application-specific). This would enable more sophisticated communications, allowing the complexity of the computation to be handled close to the information itself, while still permitting the client to express specific needs.
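A rough Python illustration of what such a request might look like from the client side follows. It builds a request description with the standard library's urllib.request.Request and does not send it. The ``search'' directive, the host name, and the parameter names are assumptions made for illustration only; the Accept header is the existing HTTP mechanism for stating a preferred return format.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(base_url, directive, params, return_format):
    """Describe a request without sending it.

    directive     -- method name; anything beyond GET/POST is hypothetical here
    params        -- query parameters for the remote service
    return_format -- media type the client would like back
    """
    url = base_url + "?" + urlencode(params)
    return Request(url, headers={"Accept": return_format}, method=directive)


# Hypothetical Archie-style search asking for plain text, not HTML.
req = build_request(
    "http://archie.example.org/find",
    "SEARCH",
    {"query": "xarchie", "match": "substring"},
    "text/plain",
)
```

Here the client states both what it wants done and the form in which it wants the answer, leaving the search itself to be computed at the server.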

References

[bern94] Berners-Lee, Tim, Larry Masinter, Mark McCahill, ``Uniform Resource Locators (URL)'', RFC 1738, December 1994.

[daig95] Daigle, Leslie L., Peter Deutsch, ``Agents for Internet Information Clients'', Proceedings of the Intelligent Information Agent Workshop, CIKM'95, December 1995.
(See <URL:http://www.bunyip.com/products/silk/>)

[emta92] Emtage, Alan, Peter Deutsch, ``archie - An Electronic Directory Service for the Internet'', Proceedings of the 1992 USENIX Winter Technical Conference, January 1992, San Francisco, pp. 93 - 110.

[gedd95] Geddis, Donald F., Michael R. Genesereth, Arthur M. Keller, Narinder P. Singh, ``Infomaster: A virtual Information System'', Proceedings of the Intelligent Information Agent Workshop, CIKM'95, December 1995.

Authors' Contact

Leslie Daigle (leslie@bunyip.com)
Sandro Mazzucato (pedro@bunyip.com)