SGML (Standard Generalized Markup Language)
An SGML document essentially consists of two parts: a prolog, containing the document type declaration, followed by the document instance, containing the data interspersed with markup.
A document instance is a hierarchical structure of (possibly empty) elements, where each non-empty element contains other elements or character data. Each element has a name (the general identifier), and the start and end of an element are indicated by tags (typically <name>content</name>). Start and end tags may be mandatory or optional. Moreover, an element may carry zero or more attributes. Consider the example document below. The root element memo has one attribute, indicating the security level of the document. The memo element contains five other elements: two to elements stating the addressees, a from element specifying the sender, a subject element and a body element. All elements of memo contain character data and no other elements. Note that the tags emphasize the logical structure of the document rather than stating how it should be formatted.
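The hierarchical structure described above can be modeled as a small tree of nodes, each with a general identifier, a set of attributes and a list of content items. The following is a minimal sketch, not an SGML parser: the class `Element`, its methods, and all memo contents (names, subject, the attribute value `confidential`) are hypothetical and invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A node in a document instance: general identifier, attributes and content."""
    name: str  # the general identifier, e.g. "memo"
    attributes: dict = field(default_factory=dict)  # attribute name -> value
    content: list = field(default_factory=list)  # child Elements and/or character data (str)

    def children(self, name: str) -> list:
        """Return the direct child elements with the given general identifier."""
        return [c for c in self.content if isinstance(c, Element) and c.name == name]

    def text(self) -> str:
        """Concatenate all character data below this element, in document order."""
        return "".join(c if isinstance(c, str) else c.text() for c in self.content)

    def to_markup(self) -> str:
        """Serialize with every start and end tag written out (no tag omission)."""
        attrs = "".join(f' {k}="{v}"' for k, v in self.attributes.items())
        inner = "".join(c if isinstance(c, str) else c.to_markup() for c in self.content)
        return f"<{self.name}{attrs}>{inner}</{self.name}>"


# Hypothetical memo: the attribute value and all text are invented for illustration.
memo = Element("memo", {"security": "confidential"}, [
    Element("to", content=["Alice"]),
    Element("to", content=["Bob"]),
    Element("from", content=["Carol"]),
    Element("subject", content=["Meeting"]),
    Element("body", content=["The meeting moves to Friday."]),
])

print(len(memo.content))                         # 5
print([t.text() for t in memo.children("to")])   # ['Alice', 'Bob']
print(memo.children("subject")[0].to_markup())   # <subject>Meeting</subject>
```

Serializing with all tags explicit sidesteps tag omission; a real SGML parser would have to infer omitted tags from the DTD.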
The document instance above must be preceded by a doctype declaration. The main part of the doctype declaration is the document type definition, or DTD. It defines the elements of a document and the required order of their sub-elements. Elements and their contents are defined by element declarations. The second line declares the memo element and defines its content as a sequence of one or more to elements, followed by a from, a subject and a body element. The two 'O' characters stand for "omit", indicating that the start and end tags may be omitted. The third line defines the elements that contain only character data. Their start tags are mandatory, as indicated by the '-'. The attributes of each element are declared by an attlist declaration. Attributes can be of different types and can be mandatory or optional. The DTD can also specify a default value, as in the case of the security attribute. The DTD may be contained within the doctype declaration, but it is typically stored in a separate file, in which case the doctype declaration contains a reference to that file.
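To make the role of the content model concrete, the sketch below checks whether a sequence of child element names conforms to the memo model described above (one or more to elements, then from, subject and body). It assumes a purely sequential model with the occurrence indicators `+`, `*` and `?` only, and translates it into a regular expression; SGML's `|` and `&` connectors are not handled. The default value `public` for the security attribute is hypothetical, since the actual value from the DTD is not reproduced here.

```python
import re

# Content model of memo as described in the text: one or more "to" elements,
# then one "from", one "subject" and one "body", in that order.
# In SGML notation this would be written (to+, from, subject, body).
MEMO_MODEL = ["to+", "from", "subject", "body"]


def model_to_regex(tokens):
    """Turn a sequence of names with optional +, * or ? occurrence indicators
    into a regex over the space-terminated list of child names."""
    parts = []
    for tok in tokens:
        if tok[-1] in "+*?":
            name, occurrence = tok[:-1], tok[-1]
        else:
            name, occurrence = tok, ""
        parts.append(f"(?:{re.escape(name)} ){occurrence}")
    return re.compile("".join(parts))


def valid_children(children, tokens):
    """True if the child general identifiers match the sequence content model."""
    seq = "".join(name + " " for name in children)
    return model_to_regex(tokens).fullmatch(seq) is not None


# Hypothetical default value; the real one would come from the attlist declaration.
MEMO_DEFAULTS = {"security": "public"}


def effective_attributes(given, defaults):
    """Values given in the start tag override the defaults from the DTD."""
    return {**defaults, **given}


print(valid_children(["to", "to", "from", "subject", "body"], MEMO_MODEL))  # True
print(valid_children(["from", "subject", "body"], MEMO_MODEL))              # False
print(effective_attributes({}, MEMO_DEFAULTS))                             # {'security': 'public'}
```

Mapping content models to regular expressions works because SGML content models describe regular languages over element names.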
Processing instructions are used to pass system-dependent information to the application, telling it how the document is to be processed. Processing instructions are delimited by '<?' and '>' and can appear anywhere in the document. In the following examples, we use processing instructions to indicate the URL of a style sheet.
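Since the parser does not interpret processing instructions but only hands their contents to the application, an application can pick them out with simple pattern matching. The sketch below follows the delimiters given in the text ('<?' up to the next '>'). The PI target `stylesheet`, the `href="..."` pseudo-attribute convention and the URL are assumptions made for illustration, not part of SGML itself.

```python
import re

# Following the text: a processing instruction runs from "<?" to the next ">".
PI_PATTERN = re.compile(r"<\?([^>]*)>")


def processing_instructions(document):
    """Return the contents of all processing instructions, in document order."""
    return [m.group(1).strip() for m in PI_PATTERN.finditer(document)]


def pseudo_attributes(pi):
    """Split a PI of the (assumed) form 'target name="value" ...' into a dict."""
    return dict(re.findall(r'(\w+)\s*=\s*"([^"]*)"', pi))


# Hypothetical document; the PI target "stylesheet" and the URL are invented.
doc = '<?stylesheet href="http://www.example.org/memo.css">\n<memo>...</memo>'
for pi in processing_instructions(doc):
    print(pi.split()[0], pseudo_attributes(pi))
# stylesheet {'href': 'http://www.example.org/memo.css'}
```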
Fragments of markup and character data can be given a name using an entity declaration. The declaration of an entity is part of the document type declaration, but the entity may be used within the document instance. The contents of an entity may be given as a string, or may be contained in an external file, in which case the entity declaration contains a reference to that file. External entities may be referenced by a system identifier. Support for these identifiers is system-dependent and may include filenames, URLs and database queries. Consider a variant of the previous example: the first line references an external DTD by means of a system identifier, and the second line defines an entity for later use. Note that the entity definition is enclosed within the square brackets of the doctype declaration. The third line contains a processing instruction that defines the location of the style sheet. In contrast to the URL of the DTD, which is resolved by the parser, the URL of the style sheet is passed to the application without further processing by the parser.
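The sketch below illustrates the mechanism for internal (string) entities only: it collects declarations of the form `<!ENTITY name "text">` from a doctype declaration and replaces `&name;` references in the instance. External entities, parameter entities, single-quoted literals and recursive expansion are deliberately left out. The doctype declaration, the entity names `sender` and `logo`, and the file names are hypothetical.

```python
import re

# Internal (string) entity declarations: <!ENTITY name "replacement text">.
# External entities (<!ENTITY name SYSTEM "...">) deliberately do not match.
ENTITY_DECL = re.compile(r'<!ENTITY\s+([\w.-]+)\s+"([^"]*)"\s*>')
ENTITY_REF = re.compile(r"&([\w.-]+);")


def internal_entities(declarations):
    """Map entity names to replacement text; the first declaration of a name wins."""
    entities = {}
    for name, text in ENTITY_DECL.findall(declarations):
        entities.setdefault(name, text)
    return entities


def expand(text, entities):
    """Replace &name; references once (no recursive expansion);
    unknown references are left untouched."""
    return ENTITY_REF.sub(lambda m: entities.get(m.group(1), m.group(0)), text)


# Hypothetical doctype declaration with an internal subset in square brackets.
doctype = '''<!DOCTYPE memo SYSTEM "memo.dtd" [
<!ENTITY sender "Carol">
<!ENTITY logo SYSTEM "logo.gif">
]>'''
ents = internal_entities(doctype)
print(ents)                                     # {'sender': 'Carol'}
print(expand("From: &sender; &other;", ents))   # From: Carol &other;
```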
To avoid system-dependent identifiers such as filenames, the concept of public identifiers provides an extra level of indirection. These identifiers are assumed to be publicly known, and the SGML parser of the target application is expected to be able to resolve them. Typically, a local catalog file is used to map public identifiers onto system-dependent ones. Formal public identifiers have a standardized, meaningful internal structure that facilitates automatic resolution without the use of catalogs. In the following examples, we refer to the HTML 3.0 DTD by means of a formal public identifier.
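Both ideas in the paragraph above can be sketched in a few lines: a catalog lookup that maps a public identifier onto a system identifier, and a split of a formal public identifier into its owner, text class, description and language. The catalog syntax assumed here is a single `PUBLIC "public id" "system id"` entry per mapping, in the style of SGML Open catalogs; the file name `html3.dtd` is hypothetical, and only the basic form of a formal public identifier is parsed.

```python
import re

# Catalog entries of the assumed form: PUBLIC "public identifier" "system identifier"
CATALOG_ENTRY = re.compile(r'PUBLIC\s+"([^"]*)"\s+"([^"]*)"')


def normalize(public_id):
    """Collapse runs of whitespace so equivalent identifiers compare equal."""
    return " ".join(public_id.split())


def parse_catalog(text):
    """Map normalized public identifiers to system identifiers (first entry wins)."""
    catalog = {}
    for public_id, system_id in CATALOG_ENTRY.findall(text):
        catalog.setdefault(normalize(public_id), system_id)
    return catalog


def resolve(public_id, catalog):
    """Return the system identifier for a public identifier, or None if unknown."""
    return catalog.get(normalize(public_id))


def parse_fpi(fpi):
    """Split a formal public identifier into owner, text class, description, language.
    Only the basic form is handled; optional parts of the full syntax are ignored."""
    parts = fpi.split("//")
    if parts[0] in ("-", "+"):  # unregistered (-) or registered (+) owner
        owner, rest = parts[0] + "//" + parts[1], parts[2:]
    else:                       # e.g. an ISO owner such as "ISO 8879:1986"
        owner, rest = parts[0], parts[1:]
    text_class, description = rest[0].split(" ", 1)
    return {"owner": owner, "class": text_class,
            "description": description, "language": rest[1]}


# Hypothetical local catalog; the file name html3.dtd is invented.
catalog = parse_catalog('PUBLIC "-//IETF//DTD HTML 3.0//EN" "html3.dtd"')
fpi = "-//IETF//DTD HTML 3.0//EN"
print(resolve(fpi, catalog))  # html3.dtd
print(parse_fpi(fpi))
# {'owner': '-//IETF', 'class': 'DTD', 'description': 'HTML 3.0', 'language': 'EN'}
```

The catalog keeps the mapping local and replaceable, while the structured identifier itself stays the same on every system.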