Library design

One of the great promises of object orientation is what has been called the industrial reuse of software in [Meyer90]. The idea of objects as reusable components was originally put forward by [Cox86], who coined the phrase software IC to indicate the analogy with building electronic circuits from off-the-shelf components. Software reuse is certainly not an exclusive interest of object-oriented developers, as testified for instance in [Biggerstaff89]. However, object orientation holds strong cards, as the mechanisms of encapsulation and inheritance directly support the reuse of software. Encapsulation may be used to define wrapper classes for existing software, and inheritance may be used to refine existing classes to meet specific requirements. Essentially, a well-designed class library may act as a repository of reusable components. The unit of reuse is, however, generally not a single class but rather a collection of classes providing a framework for application development. In this section, we will look at the design decisions underlying the Eiffel libraries, and discuss the development of template classes for abstract data structures in C++.

Taxonomic classification -- the Eiffel libraries

A good example of a well-designed and well-documented collection of class libraries is given by [Meyer90], who describes the design and implementation of the standard Eiffel libraries. (See also Meyer, 1994.)

Library design

-- industrial reuse of software

The Eiffel libraries

-- contracts
slide: The Eiffel libraries

The Eiffel libraries include kernel classes (for arrays, strings and io), support classes (for browsing, persistent storage and debugging), data structures (such as lists, trees and stacks), classes for lexical analysis and parsing, and graphics classes (for windows, mouse handling and figures). See slide 11-eiffel.

Consistency

-- support tools

The great Tempter of Perfection exhorts: ``Correct it here and now, before it is too late.'' But, here the Guardian Angel of the Installed Base is really a frontman for the hideous Devil of Eternal Compatibility with the Horrors of the Past, whose nefarious influence is also visible in Computer Science. [Meyer90]
slide: The problem of consistency

The Eiffel libraries may serve as an example of rational and consistent library design. Each class interface is described by a contract stating the class invariant and the pre- and post-conditions for the services provided. An interesting feature of the Eiffel libraries is that the class interface documentation has been extracted automatically from the class implementations, thus guaranteeing consistency between the actual library classes and their description. Like any software system, libraries tend to evolve over time. As the quotation taken from  [Meyer90] in slide 11-obsolete indicates, the developer is faced with the dilemma of keeping the library clean and consistent or satisfying the user of earlier versions by maintaining the original class interfaces. As an intermediate solution, the Eiffel language provides the keyword obsolete which may be used to indicate that particular methods may eventually be removed from the class interface. The user acquiring a new version of the library is then urged to adapt the code without being confronted with a non-functioning system. Good library design is aimed at the development of a stable collection of classes with stable interfaces. Stable interfaces mean that the user of the library need not worry about any changes in the implementation of the library, since as long as class interfaces remain the same user code is not affected.
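The notion of a contract can be approximated in C++ by means of assertions. The following is a minimal sketch only, not the Eiffel mechanism itself: the class bounded_stack and its helper invariant() are hypothetical names introduced for illustration, and assert merely aborts the program on a violation instead of raising an exception as Eiffel does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bounded stack. The assertions stand in for Eiffel's
// require (pre-condition), ensure (post-condition) and invariant clauses.
class bounded_stack {
public:
  explicit bounded_stack(std::size_t capacity) : cap(capacity) {
    assert(invariant());
  }
  bool empty() const { return items.empty(); }
  bool full() const { return items.size() == cap; }
  std::size_t count() const { return items.size(); }

  void put(int x) {
    assert(!full());                          // pre-condition
    std::size_t old_count = count();
    items.push_back(x);
    assert(count() == old_count + 1);         // post-condition
    assert(item() == x);                      // post-condition
    assert(invariant());
  }
  int item() const {
    assert(!empty());                         // pre-condition
    return items.back();
  }
  void remove() {
    assert(!empty());                         // pre-condition
    std::size_t old_count = count();
    items.pop_back();
    assert(count() == old_count - 1);         // post-condition
    assert(invariant());
  }
private:
  bool invariant() const { return items.size() <= cap; }   // class invariant
  std::size_t cap;
  std::vector<int> items;
};
```

A client that calls put on a full stack violates the pre-condition, so the fault lies with the client; a failing post-condition or invariant points at the implementation of the class instead.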

Naming

  • favor consistency over specificity

Conventions

  • verbs -- procedures
  • nouns -- attributes, functions
  • adjectives, question-verbs -- boolean queries

slide: Naming conventions

An important aspect of developing library classes is to arrive at a consistent and understandable (read predictable) naming scheme for the methods listed in the class interface. The advice given in  [Meyer90] is to use verbs for (state modifying) procedures, nouns for (non-modifying) functions and attributes, and adjectives or question verbs for boolean queries. Note that this advice is not without ambiguity, since words such as empty and count may be used both as nouns and verbs. In addition, rather than adhering to specific method names that are conventionally used (such as push and pop for stacks and insert and extract for queues),  [Meyer90] advises resorting to standard feature or method names that apply to all classes of a particular category. See slide 11-naming.

Standard feature names

containers

  • item -- access operation
  • count -- number of items
  • has -- membership test
  • put -- insert or replace item
  • force -- like put
  • remove -- remove an item
  • wipe_out -- remove all items
  • empty -- absence of items
  • full -- no more space


slide: Feature names for containers

As an example, consider the development of a collection of container classes. Our aim, obviously, is to arrive at an inheritance hierarchy representing a classification of container types that allows us to choose the kind of container that most closely fits our needs with respect to access, traversal and storage requirements. The advantage of using standard feature names for the complete collection instead of the specific names conventionally used for the different kinds of containers is that essentially no code has to be changed when replacing one kind of container with another. The standard feature names for container classes in the Eiffel library are listed in slide 11-features, with a short explanation of their meaning. Underlying the implementation of container classes in the Eiffel libraries is a taxonomy of data structures, and a taxonomy of access and traversal methods, that may be combined in a variety of ways using multiple inheritance. From a user's point of view, access methods are the most important attributes of containers, as they determine the order of insertion and retrieval.

Access

-- keys, order of insertion and retrieval

container
    collection
        set
        bag
            dispenser
                stack
                queue
            cursor
                list
                tree
    table
        indexable
        hash
slide: Access taxonomy
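The benefit of standard feature names can be illustrated in C++ as follows. This is a sketch under stated assumptions: the classes simple_stack and simple_queue and the function drain are hypothetical and are not part of the Eiffel libraries; they merely adopt the feature names item, count, put, remove, wipe_out and empty.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Two hypothetical dispensers that share the standard feature names.
template <class T>
class simple_stack {                          // lifo access
public:
  const T& item() const { assert(!empty()); return rep.back(); }
  std::size_t count() const { return rep.size(); }
  bool empty() const { return rep.empty(); }
  void put(const T& e) { rep.push_back(e); }
  void remove() { assert(!empty()); rep.pop_back(); }
  void wipe_out() { rep.clear(); }
private:
  std::vector<T> rep;
};

template <class T>
class simple_queue {                          // fifo access
public:
  const T& item() const { assert(!empty()); return rep.front(); }
  std::size_t count() const { return rep.size(); }
  bool empty() const { return rep.empty(); }
  void put(const T& e) { rep.push_back(e); }
  void remove() { assert(!empty()); rep.pop_front(); }
  void wipe_out() { rep.clear(); }
private:
  std::deque<T> rep;
};

// Client code written against the feature names only: it empties any
// dispenser of int and returns the items in order of retrieval.
template <class Dispenser>
std::vector<int> drain(Dispenser& d) {
  std::vector<int> out;
  while (!d.empty()) {
    out.push_back(d.item());
    d.remove();
  }
  return out;
}
```

Since drain relies on nothing but item, remove and empty, replacing a simple_stack by a simple_queue changes the order of retrieval without requiring any change to the client code.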

The hierarchy associated with access methods most closely corresponds with the traditional classification of containers according to their (mathematical) type. See slide 11-access. For example, a stack may be regarded as a refinement of a bag, imposing additional order. See section refinement. Somewhat peculiar in the hierarchy shown above is that the container type set is not a refinement of bag, which would mathematically be more appropriate. In addition to the hierarchy corresponding to access methods, we also have a hierarchy corresponding to traversal methods (which are either hierarchical or linear) and a hierarchy representing the options for storage requirements. As an example of the possible interplay between access methods and traversal methods, look at the tree container in the access hierarchy. Similar to a list, a tree may be accessed by means of a cursor, which is a pointer to a particular element in the structure. Depending on the traversal method, the cursor visits the elements of the tree either in a linear (that is, sorted) order or hierarchically (corresponding with the levels of the nodes).
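The difference between the two traversal methods for a tree may be sketched as follows. The code is an illustration under simplifying assumptions: node, insert, linear, hierarchical and destroy are hypothetical names, the tree is an unbalanced binary search tree, and the traversals collect the values in a vector rather than exposing a cursor.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical binary search tree node (no balancing, no deletion of keys).
struct node {
  int value;
  node* left;
  node* right;
  explicit node(int v) : value(v), left(nullptr), right(nullptr) {}
};

node* insert(node* root, int v) {
  if (!root) return new node(v);
  if (v < root->value) root->left = insert(root->left, v);
  else root->right = insert(root->right, v);
  return root;
}

// Linear traversal: in-order, which for a search tree yields sorted order.
void linear(const node* n, std::vector<int>& out) {
  if (!n) return;
  linear(n->left, out);
  out.push_back(n->value);
  linear(n->right, out);
}

// Hierarchical traversal: level by level, starting at the root.
std::vector<int> hierarchical(const node* root) {
  std::vector<int> out;
  std::queue<const node*> pending;
  if (root) pending.push(root);
  while (!pending.empty()) {
    const node* n = pending.front();
    pending.pop();
    out.push_back(n->value);
    if (n->left) pending.push(n->left);
    if (n->right) pending.push(n->right);
  }
  return out;
}

void destroy(node* n) {
  if (!n) return;
  destroy(n->left);
  destroy(n->right);
  delete n;
}

// Usage: the same tree delivers its elements in two different orders.
void demo_traversals() {
  node* root = nullptr;
  const int values[] = {4, 2, 6, 1, 3};
  for (int v : values) root = insert(root, v);
  std::vector<int> sorted;
  linear(root, sorted);
  assert((sorted == std::vector<int>{1, 2, 3, 4, 6}));
  assert((hierarchical(root) == std::vector<int>{4, 2, 6, 1, 3}));
  destroy(root);
}
```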

Storage

-- requirements

  • infinite (lazy)
  • finite
      • unbounded
      • bounded
          • resizable
          • fixed
slide: Storage taxonomy

With respect to the storage requirements, we have the choice between finite structures or potentially infinite structures (that must be implemented in a lazy, that is demand-driven, fashion). Finite structures are either unbounded or bounded, and bounded structures such as arrays may again be subdivided into fixed or resizable structures. See slide 11-storage. The hierarchies corresponding to access and traversal methods and storage requirements may be regarded as the dimensions along which a variety of container types may be constructed. The properties chosen with respect to these dimensions may be used as type annotations serving as an index for retrieval.
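What a lazy, demand-driven implementation of a potentially infinite structure amounts to may be sketched as follows. The class lazy_squares is a hypothetical example and is not taken from the Eiffel libraries; it represents the sequence 0, 1, 4, 9, ... and computes an element only when it is first requested.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lazy sequence of squares: conceptually infinite, but only
// the prefix that has actually been requested is ever stored.
class lazy_squares {
public:
  long item(std::size_t i) {
    while (cache.size() <= i) {               // extend on demand
      long n = static_cast<long>(cache.size());
      cache.push_back(n * n);
    }
    return cache[i];
  }
  std::size_t computed() const { return cache.size(); }
private:
  std::vector<long> cache;
};

// Usage: asking for element 4 computes elements 0..4 and nothing more.
void demo_lazy() {
  lazy_squares s;
  assert(s.computed() == 0);
  assert(s.item(4) == 16);
  assert(s.computed() == 5);
}
```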

Indexing and retrieval

To support the retrieval of classes from a repository (a database of software components), the Eiffel language provides the keyword indexing by which attributes associated with a class may be defined. For instance, a container class may be given the attributes fifo (first-in first-out) or lifo (last-in first-out) that characterize respectively the access behavior of a queue and a stack. See slide 11-indexing. The Eiffel system itself, however, does not provide for a software repository facility.

Indexing

-- type annotations
  • Access -- fifo, lifo, key
  • Storage -- bounded, unbounded

Archival and querying tools

Eiffel

  • flat -- flattens the inheritance structure
  • short -- produces interface
  • good -- browser for class universe

slide: Indexing and retrieval
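How type annotations might serve as an index for retrieval can be sketched with a small lookup table. Since the Eiffel system does not provide a repository facility, everything in this sketch is an assumption: class_index, annotate and find are hypothetical names, and so are the class names used in the demonstration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical repository index: records for each class name the
// annotations (such as lifo or bounded) attached to it.
class class_index {
public:
  void annotate(const std::string& cls, const std::string& attribute) {
    attributes[cls].insert(attribute);
  }
  // Names of the classes that carry every attribute in the query,
  // in alphabetical order of class name.
  std::vector<std::string> find(const std::vector<std::string>& query) const {
    std::vector<std::string> result;
    for (const auto& entry : attributes) {
      bool all = true;
      for (const std::string& q : query)
        if (entry.second.count(q) == 0) all = false;
      if (all) result.push_back(entry.first);
    }
    return result;
  }
private:
  std::map<std::string, std::set<std::string> > attributes;
};

// Usage: retrieve the classes that are both lifo and bounded.
void demo_index() {
  class_index idx;
  idx.annotate("bounded_stack", "lifo");
  idx.annotate("bounded_stack", "bounded");
  idx.annotate("linked_stack", "lifo");
  idx.annotate("linked_stack", "unbounded");
  idx.annotate("bounded_queue", "fifo");
  idx.annotate("bounded_queue", "bounded");
  std::vector<std::string> query;
  query.push_back("lifo");
  query.push_back("bounded");
  std::vector<std::string> found = idx.find(query);
  assert(found.size() == 1 && found[0] == "bounded_stack");
}
```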

The retrieval of classes (and software components in general) is one of the unresolved problems in software reuse. Clearly, a browser allowing traversal of the class universe does not suffice. As an anecdote, Alan Kay (the spiritual father of Smalltalk) once remarked that after an absence of a few months, during which the class structure of the Smalltalk library had been changed, he had considerable difficulty in locating the appropriate classes. The anecdote indicates that browsers are helpful only to refresh our memory, not to familiarize ourselves with a class library or to select previously unknown classes for reuse. The class browser good that comes with the Eiffel system is a graphical class browser which sets (apart from its speed) an example for a general-purpose class browser, as it allows a graphical display of both inheritance relations and client-server relations. In addition to a class browser, the Eiffel system provides a tool flat that allows one to flatten a class to include all features inherited from its ancestors. Moreover, it provides a tool short that allows one to extract an interface description from a (possibly flattened) class implementation. A tool such as short is an invaluable means of automatically producing the documentation that specifies the interface of a class; invaluable, since reuse ultimately depends upon understanding the functionality offered by a library.

Template classes -- bags and sets

Numerous C++ libraries offering data structures for containers are available. Yet, many programmers choose to develop such a library themselves. The reasons for this may vary from matters of taste to considerations with respect to the formal properties of the types offered.
  template< class T >
  class bag {                                 // bag<T>
  public:
    bag() { s = new list<T>; }
    // deep copy, to avoid deleting the same list twice;
    // assumes that list<T> has a copy constructor
    bag(const bag<T>& b) { s = new list<T>(*b.s); }
    ~bag() { delete s; }
    bag<T>& operator=(const bag<T>& b) {
      if (this != &b) { delete s; s = new list<T>(*b.s); }
      return *this;
    }
    virtual void insert(const T& e) { s->insert(e); }
    operator iter<T>() const { return *s; }
    int count(const T& e) const;
    void map(T f(const T& e));
  protected:
    list<T>* s;                               // see section gen-list
  };

slide: The implementation of a bag

In the following, we will look at some issues in defining generic classes for the mathematical data types bag, set and powerset. Such data types must, for example, be provided in a library meant to support the use of formal methods as outlined in section 3-design. Naturally, in developing real industrial-strength libraries many more issues play a role. See, for example, [BV90]. In slide cc-bag, a generic definition for the type bag is given. Mathematically, a bag or multi-set is a collection which may contain multiple instances of a particular value. (With respect to the subtype refinement relation, however, a set may be considered as a subtype of bag.) The template class bag offers a default constructor, a copy constructor and a member function for assignment, as required by the canonical class idiom presented in section canonical. An instance of the (generic) list class, developed in section generic, is used to store the elements of the bag. The destructor of bag simply deletes the protected data member s. To insert an element, bag::insert forwards the call to list::insert and, similarly, when an iterator is requested the list pointer is converted appropriately.
  template< class T >                         // bag<T>::count
  int bag<T>::count(const T& e) const {
    iter<T> it = *this;
    T* p = 0;
    int cnt = 0;
    while ( (p = it()) )
      if ( (T&) e == *p ) cnt++;              // (*)
    return cnt;
  }

  template< class T >                         // bag<T>::map
  void bag<T>::map(T f(const T& e)) {
    iter<T> it = *this;
    T* p = 0;
    while ( (p = it()) ) *p = f(*p);
  }

slide: Bag operations

The functions bag::count and bag::map are defined in slide cc-bag-op. The count function tells how many instances of a particular element are in the bag. It employs an iterator to traverse the list and compares each item with the element given as an argument. The function map may be used to modify the contents of the bag by applying some mapping function to each element:
  int S(const int& x) { return x+1; }

  bag<int> b;
  b.insert(1); b.insert(2); b.insert(1);
  b.map(S);
  iter<int> it = b;                           // get the iterator
  int* p = 0;                                 // start
  while ( (p = it()) ) {
    cout << "item: " << *p << endl;           // take the value
  }

In the example above, the function S is defined as the successor function for integers. Also, a bag b of integers is declared, into which the (integer) elements 1, 2 and 1 are inserted. As a consequence, before map is applied, b.count(1) will deliver 2 and b.count(2) will deliver 1 as a result. The bag::map function is used to apply the successor function S to each element of the bag. Then, an iterator is obtained, simply by assignment with an implicit conversion, and the contents of the bag are written to standard output.
  template< class T >
  class set : public bag<T> {                 // set<T>
  public:
    void insert(const T& e) { if (!member(e)) bag<T>::insert(e); }
    // this-> is needed because count is a member of a dependent base class
    bool member(const T& e) const { return this->count(e) == 1; }
  };

slide: Deriving a set class

Evidently, the class bag lacks many of the features required for the mathematical notion of a bag, such as operators for bag union and bag intersection. Nevertheless, bag may conveniently be used to define the class set as a derived class. The set class given in slide cc-set defines an additional function member and redefines insert to check that the element is not already a member of the set (since, in contrast to a bag, a set may not contain multiple instances of a value). The function member delivers true when the number of occurrences of an element is precisely one, otherwise it returns false.
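  template< class T >                         // set<T> == set<T>
  int operator==(const set<T>& s, const set<T>& b) {
    iter<T> it = s;
    T* p = 0;
    int eq = 1;
    while ( eq && (p = it()) )
      if ( s.count(*p) != b.count(*p) ) eq = 0;
    iter<T> jt = b;                           // second pass, so that elements
    while ( eq && (p = jt()) )                // occurring only in b are detected
      if ( s.count(*p) != b.count(*p) ) eq = 0;
    return eq;
  }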

slide: Equality for sets

Equality between sets may be defined as in slide cc-set-eq. Set equality amounts to element-wise correspondence, irrespective of the order in which the elements occur. The definition given in slide cc-set-eq is somewhat more general than necessary, in that it also applies to bags. For sets, the test ! b.member(*p) would have been sufficient. The definition of equality for sets is necessary in order to be able to define instances of set having a set-valued instantiation parameter, such as the class power defined in slide cc-power, which is derived from set< set<T> >. When instantiating set< set<T> > for some type T, equality is required for set<T> by the line marked (*) in the function bag::count defined in slide cc-bag-op.
  template< class T >                         // operator<<
  ostream& operator<<(ostream& os, const set<T>& s) {
    iter<T> it = s;
    T* p = 0;
    while ( (p = it()) ) {
      os << *p << endl;                       // write to os, not to cout
    }
    return os;
  }

slide: Writing a set to a stream

In slide cc-set-op, a generic operator is defined to write an arbitrary set to a stream, by overloading operator<< for const set<T>&. To traverse the elements of the set it employs an iterator, obtained by assigning the set reference to an iter<T> instance.
  template< class T >
  class power : public set< set<T> > {        // power<T>
  };

slide: The powerset class

To define the class power it suffices to derive the class from set< set<T> >. However, according to the rules given for the canonical class idiom, both power and set should be augmented with a default and copy constructor, and an assignment operator and destructor as well. Note that the map function as defined for bag is potentially unsafe for both set and power instances. For example, the function given to map may result in identical values for different arguments. As a consequence, the restriction that the set does not contain multiple instances of the same value may be violated.
  set<int> s1;
  s1.insert(1); s1.insert(2);
  set<int> s2;
  s2.insert(2); s2.insert(3);
  cout << s1;
  power<int> b;
  b.insert(s1); b.insert(s2);
  cout << (set< set<int> >&) b;               // cast is necessary

An example of employing the powerset is given above. First, two instances of set are created, which are then inserted in an instance of power. The operator<< function (defined for set in slide cc-set-op) may be used to write the powerset to standard output. For each (set-valued) element of the powerset, the operator<< function instantiated for set is called to write the individual elements to standard output.
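The caveat concerning map noted earlier can be made concrete with a self-contained sketch. The classes below are stand-ins, not the definitions given in the slides: std::list replaces the list and iter classes, which are not reproduced here, and the namespace sketch as well as the function half are hypothetical names introduced for this illustration.

```cpp
#include <cassert>
#include <list>

namespace sketch {

// Stand-in for bag<T>, storing its elements in a std::list.
template <class T>
class bag {
public:
  virtual ~bag() {}
  virtual void insert(const T& e) { s.push_back(e); }
  int count(const T& e) const {
    int cnt = 0;
    for (typename std::list<T>::const_iterator p = s.begin(); p != s.end(); ++p)
      if (*p == e) cnt++;
    return cnt;
  }
  void map(T f(const T& e)) {
    for (typename std::list<T>::iterator p = s.begin(); p != s.end(); ++p)
      *p = f(*p);
  }
protected:
  std::list<T> s;
};

// Stand-in for set<T>: insert guards against duplicates, map does not.
template <class T>
class set : public bag<T> {
public:
  void insert(const T& e) { if (!member(e)) bag<T>::insert(e); }
  bool member(const T& e) const { return this->count(e) == 1; }
};

int half(const int& x) { return x / 2; }      // maps both 2 and 3 to 1

// Usage: after map, the set contains the value 1 twice.
void demo_map_hazard() {
  set<int> s;
  s.insert(2); s.insert(3); s.insert(2);      // the second 2 is rejected
  assert(s.count(2) == 1 && s.member(3));
  s.map(half);
  assert(s.count(1) == 2);                    // set restriction violated
  assert(!s.member(1));                       // member now gives a misleading answer
}

} // namespace sketch
```

One possible remedy, not pursued here, is to redefine map in set so that the results are reinserted one by one through set::insert.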