Canonical class definitions

The multitude of constructs available in C++ to support object-oriented programming may lead the reader to think that object-oriented programming is meant not to reduce the complexity of programming but to increase it, for the joy of programming, so to speak. This impression is partly justified, since the number and complexity of constructs is, at first sight, indeed slightly bewildering. However, it is necessary to realize that each of the constructs introduced (classes, constructors and destructors, protection mechanisms, type conversion, overloading, virtual functions and dynamic binding) is in some way essential to support object-oriented programming in a type-safe, and yet convenient, way.

Having studied the mechanisms, the next step is to find proper ways, recipes as it were, to use these mechanisms. What we need, in the terminology of [Coplien92], are idioms, that is, established ways of solving particular problems with the mechanisms we have available. In his excellent book, Coplien discusses a number of advanced idioms for a variety of problem domains, including signal processing and symbolic computing.

In this section, we will look at two basic idioms, idioms that every C++ programmer needs to master or at least understand. These idioms concern the definition of concrete data types or representation types and their efficient implementation. It is not immediately obvious what lessons can be drawn for the realization of abstract data types in other languages.

Concrete data types

A concrete data type is the realization of an abstract data type. When a concrete data type is correctly implemented it must satisfy the requirements imposed by the definition of the abstract data type it realizes. These requirements specify what operations are defined for that type, and also their effects. In principle, these requirements may be formally specified, but in practice just an informal description is usually given. Apart from the demands imposed by a more abstract view of the functionality of the type, a programmer usually also wishes to meet other requirements, such as speed, efficient use of storage, and the handling of error conditions (to prevent the removal of an item from an empty stack, for example). The latter requirements may be characterized as requirements imposed by implementation concerns, whereas the former generally result from design considerations.

To verify whether a concrete data type meets the requirements imposed by the specification of the abstract data type is quite straightforward, although not always easy. However, the task of verifying whether a concrete data type is optimally implemented is rather less well defined. To arrive at an optimal implementation may involve a lot of skill and ingenuity, and in general it is hard to decide whether the right choices have been made. Establishing trade-offs and making choices, for better or worse, is a matter of experience, and crucially depends upon the skill in handling the tools and mechanisms available.


Canonical class

Abstract data types must be indistinguishable from built-in types
slide: Canonical class

When defining concrete data types, the list of requirements defining the canonical class idiom given in slide 2-canonical may be used as a check list to determine whether all the necessary features of a class have been defined. Ultimately, the programmer should strive to realize abstract data types in such a way that their behavior is in some sense indistinguishable from the behavior of the built-in data types. Since this may involve a lot of work, this need not be a primary aim in the first stages of a software development project. But for class libraries to work properly, it is simply essential.

Following [Coplien92], we will illustrate the notion of concrete data types by a string class. Strings are a well-understood data type, for which many libraries exist, such as the C library represented by the string.h include file. Strings support operations such as copying, concatenation and asking for the length of the string. In the example in slide 2-string, the C string package is used to implement a string class.

As may be expected, there is a constructor for creating a string from a pointer to char, which is the low-level C representation of a string. This constructor stores a copy of the argument string in the private char* data member. The definition of a default constructor is mandatory. The default constructor of the string class is easily obtained by employing a default argument for the char* constructor: when no argument is provided, the private (low-level) string pointer is set to the empty string. A default constructor is required since, for instance, when creating an array of strings the user is not allowed to initialize the individual (string) objects created. In such cases, the compiler uses a default constructor, which may be (re)defined by the implementor of the class.

The other mandatory constructor is the so-called copy constructor. This constructor is used when creating a string by copying another string object. Copying occurs, for example, when passing an object by value to a function or when returning an object by value as the result of a function. By default, the compiler defines a standard copy constructor, which makes a shallow copy of the object, that is, a copy of only the data members of the object, not of what they refer to if they are pointers. However, a shallow copy is not satisfactory in all cases. For instance, in our string example, a shallow copy causes both objects to refer to the same char* string. When the objects are destroyed, the shared char* string is deleted twice, which is an error that on some systems leads to a core dump. The copy constructor, as defined in the example, takes care of creating an actual copy of the char* string. Similar considerations apply to the assignment operator, and hence this operator needs to be redefined as well.
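The difference between a shallow and a deep copy can be made concrete with a small sketch. The types Shallow and Deep, the helper clone and the two checking functions are hypothetical names introduced here only for illustration; they are not part of the string class discussed in the text. Shallow deliberately has no destructor, so that sharing the buffer does not lead to a double deletion in this demonstration.

```cpp
#include <cassert>
#include <cstring>

// Shallow and Deep are hypothetical illustration types (an assumption of this sketch).

// No user-defined copy constructor: the compiler-generated one copies the pointer only.
struct Shallow {
    char* p;
};

// User-defined copy operations: every object owns a private buffer.
class Deep {
public:
    explicit Deep(const char* s) : p(clone(s)) {}
    Deep(const Deep& a) : p(clone(a.p)) {}
    ~Deep() { delete[] p; }
    Deep& operator=(const Deep& a) {
        if (this != &a) {
            char* fresh = clone(a.p);   // copy first, then release the old buffer
            delete[] p;
            p = fresh;
        }
        return *this;
    }
    char* p;                            // public only to keep the sketch short
private:
    static char* clone(const char* s) {
        assert(s != nullptr);
        char* q = new char[std::strlen(s) + 1];
        std::strcpy(q, s);
        return q;
    }
};

// True when the memberwise copy refers to the very same characters.
bool shallow_copy_shares_buffer() {
    char buf[] = "hello";
    Shallow a = { buf };
    Shallow b = a;                      // compiler-generated copy
    return a.p == b.p;
}

// True when changing the copy leaves the original untouched.
bool deep_copy_owns_buffer() {
    Deep a("hello");
    Deep b(a);                          // user-defined copy constructor
    b.p[0] = 'j';
    return a.p != b.p && std::strcmp(a.p, "hello") == 0;
}
```

If Shallow were given a destructor that deletes p, the two objects in the first function would delete the same buffer twice, which is precisely the problem described above.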


String class

canonical

  #include <string.h>   // strlen, strcpy
  
  class string {
  public:
  
  string(const char* s="") { init(s); }
  string(const string& a) { init(a.p); } 
  ~string() { delete[] p; }   // p was allocated with new[]
  
  string& operator=( const string& a ) {
     if (this != &a) { delete[] p; init(a.p); }   // release the old copy first
     return *this;
     } 
  
  string operator+( const string& a ) const;
  
  int length() const { return strlen(p); } 
  
  operator char*() const { return p; }
  private:
  void init(const char* s) { 
    p = new char[strlen(s)+1]; strcpy(p,s);
    }
  char* p;
  };
  

slide: A string class

In addition to the usual string operations, such as concatenation (for which the addition operator is used) and length (to determine the number of characters a string contains), there is also a type conversion operator that deserves special attention. Together, the constructor for creating a string from a char* string and the type conversion operator for converting a string to a low level (char*) string define a cyclic relation between char* pointers and strings. This may lead to ambiguities with which the compiler cannot cope.
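The slide only declares the concatenation operator. One possible definition is sketched below, under the assumption that concatenation simply builds a new buffer with the C string package. The class is called String here (capitalized, an assumption made only to avoid any clash with std::string), and the function greet is a hypothetical helper that exercises the operator.

```cpp
#include <cassert>
#include <cstring>

class String {
public:
    String(const char* s = "") { init(s); }
    String(const String& a) { init(a.p); }
    ~String() { delete[] p; }
    String& operator=(const String& a) {
        if (this != &a) { delete[] p; init(a.p); }
        return *this;
    }
    String operator+(const String& a) const;
    int length() const { return static_cast<int>(std::strlen(p)); }
    operator char*() const { return p; }
private:
    void init(const char* s) {
        assert(s != nullptr);
        p = new char[std::strlen(s) + 1];
        std::strcpy(p, s);
    }
    char* p;
};

// Concatenation: returns a new String and leaves both operands unchanged.
String String::operator+(const String& a) const {
    char* buf = new char[std::strlen(p) + std::strlen(a.p) + 1];
    std::strcpy(buf, p);
    std::strcat(buf, a.p);
    String result(buf);     // the constructor makes its own copy of buf
    delete[] buf;
    return result;
}

// Hypothetical helper: the " " literal is converted to a String by the constructor.
String greet() {
    String s1("hello"), s2("world");
    return s1 + " " + s2;
}
```

Note that the right-hand operand may be a char* string, since the constructor converts it implicitly. The sketch copies the characters twice (once into buf, once into result), which is in line with the remark that this implementation is not meant to be efficient.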

An example of the use of the string class is given below.

  string s1("hello"), s2("world");
  cout << (char*) s1 << (char*) s2 ;
  string s3(s1); string s4 = s2;
  cout << (char*) s3 << (char*) s4;
  string s5; s5 = s3 + " " + s4;
  cout << (char*) s5;
  
The example shows the creation of two strings from char* pointers. These strings are written to standard output by using an explicit cast. Alternatively, the output operator might have been overloaded for string. Next, two strings are created as copies of the previously created strings. Note that in both cases the copy constructor is called, even though the second declaration uses the = sign. The assignment operator is only used to store the result of concatenating the strings just created.
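That both forms of initialization invoke the copy constructor, and that only a true assignment invokes the assignment operator, can be checked with a small counting class. The class Probe, its two counters and the function demo are hypothetical names used only for this sketch.

```cpp
#include <cassert>

// Probe is a hypothetical class that merely counts how it is copied.
struct Probe {
    static int copies;        // number of copy-constructor calls
    static int assignments;   // number of assignment-operator calls
    Probe() {}
    Probe(const Probe&) { ++copies; }
    Probe& operator=(const Probe&) { ++assignments; return *this; }
};

int Probe::copies = 0;
int Probe::assignments = 0;

// Mirrors the declarations in the string example.
void demo() {
    Probe::copies = 0;
    Probe::assignments = 0;
    Probe s1, s2;
    Probe s3(s1);             // copy constructor
    Probe s4 = s2;            // also the copy constructor, despite the = sign
    Probe s5;
    s5 = s3;                  // assignment operator
    assert(Probe::copies == 2);
    assert(Probe::assignments == 1);
}
```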

Evidently, the implementation of the string class is far from optimal. Both in performance and storage there is a lot of unnecessary overhead. In the next section, we will look at how to improve the actual behavior of string objects.

Envelope and letter classes

The string class defined previously is correct in the sense that it satisfies the requirements imposed by (the informal specification of) our notion of strings. Moreover it is safe, in the sense that it does not carry the danger of potential core dumps.

String handler

envelope

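  class string {   // requires the definition of stringrep (the body class)
  public:
  
  string(const char* s = "") { rep = new stringrep(s); }
  string(const string& a) { rep = a.rep; rep->count++; } 
  ~string() { if (--rep->count <= 0) delete rep; }
  string& operator=( const string& a ) { 
  	   a.rep->count++;   // increment first, so that self-assignment is safe
  	   if (--rep->count <= 0 ) delete rep;
  	   rep = a.rep;
  	   return *this;
  	   }
  
  string operator+( const string& a ) const;
  
  int length() const { return strlen(rep->rep); } 
  
  operator char*() const { return rep->rep; }
  private:
  stringrep* rep;
  };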
  

slide: A string handler class

However, although it is a correct and safe realization of the abstract data type string (partially, that is), it is neither an efficient nor in any sense an optimal implementation. To illustrate the second basic idiom, that of envelope/letter pairs, the string example will be extended to include reference counting as a means by which to reduce the storage required and the overhead of creating and deleting strings.

The idea of the envelope/letter idiom is that the program manipulates objects (letters) through special wrappers (envelopes) which contain a pointer to the associated letter. The envelope can deal with some general issues (for example, ensuring that store is managed correctly on assignment), while deferring other operations to the letter. This separation of concerns makes developing a suitable class interface easier. We will also refer to the envelope/letter idiom as the handler/body idiom. A string handler (envelope) class may be defined as in slide 2-handler.


String body

letter

  class stringrep {
  friend class string;
  private:
  stringrep(const char* s) {
  	rep = new char[strlen(s)+1]; strcpy(rep,s);
  	count = 1;
  	}
  ~stringrep() { delete[] rep; }
  char* rep;
  int count;
  };
  

slide: A string body class

Both the copy constructor and the assignment operator are defined to make use of the reference counting scheme. For instance, the assignment operator decrements the reference counter of the representation it is about to release, and deletes the object representing the actual string when this counter reaches zero. The details of creating and deleting the storage needed to represent the actual string are hidden by the (body) stringrep class. The body (letter) class may be defined as in slide 2-body.

The class string is declared to be a friend of the stringrep class, to allow the string direct access to the data members. Notice that the class stringrep has no public constructors. It is not intended to be used by others. Only friend classes are allowed to create actual instances of it.
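The two slides can be combined into one compilable sketch, with the body class defined before the envelope that uses it. The names String and StringRep are capitalized here only to avoid any clash with std::string, and the member refs is a hypothetical helper added so that the reference count can be observed; neither is part of the original slides.

```cpp
#include <cassert>
#include <cstring>

// Letter (body): owns the characters and the reference count.
class StringRep {
    friend class String;                 // only the envelope may create or use a body
    StringRep(const char* s) : count(1) {
        assert(s != nullptr);
        rep = new char[std::strlen(s) + 1];
        std::strcpy(rep, s);
    }
    ~StringRep() { delete[] rep; }
    StringRep(const StringRep&);             // declared but never defined: not copyable
    StringRep& operator=(const StringRep&);
    char* rep;
    int count;
};

// Envelope (handler): a pointer to a shared body.
class String {
public:
    String(const char* s = "") : rep(new StringRep(s)) {}
    String(const String& a) : rep(a.rep) { ++rep->count; }
    ~String() { release(); }
    String& operator=(const String& a) {
        ++a.rep->count;                  // increment first, so self-assignment is safe
        release();
        rep = a.rep;
        return *this;
    }
    int length() const { return static_cast<int>(std::strlen(rep->rep)); }
    operator char*() const { return rep->rep; }
    int refs() const { return rep->count; }  // hypothetical helper for inspection
private:
    void release() { if (--rep->count == 0) delete rep; }
    StringRep* rep;
};
```

Copying or assigning a String now only adjusts a counter and a pointer; the characters themselves are copied once, when the body is created. Since the conversion operator hands out the shared buffer, writing through the resulting char* affects every envelope that refers to the same body.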

In later versions of C++ it is possible to nest class definitions. This may be convenient for keeping the class name space from being polluted by auxiliary classes. Evidently, our stringrep class may be defined within the scope of the class string. This is left as an exercise for the reader.