Benefits and pitfalls

Having looked at a number of language constructs supporting object-oriented programming, and some of the idioms that apply to them, it is time to establish the potential benefits and pitfalls of the approach supported by these constructs. More particularly, since we have concentrated mostly on C++, we need to explore what advantages C++ offers when we adopt an object-oriented approach, and what disadvantages may attach to choosing C++ instead of, say, Smalltalk or Eiffel. Before examining the various language features (or absence thereof) in more detail, we may observe that C++ has a number of weaknesses that are directly due to the need to remain compatible with C. C offers many built-in types, such as char, short, unsigned and int, between which the language defines implicit yet unsafe conversions. One of the awkward features of C, inherited by C++, is the identification of arrays with pointers. This identification causes the language to be non-orthogonal when dealing with pointers to arrays. For example,
  typedef int intarray[10];

  intarray a, *p;
  p = &a;          // warning: illegal pointer combination

results in a warning with some older (pre-standard) compilers, whereas

  typedef int Int;
  Int i, *pi;
  pi = &i;

is accepted. The fact that arrays are treated as pointers is also one of the obstacles to developing distributed extensions of C++, since, given a pointer, it is impossible to know whether it refers to a single value or to (the first element of) an array.

Another problem that may be attributed to the wish to maintain C compatibility is that the defaults are often contrary to what one would expect. For example, member functions are non-virtual by default, whereas from the perspective of object orientation the default should be virtual, as in Eiffel and Smalltalk. Similarly, in the case of multiple inheritance the default should be virtual inheritance, allowing derived classes to share common base classes. For a detailed account of the history of C++ and the motivations underlying its design, the reader is referred to [St94].

Choosing C++ means choosing a language with compile-time type checking. As an immediate advantage we then have compiler support for detecting errors, and (in the case of C++) efficiency. However, one of the drawbacks of this choice is a loss of flexibility (when compared with, for example, Smalltalk), and, since C++ is a hybrid language, little enforcement of a purely object-oriented approach. In this respect Eiffel, which also offers compile-time type checking, enforces an object-oriented approach much more strongly, as well as a specific method of developing software based on the notion of contracts as a means of formalizing the dependencies between objects and their clients (see section contracts).

An issue raised by [Snyder86] is whether the principle of encapsulation as employed in data abstraction also applies to derived types or classes, considered as clients. More specifically, what language features support the definition of an interface that protects against illegal access by both regular clients and instances of derived classes?

Visibility and protection

Evidently, unlimited access to inherited instance variables compromises encapsulation. Just imagine that, for example, we decided to change the representation of the origin of our abstract shape in section 2-shape. The entire hierarchy of concrete shapes would then collapse, since instances of circles or rectangles would illegally refer to the origin coordinates when drawing themselves.

OOP = encapsulation + inheritance

  • Encapsulation -- dependencies (contracts)
  • Data abstraction -- behavior (operations)
  • Objects -- external interface (escapes)
  • Inheritance -- new category of clients


slide: Encapsulation and inheritance

Rather surprisingly, of the three languages mentioned, Smalltalk, Eiffel and C++, only C++ allows us to differentiate between access by instances of the class itself and access by instances of derived classes. In both Smalltalk and Eiffel, instance variables are freely accessible to instances of derived classes. Protection in C++ is enforced by the access specifiers private, protected and public. Declaring a section private means that it is accessible only to the class itself (that is, to its member functions and friends). Declaring it protected, on the other hand, also gives derived classes access to that section. Naturally, the declaration public offers no protection, since a public section is accessible to anyone.

A similar distinction plays a role with respect to the visibility of derived classes when regarded as (polymorphic) types. As programmers, we sometimes like to use inheritance for the purpose of code sharing only, with no intention of declaring a subtyping relation between the two classes. In C++ a class may inherit from a base class either publicly or privately. Private inheritance is meant for code sharing only and does not affect the type system, whereas public inheritance should be used only when a subtype relation is explicitly intended, as in the shape hierarchy. The C++ compiler, however, offers protection only against mistakes, not against abuse. For example, we may access a private base class by employing casts, as illustrated in slide 2-priv.
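  #include <iostream>
  #include <cstring>
  using namespace std;

  class A {
  public:
    A() { strcpy(s, "XAX"); }
    void print() { cout << s; }
  private:
    char s[4];   // the characters are stored inside the object itself
  };

  class B : private A { };
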

slide: Casting to private base classes

In slide 2-priv we have defined a class A, and a class B that privately inherits from A. As illustrated in the program fragment
  A* a = new B;          // error
  A* b = (A*) new B;     // accepted
  b->print();            // XAX

we can access the A part of B by using a cast. However, we can also gain access by completely bypassing the public interface of a class, as illustrated by the fragment below
  char* p = (char*) b;
  while( *p != 'X' ) p++;
  cout << p << endl;     // XAX

The trick is to cast the object pointer to a pointer to char, which allows us to inspect each byte of the object. Casting is the most dangerous feature offered by C and C++. It allows programmers to circumvent the type system in completely arbitrary ways.

Objects, pointers and references

Perhaps one of the most demanding aspects of C++, when compared to Smalltalk and Eiffel, is the existence of pointers and references to objects. Both in Smalltalk and Eiffel, the programmer needs to handle only a single kind of variable, whereas C++ requires the programmer to indicate explicitly whether a variable stands for a value, a pointer or a reference. Notationally, the difference between values, references and pointers is reflected in the following declarations:
  int n = 7;      // value
  int& r = n;     // reference to value n
  int* p = &n;    // pointer to n

When initializing a pointer, the address of the value must be given, whereas for a reference the value suffices. An object (in C++) is a value. Because of the typing rules, the compiler knows what operations an object allows. The representation of a value is just a sequence of bits or bytes. Due to the type information, we may regard that sequence as, say, an integer or an object with structure. A reference is an implicit pointer to a value. The distinction between a pointer and a reference is that the programmer may treat a reference just as an ordinary object, whereas the use of a pointer requires explicit dereferencing.
  class sneaky {
  private:
    int safe;
  public:
    sneaky() { safe = 12; }
    int& sorry() { return safe; }
    int value() { return safe; }
  };

slide: Sneaky references

References and pointers are often used for reasons of efficiency. However, the use of pointers is known to be error-prone. Familiar problems that may occur when using pointers are for instance the existence of dangling references and unintended aliasing. References are less error-prone, since they do not require any explicit pointer manipulation. However, the class given in slide 2-sneaky illustrates a problem that is easily overlooked by many programmers. Since a reference is actually an implicit pointer to a value, manipulating a reference may have unexpected results. In the example, the member function sorry returns a reference to a data member safe, instead of its value as the member function value does. The following code fragment illustrates what this means:
  sneaky x;

  cout << x.value() << endl;    // 12
  x.sorry() = 17;
  cout << x.value() << endl;    // 17

Since a reference may occur on the left-hand side of an assignment, in contrast to a value, the data member safe may be assigned an arbitrary value, despite the fact that it occurs in the private section of the class sneaky. The remedy to this abuse would have been to declare that the member function sorry returns a const int& instead of an int&, as in the example in section 2-const.

Virtual functions versus non-virtual functions

Another important difference between C++ on the one hand and Smalltalk and Eiffel on the other is that in C++ dynamic binding is not the default, as it is in Smalltalk and Eiffel. In C++, a member function has to be declared virtual (higher up in the inheritance hierarchy) in order to profit from the polymorphic behavior that results from dynamic binding. The choice not to make dynamic binding the default is motivated by the philosophy underlying C++ that a program should not pay in performance for what it does not need. Dynamic binding may be explained as searching for the appropriate method: if the method is not found in the object's class itself, it is searched for in the classes from which that class has been derived. This search can be eliminated by associating with each class a virtual table that contains the actual functions to be called, and by giving each object a pointer to the virtual table of its class. The actual cost of this is only the storage required for that pointer and one additional indirection per call. In the case of multiple inheritance, only two indirections are required (see Ellis and Stroustrup, 1990). Given the fact that dynamic binding is not all that costly, what are the pros and cons of virtual functions?

Cost of inheritance

  • Execution speed: often a misplaced concern
  • Program size: memory cost decreases, optimization
  • Message-passing overhead: reduction as in C++
  • Program complexity: yo-yo problem (up and down the inheritance graph)

slide: Cost of inheritance

As a first observation, execution speed is almost always a misplaced concern. Usually it is more important to find the proper structure for a program. After profiling its behavior, it usually suffices to optimize only selected parts to obtain the required execution speed. Another important aspect, from the perspective of program development, is that the actual code size may decrease dramatically when using inheritance with dynamic binding. This makes maintenance and optimization easier, since important parts of the code may be localized in a few (shared) ancestor classes. However, when efficiency is of crucial concern, can we still use an object-oriented language? Actually, this is an area of active research. For interpreted languages, ever better optimization strategies (involving caching and partial evaluation) are being developed. Adherents of this approach claim to reach an efficiency comparable to that of C++. On the other hand, for time-critical applications each indirection may be one too many. To save on the cost of (member) function invocation, C++ allows the definition of inline functions, which are expanded by the compiler, similar to macro definitions. Member functions may also be declared inline. However, a virtual member function called through a pointer or reference cannot, in general, be inline expanded at compile time, since the function actually called is only known at run time. Therefore it seems reasonable to have a choice between declaring a function virtual or non-virtual. However, in contrast to what C++ offers, the default (unmarked) case should probably be virtual.

Although the use of inheritance may decrease the size of the code, it may also introduce an additional level of complexity. The problem that attaches to an (excessive) use of inheritance is known as the yo-yo problem: finding out which function is actually called during the execution of a program may require an inspection of the entire inheritance graph. Having both virtual and non-virtual functions only adds to the complexity of understanding program behavior. What is required to tackle these problems are adequate browsing tools and tools to monitor the (dynamic) behavior of the program.

Memory management

Perhaps the most annoying feature of C++ (or rather absence of it) is memory management. Whereas both Smalltalk and Eiffel offer automatic garbage collection, the C++ programmer is required to rely on hand-crafted memory management.
  class A {
  public:
    A() { cout << "A"; }
    ~A() { cout << "A"; }
  };

  class B : public A {
  public:
    B() { cout << "B"; }
    ~B() { cout << "B"; }
  };

slide: Constructors and destructors

Memory management in C++ involves the use of constructors and destructors. In the following, we will look at some examples illustrating the order of invocation of constructors and destructors in relation to single and multiple inheritance.

The first example, given in slide 2-m-1, defines two classes (A and B, with B derived from A), each having a constructor and destructor writing the name of the class to standard output. An example of their use is:

  A* a = new B; delete a;     // ABA
  B* b = new B; delete b;     // ABBA

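Recall that when an instance of a class is created, the constructors of its base classes (if any) are called first. This is exactly what happens above. However, contrary to what one might expect, when deleting a, the destructor for B is not called, whereas it is invoked when deleting b. Strictly speaking, deleting an object of a derived class through a pointer to a base class whose destructor is not virtual has undefined behavior in standard C++; the output ABA merely reflects what typically happens.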
  class A {
  public:
    A() { cout << "A"; }
    virtual ~A() { cout << "A"; }
  };

slide: Virtual destructors

The remedy is to declare the destructor of A virtual, as in slide 2-m-2, so that deleting through a pointer to A dynamically invokes the destructor of the actual class (B) of the object referenced. The program fragment
  A* a = new B; delete a;     // ABBA
  B* b = new B; delete b;     // ABBA

now behaves as desired.
  class C : public A {
  public:
    C() { cout << "C"; }
    ~C() { cout << "C"; }
  };

  class D : public B, public C {
  public:
    D() { cout << "D"; }
    ~D() { cout << "D"; }
  };

slide: Multiple inheritance

Multiple inheritance

When employing multiple inheritance, similar rules are followed, as depicted in slide 2-m-3.

However, one problem we may encounter here is that classes may have a common base class. Look at the following program fragment:

  D* a = new D(); delete a;   // ABACDDCABA

The outcome of creating and deleting a indicates that an instance of D contains two copies of A.
  class B : virtual public A {
  public:
    B() { cout << "B"; }
    ~B() { cout << "B"; }
  };

  class C : virtual public A {
  public:
    C() { cout << "C"; }
    ~C() { cout << "C"; }
  };

slide: Virtual inheritance

Again, the remedy is to declare A to be virtually inherited by B and C, as depicted in slide 2-m-4. As reflected in the outcome of
  A* a = new D(); delete a;   // ABCDDCBA

instances of the derived class D then have only one copy of A.