The discipline of software engineering provides a number of techniques that aid the software developer (or development team) in constructing reliable software. Testing, for example, is a technique for establishing in an experimental way the reliability and robustness of software. Another way of validating a system is by means of correctness proofs, checking whether the program (design) meets its (formal) specification.

Software engineering perspectives

Additional keywords and phrases: testing, black-box methods, white-box methods, correctness proofs
slide: Software engineering perspectives

Another area in which the discipline of software engineering has made a contribution is in measuring the structural complexity of software. Such measures may be used as indicators, for example to estimate the time needed to develop a system or the cost involved in maintenance. In this chapter, we will explore to what extent and how these techniques may be incorporated in an object-oriented approach. Also, the outline of a formal framework for developing object-oriented software will be sketched, merging the ideas developed when studying object-oriented design and the insights coming from a software engineering perspective.

Validating software

When validating a system, a number of aspects play a role. First, it must be determined whether the software satisfies the original requirements and goals set by the user, as specified during analysis. Secondly, it must be established whether the system meets the specification laid down in the design document. The latter is usually referred to as `verification'. Verification is only one of the aspects of validation, since validation is meant to establish whether the system is a good system in terms of user satisfaction, whereas the term verification is generally used to describe the process of establishing the correctness of software in a more formal sense. A third aspect of validation concerns the robustness of the system, that is the extent to which it can handle exceptional circumstances, such as excessive amounts of data and heavy workloads.

Software quality

  • correctness -- satisfies requirements and goals
  • robustness -- handles exceptional circumstances

Structural criteria

  • maintenance -- ease of adapting the software
  • reuse -- reusable components
  • compatibility -- plug-compatible components

slide: Software quality

Validation is primarily concerned with the functional properties of a system. Questions that need to be answered are: Is the system capable of doing what it is expected to do? And, is the user satisfied with how the system does it? In practice, the validation of a system is often restricted to functionality and user interface issues. However, other criteria, related to structural properties, are important as well. See slide 4-quality. For example, a customer may want to know that the system may easily be adapted to changing circumstances or different platforms. Also, the customer may be interested in reusing components of the system to develop other applications. Actually, with the trend shifting from single applications to application frameworks and libraries, structural criteria are becoming increasingly important, since they determine the ease and reliability with which components may be used as building blocks for composing new applications. Correspondingly, the verification of the components constituting a system (or library) will become more important as well. Testing is still the method most used for experimentally verifying the functional properties of a piece of software. In general, testing involves the execution of a program rather than formal methods of proof. In the sections that follow we will investigate what benefits we may expect from an object-oriented approach when it comes to testing.

Test methods

Testing is a relatively well-established discipline. It may be defined as the process of judging experimentally whether a system or its components satisfy particular requirements. The requirements are often laid down as a set of test data with which the system must comply. Testing, essentially, is a way in which to expose errors. However, passing a test suite may simply indicate that the test suite is a feeble test. Standard definitions of testing usually involve test cases and test procedures. Test cases define the input-output behavior of a program and test procedures specify how a program may be validated against a set of test data.  [Smith90] note that the computation model of input-output transformations underlying traditional conceptions of testing is not entirely adequate for object-oriented systems. Instead, they propose to define the testing of object-oriented programs as the process of exercising the routines provided by an object with the goal of uncovering errors in the implementation of the routines or the state of the object, or both.

Three levels can be distinguished in the process of testing. A strategic level, which is primarily concerned with identifying the risks, that is the potentially dangerous components of a system that need to be validated with extra care. To decide which components involve high risks, metrics such as those described in section metrics may be of great value. Next, we have a tactical level, which for each component defines an appropriate test method and test set. And, finally, we have an operational level, consisting of the actual execution of the tests and evaluation of the test results. See slide 4-testing.

Testing

-- strategy, tactics, operational
  • the process of judging experimentally whether a system or component satisfies its specified requirements

Stop-criteria

-- minimize effort, maximize confidence
  • after detecting N errors
  • based on particular method
  • if the ratio errors / test time is sufficiently small

Paradigms

  • demonstration, destruction, evaluation, prevention

slide: Testing

As a rule, good testing practice is intended to minimize the effort of producing tests (in terms of time and costs), the number of tests and the effort of performing the tests, while maximizing the number of errors detected and (most importantly) the confidence in the software (in terms of the tests successfully passed). One of the crucial moments in testing is to decide when to stop. Testing may halt either when the test results indicate that the system or component tested needs further improvement, or when the test results indicate that the system or component is sufficiently reliable. In principle, it is impossible to decide with absolute certainty that a piece of software is completely error-free. Usually, the particular method used will indicate when to stop testing. As a general stop-criterion, the ratio between errors and test time may be used. When the effort to detect another error reaches a certain limit, the system may be considered reliable. Needless to say, there is a subjective element involved in the decision to stop testing.

We can distinguish between four different paradigms of testing. We may consider it sufficient to demonstrate that the software behaves as required. However, this must be regarded as a very weak notion of testing. More appropriate, generally, is to construct tests with the actual intention of detecting errors. Although this may seem a rather destructive attitude towards software, it is the only way to gain confidence in the reliability of an actual system. However, already in the earlier stages of software development we may look for means to reduce potential errors by evaluation procedures such as are discussed in section static. A step further in this direction would be to adopt a paradigm that actually prevents the occurrence of faults in the design and code. However, this requires a formal framework that for object-oriented programming has not yet been fully developed. See section formal.

Black-box versus white-box testing

Traditionally, two approaches to testing can be distinguished. One approach is concerned only with the functionality, that is the input-output behavior of the component or system. The other approach takes into account the actual structure of the software as well. The first approach is known as black-box testing; the second as white-box testing, since the contents of the box may, as it were, be inspected. See slide 4-box.

Black-box testing

-- functional (test design)
  • equivalence classes
  • extremes

White-box testing

-- structural (dataflow)
  • instruction coverage
  • branch coverage
  • condition coverage

slide: Black-box and white-box testing

To make black-box testing manageable, equivalent input is usually grouped in classes, from which a representative element is chosen when performing the actual test. In particular, attention needs to be paid to extremes, which may be regarded as equivalence classes with only a single element.

Specification

  • r == sqrt(x) & x >= 0 <=> r^2 == x

Implementation

  float sqrt( float x ) {
  require( x >= 0 );
  const float eps = 0.0001;
  float guess = 1;
  while( fabs( x - guess * guess ) > eps )
  	guess = ( guess + x / guess ) / 2;
  promise( fabs( guess * guess - x ) <= eps );
  return guess;
  }
  

slide: Specification and implementation

For example, when testing the function sqrt as specified in slide 4-sqrt, a distinction may be made between input arguments greater than zero, precisely zero, and less than zero. This results in three cases that must be tested. For example, input values -2, 0 and 4 may be chosen. It could be argued, however, that the value 1 should be treated as another extremum, since sqrt behaves as the identity on 1. As another example, imagine that we wish to test a function that sorts an array of integers, of maximal length, say, 1000. See slide 4-bubble. First, we need to select a number of different lengths, say 0, 1, 23 and 1000. For the latter two cases, we have the choice of filling the array with random-valued numbers, numbers in increasing order or numbers in decreasing order. For each of these distinct possibilities we need to select a number of test cases. The assumption underlying the use of equivalence classes is that one representative of a class is as good as any other. However, this works only when the assumptions on which our partition is based are sound. Moreover, our confidence will probably be stronger the more tests that are actually carried out.
  void bubble(int r[], int length) {
  	int k = length;
  	int sorted = 0;
  	while( ( k > 0 ) && !sorted ) {
  		sorted = 1;
  		for( int j = 0; j < k - 1; j++ )
  			if ( r[j] > r[j+1] ) { swap(r[j], r[j+1]); sorted = 0; }
  		k = k - 1;
  	}
  }

Input

  • 100% instruction coverage -- 5,3
  • 100% condition coverage -- 5,3,7

slide: The bubble function

White-box testing usually involves the notion of a computation graph, relating the different parts of the program by means of a flow-diagram. For white-box testing, criteria are used such as instruction coverage (showing that the test set executes each instruction at least once), branch coverage (showing that each possible branch in the program is taken at least once), or condition coverage (showing that the test set causes each condition to be true and false at least once). These criteria impose increasingly stronger requirements on the coverage of the flow-graph of the program and hence require more extensive testing to achieve complete coverage. For example, when we consider the bubble sorting routine above, the array with values 5,3 results in 100% instruction coverage, but not in 100% condition coverage, since the condition r[j] > r[j+1] will never be false. However, taking as input the array consisting of 5,3,7 we do have 100% condition coverage as well.

The test cycle

Testing, as so many other things in software development, is usually an iterative process. A complete test cycle may be characterized as consisting of testing the functionality of each module, integration testing (to check whether the combination of modules has the desired effect), testing the system as a whole, and acceptance testing (in order to get user approval). See slide 4-cycle.

Test cycle

  • module testing
  • integration testing
  • functional testing
  • system testing
  • acceptance testing

System testing

  • facilities, volume, stress, usability, security, performance

slide: The test cycle

System testing involves checking whether the system provides all the facilities required by the user and whether the user interface is satisfactory. Other aspects that may be of importance are the extent to which a system can cope with large volumes of data, whether it performs well on a heavily loaded network, whether it provides a sufficient level of security and whether the performance of the system is adequate in ordinary circumstances. For object-oriented software, the criteria of testing as well as the procedures of testing will be virtually the same. However, with respect to component testing (and to some extent, integration testing and functionality testing), we may expect significant differences.

Testing and inheritance

One of the most prominent claims made by advocates of an object-oriented approach is that code may easily and reliably be reused, even without access to the source code. This claim suggests that the inherited part of the code need not be re-tested. An example will be given, however, showing that this is only partially true. See slide 4-inheritance. Like most such examples, it is a contrived one, but it shows that the correct behavior of a class can depend upon accidental properties of the class that may no longer hold when the code is reused in a different context.

Testing and inheritance

  • inherited code must be re-tested!

Because

  • subclass may affect inherited instance variables
  • superclass may use redefined (virtual) methods

slide: Testing and inheritance

As a general rule, inherited code must be re-tested. One reason is that a subclass may affect inherited instance variables. This is a problem especially when using a language that does not provide encapsulation against derived classes, such as Eiffel. However, in Eiffel appropriate pre-conditions may protect against violations by derived classes. In contrast, C++ does allow such encapsulation (by means of the keyword private), but inherited instance variables may still be accessed when they are declared protected or when a method returns a (non-const) reference. See section 2-references. Another reason not to assume that inherited code is reliable is that the base class may employ virtual functions which may be redefined by the derived class. Redefining a virtual function may violate the assumptions underlying the definition of the base class, or may conflict with the accidental properties of the base class, resulting in erroneous behavior.

Example -- violating the invariant

The example shown in slide 4-ex-inh-1 illustrates that redefining a virtual function, even in a very minor way, may lead to a violation of the invariant of the base class. Actually, the invariant ( n >= 0 ) is an accidental property of the class, due to the fact that the square of both positive and negative numbers is never negative.
  class A {  // invariant A: n >= 0
  public:
  	A() { n = 0; }
  	int value() { return next(n); }
  	void strange() { next(-3); }
  protected:
  	virtual int next( int i ) { return n = n + i * i; }
  	int n;
  };
  class B : public A {  // not invariant A
  public:
  	B() : A() { }
  protected:
  	virtual int next( int i ) { return n = n + (n + 1) * i; }
  };

slide: Violating the invariant

Testing instances of class A will not reveal that the invariant is based on incorrect assumptions, since whatever input is used, invoking value() will always result in a non-negative number. However, when an instance of B is created, invoking strange() will result in an error. See slide 4-ex-inh-2.

Test cases

  A* a = new A; a->value(); a->strange(); a->value();  // ok

  A* b = new B; b->value(); b->strange(); b->value();  // error

Dynamic binding

  int f(A* a) {
  	a->strange();
  	return a->value();
  }
  

slide: Test cases

The example illustrates what happens when instances of a derived class (B) are behaviorally not conforming with their base class (A). The penalty of non-conformance is, as the example clearly shows, that functions defined for inputs of the base class no longer behave reliably, since instances of derived classes (although legally typed) may violate the assumptions pertaining to the base class. As an aside, it should be noted that the problems illustrated above would not have occurred so easily if the invariant and the behavior of the base and derived classes had been made explicit by means of a client-server contract. Moreover, annotating the methods with the proper pre- and post-conditions would allow automatic monitoring of the runtime consistency of the objects.

A framework for testing object-oriented programs

Presently, we have no generally accepted framework for testing object-oriented systems. However, it seems likely that we can to some extent reuse the insights and methods coming from traditional testing practice. Further, it seems that we may gain great benefits from adopting a contract based design discipline. In the following, we will study what influence the architectural structure of object-oriented systems has on the practice of testing. In particular, we will look at ways in which to test that the actual behavior of an object conforms to our expectations.

Levels of testing

Adopting an object-oriented approach will generally have a significant influence on the (architectural) structure of the program. Consequently, there will be a somewhat different distinction between levels of testing in comparison with a functional approach. The difference arises from the fact that in an object-oriented system the algorithm is distributed over a number of classes, involving multiple methods, whereas in a functional decomposition the components directly reflect the structure of the algorithm. Another difference comes from the fact that the notion of module in an object-oriented system encompasses both the concept of a class and the concept of a cluster, which is to be understood as a collection of (cooperating) classes. See slide 4-levels.

Levels of testing

  • algorithms -- methods
  • class -- interaction between methods and instance variables
  • cluster -- interaction between groups of classes
  • system -- encompasses all classes

Influence of errors

  • error is not executed
  • error is executed but has no effect
  • error results in legal state
  • error results in illegal state

slide: Levels of testing

When testing a system, a collection of objects, or an individual object, the effect that an error may not always be visible should be taken into account. It may be the case that erroneous code is simply not executed, or that the error is executed but without any effect on the results of the computation (as was the case for the instance of class A discussed previously). A further distinction must be made between errors that do have an effect on the computation, but nevertheless result in a legal (although erroneous) state, and errors that leave the computation in an illegal state. To understand what this means, however, we need to delineate more precisely the notion of state.

Testing the behavior of objects

To test the behavior of an object it is necessary to have some knowledge of the internal structure of the object, that is the state the object may be in at successive moments of the computation. For example, a counter object may be regarded as having two states, an initial state zero and a state in which the instance variable is greater than zero. On the other hand, for a bounded counter, bounded by max, three states must be distinguished: an initial state zero, a state characterized by 0 < n < max (where n is the instance variable of the bounded counter), and a state max that represents the terminal state of the counter, unless it can be reset to zero. Although many more states could have been distinguished, it suffices to consider only three states, since all the states (strictly) between zero and max may be regarded as being equivalent. Since the actual parameters of a method may influence the transition from one object state to another, the values of these parameters must also be taken into account, in a similar way as when testing the extremum input values of a function. See slide 4-methods.

Object test methods -- state transitions

  • equivalence classes -- distinct object states
  • extrema testing -- includes parameters of methods

Errors

-- wrong result, illegal state change
  • within object -- invariance
  • involving multiple objects -- interaction protocols

slide: Object test methods

The actual testing may occur with reference to a transition matrix displaying the effect of each method invocation. Inspecting a transition matrix based on the internal state of the (instance variables of) the object may seem to be in contradiction with the principle of encapsulation encouraged in the chapter on design. However, providing a means to observe the state of an object is different from allowing clients unrestricted access to its instance variables. As an example, consider the transition matrices for a counter and a bounded counter displayed in slide 4-matrix. Two states are distinguished for the counter, respectively (1) for the state n = 0 and (2) for the state n > 0, where we assume that the counter has an instance variable n to keep the actual count. For the bounded counter an additional state is added to allow for the possibility that n = max. Checking the behavior of these (admittedly very simple) objects may take place by a sequence of method calls followed by a check to determine whether the expected state changes have taken place.

Transition matrix

-- counter
slide: Transition matrix

For example, when incrementing a counter initialized to zero we must observe a state change from (1) to (2). The important cases to test are the borderline cases. For instance, what happens when we decrement a newly created counter? With regard to the definition of the counter, as expressed by the pre- and post-conditions given in the transition matrix, this operation must be considered illegal, since it will lead to an inconsistent state. What to do in such cases depends upon the policy taken when designing the object. When what  [Meyer88] calls a defensive programming approach is followed, calling the method will be allowed but the illegal state change will not occur. When following the (preferred) method of programming by contract, the method call results in a failure due to the violation of a pre-condition, since the user did not conform to the protocol specified in the contract. We will consider this issue further when discussing runtime consistency checking in section consistency.

Identity transitions

Obviously, for other than very simple objects the number of states and the transitions to test for may become quite unwieldy. Hence, a state transition matrix enumerating all the interesting states in general seems not to be a practical solution. A better solution lies in looking for sequences of method calls that have an identical begin and end state. In slide 4-identity, some of the identity transition sequences for the counter are given, but obviously there are many more. One of the interesting features of identity transitions is that they may easily be checked by an automated test tool.

Identity transitions

  counter c; int n1, n2;
  n1 = c.value(); c.inc(1); c.dec(1); n2 = c.value();
  n1 = c.value(); c.inc(1); c.inc(2); c.dec(3); n2 = c.value();
  

Abstract data types

  • stack -- pop( push(s,x) ) = s
  • queue -- remove( insert(q,x) ) != q

Interaction protocols

  • tests all interesting interaction sequences

slide: Identity transitions and interaction protocols

A tool employing identity transitions is discussed in  [Smith90]. The tool generates arbitrarily many sequences of method calls resulting in an identity transition, and also generates the code to test these sequences, that is, whether they actually leave the state of the object unaffected. The idea of identity transitions ultimately derives from the axiomatic characterization of invariance properties of abstract data types. For example, when specifying the behavior of a stack algebraically, one of the axioms will be of the form pop(push(s,x)) = s, expressing that first pushing an element on the stack and then popping it results in an identical stack. (See section ADT-algebra for a more detailed discussion of abstract data types.) In contrast, we know that this property does not hold for a queue, unless the queue involved is the empty queue. The advantage of the method of testing for identity transitions is that we need not explicitly specify the individual states and state transitions associated with each method. However, to use automated testing tools, the method requires that we are able to specify by what rules sequences of method calls resulting in identity transitions may be constructed. Moreover, we cannot be sure that we have tested all relevant properties of the object, unless we can prove this from its formal specification.

Most difficult to detect, however, are errors that result from not complying with some (implicitly stated) protocol involving multiple objects. As an example, think of the model-view protocol outlined in section 3-mvc. When the initialization of the model-view pairs is not properly done, for instance when a view is not initialized with a model, an error will occur when updating the value of the model. Such requirements are hard if not impossible to specify by means of client/server contracts alone, since possibly multiple objects are involved along with a sequence of method invocations. We will look at formal methods providing support for these issues in section formal-coop. Another tool for testing sequences of method invocations is described in  [Doong90]. The approach relies on an algebraic specification of the properties of the object, and seems to be suitable primarily for testing associativity and commutativity properties of methods.

Runtime consistency checking

Debugging is a hopelessly time-consuming and unrewarding activity. Unless the testing process is guided by clearly specified criteria on what to test for, testing in the sense of looking for errors must be considered as ordinary debugging, that is running the system to see what will happen. Client/server contracts, as introduced in section contracts as a method for design, do offer such guidelines in that they enable the programmer to specify precisely the restrictions characterizing the legal states of the object, as well as the conditions that must be satisfied in order for legal state transitions to occur. See slide 4-contracts.

Assertions

-- side-effect free

Contracts

  • require -- test on delivery
  • promise -- test during development

Object invariance

-- exceptions
  • invariant -- verify when needed

Global properties

-- requirements
  • interaction protocols -- formal specification

slide: Runtime consistency checking

The Eiffel language is the first (object-oriented) language in which assertions were explicitly introduced as a means to develop software and to monitor the runtime consistency of a system. Contracts as supported by Eiffel were primarily influenced by notions concerning the construction of correct programs. The unique contribution of  [Meyer88] consists of showing that these notions may be employed operationally by specifying the pragmatic meaning of pre- and post-conditions defining the behavior of methods. To use assertions operationally, however, the assertion language must be restricted to side-effect free boolean expressions in the language being used.

Combined with a bottom-up approach to development, the notion of contracts gives rise to the following guidelines for testing. Post-conditions and invariance assertions should primarily be checked during development. When sufficient confidence is gained in the reliability of the object definitions, checking these assertions may be omitted in favor of efficiency. However, pre-conditions must be checked when delivering the system, to ensure that the user complies with the protocol specified by the contract. When delivering the system, it is a matter of contractual agreement between the deliverer and user whether pre- and/or post-conditions will be enabled. The safest option is to enable them both, since the violation of a pre-condition may be caused by an undetected violated post-condition. In addition, the method of testing for identity transitions may be used to cover higher-level invariants, involving multiple objects. To check whether the conditions with respect to complex interaction protocols are satisfied, explicit consistency checks need to be inserted by the programmer. See also section global-invariants.

Example -- robust programming

As an example of how assertions may be applied to characterize the possible states of an object and to guard its runtime consistency, consider the doubly-bounded counter in slide 4-robust.
  class ctr {  // doubly-bounded counter
  	int n, lb, ub;
  public:
  	ctr(int l, int u) : n(0), lb(l), ub(u) { promise( invariant() ); }
  	void inc() { require( n < ub ); n++; promise( invariant() ); }
  	void dec() { require( lb < n ); n--; promise( invariant() ); }
  	int value() { return n; }
  protected:
  	bool invariant() { return lb <= n && n <= ub; }
  };

slide: Robust programming

The counter has both a lower and upper bound that are set when constructing the object. Both the functions inc and dec have pre-conditions, respectively stating that incrementing the counter is legal only when its value is less than its upper bound and, similarly, that decrementing a counter may be done only when its value is greater than its lower bound. This characterization is clearly equivalent to a characterization as given by the transition matrix for a bounded counter. The implementation of the counter is robust, since it guards clients against possible misuse. The advantage of using assertions, apart from providing checks to test legal usage, is that they explicitly state the requirements imposed on the user.

Example -- binary tree

As a slightly less academic example (due to Meyer, 1992b), consider the implementation of a binary tree, consisting of nodes that are kept in a certain order. See slide 4-inv-1.
  template< class T >  // binary tree
  class tree {
  public:
  	tree( tree* p, T& n ) : parent(p), node(n) { left = right = 0; }
  	void insert( tree* t ) {
  		require( t != 0 );
  		insert_node( t );
  		promise( invariant() );
  	}
  	virtual bool invariant() {
  		return ( left == 0 || left->parent == this )
  		    && ( right == 0 || right->parent == this );
  	}
  protected:
  	tree *left, *right, *parent;
  	void insert_node(tree* t);  // does the real work
  	T& node;
  };

slide: Checking invariants

How a node is actually inserted is not important; the only requirement imposed is simply that the inserted node does exist. However, we must guarantee that, in whatever way the node is inserted, the (ordered) structure of the tree is preserved. This requirement is expressed in the invariant, which states that whenever a child does exist, it points to the current object as its parent. Now, when it comes to testing, we may wish to check more thoroughly whether the ordered structure of the tree is indeed preserved when inserting a node. This may be done in a non-intrusive way by refining the tree class as in slide 4-inv-2.
  template< class T >  // test version
  class sortedtree : public tree<T> {
  public:
  	sortedtree( tree<T>* p, T& n ) : tree<T>(p,n) { }
  protected:
  	bool invariant() { return sorted() && tree<T>::invariant(); }
  	int sorted();  // check for order
  };

slide: Test version

Assume that we have defined a function
sorted() to check whether the tree has the right order. Because the original tree invariant has been defined as a virtual function, we may rely on the dynamic binding mechanism to check the strengthened invariant when inserting a node. Thus, without much trouble, we have created a more robust version of the tree that may be used during testing and later be replaced by the original version.
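The mechanism can be demonstrated in isolation with a minimal, self-contained sketch; the stack and checkedstack classes below are invented for illustration:

```cpp
#include <cassert>

// base class: checks a weak invariant after each modification
class stack {
public:
  virtual ~stack() {}
  virtual bool invariant() { return top >= 0; }
  // invariant() is dispatched dynamically, so a subclass may strengthen it
  void push(int x) { data[top++] = x; assert( invariant() ); }
protected:
  int data[16];
  int top = 0;
};

// test version: strengthens the invariant without touching push()
class checkedstack : public stack {
public:
  bool invariant() override { return top <= 16 && stack::invariant(); }
};
```

Since push calls invariant() through the virtual function mechanism, a checkedstack object checks the stronger condition on every push, just as sortedtree strengthens the invariant of tree.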

Static testing

The test methods just described, whether organized around transition matrices or contracts, both involve executing the program and looking for errors. Either way, this is a laborious and time-consuming task. To avoid dynamic testing, several methods of program validation have been proposed that do not require the program to run. These methods may be referred to as static testing. One of the oldest, and perhaps most fruitful, methods is simply careful reading. [Fiedler] reports that in an experimental setting most errors were detected by carefully reading the relevant program text. The most explicit proponent of this method is undoubtedly [Knuth92], who has proposed (and demonstrated) the discipline of literate programming. Although tools supporting literate programming in C++ are available (see section pde), no environment is as yet available that fully supports literate programming in an object-oriented style. See slide 4-static.

Methods for static testing

  • careful reading -- most successful (?)
  • code inspection -- looking for errors
  • walkthrough -- simulation game
  • correctness proof -- rigorous, but complex

slide: Static testing

Another method of static testing, based on similar assumptions, is groupwise code inspection. This method may profitably be used by teams of programmers. Although the attention is primarily directed towards the detection of errors, the method has proved to be beneficial for improving the code. An additional advantage is that it provides a background for discussing terminological issues, programming practice and opportunities for code reuse.

More directed towards operational issues is the method of walkthroughs. A walkthrough is similar to the simulation game proposed for CRC cards (see section CRC). The idea is that by simulating a computation, while reading the relevant parts of the code, errors will come to light. As for code inspection, walkthroughs are best performed in a group. When employing groupwise code inspection it is advisable to have a small group of about four people, with a chairman to organize the meeting and with the author of the code as a silent observer. Each participant should receive the code and documentation a few days ahead, as well as a checklist with commonly occurring errors. As a goal, the actual meeting should result in a list of faults detected in the code.

A quite different method of static testing, which in contrast to the previous methods is not often used in practice, is to provide correctness proofs for the relevant parts of the program. Proving a program correct is by far the most rigorous of all methods discussed, but unfortunately quite complex and demanding with respect to the formal skills of the programmer. However, actually proving a program correct seems to be far more difficult than only annotating the program with appropriate invariants and pre- and post-conditions. I believe that the notion of contracts provides a valuable means both to reason about the program and to check (dynamically) for the runtime consistency of a system, even without detailed correctness proofs.
Not as a means of static testing but as a way of increasing our belief in the reliability of software, it may be advisable to take recourse to bottom-up development. According to  [Meyer88], an object-oriented approach lends itself extremely well to bottom-up development. Instead of trying to grasp the functionality of a system as a whole, small well-understood building blocks may be constructed (preferably documented by contracts) which may be used for increasingly complex abstractions.

Guidelines for design

Computing is a relatively young discipline. Despite its short history, a number of styles and schools promoting a particular style have emerged. However, in contrast to other disciplines such as the fine arts (including architecture) and musical composition, there is no well-established tradition of what is to be considered as good taste with respect to software design. There is an on-going and somewhat pointless debate as to whether software design must be looked at as an art or must be promoted into a science. See, for example,  [Knuth92] and  [Gries]. The debate has certainly resulted in new technology but has not, I am afraid, resulted in universally valid design guidelines. The notion of good design in the other disciplines is usually implicitly defined by a collection of examples of good design, as preserved in museums or (art or music) historical works. For software design, we are still a long way from anything like a museum, setting the standards of good design. Nevertheless, a compendium of examples of object-oriented applications such as  [Pinson90] and  [Harmon93], if perhaps not setting the standards for good design, may certainly be instructive.

Development process -- cognitive factors

  • model -> realize -> refine

Design criteria

-- natural, flexible, reusable
  • abstraction -- types
  • modularity -- strong cohesion (class)
  • structure -- subtyping
  • information hiding -- narrow interfaces
  • complexity -- weak coupling

slide: Criteria for design

The software engineering literature abounds with advice and tools to measure the quality of good design. In slide 3-design-criteria, a number of the criteria commonly found in software engineering texts are listed. In software design, we evidently strive for a high level of abstraction (as enabled by a notion of types and a corresponding notion of contracts), a modular structure with strongly cohesive units (as supported by the class construct), with units interrelated in a precisely defined way (for instance by a client/server or subtype relation). Other desirable properties are a high degree of information hiding (that is, narrowly defined and yet complete interfaces) and a low level of complexity (which may be achieved with units that have only weak coupling, as supported by the client/server model). An impressive list, indeed.

Design is a human process, in which cognitive factors play a critical role. The role of cognitive factors is reflected in the so-called fractal design process model introduced in [JF88], which describes object-oriented development as a triangle with bases labeled by the phrases model, realize and refine. This triangle may be iterated at each of the bases, and so on. The iterative view of software development does justice to the importance of human understanding, since it allows for a simultaneous understanding of the problem domain and the mechanisms needed to model the domain and the system architecture.

Good design involves taste. My personal definition of good design would certainly also involve cognitive factors (is the design understandable?), including subjective criteria such as: is it pleasant to read or study the design? In contrast to the arts, however, software can be subjected to metrics measuring the cohesiveness and complexity of the system. In this section, we will look at a number of metrics which may, if well-established and supported by empirical evidence, be employed for managing software development projects.
Also we will look at the Law of Demeter, which is actually not a law but which may act as a guideline for developing class interfaces. And finally, we will have a look at some guidelines for individual class design.

Metrics for object-oriented design

Object-oriented software development is a relatively new technology, still lacking empirical guidance and quantitative methods to measure progress and productivity. In  [ChidK], a suite of metrics is proposed that may aid in managing object-oriented software development projects, and, as the authors suggest, may be used also to establish to what extent an object-oriented approach has indeed been followed. See slide 4-suite.

A metric suite

  • WMC -- weighted methods per class
  • DIN -- depth of inheritance
  • NOC -- number of children
  • CBO -- coupling between objects
  • RFC -- response for a class
  • LCO -- lack of cohesion

Object-oriented design

  • object definition -- WMC, DIN, NOC
  • attributes -- RFC, LCO
  • communication -- RFC, CBO

slide: A metric suite

In general, quantitative measures of the size and complexity of software may aid in project planning and project evaluation, and may be instrumental in establishing the productivity of tools and techniques and in estimating the cost of both development and maintenance of a system. The metrics proposed in  [ChidK] pertain to three distinct elements of object-oriented design, namely the definition of objects and their relation to other objects, the attributes and/or properties of objects, and the (potential) communication between objects. The authors motivate their proposal by remarking that existing metrics do no justice to the notions of classes, inheritance, encapsulation and message passing, since they were developed primarily from a function-oriented view, separating data and procedures.

Definitions

To perform measurements on a program or design, we need to be able to describe the structure of a program or design in language-independent terms. As indicated below, the identifiers x, y and z will be used to name objects. Occasionally, we will use the term
class(x) to refer to the class of which object x is an instance. The term iv(x) will be used to refer to the set of instance variables of the object x, and likewise methods(x) will be used to refer to the set of methods that exists for x. Combined, the instance variables and methods of an object x are regarded as the properties of x. See slide 4-definitions.

Definitions

  • object names x, y
  • iv( x ) = { i | i is an instance variable of x }
  • methods( x ) = { m | m is a method of x }
  • properties( x ) = iv(x) ∪ methods(x)

Read/write properties

  • iv( m_x ) = { i | m_x reads or writes i }
  • methods( i_x ) = { m_x | i_x ∈ iv(m_x) }

Cardinality

  • | S | = the cardinality of the set S

slide: Definitions

An important property for an instance variable is whether it is read or written by a method. The set of instance variables read or written by a particular method
m_x will be referred to by the term iv(m_x). Likewise, the set of methods that either read or write a particular instance variable is referred to by the term methods(i_x). A number of metrics are defined by taking the cardinality of some set. The cardinality of a set is simply the number of elements it contains. To refer to the cardinality of a set S, the notation | S | will be used. In addition, we need predicates to characterize the inheritance structure of a program or design. The term root(x) will be used to refer to the root of the inheritance hierarchy of which class(x) is a member. The term descendants(x) will be used to refer to the set of classes of which class(x) is a direct ancestor, and the term distance(x,y) will be used to indicate the distance between class(x) and class(y) in the inheritance hierarchy. The distance will be one if y is a descendant of x and undefined if x and y are not related by inheritance. To describe the potential communication between objects the term x uses y will be used to state that object x calls some method of y. The term x calls m_y is used to specify more precisely that x calls the method m_y.

Evaluation criteria

Before discussing the individual metrics, we need to know by what criteria we may establish that a proposed metric is a valid instrument for measuring properties of a program. One means of validating a metric is gathering empirical evidence to determine whether a metric has predictive value, for instance with respect to the cost of maintaining software. Lacking empirical evidence,  [ChidK] establish the validity of their metrics with reference to criteria adapted from  [Weyuker]. See slide 4-evaluation. The criteria proposed by  [Weyuker] concern well-known complexity measures such as cyclomatic number, programming effort, statement count and data flow complexity.

Evaluation criteria

  • discrimination -- ∃x ∃y • %m(x) != %m(y)
  • non-uniqueness -- ∃x ∃y • %m(x) = %m(y)
  • permutation -- ∃x ∃y • y is a permutation of x ∧ %m(x) != %m(y)
  • implementation -- ∀x ∀y • fun(x) = fun(y) does not imply %m(x) = %m(y)
  • monotonicity -- ∀x ∀y • %m(x) <= %m(x + y) ∧ %m(y) <= %m(x + y)
  • interaction -- ∀x ∀y ∃z • %m(x) = %m(y) ∧ %m(x + z) != %m(y + z)
  • combination -- ∃x ∃y • %m(x) + %m(y) < %m(x + y)

slide: Criteria for the evaluation of metrics

As a first criterion (i), it may be required that a metric has discriminating power, which means that there are at least two objects which give a different result. Another criterion (ii) is that the metric in question imposes some notion of equivalence, meaning that two distinct objects may deliver the same result for that particular metric. As a third criterion (iii), one may require that a permutation (that is, a different ordering of the elements) of an object gives a different result. None of the proposed metrics, however, satisfies this criterion. This may not be very surprising, considering that the method interface of an object embodies what [Meyer88] calls a shopping list, which means that it contains all the services needed in an intrinsically unordered fashion.

The next criterion (iv) is that the actual implementation is of importance for the outcome of the metric. In other words, even though two objects perform the same function, the details of the implementation matter when determining the complexity of a system. Another property that a metric must satisfy (v) is the property of monotonicity, which implies that a single object is always less complex than when it is in some way combined with another object. This seems a reasonable requirement; however, for objects located in distinct branches of the inheritance graph this need not always be the case.

Another requirement that may be imposed on a metric (vi) is that it shows that two equivalent objects may behave differently when placed in a particular context. This requirement is not satisfied by one of the metrics (RFC), which may be an indication that the metric must be refined. Finally, the last property (vii) requires that a metric must reflect that decomposition may reduce the complexity of a design. Interestingly, none of the proposed metrics satisfies this requirement.
According to  [ChidK], this raises the issue "that complexity could increase, not reduce as a design is broken into more objects". To conclude, evidently more research, including empirical validation, is required before adopting any metric as a reliable measure for the complexity of a design. Nevertheless, the metrics discussed below provide an invaluable starting point for such an effort. In the following sections, the individual metrics (WMC, DIN, NOC, CBO, RFC, LCO) will be characterized. For each metric, a formal definition will be given, and the notions underlying the definition characterized. Further, for each metric we will look at its implications for the practice of software development and establish (or disprove) the properties related to the evaluation criteria discussed previously.

Weighted methods per class

The first metric we look at provides a measure for the complexity of a single object. The assumption underlying the metric is that both the number of methods and the complexity of each method (expressed by its weight) determine the total complexity of the object. See slide 4-WMC.

Weighted methods per class

WMC

Measure

-- complexity of an object
  • WMC(x) = Σ_{m ∈ methods(x)} complexity(m)

Viewpoint --

the number of methods and the complexity of methods is an indicator of how much time and effort is required to develop and maintain objects
slide: Weighted methods per class

The WMC measure pertains to the definition of an object. From a software engineering perspective, we may regard the measure as an indicator of how much time and effort is required to develop and maintain the object (class). In general, objects having many (complex) methods are not likely to be reusable, but must be assumed to be tied to a particular application. To illustrate that property (vii) indeed does not hold for this metric, consider objects x and y with respectively
n_x and n_y methods. Assume that x and y have %d methods in common. Then n_x + n_y - %d <= n_x + n_y, and hence %m(x + y) <= %m(x) + %m(y), where x + y denotes the combination of objects x and y.
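With the weight of each method given as a complexity figure, WMC reduces to a simple summation. The encoding below, a map from method names to assumed weights, is only one possible rendering:

```cpp
#include <map>
#include <string>

// WMC(x): sum the complexity weights of all methods of a class
int WMC(const std::map<std::string,int>& complexity) {
  int sum = 0;
  for (const auto& m : complexity)
    sum += m.second;  // add complexity(m) for each method m
  return sum;
}
```

Combining two classes with overlapping method sets then clearly cannot exceed the sum of their separate WMC values, as argued above.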

Depth of inheritance

The second metric (DIN) is a measure for the depth of the (class of the) object in the inheritance hierarchy. The measure is directly related to the scope of properties, since it indicates the number of classes from which the class inherits its functionality. For design, the greater the depth of the class in the inheritance hierarchy the greater will be its expected complexity, since apart from the methods defined for the class itself the methods inherited from classes higher in the hierarchy are also involved. The metric may also be used as an indication for reuse, that is reuse by inheritance. See slide 4-DIN.

Depth of inheritance

DIN

Measure -- scope of properties

  • DIN(x) = distance( root(x), class(x) )

Viewpoint --

the deeper a class is in the hierarchy, the greater the number of methods that is likely to be inherited, making the object more complex
slide: Depth of inheritance

Satisfaction of criteria (i), (ii) and (iv) is easily established. With respect to property (v), the monotonicity property, three cases must be distinguished. Recall that the property states that for any object x it holds that
%m(x) <= %m(x + y). First, assume that y is a child of x and %m(x) = n; then %m(y) = n + 1. But combining x and y will give %m(x + y) = n, so %m(x + y) < %m(y) and property (v) is not satisfied. When x and y are siblings, then %m(x) = %m(y) = %m(x + y), hence property (v) is satisfied. Finally, assume that x and y are not directly connected by inheritance and are not siblings. If x and y are collapsed to the class lowest in the hierarchy, property (v) is satisfied. However, this need not be the case: just imagine that class(x) is collapsed with root(x). Then, obviously, the monotonicity property is not satisfied.
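Given an encoding of the inheritance hierarchy as a child-to-parent map, which is a hypothetical representation, DIN(x) amounts to counting the inheritance links from class(x) up to its root:

```cpp
#include <map>
#include <string>

// DIN(x) = distance( root(x), class(x) ): the number of links up to the root
int DIN(const std::string& cls,
        const std::map<std::string,std::string>& parent) {
  int depth = 0;
  std::string c = cls;
  while (parent.count(c)) {  // a root class has no entry in the parent map
    c = parent.at(c);
    ++depth;
  }
  return depth;
}
```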

Number of children

The third metric (NOC) gives the number of immediate subclasses of
class(x) in the class hierarchy. Like the previous metric, it is related to the scope of properties. It is also a measure of reuse, since it indicates how many subclasses inherit the methods of class(x). According to [ChidK], it is generally better to have depth than breadth in the class hierarchy, since depth promotes reuse through inheritance. In any case, the number of descendants may be an indication of the influence of the class on the design. Consequently, a class scoring high on this metric may require more extensive testing. See slide 4-NOC.

Number of children

NOC

Measure

-- scope of properties
  • NOC(x) = | descendants(x) |

Viewpoint --

generally, it is better to have depth than breadth in the class hierarchy, since it promotes the reuse of methods through inheritance
slide: Number of children

The reader is invited to check that properties (i), (ii), (iv) and (v) are satisfied. Recall that property (vi) states that for some objects y and z, if
%m(x) = %m(y) then x might behave differently when combined with z, that is %m(x+z) != %m(y+z). Assume that class(x) and class(y) both have n children, that is %m(x) = %m(y) = n, and let class(z) be a child of class(x), and assume that class(z) has r children. Then combining class(x) and class(z) will result in a class with n - 1 + r children, whereas combining class(y) and class(z) will result in a class with n + r children, which means that %m(x+z) != %m(y+z) and hence that property (vi) is satisfied.

Coupling between objects

The next metric (CBO) measures non-inheritance related connections with other classes. It is based on the notion that two objects are related if either one acts on the other, and as such is a measure of coupling, that is, the degree of interdependence between objects. As phrased in [ChidK], excessive coupling between objects outside of the inheritance hierarchy is detrimental to modular design and prevents reuse. In other words, objects with a low degree of interdependence are generally more easily reused. Note that coupling, as expressed by the metric, is not transitive: if x uses y and y uses z, then it is not necessarily the case that x also uses z. In fact, a famous style guideline discussed in section demeter is based on the intuition underlying this metric. A high degree of coupling may indicate that testing the object will require a lot of effort, since other parts of the design are likely to be involved as well. As a general rule, a low degree of inter-object coupling should be striven for. See slide 4-CBO.

Coupling between objects

CBO

Measure

-- degree of dependence
  • CBO(x) = | { y | x uses y ∨ y uses x } |

Viewpoint --

excessive coupling between objects outside of the inheritance hierarchy is detrimental to modular design and prevents reuse
slide: Coupling between objects

Establishing properties (i), (ii), (iv), (v) and (vi) is left to the (diligent) reader. However, we will prove property (vii) to be invalid. Recall that property (vii) states that there exist objects x and y for which
%m(x) + %m(y) < %m(x + y), meaning that for those objects the complexity of x combined with y is higher than the total complexity of x and y in isolation. Just pick arbitrary objects x and y, and assume that x and y have %d >= 0 couplings in common, for example both use an object z. Now %m(x + y) = %m(x) + %m(y) - %d, and hence %m(x + y) <= %m(x) + %m(y), contradicting property (vii); strictly so when %d > 0.
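Representing the uses relation as a set of (client, supplier) pairs, again only an assumed encoding, CBO(x) counts the classes coupled to x in either direction:

```cpp
#include <set>
#include <string>
#include <utility>

// CBO(x) = | { y | x uses y or y uses x } |
int CBO(const std::string& x,
        const std::set<std::pair<std::string,std::string> >& uses) {
  std::set<std::string> coupled;
  for (const auto& p : uses) {
    if (p.first == x) coupled.insert(p.second);   // x uses y
    if (p.second == x) coupled.insert(p.first);   // y uses x
  }
  coupled.erase(x);  // ignore self-references
  return (int)coupled.size();
}
```

Note that the sketch reflects the non-transitivity remark above: if x uses y and y uses z, the pair (x,z) is not counted unless it occurs in the relation itself.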

Response for a class

Our fifth metric (RFC) is based on the notion of response set. The response set of an object may be characterized as the set of methods it has available, consisting of the methods of its class and the methods of other objects that may be invoked by any of its own methods. This metric may be regarded as a measure of the communication that may occur between the object and other objects. If primarily (potential) extraneous method invocations are responsible for the size of the response set, it may be expected that testing the object will be difficult and will require a lot of knowledge of other parts of the design. See slide 4-RFC.

Response for a class

RFC

Measure

-- complexity of communication
  • RFC(x) = | methods(x) ∪ { m_y | x calls m_y } |

Viewpoint --

if a large number of methods can be invoked in response to a message, the testing and debugging of the object becomes more complex
slide: Response for a class

Establishing properties (i) and (ii) is left to the reader. To establish property (iv), stating that not only function but also implementation is important, it suffices to see that the actual implementation determines which and how many (extraneous) methods will be called. Property (v), monotonicity, follows from the observation that for any object y, it holds that %m(x+y) >= max(%m(x),%m(y)) and hence %m(x+y) >= %m(x). According to [ChidK], property (vi) is not satisfied. To disprove property (vi) it must be shown that, given an object x and an object y for which %m(x) = %m(y), there is no object z that provides a context discriminating between x and y, in other words for which %m(x+z) != %m(y+z). The proof given in [ChidK] relies on the assumption that %m(x+y) = max(%m(x),%m(y)), whereas one would expect %m(x+y) >= max(%m(x),%m(y)). However, assuming the latter, property (vi) indeed holds. Property (vii), nevertheless, may again be proven to be invalid.
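Under the same style of encoding, the response set is simply the union of the object's own methods and the methods it may call; the two sets below are assumed to be given:

```cpp
#include <set>
#include <string>

// RFC(x) = | methods(x) union { m_y | x calls m_y } |
int RFC(const std::set<std::string>& own,
        const std::set<std::string>& called) {
  std::set<std::string> response(own);
  response.insert(called.begin(), called.end());  // union; duplicates collapse
  return (int)response.size();
}
```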

Lack of cohesion

The last metric (LCO) we will look at is based on the notion of degree of similarity of methods. If methods have no instance variables in common, their degree of similarity is zero. A low degree of similarity may indicate a lack of cohesion. As a measure for the lack of cohesion the number of disjoint sets partitioning the instance variables is taken. Cohesiveness of methods within a class is desirable, since it promotes encapsulation of objects. For design, lack of cohesion may indicate that the class is better split up into two or more distinct classes. See slide 4-LCO.

Lack of cohesion

LCO

Measure

-- degree of similarity between methods
  • LCO(x) = | partitions( methods(x), iv(x) ) |
where
  • partitions(M,I) = { J ⊆ I | methods(J) ∩ methods(I \ J) = ∅ }

Viewpoint --

cohesiveness of methods within a class is desirable since it promotes the encapsulation of objects
slide: Lack of cohesion

Establishing properties (i), (ii) and (iv) is left to the reader. To establish the monotonicity property (v), that is %m(x) <= %m(x+y) and %m(y) <= %m(x+y) for arbitrary y, consider that combining objects may actually reduce the number of different sets, that is %m(x+y) = %m(x) + %m(y) - %d for some %d >= 0. The reduction %d, however, cannot be greater than the number of original sets, hence %d <= %m(x) and %d <= %m(y). Therefore, %m(x) + %m(y) - %d >= %m(x) and %m(x) + %m(y) - %d >= %m(y), establishing property (v).

To establish property (vi), the interaction property, assume %m(x) = %m(y) = n for some object y, and let z be another object with %m(z) = r. Now, %m(x+z) = n + r - %d and %m(y+z) = n + r - %r, where %d and %r are the reductions for, respectively, x+z and y+z. Since neither %d nor %r depends on n, they need not be equal; hence, in general, %m(x+z) != %m(y+z), establishing property (vi).

To disprove property (vii), consider that %m(x+y) = %m(x) + %m(y) - %d for some %d >= 0, and hence %m(x + y) <= %m(x) + %m(y). The violation of property (vii) seems to indicate that it may indeed sometimes be better to have a single non-cohesive object than multiple cohesive ones implementing the same functionality.
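The number of disjoint groups may be computed by merging, for each method, the instance variables it touches into one group; the access map below, from method name to the variables read or written, is a hypothetical encoding:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// LCO as the number of disjoint groups of instance variables: two variables
// end up in the same group when some chain of methods reads or writes both.
int LCO(const std::map<std::string, std::set<std::string> >& access) {
  std::vector<std::set<std::string> > groups;
  for (const auto& entry : access) {
    const std::set<std::string>& ivars = entry.second;
    if (ivars.empty()) continue;           // a method touching nothing adds no group
    std::set<std::string> merged(ivars);
    std::vector<std::set<std::string> > rest;
    for (const auto& g : groups) {
      bool overlap = false;
      for (const auto& i : g)
        if (merged.count(i)) { overlap = true; break; }
      if (overlap) merged.insert(g.begin(), g.end());  // fuse overlapping groups
      else rest.push_back(g);
    }
    rest.push_back(merged);
    groups = rest;
  }
  return (int)groups.size();
}
```

A class whose methods fall into two such groups, as in the test below, is a candidate for being split into two classes.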

An objective sense of style

The metrics discussed in the previous section clearly suggest principles for the design of object-oriented systems, but do not lead immediately to explicit guidelines for design. In contrast,  [LH89] present such guidelines, but they are less explicit in their formal approach. The guidelines they presented were among the first, and they still provide good advice with respect to designing class interfaces.

Good Object-Oriented Design

  • organize and reduce dependencies between classes

Client

-- A method m is a client of C if m calls a method of C

Supplier

-- If m is a client of C then C is a supplier of m

Acquaintance

-- C is an acquaintance of m if C is a supplier of m but not (the type of) an argument of m or (of) an instance variable of the object of m
-- C is a preferred acquaintance of m if an object of C is created in m or C is the type of a global variable
-- C is a preferred supplier of m if C is a supplier and C is (the type of) an instance variable, an argument or a preferred acquaintance
slide: Clients, suppliers and acquaintances

In slide 4-good, an explicit definition of the dual notions of client and supplier has been given. It is important to note that not all of the potential suppliers for a class may be considered safe. Potentially unsafe suppliers are distinguished as acquaintances, of which those that are either created during a method call or stored in a global variable are to be preferred. Although this may not be immediately obvious, this excludes suppliers that are accessed in some indirect way, for instance as the result of a method call to some safe supplier. As an example of using an unsafe supplier, consider the call
  screen->cursor()->move();
  
which instructs the cursor associated with the screen to move to its home position. Although screen may be assumed to be a safe supplier, the object delivered by
screen->cursor() need not necessarily be a safe supplier. In contrast, the call
  screen->move_cursor();
  
does not make use of an indirection introducing a potentially unsafe supplier. The guideline concerning the use of safe suppliers is known as the Law of Demeter, of which the underlying intuition is that the programmer should not be bothered by knowledge that is not immediately apparent from the program text (that is the class interface) or founded in well-established conventions (as in the case of using special global variables). See slide 4-demeter.

Law of Demeter

-- ignorance is bliss

Do not refer to a class C in a method m unless C is (the type of)
   1. an instance variable
   2. an argument of m
   3. an object created in m
   4. a global variable
  
Minimize the number of acquaintances!

Class transformations

  • lifting -- make structure of the class invisible
  • pushing -- push down responsibility

slide: The Law of Demeter

To remedy the use of unsafe suppliers, two kinds of program transformation are suggested by [LH89]. First, the structure of a class should be made invisible to clients, to prohibit the use of a component as (an unsafe) supplier. This may require the lifting of primitive actions to the encompassing object, in order to make these primitives available to the client in a safe way. Secondly, the client should not be given the responsibility of performing (a sequence of) low-level actions. For example, moving the cursor should not be the responsibility of the client of the screen, but of the object representing the screen. In principle, the client need not be burdened with detailed knowledge of the cursor class.

The software engineering principles underlying the Law of Demeter may be characterized as representing a compositional approach, since the law enforces the use of immediate parts only. As additional benefits, conformance to the law results in hiding the component structure of classes, reduces the coupling of control and, moreover, promotes reuse by enforcing the use of localized (type) information.
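The lifting transformation can be sketched as follows. The classes follow the text's screen/cursor example, but the accessor name get_cursor and the coordinate details are invented for illustration (in C++ a member function cannot share the name of the class cursor it returns):

```cpp
// the component class; move() returns the cursor to its home position
class cursor {
public:
  void move() { x = y = 0; }
  int x = 10, y = 10;  // arbitrary starting position, for illustration only
};

class screen {
public:
  // unsafe: exposes the component, inviting screen->get_cursor()->move()
  cursor* get_cursor() { return &c; }
  // lifted primitive: the client need not know about the cursor class
  void move_cursor() { c.move(); }
private:
  cursor c;
};
```

A client obeying the law calls move_cursor() and never reaches through to the cursor component itself.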

Individual class design

We have nearly completed a first tour around the various landmarks of object-oriented design. Identifying objects, expressing the interaction between objects by means of client/server contracts and describing the collaboration between objects in terms of behavioral compositions belong to a craft that will only be learned in the practice of developing real systems.

A class should represent a faithful model of a single concept, and be a reusable, plug-compatible component that is robust, well-designed and extensible. In slide 3-individual, we list a number of suggestions put forward by [McGregor92].

Class design {\em -- guidelines}

  • only methods public -- information hiding
  • do not expose implementation details
  • public members available to all classes -- strong cohesion
  • as few dependencies as possible -- weak coupling
  • explicit information passing
  • root class should be abstract model -- abstraction
The first two guidelines enforce the principle of information hiding, advising that only methods be public and all implementation details hidden. The third guideline states a principle of strong cohesion by requiring that classes implement a single protocol that is valid for all potential clients. A principle of weak coupling is enforced by requiring a class to have as few dependencies as possible, and to employ explicit information passing using messages instead of inheritance (except when inheritance may be used in a type consistent fashion). When using inheritance, the root class should be an abstract model of its derived classes, whether inheritance is used to realize a partial type or to define a specialization in a conceptual hierarchy.

The properties of classes, including their interfaces and relations with other classes, must be laid down in the design document. Ideally, the design document should present a complete and formal description of the structural, functional and dynamic aspects of the system, including an argument showing that the various models are consistent. However, in practice this will seldom be realized, partly because object-oriented design techniques are as yet not sufficiently mature to allow a completely formal treatment, and partly because most designers will be satisfied with a non-formal rendering of the architecture of their system. Admittedly, the task of designing is already sufficiently complex, even without the additional complexity of a completely formal treatment.
Nevertheless, studying the formal underpinnings of object-oriented modeling based on types and polymorphism is still worthwhile, since it will sharpen the intuition with respect to the notion of behavioral conformance and the refinement of contracts, which are both essential for developing reliable object models. And reliability is the key to reuse! }
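The guidelines above can be given a concrete reading in code. The following is a minimal Python sketch (the class names are illustrative, not taken from the text): state is hidden behind a public protocol, and one class uses another by explicit message passing rather than by inheriting from it.

```python
class Counter:
    """Information hiding: all state is private; clients see only the protocol."""

    def __init__(self):
        self._count = 0          # implementation detail, hidden from clients

    def increment(self):
        self._count += 1

    def value(self):
        return self._count


class Dial:
    """Weak coupling: Dial employs a Counter via explicit message passing
    instead of inheriting from it -- Dial is not (and should not be) a
    subtype of Counter."""

    def __init__(self):
        self._counter = Counter()

    def turn(self):
        self._counter.increment()   # explicit information passing

    def reading(self):
        return self._counter.value()
```

A client of Dial depends only on its two public methods; the Counter inside may be replaced without affecting any client.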

Towards a formal approach

Reliability is the cornerstone of reuse. Hence, if the original claim that object orientation promotes the reuse of software is ever to come true, object-oriented analysis, design and implementation must first and foremost support the development of reliable software. Validating software by means of testing alone is clearly insufficient. As argued in  [Backhouse], the probability of finding an error is usually too small to view testing as a reliable method of detecting the error. The fallacy of any empirical approach to validating software, which includes quantitative measurements based on software metrics, is that in the end we just have to wait and see what happens. In other words, testing is useless as a design methodology.

Formal specification -- contracts

  • type specification -- local properties
  • relational specification -- structural properties, type relations
  • functional specification -- requirements

Verification -- as a design methodology

  • reasoning about program specification/code

Runtime consistency -- invariance

  • behavioral types specify test cases
  • invariants and assertions monitor consistency

slide: Formal specification and verification

Verification should be at the heart of any design method. In addition to allowing us to reason about the specification and the code, the design process should result in an architectural description of the system as well as in a proof that the system meets its requirements. Looking at the various approaches to the specification and verification of software, we can see that the notion of invariance plays a crucial role in developing provably correct solutions for a variety of problems (cf. Gries, 1981; Backhouse, 1986; Apt and Olderog, 1991; Dahl, 1992). Invariance, as we observed when discussing object test methods, also plays an important role in testing the runtime consistency of a system. Hence, from a pragmatic point of view, studying formal approaches may help us become aware of the properties that determine the runtime consistency of object-oriented systems. In part III (chapter 10), we will explore what formal methods we have available for developing object-oriented software. Our starting point will be the foundations underlying the notion of contracts as introduced in  [Meyer88]. We will take a closer look at the relation between contracts and the specification of the properties of abstract data types. Also, we will look at methods allowing us to specify structural and functional relations between types, as may occur in behavioral compositions of objects. More specifically, we will study the means available to relate an abstract specification of the properties of a data type to a concrete implementation. These studies are based on an analysis of the notion of abstract data types, and the relation between inheritance and subtyping. In particular, we will look at rules to determine whether a subclass derived by inheritance conforms to the subtype relation that we may define in a formal approach to object types.
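How invariants and assertions monitor runtime consistency may be sketched as follows, in a minimal contract-style Python example (the Account class and its particular conditions are hypothetical illustrations, not taken from the text): the precondition guards the client's obligation, the postcondition and invariant guard the supplier's.

```python
class Account:
    """Contract-style runtime checks: a class invariant plus pre- and
    postconditions, monitored with assertions (note that Python drops
    assert statements when run with -O)."""

    def __init__(self, balance=0):
        self._balance = balance
        assert self._invariant()

    def _invariant(self):
        # must hold on object creation and after every public method
        return self._balance >= 0

    def withdraw(self, amount):
        assert 0 < amount <= self._balance       # precondition (client's duty)
        old = self._balance
        self._balance -= amount
        assert self._balance == old - amount     # postcondition (supplier's duty)
        assert self._invariant()
        return self._balance
```

A violated precondition raises an AssertionError at the call site, pinpointing the party that broke the contract.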
However, before we delve into the formal foundations of object-oriented languages and develop a formal approach to object-oriented modeling, we will first explore the design space of object-oriented languages and system implementation techniques. These insights will enable us to establish to what extent we may capture a design in formal terms, and what heuristics are available to accomplish the tasks remaining in object-oriented development.

Summary

This chapter looked at system development from a software engineering perspective. How may we establish that software is reliable and to what extent can our experience be generalized to an object-oriented approach?

Validating software


  • software quality -- structural criteria
  • testing -- strategy, tactics, operational
  • inheritance -- inherited code must be retested

slide: Section 4.1: Validating software

We discussed the notions of software quality, including structural criteria, and testing, as an empirical way in which to validate software. An example has been given, illustrating that inherited code may need to be retested.
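Why inherited code may need to be retested can be illustrated with a small Python sketch (the Bag/Set pair is a standard illustration, hypothetical here, not the chapter's own example): a subclass redefines an inherited method, so a test that passed for the base class no longer characterizes the derived class.

```python
class Bag:
    """Base class: insert always grows the collection."""

    def __init__(self):
        self._items = []

    def insert(self, x):
        self._items.append(x)

    def size(self):
        return len(self._items)


class Set(Bag):
    """Derived class: insert ignores duplicates. The inherited size() is
    untouched, yet the tested property of Bag -- that every insert
    increases size() by one -- no longer holds, so the inherited code
    must be retested in the context of Set."""

    def insert(self, x):
        if x not in self._items:
            super().insert(x)
```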

A framework for testing object-oriented programs


  • levels of testing -- influence of errors
  • object test methods -- state transitions
  • contracts -- interaction protocols
  • static testing -- careful reading

slide: Section 4.2: A framework for testing object-oriented programs

We developed (the beginnings of) a framework for testing object-oriented software, discussed the influence of errors and looked at object test methods directed at verifying state transitions. We also discussed how contracts may provide a guideline for testing, indicating state invariant interaction protocols. Testing may also be done by carefully reading the code, preferably with a group of colleagues.

Guidelines for design


  • metrics -- objects, attributes, communication
  • law of demeter -- class interfaces
  • reuse -- individual class design

slide: Section 4.3: Guidelines for design

A number of metrics for object-oriented design have been proposed. These metrics may be used to establish the complexity of object models. The metrics given are meant only as a starting point for further research and empirical validation. They cover aspects such as the complexity of the relation between the definition and usage of attributes and the complexity of communication patterns between objects. Related to the issue of complexity, we discussed the Law of Demeter, which gives a guideline for good object-oriented design, including suggestions for class transformations to improve a particular design. Also, we looked at some guidelines for individual class design.
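The intuition behind the Law of Demeter can be sketched in a few lines of Python (the Car/Engine classes are an illustrative assumption, not the chapter's own example): a client should talk only to its direct suppliers, never reach through them into the structure of their parts.

```python
class Engine:
    def __init__(self):
        self.running = False

    def start(self):
        self.running = True


class Car:
    """Car keeps its Engine as a private part and forwards requests to it,
    so clients never depend on Car's internal structure."""

    def __init__(self):
        self._engine = Engine()

    def start(self):
        self._engine.start()
        return self._engine.running


def drive(car):
    # Conforming client: sends messages only to its direct supplier (car).
    # A chain such as car._engine.start() would violate the Law of Demeter
    # by coupling this client to the internal composition of Car.
    return car.start()
```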

Towards a formal approach


  • contracts -- formal specification
  • verification -- as a design methodology
  • runtime consistency -- invariance

slide: Section 4.4: Towards a formal approach

Finally, we reflected on the possible contribution of formal methods to the software engineering of object-oriented systems, and concluded that the notion of contracts may play an invaluable role, both as a design methodology and as a means to establish the runtime consistency of a system.

Questions

  1. What aspects can you distinguish with respect to software quality?
  2. Give an example demonstrating how inheritance may affect tested code.
  3. Between what levels of testing can you distinguish? Discuss the influence of errors for each of these levels.
  4. Discuss the problems involved in testing the behavior of an object. What would be your approach?
  5. Discuss how contracts may be employed to test object behavior.
  6. What methods of static testing can you think of? Do you consider them relevant? Explain.
  7. What metrics can you think of for object-oriented design? What is the intuition underlying these metrics?
  8. What evaluation criteria for metrics can you think of? Are these sufficient for applying such metrics in actual software projects? Explain.
  9. Give a formal definition of the following metrics: WMC, DIT, NOC, CBO, RFC and LOC. Explain their meaning from a software engineering viewpoint.
  10. What would be your rendering of the Law of Demeter? Can you phrase its underlying intuition? Explain.
  11. Define the notions of client, supplier and acquaintance. What restrictions must be satisfied to speak of a preferred acquaintance and a preferred supplier?
  12. Characterize the elements that form part of a formal specification.

Further reading

There is a massive amount of literature on software validation and testing. A standard text is  [Myers]. As research papers, I recommend  [Doong90] and  [Smith92]. For a further study of the Law of Demeter look at  [LH89].