Validation and testing

When validating a system, a number of aspects play a role. First, it must be determined whether the software satisfies the original requirements and goals set by the user, as specified during analysis. Second, it must be established whether the system meets the specification laid down in the design document. The latter is usually referred to as 'verification'. Verification is only one aspect of validation: validation is meant to establish whether the system is a good system in terms of user satisfaction, whereas verification denotes the process of establishing the correctness of software in a more formal sense. A third aspect of validation concerns the robustness of the system, that is, the extent to which it can handle exceptional circumstances, such as excessive amounts of data and heavy workloads.

Software quality

Structural criteria


slide: Software quality

Validation is primarily concerned with the functional properties of a system. Questions that need to be answered are: Is the system capable of doing what it is expected to do? And, is the user satisfied with how the system does it? In practice, the validation of a system is often restricted to functionality and user interface issues. However, other criteria, related to structural properties, are important as well. See slide 4-quality. For example, a customer may want to know whether the system may easily be adapted to changing circumstances or different platforms. Also, the customer may be interested in reusing components of the system to develop other applications. Actually, with the trend shifting from single applications to application frameworks and libraries, structural criteria are becoming increasingly important, since they determine the ease and reliability with which components may be used as building blocks for composing new applications. Correspondingly, the verification of the components constituting a system (or library) will become more important as well.

Testing is still the most widely used method for experimentally verifying the functional properties of a piece of software. In general, testing involves the execution of a program rather than formal methods of proof. In the sections that follow we will investigate what benefits we may expect from an object-oriented approach when it comes to testing.

Test methods

Testing is a relatively well-established discipline. It may be defined as the process of judging experimentally whether a system or its components satisfy particular requirements. The requirements are often laid down as a set of test data with which the system must comply. Testing, essentially, is a way to expose errors. However, passing a test suite may simply indicate that the test suite is a feeble test. Standard definitions of testing usually involve test cases and test procedures. Test cases define the input-output behavior of a program and test procedures specify how a program may be validated against a set of test data. In [Smith90] it is noted that the computation model of input-output transformations underlying traditional conceptions of testing is not entirely adequate for object-oriented systems. Instead, the testing of object-oriented programs is defined there as the process of exercising the routines provided by an object with the goal of uncovering errors in the implementation of the routines or the state of the object, or both.

Three levels can be distinguished in the process of testing. A strategic level, which is primarily concerned with identifying the risks, that is, the potentially dangerous components of a system that need to be validated with extra care. To decide which components involve high risks, metrics such as those described in section metrics may be of great value. Next, we have a tactical level, which for each component defines an appropriate test method and test set. And, finally, we have an operational level, consisting of the actual execution of the tests and the evaluation of the test results. See slide 4-testing.

Testing -- strategy, tactics, operational

Stop-criteria -- minimize effort, maximize confidence

Paradigms


slide: Testing

As a rule, good testing practice is intended to minimize the effort of producing tests (in terms of time and costs), the number of tests and the effort of performing the tests, while maximizing the number of errors detected and (most importantly) the confidence in the software (in terms of the tests successfully passed). One of the crucial decisions in testing is when to stop. Testing may halt either when the test results indicate that the system or component tested needs further improvement, or when the test results indicate that the system or component is sufficiently reliable. In principle, it is impossible to decide with absolute certainty that a piece of software is completely error-free. Usually, the particular method used will indicate when to stop testing. As a general stop-criterion, the ratio between errors found and test time may be used. When the effort to detect another error exceeds a certain limit, the system may be considered reliable. Needless to say, there is a subjective element involved in the decision to stop testing.

We can distinguish between four different paradigms of testing. We may consider it sufficient to demonstrate that the software behaves as required. However, this must be regarded as a very weak notion of testing. More appropriate, generally, is to construct tests with the actual intention of detecting errors. Although this may seem a rather destructive attitude towards software, it is the only way to gain confidence in the reliability of an actual system. However, already in the earlier stages of software development we may look for means to reduce potential errors by evaluation procedures such as those discussed in section static. A step further in this direction would be to adopt a paradigm that actually prevents the occurrence of faults in the design and code. However, this requires a formal framework that has not yet been fully developed for object-oriented programming. See section formal.

Black-box versus white-box testing

Traditionally, two approaches to testing can be distinguished. One approach is concerned only with the functionality, that is the input-output behavior of the component or system. The other approach takes into account the actual structure of the software as well. The first approach is known as black-box testing; the second as white-box testing, since the contents of the box may, as it were, be inspected. See slide 4-box.

Black-box testing -- functional

test design

  • equivalence classes
  • extremes

White-box testing -- structural (dataflow)

  • instruction coverage
  • branch coverage
  • condition coverage

slide: Black-box and white-box testing

To make black-box testing manageable, equivalent input is usually grouped in classes, from which a representative element is chosen when performing the actual test. In particular, attention needs to be paid to extremes, which may be regarded as equivalence classes with only a single element.

Specification

  • r == sqrt(x) & x >= 0 <=> r^2 == x

Implementation

  float sqrt( float x ) {
    require( x >= 0 );                          // pre-condition
    const float eps = 0.0001;
    float guess = 1;
    while( fabs( x - guess * guess ) > eps )    // Newton's iteration; fabs from <math.h>
      guess = ( guess + x / guess ) / 2;
    promise( guess * guess - x <= eps );        // post-condition
    return guess;
  }
  

slide: Specification and implementation

For example, when testing the function sqrt as specified in slide 4-sqrt, a distinction may be made between input arguments greater than zero, precisely zero, and less than zero. This results in three cases that must be tested. For example, input values -2, 0 and 4 may be chosen. It could be argued, however, that the value 1 should be treated as another extremum, since sqrt behaves as the identity on 1.
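
A test driver for these cases might, for instance, look as follows. This is only a sketch: the helper close, its tolerance, and the decision to check the result against the specification r^2 == x (rather than against a table of expected roots) are assumptions, not part of the original example.

  extern float sqrt( float x );      // the implementation of slide 4-sqrt

  bool close( float a, float b ) {   // equality up to the tolerance eps
    float d = a - b;
    return ( d < 0 ? -d : d ) <= 0.0001;
  }

  int main() {
    bool ok = true;
    float r0 = sqrt(0), r1 = sqrt(1), r4 = sqrt(4);
    ok = ok && close( r0 * r0, 0 );  // extremum: zero
    ok = ok && close( r1 * r1, 1 );  // extremum: sqrt is the identity on 1
    ok = ok && close( r4 * r4, 4 );  // representative of the class x > 0
    // sqrt(-2) violates the pre-condition x >= 0 and should be rejected
    // by the require clause rather than yield a result
    return ok ? 0 : 1;               // 0 signals that all tests passed
  }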

As another example, imagine that we wish to test a function that sorts an array of integers, of maximal length, say, 1000. First, we need to select a number of different lengths, say 0, 1, 23 and 1000. For the latter two cases, we have the choice of filling the array with random-valued numbers, numbers in increasing order or numbers in decreasing order. For each of these distinct possibilities we need to select a number of test cases. The assumption underlying the use of equivalence classes is that one representative of a class is as good as any other. However, this works only when the assumptions on which our partition is based are sound. Moreover, our confidence will probably be stronger the more tests are actually carried out. A sketch of how such a test set may be generated is given below.

White-box testing usually involves the notion of a computation graph, relating the different parts of the program by means of a flow-diagram. For white-box testing, criteria are used such as instruction coverage (showing that the test set executes each instruction at least once), branch coverage (showing that each possible branch in the program is taken at least once), or condition coverage (showing that the test set causes each condition to be true and false at least once).
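
The black-box test set for the sorting function sketched above might be generated along the following lines. The names sort_ints, sorted and fill, as well as the particular random filling, are assumptions introduced for the purpose of illustration.

  #include <stdlib.h>                          // for rand()

  extern void sort_ints( int a[], int n );     // hypothetical function under test

  bool sorted( int a[], int n ) {              // checks the expected output property
    for ( int i = 1; i < n; i++ )
      if ( a[i-1] > a[i] ) return false;
    return true;
  }

  void fill( int a[], int n, int order ) {     // 0 = random, 1 = increasing, 2 = decreasing
    for ( int i = 0; i < n; i++ )
      a[i] = ( order == 0 ) ? rand() % 1000 : ( order == 1 ) ? i : n - i;
  }

  int main() {
    static int a[1000];
    int lengths[] = { 0, 1, 23, 1000 };        // representative lengths
    for ( int k = 0; k < 4; k++ )
      for ( int order = 0; order < 3; order++ ) {
        fill( a, lengths[k], order );
        sort_ints( a, lengths[k] );
        if ( !sorted( a, lengths[k] ) ) return 1;   // test failed
      }
    return 0;                                  // all tests passed
  }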

The test cycle

Testing, as so many other things in software development, is usually an iterative process. A complete test cycle may be characterized as consisting of testing the functionality of each module, integration testing (to check whether the combination of modules has the desired effect), testing the system as a whole, and acceptance testing (in order to get user approval). See slide 4-cycle.

Test cycle

  • module testing
  • integration testing
  • functional testing
  • system testing
  • acceptance testing

System testing

  • facilities, volume, stress, usability, security, performance

slide: The test cycle

System testing involves checking whether the system provides all the facilities required by the user and whether the user interface is satisfactory. Other aspects that may be of importance are the extent to which a system can cope with large volumes of data, whether it performs well on a heavily loaded network, whether it provides a sufficient level of security, and whether the performance of the system is adequate in ordinary circumstances. For object-oriented software, the criteria and procedures of testing will be virtually the same. However, with respect to component testing (and, to some extent, integration testing and functionality testing), we may expect significant differences.

Testing and inheritance

One of the most prominent claims made by adepts of an object-oriented approach is that code may easily and reliably be reused, even without access to the source code. This claim suggests that the inherited part of the code need not be re-tested. An example will be given, however, showing that this is only partially true. See slide 4-inheritance. Like most such examples, it is a contrived one, but what it shows is that the correct behavior of a class can depend upon accidental properties of the class that may no longer hold when the code is being reused in a different context.

Testing and inheritance

  • inherited code must be re-tested!

Because

  • subclass may affect inherited instance variables
  • superclass may use redefined (virtual) methods

slide: Testing and inheritance

As a general rule, inherited code must be re-tested. One reason is that a subclass may affect inherited instance variables. This is a problem especially when using a language that does not provide encapsulation against derived classes, such as Eiffel; in Eiffel, however, appropriate pre-conditions may guard against violations by derived classes. In contrast, C++ does allow such encapsulation (by means of the keyword private), but inherited instance variables may still be accessed when they are declared protected or when a method returns a (non-const) reference. See section 2-references. Another reason not to assume that inherited code is reliable is that the base class may employ virtual functions which may be redefined by the derived class. Redefining a virtual function may violate the assumptions underlying the definition of the base class, or may conflict with the accidental properties of the base class, resulting in erroneous behavior.

Example -- violating the invariant

The example shown below illustrates that redefining a virtual function, even in a very minor way, may lead to a violation of the invariant of the base class. Actually, the invariant ( n >= 0 ) is an accidental property of the class, due to the fact that the square of a number, whether positive or negative, is never negative.
  class A {
  // invariant A: n >= 0
  public:
    A() { n = 0; }
    int value() { return next(n); }
    void strange() { next(-3); }
  protected:
    virtual int next( int i ) { return n = n + i * i; }
    int n;
  };

  class B : public A {
  // the invariant of A no longer holds
  public:
    B() : A() { }
  protected:
    virtual int next( int i ) { return n = n + (n + 1) * i; }
  };

slide: Violating the invariant

Testing instances of class A will not reveal that the invariant is based on incorrect assumptions, since whatever input is used, invoking value() will always result in a non-negative number. However, when an instance of B is created, invoking strange() will result in an error.

Test cases

  A* a = new A; a->value(); a->strange(); a->value();   // ok
  A* b = new B; b->value(); b->strange(); b->value();   // error

Dynamic binding

  int f(A* a) {
  	a->strange();
  	return a->value();
  }
  

slide: Test cases

The example illustrates what happens when instances of a derived class (B) are behaviorally not conforming with their base class (A). The penalty of non-conformance is, as the example clearly shows, that functions defined for inputs of the base class no longer behave reliably, since instances of derived classes (although legally typed) may violate the assumptions pertaining to the base class.

As an aside, it should be noted that the problems illustrated above would not have occurred so easily if the invariant and the behavior of the base and derived classes had been made explicit by means of a client-server contract. Moreover, annotating the methods with the proper pre- and post-conditions would allow automatic monitoring of the runtime consistency of the objects.
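
To illustrate, a minimal sketch of such runtime monitoring for the classes A and B is given below. The use of assert and the placement of the checks are assumptions; a full contract mechanism, as discussed in section contracts, would be more elaborate.

  #include <assert.h>

  class A {
  public:
    A() { n = 0; assert( invariant() ); }
    int value() { int r = next(n); assert( invariant() ); return r; }
    void strange() { next(-3); assert( invariant() ); }
  protected:
    bool invariant() { return n >= 0; }       // the (accidental) invariant of A made explicit
    virtual int next( int i ) { return n = n + i * i; }
    int n;
  };

  class B : public A {
  public:
    B() : A() { }
  protected:
    virtual int next( int i ) { return n = n + (n + 1) * i; }   // violates A's invariant
  };

With these checks in place, the sequence b->value(); b->strange(); shown above fails immediately on the assertion in strange(), instead of silently leaving the object in an illegal state.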

Testing object behavior

Presently, we have no generally accepted framework for testing object-oriented systems. However, it seems likely that we can to some extent reuse the insights and methods coming from traditional testing practice. Further, we may expect to gain great benefits from adopting a contract-based design discipline. In the following, we will study what influence the architectural structure of object-oriented systems has on the practice of testing. In particular, we will look at ways in which to test that the actual behavior of an object conforms to our expectations.

Levels of testing

Adopting an object-oriented approach will generally have a significant influence on the (architectural) structure of the program. Consequently, there will be a somewhat different distinction between levels of testing in comparison with a functional approach. The difference arises from the fact that in an object-oriented system the algorithm is distributed over a number of classes, involving multiple methods, whereas in a functional decomposition the components directly reflect the structure of the algorithm. Another difference comes from the fact that the notion of module in an object-oriented system encompasses both the concept of a class and the concept of a cluster, which is to be understood as a collection of (cooperating) classes. See slide 4-levels.

Levels of testing

  • algorithms -- methods
  • class -- interaction between methods and instance variables
  • cluster -- interaction between groups of classes
  • system -- encompasses all classes

slide: Levels of testing

When testing a system, a collection of objects, or an individual object, it should be taken into account that an error may not always be visible. It may be the case that the erroneous code is simply not executed, or that it is executed but has no effect on the results of the computation (as was the case for the instance of class A discussed previously). A further distinction must be made between errors that do have an effect on the computation but nevertheless result in a legal (although erroneous) state, and errors that leave the computation in an illegal state. To understand what this means, however, we need to delineate more precisely the notion of state.

Object behavior

To test the behavior of an object it is necessary to have some knowledge of the internal structure of the object, that is, the states the object may be in at successive moments of the computation. For example, a counter object may be regarded as having two states, an initial state zero and a state in which the instance variable is greater than zero. On the other hand, for a bounded counter, bounded by max, three states must be distinguished: an initial state zero, a state characterized by 0 < n < max (where n is the instance variable of the bounded counter), and a state max that represents the terminal state of the counter, unless it can be reset to zero. Although many more states could have been distinguished, it suffices to consider only three states, since all the states (strictly) between zero and max may be regarded as equivalent. Since the actual parameters of a method may influence the transition from one object state to another, the values of these parameters must also be taken into account, in a similar way as when testing the extreme input values of a function. See slide 4-methods.

Object test methods -- state transitions

  • equivalence classes -- distinct object states
  • extrema testing -- includes parameters of methods

Errors

-- wrong result, illegal state change
  • within object -- invariance
  • involving multiple objects -- interaction protocols

slide: Object test methods
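
By way of illustration, a bounded counter along these lines might be defined as follows. This is only a sketch: the names, the enumeration of the three states, and the choice of a defensive policy at the boundaries (contrasted with programming by contract below) are assumptions.

  class bounded_counter {
  public:
    enum state_t { ZERO, BETWEEN, MAX };              // the three object states
    bounded_counter( int m ) : n(0), max(m) { }
    void inc() { if ( n < max ) n = n + 1; }          // defensive: ignored at the bound
    void dec() { if ( n > 0 ) n = n - 1; }            // defensive: ignored at zero
    int value() const { return n; }
    state_t state() const {                           // observer used for testing only
      return n == 0 ? ZERO : ( n == max ? MAX : BETWEEN );
    }
  private:
    int n, max;
  };

The state() observer exposes only the abstract state needed for testing, rather than the instance variable n itself.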

The actual testing may occur with reference to a transition matrix displaying the effect of each method invocation. Inspecting a transition matrix based on the internal state (the instance variables) of the object may seem to be in contradiction with the principle of encapsulation encouraged in the chapter on design. However, providing a means to observe the state of an object is different from allowing clients unrestricted access to its instance variables. As an example, consider the transition matrices for a counter and a bounded counter displayed in slide 4-matrix. Two states are distinguished for the counter, (1) for the state n = 0 and (2) for the state n > 0, where we assume that the counter has an instance variable n to keep the actual count. For the bounded counter an additional state is added to allow for the possibility that n = max. Checking the behavior of these (admittedly very simple) objects may take place by a sequence of method calls followed by a check to determine whether the expected state changes have occurred.

Transition matrix -- counter

Transition matrix -- bounded counter


slide: Transition matrix

For example, when incrementing a counter initialized to zero we must observe a state change from (1) to (2). The important cases to test are the borderline cases. For instance, what happens when we decrement a newly created counter? With regard to the definition of the counter, as expressed by the pre- and post-conditions given in the transition matrix, this operation must be considered illegal since it will lead to an inconsistent state. What to do in such cases depends upon the policy taken when designing the object. When what [Meyer88] calls a defensive programming approach is followed, calling the method will be allowed but the illegal state change will not occur. When following the (preferred) method of programming by contract, the method call results in a failure due to the violation of a pre-condition, since the user did not conform to the protocol specified in the contract. We will consider this issue further when discussing runtime consistency checking in section consistency.
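
Using the hypothetical bounded_counter sketched earlier, such a transition check might be coded as follows; again, this is an illustration only, and the numbering of the states in the comments is an assumption.

  int main() {
    bounded_counter c(2);                                   // max == 2
    bool ok = ( c.state() == bounded_counter::ZERO );       // initial state (1)
    c.inc();
    ok = ok && ( c.state() == bounded_counter::BETWEEN );   // transition (1) -> (2)
    c.inc();
    ok = ok && ( c.state() == bounded_counter::MAX );       // transition (2) -> (3)
    bounded_counter d(2);
    d.dec();                      // borderline case: illegal for a newly created counter;
                                  // with programming by contract a pre-condition n > 0
                                  // on dec() would signal the violation here
    ok = ok && ( d.state() == bounded_counter::ZERO );      // defensive policy: state unchanged
    return ok ? 0 : 1;
  }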

Identity transitions

Obviously, for all but very simple objects the number of states and transitions to test for may become quite unwieldy. Hence, a state transition matrix enumerating all the interesting states does not, in general, seem to be a practical solution. A better solution lies in looking for sequences of method calls that have an identical begin and end state. In slide 4-identity, some of the identity transition sequences for the counter are given, but obviously there are many more. One of the interesting features of identity transitions is that they may easily be checked by an automated test tool.

Identity transitions

  counter c; int n1, n2;
  n1 = c.value(); c.inc(1); c.dec(1); n2 = c.value();
  n1 = c.value(); c.inc(1); c.inc(2); c.dec(3); n2 = c.value();
  

Abstract data types

  • stack -- pop( push(s,x) ) = s
  • queue -- remove( insert(q,x) ) != q

Interaction protocols

  • tests all interesting interaction sequences

slide: Identity transitions and interaction protocols
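
Identity transitions like these lend themselves to automated checking. A much simplified checker for the counter might look as follows; this is a sketch which assumes the counter class of slide 4-identity, with methods inc(int), dec(int) and value(), and the way the random sequence is generated here is an assumption.

  #include <stdlib.h>                       // for rand()

  // generate a random sequence of inc/dec calls that adds up to an
  // identity transition and check that the observable value is unaffected
  bool identity_test( counter& c, int length ) {
    int before = c.value();
    int balance = 0;
    for ( int i = 0; i < length; i++ ) {
      int k = rand() % 10 + 1;              // arbitrary increment
      c.inc( k ); balance += k;
    }
    while ( balance > 0 ) {                 // undo the increments again
      int k = rand() % balance + 1;
      c.dec( k ); balance -= k;
    }
    return c.value() == before;             // identity transition?
  }

A test tool would invoke identity_test repeatedly with fresh sequences, in the spirit of the tool discussed below.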

A tool employing identity transitions is discussed in [Smith90]. The tool generates arbitrarily many sequences of method calls resulting in an identity transition, and also generates the code to test these sequences, that is, to check whether they actually leave the state of the object unaffected. The idea of identity transitions ultimately derives from the axiomatic characterization of invariance properties of abstract data types. For example, when specifying the behavior of a stack algebraically, one of the axioms will be of the form pop(push(s,x)) = s, expressing that first pushing an element onto the stack and then popping it results in an identical stack. (See section ADT-algebra for a more detailed discussion of abstract data types.) In contrast, we know that this property does not hold for a queue, unless the queue involved is the empty queue. The advantage of the method of testing for identity transitions is that we need not explicitly specify the individual states and state transitions associated with each method. However, to use automated testing tools, the method requires that we are able to specify by what rules sequences of method calls resulting in identity transitions may be constructed. Moreover, we cannot be sure that we have tested all relevant properties of the object, unless we can prove this from its formal specification.

Most difficult to detect, however, are errors that result from not complying with some (implicitly stated) protocol involving multiple objects. For an example, think of the model-view protocol outlined in section 3-mvc. When the initialization of the model-view pairs is not properly done, for instance when a view is not initialized with a model, an error will occur when updating the value of the model. Such requirements are hard, if not impossible, to specify by means of client/server contracts alone, since multiple objects may be involved, along with a sequence of method invocations. We will look at formal methods providing support for these issues in section formal-coop. Another tool for testing sequences of method invocations is described in [Doong90]. The approach relies on an algebraic specification of the properties of the object, and seems to be suitable primarily for testing associativity and commutativity properties of methods.

Runtime consistency checking

Debugging is a hopelessly time-consuming and unrewarding activity. Unless the testing process is guided by clearly specified criteria on what to test for, testing in the sense of looking for errors must be considered ordinary debugging, that is, running the system to see what will happen. Client/server contracts, as introduced in section contracts as a method for design, do offer such guidelines, in that they enable the programmer to specify precisely the restrictions characterizing the legal states of the object, as well as the conditions that must be satisfied in order for legal state transitions to occur. See slide 4-contracts.

Assertions -- side-effect free

contracts


  • require -- test on delivery
  • promise -- test during development

Object invariance -- exceptions

  • invariant -- verify when needed

Global properties

-- requirements
  • interaction protocols -- formal specification

slide: Runtime consistency checking

The Eiffel language is the first (object-oriented) language in which assertions were explicitly introduced as a means to develop software and to monitor the runtime consistency of a system. Contracts as supported by Eiffel were primarily influenced by notions concerning the construction of correct programs. The unique contribution of [Meyer88] consists of showing that these notions may be employed operationally by specifying the pragmatic meaning of pre- and post-conditions defining the behavior of methods. To use assertions operationally, however, the assertion language must be restricted to side-effect free boolean expressions in the language being used.

Combined with a bottom-up approach to development, the notion of contracts gives rise to the following guidelines for testing. Post-conditions and invariance assertions should primarily be checked during development. When sufficient confidence is gained in the reliability of the object definitions, checking these assertions may be omitted in favor of efficiency. However, pre-conditions must be checked when delivering the system, to ensure that the user complies with the protocol specified by the contract. When delivering the system, it is a matter of contractual agreement between the deliverer and the user whether pre- and/or post-conditions will be enabled. The safest option is to enable them both, since the violation of a pre-condition may be caused by an undetected violated post-condition. In addition, the method of testing for identity transitions may be used to cover higher-level invariants involving multiple objects. To check whether the conditions with respect to complex interaction protocols are satisfied, explicit consistency checks need to be inserted by the programmer. See also section global-invariants.
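
By way of illustration, the require and promise annotations used earlier (for example in the sqrt of slide 4-sqrt) might be realized in C++ as macros that can be selectively enabled. This is a sketch only: the flag names CHECK_PRE and CHECK_POST are assumptions, whereas Eiffel offers corresponding options as part of the language.

  #include <stdio.h>
  #include <stdlib.h>

  // pre-conditions: kept enabled in delivered code
  #ifdef CHECK_PRE
  #define require(cond) \
    ((cond) ? (void)0 : (fprintf(stderr, "pre-condition violated: %s\n", #cond), abort()))
  #else
  #define require(cond) ((void)0)
  #endif

  // post-conditions (and, similarly, invariants): checked during development
  #ifdef CHECK_POST
  #define promise(cond) \
    ((cond) ? (void)0 : (fprintf(stderr, "post-condition violated: %s\n", #cond), abort()))
  #else
  #define promise(cond) ((void)0)
  #endif

Compiling with both flags enabled corresponds to the safest delivery option mentioned above, since a violated pre-condition may be the symptom of an undetected post-condition failure elsewhere.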