Validating software

When validating a system, a number of aspects play a role. First, it must be determined whether the software satisfies the original requirements and goals set by the user, as specified during analysis. Second, it must be established whether the system meets the specification laid down in the design document. The latter is usually referred to as `verification'. Verification is only one of the aspects of validation, since validation is meant to establish whether the system is a good system in terms of user satisfaction, whereas the term verification is generally used to describe the process of establishing the correctness of software in a more formal sense. A third aspect of validation concerns the robustness of the system, that is, the extent to which it can handle exceptional circumstances, such as excessive amounts of data and heavy workloads.

Software quality

Structural criteria


slide: Software quality

Validation is primarily concerned with the functional properties of a system. Questions that need to be answered are: Is the system capable of doing what it is expected to do? And is the user satisfied with how the system does it? In practice, the validation of a system is often restricted to functionality and user interface issues. However, other criteria, related to structural properties, are important as well. See slide 4-quality. For example, a customer may want to know that the system may easily be adapted to changing circumstances or different platforms. Also, the customer may be interested in reusing components of the system to develop other applications. Actually, with the trend shifting from single applications to application frameworks and libraries, structural criteria are becoming increasingly important, since they determine the ease and reliability with which components may be used as building blocks for composing new applications. Correspondingly, the verification of the components constituting a system (or library) will become more important as well. Testing is still the most widely used method for experimentally verifying the functional properties of a piece of software. In general, testing involves the execution of a program rather than formal methods of proof. In the sections that follow we will investigate what benefits we may expect from an object-oriented approach when it comes to testing.

Test methods

Testing is a relatively well-established discipline. It may be defined as {\em the process of judging experimentally whether a system or its components satisfy particular requirements}. The requirements are often laid down as a set of test data with which the system must comply. Testing, essentially, is a way to expose errors. However, passing a test suite may simply indicate that the test suite itself is feeble. Standard definitions of testing usually involve test cases and test procedures. Test cases define the input-output behavior of a program and test procedures specify how a program may be validated against a set of test data. [Smith90] note that the computation model of input-output transformations underlying traditional conceptions of testing is not entirely adequate for object-oriented systems. Instead, they propose to define the testing of object-oriented programs as {\em the process of exercising the routines provided by an object with the goal of uncovering errors in the implementation of the routines or the state of the object, or both}. Three levels can be distinguished in the process of testing: a strategic level, which is primarily concerned with identifying the risks, that is, the potentially dangerous components of a system that need to be validated with extra care; a tactical level, which defines an appropriate test method and test set for each component; and, finally, an operational level, consisting of the actual execution of the tests and the evaluation of the test results. To decide which components involve high risks, metrics such as those described in section metrics may be of great value. See slide 4-testing.

Testing

-- strategy, tactics, operational

Stop-criteria

-- minimize effort, maximize confidence

Paradigms


slide: Testing

As a rule, good testing practice is intended to minimize the effort of producing and performing tests (in terms of time, cost and the number of tests), while maximizing the number of errors detected and (most importantly) the confidence in the software (in terms of the tests successfully passed). One of the crucial decisions in testing is when to stop. Testing may halt either when the test results indicate that the system or component tested needs further improvement, or when the test results indicate that the system or component is sufficiently reliable. In principle, it is impossible to decide with absolute certainty that a piece of software is completely error-free. Usually, the particular method used will indicate when to stop testing. As a general stop-criterion, the ratio between errors found and test time may be used: when the effort needed to detect another error reaches a certain limit, the system may be considered reliable. Needless to say, there is a subjective element involved in the decision to stop testing. We can distinguish between four different paradigms of testing. We may consider it sufficient to demonstrate that the software behaves as required. However, this must be regarded as a very weak notion of testing. More appropriate, generally, is to construct tests with the actual intention of detecting errors. Although this may seem a rather destructive attitude towards software, it is the only way to gain confidence in the reliability of an actual system. However, already in the earlier stages of software development we may look for ways to reduce potential errors, by means of evaluation procedures such as those discussed in section static. A step further in this direction would be to adopt a paradigm that actually prevents the occurrence of faults in the design and code. However, this requires a formal framework which, for object-oriented programming, has not yet been fully developed. See section formal.
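As an illustration of such a stop-criterion, consider the following minimal sketch, which decides to stop testing when the average effort per detected error exceeds a given limit. The function name and the figures used are hypothetical, introduced only for the sake of the example.

  #include <iostream>

  // hypothetical stop-criterion: stop testing when the (average) effort
  // needed to detect one more error exceeds a given limit, expressed
  // here in test hours per error
  bool stop_testing( double test_hours, int errors_found, double limit ) {
      if ( errors_found == 0 )
          return test_hours > limit;             // no errors found within the limit
      return test_hours / errors_found > limit;  // average effort per error
  }

  int main() {
      // e.g. 40 hours of testing, 2 errors found, limit of 16 hours per error
      std::cout << std::boolalpha << stop_testing( 40, 2, 16 ) << std::endl; // true
      return 0;
  }
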

Black-box versus white-box testing

Traditionally, two approaches to testing can be distinguished. One approach is concerned only with the functionality, that is, the input-output behavior of the component or system. The other approach also takes the actual structure of the software into account. The first approach is known as black-box testing; the second as white-box testing, since the contents of the box may, as it were, be inspected. See slide 4-box.

Black-box testing

-- functional (test design)
  • equivalence classes
  • extremes

White-box testing

-- structural (dataflow)
  • instruction coverage
  • branch coverage
  • condition coverage

slide: Black-box and white-box testing

To make black-box testing manageable, equivalent input is usually grouped in classes, from which a representative element is chosen when performing the actual test. In particular, attention needs to be paid to extremes, which may be regarded as equivalence classes with only a single element.

Specification

  • r == sqrt(x) & x >= 0 <=> r^2 == x

Implementation

  float sqrt( float x ) {
      require( x >= 0 );                        // pre-condition
      const float eps = 0.0001;
      float guess = 1;
      while( fabs( x - guess * guess ) > eps )
          guess = ( guess + x / guess ) / 2;
      promise( guess * guess - x <= eps );      // post-condition
      return guess;
  }
  

slide: Specification and implementation

For example, when testing the function sqrt as specified in slide 4-sqrt, a distinction may be made between input arguments greater than zero, precisely zero, and less than zero. This results in three cases that must be tested. For example, the input values -2, 0 and 4 may be chosen. It could be argued, however, that the value 1 should be treated as another extreme, since sqrt behaves as the identity on 1. As another example, imagine that we wish to test a function that sorts an array of integers, of maximal length, say, 1000. See slide 4-bubble. First, we need to select a number of different lengths, say 0, 1, 23 and 1000. For the latter two cases, we have the choice of filling the array with random-valued numbers, numbers in increasing order or numbers in decreasing order. For each of these distinct possibilities we need to select a number of test cases. The assumption underlying the use of equivalence classes is that one representative of a class is as good as any other. However, this works only when the assumptions on which our partition is based are sound. Moreover, our confidence will probably grow with the number of tests actually carried out.
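As an illustration, the following is a minimal sketch of a test driver exercising the sqrt function on a representative of each of the equivalence classes identified above. The names my_sqrt (renamed to avoid a clash with the standard library) and close are assumptions introduced for the sake of the example; the driver checks the specification r^2 == x within the tolerance eps.

  #include <assert.h>

  float my_sqrt( float x );  // the sqrt function of slide 4-sqrt, renamed here

  // check the specification r^2 == x, within tolerance eps
  static bool close( float r, float x, float eps ) {
      float d = r * r - x;
      return ( d < 0 ? -d : d ) <= eps;
  }

  int main() {
      const float eps = 0.0001;
      assert( close( my_sqrt(4), 4, eps ) );  // representative of the class x > 0
      assert( close( my_sqrt(0), 0, eps ) );  // extreme: x == 0
      assert( close( my_sqrt(1), 1, eps ) );  // extreme: x == 1 (identity)
      // a representative of the class x < 0, for example -2, should be
      // rejected by the pre-condition require( x >= 0 ) rather than yield a result
      return 0;
  }
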
  void bubble( int r[], int length ) {
      int k = length;
      int sorted = 0;
      while ( ( k > 0 ) && !sorted ) {
          sorted = 1;
          for( int j = 0; j < k - 1; j++ )      // compare adjacent elements
              if ( r[j] > r[j+1] ) {
                  swap( r[j], r[j+1] );         // swap exchanges two integers
                  sorted = 0;
              }
          k = k - 1;
      }
  }

Input

  • 100% instruction coverage -- 5,3
  • 100% condition coverage -- 5,3,7

slide: The bubble function

White-box testing usually involves the notion of a computation graph, relating the different parts of the program by means of a flow-diagram. For white-box testing, criteria are used such as instruction coverage (showing that the test set executes each instruction at least once), branch coverage (showing that each possible branch in the program is taken at least once), or condition coverage (showing that the test set causes each condition to be true and false at least once). These criteria impose increasingly strong requirements on the coverage of the flow-graph of the program and hence require more extensive testing to achieve complete coverage. For example, when we consider the bubble sorting routine above, the array with values 5,3 results in 100% instruction coverage, but not in 100% condition coverage, since the condition r[j] > r[j+1] will never be false. However, taking as input the array consisting of 5,3,7, we do have 100% condition coverage as well.
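The two test cases mentioned above may be exercised by a small driver along the following lines. The driver and the is_sorted helper are assumptions added for illustration; they merely run bubble on the arrays 5,3 and 5,3,7 and check that the results are in non-decreasing order.

  #include <assert.h>

  void bubble( int r[], int length );  // the routine of slide 4-bubble

  // check that an array is in non-decreasing order
  static bool is_sorted( const int r[], int length ) {
      for ( int j = 0; j < length - 1; j++ )
          if ( r[j] > r[j+1] ) return false;
      return true;
  }

  int main() {
      int a[] = { 5, 3 };     // 100% instruction coverage, condition never false
      int b[] = { 5, 3, 7 };  // 100% condition coverage, condition both true and false
      bubble( a, 2 );
      bubble( b, 3 );
      assert( is_sorted( a, 2 ) );
      assert( is_sorted( b, 3 ) );
      return 0;
  }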

The test cycle

Testing, like so many other things in software development, is usually an iterative process. A complete test cycle may be characterized as consisting of testing the functionality of each module, integration testing (to check whether the combination of modules has the desired effect), testing the system as a whole, and acceptance testing (in order to get user approval). See slide 4-cycle.

Test cycle

  • module testing
  • integration testing
  • functional testing
  • system testing
  • acceptance testing

System testing

  • facilities, volume, stress, usability, security, performance

slide: The test cycle

System testing involves checking whether the system provides all the facilities required by the user and whether the user interface is satisfactory. Other aspects that may be of importance are the extent to which a system can cope with large volumes of data, whether it performs well on a heavily loaded network, whether it provides a sufficient level of security and whether the performance of the system is adequate in ordinary circumstances. For object-oriented software, both the criteria and the procedures of testing will be virtually the same. However, with respect to component testing (and to some extent, integration testing and functional testing), we may expect significant differences.

Testing and inheritance

One of the most prominent claims made by advocates of an object-oriented approach is that code may easily and reliably be reused, even without access to the source code. This claim suggests that the inherited part of the code need not be re-tested. An example will be given, however, showing that this is only partially true. See slide 4-inheritance. Like most such examples, it is a contrived one, but what it shows is that the correct behavior of a class can depend upon accidental properties of the class that may no longer hold when the code is reused in a different context.

Testing and inheritance

  • inherited code must be re-tested!

Because

  • subclass may affect inherited instance variables
  • superclass may use redefined (virtual) methods

slide: Testing and inheritance

As a general rule, inherited code must be re-tested. One reason for this is that a subclass may affect inherited instance variables. This is especially a problem when using a language that does not provide encapsulation against derived classes, such as Eiffel; in Eiffel, however, appropriate pre-conditions may guard against such violations by derived classes. In contrast, C++ does allow such encapsulation (by means of the keyword private), but inherited instance variables may still be accessed when they are declared protected or when a method returns a (non const) reference. See section 2-references. Another reason not to assume that inherited code is reliable is that the base class may employ virtual functions which may be redefined by the derived class. Redefining a virtual function may violate the assumptions underlying the definition of the base class or may conflict with the accidental properties of the base class, resulting in erroneous behavior.

Example -- violating the invariant

The example shown in slide 4-ex-inh-1 illustrates that redefining a virtual function, even in a very minor way, may lead to a violation of the invariant of the base class. Actually, the invariant ( n >= 0 ) is an accidental property of the class, due to the fact that the square of a number, whether positive or negative, is never negative.
  class A {                       // invariant A: n >= 0
  public:
      A() { n = 0; }
      int value() { return next(n); }
      void strange() { next(-3); }
  protected:
      virtual int next( int i ) { return n = n + i * i; }
      int n;
  };

  class B : public A {            // not invariant A
  public:
      B() : A() { }
  protected:
      virtual int next( int i ) { return n = n + (n + 1) * i; }
  };

slide: Violating the invariant

Testing instances of class A will not reveal that the invariant is based on incorrect assumptions, since whatever input is used, invoking value() will always result in a non-negative number. However, when an instance of B is created, invoking strange() violates the invariant ( n >= 0 ), resulting in an error. See slide 4-ex-inh-2.

Test cases

  A* a = new A; a->value(); a->strange(); a->value();   // ok

  A* b = new B; b->value(); b->strange(); b->value();   // error

Dynamic binding

  int f(A* a) {
  	a->strange();
  	return a->value();
  }
  

slide: Test cases

The example illustrates what happens when instances of a derived class (B) do not behaviorally conform to their base class (A). The penalty of non-conformance is, as the example clearly shows, that functions defined for inputs of the base class no longer behave reliably, since instances of derived classes (although legally typed) may violate the assumptions pertaining to the base class. As an aside, it should be noted that the problems illustrated above would not have occurred so easily if the invariant and the behavior of the base and derived classes had been made explicit by means of a client-server contract. Moreover, annotating the methods with the proper pre- and post-conditions would allow automatic monitoring of the runtime consistency of the objects.
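A minimal sketch of what such runtime monitoring could look like is given below, using assert-style checks in place of a full contract mechanism; the invariant member function is an addition introduced only for the sake of the example.

  #include <assert.h>

  class A {
  public:
      A() { n = 0; assert( invariant() ); }
      int value() { int r = next(n); assert( invariant() ); return r; }
      void strange() { next(-3); assert( invariant() ); }
  protected:
      virtual bool invariant() { return n >= 0; }   // the (accidental) invariant of A
      virtual int next( int i ) { return n = n + i * i; }
      int n;
  };

  class B : public A {
  public:
      B() : A() { }
  protected:
      // redefinition that no longer preserves the invariant of A; the assert
      // in strange() will now signal the violation at runtime
      virtual int next( int i ) { return n = n + (n + 1) * i; }
  };

With these checks in place, the call b->strange() of slide 4-ex-inh-2 would trigger the assertion instead of silently corrupting the state of the object.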