Test methods

Testing is a relatively well-established discipline. It may be defined as {\em the process of judging experimentally whether a system or its components satisfy particular requirements}. The requirements are often laid down as a set of test data with which the system must comply. Testing, essentially, is a way to expose errors. Note, however, that passing a test suite may simply indicate that the test suite is a feeble one. Standard definitions of testing usually involve test cases and test procedures. Test cases define the input-output behavior of a program, and test procedures specify how a program may be validated against a set of test data.

[Smith90] note that the computational model of input-output transformations underlying traditional conceptions of testing is not entirely adequate for object-oriented systems. Instead, they propose to define the testing of object-oriented programs as {\em the process of exercising the routines provided by an object with the goal of uncovering errors in the implementation of the routines or the state of the object, or both}.

Three levels can be distinguished in the process of testing. The strategic level is primarily concerned with identifying the risks, that is, the potentially dangerous components of a system that need to be validated with extra care. To decide which components involve high risks, metrics such as those described in section metrics may be of great value. The tactical level defines, for each component, an appropriate test method and test set. Finally, the operational level consists of the actual execution of the tests and the evaluation of the test results. See slide 4-testing.

Testing -- strategy, tactics, operational

Stop-criteria -- minimize effort, maximize confidence

Paradigms


slide: Testing

As a rule, good testing practice aims to minimize the effort of producing tests (in terms of time and costs), the number of tests, and the effort of performing the tests, while maximizing the number of errors detected and, most importantly, the confidence in the software (in terms of the tests successfully passed).

One of the crucial decisions in testing is when to stop. Testing may halt either when the test results indicate that the system or component tested needs further improvement, or when the test results indicate that it is sufficiently reliable. In principle, it is impossible to decide with absolute certainty that a piece of software is completely error-free. Usually, the particular method used will indicate when to stop testing. As a general stop-criterion, the ratio between errors found and test time spent may be used: when the effort needed to detect another error exceeds a certain limit, the system may be considered reliable. Needless to say, there is a subjective element in the decision to stop testing.

We can distinguish between four different paradigms of testing. We may consider it sufficient to demonstrate that the software behaves as required. However, this must be regarded as a very weak notion of testing. Generally, it is more appropriate to construct tests with the actual intention of detecting errors. Although this may seem a rather destructive attitude towards software, it is the only way to gain confidence in the reliability of an actual system. Already in the earlier stages of software development, we may look for means to reduce potential errors by evaluation procedures such as those discussed in section static. A step further in this direction would be to adopt a paradigm that actually prevents the occurrence of faults in the design and code. However, this requires a formal framework that has not yet been fully developed for object-oriented programming. See section formal.
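To make the stop-criterion concrete, it may be phrased as a simple computation over the test log. The sketch below is ours, purely for illustration; the function name stop_testing, the hypothetical detection times and the ten-hour limit are all assumptions, not part of any standard method.

  #include <stdio.h>

  /* Sketch of the errors-versus-test-time stop-criterion. times[i]
     holds the cumulative test hours spent when the i-th error was
     detected (hypothetical data). */
  int stop_testing( const double times[], int n, double limit ) {
    if( n < 2 ) return 0;   /* too little data to decide */
    double last_gap = times[n-1] - times[n-2];  /* cost of last error */
    return last_gap > limit;  /* stop when the cost exceeds the limit */
  }

  int main() {
    double log[] = { 1.0, 2.5, 4.0, 9.0, 21.0 };  /* hours */
    /* stop when finding one more error costs over 10 hours */
    printf( "%s\n", stop_testing( log, 5, 10.0 ) ? "stop" : "continue" );
    return 0;
  }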

Black-box versus white-box testing

Traditionally, two approaches to testing can be distinguished. One approach is concerned only with the functionality, that is, the input-output behavior of the component or system. The other approach takes into account the actual structure of the software as well. The first approach is known as black-box testing; the second as white-box testing, since the contents of the box may, as it were, be inspected. See slide 4-box.

Black-box testing -- functional

test design

White-box testing -- structural (dataflow)


slide: Black-box and white-box testing

To make black-box testing manageable, equivalent inputs are usually grouped into classes, from which a representative element is chosen when performing the actual test. In particular, attention needs to be paid to extremes, which may be regarded as equivalence classes with only a single element.

Specification

  • r == sqrt(x) & x >= 0 <=> r^2 == x

Implementation

  #include <assert.h>

  #define require(p) assert(p)  /* precondition check */
  #define promise(p) assert(p)  /* postcondition check */

  double fabs( double );  /* from the math library; math.h itself is
                             not included, since it also declares a
                             sqrt that clashes with ours */

  float sqrt( float x ) {
    require( x >= 0 );
    const float eps = 0.0001;
    float guess = 1;
    while( fabs( x - guess * guess ) > eps )  /* Newton's method */
      guess = ( guess + x / guess ) / 2;
    promise( fabs( guess * guess - x ) <= eps );
    return guess;
  }
  

slide: Specification and implementation

For example, when testing the function sqrt as specified in slide 4-sqrt, a distinction may be made between input arguments greater than zero, precisely zero, and less than zero. This results in three cases that must be tested. For example, input values -2, 0 and 4 may be chosen. It could be argued, however, that the value 1 should be treated as another extremum, since sqrt behaves as the identity on 1.
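A black-box test driver for sqrt may then simply run one representative per equivalence class. The driver below is a sketch under two assumptions: the routine of slide 4-sqrt is linked in, and the negative case -2 is omitted here because a violated require aborts the program, so it needs a separate (expected-failure) run.

  #include <stdio.h>

  float sqrt( float );    /* the routine of slide 4-sqrt */
  double fabs( double );  /* from the math library (math.h omitted,
                             as on slide 4-sqrt) */

  int main() {
    /* one representative per equivalence class: the extrema 0 and 1,
       and 4 as a representative of arbitrary positive input */
    float cases[] = { 0, 1, 4 };
    const float eps = 0.0001;  /* same tolerance as the routine */
    for( int i = 0; i < 3; i++ ) {
      float r = sqrt( cases[i] );
      /* check the specification r^2 == x up to eps */
      printf( "sqrt(%g) = %g : %s\n", cases[i], r,
              fabs( r * r - cases[i] ) <= eps ? "ok" : "FAIL" );
    }
    return 0;
  }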

As another example, imagine that we wish to test a function that sorts an array of integers, of maximal length, say, 1000. First, we need to select a number of different lengths, say 0, 1, 23 and 1000. For the latter two cases, we may fill the array with random-valued numbers, with numbers in increasing order, or with numbers in decreasing order. For each of these distinct possibilities we need to select a number of test cases. The assumption underlying the use of equivalence classes is that one representative of a class is as good as any other. However, this works only when the assumptions on which our partition is based are sound. Moreover, our confidence will be stronger the more tests are actually carried out.

White-box testing usually involves the notion of a computation graph, which relates the different parts of the program by means of a flow diagram. For white-box testing, criteria are used such as instruction coverage (showing that the test set executes each instruction at least once), branch coverage (showing that each possible branch in the program is taken at least once), and condition coverage (showing that the test set causes each condition to be both true and false at least once).
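The difference between these criteria can be made concrete with a small example. The function below is ours, purely for illustration; the test sets mentioned in the comments are the point of the exercise.

  #include <stdio.h>

  /* toy function with a compound condition */
  int classify( int a, int b ) {
    if( a > 0 && b > 0 )  /* && short-circuits: b>0 is only
                             evaluated when a>0 holds */
      return 1;
    return -1;
  }

  int main() {
    /* instruction coverage: { (1,1), (0,0) } executes every statement;
       branch coverage: the same pair takes both branches at least once;
       condition coverage: each of a>0 and b>0 must be both true and
       false -- since && short-circuits, (0,0) never evaluates b>0,
       so a case such as (1,0) has to be added */
    int tests[][2] = { {1,1}, {0,0}, {1,0} };
    for( int i = 0; i < 3; i++ )
      printf( "classify(%d,%d) = %d\n", tests[i][0], tests[i][1],
              classify( tests[i][0], tests[i][1] ) );
    return 0;
  }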

The test cycle

Testing, as so many other things in software development, is usually an iterative process. A complete test cycle may be characterized as consisting of module testing (testing the functionality of each module), integration testing (to check whether the combination of modules has the desired effect), functional and system testing (testing the system as a whole), and acceptance testing (in order to get user approval). See slide 4-cycle.

Test cycle

  • module testing
  • integration testing
  • functional testing
  • system testing
  • acceptance testing

System testing

  • facilities, volume, stress, usability, security, performance

slide: The test cycle

System testing involves checking whether the system provides all the facilities required by the user and whether the user interface is satisfactory. Other aspects that may be of importance are the extent to which a system can cope with large volumes of data, whether it performs well on a heavily loaded network, whether it provides a sufficient level of security, and whether the performance of the system is adequate in ordinary circumstances. For object-oriented software, the criteria as well as the procedures of testing will be virtually the same. However, with respect to component testing (and, to some extent, integration testing and functional testing), we may expect significant differences.