The original title of this chapter was 'Testing of Implementation'. The title was changed because the word 'testing' is often used in a rather loose fashion, and is therefore open to misinterpretation. Before proceeding, a few definitions are in order.
Verification is the term given to the activity of proving a program correct (Denvir, 1986).
Validation is the demonstration that the latest stage of the design satisfies the requirements imposed by the previous stage; in other words, it is the conducting of an experiment to refute the hypothesis that a design meets its requirements. When the design is implemented, the usual form of validation activity is testing (based on Denvir, 1986).
Testing is a trial used to demonstrate that the process or structure being tested is not incorrect (my definition). If a test does not show that a requirement has not been met (ie it does not 'fail'), then our confidence in the correctness of a piece of code is increased. Tests can never show that a piece of code is 100% correct, only that it has not been found to contain errors (so far). Alternatively, a test can show that a piece of code is wrong.
Reification is the process of moving from an abstract design to a concrete implementation. An alternative term for this is refinement, but I prefer the former, because as has been pointed out by Jackson, quoted in Jones (1980), the latter is something of a misnomer.
In the initial phase of reification, the formal ADT specification was used in conjunction with the Requirements Document to design the functions. The functions were designed in a rather ad hoc fashion, rather than by a sound methodology such as 'constructive design' or designing for testability (both of which are explained later), although in mitigation there was nothing in the final implementation of the functions that had been specified but could not be tested. The reason for this lack of formality was user pressure to produce code; to design the functions in a more rigorous manner would have involved a delay longer than the user felt was acceptable.
The verification of the implementation was done by means of tests. The first part of this section discusses how these were selected, the second part explains the exact methods that were used, and the final part reviews the user acceptance tests, which are a separate matter, inasmuch as a product may meet its requirements, and still not be acceptable to the user.
What should we test for? There are many levels of error, some more critical than others. For example, if the specification called for a particular message to be terminated with a full stop, and in the implementation this full stop were omitted, it would probably be of no consequence (if it were even noticed). Clearly this is of an altogether different nature from situations that cause the system to 'crash'. (A word of warning would not be out of place here: the type of error just mentioned could be disastrous in a different situation, perhaps an expert system advisor whose advice omits the word 'not'.) At least two levels of 'correctness' are indicated here, but there could be more, if we so decide.
According to Denvir (1986), since the purpose of tests is to refute the hypothesis that a program is correct, these tests should concentrate on the conditions that are likely to lead to errors, that is, the boundary conditions.
For each function, tests were done to cover all boundary conditions, which was possible because of the limited number of such conditions. An incremental approach to testing was taken (Downs, Clare and Coe, 1988), and after each function (or sometimes a group of interdependent functions) had been tested in isolation, it was incorporated into a test program, so that its operation in a realistic setting could be examined. Note that at this stage it was not possible to test all conditions, but recall that the requirements called for the development of a set of functions, and whether an author uses these correctly or not is something over which there is only limited control (namely, the generation of Author Errors).
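By way of illustration only (the original test code is not reproduced here; the function name, its limits and the choice of C as implementation language are all assumptions of mine), a boundary-condition test of this kind might take the following form:

    #include <stdio.h>

    /* Stub standing in for a hypothetical function under test:
       it accepts a screen column in the range 1..80 and returns
       0 on success or -1 for an out-of-range argument. */
    static int set_cursor_column(int column)
    {
        if (column < 1 || column > 80)
            return -1;
        /* the real cursor movement would happen here */
        return 0;
    }

    int main(void)
    {
        /* Boundary values: just outside, on, and just inside each limit */
        int cases[] = { 0, 1, 2, 79, 80, 81 };
        int i;

        for (i = 0; i < (int)(sizeof cases / sizeof cases[0]); i++)
            printf("set_cursor_column(%d) returned %d\n",
                   cases[i], set_cursor_column(cases[i]));

        return 0;
    }

The test values are chosen so that each limit is approached from both sides as well as hit exactly, which is where Denvir's argument suggests errors are most likely to be found.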
A general method that was used to test a function consisted of embedding the function in a loop. For each loop cycle, test data would be obtained via the keyboard, echoed to the screen, and then passed to the function. The action of the function was then observed.
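A minimal sketch of such a harness follows; it is illustrative only, since the function shown ('double_it') is invented and C is assumed as the implementation language:

    #include <stdio.h>

    /* Stub standing in for a hypothetical function under test. */
    static int double_it(int n)
    {
        return 2 * n;
    }

    int main(void)
    {
        int value;

        /* One cycle per test case: obtain data from the keyboard,
           echo it to the screen, pass it to the function, and
           display the result so that its action can be observed. */
        for (;;) {
            printf("Enter a test value (any non-number to stop): ");
            if (scanf("%d", &value) != 1)
                break;
            printf("Input echoed: %d\n", value);
            printf("double_it(%d) = %d\n", value, double_it(value));
        }

        return 0;
    }

Because the test data are supplied interactively, many parameter values can be tried against a single compiled harness.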
An objection can be raised to this method, for it does not reflect exactly the way in which functions are used in practice. However, since integrated tests were done after this stage of testing, the objection is at least partially met. The main reason for choosing this method, rather than using the function to be tested within a test program directly, was that it was far faster (by two orders of magnitude) than recompiling a program for each change made to the parameter values of a function.
The company's project supervisor was unable to attend the user acceptance tests, but his representative, Mr J Metcalf, has seen the operation of the functions contained in the example program (refer to User Manual), and found them acceptable. The User Manual, although not part of the original requirements, was also approved.
Timing considerations were not part of the requirements, but as far as an end-user is concerned the response is instantaneous, and response times have therefore been found acceptable.
Chapter eight contains details of the company's revised requirements in the light of the behaviour of the prototype functions, as well as comments on the functions themselves.
The ultimate purpose of verification is to prove that there are no errors in the implementation: that the code does what it is designed to do, nothing more, nothing less. This section raises a number of questions about the process of verification, and then briefly mentions some of the current approaches to overcoming the problems of verification.
The user for whom a system is being developed will no doubt have told the developer the system requirements. A document should have been written which states these, and which will have been 'signed off' by the user. However, the document is written in a natural language, open to misunderstanding or misinterpretation, and for this reason a formal specification should have been developed, which is mathematically provable (as far as self-consistency and completeness are concerned, at least). However, in most cases this will not be understandable to the user. How is the developer to be sure that his/her interpretation of the requirements is correct? Conversely, how is the user to know that the specification is correct?
At a lower level, what if the requirements are wrong (ie not consistent), or incomplete, or even inappropriate (maybe there is a better way of doing things)?
Let us assume that there are no problems with either the requirements, or the specification of requirements. Can we then be sure that the design of the code fulfils exactly the specification? And finally, especially with large systems, how can we be sure that the code does exactly what the designer (or designers) thinks it does? Not every pathway can be tested, just as not every combination of inputs and outputs can be checked.
As far as I am aware, there is no guaranteed solution to the problem of interpretation; it is not open to mathematical demonstration. At most, by repeated discussion with a user who is unfamiliar with the specification tools used, a developer can have reasonable confidence that the system design does what the user wants. Similar considerations apply to omissions from the requirements. After all, as many authors have pointed out, the software life cycle is not simply a linear progression, but an iteration. An omission is noted, and rectified; an inconsistency is found, and altered.
Inconsistencies in the requirements should be spotted during the specification stage (assuming that the specifier's interpretation is the same as the user's (inconsistent!) view), since the specification is supposed to be mathematically demonstrable as self-consistent.
Software Requirements Engineering Methodology (SREM) is an approach to program specification based on the idea of 'designing for testability' (Alford, 1982, cited in Downs, Clare and Coe, 1988). Essentially, the idea is to define the requirements in terms of measurable input and output only, which is in a way self-evident: if something cannot be measured (tested), there is not much point in specifying it in the first place.
Denvir suggests that programs should be designed using 'constructive design', that is, designing according to a proof. This is basically a two-stage process. First, a series of assertions is stated about a program, and then the spaces between the assertions are filled with program code that can be proved to fulfil the assertion following it, this assertion being treated as a post-condition and the preceding assertion as a pre-condition.
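A small, invented example may make this clearer. The two assertions are written here as C 'assert' statements (the choice of C, and the division routine itself, are mine rather than Denvir's), and the loop between them is the code that must be proved to carry the first assertion into the second:

    #include <assert.h>
    #include <stdio.h>

    /* Integer division by repeated subtraction, written 'to a proof':
       the assertions were stated first and the loop filled in between. */
    static int int_divide(int dividend, int divisor)
    {
        int quotient;
        int remainder;

        /* First assertion, treated as the pre-condition */
        assert(dividend >= 0 && divisor > 0);

        quotient  = 0;
        remainder = dividend;

        /* The code between the assertions maintains the relation
           dividend == quotient * divisor + remainder throughout */
        while (remainder >= divisor) {
            remainder -= divisor;
            quotient++;
        }

        /* Second assertion, treated as the post-condition */
        assert(dividend == quotient * divisor + remainder
               && remainder >= 0 && remainder < divisor);

        return quotient;
    }

    int main(void)
    {
        printf("17 divided by 5 gives %d\n", int_divide(17, 5));
        return 0;
    }
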
An alternative to trying to reach an error-free state in program testing is that of proving the correctness of programs. The specification has to be written in a formal language, and the logic of the final system is compared to this specification, such that all logical possibilities are covered.
According to Downs, Clare and Coe (1988) there are three drawbacks to this approach. First, there is the problem of knowing that the specification itself is correct. Secondly, proofs of correctness are often long and error-prone. Finally, it is not the actual implementation that is being proved, but rather the process logic.
Clearly there are many difficulties involved in the verification of a program, but there are also a number of partial solutions, which ought to be used where suitable.