Complete contracts through models

July 2005

While this column often touches on general issues of software engineering and the IT industry, I will this time address a more technical question, which many people ask in connection with Design by Contract. It's the matter of how to formulate contracts that say not just the truth and only the truth, but all the truth, specifying all the relevant properties of a class. We may call them "complete contracts".

It is widely believed that complete contracts are not possible because of the simplicity of the assertion language in Eiffel (boolean expressions plus the "old" notation). Here I intend to show otherwise.

Many contracts that you find today in Eiffel programs and in libraries such as EiffelBase are not complete. More precisely, the preconditions are complete, especially in libraries, since a precondition is a commitment by a routine that it will properly handle any call that satisfies the requirements it states; you'd better then not forget anything you need for the routine to do its job properly. This is usually not a problem. But postconditions and class invariants do not always spell out all the properties ensured or maintained by a routine or class. Sometimes, it's out of laziness; but it may be that the author of the class finds the property too difficult or impossible to express. You will occasionally see an assertion replaced by a comment, as in

          -- No other element of the list has the same value

an admission that the property is important but that the class author couldn't express it formally.

Some people infer from such examples that the assertion language should be extended to support first-order predicate calculus: "for all" and "there exists" operations. But this is not an interesting development, because it loses the simplicity of assertions, the ability to monitor them at run time easily (if we use contracts as a debugging and testing tool), and does not address the core issue anyway: for many properties of interest, first-order is not enough. How do you express

          "The graph has no cycles"

in first-order logic?

A simple solution is available for such examples: use functions. Since a postcondition is a boolean expression, it can include calls to functions, for example to one that, on a graph, returns true if and only if the graph is acyclic. These are functions in the programming sense: routines producing a result. They must of course be of a "pure" nature, not modifying anything in their environment; the notion of Only postcondition clause, introduced by the ECMA standard, helps guarantee this. It is not hard for the author of a GRAPH class to add such a function, making it possible then to write a postcondition such as

            ensure
                       acyclic

which is now just as formal as any other, although it is no longer purely mathematical.

Sometimes one may prefer a clearer separation between the elements used in the implementation and those appearing in the contract. In that case a particularly useful technique is that of models. Let me explain how it works by taking a commonly cited example, that of a "push" operation on a class STACK (actually, STACK [G] with a generic parameter) as in the EiffelBase library. Typically the routine will read

           put (x: G)
                                -- Add `x' on top.
                     require
                                not is_full
                    do
                               ... Implementation (or deferred at the level
                               ... of class STACK)
                     ensure
                               count = old count + 1                 -- Clause 1
                               item = x                                                --
Clause 2
                               end

The precondition is complete. But the postcondition is not: its Clause 1 says that the stack gains an element (`count' is the number of elements) and Clause 2 that the top (as given by `item') is the element x we just pushed. That's the truth, it's only the truth, but it's not all the truth. It doesn't say that the other elements
-- those present before, at position 1 to `old count' -- haven't changed.

The general idea of Design by Contract has always been that -- unlike heavy-duty formal methods -- the method doesn't ask you to specify everything formally to get any benefits. Partial postconditions such as the ones above are already of great use, since they capture the essential elements of a specification. All the usual advantages of contracts are applicable: documentation, testing, disciplined inheritance, reuse support. And indeed the missing property seems the easiest to achieve: who in his right mind would implement a "push" operation by messing around with existing elements?

And yet we can't use this as an excuse if we want to take contracts to their full consequences and benefits. If we are talking about large-scale reuse, with components that can be downloaded from unknown sites, or used through Web Services, arguments of the form "Who in his right mind would...?" can't satisfy us any more, if only because there's a lot of guys nowadays in their wrong minds. A devious implementation could advertise a seemingly nice specification and wreak havoc with the data structure, for the greater enjoyment of hackers. Even in the absence of evil intent, history shows that even the most unlikely mixups will occasionally occur.

Can we make the specification complete in such a case? Extending the contract language might be a possibility, but it's easy to see that it will be very much "ad hoc" and that every new structure might suggest new assertion constructs (See the Object Contraint Language of UML for an example). In their book "Design by Contract, by Example" [2] Richard Mitchell and James McKim propose a technique based on extra routines from the class -- in the spirit of the function `acyclic' above -- but for such examples the specifications are recursive and I have never been able to understand how they can be justified mathematically.

It seems preferable to use the idea of model classes. A model class is an Eiffel class describing purely mathematical concepts; it can then serve to specify the abstract semantics (the contracts) of many other classes.

This technique is easy to illustrate on the STACK example. One of the useful model classes is SEQUENCE. It describes the notion of finite sequence elements, from mathematics. It has feature such as

          - count: s.count, for a sequence s, is the number of
                              items in s.

          - extended: s.extended (x) is another sequence made
                              of the items of s followed by the item x.
                              (As a consequence,
                                        s.extended (x).count = s.count + 1.)

           - is_empty: s.is_empty is true if and only if s has
                              no items (i.e. s.count = 0).

           - head: s.head, for non-empty s, is a sequence made
                              of all the items of s except the last.
                              (As a consequence, s.head.count = s.count - 1).


And so on. A critical requirement on such model classes is that they may contain no commands (other than creation procedures) and no attributes: they are purely "functional", being equipped with queries only. That's why SEQUENCE doesn't have a command `extend' that would modify a sequence to extend it with an item; only a query `extended' that, as specified above, defines a new sequence, without any idea of modifying the original.

In that respect, model classes are pure mathematical concepts, and their text indeed directly reflects a mathematical theory as can be extracted from math textbooks or from specification languages such as Z or B. It is particularly useful, however, to express them in Eiffel syntax, in line with the general notion of seamlessness that lies at the core of the Eiffel method; then we can use these model elements directly in the contracts of normal Eiffel classes, STACK implementations for example. It is also useful to provide an implementation, so that the usual run-time assertion monitoring, as used for debugging and testing, can continue to deliver its benefits -- even though, as you're probably going to suspect, we also want to go beyond testing and start thinking on proofs now that we are moving towards complete contracts. But wait.

Model classes, then, are direct representations of well-defined mathematical concepts, with rigorous semantics, directly usable in the contracts of ordinary Eiffel programs.

How do we take advantage of model classes? It suffices to express the specification of a class such as STACK in terms of a mathematical model of the underlying software objects. We can do this explicitly by introducing, in class STACK [G], a new query

       model: SEQUENCE [G]

Then we simply express the effect of any operation in terms of its effect on the model. Here for example is the new version of the `push' operation; the only change is the addition of a `Clause 0' to the postcondition:

       put (x: G)
                                 -- Add `x' on top.
                require
                          not is_full
               do
                         ... Implementation (or deferred at the level
                         ... of class STACK)
              ensure
                         model ~ old model.extended (x) -- Clause 0
                         count = old count + 1 -- Clause 1
                         item = x
-- Clause 2
                         end

(~, a new notation of the ECMA Eiffel standard, is object equality, with `a ~ b' meaning the same as `equal (a, b)'.)

The new clause simply says that, if the stack is modeled by a sequence with the convention that the last element is the top, then the effect of put (x) on the model is to replace it by the same sequence extended by x.

        -----   -----   -----   -----   -----       -----  
       |     | |     | |     | |     | |     |     |     |    
       |     | |     | |     | |     | |     |     |  x  |    
       |     | |     | |     | |     | |     |     |     |    
        -----   -----   -----   -----   -----       -----  
       <------------------------------------->
                        old model
       <------------------------------------------------->
                        old model.extended (x)


Note that the previous assertion clauses, Clause 1 and Clause 2, still hold -- of course -- but now become consequences of the new Clause 0, thanks to the properties of SEQUENCE (expressed of course by contracts in SEQUENCE itself -- contracts that come from the mathematical theory and that we take for granted, never expecting to have to verify them since they have been proved before us by distinguished mathematicians).

We will call the original clauses "abstract assertions" and the new ones "model assertions". The distinction facilitates our life in the context of inheritance
-- but that's for another article (see reference [1]).

To illustrate further how we specify our classes with model properties, the `pop' operation now becomes
       remove
                              -- Remove top element
                   require
                              not is_empty
                   do
                              ... Implementation (or deferred at the level
                              ... of class STACK)
                   ensure
                              model ~ old model.head             -- Model assertion
                              count = old count - 1                  -- Abstract
assertion
                              end

Here too the old abstract assertion becomes a consequence of the model assertion. It is not hard to see that all the contracts of a class such as STACK and its descendants can be made complete through this technique.

A model library covering the contract needs of many practical cases can be built out of a remarkably small number of concepts. Tobias Widmer has written such a library (see reference[3]) and used it to provide a version of a significant EiffelBase subset with complete contracts. (His work was a master's thesis at ETH; we are currently writing it up as an ordinary paper for wider distribution, but the thesis text already provides a good idea.)

There are many applications of this work, all being pursued. In particular:

       - It will be of great interest to have such a "fully contracted"
       version of the entire EiffelBase. This should be particularly
       attractive to EiffelBase users, who will benefit from having
       a practical library, covering critical daily needs of programming
       with a practical and high-performance implementation,
       and fully specified. The goal is not so far away given
       what Tobias has already done.

       - The application to testing is exciting. Design by Contract
       leads to a whole new way of performing tests, guided
       by assertions; the more extensive the assertions, the
       higher the likelihood of uncovering a bug (the testing
       equivalent of hitting gold).

       - Beyond tests, as already hinted, we really want proofs.
       Here complete contracts are a must, not just a nice addition.

On this exciting topic of proofs, it's useful to add that... But wait. I have to keep some material for later columns.

--Bertrand Meyer

References
----------

[1] Bertrand Meyer: A Framework for Proving Contract-Equipped Classes, in Abstract State Machines 2003, Advances in Theory and Practice, 10th International Workshop, Taormina (Italy), March 3-7, 2003, eds. Egon Börger, Angelo Gargantini, Elvinia Riccobene, Springer-Verlag, 2003, pages 108-125. Available at http://se.ethz.ch/~meyer/publications/proofs/framework.pdf.


[2] Richard Mitchell, James McKim: Design by Contract, By Example, Addison-Wesley, 2001.

[3] Tobias Widmer: Reusable Mathematical Models. Master's thesis, ETH Zurich, available at http://se.inf.ethz.ch/projects/tobias_widmer/.