Component = Contract + Implementation + Proof Obligation

August 2005

If you take a quick look at it, this EiffelWorld column will seem rather theoretical. In fact it's not; I'd like to explain what (in my
view) the most basic notion of computer science, "program", means.

Since we all know by now that a software system doesn't do one thing but provides an array of useful mechanisms -- in Eiffel, through
classes and their features -- we'll deal right away with a more general notion: "component". In pre-O-O programming a component is a program;
in the object world it is a feature for the present discussion, but the same ideas apply to classes.

This column usually broaches more down-to-earth topics of software engineering; I hope that once you get past the surprise of the
math in the present installment you understand its purpose: to highlight the theoretical challenge of responsible software
development, and explain what approaches are available for an effective programming process.


IS THIS EVEN LEGAL?
===================

You'll see a number of strange symbols throughout, but unless you believe math is inappropriate in a newsletter I guarantee that this column is DMCA-compliant, intended for the full family, and completely child-safe; no need to turn on your decency filters!

I do request, however, a bit of sympathy: EiffelWorld is distributed in plain text for all kinds of good reasons, and ASCII is not too conducive to set theory operators. For example the best I can do to suggest the "is a subset of" operator is |=; that's not very good. This and other operators will be introduced as we encounter them. Please bear with me, especially those of you in places where characters that seem legitimate on a US keyboard sometimes display bizarrely.

We'll try to do better in the Web version (typically updated a while after the publication of EiffelWorld - this time it might be in PDF). And of course the whole thing will appear in technical articles typeset properly down to the last pixel.


WOULD YOU HAVE DEFINED IT THAT WAY?
===================================

OK. So what's a program or, in our new world, a component? We'll assume a set S representing possible computation states. Then:

      ----------------------------------
      Definition: Program (or Component)

      A Program on S is the combination of:

                - A subset Pre of States, the "precondition",

                - Two binary relations between states (members of the
                set States <-> States of such relations):
                i (the "implementation"), and post (the "postcondition"),

      such that:

                [T1] Pre |= domain (i) |-| [domain (i - post)]'
      ----------------------------------

I know, this definition looks obscure and arbitrary, but just wait a bit for a form that will directly relate to the ordinary view of a program. Perhaps you can read it again when you have finished the article (if you finish it -- I can only hope). First, the conventions; this is where the ASCII mess kicks in:

      |= is "is a subset of": in other words, [T1] states that every
      element of Pre is an element of the right-hand side.

      |-| is my feeble attempt to draw an intersection symbol: the
      right-hand side contains all the elements in both domain (i)
      and [domain (i-post)]'.

      domain (r), for a relation r, is the set of elements that
      appear first in a pair: "Paris" is in the domain of both of the
      relations "is a capital of country" (because the pair "[Paris,
      France]" is in the relation) and "is first name of celebrity"
      (here I am not listing the full pair because of those decency
      filters), but is not in the domain of "is the name of a planned
      Olympic city".

      i - post, for two relations i and post, is their difference: the set
      of pairs that are in i but not in Post.

      X' is the complement of a subset X: the set of elements, here
      states, not in X.

      Square brackets are just for grouping, so that [domain (i - post)]'
      is the set of states other than those which participate
      (as first item of a pair) in i but don't participate in post.

Some fundamental terminology:

            ---------------------------------
            Definitions: contract, implements

            If Pre, i and post constitute a program:

                    [D2] Pre and post together are the "contract"
                    of the program.

                    [D3] i "implements" the contract.
            ---------------------------------

Now let's see how our basic definition relates to a more operational
view. It's through the following theorem:

      --------------------------------------------
      Theorem: Partial correctness and termination

      Pre, i and post constitute a program if and only if they
      satisfy the following two properties:

            [T4] i \ Pre |= post -- Partial correctness

            [T5] Pre |= domain (i) -- Termination

      --------------------------------------------


There's one more notation here:

      i \ Pre denotes the restriction of i to Pre: the relation
      obtained from i by only retaining pairs whose first element is
      in Pre.

I'll skip proofs of this and other theorems (doing them is a good opportunity to refresh your high-school set theory).

Thanks to the theorem, [T4] and [T5] can also serve as a definition of the notion of program. [T4] expresses that i, the implementation, is compatible with the postcondition post for any input satisfying the precondition Pre; note that both i and post are relations - sets of state pairs - whereas Pre is a set of states. [T5] expresses that i will always produce at least one result for any element satisfying the precondition. The combination of these properties is called total correctness. The definition of "program" assumes total correctness; it is indeed useless to write programs that either do not terminate or do not satisfy their contract.

We may, or not, limit f to delivering just one result for every input state:

      ---------------------------------
      Definition: Deterministic program

      A program is deterministic if its implementation i is a
      function.
      ---------------------------------

This framework yields simple definitions for some fundamental notions:

      ---------------------------------
      Definitions: Weakest precondition, strongest postcondition,
      Most Abstract Implementation

      Given two of Pre, i and post, respectively an arbitrary subset
      of S and two arbitrary relations in S <-> S:

               [D6] i \ Pre, written Pre sp i, is the "strongest
               postcondition" of Pre and i.

               [D7] [i |-| post]-1 (S), written i wp post, is the "weakest
               precondition" of i and post.

               [D8] post \ Pre is the "Most Abstract Implementation" of
               the contract defined by Pre and post.
      ---------------------------------

Notations used here (the last ones we'll need):


      r-1 is the inverse of a relation r: the relation made of the same
      pairs, but all reversed.

      r (A), for a set A of states, is the image of A by r: the set of
      states y such that the pair [x, y] is in r for some element x of A.

      (Reminder) |-| is intersection.

The following theorems capture the importance of these definitions:

      ----------------------------­
      Theorems: contract properties

      [T9] Pre sp i is the smallest relation post such that
                    Pre, i and post define a program.

      [T10] i wp post is the largest set Pre such that Pre, i and
                     post define a program.

      [T9] The Most Abstract Implementation of a contract defined
                    by Pre and post is the largest relation i such that Pre,
                    i and post define a program.
      ----------------------------­

Another theorem tells us that we are entitled to use the term "Implementation" in "Most Abstract Implementation":

      ----------------------------­--------­
      Theorems: Most Abstract Implementation
      [T10] The Most Abstract Implementation of a contract
      implements that contract.
      ----------------------------­--------­

("Implements" was defined above: [D3].)

The following notion helps understand this concept better:

      ----------------------
      Definition: Refinement

      A program of precondition Pre1 and implementation i1 refines
      a program of precondition Pre and implementation i if it
      satisfies the following conditions:

      [T11] Pre |= Pre1

      [T12] i1 \ Pre |= i
      ----------------------

The postcondition does not intervene in this definition. [T12] indicates that the refining program will give compatible values for any state in Pre.

As an example of refinement, consider a program with a non-deterministic implementation i. We can obtain a deterministic implementation of the same contract by removing pairs from i: for every x such that i contains two or more pairs starting with x, we keep only one of them (chosen in any way we like). This new program, with a deterministic implementation, is a refinement of the original.

We note three important and straightforward properties:

      ------------------------------
      Theorem: Refinement properties

      [T13]  Refinement is an order relation.

      [T14]  Any refinement of a program implementing a given contract
      also implements that contract.

      [T15]  Any program implementing a given contract is a refinement
      of its Most Abstract Implementation.
      ------------------------------

[T15] is the justification for the name "Most Abstract Implementation".


PROGRAMMING AND PROGRAMMING LANGUAGES
=====================================

I hope you appreciate the simplicity of the framework, based on elementary mathematical concepts (and that you are still with me). Let's look at what they mean from a software engineering perspective.

To program is to write programs satisfying the above definition, [T2] or [T4]/[T5]. The contract represents the goal of the program, as advertised to its users; the implementation represents theoperations that will run on the computer. The definition ensures that the implementation matches the contract.

If the contract is given, programming consists of solving an equation, [T2], of which i is the unknown. The equation has a trivial solution:

      i = post \ Pre

usually non-deterministic. The reason why that solution
is uninteresting, and programming stands as an interesting problem, is
the practical difference between contract and implementation:

      - Pre and post, describing a problem to be solved, can make use
      of arbitrary mathematical mechanisms.

      - As to i, it represents a computation to be carried out by a computer:
      to express it, we limit ourselves to a set of fixed mechanisms.

Such a set of mechanisms has a name: Programming language.

      --------------------------------
      Definition: Programming language

      A programming language is a set of relations; it is deterministic
      if all these relations are functions.

      In practice, a programming language is defined in terms of:

      [D16]  The specification of the set S of possible states.

      [D17]  A set of basic relations (functions in the
      deterministic case).

      [D18] A set of operations for obtaining new relations
      (resp. functions) from known ones.
      --------------------------------

The idea is that the basic relations [D17] and the operations [D18] can all be executed by computers using available technology.


APPROACHES TO PROGRAMMING
=========================

The almost universal approach to programming today outside of the Eiffel community ignores the Pre and post elements of the definition, concentrating only on building implementations i from a programming language with the hope that in some informal sense they will match the corresponding user needs. We may call this the "hacking approach"; it has little in principle to commend itself.

At the other extreme, a "refinement approach" has emerged. If we set out to implement a given contract, theorem [T9] tells us that we may use the contract itself - specifically, post \ Pre - as its own firstimplementation: it is indeed the "Most Abstract Implementation", as defined above, for the contract. We may define refinement as the process of starting with this first version and repeatedly taking advantage of theorems [T13] and [T14] to choose a subset of the previous implementation, and extend Pre if convenient, until reaching an implementation that belongs to the programming language - that is to say, is a combination of its basic relations through its operations.

This approach is inappropriate in practice since it neglects three important concerns of software development as an engineering activity:

            - Hindsight: the specification is seldom known completely in advance. This is not necessarily a mark of incompetent software engineering, but a result of the phenomenon, known to anyone who has developed significant software, that the very process of implementation suggests new elements of specification -"esprit de l'escalier" as discussed in "Object-Oriented Software Construction".

            - Extendibility: even if the specification is originally clear, it usually changes as a project progresses and after initial deliveries. If a change affects a property that was used in an early step of the refinement process, it becomes necessary to redo much of the work and rearchitecture the system, even if to an outsider
the change is conceptually small. This was one of the problemsof older top-down approaches to software construction, which object technology set out to address.

            - Reusability: Few projects can afford to develop all their software from scratch; reuse is one of the most productive practices of modern software development. A top-down refinement process, starting from the specification of a problem to get an implementation, does not easily take into account implementations produced previously for variants or subsets of the problem.

While the "hacking" approach ignores the contract; the refinement approach fails to recognize the importance of implementations. In the view proposed here, contracts and implementations are equally important parts of the notion of contract.

An effective software development process must accommodate a bottom-up element, supporting reuse. Such a process should reject both the impracticality of the refinement approach and the lack of rigor of the "hacking" approach. Instead one should take programming as what the name suggests: building programs; not just specifications to be refined into implementations, but also not just implementations.

From the preceding discussion a program is defined by the combination of three elements:

      1. A specification, the contract Pre, post.

      2. An implementation, i.

      3. A proof obligation; the requirement to prove that the
      implementation meets the contract, as stated by [T2].

This provides a definition of the general process:

      -----------------------
      Definition: Programming

      Programming is the process of devising interesting contract-
      implementation pairs and discharging the associated proof
      obligations.
      -----------------------


The starting point for any step in the process may indifferently be:

- A contract element, for which we have to devise a satisfactory implementation; this is the top-down style.

- An existing implementation element with no contract or (as is often the case in current Eiffel libraries) an incomplete contract; this is a common situation in practice and there is nothing wrong with it as long as we eventually provide a proper contract and discharge the proof obligation. This is the bottom-up style.

This approach seems to yield the necessary flexibility while accommodating the need for rigor and proofs. We don't yet do automated formal proofs in the practice of Eiffel software development, but Eiffel research will increasingly make this goal realistic; in the meantime we can apply to program and components, with more or less mathematical rigor as circumstances permit, the slogan that serves as the title for this article and embodies, I think, much of what software development is about:

      --------------------------------------------------------
      Component = Contract + Implementation + Proof obligation
      --------------------------------------------------------

--Bertrand Meyer