Component = Contract + Implementation + Proof Obligation
August 2005
If you take a quick look at it, this EiffelWorld column will seem rather
theoretical. In fact it's not; I'd like to explain what (in my
view) the most basic notion of computer science, "program", means.
Since we all know by now that a software system doesn't do one thing but
provides an array of useful mechanisms -- in Eiffel, through
classes and their features -- we'll deal right away with a more general notion:
"component". In pre-O-O programming a component is a program;
in the object world it is a feature for the present discussion, but the same
ideas apply to classes.
This column usually broaches more down-to-earth topics of software engineering;
I hope that once you get past the surprise of the
math in the present installment you understand its purpose: to highlight the
theoretical challenge of responsible software
development, and explain what approaches are available for an effective
programming process.
IS THIS EVEN LEGAL?
===================
You'll see a number of strange symbols throughout, but unless you believe math
is inappropriate in a newsletter I guarantee that this column is DMCA-compliant,
intended for the full family, and completely child-safe; no need to turn on your
decency filters!
I do request, however, a bit of sympathy: EiffelWorld is distributed in plain
text for all kinds of good reasons, and ASCII is not too conducive to set theory
operators. For example the best I can do to suggest the "is a subset of"
operator is |=; that's not very good. This and other operators will be
introduced as we encounter them. Please bear with me, especially those of you in
places where characters that seem legitimate on a US keyboard sometimes display
bizarrely.
We'll try to do better in the Web version (typically updated a while after the
publication of EiffelWorld - this time it might be in PDF). And of course the
whole thing will appear in technical articles typeset properly down to the last
pixel.
WOULD YOU HAVE DEFINED IT THAT WAY?
===================================
OK. So what's a program or, in our new world, a component? We'll assume a set S
representing possible computation states. Then:
----------------------------------
Definition: Program (or Component)
A Program on S is the combination of:
- A subset Pre of States, the "precondition",
- Two binary relations between states (members of the
set States <-> States of such relations):
i (the "implementation"), and post (the "postcondition"),
such that:
[T1] Pre |= domain (i) |-| [domain (i - post)]'
----------------------------------
I know, this definition looks obscure and arbitrary, but just wait a bit for a
form that will directly relate to the ordinary view of a program. Perhaps you
can read it again when you have finished the article (if you finish it -- I can
only hope). First, the conventions; this is where the ASCII mess kicks in:
|= is "is a subset of": in other words, [T1]
states that every
element of Pre is an element of the right-hand
side.
|-| is my feeble attempt to draw an intersection
symbol: the
right-hand side contains all the elements in both
domain (i)
and [domain (i-post)]'.
domain (r), for a relation r, is the set of
elements that
appear first in a pair: "Paris" is in the domain
of both of the
relations "is a capital of country" (because the
pair "[Paris,
France]" is in the relation) and "is first name
of celebrity"
(here I am not listing the full pair because of
those decency
filters), but is not in the domain of "is the
name of a planned
Olympic city".
i - post, for two relations i and post, is their
difference: the set
of pairs that are in i but not in Post.
X' is the complement of a subset X: the set of
elements, here
states, not in X.
Square brackets are just for grouping, so that
[domain (i - post)]'
is the set of states other than those which
participate
(as first item of a pair) in i but don't
participate in post.
Some fundamental terminology:
---------------------------------
Definitions:
contract, implements
If Pre, i and
post constitute a program:
[D2] Pre and post together are the "contract"
of the program.
[D3] i "implements" the contract.
---------------------------------
Now let's see how our basic definition relates to a more operational
view. It's through the following theorem:
--------------------------------------------
Theorem: Partial correctness and termination
Pre, i and post constitute a program if and only
if they
satisfy the following two properties:
[T4] i \ Pre
|= post -- Partial correctness
[T5] Pre |=
domain (i) -- Termination
--------------------------------------------
There's one more notation here:
i \ Pre denotes the restriction of i to Pre: the
relation
obtained from i by only retaining pairs whose
first element is
in Pre.
I'll skip proofs of this and other theorems (doing them is a good opportunity to
refresh your high-school set theory).
Thanks to the theorem, [T4] and [T5] can also serve as a definition of the
notion of program. [T4] expresses that i, the implementation, is compatible with
the postcondition post for any input satisfying the precondition Pre; note that
both i and post are relations - sets of state pairs - whereas Pre is a set of
states. [T5] expresses that i will always produce at least one result for any
element satisfying the precondition. The combination of these properties is
called total correctness. The definition of "program" assumes total correctness;
it is indeed useless to write programs that either do not terminate or do not
satisfy their contract.
We may, or not, limit f to delivering just one result for every input state:
---------------------------------
Definition: Deterministic program
A program is deterministic if its implementation
i is a
function.
---------------------------------
This framework yields simple definitions for some fundamental notions:
---------------------------------
Definitions: Weakest precondition, strongest
postcondition,
Most Abstract Implementation
Given two of Pre, i and post, respectively an
arbitrary subset
of S and two arbitrary relations in S <-> S:
[D6] i \ Pre, written Pre sp i, is the "strongest
postcondition" of Pre and i.
[D7] [i |-| post]-1 (S), written i wp post, is the "weakest
precondition" of i and post.
[D8] post \ Pre is the "Most Abstract Implementation" of
the contract defined by Pre and post.
---------------------------------
Notations used here (the last ones we'll need):
r-1 is the inverse of a relation r: the relation
made of the same
pairs, but all reversed.
r (A), for a set A of states, is the image of A
by r: the set of
states y such that the pair [x, y] is in r for
some element x of A.
(Reminder) |-| is intersection.
The following theorems capture the importance of these definitions:
----------------------------
Theorems: contract properties
[T9] Pre sp i is the smallest relation post such
that
Pre, i and post define a program.
[T10] i wp post is the largest set Pre such that
Pre, i and
post define a program.
[T9] The Most Abstract Implementation of a
contract defined
by Pre and post is the largest relation i such that Pre,
i and post define a program.
----------------------------
Another theorem tells us that we are entitled to use the term "Implementation"
in "Most Abstract Implementation":
------------------------------------
Theorems: Most Abstract Implementation
[T10] The Most Abstract Implementation of a
contract
implements that contract.
------------------------------------
("Implements" was defined above: [D3].)
The following notion helps understand this concept better:
----------------------
Definition: Refinement
A program of precondition Pre1 and implementation
i1 refines
a program of precondition Pre and implementation
i if it
satisfies the following conditions:
[T11] Pre |= Pre1
[T12] i1 \ Pre |= i
----------------------
The postcondition does not intervene in this definition. [T12] indicates that
the refining program will give compatible values for any state in Pre.
As an example of refinement, consider a program with a non-deterministic
implementation i. We can obtain a deterministic implementation of the same
contract by removing pairs from i: for every x such that i contains two or more
pairs starting with x, we keep only one of them (chosen in any way we like).
This new program, with a deterministic implementation, is a refinement of the
original.
We note three important and straightforward properties:
------------------------------
Theorem: Refinement properties
[T13] Refinement is an order relation.
[T14] Any refinement of a program
implementing a given contract
also implements that contract.
[T15] Any program implementing a given
contract is a refinement
of its Most Abstract Implementation.
------------------------------
[T15] is the justification for the name "Most Abstract Implementation".
PROGRAMMING AND PROGRAMMING LANGUAGES
=====================================
I hope you appreciate the simplicity of the framework, based on elementary
mathematical concepts (and that you are still with me). Let's look at what they
mean from a software engineering perspective.
To program is to write programs satisfying the above definition, [T2] or
[T4]/[T5]. The contract represents the goal of the program, as advertised to its
users; the implementation represents theoperations that will run on the
computer. The definition ensures that the implementation matches the contract.
If the contract is given, programming consists of solving an equation, [T2], of
which i is the unknown. The equation has a trivial solution:
i = post \ Pre
usually non-deterministic. The reason why that solution
is uninteresting, and programming stands as an interesting problem, is
the practical difference between contract and implementation:
- Pre and post, describing a problem to be
solved, can make use
of arbitrary mathematical mechanisms.
- As to i, it represents a computation to be
carried out by a computer:
to express it, we limit ourselves to a set of
fixed mechanisms.
Such a set of mechanisms has a name: Programming language.
--------------------------------
Definition: Programming language
A programming language is a set of relations; it
is deterministic
if all these relations are functions.
In practice, a programming language is defined in
terms of:
[D16] The specification of the set S of
possible states.
[D17] A set of basic relations (functions
in the
deterministic case).
[D18] A set of operations for obtaining new
relations
(resp. functions) from known ones.
--------------------------------
The idea is that the basic relations [D17] and the operations [D18] can all be
executed by computers using available technology.
APPROACHES TO PROGRAMMING
=========================
The almost universal approach to programming today outside of the Eiffel
community ignores the Pre and post elements of the definition, concentrating
only on building implementations i from a programming language with the hope
that in some informal sense they will match the corresponding user needs. We may
call this the "hacking approach"; it has little in principle to commend itself.
At the other extreme, a "refinement approach" has emerged. If we set out to
implement a given contract, theorem [T9] tells us that we may use the contract
itself - specifically, post \ Pre - as its own firstimplementation: it is indeed
the "Most Abstract Implementation", as defined above, for the contract. We may
define refinement as the process of starting with this first version and
repeatedly taking advantage of theorems [T13] and [T14] to choose a subset of
the previous implementation, and extend Pre if convenient, until reaching an
implementation that belongs to the programming language - that is to say, is a
combination of its basic relations through its operations.
This approach is inappropriate in practice since it neglects three important
concerns of software development as an engineering activity:
- Hindsight:
the specification is seldom known completely in advance. This is not necessarily
a mark of incompetent software engineering, but a result of the phenomenon,
known to anyone who has developed significant software, that the very process of
implementation suggests new elements of specification -"esprit de l'escalier" as
discussed in "Object-Oriented Software Construction".
-
Extendibility: even if the specification is originally clear, it usually changes
as a project progresses and after initial deliveries. If a change affects a
property that was used in an early step of the refinement process, it becomes
necessary to redo much of the work and rearchitecture the system, even if to an
outsider
the change is conceptually small. This was one of the problemsof older top-down
approaches to software construction, which object technology set out to address.
-
Reusability: Few projects can afford to develop all their software from scratch;
reuse is one of the most productive practices of modern software development. A
top-down refinement process, starting from the specification of a problem to get
an implementation, does not easily take into account implementations produced
previously for variants or subsets of the problem.
While the "hacking" approach ignores the contract; the refinement approach fails
to recognize the importance of implementations. In the view proposed here,
contracts and implementations are equally important parts of the notion of
contract.
An effective software development process must accommodate a bottom-up element,
supporting reuse. Such a process should reject both the impracticality of the
refinement approach and the lack of rigor of the "hacking" approach. Instead one
should take programming as what the name suggests: building programs; not just
specifications to be refined into implementations, but also not just
implementations.
From the preceding discussion a program is defined by the combination of three
elements:
1. A specification, the contract Pre, post.
2. An implementation, i.
3. A proof obligation; the requirement to prove
that the
implementation meets the contract, as stated by
[T2].
This provides a definition of the general process:
-----------------------
Definition: Programming
Programming is the process of devising
interesting contract-
implementation pairs and discharging the
associated proof
obligations.
-----------------------
The starting point for any step in the process may indifferently be:
- A contract element, for which we have to devise a satisfactory implementation;
this is the top-down style.
- An existing implementation element with no contract or (as is often the case
in current Eiffel libraries) an incomplete contract; this is a common situation
in practice and there is nothing wrong with it as long as we eventually provide
a proper contract and discharge the proof obligation. This is the bottom-up
style.
This approach seems to yield the necessary flexibility while accommodating the
need for rigor and proofs. We don't yet do automated formal proofs in the
practice of Eiffel software development, but Eiffel research will increasingly
make this goal realistic; in the meantime we can apply to program and
components, with more or less mathematical rigor as circumstances permit, the
slogan that serves as the title for this article and embodies, I think, much of
what software development is about:
--------------------------------------------------------
Component = Contract + Implementation + Proof
obligation
--------------------------------------------------------
--Bertrand Meyer
|