The end of Configuration Management as we know it
May 2007
Because software engineering research sometimes appears remote or
even esoteric, it is easy to forget that some ideas once
considered leading-edge have forever altered, for the better, the way we
produce software. Hardly a better example may be found than
configuration management. Not so long ago software engineering experts
had to hector programmers to use CM tools, and remind them of the
dangers of disregarding that advice, as attested by the many examples of
software catastrophes due to nothing more subtle than deploying version
5.2 of module A together with version 5.1 of module B, no longer
compatible with it. If there's a success story of software engineering
principles this is it: the hectoring is no longer necessary, as the
industry has understood and internalized the practice of configuration
management. Even teams who have never considered process disciplines
such as Capability Maturity Models (of which CM is a required component)
know to keep all their code and other essential project artifacts in a
version management repository; this spread of the CM culture has wiped
out an entire class of errors.
Yet I believe that CM as we practice it today will soon go away.
But first here is a short reminder (not a comprehensive definition,
there are textbooks for that) of what it's all about. Configuration
management involves tool-supported control of the principal artifacts of
a project, including (in the software case) the program modules but not
limited to them. The mention of tools is essential, as one can hardly
conceive of CM without automated help. The classical tools are of two
general categories, each with roots going back to now legendary tools
developed at Bell Labs in the grand days of Unix.
The first category covers automated production tools, which regenerate
the elements needed for system deployment, such as executable code, from
their inputs, such as source code, by running the appropriate tools,
such as a compiler. The seminal
idea is to provide a descriptive, non-procedural specification: rather
than a script to compile, link and so on -- in other words a program
prescribing the exact sequence of production steps -- you specify the
dependencies between artifacts and let the CM tool schedule the
operations. The epitome of such tools is Stuart Feldman's original
"Make", whose current avatars, still very close to the original idea,
remain among the most universally used software tools today. Eiffel
users, by the way, don't need Make for the Eiffel part of their
software, because the concepts are built into EiffelStudio: when you
have changed some part of the software (from within EiffelStudio or
through some outside tool), EiffelStudio knows exactly, based on its
analysis of the dependencies between classes, the minimum subset to
recompile.
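To make the descriptive idea concrete, here is a minimal Python sketch.
It is not Make's actual algorithm: it assumes the dependencies are given
as a dictionary and that the set of changed artifacts is already known,
whereas Make compares file timestamps. The names plan_rebuild, deps and
changed are invented for this illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan_rebuild(deps, changed):
    """Return the targets to rebuild, prerequisites first.

    deps maps each target to the set of artifacts it is produced from.
    changed is the set of artifacts modified since the last build.
    """
    # static_order() yields every node after all of its prerequisites.
    order = TopologicalSorter(deps).static_order()
    stale = set(changed)
    plan = []
    for node in order:
        # A target is stale as soon as one of its inputs is stale.
        if node in deps and any(d in stale for d in deps[node]):
            stale.add(node)
            plan.append(node)
    return plan


# Hypothetical project: two object files and one executable.
deps = {
    "a.o": {"a.c", "common.h"},
    "b.o": {"b.c", "common.h"},
    "app": {"a.o", "b.o"},
}
print(plan_rebuild(deps, {"a.c"}))  # ['a.o', 'app']
```

The caller states only what depends on what; the sequence of operations,
and the decision to leave b.o alone, fall out of the dependency graph.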
CM tools of the second category address revision control: the ability to
record successive versions of a product or product part, making it
possible to revive any earlier version ("what was the text of class XYZ
on 24 April 2004?"), for example to find out which change broke
something that was previously working. Revision control also allows
various people to work on the same product part, such as a class or any
other file; each of them will "check out" the part to produce a personal
copy, work on that copy, then "check in" the result back into the shared
repository. This includes the possibility of creating different
variants, or "branches", of a product. At check-in time the CM tool will
signal divergent changes made by different people, so that they can be
reconciled or arbitrated on the spot. Here the original tool was SCCS,
built by Marc Rochkind, but it has been supplanted by Walter Tichy's RCS
(Revision Control System) and newer variants such as CVS (built on top
of RCS) and Subversion, as well as commercial tools such as Microsoft's
SourceSafe. Note that the key idea that made these tools possible is that
the tool does not need to store all successive versions, only their
differences, more commonly called "diffs"; it can then rely on
appropriate algorithms to reconstruct older versions on demand by
applying these diffs in reverse order. Even with today's greatly relaxed space
constraints, this remains essential to make revision control practical:
just think of how many terabytes it would take to store the full version
history of just one release of EiffelStudio (not to mention Windows or
Linux).
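The diff idea can be sketched in a few lines of Python. This is only an
illustration under simplifying assumptions, not the storage format of
SCCS, RCS or any other real tool: the latest version is kept in full,
and each earlier version is kept as a delta that turns its successor
back into it. The names History, make_delta and apply_delta are invented
here; the line matching comes from the standard difflib module.

```python
from difflib import SequenceMatcher


def make_delta(source, target):
    """Describe how to turn `source` into `target` (lists of lines).

    Unchanged runs are stored as index ranges into `source`; only lines
    that `source` lacks are stored as text.
    """
    delta = []
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            delta.append(("add", target[j1:j2]))
        # "delete": nothing to store, the lines are simply not copied.
    return delta


def apply_delta(source, delta):
    """Rebuild the target version from `source` and a delta."""
    result = []
    for op in delta:
        if op[0] == "copy":
            result.extend(source[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result


class History:
    """Full text of the newest version plus one reverse delta per step."""

    def __init__(self, lines):
        self.head = list(lines)
        # reverse_deltas[k] turns version k + 1 back into version k.
        self.reverse_deltas = []

    def commit(self, new_lines):
        self.reverse_deltas.append(make_delta(new_lines, self.head))
        self.head = list(new_lines)

    def checkout(self, version):
        """Return version number `version` (0 is the initial one)."""
        lines = self.head
        # Walk back from the newest version, newest delta first.
        for delta in reversed(self.reverse_deltas[version:]):
            lines = apply_delta(lines, delta)
        return list(lines)
```

Checking out the newest version costs nothing; older versions cost one
delta application per step back, which is usually the right trade-off
since recent versions are the ones requested most often.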
I have heard configuration management presented as the one software
engineering principle that no serious practitioner can afford to ignore.
This is largely true. I must admit, however, that while I have always
advocated revision control -- not just for code, but for any shared
documents, including for example course slides -- I have done so a bit
in the "do as I say" mode, since personally I lack the patience to use tool
interfaces that I find awkward (although I will refrain from naming
products here), and to go through the repeated routine of check-in and
check-out. I just want to work on a document, and assume that someone is
doing the bookkeeping for me. I used to be uneasy about this view, but a
new generation of tools shows that it will be the way of the future.
It is indeed not conceptually necessary to do the check-ins and
check-outs. If you are working on a shared product part with the help of
a sufficiently sophisticated tool, you should not have to worry about
the possibility that someone else is also changing that part. The tool
can give you the impression that you are the only one manipulating it,
while carefully tracking all that you -- and any others sharing the part
with you -- are doing, in particular all the "diffs". In that way you
can still benefit from the major advantages of a revision control
system: the availability of a full history of the part's evolution, the
ability to undo any change and go back to any earlier version, the
economy of space permitted by the use of diffs. The reason this approach
is realistic is that in practice even when two people are working on the
same part it's only rarely that they actually step on each other's toes;
most of the time they are working on different pieces of the product,
for example different features of a class. So it makes sense to take an
optimistic approach (as in transaction processing) which only requires
clarification from the users when it detects an actual conflict, and the
rest of the time just handles all modifications silently and
efficiently.
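Here is a minimal Python sketch of such an optimistic scheme, under the
assumption that a product part can be viewed as a set of named pieces
(say, the features of a class) and that the tool remembers the common
version both users started from. The function merge and the sample data
are hypothetical; real tools typically work at the level of lines or
characters rather than named pieces.

```python
def merge(base, mine, theirs):
    """Three-way merge of parts keyed by name.

    Each argument maps a piece name to its text; a missing key means the
    piece does not exist in that version. Returns (merged, conflicts).
    A piece is in conflict only when both sides changed it, and changed
    it differently.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(mine) | set(theirs)):
        b, m, t = base.get(key), mine.get(key), theirs.get(key)
        if m == t:        # untouched, or the same change on both sides
            value = m
        elif m == b:      # only the other user changed this piece
            value = t
        elif t == b:      # only I changed this piece
            value = m
        else:             # actual conflict: the users must decide
            conflicts.append(key)
            value = b     # keep the common version in the meantime
        if value is not None:
            merged[key] = value
    return merged, conflicts


base = {"deposit": "v1", "withdraw": "v1"}
mine = {"deposit": "v2", "withdraw": "v1"}
theirs = {"deposit": "v1", "withdraw": "v2", "balance": "v1"}
merged, conflicts = merge(base, mine, theirs)
print(merged)     # {'balance': 'v1', 'deposit': 'v2', 'withdraw': 'v2'}
print(conflicts)  # []
```

In the common case, as in this example, the two sets of changes touch
different pieces and the list of conflicts is empty, so nobody needs to
be interrupted.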
Two examples of current frameworks show that this approach is practical,
at least for traditional documents (rather than programs). One is the
Wiki phenomenon (Wikis in general, not just Wikipedia): you edit a page
to your heart's content and then just save it. Only occasionally do you
run into an edit conflict. The key is that behind this appearance of
free modification there is a full CM infrastructure: all changes are
logged, essentially as with a revision control system. But while this
approach fundamentally relies on configuration management it is an
unobtrusive form of CM, which happens automatically, behind the scenes,
providing the fundamental advantages of a shared repository with full
history and rollback facilities without the hassle of check-out and
check-in.
The Wiki mechanism still imposes a reduced form of this hassle ("Edit"
is a simplified check-out and "Save" a simplified check-in). In the
second example, even these have disappeared. With Google Docs, you can
share a document with colleagues, and then edit it concurrently. There
is no explicit check-out or check-in. This is a powerful collaboration
mechanism that we use extensively at Eiffel Software; it is well adapted
to our distributed mode of development.
These two examples are still imperfect and are only initial steps in the
evolution, which I believe to be inexorable, towards what we may call
"configuration management for the masses" or "invisible configuration
management". They are designed for traditional documents and will need
to be adapted for programs, which have slightly different requirements.
Such adaptations will need to retain some of the mechanisms of old-style
CM tools; for example a programmer may want to work on a local version
of a class for a while, perhaps without access to the network, then
check the class back in. But the general mindset is the right one: don't
bother users with CM concepts when they don't need them; just let them
work on their products, silently keep track of all their changes to
allow going back to any earlier state, and intervene only when detecting
a conflict that cannot otherwise be resolved.
It will be exciting to introduce such mechanisms into EiffelStudio and
of course we will be happy to talk with any open-source contributor
willing to help with this.
For both facets of configuration management -- automated production à la
Make, silently handled by EiffelStudio with no user intervention, and
automated version management à la SCCS/RCS/CVS -- the new solutions will
soon, while keeping the same fundamental techniques in the background,
put an end to configuration management as we know it.
-- Bertrand Meyer