The end of Configuration Management as we know it

May 2007

Because software engineering research sometimes appears remote or even esoteric, it is easy to forget that some ideas once considered leading-edge have forever altered, for the better, the way we produce software. There is hardly a better example than configuration management. Not so long ago, software engineering experts had to hector programmers into using CM tools and remind them of the dangers of ignoring that advice, as attested by the many software catastrophes caused by nothing more subtle than deploying version 5.2 of module A together with version 5.1 of module B, which was no longer compatible with it. If software engineering principles have a success story, this is it: the hectoring is no longer necessary, as the industry has understood and internalized the practice of configuration management. Even teams that have never considered process disciplines such as Capability Maturity Models (of which CM is a required component) know to keep all their code and other essential project artifacts in a version management repository; this spread of the CM culture has wiped out an entire class of errors.

Yet I believe that CM as we practice it today will soon go away.

But first, here is a short reminder of what it's all about (not a comprehensive definition; there are textbooks for that). Configuration management is the tool-supported control of a project's principal artifacts: in the software case, the program modules, but not only them. The mention of tools is essential, as one can hardly conceive of CM without automated help. The classical tools fall into two general categories, each with roots going back to now-legendary tools developed at Bell Labs in the grand days of Unix.

The first category covers automated production tools, which recreate the elements needed for system deployment, such as executable code, from the elements needed to produce them, such as source code, by applying the appropriate tools, such as a compiler. The seminal idea is to provide a descriptive, non-procedural specification: rather than writing a script to compile, link and so on (in other words, a program prescribing the exact sequence of production steps), you specify the dependencies between artifacts and let the CM tool schedule the operations. The epitome of such tools is Stuart Feldman's original "Make", whose current avatars, still very close to the original idea, remain among the most widely used software tools today. Eiffel users, by the way, don't need Make for the Eiffel part of their software, because the concepts are built into EiffelStudio: when you have changed some part of the software (from within EiffelStudio or through some outside tool), EiffelStudio knows exactly, from its analysis of the dependencies between classes, the minimal subset to recompile.
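
To make the "specify dependencies, let the tool schedule" idea concrete, here is a minimal sketch in Python. It is not how Make or EiffelStudio is actually implemented: it assumes Make-style rules (each target lists its prerequisites) and Make-style timestamps to decide staleness, and the rule names and the `plan_build` function are hypothetical. The standard library's `graphlib.TopologicalSorter` supplies an order in which prerequisites always come before the targets built from them.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical rules: each target maps to the artifacts it is produced from.
RULES: dict[str, list[str]] = {
    "app": ["main.o", "util.o"],
    "main.o": ["main.c", "util.h"],
    "util.o": ["util.c", "util.h"],
}


def plan_build(rules: dict[str, list[str]], mtime: dict[str, int]) -> list[str]:
    """Return the targets that must be rebuilt, in a valid dependency order.

    A target is stale if it does not exist yet, if one of its prerequisites
    is newer than it, or if one of its prerequisites is itself being rebuilt.
    """
    # The graph maps each node to its predecessors, so static_order()
    # yields prerequisites before the targets that depend on them.
    order = TopologicalSorter(rules).static_order()
    rebuilt: set[str] = set()
    plan: list[str] = []
    for node in order:
        if node not in rules:
            continue  # a source file: nothing to produce
        deps = rules[node]
        stale = (
            node not in mtime
            or any(d in rebuilt for d in deps)
            or any(mtime.get(d, 0) > mtime[node] for d in deps)
        )
        if stale:
            rebuilt.add(node)
            plan.append(node)
    return plan


if __name__ == "__main__":
    # Only main.c has changed since the last build.
    times = {"main.c": 5, "util.c": 1, "util.h": 1, "main.o": 2, "util.o": 2, "app": 3}
    print(plan_build(RULES, times))  # prints ['main.o', 'app']
```

The user never writes the sequence of steps: editing `main.c` leads the tool to recompile only `main.o` and relink `app`, leaving `util.o` alone, which is the "minimal subset" idea in miniature.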

CM tools of the second category address revision control: the ability to record successive versions of a product or product part, making it possible to revive any earlier version ("what was the text of class XYZ on 24 April 2004?"), for example to find out which change broke something that previously worked. Revision control also allows several people to work on the same product part, such as a class or any other file: each of them "checks out" the part to obtain a personal copy, works on that copy, then "checks in" the result to the shared repository. This includes the possibility of creating different variants, or "branches", of a product. At check-in time the CM tool signals divergent changes made by different people, so that they can be reconciled or arbitrated on the spot. Here the original tool was SCCS, built by Marc Rochkind; it has since been supplanted by Walter Tichy's RCS (Revision Control System) and newer tools such as CVS (built on top of RCS) and Subversion, as well as commercial tools such as Microsoft's SourceSafe. The key idea that made SCCS possible is that the tool does not need to store every successive version in full, only the differences between them, commonly called "diffs" (or deltas), and can rely on appropriate algorithms to reconstruct any older version on demand; RCS, for example, keeps the latest version in full and obtains earlier ones by applying the stored diffs in reverse order. Even with today's greatly relaxed space constraints, this remains essential to make revision control practical: just think of how many terabytes it would take to store, in full, every version in the history of just one release of EiffelStudio (not to mention Windows or Linux).
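
The diff-based storage scheme can be sketched in a few lines of Python. This is an illustration of the RCS-style "reverse delta" approach (SCCS itself uses a different, interleaved delta format): the newest version is kept in full, and each commit stores only a backward diff from the new version to its predecessor. The class and function names are hypothetical; the diffs are computed with the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

# A delta is a list of (i1, i2, lines): replace src[i1:i2] with lines.
Delta = list[tuple[int, int, list[str]]]


def make_delta(src: list[str], dst: list[str]) -> Delta:
    """Record only the regions where dst differs from src."""
    sm = SequenceMatcher(a=src, b=dst, autojunk=False)
    return [(i1, i2, dst[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def apply_delta(src: list[str], delta: Delta) -> list[str]:
    """Rebuild dst from src and a delta produced by make_delta(src, dst)."""
    out: list[str] = []
    pos = 0
    for i1, i2, new_lines in delta:
        out.extend(src[pos:i1])   # copy the unchanged stretch
        out.extend(new_lines)     # insert the replacement lines
        pos = i2                  # skip the replaced or deleted lines
    out.extend(src[pos:])
    return out


class ReverseDeltaStore:
    """Newest version stored in full; back[k] turns version k+1 into version k."""

    def __init__(self, first: list[str]):
        self.head = list(first)
        self.back: list[Delta] = []

    def commit(self, new: list[str]) -> int:
        new = list(new)
        self.back.append(make_delta(new, self.head))  # diff from new to old head
        self.head = new
        return len(self.back)  # number of the version just committed

    def checkout(self, version: int) -> list[str]:
        text = self.head
        # Walk backwards from the newest version, one stored diff at a time.
        for delta in reversed(self.back[version:]):
            text = apply_delta(text, delta)
        return list(text)
```

Reverse deltas favor the common case: checking out the latest version costs nothing, while reviving an old one (the "text of class XYZ on 24 April 2004" question) costs one diff application per intervening version.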

I have heard configuration management presented as the one software engineering principle that no serious practitioner can afford to ignore. This is largely true. I must admit, however, that while I have always advocated revision control (not just for code, but for any shared documents, including for example course slides), my own practice has been somewhat in "do as I say, not as I do" mode: I lack the patience for tool interfaces that I find awkward (I will refrain from naming products here) and for the repeated routine of check-in and check-out. I just want to work on a document and assume that someone is doing the bookkeeping for me. I used to feel uneasy about this attitude, but a new generation of tools shows that it is the way of the future.

It is indeed not conceptually necessary to do the check-ins and check-outs. If you are working on a shared product part with the help of a sufficiently sophisticated tool, you should not have to worry that someone else might also be changing that part. The tool can give you the impression that you are the only one manipulating it, while carefully tracking everything that you, and anyone else sharing the part, are doing, in particular all the diffs. In that way you still benefit from the major advantages of a revision control system: a full history of the part's evolution, the ability to undo any change and go back to any earlier version, and the space economy permitted by diffs. This approach is realistic because, in practice, even when two people are working on the same part they only rarely step on each other's toes; most of the time they are working on different pieces of the product, for example different features of a class. So it makes sense to take an optimistic approach (as in transaction processing), which asks the users for clarification only when it detects an actual conflict, and the rest of the time handles all modifications silently and efficiently.
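
The optimistic policy can be illustrated with a three-way merge at the granularity the paragraph mentions, the features of a class. The sketch below, in Python, assumes a class is represented simply as a mapping from (hypothetical) feature names to their text, with `None` standing for an absent feature; a real tool would work on parsed classes or on line-level diffs. Both users' versions are compared with the common ancestor, and only a feature that both changed, in different ways, is reported.

```python
from typing import Optional

Features = dict[str, str]  # hypothetical model: feature name -> feature text


def merge_features(base: Features, mine: Features,
                   theirs: Features) -> tuple[Features, list[str]]:
    """Three-way merge of two edited copies of a class against their ancestor.

    Returns the merged features and the names of genuinely conflicting ones.
    """
    merged: Features = {}
    conflicts: list[str] = []
    for name in sorted(base.keys() | mine.keys() | theirs.keys()):
        b: Optional[str] = base.get(name)
        m: Optional[str] = mine.get(name)
        t: Optional[str] = theirs.get(name)
        if m == t:
            result = m          # same change on both sides, or no change at all
        elif m == b:
            result = t          # only the other person touched this feature
        elif t == b:
            result = m          # only I touched this feature
        else:
            conflicts.append(name)  # both changed it differently: ask the users
            result = b              # keep the ancestor until someone decides
        if result is not None:      # None means the feature was deleted or absent
            merged[name] = result
    return merged, conflicts
```

For example, if one person edits feature `make` while another edits `count` and adds `wipe_out`, the merge succeeds silently with all three changes; only if both had edited `make` differently would the tool need to ask.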

Two examples of current frameworks show that this approach is practical, at least for traditional documents (rather than programs). One is the Wiki phenomenon (Wikis in general, not just Wikipedia): you edit a page to your heart's content and then just save it. Only occasionally do you run into an edit conflict. The key is that behind this appearance of free modification there is a full CM infrastructure: all changes are logged, essentially as with a revision control system. But while this approach fundamentally relies on configuration management, it is an unobtrusive form of CM that happens automatically, behind the scenes, providing the fundamental advantages of a shared repository with full history and rollback facilities without the hassle of check-out and check-in.
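
The bookkeeping hidden behind a Wiki's "Edit" and "Save" can be sketched as follows, in Python. This is a deliberately simplified model, not the design of any particular Wiki engine: every save appends a new revision to the page's history, and a save is accepted only if nobody else has saved since the editor opened the page; otherwise an edit conflict is reported. A real Wiki may try to merge non-overlapping edits before giving up. All names here are hypothetical.

```python
class EditConflict(Exception):
    """Raised when someone else saved the page after our edit began."""


class WikiPage:
    def __init__(self, text: str = ""):
        self.history: list[str] = [text]  # every saved revision is kept

    def edit(self) -> tuple[int, str]:
        """'Edit': remember which revision the user started from."""
        rev = len(self.history) - 1
        return rev, self.history[rev]

    def save(self, base_rev: int, new_text: str) -> int:
        """'Save': accept the change unless the page moved on meanwhile."""
        latest = len(self.history) - 1
        if base_rev != latest:
            raise EditConflict(f"page changed since revision {base_rev}; now at {latest}")
        self.history.append(new_text)
        return latest + 1

    def rollback(self, rev: int) -> int:
        """Restoring an old revision is itself a new revision; nothing is lost."""
        self.history.append(self.history[rev])
        return len(self.history) - 1
```

Note that the user sees only "Edit" and "Save"; the full history and the rollback facility come for free, which is the unobtrusive CM the paragraph describes.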

The Wiki mechanism still imposes a reduced form of this hassle ("Edit" is a simplified check-out and "Save" a simplified check-in). In the second example, even these have disappeared. With Google Docs, you can share a document with colleagues, and then edit it concurrently. There is no explicit check-out or check-in. This is a powerful collaboration mechanism that we use extensively at Eiffel Software; it is well adapted to our distributed mode of development.

These two examples are still imperfect and are only initial steps in the evolution, which I believe to be inexorable, towards what we may call "configuration management for the masses" or "invisible configuration management". They are designed for traditional documents and will need to be adapted for programs, which have slightly different requirements. Such adaptations will need to retain some of the mechanisms of old-style CM tools; for example a programmer may want to work on a local version of a class for a while, perhaps without access to the network, then check the class back in. But the general mindset is the right one: don't bother users with CM concepts when they don't need them; just let them work on their products, silently keep track of all their changes to allow going back to any earlier state, and intervene only when detecting a conflict that cannot otherwise be resolved.

It will be exciting to introduce such mechanisms into EiffelStudio and of course we will be happy to talk with any open-source contributor willing to help with this.

For both facets of configuration management (automated production à la Make, already handled silently by EiffelStudio with no user intervention, and automated version management à la SCCS/RCS/CVS), the new solutions will keep the same fundamental techniques in the background, but will soon put an end to configuration management as we know it.

-- Bertrand Meyer