Values and references

March 2006

Among the most interesting language changes introduced by the ECMA standard is a better semantics for expanded types, addressing a central problem of object-oriented programming. These ideas have not yet been widely publicized, but they are worth a look and, better yet, are fully implemented in the forthcoming release of EiffelStudio.

The basic issue was first addressed in Eiffel 3 (I know of no other major O-O language that tackled it until recently). It's about how to reconcile an object-oriented type system with the needs of basic, classical types such as INTEGER and CHARACTER, and with the notion of subobject. For many application (programmer-defined) objects it's natural to expect reference semantics: objects are accessible through references, and may themselves contain references to other objects. This convention gives considerable flexibility in building appropriate data structures and is part of the core of object-oriented programming. But it doesn't apply to two important cases:

    - Basic types. If I have an integer variable x, I want the value
    of x to be an integer, say 3, not a reference to some cell
    containing that value. This is not just for performance but
    also for semantic reasons: if y is another such variable, I
    expect the assignment y := x simply to give y the value 3,
    and not to establish any durable relationship between x and
    y, as would be the case if we interpreted this as a reference
    assignment, making x and y refer to the same cell ("dynamic
    aliasing").

    - Subobjects. Sometimes an object refers to another object;
    sometimes it contains another object. This is again not just
    a matter of implementation, but a conceptual difference. A
    house belongs to a street, and it contains a front door; in
    the first case we need a reference (as attested by the
    observation that sharing is possible: several houses may
    belong to the same street); in the second case we want a
    "door" subobject in the "house" object.

Eiffel 3 introduced the notion of expanded classes and types to handle these needs. Basic classes such as INTEGER are expanded, and an object can have subobjects simply by declaring attributes of expanded type in the corresponding class, as with

    class HOUSE feature
    ...
    front_door: DOOR
    ...
    end

where class DOOR is expanded. In Eiffel 3 you can achieve this effect even with a reference class DOOR (reference classes, that is non-expanded ones, are the default) by declaring front_door of type `expanded DOOR'.
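To make the house example concrete, here is a minimal sketch in ECMA Eiffel, assuming a void-safe project with APPLICATION as root class and each class in its own file. Class STREET, the attributes `width' and `speed_limit', and the routines `set_width', `set_speed_limit', `make' and `set_door_width' are illustrative assumptions, not part of the original example. Each HOUSE contains its own DOOR subobject, while two houses built on the same street share a single STREET object.

```eiffel
expanded class DOOR
        -- A door: expanded, so it lives inside its enclosing object.
feature
    width: INTEGER
            -- Width in centimeters (illustrative attribute).

    set_width (w: INTEGER)
            -- Set `width' to `w'.
        do
            width := w
        end
end

class STREET
        -- A street: a reference class, so several houses can share one.
feature
    speed_limit: INTEGER

    set_speed_limit (s: INTEGER)
            -- Set `speed_limit' to `s'.
        do
            speed_limit := s
        end
end

class HOUSE
create
    make
feature
    front_door: DOOR
            -- Subobject: each house contains its own door.

    street: STREET
            -- Reference: several houses may belong to the same street.

    make (s: STREET)
            -- Build a house on street `s'.
        do
            street := s
        end

    set_door_width (w: INTEGER)
            -- Change the width of this house's own door subobject.
        do
            front_door.set_width (w)
        end
end

class APPLICATION
create
    make
feature
    make
            -- Build two houses on one street and compare doors and street.
        local
            main_street: STREET
            h1, h2: HOUSE
        do
            create main_street
            create h1.make (main_street)
            create h2.make (main_street)
            h1.set_door_width (90)
            main_street.set_speed_limit (30)
            print (h2.front_door.width.out + "%N")     -- 0: h2 has its own, untouched door
            print (h2.street.speed_limit.out + "%N")   -- 30: both houses share one STREET
        end
end
```

Changing the width of h1's door leaves h2's door at its default width of 0, whereas the new speed limit is visible through both houses, since they reference the same STREET object.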

It is this scheme that makes it possible to have a fully consistent type system, where even the most basic types such as INTEGER and BOOLEAN are classes, with all the trappings; you can display their class texts in EiffelStudio. In contrast, languages like C++ and Java treat basic types as magic, outside of the object-oriented type system, complicating the writing of generic data structures that can hold objects of any type, expanded or reference. Only with the more recent .NET framework has an Eiffel-like (but more restricted) notion of "value type" appeared.

So what was there to change? Well, we didn't get the semantics of attachment quite right in Eiffel 3. A type system with both expanded and reference variants has to define the meaning of mixed assignments, such as x := y where one side is expanded and the other reference, and answer the corresponding questions for argument passing and comparison. The Eiffel 3 solution raises several issues:

    - The rules are fairly complex; in particular, assignment of
    expanded to reference causes cloning (the most reasonable
    answer, since we don't want to have references to subobjects).

    - Assignment of reference to expanded (x := y with x expanded
    and y reference) is not always permitted, since we can't know
    whether the object attached to y will be of y's declared type
    (which should then match the type of x) or, through
    polymorphism, of a descendant type with more attributes, whose
    fields won't fit in x.

    - When the assignment is permitted, it might cause an exception,
    since y may be void.

    - Finally there is the problem of genericity: how can we combine
    an ARRAY [SOME_REFERENCE_TYPE] and an ARRAY [SOME_EXPANDED_TYPE]?

So the Eiffel 3 techniques have served us well for more than ten years, but somewhat clumsily.

The ECMA Eiffel update keeps the essentials of the expanded mechanism but dramatically simplifies the construction. The basic idea is now that "expanded" becomes a property not just of a type and of the associated entities and expressions, but of each run-time object. More precisely, an object has either "copy semantics" (that's what you get after `create x' where the type of x is expanded) or "reference semantics". Then the precise effect of an assignment x := y is entirely determined by whether the source object, the one attached to y, is "reference" or "copy". In the first case the operation attaches to x a reference to that object; in the second case it performs a copy. Simple, general and clear!
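A minimal sketch of the rule, assuming a void-safe ECMA Eiffel project with APPLICATION as root class: DOOR, `width' and `set_width' are hypothetical, while STRING is EiffelBase's ordinary reference class. Both assignments below have exactly the same form; what differs is the semantics of the object attached to the source.

```eiffel
expanded class DOOR
        -- Every object of this class has copy semantics.
feature
    width: INTEGER

    set_width (w: INTEGER)
            -- Set `width' to `w'.
        do
            width := w
        end
end

class APPLICATION
create
    make
feature
    make
            -- Contrast assignment from a copy-semantics source
            -- with assignment from a reference-semantics source.
        local
            d1, d2: DOOR
            s1, s2: STRING
        do
            d1.set_width (80)
            d2 := d1                  -- source object has copy semantics: d2 gets a copy
            d2.set_width (90)
            print (d1.width.out + "%N")     -- 80: d1 is unaffected

            create s1.make_from_string ("abc")
            s2 := s1                  -- source object has reference semantics: s2 shares it
            s2.append ("d")
            print (s1 + "%N")               -- abcd: s1 sees the change made through s2
        end
end
```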

Considering reference or copy semantics as a property of the object generalizes the basic O-O idea of dynamic binding: the precise effect of y.f depends on the type of the actual object attached to y; so now does the effect of x := y. So it's all consistent.

It turns out that as a result of this semantics we don't need expanded *types* any more (types such as `expanded DOOR' where DOOR is reference); we only have expanded *classes*, declared as `expanded class DOOR' from the start. This is not really restrictive, as it is easy, if DOOR is a reference class, to declare a small class

    expanded class E_DOOR inherit DOOR end

although in fact we have found that this is seldom necessary; most concepts have a clear expanded or reference semantics from the start.

The simplification of the language description resulting from these innovations is remarkable. Assignment and argument passing rules are now straightforward; and a number of restrictions on how you can mix reference and expanded types disappear. An indirect consequence is the "Class ANY Principle": not only are all classes descendants of ANY, all types conform to ANY. This is a useful property to rely on, in particular for handling generic data structures.
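As a sketch of what the Class ANY Principle buys for generic data structures, the following puts a basic expanded value, a programmer-defined expanded object and a reference object into a single ARRAYED_LIST [ANY] from EiffelBase. DOOR and the local names are illustrative assumptions, and a void-safe project with APPLICATION as root class is assumed; the object test at the end retrieves the first element as an INTEGER again.

```eiffel
expanded class DOOR
        -- Illustrative expanded class.
feature
    width: INTEGER
end

class APPLICATION
create
    make
feature
    make
            -- Store expanded and reference values in one list of ANY.
        local
            things: ARRAYED_LIST [ANY]
            d: DOOR
        do
            create things.make (3)
            things.extend (42)          -- INTEGER: a basic expanded type
            things.extend (d)           -- DOOR: a programmer-defined expanded type
            things.extend ("front")     -- STRING: a reference type
            print (things.count.out + "%N")            -- 3
            if attached {INTEGER} things.first as i then
                print ((i + 1).out + "%N")              -- 43
            end
        end
end
```

No wrapper classes or special cases are needed: since INTEGER and DOOR conform to ANY just as STRING does, the same generic container accepts all three.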

We feel that the expanded mechanism as it now exists achieves the full integration of copy and reference semantics, and of the traditional basic types, into a pure object-oriented framework.

-- Bertrand Meyer