Values and references
March 2006
Among the most interesting language adaptations provided by the ECMA standard is
a better semantics for expanded types, addressing a central problem of
object-oriented programming. These ideas have not been publicized that much yet,
but they are worth taking a look at and, better yet, are fully implemented in
the forthcoming release of EiffelStudio.
The basic issue was first addressed in Eiffel 3 (I don't know of any other major
O-O language having tackled it until recently). It's about how to reconcile an
object-oriented type system with the needs of basic, classical types such as
INTEGER and CHARACTER, and with the notion of subobject. For many application
(programmer-defined) objects it's natural to expect reference semantics: objects
are accessible through references, and themselves may contain references to
other objects. This convention gives considerable flexibility in building
appropriate data structures and is part of the core of object-oriented
programming. But it doesn't apply to two important cases:
- Basic types. If I have an integer variable x, I want the value
of x to be an integer, say 3, not a reference to some cell
containing that value. This is not just for performance but
also for semantic reasons: if y is another such variable, I
expect the assignment y := x simply to give y the value 3,
and not to establish any durable relationship between x and
y, as would be the case if we interpreted this as a reference
assignment, making x and y refer to the same cell ("dynamic
aliasing").
- Subobjects. Sometimes an object refers to another object;
sometimes it contains another object. This is again not just
a matter of implementation, but a conceptual difference. A
house belongs to a street, and it contains a front door; in
the first case we need a reference (as attested by the
observation that sharing is possible: several houses may
belong to the same street); in the second case we want a
"door" subobject in the "house" object.
Eiffel 3 introduced the notion of expanded classes and types to handle these
needs. Basic classes such as INTEGER are expanded; and objects can have
subobjects simply through the provision of an attribute of expanded type in the
corresponding class, as with
    front_door: DOOR
where class DOOR is expanded. In Eiffel 3 you can achieve this effect even with
a reference class DOOR (a reference class is the default case, non-expanded) by
declaring front_door of type `expanded DOOR'.
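To make the contrast between the two kinds of assignment concrete, here is a minimal sketch. The classes COUNTER and ALIASING_DEMO and their features are hypothetical names introduced only for this illustration; they are not part of any library.

```eiffel
class
    COUNTER
        -- Hypothetical reference class (the default kind of class).

feature
    value: INTEGER

    set_value (v: INTEGER)
        do
            value := v
        end

end

class
    ALIASING_DEMO
        -- Hypothetical root class for the illustration.

create
    make

feature
    make
        local
            x, y: INTEGER
            a, b: COUNTER
        do
                -- Basic (expanded) type: assignment copies the value.
            x := 3
            y := x
            x := 5
            print (y)           -- Still 3: no lasting link between x and y.
            print ("%N")

                -- Reference type: assignment makes both names denote one object.
            create a
            a.set_value (3)
            b := a
            a.set_value (5)
            print (b.value)     -- 5: a and b are dynamically aliased.
            print ("%N")
        end

end
```

The second half shows exactly the dynamic aliasing that would be unwelcome for integers.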
It is this scheme that makes it possible to have a fully consistent type system,
where even the most basic types such as INTEGER and BOOLEAN are classes, with
all the trappings; you may bring them up under EiffelStudio. In contrast,
languages like C++ and Java treat basic types as magic, outside of the
object-oriented type system, complicating the writing of generic data structures
that can hold objects of any type, expanded or reference. Only in the more
recently introduced .NET framework has an Eiffel-like (but more restricted)
notion of "value type" been introduced.
So what was there to change? Well, we didn't get the semantics of attachment
quite right in Eiffel 3. A type system with both expanded and reference variants
has to define the meaning of mixed assignments, such as x := y where one is
expanded and the other reference, and the corresponding questions for argument
passing and comparison. The Eiffel 3 solution raises several issues:
- The rules are fairly complex; in particular assignment of
expanded to reference causes cloning (the most reasonable
answer since we don't want to have references to subobjects).
- Assignment of reference to expanded is not always permitted
since we can't know whether y will be of its declared type
(which should then match the type of x) or, through polymorphism,
a descendant type with more attributes, giving fields that don't
fit in x.
- When the assignment is permitted, it might cause an exception
because of the possibility that y is void.
- Finally there is the problem of genericity: how can we combine
an ARRAY [SOME_REFERENCE_TYPE] and an ARRAY [SOME_EXPANDED_TYPE]?
So the Eiffel 3 techniques have served us well, for more than ten years, but
somewhat clumsily.
The ECMA Eiffel update keeps the essentials of the expanded mechanism but
dramatically simplifies the construction. The basic idea is now that "expanded"
becomes a property not just of a type and the associated entities and expressions
but of each run-time object. More precisely, an object has either "copy
semantics" (that's what you get after `create x' where the type of x is
expanded) or "reference semantics". Then the precise effect of an assignment x
:= y is entirely determined by whether the source object, the one attached to y,
is "reference" or "copy". In the first case the operation attaches to x a
reference to that object; in the second case it performs a copy. Simple, general
and clear!
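The rule can be sketched as follows, assuming a hypothetical expanded class DOOR and a hypothetical reference class STREET (the feature names open, is_open, add_house and house_count are also invented for the example). In both assignments the effect follows from the kind of object attached to the source.

```eiffel
expanded class
    DOOR
        -- Hypothetical expanded class: its objects have copy semantics.

feature
    is_open: BOOLEAN

    open
        do
            is_open := True
        end

end

class
    STREET
        -- Hypothetical reference class: its objects have reference semantics.

feature
    house_count: INTEGER

    add_house
        do
            house_count := house_count + 1
        end

end

class
    ATTACHMENT_DEMO
        -- Hypothetical root class for the illustration.

create
    make

feature
    make
        local
            d1, d2: DOOR
            s1, s2: STREET
        do
                -- Source object has copy semantics: d2 receives a copy.
            d2 := d1
            d1.open
            print (d2.is_open)      -- False: the copy predates the call to `open'.
            print ("%N")

                -- Source object has reference semantics: s2 shares the object.
            create s1
            s2 := s1
            s1.add_house
            print (s2.house_count)  -- 1: s1 and s2 denote the same object.
            print ("%N")
        end

end
```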
Considering reference or copy semantics as a property of the object generalizes
the basic O-O idea of dynamic binding: the precise effect of y.f depends on the
type of the actual object attached to y; so now does the effect of x := y. So
it's all consistent.
It turns out that as a result of this semantics we don't need expanded
*types* any more (types such as `expanded DOOR' where DOOR is reference); we
only have expanded *classes*, declared as `expanded class DOOR' from the start.
This is not really restrictive, as it is easy, if DOOR is a reference class, to
declare a small expanded class that inherits from it,
although in fact we have found that this is seldom necessary; most concepts have
a clear expanded or reference semantics from the start.
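As an illustration of the small class in question, here is a sketch assuming DOOR is a reference class. The name EXPANDED_DOOR, the class HOUSE and the feature is_open are hypothetical choices made for this example.

```eiffel
class
    DOOR
        -- Reference class (the default); contents invented for illustration.

feature
    is_open: BOOLEAN

end

expanded class
    EXPANDED_DOOR
        -- Hypothetical expanded variant of DOOR: same features, copy semantics.

inherit
    DOOR

end

class
    HOUSE

feature
    front_door: EXPANDED_DOOR
            -- A subobject of the house, not a reference to a shared door.

end
```

Clients that want sharing keep using DOOR; those that want a subobject declare their attribute of type EXPANDED_DOOR.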
The simplification of the language description resulting from these innovations
is remarkable. Assignment and argument passing rules are now straightforward;
and a number of restrictions on how you can mix reference and expanded types
disappear. An indirect consequence is the "Class ANY Principle": not only are
all classes descendants of ANY, all types conform to ANY. This is a useful
property to rely on, in particular for handling generic data structures.
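Here is a sketch of what this buys us for generic structures: a single list, declared with ANY as actual generic parameter, holding values of basic types alongside an ordinary reference object. ARRAYED_LIST is the EiffelBase class; ANY_DEMO is a hypothetical root class for the example.

```eiffel
class
    ANY_DEMO
        -- Hypothetical root class for the illustration.

create
    make

feature
    make
        local
            items: ARRAYED_LIST [ANY]
        do
            create items.make (3)
            items.extend (42)           -- INTEGER: an expanded type
            items.extend ('c')          -- CHARACTER: an expanded type
            items.extend ("a string")   -- STRING: a reference type

                -- Every item conforms to ANY, so `out' is available on each.
            from
                items.start
            until
                items.after
            loop
                print (items.item.out)
                print ("%N")
                items.forth
            end
        end

end
```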
We feel that the expanded mechanism as it now exists achieves the full
integration of copy and reference semantics, and of the traditional basic types,
into a pure object-oriented framework.
-- Bertrand Meyer