Void Safety: Putting an End to the Plague of Null Dereferencing

Dr. Dobb's Digest September 2009

Void Safety: Putting an End to the Plague of Null Dereferencing

Almost every program runs the risk of hitting a void call in some execution

By Bertrand Meyer, Alexander Kogtenkov, and Emmanuel Stapf

One of the benefits of progress in programming languages is to help us sleep better. Static typing is one example. If you are using a statically typed object-oriented language you know that certain errors will never occur at run time: Once the compiler accepts a call of the following form (the fundamental operation of object-oriented programming):


x.f (args)

you have the guarantee -- almost as good as a mathematical proof -- that whatever happens to x at run time, even if x is polymorphic, the corresponding object will be equipped with an operation for f, able to accept arguments args. No more waking up in the middle of the night with visions of "Method not understood" failures. A whole class of nasty, unpredictable runtime errors goes away; so does, for sleep therapists, some of the programmer clientele.

Well, not entirely. To talk of "the corresponding object" is wishful thinking: in many cases the runtime value of x is not object but a "reference", which normally denotes an object but can also be null, or "void" (the terminology we'll use). If x is void, the call x.f (args) is a "void call" (or "null dereference"). A void call will fail, causing an exception and often a program crash.

However hard we try, testing cannot eliminate the specter of void calls. As a consequence, almost every program runs the risk of hitting a void call in some execution. This time bomb is not just ticking in object-oriented programs -- almost all C programs use pointer dereferencing and have the same risk.

Since testing can never be exhaustive, the only acceptable solution is to provide a static (compile-time) guarantee of "void safety", by tightening the rules on the programming language and equipping the compiler with the corresponding checks. That's what we have done with the Eiffel language, through the ISO-ECMA standard [3], and the EiffelStudio compiler. We took inspiration from earlier work by Microsoft Research on Spec#'s "nullable types" [2]. Our solution has some significant differences; in addition, we are dealing with a production language rather than a research language, and are facing a major issue of backward compatibility.

The initial design of the Eiffel "Attached type" mechanism was described in a 2005 paper [5]; making it practical, in particular making all existing libraries void-safe, turned out to require a considerable engineering effort. In this article we'll describe both the theoretical concepts and their practical implications.

As testimony to the critical nature of this issue it is interesting to note Tony Hoare's recent comment [4]:

I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.

As far as we know, the solution we describe to the "one-billion dollar mistake" is the first one in a production language and environment. It was progressively introduced in releases of EiffelStudio beginning with 6.1, and is fully implemented in 6.4, released in June of 2009.

Overview

We start with an outline of the solution. The goal is to have a static guarantee that no void calls will ever happen. The core concept is to require conforming compilers to ascertain whether a variable or expression is "attached" and to apply the following rule:


Void safety rule: A call x.f (...) is only permitted if x is attached.

So far it's only words; we have to define "attached" (a static property of variables and expressions) to rule out void calls. The definition provides three ways for a variable x, of some type T, to be attached:

Attached type. A type can be "attached" or "detachable". An attached type (the new case) does not admit Void (the void value, called "null" in some other languages) among its values. If T is attached the call is "void-safe". This is the most drastic solution; we'll see that it raises the problem of initialization. Cases 2 and 3 assume that T is detachable.
CAP. The context of a call may guarantee that x has a non-void value. For example if x is a local variable the call in if x /= Void then x.f (...) end is void-safe. Such a pattern is known as a "Certified Attachment Pattern" (CAP).
Object test. The last example is not necessarily void-safe if x is an arbitrary expression, because of the possibility of side effects and also in the presence of multithreading. The scheme becomes void-safe if rewritten as if attached x as l then l.f (...) end. The object test is the boolean expression attached x as l; it has value True if an only if the value of x is non-void, and also binds l (a fresh name) to that object over the scope of the object test, which in this case is the then clause.

In addition, special care must be taken when handling variables of generic types.

Before getting to the details, we note the constraints that the mechanism had to satisfy:

Static: We wanted void safety to be entirely guaranteed by compile-time rules, with no remaining run-time check.
General: the mechanism had to be compatible with all the other components of a modern programming language, such as generic types, and the practical constraints of program execution, such as multithreading.
Simple: no mysterious rules; for programmers, easy to learn; for compiler writers, realistically implementable.
Compatible: minimal language extension; respects the spirit of the language; fits well with other constructs; does not limit programmer expressiveness; minimum change for existing code.

The last requirement was critical and caused the most difficulties. There are systems with millions of lines of Eiffel code out there, including many running mission-critical applications. They may have to be updated, if only because the new mechanisms will catch genuine cases of potential void calls. But the conversion effort should be reasonable. It should minimize "false alarms" (cases in which the rules force the programmer to modify code even though there is no actual risk).

In practice, of course, compilers offer compatibility options allowing old code to compile even if it has not yet been updated to void safety.

Attached Types

Here now are the details of the first mechanism -- attached types. In the same spirit as static typing, we extend the type system of the language to handle void safety.

A reference type is now either "attached" or "detachable". The difference is that Void is a valid value for detachable types only. Syntactically, a detachable type is marked by the keyword detachable, as in:


right: detachable LINKABLE [G]

Attached types can similarly use attached, but this is usually not needed as "attached" is the default status. This convention was a key design decision, resulting from the observation that application-oriented types -- BANK_ACCOUNT, NETWORK_CONNECTION, FONT, RUNWAY -- do not normally need a void value. Mostly, "void" is needed for the implementation of linked data structures, often handled in ervice layers of a program or in libraries, rather than in the business-oriented layers.

Defining a type T as attached would only be wishful thinking in the absence of semantic rules guaranteeing that any run-time use of a variable x of type T will find x non-void. This requires the mechanism to enforce two properties:

The value of x on first use, as defined by language initialization rules, must be non-void.
Any assignment to x must leave the value of x non-void.

The second one is easy to achieve through a rule that prohibits an assignment x := y (or the corresponding argument passing) if the type of x is attached and the type of y detachable. Note that assigning from attached to detachable is okay.

This leaves the issue of initialization. Eiffel includes initialization rules for all types; for example, all integer variables are initialized to zero and all booleans to False. For reference types, the earlier initialization value was Void: the most obvious one -- but also the worst possible choice for an attached type! The void safety mechanism requires a new policy. The key notion is that of a variable being "properly set", meaning that it has been given a non-void value. Then for any use of a variable of an attached type at a certain position, one of the following must hold:

The variable is properly set at that position. This possibility applies to both kinds of variable: attributes (fields) of a class, and local variables of a routine.
The variable is an attribute, and is properly set at the end position of every creation procedure (constructors) of the class.

A variable x of type T is properly set at a certain position if one of the instructions preceding that position is an assignment to x or a creation instruction for x. So it is valid to use x in the last instruction in:

 

r 

  do

     create y

     ...

     x := y

     print (x)

 end

x is properly set since the preceding instruction is an assignment to x. In that assignment, y itself is properly set since it was earlier the target of a create instruction.

The simplicity of the rules governing attached types suggests using these types as much as possible, especially in business-oriented layers. The remaining cases, detailed next, are mostly relevant to programmers in charge of low-level data structure implementations.

Certified Attachment Patterns

In the interest of simplicity and compatibility, it is desirable to accept without change some program schemes that are demonstrably void-safe even though they involve detachable variables or expressions. Such a scheme is called a Certified Attachment Pattern. One CAP actually suffices in the language standard; it states that a call x.f (...), where x is a formal argument or local variable of a routine of a detachable type (for an attached type, we don't need any new rule!), is void-safe if it both:

Appears in the scope of a void test involving x.
Is not preceded, in that scope, by a setter for x.

A "setter for x" is an assignment to x or a control structure that (recursively) includes a setter for x. A void test is one of the following, possibly involving a boolean expression e:

Positive void test: the simple form x = Void, or the composite form x = Void or else e.
Negative void test: the simple form x /= Void, or the composite form x /= Void and then e

(and then and or else are the non-commutative variants of and and or: If a is False, evaluating a and then b will not evaluate b; if a is True, evaluating a or else b will not evaluate b.) The scope of a void test v includes:

S1 If v is of a composite form: the rest of the expression, e.
S2 If v is a negative void test appearing in if v then ... end (where the instruction may include elseif clauses and an else clause): the then clause.
S3 For a positive void test appearing in if v then ...else ...end: the else and elseif clauses.
S4 For a positive void test appearing in a loop from ... until v loop ... end: the body (loop clause).

The CAP makes it possible to sanction a wide range of fundamental programming schemes. Examples include not only the simple conditional if x /= Void then x.f (...) end but algorithms for traversing or searching linked structures, such as:




from 

         l := first_element

until

         l = Void or else l.item ~ sought  --The operator "~" denotes object equality

loop

         l := l.right

end

Such schemes occur frequently in data structure libraries and reflect natural ways of expressing search and traversal operations. It was critical, as part of the compatibility and simplicity requirements, to accept them without change.

The CAP also allows other frequent schemes arising in boolean expressions, in particular contracts (preconditions, postconditions, class invariants), such as:


x /= Void and then x.some_property

The CAP is limited to formal arguments and local variables, excluding attributes. Cases such as the following are indeed not void-safe if x is an attribute:


if x /= Void then

     routine_call

     x.f (...)

end

In a multithreading execution, another thread could falsify the property x /= Void before the call. Even with single threading, a routine call may perform an assignment to x. Such cases require the last technique -- object test.

Object Test

The "object test" language construct provides, under a simple syntax, a general solution to the problem known as runtime type identification (RTTI) and variants such as "downcasting", with particular application to void safety. An object test checks that a reference denotes an object of a specified type, and if so catches the object under a local name to allow safe processing over a certain syntactic scope. For example:


if attached {T} exp as l then

    ...Operations on l, such as l.f (...)...

 end

The object test is attached {T} exp as l. It is a boolean expression, evaluating to True if and only if the value of exp is a reference to an object of type T or conforming. Then the "object-test local" l, a fresh name, will denote a reference to that object throughout the scope of the object-test local, defined by the same rules as the scope of a void test. Here, the scope is the then clause; the effect of the conditional instruction is to determine whether exp denotes an object of type T and, if so, to apply the given operations to that object. Because the object is captured under the temporary name l at the time of the object test, no interference may arise from multithreading or from operations that could cause the value of exp to change.

When x is an attribute, we can use an object test to get a void-safe version of the last example:


if attached x as l then

      routine_call

      l.f (...)

end

(We may omit the type specification {T} if T is the declared type of x.)

Genericity and Arrays

How does the mechanism handle generic classes? The extension is natural, but reserved a surprise for us: arrays were the last nut to crack, and one of the hardest.

Genericity, first introduced into O-O programming by Eiffel, is now widely recognized as one of the fundamental object-oriented mechanisms on a par with classes and inheritance. A generic class LIST [G] makes it possible to define types LIST [T] for any type T. Or you can make the genericity "constrained": class VECTOR [G "> NUMERIC] ... will make VECTOR [T] valid if and only if T is a descendant of NUMERIC, called the "generic constraint".

In pre-void-safe mode, the unconstrained form C [G] was considered as an abbreviation for C [G -> ANY], where ANY is the library class serving as ancestor to all other classes. Void safety requires a slight adaptation of this rule: we now treat LIST [G] as a shorthand for LIST [G -> detachable ANY].

In a generic class, a variable x of type G (the formal generic parameter) will be considered attached if the generic constraint (explicit for VECTOR, implicit for LIST) is an attached type. If this would cause x to be detachable, you can make the call x.f () valid through a CAP or object test. If you want x to be attached, you can declare it as:


x: attached G

or declare the class as:


C [G -> ANY]       -- ANY

making x attached. ANY means attached ANY; other attached types would also work.

These rules have achieved one of our original goals: enabling an easy conversion of the EiffelBase library of fundamental data structures and algorithms -- lists, stacks, queues, trees -- with minimal rewriting.

These rules are, however, insufficient for arrays. With what we have seen so far it is impossible to accept a declaration:


a: ARRAY [T]

where T is attached. Normally, an array is created through:


create a.make (l, h)

(In C++/Java syntax: a := new ARRAY (l, h).) The arguments l and h are the initial bounds. After that creation, any operation is possible on the array, including access through a [i], a shortcut for the object-oriented form a.item (i). But this is not void-safe: make has no provision for initializing the array elements, so a [i] of type T (attached) might be void.

Other generic classes take care of their own initialization; but arrays are special, since they rely on a contiguous memory area beyond programmers' control.

The solution is that for attached types array creation may no longer use the make creation procedure, but must use make_filled, with an extra argument representing a default value:


make_filled (low, high: INTEGER; value: G)

so that a valid way to create the array is:

create a.make (l, h, val)

which conceptually initializes all array entries to val (a non-void value); "conceptually" because the implementation can be smarter and avoid storing val explicitly. This new rule raises some incompatibilities in existing code, but is inevitable to guarantee void safety in the presence of arrays.

This concludes the description of the mechanism. Some fine-tuning (Check instruction and notion of "stable attribute") is useful for the conversion of large programs and libraries; see [5] for details.

The Conversion Effort

How hard is it to move existing code to void safety? We have recently converted thousands of classes, providing some interesting measures:

Introduction of object tests: for 215 classes of Free ELKS (the Eiffel Library Kernel Standard) the code included, before conversion, 100 cases of assignment attempt (the predecessor to object test). After conversion, it has 115 object tests. All the added object tests are in contract elements.
Overall scope of the changes: in the EiffelBase library (of which ELKS is a subset), the percentage of lines changed between versions 6.1 and 6.4 is 11% (9093 out of 82,459). These changes span three release cycles over an eighteen-month period, during which many non-void-safety-related changes also took place.
Since EiffelBase library covers fundamental data structures, it makes extensive use of void references and is probably a worst-case example. The much larger EiffelVision multi-platform graphical library was converted in a single shot between versions 6.3 and 6.4, with almost no other changes. The number of changed lines was 10,909 out of 376,592 -- less than 3%.

The experience is encouraging, especially given that these libraries perform complex data structure manipulations, and hence needed void values much more than the typical application program.

Assessment and Conclusion

Predictably, the need to convert existing code has caused some concerns among users. At the time of writing, it seems that these objections are subsiding; we hope that users will come to view the benefits of void safety as justifying the costs of conversion.

Many discussions of language design consider small languages and individual features. What we have seen in this discussion is a challenging variant of this problem: evolving an industrial language, with a rich set of existing features that coexist in a tight equilibrium, and with millions of existing lines of production code that cannot be discarded or rewritten overnight.

Our own experience with void safety has involved many trials and many errors. The outcome includes not just lessons learned but, concretely, the availability for the first time of guaranteed void safety in a mainstream object-oriented language -- and for many developers, we hope, far more peaceful nights.

References

[1] Eiffel community: Void safety migration guide, at http://dev.eiffel.com/Void-Safe_Library_Status.

[2] Manuel Fhndrich and Rustan Leino: Declaring and Checking Non-null Types in an Object-Oriented Language; in OOPSLA 2003, SIGPLAN Notices, vol. 38 no. 11, November 2003, pp. 302-312.

[3] ECMA International: Eiffel Analysis, Design and Programming Language, Standard ECMA 367, 2005; also ISO/IEC standard 25436:2006.

[4] C.A.R. Hoare: Null References: The Billion Dollar Mistake, abstract of talk at QCon London, 9-12 March 2009, at http://tinyurl.com/8wmgtt.

[5] Bertrand Meyer, Attached Types and their Application to Three Open Problems of Object-Oriented Programming, in ECOOP 2005, ed. Andrew Black, LNCS 3586, Springer Verlag, 2005, pages 1-32.

A longer version of this paper, submitted to the volume in honor of Tony Hoare's 75th birthday, is available in draft here.