October, 2005: Walking Down Memory Lane

Memory: Not Just Any Resource

At first blush, it would seem that treating memory just like any other resource (say, file handles) is a valid approach. But unlike any other resource, memory is a structured medium layered on top of an unstructured (better said, less structured) resource. The unstructured resource consists of memory words, chunks, and everything that has to do with manipulating raw storage. The structure is given by the type system and tells you things like, "These four bytes represent an int, those other four bytes represent a float, and those other four bytes store a pointer to a value that in turn has its own structure," and so on. This structuring (typing) of memory is famously useful, and C++ establishes and partially enforces it during compilation.

If you think about it, no other resource has a static type system on top of it, although appearances can be deceiving. Say, for example, you think of a socket as an unstructured resource, and of a class HTTPStream as a structured resource on top of it. But what does HTTPStream's structure rely on? Things like encapsulation, abstraction, and polymorphism, all of which are possible because of the type system living in memory! (Virtual tables can't be overwritten, private data can't be tampered with, function bodies can't be modified during runtime...these are all wonderful guarantees of the type system.) It turns out that every abstraction your program ever creates builds on typed memory (and the guarantees it offers) as a back end.
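To make the split between raw storage and typed memory concrete, here is a minimal sketch. The `Record` struct and the helper names `to_raw` and `from_raw` are hypothetical, invented only for this illustration. The sketch drops the type from an object, leaving a plain run of bytes, and then reimposes the type on those same bytes.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical record, used only for illustration.
struct Record {
    int        count;  // these bytes are interpreted as an int
    float      ratio;  // these bytes are interpreted as a float
    const int* link;   // these bytes are interpreted as a pointer to an int
};

// Byte-wise copying is only well defined for trivially copyable types.
static_assert(std::is_trivially_copyable<Record>::value,
              "Record must be trivially copyable");

using RawBytes = std::array<unsigned char, sizeof(Record)>;

// Drop the type: what remains is an unstructured run of bytes.
RawBytes to_raw(const Record& r) {
    RawBytes raw{};
    std::memcpy(raw.data(), &r, sizeof r);
    return raw;
}

// Reimpose the type: the same bytes are read as int, float, and pointer again.
Record from_raw(const RawBytes& raw) {
    Record r;
    std::memcpy(&r, raw.data(), sizeof r);
    return r;
}
```

Nothing in `RawBytes` records where the int ends and the float begins; only the declaration of `Record` carries that knowledge, which is the sense in which the type system supplies the structure.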

It should come as no surprise that such structure had better be solid. After all, every program abstraction ultimately rests on the type system. A desirable property of a language is so-called "memory safety," which basically means that it's impossible for a program to corrupt the structure of memory. Languages such as Java or LISP are memory-safe. Languages such as C and C++ are not memory-safe because they also offer access to the unstructured fabric that typed memory rests on: the raw bytes in the computer's memory. Pointer arithmetic, casts, unions, unchecked array accesses, and memcpy are all examples of ways in which you can arbitrarily peek or poke into typed memory. But there are correct uses of such features as well, and that's why memory-unsafe languages are useful in applications that must manipulate raw memory, such as memory allocators, system-level code, and garbage collectors.
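As a small illustration of peeking and poking, the sketch below uses memcpy to read the bits of a float and to flip its sign without going through any floating-point operation. The function names `peek_bits` and `poke_sign` are hypothetical, and the code assumes a 32-bit IEEE 754 float, which the static_assert checks at compile time.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: float is a 32-bit IEEE 754 value on this platform.
static_assert(std::numeric_limits<float>::is_iec559 &&
              sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes 32-bit IEEE 754 floats");

// Peek: read the raw bits that the type system says are a float.
std::uint32_t peek_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Poke: flip the sign bit directly in the raw representation.
float poke_sign(float f) {
    std::uint32_t bits = peek_bits(f) ^ 0x80000000u;
    float out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}
```

This particular use is well defined, which is the point about correct uses of such features. The same mechanism aimed at an object's private data or at a pointer field would bypass every guarantee the type system is supposed to provide.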

Most important for the subject at hand, the structure of memory can be violated by using "dangling" pointers to memory that has been deallocated (and potentially reallocated to host objects of some other type). That's why garbage collection is more than a mere preference, a fad, or the staple of lazy programmers who can't clean up after themselves: Today, we have no other general memory-management technology that preserves memory safety. Freeing memory by hand is dangerous in the general case because you might leave dangling pointers behind. No wonder, then, that languages claiming to be "safe" invariably come with garbage collection out of the box. There's no other known way around it, unless you restrict expressiveness or accept the risk of breaking memory safety. Proving that a chunk of memory is unused takes time and space, so garbage collection usually incurs a runtime cost. But with the rising rate and cost of safety-related exploits and bugs, one can't help wondering which is really the short end of the deal. The challenge is to create language features that harmoniously combine the strengths of garbage collection and deterministic termination: Ideally, HTTPStream has a deterministic destructor that closes the socket and nullifies it, while the actual memory in which HTTPStream sits is garbage collected, thus never allowing tampering via dangling pointers.
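Standard C++ has no garbage collector, so the ideal described above can only be approximated. The sketch below is one such approximation under stated assumptions: `SocketHandle`, `kNoSocket`, and the members of this `HTTPStream` are hypothetical, no real socket API is called, and std::shared_ptr reference counting stands in for garbage collection (it does not reclaim cycles, and raw pointers taken from it can still dangle). What it shows is the separation of concerns: `close()` terminates the resource deterministically and nullifies the handle, while the object's memory stays valid for as long as anyone still refers to it.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an OS socket handle; -1 means "no socket".
using SocketHandle = int;
constexpr SocketHandle kNoSocket = -1;

class HTTPStream {
public:
    explicit HTTPStream(SocketHandle s) : socket_(s) {}
    ~HTTPStream() { close(); }
    HTTPStream(const HTTPStream&) = delete;
    HTTPStream& operator=(const HTTPStream&) = delete;

    // Deterministic termination: give up the socket now and nullify it.
    void close() noexcept {
        if (socket_ != kNoSocket) {
            // A real implementation would release the OS socket here.
            socket_ = kNoSocket;
        }
    }

    bool is_open() const noexcept { return socket_ != kNoSocket; }

    // Using a closed stream is a well-defined error, not memory corruption.
    void send(const std::string& data) {
        if (!is_open()) throw std::logic_error("HTTPStream is closed");
        bytes_sent_ += data.size();  // placeholder for real I/O
    }

    std::size_t bytes_sent() const noexcept { return bytes_sent_; }

private:
    SocketHandle socket_;
    std::size_t  bytes_sent_ = 0;
};
```

If two std::shared_ptr<HTTPStream> instances share one stream and the first owner calls `close()` and then releases its pointer, the second owner still refers to live, correctly typed memory: `is_open()` returns false and `send` throws, instead of the program reading storage that may already host an object of another type.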