A Secure Storage Allocator Template

Edward Elliott

A policy-based approach to protecting sensitive data.

Secure storage of secrets is a long-standing issue in software design. One tiny aspect of this general problem is storing sensitive data (cryptographic keys, passwords, etc.) in physical memory. System-dependent factors in the handling of virtual memory make relying on the standard virtual memory abstractions insecure [1].

Fortunately, STL containers use allocators, allowing precise user control of memory allocations and deallocations. A secure allocator template can be devised to deal with some of the security issues of memory management. Using such an allocator, STL containers can hold sensitive data, bringing their reliability and ease-of-use to the management of cryptographic keys and passwords.

I will discuss the design of a thread-safe secure allocator template suitable for use with STL containers. The allocator is policy-based for easy extensibility. I will provide source code for Windows/Microsoft Visual C++ and Linux/gcc (available for download at <www.cuj.com/code>). Adapting the code for other platforms (especially Unix-like platforms) and compilers should be easy. Those interested only in using the allocator can skip to the "Usage" section below, but be aware of this approach's limitations given in the "Conclusion."

Motivation

Handling sensitive information requires particular attention to detail. Data such as cryptographic keys, passwords, and secret access tokens must be handled in a way that minimizes the chances of an adversary learning the data. An "adversary" here could be another process on the same machine, an eavesdropping host along a communications path, or even a person glancing at a console over the shoulder of a system administrator.

Consider a program holding a secret key in memory at run time. What happens when the process is done with the key and releases the memory back to the operating system? The system makes this memory available to other processes. On some systems, an attacker can run a trivial program on the target machine that continually requests memory from the system, inspects its contents for keys, and releases it. As a result, a process should overwrite sensitive data before releasing its memory.

Data written to magnetic storage media suffers even more vulnerabilities. "Deleting" a file usually involves nothing more than removing an entry from a directory table. The blocks containing the file's data remain on disk, where they can be read at any time by another program [2]. Commercial undelete programs exploit this behavior to recover files. More sophisticated attacks using special equipment can even read overwritten data off a disk [3].

To counter these threats, the prudent course is to ensure sensitive data never hits the disk. Secrets stored in physical memory can potentially be paged out to a swap file by the virtual memory manager. A security-conscious program should therefore try to prevent sensitive memory from being swapped to disk. Otherwise, copies of the data may remain on the disk long after the program has terminated.

The benefits of an allocator that can perform these tasks are obvious. As with any type of data, STL containers greatly simplify the chores of management.

Design

The design of allocators in general is beyond the scope of this article. Comprehensive treatment of that topic may be found in Chapter 15 of [4] and Items 10 and 11 of [5]. For now, it is enough to note that allocators are invoked to handle four events in an object's life cycle: memory allocation, object construction, object destruction, and memory deallocation. The methods that perform these tasks are shown in pseudocode below:

class SecureAllocator
{
  pointer allocate(size_type num);
  void deallocate(pointer pmem);

  void construct(pointer pmem,
       const value_type& val);
  void destroy(pointer pmem);
}

Following the example of the Loki library, the secure allocator will be built around orthogonal policies (see Chapter 1 of [6] for an introduction to policy-based design). Briefly, a policy defines an interface that abstracts a design choice or feature [7]. A policy class implements that interface in one particular way. A host class (template, usually) uses a policy class to accomplish a task without knowing how it is done.

Allocators themselves are a good example of a policy. The interface an allocator must provide is only described in design documents. A concrete allocator class implements that interface in one particular way. A host class such as std::vector uses an allocator to obtain memory without knowing the details. Changing the memory allocation behavior of std::vector is as easy as changing a template parameter.

Here, the allocator will be hosting policies. As already noted, a secure allocator should enforce two behaviors. First, it should wipe the contents of any memory it has allocated before returning it to the operating system. Second, it should prevent the memory it allocates from being paged out to disk by the virtual memory manager.

The first of these, wiping memory, makes a good candidate for a policy. For one, it is independent of an allocator's other tasks: the method of wiping memory has no effect on how to allocate memory or construct an object. Secondly, there are several good ways to wipe memory depending on the situation, including sometimes not at all.

Each concrete choice of a wiping method makes the allocator suitable for some applications but unsuitable for others. A policy is a good way to isolate this choice so the allocator can be used in a wide variety of applications.

The policy that determines how to wipe sensitive data from memory will be called ScrubPolicy. It will have a method, doScrub, that wipes the contents of a memory block passed to it.

The only remaining question is when to perform a scrub. The allocator provides two opportunities to do so: when an object is destructed or when its memory is deallocated. A conservative approach dictates that sensitive data be stored for as short a time as possible to minimize risk of compromise [8]. On the other hand, deallocations are much less frequent than destructions and can process the memory for multiple objects at one time. From an efficiency standpoint, memory should be scrubbed on deallocation.

Rather than arbitrarily choosing one or the other, ScrubPolicy will allow both methods. SecureAllocator will call two ScrubPolicy methods, onDestroy and onDeallocate, at the appropriate times instead of calling doScrub directly. Each ScrubPolicy class will choose when to perform a scrub by having one of these methods invoke doScrub and the other do nothing. Since deallocate should seldom be on the critical execution path of a program, SecureAllocator will default to a conservative ScrubPolicy.

In pseudocode, SecureAllocator's use of ScrubPolicy will look like this:

class SecureAllocator
{
  void destroy(pointer pmem)
  {
    call *pmem's destructor
    ScrubPolicy::onDestroy(
      pmem, size of *pmem);
  }
  void deallocate(pointer pmem)
  {
    ScrubPolicy::onDeallocate(
      pmem, size of *pmem);
    deallocate pmem
  }
}

At first glance, whether to allow paging of sensitive memory to disk also appears suitable for a policy. There are at least two incompatible choices that make sense under certain conditions: disable paging or do nothing [9]. In fact, the first iteration of my design had a SwapPolicy class that handled all paging issues. But during implementation, I realized the allocation method and SwapPolicy interact quite a bit.

Most modern operating systems limit the amount of memory an unprivileged process can lock into RAM. Additionally, since the unit of swapping is a physical page, locking a memory block causes at least an entire page to be locked in place. As a result, all sensitive data should be allocated on separate pages from ordinary data. Otherwise, fragmentation (of sorts) will reduce the memory available for sensitive data.

This makes some allocation strategies, such as malloc and the global new operator, incompatible with a no-swap strategy. Allocation and swapping are not truly independent functions. Therefore I decided to use a single policy, AllocPolicy, which handles both the allocation and swap duties. SecureAllocator simply hands off memory requests to AllocPolicy, as the following pseudocode demonstrates:

class SecureAllocator
{
  pointer allocate(size_type num)
  {
    return AllocPolicy::doAlloc(
      num * size of object);
  }
  void deallocate(pointer pmem)
  {
    AllocPolicy::doFree(pmem);
  }
}

Almost all of SecureAllocator's functionality is now in place. The only remaining piece is thread safety. In keeping with the policy-based design, SecureAllocator will use a third policy, LockPolicy, to provide synchronization when appropriate.

The threading policies covered in [6] are much more general than needed here and only implemented on the Windows platform. However, the basics are the same. LockPolicy will define an internal class called Lock. At the beginning of any method that may require synchronization, SecureAllocator will create a LockPolicy::Lock local variable.

Thread-safe implementations of LockPolicy will lock a synchronization object (most likely a mutex) in Lock's constructor and unlock it in the destructor, which will be called when the method terminates. Single-threaded implementations of LockPolicy will have an empty Lock class, which does nothing. The following pseudocode demonstrates the usage:

void someMethod(args)
{
  LockPolicy::Lock lock;
  // method is now synchronized
  // method body goes here
  // lock is destroyed on return
}

SecureAllocator will only use LockPolicy to synchronize two methods this way: allocate and deallocate. Thus AllocPolicy's doAlloc and doFree methods will always be synchronized, as will ScrubPolicy's onDeallocate. SecureAllocator itself uses no member, static, or global variables, so there is no need to synchronize any of its other code. This leaves only the code in ScrubPolicy::onDestroy.

The decision not to synchronize the destroy method, which calls onDestroy, is based on two factors. One, ScrubPolicy is not likely to contain or access any non-const variables. Its doScrub method simply overwrites the contents of sensitive memory. In many cases, scrubbing with a fixed byte such as zero (or not scrubbing at all) will suffice.

Two, the performance penalty of synchronizing destroy is significantly higher than synchronizing allocate and deallocate. allocate and deallocate can operate on memory for many objects per invocation, whereas destroy must be called once per object.

Vectors, for instance, commonly ensure that growing the container is an amortized constant-time operation by doubling storage when more room is needed. This means allocate will only be called (on average) log(N) times as the vector grows to size N. But destroy must be called at least N times, once for each object in the vector. Synchronizing every destroy is thus much more expensive than synchronizing every allocate and deallocate.

If a ScrubPolicy implementation does require synchronization for onDestroy, it can be done separately in that method itself.

Implementation

While the design was kept sufficiently generic to accommodate any application, an implementation must consider its target audience. The assumed scenario from here on out will be an application handling small amounts of sensitive data. That is, assume the program holds tens, at most hundreds, of passwords, cryptographic keys, or secret access tokens in memory at any one time.

I feel this scenario covers the vast majority of applications likely to benefit from the allocator. It encompasses software as diverse as instant messengers, web browsers, peer-to-peer clients, compression software, and office suites [10]. It can even support a PGP-type program, which was the initial motivation for the project. Applications for which this particular implementation may be unsuitable, such as high-volume e-commerce servers, can follow the approach herein to craft an implementation that fits their needs.

SecureAllocator

SecureAllocator's implementation should be straightforward by now. Like any allocator, SecureAllocator will be a template with a parameter for the type of object being allocated. It will also have three other template parameters for the three policies discussed: AllocPolicy, ScrubPolicy, and LockPolicy. The relevant parts of SecureAllocator are shown in Listing 1.

While Loki often uses templates as template parameters, SecureAllocator's template parameters must be concrete classes. This is a concession made for widely used compilers that don't support template template parameters, such as Microsoft Visual C++.

SecureAllocator's usage of its AllocPolicy and ScrubPolicy requires some explanation. The C++ Standard says the semantics of two instances of the same allocator class that compare as non-equal are implementation-defined (Section 20.1.5). Basically this means cross-platform allocators can only have static variables.

This leaves several options for how SecureAllocator can use AllocPolicy and ScrubPolicy. Having SecureAllocator subclass these policy classes just pushes the problem off on them. Another inelegant, brute-force approach is to make the policy interface methods static. Both these solutions force the policy class to store all its variables statically as well.

A better approach is for SecureAllocator to define static variables of each policy class type, like static AllocPolicy ap, and call the policy methods through ap. The problem here is the order of dynamic initialization for variables in different translation units is undefined by the Standard. SecureAllocator may be called on to allocate memory at any phase of the program, including before ap is constructed [11].

To solve this, SecureAllocator can instead store a pointer, static AllocPolicy *ap. Because pointers are primitive types, their initialization is guaranteed during the loading phase, before any dynamic initialization occurs. A simple method to retrieve the object can test ap for null and construct a new AllocPolicy the first time:

static AllocPolicy *ap = NULL;
AllocPolicy& getAllocPolicy()
{
  if (ap == NULL)
  {
    ap = new AllocPolicy();
  }
  return *ap;
}

There are still two remaining hurdles. First, recall that SecureAllocator is a template parameterized by the type of object it's allocating. SecureAllocator<char>, SecureAllocator<int>, and SecureAllocator<Foo> are all different classes, each with its own copy of the static AllocPolicy pointer. For most scenarios, all memory should be allocated from the same AllocPolicy object to reduce fragmentation.

To achieve this, an external class will store the AllocPolicy and ScrubPolicy pointers. The PolicyHolder template in Listing 1 provides this functionality. Any SecureAllocator class can retrieve the same AllocPolicy pointer from PolicyHolder. PolicyHolder is essentially a Singleton manager. Loki's SingletonHolder class does the same thing more flexibly, but doesn't compile on nearly as many platforms.

Two down, one to go. The getAllocPolicy method shown above (now in PolicyHolder) is not thread-safe. SecureAllocator could synchronize before every call to PolicyHolder, but the destroy method needs to use ScrubPolicy. Synchronizing every destroy call was already deemed too expensive. Also, since PolicyHolder is generic enough to be useful outside SecureAllocator, it would be nice if PolicyHolder provided its own synchronization.

PolicyHolder could take a LockPolicy template parameter for synchronization. Since SecureAllocator does not accept template template parameters, it would have no clean way to provide PolicyHolder with a LockPolicy class other than the one passed to it. But SecureAllocator and PolicyHolder can't use the same LockPolicy class because SecureAllocator sometimes locks that mutex before PolicyHolder is even called (as in the SecureAllocator::allocate method).

Instead, PolicyHolder will always use a multithreaded LockPolicy class of its own to implement the Double-Checked Locking pattern. This ensures PolicyHolder can be used in multithreaded contexts. Even single-threaded programs will incur the overhead for the mutex. However, since the mutex is only locked once, the first time getPolicy is called, the expense is negligible.

ScrubPolicy

Listing 2 contains several classes implementing ScrubPolicy. Most applications should use the BasicScrubber class for scrubbing, which overwrites a memory block with the byte 0x00. The NoScrubber class doesn't perform any scrubbing at all; it is primarily intended for systems in which the operating system zeros out memory pages as they are released. HeavyScrubber overwrites memory multiple times with complementary bit patterns. It is generally not considered any more effective than BasicScrubber for wiping solid-state DRAM memory. However, other memory types (SRAM, flash, etc.) with different electromagnetic properties may benefit from HeavyScrubber. Those truly paranoid about data retention may find it useful even with DRAM

These three classes only implement the doScrub method. The templates ScrubOnDestroy and ScrubOnDeallocate adapt them for use in SecureAllocator. Each passes one of the onDestroy/onDeallocate calls on to doScrub while ignoring the other. This separates the task of implementing a scrubber from deciding when it will be called.

The default ScrubPolicy for SecureAllocator is ScrubOnDestroy<BasicScrubber>.

AllocPolicy

AllocPolicy proves a little trickier to implement. As already noted, an implementation that disables paging should make allocations from a contiguous region of virtual memory. The only way to obtain such a region with Standard C/C++ library functions is via malloc or new, which not only reserve a region of virtual memory but also allocate and commit it. In this case, AllocPolicy would have to acquire chunks of memory in page-sized increments and then break them into smaller chunks to satisfy SecureAllocator's requests. In other words, this approach involves implementing another memory manager on top of malloc or new.

Fortunately, we can avoid this road by taking advantage of system-specific features. The Win32 virtual memory interface includes a set of heap functions. Windows heaps provide exactly the features AllocPolicy needs. A heap reserves a region of the virtual memory space for itself, but only commits pages when needed. As the heap grows, its virtual memory region may not remain contiguous, but fragments will at least be multiples of the page size. A heap also satisfies allocations of arbitrary size, handling the bookkeeping itself. In fact, malloc and new on Win32 are usually implemented on top of a heap object.

Win32 also provides functions to disable and enable swapping of memory pages: VirtualLock and VirtualUnlock. Using these functions, implementing AllocPolicy with no swapping becomes trivial. Listing 3 shows a Win32 version of the class NoSwapHeap, which allocates memory and prevents it being paged out to disk.

NoSwapHeap's constructor creates a new heap and retrieves the system page size. The doAlloc method requests numBytes of memory from the heap and locks any pages that the memory block straddles into place with VirtualLock. If the allocation or lock fails, it throws a std::bad_alloc exception, freeing the memory first when necessary. The doFree method unlocks pages that are empty (through the removeBlock call) and frees the memory.

VirtualLock doesn't keep track of how many times it locks a page. If five memory blocks are allocated from one page, and VirtualLock is called five times, the first call to VirtualUnlock will unlock the entire page. Obviously the page cannot be unlocked until all five blocks are released and the page is empty.

The PageTracker class keeps track of the pages in use. The addBlock method, called on every allocation, stores the page index and increments an allocation count. When a block is freed, removeBlock decrements the counter and unlocks the page when it reaches zero. PageTracker can be subclassed by any AllocPolicy implementation for this purpose.

Data is stored in a vector sorted by page index for fast lookups. This adds 2 * sizeof(size_t) memory overhead to each page in use (stored on a different page, though), which on a typical system with a four-byte size_t and 4 KB pages is negligible. Occasionally as new pages are used, the vector will be reallocated, but pages will usually be added at the end of the vector, amortizing the insertion cost down to a constant factor.

NoSwapHeap defines no destructor and never releases its heap with HeapDestroy. This is a small but necessary resource leak, which the system cleans up on process termination. Because SecureAllocator can potentially be called to allocate memory at any time, there is no way to properly schedule the destruction of the AllocPolicy held by PolicyHolder. This is why PolicyHolder never deletes the AllocPolicy object it creates, and why AllocPolicy classes can't rely on being destructed.

NoSwapHeap can be easily adapted to other systems that allow users to create their own heaps, like AIX. Linux, however, provides no memory interfaces akin to heaps. The only system calls for managing memory are low-level functions like mmap and sbrk. Even user-space libraries providing higher-level memory constructs are hard to come by. Thus a different approach is called for.

Fortunately, Doug Lea's excellent memory manager, sometimes called dlmalloc, provides a solution. Normally, dlmalloc is used as a drop-in replacement for malloc. However, it can optionally use different names for its own memory functions, so it may be used side-by-side with the system malloc implementation. In this mode, it effectively functions as a separate memory heap, obtaining raw pages of memory from the system, which it manages itself.

NoSwapDLMalloc, shown in Listing 4, implements a no-swap allocation policy suitable for use on Linux (and many other Unix systems). The doAlloc and doFree methods are sufficiently like NoSwapHeap's to warrant little discussion. The differences are that dlmalloc and dlfree are called to allocate memory, and mlock and munlock are used to disable and enable paging. NoSwapDLMalloc also subclasses PageTracker to track allocations on each page.

A word of caution: NoSwapDLMalloc may not be suitable for some systems even though it compiles on them. For example, at least some versions of BSD require memory addresses passed to mlock and munlock to be multiples of the page size. Adapting the class for these systems should be trivial.

LockPolicy

Listing 5 shows two implementations of LockPolicy. The MutexLocking class provides a Lock inner class suitable for synchronizing multiple threads. MutexLocking contains a static mutex from the Boost library to provide cross-platform synchronization. The template parameter Target only serves to create different MutexLocking classes (with different mutexes) for different uses.

The NoLocking class defines an empty do-nothing inner Lock class. The template parameter Target only serves as a dummy parameter here to maintain similarity with MutexLocking.

The VolatileType typedef in both templates is used to add the volatile keyword to types in a multithreaded environment, while omitting it in single-threaded programs to allow for compiler optimizations.

Usage

Fortunately, using SecureAllocator is easy. Simply provide a SecureAllocator class as the allocator parameter to any STL container. SecureAllocator's policy template parameters all have good defaults; most applications won't need to change them. Often a single line will suffice:

typdef std::vector<char,
  SecureAllocator<char> > SecureCharVector;

Those with more advanced needs can choose from the predefined policy classes or define their own. Just make sure the LockPolicy passed to SecureAllocator is tied to AllocPolicy. Usage is still quite simple, if lengthy:

typedef std::vector<Foo, SecureAllocator<Foo,
  CustomAllocPolicy, ScrubOnDestroy<CustomScrubber>,
  MutexLocking<CustomAllocPolicy> > > SecureFooVector;

Other concerns involve how certain containers are implemented. Some versions of vector allocate large chunks (up to 2 KB) of memory unless reserve is called before inserting data (see page 149 in [4]). Obviously storing 16-byte cryptographic keys in such vectors can waste a lot of precious locked memory if not prevented.

Many std::string implementations employ small local buffers to avoid allocating small strings from the free store [5]. Such string objects may store all their data on the stack and never use their allocator. Unless you are certain your platform's string implementation does not do this, the std::string container is best avoided.

Conclusion

The SecureAllocator template can mitigate some of the easiest attacks on sensitive data stored in memory. By disabling paging, exploits that obtain data from the hard disk are prevented. Overwriting data before releasing memory to the operating system precludes another process from obtaining the same memory block and reading the data.

However, there are still many ways to compromise such data. Sophisticated attacks described in [3] can retrieve overwritten data from solid-state memory even hours or days after the power is removed. Usually, compromise is even easier. On most systems, a process run as the same user ID or administrator (root) can view the contents of another process's memory. Also, next to nothing can be done in software to protect machines with poor physical security or untrustworthy administrators.

Likewise, it is important to remember that the security of a program depends on many more factors than just how well it protects sensitive data. Everything from choice of cryptographic algorithms and access control to race conditions and random number generation has the potential for security vulnerabilities. Two excellent books, [12] and [13], provide a good overview of security issues to keep in mind when building software. Of course, there is still no substitute for the trained eye of an experienced security developer.

Notes

[1] This is the case under very common threat models in a multi-user environment.

[2] This usually involves bypassing the file-system driver and accessing the disk directly.

[3] The classic (and most comprehensive) treatment of problems with data storage is found in Peter Gutmann's "Secure Deletion of Data from Magnetic and Solid-State Memory," Sixth USENIX Security Symposium Proceedings, 1996. Available at <www.cs.auckland.ac.nz/~pgut001/pubs/secure_del.html>.

[4] Nicolai M. Josuttis. The C++ Standard Library: A Tutorial and Reference (Addison-Wesley, 1999).

[5] Scott Meyers. Effective STL (Addison-Wesley, 2001).

[6] Andrei Alexandrescu. Modern C++ Design (Addison-Wesley, 2001).

[7] Because policies are only design concepts, the interface definition never appears in code outside a concrete policy implementation.

[8] As will be seen, memory is still open to other attacks besides the ones SecureAllocator aims to prevent.

[9] Paging sensitive data is allowable on some systems, like those that encrypt their swap files.

[10] This says nothing about the security of how passwords and keys are actually used in such applications. Secure memory storage will do nothing to alleviate poor passwords or weak cryptographic algorithms.

[11] Imagine a global variable of type vector<byte, SecureAllocator> declared in another translation unit.

[12] John Viega and Gary McGraw. Building Secure Software (Addison-Wesley, 2002).

[13] Michael Howard and David LeBlanc. Writing Secure Code (Microsoft Press, 2002).

About the Author

Edward Elliott has spent time as a software developer and college lecturer He may be reached at secpub@eddeye.net.