C/C++ Contributing Editors


Sutter’s Mill: Containers in Memory: How Big Is Big?

Herb Sutter

If you are basing your selection of standard containers on memory requirements, then Herb has some bad news and some good news.


Copyright © 2000 by Herb Sutter

This column addresses two topics: an update on what’s going on in the C++ standards process, and a technical question. For information about the former, see the accompanying "Standards Update" sidebar for breaking news about the 1998 C++ Standard’s first official update, whose ink should be drying as you read this, and how it affects you.

The technical question is this: how much memory do the various standard containers use to store the same number of objects of the same type T? To answer this question, we have to consider two major items:

1. The internal data structure each container uses, and therefore the housekeeping data it must store alongside each contained object.

2. The behavior of the underlying memory manager, which typically adds its own overhead to each allocation request.

Let’s begin with a brief recap of dynamic memory allocation and then work our way back up to what it means for the Standard library.

Memory Managers and Their Strategies: a Brief Survey

To understand the total memory cost of using various containers, it’s important to understand the basics of how the underlying dynamic memory allocation works — after all, the container has to get its memory from some memory manager somewhere, and that manager in turn has to figure out how to parcel out available memory by applying some memory management strategy. Here, in brief, are two popular memory management strategies. Further details are beyond the scope of this article; consult your favorite operating systems text for more information:

1. A general-purpose allocation scheme can satisfy requests of any size. Its flexibility comes at a price: the manager must keep track of each block individually, which typically means per-block housekeeping data.

2. A fixed-size allocation scheme returns blocks of a single known size, usually carved from a dedicated arena (or pool) of same-sized blocks. It is fast and needs little or no per-block housekeeping, but can only satisfy requests up to that size.

In practice, you’ll often see combinations of the above. For example, perhaps your memory manager uses a general-purpose allocation scheme for all requests over some size S, and as an optimization provides a fixed-size allocation scheme for all requests up to size S. It’s usually unwieldy to have a separate arena for requests of size 1, another for requests of size 2, and so on; what normally happens is that the manager has a separate arena for requests of multiples of a certain size, say 16 bytes. If you request 16 bytes, great, you only use 16 bytes; if you request 17 bytes, the request is allocated from the 32-byte arena, and 15 bytes are wasted. This is a source of possible overhead, but more about that in a moment.

The obvious next question is, who selects the memory management strategy? There are several possible layers of memory manager involved, each of which may override the previous (lower-level) one:

1. The operating system, which provides the primitive allocation facilities.

2. The compiler’s runtime library, whose malloc and operator new may pass requests straight through to the operating system or implement their own strategy on top of it.

3. The container’s allocator (allocator<T> by default), which may simply use operator new or replace the strategy wholesale.

Thus memory allocators come in various flavors, and can or will vary from operating system to operating system, from compiler to compiler on the same operating system, from container to container — and even from object to object, say in the case of a vector<int> object which uses the strategy implemented by allocator<int>, and a vector<int, MyAllocator> object which could express-mail memory blocks from Taiwan unless it’s a weekday night and the Mets are playing, or implement whatever other strategy you like.

"I’ll Take ‘Operator New’ For 200 Bytes, Alex"

When you ask for n bytes of memory using new or malloc, you actually use up at least n bytes of memory because typically the memory manager must add some overhead to your request. Two common considerations that affect this overhead are:

1. Housekeeping Overhead

In a general-purpose (i.e., not fixed-size) allocation scheme, the memory manager will have to somehow remember how big each block is so that it later knows how much memory to release when you call delete or free. Typically the manager remembers the block size by storing that value at the beginning of the actual block it allocates, and then giving you a pointer to "your" memory that is offset past the housekeeping information. (See Figure 1.) Of course, this means it has to allocate extra space for that value, which could be a number as big as the largest possible valid allocation and so is typically the same size as a pointer. When freeing the block, the memory manager just takes the pointer you give it, subtracts the number of housekeeping bytes and reads the size, then performs the deallocation.

Of course, fixed-size allocation schemes (i.e., ones that return blocks of a given known size) don’t need to store such overhead because they always know how big the block will be.

2. Chunk Size Overhead

Even when you don’t need to store extra information, a memory manager will often reserve more bytes than you asked for because memory is often allocated in certain-sized chunks. For one thing, some platforms require certain types of data to appear on certain byte boundaries (e.g., some require pointers to be stored on 4-byte boundaries) and either break or perform more slowly if they’re not. This is called alignment, and even plain old built-in C-style arrays are affected by this need for alignment; see Figure 2. (Incidentally, this is why it’s surprisingly tricky to wordsmith the requirement that "vectors must be contiguous" in the same sense as arrays — in Figure 2, the memory is considered contiguous, so what is "contiguous," really? See also the "Standards Update" sidebar accompanying this article.) The C++ Standard guarantees that all memory allocated via operator new or malloc will be suitably aligned for any possible kind of object you might want to store in it, which means that operator new and malloc have to respect the strictest possible alignment requirement of the native platform.

Alternatively, as described earlier, a fixed-size allocation scheme might maintain memory arenas for blocks of certain sizes that are multiples of some basic size m, and a request for n bytes will get rounded up to the next multiple of m.

Memory and the Standard Containers: the Basic Story

Now we can address the original question: how much memory do the various standard containers use to store the same number of elements of the same type T? Each standard container uses a different underlying memory structure and therefore imposes different overhead per contained object:

1. vector<T> stores its elements in a single contiguous block, with no per-element housekeeping data (only a small fixed overhead for the vector object itself, plus any unused capacity).

2. deque<T> stores its elements in a series of contiguous chunks, plus an index structure to locate the chunks; the per-element overhead is small.

3. list<T> stores each element in its own node, along with two extra pointers (to the next and previous nodes).

4. set<T> and multiset<T> store each element in its own tree node, along with three extra pointers (typically to the left child, right child, and parent nodes).

5. map<Key,T> and multimap<Key,T> are like set, except that each node stores a pair<const Key, T>.

Table 1 summarizes this basic overhead for each container.

Memory and the Standard Containers: the Real World

Now we get to the interesting part: don’t be too quick to draw conclusions from Table 1. For example, judging from just the housekeeping data required for list and set, you might conclude that list requires less overhead per contained object than set — after all, list only stores two extra pointers, whereas set stores three. The interesting thing is that this may not be true once you take into consideration memory sizes. To dig a little deeper, consider Table 2, which shows the node layouts typically used internally by list, set/multiset, and map/multimap.

Next, consider what happens in the real world under the following assumptions, which happen to be drawn from a popular platform:

1. Pointers and ints are four bytes long (typical of 32-bit platforms).

2. The default memory manager uses a fixed-size allocation scheme, returning blocks whose sizes are multiples of 16 bytes.

Looking at Table 3, we immediately spy one interesting result: for many cases (in fact, for about 75% of the possible sizes of the contained type T), list and set/multiset actually incur the same memory overhead in this particular environment. What’s more, here’s an even more interesting result: even list<char> and set<int> have the same actual overhead in this particular environment, even though the latter stores more object data and more housekeeping data in each node. If memory footprint is an important consideration for your choice of data structure in specific situations, take a few minutes to do this kind of analysis and see what the difference really is in your own environment — sometimes it’s less than you might think!

Summary

Each kind of container chooses a different space/performance tradeoff. You can do things efficiently with vector and set that you can’t do with list, such as O(log N) searching [3]; you can do things efficiently with vector that you can’t do with list or set, such as random element access; you can do things efficiently with list, less so with set, and more slowly still with vector, such as insertion in the middle; and so on. Getting more flexibility often requires more storage overhead inside the container, but after you account for data alignment and memory allocation strategies, the actual overhead may be quite different from what you’d expect! For related discussion about data alignment and space optimizations, see also Item 30 in Exceptional C++ [4].

Acknowledgments

Thanks to Pete Becker for the discussion that got me thinking about this topic.

References

[1] L. Marrie. "Alternating Skip Lists," Dr. Dobb's Journal, 25(8), August 2000.

[2] Jon Bentley. Programming Pearls, Second Edition (Addison-Wesley, 2000).

[3] This is possible if the vector’s contents are sorted.

[4] Herb Sutter. Exceptional C++: 47 Engineering Puzzles, Programming Problems, and Solutions (Addison-Wesley, 2000).

[5] Herb Sutter. “Standard Library News, Part 1: Vectors and Deques,” C++ Report, 11(7), July/August 1999.

[6] Herb Sutter. “When is a Container not a Container?” C++ Report, 11(5), May 1999.

Herb Sutter (hsutter@peerdirect.com) is chief technology officer of PeerDirect Inc. and the architect of their heterogeneous database replication products. He is secretary of the ISO/ANSI C++ standards committees, a member of the ANSI SQL committee, and author of the book Exceptional C++ (Addison-Wesley).