Features


Fast Memory Allocation Scheme

Steve Weller


Steven Weller is president of Windsor Systems, specializing in OS-9, system-level and real-time software, and computer graphics. An electronics engineer from England, Steve has been in software for nine years and has particular interest in parallel computer applications, modern computer languages, operating systems, and the management of technology. He may be contacted at 2407 Lime Kiln Lane, Louisville, KY 40222 (502) 425-9560.

In applications requiring the dynamic allocation of a large number of small objects, the overhead associated with general-purpose allocation schemes can be large: between 20 and 200 percent of the actual stored data. To minimize this problem I use a layered allocation system in which standard system calls allocate relatively large blocks of memory to a simpler memory management subsystem.

All of the smaller objects belonging to a single data structure (e.g., a tree or linked list) are then "borrowed" (using a low overhead allocation scheme) from one (or a set) of these layer blocks. Unlike generalized allocation routines, the "borrowing" system doesn't attach allocation information to any of the borrowed objects, potentially reducing memory overhead. Moreover, because the entire data structure is freed as a unit, I avoid the overhead of attempting to coalesce adjacent, freed objects (except for the underlying large blocks).

Why Not malloc()?

malloc() and free() are the most commonly used standard C function calls for memory allocation and deallocation. They are general and easy to use, but inefficient for small amounts of memory, both in terms of storage overhead and speed.

malloc() collects the requested amount of memory, allocates it, and returns a pointer to the allocated area. On my machine malloc() adds an overhead of eight bytes to every piece of memory allocated. malloc() is also not particularly fast since it must manipulate the links it maintains between allocated blocks each time memory is allocated.

free() deallocates the memory block whose address is passed by undoing malloc()'s links and adding the block to the list of free blocks, merging adjacent blocks if possible. Using free() to deallocate a large number of small blocks is very inefficient.

Borrowing Memory

Memory borrowing, as I call it, allows the user to obtain memory in small pieces, but return it all in one go. A call to iniz_borrow() sets up the system:

if ((id=iniz_borrow(2000))==0)
     error("Can't get memory\n");
iniz_borrow() accepts a number which represents the block size to be allocated from the system when memory is required, in this case 2000 bytes. The routine allocates either one block and returns a memory ID, or returns zero indicating that an error has occurred.

All subsequent allocations and deallocations use the unique memory ID number. Any number of memory IDs may be created, each with its own allocation size, but all the memory associated with one ID must be returned at the same time. Normally each memory ID is associated with a separate large data structure. Each time memory is needed for a small object within one of these structures, borrow() is called:

if ((new=borrow(id,size))==0)
     error("Can't get memory\n");
borrow() allocates memory from the block defined by the ID and returns a pointer to it (here assigned to new), or a zero on error. Additional blocks are acquired from the system if necessary.

Two functions free "borrowed" memory; both return all memory allocated with one ID. return_borrow() returns all but the first block to the system, leaving the memory ID valid and reusable. deiniz_borrow() returns all memory to the system, making the memory ID invalid and unusable.

The Borrow Functions

Listing 1 contains the header information and the initialization routine. iniz_borrow() allocates a block of memory and places the allocation information in the MemBlock structure at the start of the block. The routine returns the block's address as the memory ID. As more blocks of memory are required, they will be linked to the first.

The allocate() routine shown in Listing 1 can be any allocation routine you have, probably sbrk() or malloc(). Your routine must, however, return a zero on failure.

Listing 2 shows the borrow() routine itself. The requested amount of memory is rounded up to an even number of bytes, keeping the allocated memory addresses on even byte boundaries. This restriction can be dropped if not required, or changed to

need=(need+3)&~3
or

need=(need+7)&~7
to ensure that even word or even long word alignment is maintained.

Next the amount of memory requested is compared with the amount remaining in the current block. If the remaining memory is insufficient, another block is allocated and linked to the current block. borrow() updates the MemBlock structure in the first allocated block to identify the newly allocated block as the current block, and adjusts the offset to allow for a pointer at the start of the new block. It is not necessary for any block other than the first to contain the whole MemBlock structure.

The offset mb_offs identifies the amount of memory that has been allocated in the current block. To satisfy a memory request, the address of the allocated memory is computed (by adding the current block pointer mb_pres to the offset), the offset is incremented (by the amount of memory allocated), and the original memory address returned.

Listing 3 shows the deallocation routines. deiniz_borrow() returns all the blocks allocated by the system by running down the allocated list, calling deallocate() as it goes. deallocate() can be any deallocation routine complementary to the allocation routine used in iniz_borrow(). It must, as before, return a zero on failure.

return_borrow() is similar to deiniz_borrow() except that it does not return the first block, and hence keeps the memory ID valid. The MemBlock structure at the start of the first block is reset to show an empty first block — the same state that it was left in by iniz_borrow().

Block Size

Using a large block size results in fewer allocations and deallocations from system memory, and hence greater speed, but at the expense of greater memory overhead. If the block size is only a few times greater than the memory being allocated by borrow(), then large amounts at the end of each block will remain unused.

Conclusion

This simple memory allocation system takes advantage of the way that many applications allocate and deallocate memory. It can be tailored to different data structures by grouping memory allocation for each type of structure under separate memory IDs, each with a different block size. The simple allocation mechanism produces a fast and efficient system.