Columns


Questions & Answers

Heap or Stack — Which Should You Use?

Kenneth Pugh


Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C and C++ language courses for corporations. He is the author of C Language for Programmers and All On C, and was a member of the ANSI C committee. He also does custom C programming for communications, graphics, image databases, and hypertext. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You may fax questions for Ken to (919) 489-5239. Ken also receives email at kpugh@allen.com (Internet) and on Compuserve 70125,1142.

Those of you who are communicating with me via email should note my new Internet address, kpugh@allen.com. In the spirit of downsizing, the mainframe on which I had my mail account is closing down due to lack of users.

Q

First, I want to take this opportunity to thank you for your contributions to the C Users Journal. I would also like to mention that I very much appreciate the FAX number for sending you questions.

I want to create a global array on the heap, of 300 strings of 83 characters each. For example:

    char scrnbuf[300][83];
except I want the storage on the heap. I have been attempting to use the C++ new and delete operators, but with little success, and even less understanding of what is actually happening. Naively, I tried:

    char scrnbuf = new char
[300][83];
but the compiler complained that it could not convert char[83]* to char. Also, the statement

    char *scrnbuf = new char [300][83];
evoked a similar message. I stumbled onto

    char (*scrnbuf)[83] = new char[300][83];
which seemed to allocate the correct amount of memory on the heap, but even after reviewing Christopher Skelly's "Pointer Power in C and C++" in the February and March issues of CUJ (by the way, in my opinion, every issue should have at least two articles of this kind) I am not sure I understand exactly what happened.

Thus, I have two questions: 1. How do I correctly allocate the array on the heap, and 2. What are the advantages of using heap memory rather than the stack, or is using the heap better than using the stack in this case, and why?

Tom Parke
Reno, Nevada

A

Since you asked two questions, I will answer in two parts. The first part will cover how memory can be allocated for arrays and the second will cover what kind of memory you would like to allocate.

The quick answer to your question is that your solution does work. Sometimes experimentation is the key to the solution of a problem. For the benefit of our C readers, I'll convert your example to C, and then switch back to C++ for the final answer.

Let's start by reviewing some array basics and then move on from there. If you simply used an array to hold a single string, you might code it something like:

    #define NUMBER_OF_CHARS 10
    char char_array[NUMBER_OF_CHARS] = "MNOPQRSTUV";
The type of char_array[0] is char and its value is 'M'. The type of char_array is array of 10 char. In most expressions this becomes pointer to char (also written as char *), and this pointer contains the address of the first element in the array. If char_array had a value of 500 (a hypothetical address), then char_array + 1 would have a value of 501. The address of char_array is a constant, so you cannot perform operations which attempt to modify its value, such as char_array++.

You could code a series of strings as:

    #define NUMBER_OF_STRINGS 5
    char char_matrix[NUMBER_OF_STRINGS]
                  [NUMBER_OF_CHARS] = {
        "ABCDEFGHIJ",
        "",
        "",
        "",
        "QRUSTVWZYZ",
        };
The type of char_matrix[4][0] is char and its value is 'Q'.

The type of char_matrix[4] is pointer to char (char *) and its value is the address of the fifth element, which contains "QRUSTVWZYZ".

The type of char_matrix is array of NUMBER_OF_STRINGS arrays of NUMBER_OF_CHARS char. In most expressions this does not become char, nor does it become pointer to char. Instead, it becomes a pointer to objects which are arrays of char that have NUMBER_OF_CHARS elements. This type is symbolized by char (*) [NUMBER_OF_CHARS] or char (*) [10]. char_matrix is also the address of the first element in the array of strings. According to pointer arithmetic, if the value of char_matrix is 1000, then the value of char_matrix + 1 is 1010.

If an array is static or global, the program allocates it from the data segment of the executable. If the array is automatic, the program allocates it from the stack segment. To use the heap, you need to use pointers. Let's declare some variables corresponding to those in my previous code samples.

The declaration

    char *p_char;
states that p_char is a pointer to objects of type char (char *). p_char is of the same type as char_array but its value is undefined until it is allocated memory. You can use malloc to allocate memory. The following code allocates memory for an array and stores a pointer to this memory in p_char.

    int NUMBER_OF_CHARS = 10;
    p_char = (char *) malloc(NUMBER_OF_CHARS);
(In ANSI C, it is not necessary to use a cast for the (void *) return type of malloc, but it is necessary in C++.) In this example, I'm using a variable for the array size, NUMBER_OF_CHARS, to emphasize that the size of the memory allocation is not fixed at compile time. You can put data into this memory with:

    strncpy(p_char, "MNOPQRSTUV", NUMBER_OF_CHARS);
Let's lay out some expressions using p_char:

     Expression     Meaning

     p_char         pointer to char (char *)
     *p_char        char (first char)   'M'
     *(p_char + 4)  char (fifth char)   'Q'
     p_char[4]      char (fifth char)   'Q'
Of course, with pointer constants, such as array names, you can use either array subscript notation or pointer notation.

To store strings in two dimensions, there are two data structures you can use — an array of pointers or a pointer to arrays. Suppose you declare:

    char *array_p_char[NUMBER_OF_STRINGS];
This allocates memory for an array of NUMBER_OF_STRINGS size, each element of which is a pointer to char (char *). The type of array_p_char becomes, in an expression, pointer to pointer to char (char **). Its value is the address of the first element in the array. If the value of array_p_char is 500, and pointers occupy four bytes, then the value of array_p_char + 1 is 504.

To allocate storage for each of these pointers, we use a loop, such as:

    for (i = 0; i < NUMBER_OF_STRINGS; i++)
        {
        array_p_char[i] = (char *) malloc
                        (NUMBER_OF_CHARS);
        }
The following code puts some data into the allocated memory:

    strncpy(array_p_char[0], "ABCDEFGHIJ",
           NUMBER_OF_CHARS);
    strncpy(array_p_char[4], "QRUSTVWZYZ",
           NUMBER_OF_CHARS);
Let's lay out some expressions using array_p_char:

     Expression           Meaning

     array_p_char         pointer to pointer to char
     *array_p_char        char pointer pointing at
                          "ABCDEFGHIJ"
     *(array_p_char + 4)  pointer to char (fifth element
                          in array_p_char) pointing
                          at "QRUSTVWZYZ",
     array_p_char[4]      pointer to char (as previous),
                          pointing at "QRUSTVWZYZ",
Since array_p_char[i] is a pointer to char, you can use indirection on it as:

     Expression          Meaning

     *array_p_char[0]    char 'A'
     array_p_char[0][0]  char 'A'
     *array_p_char[4]    char 'Q'
     array_p_char[4][0]  char 'Q'
     array_p_char[4][9]  char 'Z'
Alternatively, you can declare a pointer to arrays, as:

    char (*p_char_array)[NUMBER_OF_CHARS];
The parentheses alters the interpretation. The variable p_char_array is a pointer. What does it point to? It points to an array which is NUMBER_OF_CHARS long, each element of which is a char. Or, you could say it is a pointer to char arrays which are NUMBER_OF_CHARS long, symbolized by (char (*) [NUMBER_OF_CHARS]) or (char (*) [10]).

To allocate memory for this pointer to reference, use:

    #define NUMBER_OF_CHARS 10
    int NUMBER_OF_STRINGS = 5;
    p_char_array = (char (*) [NUMBER_OF_CHARS])
        malloc(NUMBER_OF_CHARS * NUMBER_OF_STRINGS);
(The size of the object targeted by p_char_array must be fixed at compiled time, hence the use of the #define.) However, you can use a variable for the number of objects to be allocated, since the number of objects can be changed at run time. Suppose that p_char_array was assigned the value of 2000. Then the value of p_char_array + 1 would be 2010.

You can put some data into p_char_array with:

    strncpy(p_char_array[0], "ABCDEFGHIJ",
           NUMBER_OF_CHARS);
    strncpy(p_char_array[4], "QRUSTVWZYZ",
           NUMBER_OF_CHARS);
Here are some expressions and their meanings.

     Expression           Meaning

     p_char_array         pointer to
                          char[NUMBER_OF_CHARS]
     *p_char_array        char [NUMBER_OF_CHARS] array
                                containing "ABCDEFGHIJ"
     *(p_char_array + 4)  char [NUMBER_OF_CHARS] array
                               (fifth set of 10 chars)
                               containing "QRUSTVWZYZ",
     p_char_array[4]      char [NUMBER_OF_CHARS] array
                               (fifth set of 10 chars)
                               containing "QRUSTVWZYZ",
Since the last three expressions are char arrays, they become in an expression the type pointer to char. Their value is the address of the first element. You can use indirection on them, either explicitly, with the * operator, or implicitly with subscripts, as in:

     Expression           Meaning

     *p_char_array[0]     char 'A'
     p_char_array [0][0]  char 'A'
     *p_char_array[4]     char 'Q'
     p_char_array[4][0]   char 'Q'
     p_char_array[4][9]   char 'Z'
What is the difference between this approach to 2-D arrays and the previous one? With p_char_array, the program computes the starting address of each row every time the subscript is used. With array_p_char, the data structure holds pointers to each row. To find the beginning of a row, the program simply performs a table lookup. This is a space/speed tradeoff.

Now to get back to C++, which is where your original question came from. Unlike malloc, new returns a pointer to the type specified. For example:

    p_char = new char;
allocates one byte of memory and returns a pointer to char to that memory. If you use subscripts with a size, as in:

    #define NUMBER_OF_CHARS 10
    p_char = new char[NUMBER_OF_CHARS];
then the amount of storage that is allocated is the NUMBER_OF_CHARS times the size of the object.

The object which follows new does not have to be a simple variable. So if you have:

    char (*p_char_array)[NUMBER_OF_CHARS];
you can allocate with:

    p_char_array = new char[NUMBER_OF_STRINGS]
                        [NUMBER_OF_CHARS];
This will allocate storage sufficient for NUMBER_OF_STRINGS objects, each of which is the size of an array of NUMBER_OF_ CHARS chars. The type that new returns is (char (*) [NUMBER_ OF_CHARS]), which matches the declared type of p_char_array. Once you have successfully allocated the storage, using p_char_array is no different in C++ than in C.

Memory Allocation

Now for the second part of your question. From which part of memory should one allocate storage for variables and objects? That depends on the size of the object and how long it must remain in memory. Automatic variables are allocated on the stack. malloc and new allocate memory from the heap. In general, the amount of memory available to the stack and to the heap depends on what type of processor you have.

It's generally best to allocate "small" objects (say, <20 bytes) from the stack, provided they don't need to be preserved between function calls. Declare the object within the body of the function; it will become an automatic variable. If a small object must persist in memory between function calls, declare it within the body of that function, but with the static qualifier. It will not be an automatic variable, but will be allocated from the data segment.

On many machines, the stack has a limitation of 64K bytes or less. So large objects, regardless of usage, should not be automatic. If they need to persist throughout a program, declare them static. Otherwise, allocate them from the heap. Of course, if you allocate them from the heap, you should be sure to call free or delete, once you are done with the object.

When programming on an x86 type machine, with the large memory model (where both pointers to data and pointers to functions require four bytes), the differences between far and near pointers are probably insignificant. However, if you are programming under Microsoft Windows using Microsoft Visual C++, you normally use the medium memory model (four bytes for function pointers, two bytes for data pointers). In the medium model, new allocates memory from the near heap, which is only 64K bytes long. This convention dramatically limits the use of new. How can you get more storage in Microsoft C++?

First, declare any pointers that will point to allocated storage as __far pointers. This declaration will force them to be four bytes. For example, if you code:

     char __far * p_char; // Far pointer
     p_char = new char[NUMBER_OF_CHARS]; // but still stuck on near heap
then the storage will be allocated from the near heap, but the address will be converted to a proper four-byte address using heap segment DGROUP, so it will be referenced correctly. Second, use __far as part of the call to new, particularly for built-in types. Coding:

    char __far * p_char;
    p_char = new __far char[NUMBER_OF_CHARS];
forces the storage for p_char to be allocated from the far heap. There are actually four new functions, each of which returns a different address type: __near, __far, __huge, and __based. (I may discuss the latter two address forms in a later column.)

You need to be sure to call the proper delete operator. There are four, corresponding to the four new operators.

For C++ classes, there is another interesting twist. Member functions get a pointer to this. How big is that pointer? The answer is two bytes, under the medium model. For example, suppose you have:

    class A_class
        {
    public:
        A_class() ;
        a_function() ;
        };
        
    A_class __far *p_a_class = new __far A_class;
The program will call the constructor for A_class with a four-byte address for this, but the constructor will expect only two bytes. You can imagine the havoc this misunderstanding can cause. There are two alternatives. First, you can add __far to the member functions, so that they accept a far pointer, as in the following:

    class A_class
        {
    public:
        A_class() __far;
        a_function() __far;
        };
Now if you declare an object of A_class as an automatic, then it has a two-byte this pointer. These functions expect a four-byte pointer; but the compiler is supposed to take care of the conversion for you.

Second, you can assign a memory model to a class, such as:

    class __far A_class
        {
    public:
        A_class();
        a_function() ;
        };
All pointers to A_class and the type of this will be afour-byte far pointers.

As a side warning to those getting into the medium memory model, you should be aware of a hair-pulling, expletive-causing, <fill in your own adjective> possibility of unfathomable bugs in simple-looking statements. Suppose you have:

    class __far A_class
        {
    public:
        char name[100];
        };
If you code:

        A_class a_class;
        char buffer[100];
        sprintf(buffer, "Name is %s", a_class.name);
then sprintf expects a two-byte near pointer, but a class.name is a four-byte far pointer. Of course, there is no type conversion on the arguments, because sprintf is a variable parameter function.

This particular nuance hit me when I was converting a large scale C program to C++, that had a tremendous number of sprintf calls of this type. Watch out for variable parameter functions. If you are starting from scratch — use the strstream class.

Valid Constants

Q

Is 00.0 a valid floating-point constant or an invalid octal constant according to the ANSI standard? This question is motivated by the following experience. A C program contained the following statement:

    float x = 00.;
While compiling this statement, a C compiler (probably not ANSI-compliant) running under AIX on a PS/2, complained that a struct or union was expected. Adding a zero as the tenths digit (00.0) changed the diagnostic to "semi-colon expected." Removing the leading zero eliminated the diagnostic in both cases. Is this a gray area in ANSI and K&R C?

Michael G. Soyka
Warren, RI

I don't believe it's a gray area at all. It appears that the compiler is not interpreting the code according to the Standard. The rule is that the compiler should parse source characters into the longest sequence that forms a token.

The Standard gives the following description of a floating-point constant, in syntax notation:

floating-constant:

    fractional-constant exponent-partopt floating-suffixopt

fractional-constant:

    digit-sequenceopt. digit-sequence
    digit-sequence.

digit-sequence

     digit
       digit-sequence digit

digit one of

     0 1 2 3 4 5 6 7 8 9
The digit-sequence that begins a floating point constant consists of any digit, including 0; this implies that "00." is a valid floating point constant. Similarly, "00000000001." is also a valid floating point constant. An octal constant is described as:

octal-constant

    0
    octal-constant octal-digit

octal-digit:

    0 1 2 3 4 5 6 7
The string "00" is a valid octal value. For comparison's sake, suppose your string had been spaced out as in "00 .". This string would have been interpreted as an octal constant followed by a member operator. The string "00 . 0" would be interpreted as an octal constant followed by a member operator, and another octal constant. Neither of these strings would make any sense.

Your question reminds me of the expression:

    i +++++ i
Using the longest sequence rule, this is parsed as:

    i ++ ++ + i
The compiler reports an error, because with the second ++, you are attempting to increment the expression i++. ++ must apply to a lvalue (reference to memory). If you had added parentheses:

    (i++) + (++i)
the expression would compile. However, its value would be undefined, as explained in previous columns. From somewhere I recall the statement, "Jim where Joe had had had had had had had had had had had the approval of the authorities," as being an English language example of the same style. By adding punctuation, the sentence will make sense.