February 1992/Doctor C's Pointers

Columns

Doctor C's Pointers®

Data Structures, Part 9

Rex Jaeschke

Rex Jaeschke is an independent computer consultant, author and seminar leader. He participates in both ANSI and ISO C Standards meetings and is the editor of The Journal of C Language Translation, a quarterly publication aimed at implementors of C language translation tools. Readers are encouraged to submit column topics and suggestions to Rex at 2051 Swans Neck Way, Reston, VA 22091 or via UUCP at rex@aussie.com.
A stack is a list in which nodes can be added only at one end. Nodes can be accessed or removed only from that same end. When a new node is added it hides the previous end node, so in order to access that previous node the new node must first be removed. That is, the list works in a Last-In First-Out (LIFO) manner. The most recently added node is the first available for use, while the first node added is the last one available for use.
An often used example of a stack is that of a stack of plates in a cafeteria. The plates are stacked one on top of the other inside a spring-loaded hopper. As you take a plate off the top of the stack the weight of the whole stack decreases and the spring raises the remaining stack up to the top of the hopper. Since the term stack implies some sort of vertical orientation we often talk about stacks having a top and a bottom. And the act of adding to the top is commonly referred to as pushing onto the stack while that of removing from the top is called popping off the stack.
A stack need not have any particular orientation, but the terms push and pop are still typically used. For example, a paper napkin dispenser can push backward and pop forward; while a paper cup dispenser, as found at airplane water fountains, pushes upward and pops downward. In these mechanical stacks, push and pop operations typically do physically move the underlying stack. However, in a stack of memory this is not always the case. Although pushing onto a stack in a computer program results in a new top node being added, it typically will not affect the underlying nodes. To do so would be most inefficient and is equivalent to inserting and deleting a node from the beginning of an array.

Some Applications
Like other kinds of data structures, a stack has its strengths and weaknesses. When comparing one method of representation over another you must look at the kind of data being stored as well as the methods in which you will need to access it. Apart from storing plates, just how might a stack be used?
Applications involving tree structures can lend themselves to using stacks. Consider a nested hierarchy that starts from the top and branches out below in a number of directions. Each of those directions branches out into yet other directions, etc. An example of such a model would be a program written in a procedural language such as Fortran, COBOL, Pascal, C, or C++. The main program calls subroutines which in turn call other subroutines, etc. A block structured language, such as Pascal or C, also allows nesting of blocks within a single subroutine. And C++ provides a complex data and function nesting capability via its inheritance mechanism.
Such a language uses a stack by selecting a particular logic path in the hierarchy and following it down to its end branches, pushing things onto the stack as it goes. When it reaches the end of the path, it reverses direction, popping items off the stack as it traverses the path back to the beginning, where it takes another downward path.
Computers handle a very common problem, that of solving arithmetic expressions, using a hierarchical organization. Consider the four binary operators: addition (+), subtraction (-), multiplication (*), and division (/). Mathematically, you can write arbitrarily complex expressions involving just these operators and their operands. For example,

a + b - c * d + e / f - g
Just as in mathematics, computer languages have rules for operator precedence. In the case of C and C++, multiplication and division have the same precedence. Their precedence is higher than that shared by addition and subtraction. To determine the precedence of two operators at the same precedence level, C and C++ provide the concept of associativity. Operators with the same precedence associate left to right. Therefore, in C and C++, the grouping of terms in the previous expression is interpreted as

((((a + b) - (c * d)) + (e / f)) - g)
The tree in Figure 1 represents this expression, clearly showing the hierarchy.
To evaluate such as expression, a C or C++ program needs to get to the leaves at the bottom of the branches. Going from left to right, it first evaluates a + b then saves the result in some temporary location. Next it evaluates c + d and subtracts the result from the value stored in the temporary location. It then stores the new result back in the temporary location. It continues to traverse the tree like this until it puts the final result back in the temporary location.

A Primitive Expression Evaluator
A C or C++ program can implement such an expression evaluator using a stack. It first pushes both operands on the stack, pops them off to form the result after applying their operator, and pushes that result back on the stack where it can be used later. So the value on the top of the stack represents the result of the expression processed thus far.
A study of a primitive implementation of an expression evaluator begins with a look at the stack manipulation functions as shown in Listing 1.
Assume all terms have values that can be represented correctly in a signed int. As such, the stack is an array of int and the size of the stack is determined by the macro STACK_TOP. The variable stack_ptr maintains the index of the next available slot on the stack. (It takes its name from the Stack Pointer, a special purpose hardware register used in many CPUs to maintain the current stack top.) Since the stack is initially empty, stack_ptr is initialized to 0. The type of stack_ptr is size_t, an unsigned integer type defined in stdio.h and other headers. Because the stack is not very large, the type int could just have easily been used. However, using size_t makes the solution maximally robust regardless of the value of STACK_TOP.
Because the stack need only be directly accessed by the functions push () and pop (), it, along with the stack pointer, is static. This allows these two objects and two functions to be isolated from other source modules.
To push a value on the stack, a C or C++ program first checks to see if the stack is full. If the stack is not full, the program pushes the new value on the stack and increments the stack pointer. Otherwise, it complains that the stack is full.
To pop a value off the stack, a C or C++ program first checks to see if the stack is empty. If the stack is not empty, it removes the value and decrements the stack pointer. Note that when popping, a program does not actually remove a value from the array; it simply adjusts the stack pointer so that subsequent push operations can overwrite the values previously popped. If the stack is empty, the program complains, returning a dummy value of zero.
The four functions in Listing 2 correspond to the four operators in Listing 1.
Addition and multiplication are almost identical and they are the simplest. When applying an operator, a C or C++ program assumes that the two operands have already been pushed on the stack, left operand first. So to add two values, the program simply pops them off the stack, adds them together, and pushes the result back on the stack. Note that the order of evaluation of the two terms across the four binary operators is unspecified by C. That is, in pop() + pop(), it doesn't know which call to pop () will be made first. It so happens, that it doesn't matter for addition and multiplication, since these operators are commutative. That is, a + b is equivalent to b + a, and likewise for a * b and b * a.
Because subtraction and division are not commutative, a program must know exactly which operand it is getting when calling pop(). To do this, it needs to have a sequence point between the two calls to pop(). The most common way to achieve this is to put them into different statements. This does, however, require the temporary variable temp.
A useful feature for divide would be to check for a zero denominator and, on finding one, complain and return a zero value just like in the case of stack underflow.
Listing 3 contains the main program that exercises these functions.
Some sample inputs and outputs are
Enter expression: 12 + 56
Result = 68
Enter expression: 23 - 40
Result = -17
Enter expression: 123 * 12
Result = 1476
Enter expression: 12345 / 7
Result = 1763
You might think the four operator functions are trivial enough they could be implemented as macros. The most obvious attempt at this results in something like Listing 4.
The problem comes with subtract () and divide() where you need a temporary int. To get this you must make the macro definition a block and that precludes its use from a number of common places, including the main program in Listing 3. The fragment involving the call to subtract
else if (op[0] == '-')
    subtract();
else if (op[0] == '*')
expands to the code in Listing 5, resulting in the true path of the if having two statements — a block statement and a null statement. As such, the compiler complains about the dangling else that follows.
The program in Listing 3 doesn't actually evaluate nested expressions, just one pair of operands and an operator at a time. The program in Listing 6 does more but only for a specific instance of the expression
(a + b) * (c - d);
Tokenizers And Parsers
Handling arbitrarily complex expressions, as does a compiler or interpreter, requires machinery beyond the scope of this column. For example, it requires a routine to break free-form source into tokens. Such a routine is often called a tokenizer. While it is not too difficult to write your own, the lex utility (available with UNIX systems, as a third-party tool from MKS and others) and the public domain version Flex, can be used to automatically generate tokenizers.
To handle all binary, unary, and ternary operators in arbitrarily complex expressions we need a parser. And again, while you could write your own, yacc is a commonly available parser generator (as are GNU's bison and Holub's occs, among others).
If you have interest in pursuing tokenizing and parsing further you may wish to refer to the book The UNIX Programming Environment from Prentice-Hall by Kernighan and Pike. The whole of chapter 8 is dedicated to using these tools to implement a quite powerful calculator called hoc. And it uses a stack with push () and pop () functions along the way.

Hardware Stacks
Many computers have one or more stacks implemented in hardware. These are used to pass information into, and sometimes back from, a function. A common implementation of C involves allocating all automatic data on such a stack on entry to its parent function. In such cases though, the variables are not popped off the stack each time they are used. Instead, the compiler generates references to them using the stack pointer and an offset. For example, when the program

void f(void) { char c[5]; int i = 3; unsigned int u = 4; c[1] = 'A'; c[3] = 'B'; }
is compiled by Borland's C compiler, the relevant part of the code generated is shown in Listing 7.
Because automatic variables are often stored on a stack some interesting things can happen. Consider the program in Listing 8.
The function f() is called three times in succession and each time on entry, the compiler allocates space on the stack for j and frees it on exit. It is very likely in this case that each time j is allocated, it occupies the exact same location in memory and so the contents it had from the previous iteration will be seen at the start of the next. As such, the output resulting could be
j = 110
j = 111
j = 112
Note though that when automatic data is allocated on the stack, it has no guaranteed default initial value. That is, while space is allocated on the stack for j it isn't really pushed. That would have resulted in an explicit value being placed there.
A more interesting possibility is the following
j = 0
j = 1
j = 2
Here the garbage initial value just happens to be zero, as if j had been static.
A smarter compiler would likely recognize that it can throw away the increment to j since j is automatic and, as such, does not retain its value across function calls. This then would result in something like
j = 1515
j = 1515
j = 1515
As we have said, even though values are popped off a stack, they still actually reside there. So when that stack space is allocated for another purpose, unless it is initialized, it will take on the residual value left from previous uses. And this can give rise to some interesting addresses when you allocate an automatic pointer without initializing it.

Other Considerations
The above solutions and discussion skip over or bypass a number of important considerations. First, the stack is limited to ints. In reality we would want something else. Certainly, an array of doubles significantly increases the possibilities, but it would lose precision on large integers. Changing push () and pop() to handle double values is trivial. Try it. You might also think about having an array of unions where the union's member set comprises all the types you intend to store on the stack. You might also find it useful to record the current type stored in the union as well.
By implementing the stack as an array you have a fixed limit on its size. And while you could obtain the array at runtime using malloc(), it still might be useful to have a more general solution. For example, you could implement a stack as a singly-linked list. The root always points to the top node and each node points forwards to the next. The stack pointer would then be the address of the next available node.
When using a linked list, it might be inefficient to actually allocate a node on each push request and free one on each pop. It might be better to only allocate a new node when the current list is full but never free any. You might also wish to allocate nodes in clusters rather than to do them one at a time.
Whatever approach you take to representing the stack, you should always have a way of detecting and handling stack overflow and underflow. In the DOS world, it is common to find C compilers calling a stack probe routine immediately on entry to a C function. The probe checks to see if allocating the automatic variables needed by this function will overflow the stack. If it will, the probe function terminates the program with a message to that effect. If it will not overflow, the probe returns, the space is allocated, and the program continues. If you disable this stack probe via a compiler option or a #pragma directive, watch out, since now stack overflow may well cause your system to hang.