Columns


Questions & Answers

Byte Alignment

Ken Pugh


Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C language courses for corporations. He is the author of C Language for Programmers and All On C, and was a member on the ANSI C committee. He also does custom C programming for communications, graphics, image databases, and hypertext. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You may fax questions for Ken to (919) 489-5239. Ken also receives email at kpugh@dukemvs.ac.duke.edu (Internet).

Q

I have a rather frivolous question that you or one of the readers of The C Users Journal may be able to answer. Given the following declarations,

register scalar *src, *dest;
register short ct;
where the type scalar is short, int, or long, all of the M680X0 compilers I am familiar with will compile the expression

*dest++ = *src++;
to a two-byte, single instruction move of the appropriate size, and also

*dest++ == *src++;
to a two-byte, single instruction compare. Many are able to optimize the expression

for (ct = length - 1; ct >= 0; ct- -)
   *dest++ = *src++;
to the sequence

MOVE.W length, Dr
BRA *+4
MOVE.s (An)+, (Am)+
DBF Dr, *-2
where s indicates the size of the scalar type and Am, An, and Dr are the registers the compiler allocated for src, dest, and ct, respectively.

My question is whether compilers for the 80X86 can equivalently compile and/or optimize such C code to use the built-in string and repeat or loop instructions.

This question, of course, derives from a running argument I have with a friend who believes Motorola processors do not handle strings efficiently. All of the iNTEL one-byte string instructions have two-byte, single instruction equivalents in the M680X0 instruction set (except for the rarely used down address mode of cmps, which takes two M680X0 instructions to implement). My impression is that the less general iNTEL string set is less accessible to programmers using high-level languages. Thus my question.

Joel Rees
South Salt Lake City, UT

A

Interesting question. I would say that it all depends on the compiler's optimization scheme. If you used a standard subroutine such as strcpy (), memcpy(), or strcmp (), the compiler manufacturer could easily write the function in assembly language to take advantage of the built-in operations. Intel's REPZ prefix in combination with CMPS (compare strings) and SCAS (scan strings) can work extremely fast. I doubt that a compiler would normally use these.

Using standard compiler functions to perform common operations permits it to use faster operations, even if they do not match the data type. For example, memory copying can be performed faster if the size of the data moved matches the bus width. A standard function could be "smart" and do half or a quarter of the move instructions plus maybe a few single character moves and come out faster than moving things singly.

Optimizations get to be really tricky business. The string instructions may be faster than a series of discrete instructions if there is no instruction cache in the processor. The memory fetches for the instructions would outweigh the extra memory cycles for the individual bytes. A large instruction cache would tend to make a set of discrete instructions faster, because no instruction fetches would be required after the initial ones.

You might have coded your example as

ct = length;
while (ct--)
   *dest++ = *src++;
The compiler would have one less optimization to perform.

Q

I have a couple of questions that I hope you can help me with. I have seen mention of them in various books, but frankly I can't remember where. The books did not really explain them anyway to the degree that I want.

Question 1:

In most of the C books that I have studied the first function listed is the main function. After the main function all of the other functions are entered.

main ()
   {
   code
   }
function 1
function 2
function 3
However, I have also seen

function 1
function 2
function 3
main ()
   {
   code
   }
In this case the functions are listed first and then the C main function. What are the advantages and disadvantages of these two different formats. I assume that it has to do with releasing memory that is not being used, but does it really make any difference which format you use?

Question 2:

I have seen a little mention of alignment. I am talking about word and byte alignment of data in memory. Word alignment is supposed to be faster, but there does seem to be a use for byte alignment as well.

I would like to hear some explanation of this subject in greater detail. I know that there isn't a lot of detail to give, but any would help.

By this I assume that how I list my variables is important. It seems I should do the following:

main ()
{
int     one,two,three,four;
float   eny,mine,mo;
char    first,second,third;

program
}
However, some other information I ran across seemed to indicate that alignment has to do with strutures and unions. I certainly would like this cleared up.

Lyle O. Haga
Englewood, CO

A

Answer 1:

It doesn't matter to the compiler or to the output program the order in which you list your functions. To the compiler, main() is just another function, although it has the property of being the one that is called by the program startup code. The linker will pick up the name wherever it occurs in the file.

To the human, the order varies according to purpose. For example, many times I have main first. Other times I keep it in a separate file. Still other times it goes last, especially when it is surrounded by #ifdef DEBUGMAIN / #endif. I frequently put a main() function in files which contain functions that are used as my personal library. The main function contains calls that test the operations of the functions. When I am testing the functions, DEBUGMAIN is #defined. When I am using them in other programs, DEBUGMAIN is not #defined.

Answer 2:

Alignment refers as to the memory boundary on which a data item is stored. The memory bus in a processor can be a byte, two bytes, or four bytes wide. In the case of a byte-wide bus, there are no alignment problems. For a two-byte bus, a single reference to memory produces two bytes at the same time. For example, memory address 0 produces the values for bytes 0 and 1 on the bus. Addressing memory address 2 produces the values for bytes 2 and 3 on the bus. If an instruction only requires one of the bytes, the other is disregarded.

On some processors, if the data item is 2 bytes, it must reside at an even address. If the data item is 4 bytes it must reside at an address divisible by 4.

For individual data items, the processor keeps these stored at whatever alignment is required. Unless you compare the actual addresses, you have no indication of their alignment.

With structures, alignment does pose some interesting problems. For processors that require alignment, the compiler puts filler or packing bytes between members to ensure that alignment is proper. If you print the sizeof of a structure, it may be larger than the total of its members.

On the PC, which has no alignment requirements, some compilers using packing bytes, some do not, and some give an option to go either way.

Readers' Replies

UNIX Process Problems

Tim Reilly had a problem with executing a child process that did not return the proper exit code. This was listed in the April CUJ. The original code appears in Listing 1. Several people responded to a request for comments. Some of the responses are listed below. T. William Wells wrote a few comments about the code. The code has been revised to Listing 2. The line reading

while (wait( (int *) 0 ) == pid);
has been replaced with a clearer set of code. The while loop hides the purpose. In addition, the call to signal has been moved to the top. There is a small but finite probability that an interrupt could occur between the fork and the call to signal. When the exec occurs in the child, signal handling will be reset to default, so that the extra call to signal is not needed.

Keith Neufeld's letter below adds a few more bits of information. To demonstrate the problem further, Listing 3 and Listing 4 show the program modified with some printfs and the corresponding output.

I'll be speaking at the UNIX conference sponsored by Boston University this month. I'll check with UNIX gurus there to see what the problem might be. (KP)

I am fascinated and confounded by Tim Riley's all_true program/problem in the April CUJ. I hope to God you didn't just dig up some obscure feature of UNIX to foist upon us unsuspecting readers as an April Fool's joke; if you did, I'm biting, and hard. It'll be good education for me, anyway.

Two aspects of this make it particularly interesting for me. The first is that, as Tim mentioned in the last paragraph, if you change RET_VALUE to something other than 0, the program always returns RET_VALUE. The second is that if you decide to catch SIGQUIT instead of SIGINT, the program always returns RET_VALUE, even if it is 0, unless of course you stop it with SIGINT or some other signal it's unprepared to catch.

One thing: I don't know of any particular advantage gained by splitting the program into main () and process (); it seems to function the same if process() is included inline, and, for me, at least, makes it more readable. Oh — and it always returns 2000 when it catches SIGINT on our system, an NCR Tower SysV, also.

Keith Neufeld

qsort Comparison Function

Let me just drop a note on the Firdaus Irani question in the December 1990 issue of The C Users Journal, page 92, using qsort. By definition, any compare function always has the prototype:

int compare(const void *, const void *);
But, as always in C, the programmer is free to manipulate user-defined functions to conform to this standard. I've seen extensive typecasting and double function calls, but in reality, a much simpler solution can and will do. First, define a new type with

typedef int (*_QSORT_CMP_)(const void *, const void *);
then, later in the code:

....
qsort(array, nelem, elemsize, (_QSORT_CMP_)user_function);
....
user_function can now be anything you desire, receiving pointers to any kind of structure. Its only requirement is to return integer type uniquely defining the relation between two received parameters (<0,0,>0). For example,

int user_function(unsigned char **a, unsigned char **b)
    {
    return( strcmp(*a, *b) );
    }
This typecast will not produce any error or warning messages. However, because the programmer explicitly overrode the function declaration, the compiler is not responsible for catching casting errors. Actually, the compiler will not produce any errors even if user_function is now declared to be receiving more (or less) than two parameters! Beware indeed!

This typecasting can be useful when dealing with search functions such as bsearch. Imagine a structure defined as:

typedef struct {
   int len;
   int ID;
    int year;
    ....;
   } DATA;
Then let's have an array of such structures, initialize each element and then sort an array by ID:

#define NELEM(array)
(sizeof(array)/sizeof(array [0]))

DATA data_array[...];
int i;

for ( i=0; i < NELEM(data array) ; i-- )
   initialize_data_array(&data array[i] );
qsort(&data_array [0],NELEM(data_array), \
   sizeof(DATA), (_QSORT_CMP_)usr_fn1);
One can now request and find an element in this array just by using an ID uniquely identifying each element:

int ID;
DATA *D;

D =  (DATA *) bsearch(&ID, &data_array[0], \
     NELEM(data_array), sizeof(DATA). \
     (_QSORT_CMP_)usr_fn2);
User-defined functions are defined as

int usr_fn1(DATA *D1, DATA *D2)
{
return ( D1->ID - D2-> );
}

int usr_fn2(int *key, DATA *D)
   {
   return ( *key - D->ID );
   }
Use of usr_fn2 in bsearch is highly compiler-dependent, but serves its purpose when the programmer knows that first parameter for the search function will always be a key. Again, this is not always true, check your compiler's runtime library documentation. Again, be absolutely sure to check this, because it is not guaranteed in generic description of search function.

To relate this example to a real situation, where sort'n'searches were extensively used, I've recoded the search function in assembly language, and could guarantee a key to be the first parameter passed to the associated compare function. See Listing 5.

Sergio Cherskov
Las Vegas

Thanks for your reply. With casts, the programmer can stick him/herself in the foot, as your code shows. That actually can get around a problem that was posed at the X3J11 committee meetings. One would like to have an array of function pointers that contains different function types.

You can still specify something like

int function1(int, int, char *);
int function2(char *, int);
   ...
int (*function_array[5]) () = {function1, function2...};
However the use of ()s without any parameter specification is deprecated.

However, using casts, it can now be specified as

typedef int (*void_parms)(void);

int (*function_array[5]) (void) = \
   {(void_parms) function1,
   (void_parms) function2...};
There is no need to use two functions for comparison. Your comparison function usr_fn1 can be used for both the qsort and the bsearch functions. Both functions call the comparison function with two addresses of elements in the array.

You could code this as

DATA data_search;
   ...
   data_search.ID = ID;
   D = bsearch(&data_search, \
       &data_array[0],
     NELEM(data_array),
     sizeof(DATA), (_QSORT_CMP_) usr_fn1 );
If the sort required involved more than a single member of the structure, you would have to code it in this style.

To avoid the unsightly cast, some readers (Peter Torkelson of Kent, WA and and Esmond Hart of Auckland, New Zealand) suggested using a typedef as

typedef int (*compare_t) \
   (const void *, const void *);
Maybe this name could even be standarized, since it is useful for many different functions. (KP)

.EXE Files

I got a letter from Alex Robinson regarding a program that allows you to insert data items in an .EXE file. The program also packs the .EXE file, better than EXEPACK.

I tried the program out on my hypertext program, InfoNavigator. It works well. The results were

/* Original file */
LINKBTG EXE  238224
/* EXEPACK version */
BROWSER1 EXE  225672
/* Tinyprog version */
BROWSER EXE  146656
The program is available from the Tranzoa, Co. P.O. Box 911, Maple Valley, WA 98038. (206) 432-3532. Cost is $15. (KP)