May 1994/Questions and Answers

Columns

Questions and Answers

Using C Libraries in C++

Kenneth Pugh

Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C and C++ language courses for corporations. He is the author of All On C, C for COBOL Programmers,and UNIX for MS-DOS Users, and was a member of the ANSI C committee. He also does custom C/C++ programming and provides SystemArchitectonicssm services. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You may fax questions for Ken to (919) 489-5239. Ken also receives email at kpugh@allen.com(Internet) and on Compuserve 70125,1142.
Q
I have been trying unsuccessfully to use the qsort function to sort data in one of my classes. Following is a simple example. I need to sort the array of x and y values in various ways. Thinking about qsort, which allows you to sort an array given a compare function, this route seemed perfect. The prototype for the compare function is:
int (*compare)(const void *element1, const void *element2);
At first I thought I would declare the compare function as follows:

extern "C" { int compare_function(const void *a, const void *b) { ... } }
However this did not work because compare_function needs access to DataType (see Listing 1) , which suggested a member function. After trying to declare it as a member function, my compiler complained as follows:
File RA.CPPS; Line 317
Error: cannot implicitly convert
from: int (foov::*Cpp func)
         (void const *, void const *)
to  : int (*C func)
         (void const *, void const *)
So I tried:
extern "C"
   {
   int compare_function(const void *a, const void *b);
   }
then the compiler complained that I was missing a semi-colon. I am totally lost why this does not want to work for me. If you have any suggestions or comments I surely would like to hear from you. I am using Symantec C++.
Phil S. Bolduc
Kamloops, B.C.
A
It appears you have a Catch-22 situation here. You need to pass qsort the name of a function, presumably one written in C. However, you need a C++ function in order to access the private members of class A. The compare function needs to be a friend function or a member function of class A, so it can get at the private members. In this case, it needs access to the template for struct DataType.
C++ function names are typically mangled by the compiler to include information on the types of parameters. The names of C functions are not mangled. qsort should not care whether the function pointer passed to it points to a function whose name was mangled or not. The function name does not exist once the linker has resolved it. However you cannot pass a C++ function to qsort, as the calling conventions for C++ functions may differ from those of C functions. For example, with Microsoft 7.0 the right argument for C functions is pushed onto the stack first. For C++ functions, the left argument is pushed first and the value for this (the implicit pointer argument for member functions) is pushed last. In C, it is the calling function's responsibility to pop the stack. In C++, it is the called function's job. You could write compare_function to use the parameters in the reverse order that they are passed to it. However, you will still have a stack popping problem.
Borland C++ appears to bypass the C/C++ incompatibility problem by defining a non-standard _USERENTRY keyword. This keyword is applied to any function whose address is passed to another function such as qsort. The calling conventions for functions declared with this keyword appear to be the same, regardless of the language.
What you really want is to be able to call a C++ function from C. It would be easy if the C version of your compiler had a statement such as:
extern "C++"
   {
   int compare_function(const void *a, const void *b);
   }
This doesn't exist in the C Standard for the obvious reason that it was developed before the C++ standards effort began. [Actually, it was proposed before the C Standard was frozen, and rejected — pjp] Whether it might ever get added in the future is another question. I had a few cases in porting portions of code from C to C++ in which it might have been handy at the time. However I think it would have been a maintenance nightmare had I been able to use it.
The simplest answer to your problem is to define the struct DataType template outside of the class. I would use a naming prefix to make it less likely to clash with other names and to show its intended usage. You might call it class_A_DataType. That way the compare function can be written in C.
A better answer is to get hold of the source for qsort and recompile it as a C++ function. Still better is to rewrite it as a template function. That way you can avoid using void * parameters. I dislike void * parameters since they avoid type-checking. For example, your function could start as:
template <class T>
void qsort(T array[], unsigned int size,
         int (*p_compare_function)
         (const T &, const T &))
       {
       ///...
       }
Note that the parameter giving the size of an element is not needed, since that will be implicit inside the function when it is instantiated for a particular type. The definition of a compare function that could be passed to qsort is:
int A::compare_function(const & first,
                    const & second)
   {
   if ( first.x > second.x )
      return +1;
   else if ( first.x < second.x )
      return -1;
   else
      return 0;
   }
This function corresponds to the compare function in your example. I prefer using reference parameters, but you could stay with pointers. If you only want to sort arrays of objects as a whole, you could define this function as:
enum Sort_direction
{SORT_ASCENDING, SORT_DESCENDING};

template <class T>
void sort_array(T array[], unsigned int size,
             Sort_direction sort_direction)
   // Restriction - T must have operator >()
   {
   ///...
   }
The compare function does not need to be passed to this function, as it is assumed that the class will have an operator >() as a member function. I added the sort_direction parameter to permit sorts in both orders.

Function Prototypes
In a recent C++ class, a student asked about function prototypes. In many of the books he had read, he saw prototypes without names, such as:

string_copy(char *, const char *);
In fact, he commented that he didn't think parameters were allowed, because he never had seen them. Names are permitted of course. The names of the parameters only exist within the scope of the prototype. They do not have to agree with the names of the parameters in the function definition. For example, you might have:

string_copy(char * s1, const char * s2);
in the prototype and

string_copy(char * s2, const char * s1) { ... }
as the definition. The names in the prototype are important as documentation, however. They help specify the contract between the provider of the function and the user. The names should be meaningful, such as:

string_copy(char * destination, const char * source);
In ANSI C, function pointers can involve prototypes. For example in Classic C, the declaration:

int (*p_function)();
is a pointer to a function which may take any number or type of parameters. The ANSI standard deprecates the use of this declaration. Pointers to functions should include the parameters as part of the type. For example,

int (*p_function_with_no_parameters)(void); int (*p_function_taking_int)(int); int (*p_function_taking_int_double(int, double); function_with_no_parameters(void); function_taking_int(int); function_taking_int_double(int,double); p_function_with_no_parameters = function_with__no_parameters; p_function_taking_int = function_taking_int; p_function_taking_int_double = function_taking_int_double;
You cannot use a pointer to one type of function to point to another type. For example:

p_function_with_no_parameters = function_with_taking_int
yields a compiler error. The one place in which this change affects programming the most is with functions that take function pointers as parameters. qsort, as shown in the answer to the previous question, is one such function. The prototype for qsort is:

qsort(void * array, unsigned int number_elements, size_t size_of_element, int (*compare_function) (const void *, const void *));
If you used qsort with an integer array, you might also use:

int_compare(const void * p_first, const void * p_second) { int a = * (int *) p_first; int b = * (int *) p_second; // Compare a to b } #define SIZE_ARRAY 10 int int_array[SIZE ARRAY]; qsort(int_array, SIZE_ARRAY, sizeof(int), int_compare);
int_compare has been shown with void * parameters so that its header matches that required by qsort. I am not particularly wild about that, since int_compare really uses int * parameters, rather than void * parameters. Alternatively you could code:

int_compare(const int * p_first, const int * p_second) { // compare first to second }
and use:

qsort(int_array, SIZE_ARRAY, sizeof(int), (int (*)(const void *, const void *)) int_compare);
The cast (int (*)(const void *, const void *)) converts the type of int_compare to that required by qsort. P.J. Plauger reminds me to tell you that this admittedly is more honest, but not guaranteed to be portable. The C Standard permits different calling sequences for functions with different argument types. Although this may look a little ugly, you can always hide it behind a typedef or a #define.

Structure Alignment
I've had several questions on structure alignment, particularly with the optimization of member access. These questions come up especially with newer 64-bit processors, which access data on eight-byte boundaries. However the same questions apply to processors which prefer access on two-byte boundaries. Some processors can only access particular data types on multi-byte boundaries. For example, on an IBM System/360, floating point numbers had to be stored at addresses evenly divisible by 4. On other processors, such as the 8086, data types can be stored at any address. On advanced processors such as the 80486, memory is accessed in multi-byte chunks. It will take fewer memory accesses to access multi-byte data types if they are stored on byte addresses divisible by the width of the memory bus.
Most compilers have an option that forces structure members to be aligned on the byte boundary of the programmer's choice. Access to data on natural byte boundaries (e.g., the width of the memory bus) results in quicker retrieval, and sometimes even smaller code. However there may be wasted space due to unused bytes (called packing bytes) in the structure that are used to align the members. Forcing a structure to be as compact as possible requires more code and more time to access the individual members. The packing option for compilers varies. For Microsoft, it is -Zp followed by the alignment value (1, 2, or 4).
The question is to pack or not to pack. Suppose you have the following template for a structure in your program.

struct s_mixed_up { long 1ong_member; char char_member; long another_long_member; char another_char_member; };
The first thing you have to determine is whether the order of the structure is important. For pure efficiency purposes, an order such as:

struct s_mixed_up { long long_member; long another_long_member; char char_member; char another_char_member; };
will always produce an equal or smaller data object, regardless of the packing option used. This is because the char data types are grouped together. The second determination is whether you have to match the layout of previously stored data that uses this structure template. If so, you will have no choice in using what packing options you can use. If neither of these cases apply, you are free to decide the packing you want for your structures. The decision is based on a tradeoff between code size and data size. A fully packed data structure will take less data space. However the code to access the data members may take up a lot of room. On a DEC Alpha, it takes only two instructions to fetch and use a long which is aligned on an eight-byte boundary. It takes twelve instructions to fetch a long that is part of a packed data structure.
If you have large arrays of structures, then packing may pay off in the total size of code and data. If you only have a few variables of a particular structure template, then the additional code may well exceed the savings in data space.
In this same vein, bit fields are a way of packing information into a structure. Like the packing bytes for normal members, whether you use bit fields or not depends on whether data size or code size is more important to you. C++ extends the possibilities for bit fields. In C, the base type of an element containing bit fields can only be declared as int, signed int, or unsigned int. An implementation may use a data type smaller than an int (such as a byte), if the size of the bit field can fit into that type. For example, given

struct s_bit_field { int five_bits : 5; int another_five_bits: 5; int still_another_five_bits: 5; };
an implementation may put five_bits, another_five_bits, and still_another_five_ bits all into a single 16-bit int (or larger). Alternatively, it may put each one into its own byte (for faster access with possibly no shifting). If a subsequent bit field can fit into the same storage space as the previous field, then it would be packed together. For example, if an implementation did use bytes, then

struct another { int four_bits : 4; int another_four_bits: 4; };
would use a single byte for both four_bits and another_four_bits. In C++, a bit field may be any integral type (char, short, int, long, and signed/unsigned variations), as well as an enumerated type. For example:

struct s_bit_field { char five_bits : 5; char another_five_bits: 5; char still_another_five_bits: 5; };
would allocate a byte for each of five_bits, another_five_bits, and still_another_five_bits. The layout of bit fields is inherently machine dependent. I do not recommend using bit fields unless you are trying to match the bit fields of a particular machine register. In that case, the structure is explicitly tied to a particular machine and non-portability of layout does not matter.
If you are using the bit fields to store just a few boolean values, then it may make more sense to use char for those variables instead. The code to do the shifting and masking will be eliminated. The total data size to store the variables may not be any greater.