October 1992/Standard C

Standard C

The Header <string.h>

P.J. Plauger

P. J. Plauger is senior editor of The C Users Journal. He is secretary of the ANSI C standards committee, X3J11, and convenor of the ISO C standards committee, WG14. His latest books are The Standard C Library, published by Prentice-Hall, and ANSI and ISO Standard C (with Jim Brodie), published by Microsoft Press. You can reach him at pjp@plauger.com.

Introduction
The functions declared in <string.h> form an important addition to Standard C. They support a long tradition of using C to manipulate text as arrays of characters. Several other languages better integrate the manipulation of text strings, SNOBOL being a prime example. All that C incorporates in the language proper is the notation for null-terminated string literals such as "abc". The Standard C library provides all the important functionality. These functions manipulate three forms of strings:

Functions whose names begin with mem manipulate sequences of arbitrary characters. One argument (s) points to the start of the string — the lowest subscripted element. Another (n) counts the number of elements.

Functions whose names begin with strn manipulate sequences of non-null characters. The arguments s and n are the same as above. The string ends just before the element s[n] or with the lowest value of i for which s[i] is zero ('\0'), whichever defines a shorter sequence.

All other functions whose names begin with str manipulate null-terminated sequences of characters. These functions use only the argument s to determine the start of the string.

Each group has its distinct uses, as you might expect.
What you might not expect are several design lapses in these functions. The functions declared in <string.h> are not the result of a concerted design effort. Rather, they represent the accretion of contributions made by various authors over a span of years. By the time the C standardization effort began, it was too late to "fix" them. Too many programs had definite notions of how the functions should behave. Some of the problems are:

Many of the functions that search return a null pointer when the search fails. You have to capture the return value and test it before you can safely use it further. A pointer to the end of the string is just as good a failure code and much more usable in expressions.

The functions that copy return a pointer to the start of the destination area. That is sometimes useful in a larger expression, but the address of the end of the copy is more informative. You can perform multiple copies more effectively with the latter return value than with the former.

The names of some functions are mysterious. strcspn and strpbrk, for example, do not loudly proclaim what they do.

The set of functions is incomplete and inconsistent. strnlen and memrchr are two sensible additions, for example, whereas strncat is surprising.
Despite these aesthetic gripes, I find the functions declared in <string.h> to be both important and useful. Several of them are, in fact, leading contenders for generating inline code. Many C programs use these functions, and use them a lot. They are worth the effort to learn and to optimize.
The header <string.h> contains quite a few functions. I will be describing them in several installments. The first group consists mostly of the simple functions that copy and concatenate strings, plus a few miscellaneous additions.
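
To make the first two complaints above concrete, here is a minimal sketch of what the friendlier conventions could look like. The names find_or_end and copy_end are hypothetical helpers invented for this illustration. They are not part of <string.h>, and the sketch assumes the caller supplies a destination large enough for the result.

```c
#include <string.h>

/* Hypothetical helper, not part of <string.h>: behaves like strchr,
   but on failure returns a pointer to the terminating null of s
   instead of a null pointer. */
char *find_or_end(const char *s, int c)
{
    const char *p = strchr(s, c);

    return (char *)(p != NULL ? p : s + strlen(s));
}

/* Hypothetical helper, not part of <string.h>: behaves like strcpy,
   but returns a pointer to the terminating null of the copy, so that
   successive copies can be chained without rescanning. */
char *copy_end(char *dst, const char *src)
{
    size_t n = strlen(src);

    memcpy(dst, src, n + 1);    /* the count includes the null */
    return dst + n;
}
```

With these conventions, find_or_end(s, c) - s is always a usable length, and copy_end(copy_end(buf, "ab"), "cd") builds "abcd" in buf in a single expression.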

What the C Standard Says
The reading from the scripture this month occurs in two installments. Here is the first:

7.11 String handling <string.h>

7.11.1 String function conventions
The header <string.h> declares one type and several functions, and defines one macro useful for manipulating arrays of character type and other objects treated as arrays of character type.133 The type is size_t and the macro is NULL (both described in 7.1.6). Various methods are used for determining the lengths of the arrays, but in all cases a char * or void * argument points to the initial (lowest addressed) character of the array. If an array is accessed beyond the end of an object, the behavior is undefined.

7.11.2 Copying functions

7.11.2.1 The memcpy function

Synopsis

#include <string.h>
void *memcpy(void *s1, const void *s2, size_t n);

Description
The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.

Returns
The memcpy function returns the value of s1.

7.11.2.2 The memmove function

Synopsis

#include <string.h>
void *memmove(void *s1, const void *s2, size_t n);

Description
The memmove function copies n characters from the object pointed to by s2 into the object pointed to by s1. Copying takes place as if the n characters from the object pointed to by s2 are first copied into a temporary array of n characters that does not overlap the objects pointed to by s1 and s2, and then the n characters from the temporary array are copied into the object pointed to by s1.

Returns
The memmove function returns the value of s1.

7.11.2.3 The strcpy function

Synopsis

#include <string.h>
char *strcpy(char *s1, const char *s2);

Description
The strcpy function copies the string pointed to by s2 (including the terminating null character) into the array pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.

Returns
The strcpy function returns the value of s1.

7.11.2.4 The strncpy function

Synopsis

#include <string.h>
char *strncpy(char *s1, const char *s2, size_t n);

Description
The strncpy function copies not more than n characters (characters that follow a null character are not copied) from the array pointed to by s2 to the array pointed to by s1.134 If copying takes place between objects that overlap, the behavior is undefined.
If the array pointed to by s2 is a string that is shorter than n characters, null characters are appended to the copy in the array pointed to by s1, until n characters in all have been written.

Returns
The strncpy function returns the value of s1.

7.11.3 Concatenation functions

7.11.3.1 The strcat function

Synopsis

#include <string.h>
char *strcat(char *s1, const char *s2);

Description
The strcat function appends a copy of the string pointed to by s2 (including the terminating null character) to the end of the string pointed to by s1. The initial character of s2 overwrites the null character at the end of s1. If copying takes place between objects that overlap, the behavior is undefined.

Returns
The strcat function returns the value of s1.

7.11.3.2 The strncat function

Synopsis

#include <string.h>
char *strncat(char *s1, const char *s2, size_t n);

Description
The strncat function appends not more than n characters (a null character and characters that follow it are not appended) from the array pointed to by s2 to the end of the string pointed to by s1. The initial character of s2 overwrites the null character at the end of s1. A terminating null character is always appended to the result.135 If copying takes place between objects that overlap, the behavior is undefined.

Returns
The strncat function returns the value of s1.
Forward references: the strlen function (7.11.6.3).
Footnotes
133. See "future library directions" (7.13.8).
134. Thus, if there is no null character in the first n characters of the array pointed to by s2, the result will not be null-terminated.
135. Thus, the maximum number of characters that can end up in the array pointed to by s1 is strlen(s1)+n+1.
Here is the second quote from the C Standard:

7.11.6 Miscellaneous functions

7.11.6.1 The memset function

Synopsis

#include <string.h>
void *memset(void *s, int c, size_t n);

Description
The memset function copies the value of c (converted to an unsigned char) into each of the first n characters of the object pointed to by s.

Returns
The memset function returns the value of s.

7.11.6.2 The strerror function

Synopsis

#include <string.h>
char *strerror(int errnum);

Description
The strerror function maps the error number in errnum to an error message string.
The implementation shall behave as if no library function calls the strerror function.

Returns
The strerror function returns a pointer to the string, the contents of which are implementation-defined. The array pointed to shall not be modified by the program, but may be overwritten by a subsequent call to the strerror function.

7.11.6.3 The strlen function

Synopsis

#include <string.h>
size_t strlen(const char *s);

Description
The strlen function computes the length of the string pointed to by s.

Returns
The strlen function returns the number of characters that precede the terminating null character.

Using the Functions
You use these functions declared in <string.h> to copy and concatenate strings of characters. You characterize each string by an argument (call it s) which is a pointer to the start of the string.

If a string can contain null characters, you must also specify its length (call it n) as an additional argument. n can be zero. Use the functions whose names begin with mem.

If a string may or may not have a terminating null character, you must similarly specify its maximum length n, which can be zero. Use the functions whose names begin with strn.

If a string assuredly has a terminating null character, you specify only s. Use the remaining functions whose names begin with str.
Beyond this simple categorization, the string functions are only loosely related. I describe each separately.
memcpy — If you can be certain that the destination s1 and source s2 do not overlap, memcpy(s1, s2, n) will perform the copy safely and rapidly. If the two might overlap, use memmove(s1, s2, n) instead. Do not assume that either function accesses storage in any particular order. In particular, if you want to store the same value throughout a contiguous sequence of elements in a character array, use memset.
memmove — See memcpy above.
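
A typical case where the two areas do overlap is opening a gap inside a string. The following is a small sketch only. The helper name insert_char is hypothetical, and the caller is assumed to guarantee that the buffer has room for one more character.

```c
#include <string.h>

/* Hypothetical helper: insert character c at position pos of the
   null-terminated string in buf. The caller guarantees that
   pos <= strlen(buf) and that buf has room for one more character. */
void insert_char(char *buf, size_t pos, int c)
{
    /* The source and destination areas overlap, so memcpy would have
       undefined behavior here. The count includes the terminating
       null, which moves up along with the tail of the string. */
    memmove(buf + pos + 1, buf + pos, strlen(buf + pos) + 1);
    buf[pos] = (char)c;
}
```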
strcat — If you have only two strings s1 and s2 to concatenate, or just a few short strings, use strcat(s1, s2). Otherwise, favor a form such as strcpy(s1 += strlen(s1), s2). That saves repeated, and ever-lengthening, rescans of the initial part of the string. Be sure that the destination array is large enough to hold the concatenated string. Note that strcat returns s1, not a pointer to the new end of the string.
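
Here is one way the strcpy(s1 += strlen(s1), s2) idiom might be applied to a list of strings. The function name join_all and its parameters are assumptions made for this sketch, and the caller must supply a destination large enough for the whole result.

```c
#include <string.h>

/* Hypothetical helper: concatenate the n strings in parts into dst.
   The caller guarantees that dst is large enough. Returns dst. */
char *join_all(char *dst, const char *const *parts, size_t n)
{
    char *end = dst;
    size_t i;

    *end = '\0';
    for (i = 0; i < n; ++i) {
        /* Advance end past the piece copied last time, then copy the
           next piece. Only that one piece is rescanned, never the
           whole string built so far. */
        strcpy(end += strlen(end), parts[i]);
    }
    return dst;
}
```

Calling strcat(dst, parts[i]) in the same loop would give the same result, but each call would rescan dst from its beginning.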
memset — This is the safe way to store the same value throughout a contiguous sequence of elements in a character array.
strcpy — If you can be certain that the destination s1 and source s2 do not overlap, strcpy(s1, s2) will perform the copy safely and rapidly. If the two might overlap, use memmove(s1, s2, strlen(s2) + 1) instead. Do not assume that either function accesses storage in any particular order.
strerror — Use strerror(errcode) to determine the null-terminated message string that corresponds to the error code errcode. errcode should be errno or one of the macros defined in <errno.h> whose name begins with E. Be sure to copy or write out the message before you call strerror again. A later call can alter the message. If you simply want to write to the standard error stream a message containing strerror(errno), see perror, declared in <stdio.h>.
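
Because a later call can overwrite the message, code that needs two messages at once has to copy the first one. A minimal sketch follows. The helper name save_errmsg is hypothetical, and the buffer size is whatever the caller chooses, since the text of each message is implementation-defined.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: copy the message for errcode into buf, which
   holds n characters (n must be greater than zero). The copy is
   truncated if necessary and is always null-terminated. */
char *save_errmsg(char *buf, size_t n, int errcode)
{
    const char *msg = strerror(errcode);

    buf[0] = '\0';
    strncat(buf, msg, n - 1);   /* at most n - 1 characters plus a null */
    return buf;
}
```

After save_errmsg(a, sizeof a, EDOM) and save_errmsg(b, sizeof b, ERANGE), both messages remain available, which is not guaranteed for two raw pointers returned by strerror.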
strlen — Use this function wherever possible to determine the length of a null-terminated string. It may well be implemented with inline code.
strncat — The strn in strncat(s1, s2, n2) refers to the string s2 that the function concatenates onto the end of the null-terminated string s1. The function copies at most n2 characters from s2, then stores a terminating null. Thus, strlen(s1) increases by at most n2 as a result of the call to strncat. That makes strncat a safer function than strcat, at the risk of truncating s2 to length n2.
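
Note that n2 limits what is taken from s2, not the size of the destination. To stay inside a destination array you have to compute the remaining room yourself. The sketch below shows one way to do so. The name append_bounded is hypothetical, and the code assumes that dst already holds a string shorter than size.

```c
#include <string.h>

/* Hypothetical helper: append src to the string in dst without
   storing past dst[size - 1]. Assumes strlen(dst) < size.
   Returns nonzero if all of src fit, zero if it was truncated. */
int append_bounded(char *dst, size_t size, const char *src)
{
    size_t used = strlen(dst);
    size_t room = size - used - 1;  /* leave space for the null */

    strncat(dst, src, room);
    return strlen(src) <= room;
}
```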
strncpy — If you can be certain that the destination s1 and source s2 do not overlap, strncpy(s1, s2, n2) will perform the copy safely. Note, however, that the function stores exactly n2 characters starting at s1. It may drop trailing characters, including the terminating null. It stores additional null characters as needed to make up a short count. If the two areas might overlap, use memmove(s1, s2, n2) instead. (You must then store the appropriate number of null characters at the end, if that is important to you.) Do not assume that either function accesses storage in any particular order.
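
The two behaviors of strncpy described above, null padding and possible loss of the terminating null, are easy to confuse. The following sketch separates the two uses. Both helper names are hypothetical.

```c
#include <string.h>

/* Hypothetical helper: fill the fixed-width field dst with exactly
   width characters taken from src, padded with nulls. The field is
   NOT null-terminated when src has width or more characters.
   Returns nonzero if src was truncated. */
int fill_field(char *dst, size_t width, const char *src)
{
    strncpy(dst, src, width);
    return strlen(src) > width;
}

/* Hypothetical helper: copy src into dst as a proper string of at
   most size - 1 characters. Requires size > 0. */
char *copy_string(char *dst, size_t size, const char *src)
{
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';       /* strncpy may not have stored a null */
    return dst;
}
```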

Implementing the Functions
The functions declared in <string.h> work largely independently of each other. The only exception is the pair strcoll and strxfrm. They perform the same essential operation two different ways. I discuss them in a later installment. The copy and concatenate functions each perform a fairly simple operation. Here, the challenge is to write them to be clear, robust, and efficient.
Listing 1 shows the file string.h. As usual, it inherits from the internal header <yvals.h> definitions that are repeated in the header <stddef.h>. (See Standard C, CUJ Aug. 1991.)
Only the function strerror has a masking macro. It shares the internal function _Strerror with the function perror, declared in <stdio.h>. (That lets each of the higher-level functions provide its own buffer.)
Several other functions declared in <string.h> are serious candidates for implementing as built-in functions that generate inline code. A common practice is to give these built-in versions secret names. You then provide masking macros to gain access to the built-in functions. Thus, a production version of <string.h> could well include several additional masking macros.
Listing 2 shows the file memcpy.c. I chose char as the working type within memcpy on the off chance that some computer architectures may favor it over unsigned char. (That's one of the justifications for having a "plain" character type.) memcpy can assume that its source and destination areas do not overlap. Hence, it performs the simplest copy that it can.
Listing 3 shows the file memmove.c. The function memmove must work properly even when its operands overlap. Hence, it first checks for an overlap that would prevent the correct operation of an ascending copy. In that case, it copies elements in descending order.
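
The listing itself is not reproduced here. As a rough sketch of the approach just described, and not necessarily the code in Listing 3, the overlap check and the two copy directions might look like the following. The name my_memmove is hypothetical, and the pointer comparison assumes a flat address space, since Standard C does not define the result of comparing pointers into different objects.

```c
#include <stddef.h>

/* Sketch of a memmove-style copy. The name my_memmove is hypothetical.
   Comparing dst and src with < assumes a flat address space when the
   two pointers refer to different objects. */
void *my_memmove(void *s1, const void *s2, size_t n)
{
    char *dst = (char *)s1;
    const char *src = (const char *)s2;

    if (src < dst && dst < src + n) {
        /* The destination overlaps the tail of the source, so an
           ascending copy would overwrite source characters before
           reading them. Copy in descending order instead. */
        for (dst += n, src += n; 0 < n; --n)
            *--dst = *--src;
    } else {
        /* No harmful overlap: a simple ascending copy is safe. */
        for (; 0 < n; --n)
            *dst++ = *src++;
    }
    return s1;
}
```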
Listing 4 shows the file strncat.c. The function strncat first locates the end of the destination string. Then it concatenates at most n additional characters from the source string. Note that the function always supplies a terminating null character.
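
A minimal sketch of those three steps, written under the hypothetical name my_strncat and not taken from Listing 4, could read:

```c
#include <stddef.h>

/* Sketch of a strncat-style function. The name my_strncat is
   hypothetical. */
char *my_strncat(char *s1, const char *s2, size_t n)
{
    char *dst = s1;

    while (*dst != '\0')        /* locate the end of s1 */
        ++dst;
    for (; 0 < n && *s2 != '\0'; --n)
        *dst++ = *s2++;         /* copy at most n characters of s2 */
    *dst = '\0';                /* always supply a terminating null */
    return s1;
}
```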
Listing 5 shows the file strncpy.c. The function strncpy is likewise similar to memcpy, except that it stops on a terminating null. strncpy also has the unfortunate requirement that it must supply null padding characters for a string whose length is less than n.
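
The copy-then-pad structure can be sketched as two loops that share the count n. This is an illustration under the hypothetical name my_strncpy, not a reproduction of Listing 5.

```c
#include <stddef.h>

/* Sketch of a strncpy-style function. The name my_strncpy is
   hypothetical. */
char *my_strncpy(char *s1, const char *s2, size_t n)
{
    char *dst = s1;

    /* Copy at most n characters, stopping before a null in s2. */
    for (; 0 < n && *s2 != '\0'; --n)
        *dst++ = *s2++;
    /* Pad with nulls until n characters in all have been stored. */
    for (; 0 < n; --n)
        *dst++ = '\0';
    return s1;
}
```

If s2 has n or more characters, the second loop never runs and the result has no terminating null, as footnote 134 of the Standard warns.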
The str functions are direct analogs of the strn functions. Listing 6 shows the file strcat.c and Listing 7 shows the file strcpy.c. The functions differ only in not worrying about a limiting string length n. Of course, strcpy has no padding to contend with either.
Listing 8 shows the file memset.c. I chose unsigned char as the working type within memset on the off chance that some implementation might generate an overflow storing certain int values in the other character types.
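
A sketch of that choice, under the hypothetical name my_memset and not copied from Listing 8, converts the int argument to unsigned char once and then stores it:

```c
#include <stddef.h>

/* Sketch of a memset-style function. The name my_memset is
   hypothetical. */
void *my_memset(void *s, int c, size_t n)
{
    const unsigned char uc = (unsigned char)c;  /* well-defined conversion */
    unsigned char *p = (unsigned char *)s;

    for (; 0 < n; --n)
        *p++ = uc;
    return s;
}
```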
Listing 9 shows the file strlen.c. The function strlen is probably the most heavily used of the functions declared in <string.h>. It is the leading contender for implementation as a built-in function. If that form exists, look for places where strlen masquerades as inline code. The functions strcat and strncat are two obvious examples.
Finally, Listing 10 shows the file strerror.c. It defines both strerror and the internal function _Strerror. _Strerror constructs a text representation of certain error codes in a buffer. It uses its own static buffer only when called by strerror. I supply here specific messages only for the minimum set of error codes defined in this implementation of <errno.h>. Many implementations add more. Any unknown error codes print as three-digit decimal numbers.
This article is excerpted in part from P.J. Plauger, The Standard C Library, (Englewood Cliffs, N.J.: Prentice-Hall, 1992).