P.J. Plauger is senior editor of The C Users Journal. He is secretary of the ANSI C standards committee, X3J11, and convenor of the ISO C standards committee, WG14. His latest books are The Standard C Library, published by Prentice-Hall, and ANSI and ISO Standard C (with Jim Brodie), published by Microsoft Press. You can reach him at pjp@plauger.com.
Introduction
For the last two months, I have been discussing the header <string.h>. (See Standard C, CUJ Oct. and Nov. 1992.) So far, I have discussed the functions that copy, concatenate, and compare strings (plus a few others). This month, I discuss the last of the functions declared in <string.h>, those that help you search strings in various ways.
These functions are powerful and widely used. They are probably more widely used than the functions that concatenate and copy strings. You may have trouble guessing the name of the search function you need in the Standard C library, but it is probably there. At least you can usually find a search function that comes close enough to meet your needs.
What the C Standard Says
7.11.5 Search functions
7.11.5.1 The memchr function
Synopsis
    #include <string.h>
    void *memchr(const void *s, int c, size_t n);
Description
The memchr function locates the first occurrence of c (converted to an unsigned char) in the initial n characters (each interpreted as unsigned char) of the object pointed to by s.
Returns
The memchr function returns a pointer to the located character, or a null pointer if the character does not occur in the object.
7.11.5.2 The strchr function
Synopsis
    #include <string.h>
    char *strchr(const char *s, int c);
Description
The strchr function locates the first occurrence of c (converted to a char) in the string pointed to by s. The terminating null character is considered to be part of the string.
Returns
The strchr function returns a pointer to the located character, or a null pointer if the character does not occur in the string.
7.11.5.3 The strcspn function
Synopsis
    #include <string.h>
    size_t strcspn(const char *s1, const char *s2);
Description
The strcspn function computes the length of the maximum initial segment of the string pointed to by s1 which consists entirely of characters not from the string pointed to by s2.
Returns
The strcspn function returns the length of the segment.
7.11.5.4 The strpbrk function
Synopsis
    #include <string.h>
    char *strpbrk(const char *s1, const char *s2);
Description
The strpbrk function locates the first occurrence in the string pointed to by s1 of any character from the string pointed to by s2.
Returns
The strpbrk function returns a pointer to the character, or a null pointer if no character from s2 occurs in s1.
7.11.5.5 The strrchr function
Synopsis
    #include <string.h>
    char *strrchr(const char *s, int c);
Description
The strrchr function locates the last occurrence of c (converted to a char) in the string pointed to by s. The terminating null character is considered to be part of the string.
Returns
The strrchr function returns a pointer to the character, or a null pointer if c does not occur in the string.
7.11.5.6 The strspn function
Synopsis
    #include <string.h>
    size_t strspn(const char *s1, const char *s2);
Description
The strspn function computes the length of the maximum initial segment of the string pointed to by s1 which consists entirely of characters from the string pointed to by s2.
Returns
The strspn function returns the length of the segment.
7.11.5.7 The strstr function
Synopsis
    #include <string.h>
    char *strstr(const char *s1, const char *s2);
Description
The strstr function locates the first occurrence in the string pointed to by s1 of the sequence of characters (excluding the terminating null character) in the string pointed to by s2.
Returns
The strstr function returns a pointer to the located string, or a null pointer if the string is not found. If s2 points to a string with zero length, the function returns s1.
7.11.5.8 The strtok function
Synopsis
    #include <string.h>
    char *strtok(char *s1, const char *s2);
Description
A sequence of calls to the strtok function breaks the string pointed to by s1 into a sequence of tokens, each of which is delimited by a character from the string pointed to by s2. The first call in the sequence has s1 as its first argument, and is followed by calls with a null pointer as their first argument. The separator string pointed to by s2 may be different from call to call.
The first call in the sequence searches the string pointed to by s1 for the first character that is not contained in the current separator string pointed to by s2. If no such character is found, then there are no tokens in the string pointed to by s1 and the strtok function returns a null pointer. If such a character is found, it is the start of the first token.
The strtok function then searches from there for a character that is contained in the current separator string. If no such character is found, the current token extends to the end of the string pointed to by s1, and subsequent searches for a token will return a null pointer. If such a character is found, it is overwritten by a null character, which terminates the current token. The strtok function saves a pointer to the following character, from which the next search for a token will start.
Each subsequent call, with a null pointer as the value of the first argument, starts searching from the saved pointer and behaves as described above.
The implementation shall behave as if no library function calls the strtok function.
Returns
The strtok function returns a pointer to the first character of a token, or a null pointer if there is no token.
Example
    #include <string.h>
    static char str[] = "?a???b,,,#c";
    char *t;

    t = strtok(str, "?");    /* t points to the token "a" */
    t = strtok(NULL, ",");   /* t points to the token "??b" */
    t = strtok(NULL, "#,");  /* t points to the token "c" */
    t = strtok(NULL, "?");   /* t is a null pointer */
Using the Search Functions
memchr Use this function to locate the first occurrence (the one having the lowest subscript) of a character in a character sequence of known length. The function type casts the first (string pointer) argument to pointer to unsigned char. It also type casts the second (search character) argument to unsigned char. That ensures that an argument expression of any character type behaves sensibly and predictably. A search failure returns a null pointer, however. Be sure to test the return value before you try to use it to access storage. Also note that the return value has type pointer to void. You can assign the value to a character pointer but you can't use it to access storage unless you first type cast it to some character pointer type.
strchr Use this function to locate the first occurrence (the one having the lowest subscript) of a character in a null-terminated string. The function type casts the second (search character) argument to char. That ensures that an argument expression of any character type behaves sensibly and predictably. A search failure returns a null pointer, however. Be sure to test the return value before you try to use it to access storage. Note that the call strchr(s, '\0') returns a pointer to the terminating null. See also strcspn, strpbrk, and strrchr, described below.
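The two cautions above (test the return value, and convert the void pointer from memchr before using it) can be illustrated with a minimal sketch. The helper names offset_of_byte and count_char are hypothetical, invented here for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: offset of the first byte equal to c in the
   n bytes at buf, or n if the byte does not occur. */
size_t offset_of_byte(const void *buf, int c, size_t n)
{
    /* memchr returns void *; convert it before doing arithmetic. */
    const unsigned char *p = (const unsigned char *)memchr(buf, c, n);

    return p == NULL ? n : (size_t)(p - (const unsigned char *)buf);
}

/* Hypothetical helper: number of occurrences of c in the string s.
   The test on *s stops the loop if c is '\0', because strchr treats
   the terminating null as part of the string. */
size_t count_char(const char *s, int c)
{
    size_t count = 0;

    for (; (s = strchr(s, c)) != NULL && *s != '\0'; ++s)
        ++count;
    return count;
}
```

Because memchr takes an explicit length, offset_of_byte can look past an embedded null character, which strchr cannot do.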
strcspn You can think of strcspn as a companion to strchr that matches any of a set of characters instead of just one. That makes it similar to strpbrk as well. Note, however, that strcspn returns an index into the string instead of a pointer to an element. If it finds no match, it returns the index of the terminating null instead of a null pointer. Thus, you may find that the call strcspn(s, "a"), for example, is more convenient than either strchr(s, 'a') or strpbrk(s, "a").
strpbrk You can think of strpbrk as a companion to strchr that matches any of a set of characters instead of just one. That makes it similar to strcspn as well. Note, however, that strcspn returns an index into the string instead of a pointer to an element. If it finds no match, it returns the index of the terminating null instead of a null pointer. Thus, you may find that the call strcspn(s, "abc"), for example, is more convenient than strpbrk(s, "abc").
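To make the difference in return values concrete, here is a small sketch that computes the same prefix length both ways. The function names prefix_len_index and prefix_len_pointer are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the part of s that precedes the first
   character from set. No null test is needed, because strcspn returns
   the index of the terminating null when nothing matches. */
size_t prefix_len_index(const char *s, const char *set)
{
    return strcspn(s, set);
}

/* The same result via strpbrk requires an explicit null-pointer test. */
size_t prefix_len_pointer(const char *s, const char *set)
{
    const char *p = strpbrk(s, set);

    return p == NULL ? strlen(s) : (size_t)(p - s);
}
```

When you need the matching character itself rather than its position, strpbrk is the more direct choice.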
strrchr Use this function to locate the last occurrence (the one having the highest subscript) of a character in a null-terminated string. The function type casts the second (search character) argument to char. That ensures that an argument expression of any character type behaves sensibly and predictably. A search failure returns a null pointer, however. Be sure to test the return value before you try to use it to access storage. Note that the call strrchr(s, '\0') returns a pointer to the terminating null. See also strchr, strcspn, and strpbrk, described above.
strspn You can think of strspn as the complement to strcspn. It searches for a character that matches none of the elements in a set of characters instead of any one of them. strspn also returns an index into the string or, if it finds no match, the index of the terminating null. Thus, the call strspn(s, "abc"), for example, finds the longest possible span of characters from the set "abc".
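Two common uses of strspn, sketched under hypothetical helper names (skip_blanks and digit_run are not part of any library):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: pointer to the first character of s that is not
   a space, horizontal tab, or newline (possibly the terminating null). */
const char *skip_blanks(const char *s)
{
    return s + strspn(s, " \t\n");
}

/* Hypothetical helper: length of the run of decimal digits at the
   start of s, or zero if s does not begin with a digit. */
size_t digit_run(const char *s)
{
    return strspn(s, "0123456789");
}
```

Neither helper needs a failure test, since strspn always returns a valid index into the string.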
strstr You write strstr(s1, s2) to locate the first occurrence of the substring s2 in the string s1. A successful search returns a pointer to the start of the substring within s1. Note that a search failure returns a null pointer.
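As a minimal sketch of a loop built on strstr, the hypothetical helper count_substr below counts non-overlapping occurrences of one string within another. It treats an empty search string as a special case, since strstr matches an empty string at every position.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of non-overlapping occurrences of sub
   in s. An empty sub is reported as zero occurrences. */
size_t count_substr(const char *s, const char *sub)
{
    size_t count = 0;
    size_t len = strlen(sub);

    if (len == 0)
        return 0;
    while ((s = strstr(s, sub)) != NULL) {   /* test before use */
        ++count;
        s += len;                            /* resume after the match */
    }
    return count;
}
```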
strtok This is an intricate function designed to help you parse a null-terminated string into tokens. You specify the set of separator characters. Sequences of one or more separators occur between tokens. Such sequences can also occur before the first token and after the last. strtok maintains an internal memory of where it left off parsing a string. Hence, you can process only one string at a time using strtok. Here, for example, is a code sequence that calls the function word for each "word" in the string line. The code sequence defines a word as the longest possible sequence of characters not containing "white space," defined here as a space, horizontal tab, or newline:
    #include <string.h>

    char *s;

    for (s = line; (s = strtok(s, " \t\n")) != NULL; s = NULL)
        word(s);
The first call to strtok has a first argument that is not a null pointer. That starts the scan at the beginning of line. Subsequent calls replace this argument with NULL to continue the scan. If the return value on any call is not a null pointer, it points to a null-terminated string containing no separators. Note that strtok stores null characters in the string starting at line. Be sure that this storage is writable and need not be preserved for future processing.
You can specify a different set of separators on each call to strtok that processes a given string, by the way.
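The same loop can be wrapped in a function that collects the tokens instead of calling word. This is only a sketch; the name split_words and its parameters are hypothetical. The caller must pass a writable array, never a string literal, because strtok overwrites separators with null characters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split line in place at spaces, tabs, and
   newlines. Stores up to max token pointers in words and returns the
   number stored. The pointers refer to storage inside line. */
size_t split_words(char *line, char **words, size_t max)
{
    size_t n = 0;
    char *s;

    for (s = line; n < max && (s = strtok(s, " \t\n")) != NULL; s = NULL)
        words[n++] = s;
    return n;
}
```

Because strtok keeps its position in internal static storage, a function like this must not be interleaved with another strtok scan of a different string.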
Implementing the Search Functions
Listing 1 shows the file memchr.c. The major concern of function memchr is to get various types right. You must assign both the pointer and the character arguments to dynamic data objects with different types. That lets you compare the array elements as type unsigned char correctly and efficiently. I wrote the (void *) type cast in the return expression for clarity, not out of necessity.
Listing 2 shows the file strchr.c. The function strchr is the simplest of these functions. It is the obvious analog of memchr.
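The listings are not reproduced in this text. The following is only a sketch of the approach these two paragraphs describe, not the published code. The names my_memchr and my_strchr are hypothetical, chosen so the sketch does not clash with the library.

```c
#include <stddef.h>

/* Sketch of memchr: copy both arguments into local objects of the
   right types, then compare elements as unsigned char. */
void *my_memchr(const void *s, int c, size_t n)
{
    const unsigned char uc = (unsigned char)c;
    const unsigned char *su = (const unsigned char *)s;

    for (; n > 0; ++su, --n)
        if (*su == uc)
            return (void *)su;   /* convert back to void * for the caller */
    return NULL;
}

/* Sketch of strchr: the same idea, but the terminating null ends the
   scan and is itself a valid search target. */
char *my_strchr(const char *s, int c)
{
    const char ch = (char)c;

    for (; *s != ch; ++s)
        if (*s == '\0')
            return NULL;
    return (char *)s;
}
```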
Listing 3, Listing 4, and Listing 5 show the files strcspn.c, strpbrk.c, and strspn.c, respectively. Both strcspn and strpbrk perform the same function. Only the return values differ. The function strspn is the complement of strcspn.
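A sketch of how the three functions relate, under the hypothetical names my_strcspn, my_strpbrk, and my_strspn. This is one plausible way to write them, not the published listings. The first two share the same nested loops and differ only in what they return.

```c
#include <stddef.h>

/* Index of the first character of s1 that occurs in s2, or the index
   of the terminating null if there is none. */
size_t my_strcspn(const char *s1, const char *s2)
{
    const char *sc1, *sc2;

    for (sc1 = s1; *sc1 != '\0'; ++sc1)
        for (sc2 = s2; *sc2 != '\0'; ++sc2)
            if (*sc1 == *sc2)
                return (size_t)(sc1 - s1);
    return (size_t)(sc1 - s1);
}

/* Same search, but return a pointer, or a null pointer on failure. */
char *my_strpbrk(const char *s1, const char *s2)
{
    const char *sc1, *sc2;

    for (sc1 = s1; *sc1 != '\0'; ++sc1)
        for (sc2 = s2; *sc2 != '\0'; ++sc2)
            if (*sc1 == *sc2)
                return (char *)sc1;
    return NULL;
}

/* Complement: index of the first character of s1 that does NOT occur
   in s2, or the index of the terminating null. */
size_t my_strspn(const char *s1, const char *s2)
{
    const char *sc1, *sc2;

    for (sc1 = s1; *sc1 != '\0'; ++sc1)
        for (sc2 = s2; ; ++sc2)
            if (*sc2 == '\0')
                return (size_t)(sc1 - s1);   /* *sc1 is not in s2 */
            else if (*sc1 == *sc2)
                break;                       /* *sc1 is in s2, go on */
    return (size_t)(sc1 - s1);
}
```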
Listing 6 shows the file strrchr.c. The function strrchr is a useful complement to strchr. It memorizes the pointer to the rightmost occurrence (if any) in sc. The type cast in the return statement is necessary, in this case, because sc points to a constant type.
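A sketch of the technique described above, under the hypothetical name my_strrchr (not the published listing). The variable sc remembers the rightmost match seen so far.

```c
#include <stddef.h>

/* Sketch of strrchr: scan the whole string once, remembering the most
   recent match. The terminating null is a valid search target. */
char *my_strrchr(const char *s, int c)
{
    const char ch = (char)c;
    const char *sc;

    for (sc = NULL; ; ++s) {
        if (*s == ch)
            sc = s;              /* remember this occurrence */
        if (*s == '\0')
            return (char *)sc;   /* drop const for the caller */
    }
}
```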
Listing 7 shows the file strstr.c. The function strstr calls strchr to find the first character of the string s2 within the string s1. Only then does it tool up to check whether the rest of s2 matches a substring in s1. The function treats an empty string s2 as a special case. It matches the implicit empty string at the start of s1.
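The approach described above can be sketched as follows. The name my_strstr is hypothetical and the code is an illustration of the strategy, not the published listing.

```c
#include <stddef.h>
#include <string.h>

/* Sketch of strstr: use strchr to find each candidate starting point,
   then compare the rest of s2 character by character. */
char *my_strstr(const char *s1, const char *s2)
{
    if (*s2 == '\0')
        return (char *)s1;       /* empty s2 matches at the start of s1 */
    for (; (s1 = strchr(s1, *s2)) != NULL; ++s1) {
        const char *sc1 = s1;
        const char *sc2 = s2;

        for (;;)
            if (*++sc2 == '\0')
                return (char *)s1;   /* all of s2 matched */
            else if (*++sc1 != *sc2)
                break;               /* mismatch, try the next candidate */
    }
    return NULL;
}
```

Letting strchr find the candidates keeps the inner comparison loop from running at positions where even the first character cannot match.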
Listing 8 shows the file strtok.c. The function strtok is the last and the messiest of the seven string-scanning functions. It doesn't look bad because it is written here in terms of strspn and strpbrk. It must contend, however, with writable static storage and multiple calls to process the same string. It is probably at least as hard to use correctly as to write correctly. When strtok is not actively scanning an argument string, it points at an empty string. That prevents at least some improper calls from causing the function to make invalid storage accesses. (The function is still at risk if storage is freed for a string that it is scanning.)
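A sketch of a strtok written in terms of strspn and strpbrk, as the paragraph above describes. The name my_strtok and the variable names are hypothetical, and the details may differ from the published listing. The saved pointer refers to an empty string whenever no scan is in progress.

```c
#include <stddef.h>
#include <string.h>

/* Sketch of strtok. ssave is the writable static storage that
   remembers where the next search for a token should start. */
char *my_strtok(char *s1, const char *s2)
{
    static char *ssave = "";     /* empty string when not scanning */
    char *sbegin, *send;

    sbegin = s1 != NULL ? s1 : ssave;
    sbegin += strspn(sbegin, s2);        /* skip leading separators */
    if (*sbegin == '\0') {
        ssave = "";                      /* no token, scan is over */
        return NULL;
    }
    send = strpbrk(sbegin, s2);          /* find the end of the token */
    if (send == NULL)
        ssave = "";                      /* token runs to end of string */
    else {
        *send = '\0';                    /* terminate the token */
        ssave = send + 1;                /* resume after the separator */
    }
    return sbegin;
}
```

Applied to the example from the C Standard, this sketch yields the tokens "a", "??b", and "c", followed by a null pointer.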
This article is excerpted in part from P.J. Plauger, The Standard C Library, (Englewood Cliffs, N.J.: Prentice-Hall, 1992).