William Smith is the engineering manager at Montana Software, a software development company specializing in custom applications for MS-DOS and Windows. You may contact him by mail at P.O. Box 663, Bozeman, MT 59771-0663.
The include file <string.h> in the Standard C library declares 22 functions for manipulating character strings. Seventeen of these functions begin with the prefix str and another five begin with the prefix mem. The functions that begin with the prefix str work on null-terminated strings. They accomplish such critical tasks as finding the length of a string, concatenating strings, and comparing strings, to name just a few. The functions that begin with mem work on any buffer of memory. These functions do not interpret a null character as a terminator.
At first, the functions in <string.h> appear to offer a broad smorgasbord of functionality. I originally expected them to satisfy most string-processing requirements I would encounter. In practice, I repeatedly encountered situations where I could not accomplish what I needed with a single call to one of the Standard C functions. But they are good building blocks. I found myself using and grouping them to accomplish what I really needed.
Most of the string-processing tasks I am faced with center around manipulating text data for input and output. I nearly always have to parse and convert some text script file or user input into a data structure and vice versa. Over time, in just about every program I wrote, the specific needs for text processing started to repeat themselves. I frequently needed to delete and insert strings, or trim leading and trailing tabs and spaces from text. These and many other requirements were common to nearly every project I worked on. After almost ten years of programming in C, a group of about 20 additional functions has precipitated out and become a crucial part of my C function library. I am going to share with you the most recent incarnation of my bare-bones but essential string function library. These functions complement the Standard C functions declared in <string.h>.
Dynamic Memory Issues
When writing string functions, you can go in a couple of different directions with regard to dynamic memory. You can dynamically allocate memory to store the string that results from a function's execution of a task. This approach allows you to avoid modifying the original string passed to the function. For example, when performing search and replace, you can use dynamic memory to store the string that contains the modifications. The function can then leave the original string unchanged. However, when using this approach, the programmer must keep track of memory allocation and make sure to release the allocated memory eventually. This can be a challenge in certain situations. Use of dynamic memory may be more suitable in C++. C++ is better organized to provide object creation and deletion. This helps with dynamic memory management and relieves some of the burden on the programmer.
Since the functions presented here are pure Standard C, I choose to avoid dynamic memory. In fact, I also choose to avoid creating buffers on the stack as scratch or work space. Some of the editing functions require a temporary work space, but I get around this by using the memmove function declared in <string.h>. memmove provides safe memory copying of overlapping buffers. To do the same job yourself, you would need a temporary copy of the source buffer. Although convenient, memmove can be more costly in processor time. This varies from system to system, but there are usually faster ways to accomplish a given task than calling memmove. Modifying the functions to avoid the use of memmove can wring a bit more performance and efficiency out of them.
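As an illustration of the memmove technique, here is a minimal sketch of an in-place edit that needs no scratch buffer. The helper open_gap and its parameters are hypothetical names chosen for this example; they do not come from the listings.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not from the article's listings): open a gap of
   n characters at offset pos inside the string s, in place.
   The caller must guarantee room for strlen(s) + n + 1 bytes. */
char *open_gap(char *s, size_t pos, size_t n, char fill)
{
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    if (pos > len)
        pos = len;                 /* clamp: a gap past the end appends */

    /* Source and destination overlap, so memcpy would be undefined here.
       memmove copies the tail, terminator included, as if it went
       through a temporary buffer. */
    memmove(s + pos + n, s + pos, len - pos + 1);
    memset(s + pos, fill, n);      /* give the gap well-defined contents */
    return s;
}
```

Calling open_gap on a 16-byte buffer holding "abcdef" with pos 2, n 3, and fill '.' leaves "ab...cdef" in the same buffer.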
In the future, it might be worthwhile to create object wrappers in C++ for these string functions. For now, I will leave them in Standard C. This means that all the functions assume that the strings you pass to them are null-terminated. The functions also assume that the strings are pointers to memory areas that are large enough to accommodate the resulting string generated by the function. The burden of avoiding buffer overflows rests on the programmer.
Implementation
I break the string functions up into two categories. I group functions for extracting or finding a substring in a string into the file named STR_NGET.C (Listing 1 and Listing 2). The second group, in STR_EDIT.C (Listing 3 and Listing 4), contains functions that I use for editing strings.
Functions for Getting Substrings
STR_NGET.C contains the functions str_nleft, str_nmid, str_nright, and str_rstr. The first three functions extract a specified number of characters from a string. These functions modify the string itself by moving the desired characters into it. str_nleft extracts the n left-most characters. str_nright extracts the n right-most characters. str_nmid extracts n characters from a specified position.
str_rstr resembles the function strstr declared in <string.h>. But instead of finding the first occurrence of a substring, str_rstr finds the last occurrence of the substring. The relationship between strstr and str_rstr is analogous to the relationship between strchr and strrchr. I have seen a function called strrstr in some libraries that come with commercial compilers. It is equivalent to my function str_rstr. It is not a part of the standard.
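The behavior described above can be sketched as follows. The signatures, the 0-based position in str_nmid, the clamping of out-of-range arguments, and the handling of an empty substring in str_rstr are all assumptions made for this sketch; the versions in Listing 1 and Listing 2 may differ.

```c
#include <assert.h>
#include <string.h>

/* Keep the n left-most characters of s. */
char *str_nleft(char *s, size_t n)
{
    assert(s != NULL);
    if (n < strlen(s))
        s[n] = '\0';
    return s;
}

/* Keep n characters starting at offset pos (0-based, an assumption). */
char *str_nmid(char *s, size_t pos, size_t n)
{
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    if (pos > len)
        pos = len;
    if (n > len - pos)
        n = len - pos;
    memmove(s, s + pos, n);        /* overlapping move toward the front */
    s[n] = '\0';
    return s;
}

/* Keep the n right-most characters of s. */
char *str_nright(char *s, size_t n)
{
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    if (n < len)
        memmove(s, s + len - n, n + 1);   /* +1 moves the terminator too */
    return s;
}

/* Return a pointer to the last occurrence of sub in s, or NULL. */
char *str_rstr(const char *s, const char *sub)
{
    const char *last = NULL;
    const char *p;

    assert(s != NULL && sub != NULL);
    if (*sub == '\0')
        return (char *)s + strlen(s);     /* assumption: empty match at end */
    for (p = strstr(s, sub); p != NULL; p = strstr(p + 1, sub))
        last = p;                         /* remember the latest match */
    return (char *)last;
}
```

This str_rstr simply calls strstr repeatedly and keeps the last hit. A backward scan from the end of the string would likely be faster on long strings with many matches.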
Functions for Editing Strings
All 13 functions in the file STR_EDIT.C do some type of modification or editing to a string. The functionality ranges from the simple padding of strings to a fixed length for justification to complete search and replace.
str_center, str_ljust, and str_rjust justify strings. These functions first trim leading and trailing spaces and tabs from a string. They then move the string so it is either centered, left-justified, or right-justified within a specified length.
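A minimal sketch of the trim-then-pad approach to justification follows. The signatures are assumed, trim_blanks is a hypothetical local helper that stands in for the trimming functions, and two policies are guesses: text longer than the requested width is left untruncated, and any odd leftover space in str_center goes to the right. The buffer must hold at least width + 1 bytes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: strip leading and trailing spaces and tabs in
   place and return the new length. */
static size_t trim_blanks(char *s)
{
    size_t skip = strspn(s, " \t");
    size_t len = strlen(s + skip);

    while (len > 0 && (s[skip + len - 1] == ' ' || s[skip + len - 1] == '\t'))
        len--;
    memmove(s, s + skip, len);
    s[len] = '\0';
    return len;
}

char *str_center(char *s, size_t width)
{
    size_t len, left;

    assert(s != NULL);
    len = trim_blanks(s);
    if (len >= width)
        return s;                  /* assumption: never truncate the text */
    left = (width - len) / 2;      /* odd leftover space goes to the right */
    memmove(s + left, s, len);     /* slide text right; regions may overlap */
    memset(s, ' ', left);
    memset(s + left + len, ' ', width - left - len);
    s[width] = '\0';
    return s;
}

char *str_ljust(char *s, size_t width)
{
    size_t len;

    assert(s != NULL);
    len = trim_blanks(s);
    if (len < width) {
        memset(s + len, ' ', width - len);   /* pad on the right */
        s[width] = '\0';
    }
    return s;
}

char *str_rjust(char *s, size_t width)
{
    size_t len;

    assert(s != NULL);
    len = trim_blanks(s);
    if (len < width) {
        memmove(s + (width - len), s, len + 1);  /* text and terminator */
        memset(s, ' ', width - len);             /* pad on the left */
    }
    return s;
}
```

Centering a buffer that holds two spaces, "hi", and a tab within a width of 7 yields two spaces, "hi", and three spaces.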
The trimming functions, str_ltrim, str_rtrim, and str_trim, execute the trimming tasks required by the justification functions mentioned above. These functions work on the left end, the right end, or both ends of a string, removing every character that matches a list of characters to trim.
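The trimming functions might look something like the sketch below. It assumes each function takes the string and a null-terminated list of characters to trim, and returns the string; those signatures are not confirmed by the text.

```c
#include <assert.h>
#include <string.h>

/* Remove leading characters of s that appear in list. */
char *str_ltrim(char *s, const char *list)
{
    size_t skip;

    assert(s != NULL && list != NULL);
    skip = strspn(s, list);        /* length of the leading run to drop */
    if (skip > 0)
        memmove(s, s + skip, strlen(s + skip) + 1);
    return s;
}

/* Remove trailing characters of s that appear in list. */
char *str_rtrim(char *s, const char *list)
{
    size_t len;

    assert(s != NULL && list != NULL);
    len = strlen(s);
    while (len > 0 && strchr(list, s[len - 1]) != NULL)
        len--;
    s[len] = '\0';
    return s;
}

/* Trim both ends. Trimming the right end first shortens the move
   that str_ltrim has to make. */
char *str_trim(char *s, const char *list)
{
    return str_ltrim(str_rtrim(s, list), list);
}
```

With this sketch, str_trim(buf, " \t") strips spaces and tabs from both ends, which is the case the justification functions need.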
The function str_delete removes a specified number of characters from a string starting at a designated location within the string. The function str_insert inserts one string into another at a designated location. The function str_rplc uses both str_delete and str_insert to implement a search and replace capability. str_rplc replaces only the first match; str_mrplc does search and replace for all matches. The function str_repeat builds a string of a desired length by repeating a string.
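To make the relationship between these functions concrete, here is a sketch in which search and replace is built from delete and insert. The signatures, the 0-based positions, and the return values (str_rplc returns a pointer just past the replacement, str_mrplc returns a match count) are assumptions for illustration, not the interface of Listing 3 and Listing 4.

```c
#include <assert.h>
#include <string.h>

/* Remove n characters from s starting at offset pos. */
char *str_delete(char *s, size_t pos, size_t n)
{
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    if (pos >= len)
        return s;
    if (n > len - pos)
        n = len - pos;
    memmove(s + pos, s + pos + n, len - pos - n + 1);  /* close the hole */
    return s;
}

/* Insert ins into s at offset pos. ins must not point into s. */
char *str_insert(char *s, size_t pos, const char *ins)
{
    size_t len, n;

    assert(s != NULL && ins != NULL);
    len = strlen(s);
    n = strlen(ins);
    if (pos > len)
        pos = len;
    memmove(s + pos + n, s + pos, len - pos + 1);      /* open a gap */
    memcpy(s + pos, ins, n);
    return s;
}

/* Replace the first match. Returns a pointer just past the replacement,
   or NULL when find does not occur (or is empty). */
char *str_rplc(char *s, const char *find, const char *with)
{
    char *hit;

    assert(s != NULL && find != NULL && with != NULL);
    if (*find == '\0' || (hit = strstr(s, find)) == NULL)
        return NULL;
    str_delete(hit, 0, strlen(find));
    str_insert(hit, 0, with);
    return hit + strlen(with);
}

/* Replace every match; returns the number of replacements made. */
size_t str_mrplc(char *s, const char *find, const char *with)
{
    size_t count = 0;

    /* Resuming after each replacement prevents an endless loop when
       the replacement text itself contains the search text. */
    while ((s = str_rplc(s, find, with)) != NULL)
        count++;
    return count;
}

/* Fill dst with len characters by repeating src. */
char *str_repeat(char *dst, const char *src, size_t len)
{
    size_t i, n;

    assert(dst != NULL && src != NULL);
    n = strlen(src);
    if (n == 0)
        len = 0;                   /* nothing to repeat */
    for (i = 0; i < len; i++)
        dst[i] = src[i % n];
    dst[len] = '\0';
    return dst;
}
```

As with the other sketches, the caller must size the buffer for the longest result, since a replacement can make the string grow.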
The function str_vcat is a variable-argument version of the Standard C function strcat. This function concatenates a list of strings. The last parameter passed to str_vcat must be a null pointer. str_ocat is a version of strcat that can handle overlapping strings. An example of overlapping strings would be a single string with multiple pointers to different locations in the string. The standard leaves the behavior of strcat undefined when the strings overlap, so depending on the compiler vendor, sometimes strcat will work with overlapping strings and sometimes it will not. For safety and consistency I created the function str_ocat. str_ocat is just a wrapper for memmove.
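The two concatenation functions could be sketched as shown here. The signatures are assumed. Note that a portable caller should pass the terminating null pointer as (char *)NULL, because a bare NULL in a variable argument list is not guaranteed to have pointer type.

```c
#include <assert.h>
#include <stdarg.h>
#include <string.h>

/* Append every string in the argument list to dst. The list must end
   with a null pointer. The sources must not overlap dst. */
char *str_vcat(char *dst, ...)
{
    va_list ap;
    const char *src;
    char *end;

    assert(dst != NULL);
    end = dst + strlen(dst);       /* track the end to avoid rescanning */
    va_start(ap, dst);
    while ((src = va_arg(ap, char *)) != NULL) {
        size_t n = strlen(src);

        memcpy(end, src, n + 1);   /* copy the text and its terminator */
        end += n;
    }
    va_end(ap);
    return dst;
}

/* Append src to dst even when the two overlap. */
char *str_ocat(char *dst, const char *src)
{
    size_t dlen, slen;

    assert(dst != NULL && src != NULL);
    dlen = strlen(dst);
    slen = strlen(src);            /* measure both before writing anything */
    memmove(dst + dlen, src, slen);
    dst[dlen + slen] = '\0';
    return dst;
}
```

Measuring both lengths before the move matters in str_ocat: when src points into dst, the copy overwrites the terminator that both strings share, so a routine that looks for the end of src while copying might never find it.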
Conclusions
Nearly every major program I have written has involved text processing in some form. The Standard C library provides a useful but shallow group of string-manipulation functions. Over time and out of need, I have come up with the group of string functions presented here. These functions build upon the standard library functions and provide the functionality that I have found important in practice.
There is no end to the additional functions you could invent. And you can probably find more efficient ways to implement the functions demonstrated here. Nevertheless, these are the functions I have found useful and essential in my work with C.