A Portable C++ String Class

A framework for cross-platform data-file management

William Hill

William is information-systems manager at Eurédit SA, a Paris firm that publishes the Europages business-to-business telephone directory on paper, CD-ROM, European online services, and the World Wide Web. William can be contacted at bhill@dialup.francenet.fr.


The company I work for, Eurédit SA, produces a European-wide yellow-pages directory which includes listings and publicity for 150,000 European companies in over 30 countries. With over 60 partner companies worldwide collecting data and selling publicity, communication and file exchange can be a Tower of Babel, confusing enough to intimidate even the most hardened of data-processing personnel. We currently accept two formats for submission of editorial data: Paradox tables generated by a Windows application we distribute to interested partners, and a fixed-length EBCDIC file format used by our partners working out of mainframe production shops. All of our publication data is processed for markup from the second format. In addition to photocomposition markup, our data-file format has been used in a Europe-wide online system, a Windows application on CD-ROM, and a Europe-wide fax server. Needless to say, this EBCDIC file format, though aging and cryptic, is mission critical for our products. For the purposes of this article, I will refer to this file format as the "Europages file format."

Since our Europages file format is just a specialized, fixed-format file, we developed a fixed-format library and derived our Europages file-format classes from it. A small set of foundation classes was then put together in order to guarantee portable development using string classes, linked lists, and arrays. The foundation classes immediately paid for themselves in terms of generic, reusable code, and served as a solid underpinning for the port of the entire package to other platforms. Here, I'll share some of the design decisions we made in developing the foundation classes, describe some of the portability gotchas we encountered, and present a lean-but-mean portable string class.

Europages File Format

All Europages file records are of equal length, but each record has a variable number of fields depending on the information it contains. Before an individual record can be used, its record type must be known. Once the record type is known, its individual fields may be accessed. The format has often been used by dense, hard-to-debug applications that break once the data-file definition is modified (every year to varying degrees). We needed to make the format transparent by creating a software tool for our production chain that could be shared with our editorial partners.

Our design priorities were:

Portability Requirements

We did not want to develop our package on one platform or compiler and then port it to the next one. Code was implemented and tested simultaneously on DOS and SunOS compilers. This allowed us to identify the particularities of each compiler early on. Code written later in development benefited from this cross-platform approach. The result is an important body of C++ code that does not contain any direct references to a single platform or compiler. DOS compilers included Borland C++ 3.1 and 4.0, Visual C++ 1.5, and GNU 2.3. The SunOS compilers used were Sun C++ 3.0 and GNU 2.3.

We avoided those C++ features that, due to lack of an adopted language standard, are not universally supported: templates, exception handling, and run-time type identification. As soon as these language features are generally available, they will be integrated into the package.

Programmers using our classes don't need to know if the data being manipulated is represented in EBCDIC or not. A character string in memory can be compared automatically to a character string in our Europages file. All read operations convert from EBCDIC to the native character set, correctly translating all accented characters. Likewise, all write operations handle the proper translation towards EBCDIC. These operations are carried out by a straightforward table lookup. The Europages file-format classes guarantee that all data is properly represented in memory.
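The table lookup can be sketched in a few lines. This is only an illustration, not the code from our package: it assumes the native character set is ASCII, maps only the space, the digits, and the unaccented letters, and uses the hypothetical names buildTable() and ebcdicToNative().

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Fill a 256-entry EBCDIC-to-ASCII table. Only space, digits, and
// unaccented letters are mapped in this sketch; every other code
// becomes '?'. A production table would also cover accented characters.
static void buildTable(unsigned char table[256])
{
    for (int i = 0; i < 256; ++i)
        table[i] = '?';
    table[0x40] = ' ';
    const char *lower = "abcdefghijklmnopqrstuvwxyz";
    const char *upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    // EBCDIC letters sit in three separate runs: a-i, j-r, s-z
    const int starts[3]  = { 0x81, 0x91, 0xA2 };
    const int lengths[3] = { 9, 9, 8 };
    int k = 0;
    for (int run = 0; run < 3; ++run)
        for (int j = 0; j < lengths[run]; ++j, ++k) {
            table[starts[run] + j] = static_cast<unsigned char>(lower[k]);
            // uppercase letters are 0x40 above their lowercase codes
            table[starts[run] + j + 0x40] = static_cast<unsigned char>(upper[k]);
        }
    for (int d = 0; d < 10; ++d)
        table[0xF0 + d] = static_cast<unsigned char>('0' + d);
}

// Translate len EBCDIC bytes into a native (ASCII) string,
// one table lookup per byte.
std::string ebcdicToNative(const unsigned char *src, std::size_t len)
{
    assert(src != 0 || len == 0);
    static unsigned char table[256];
    static bool ready = false;
    if (!ready) {
        buildTable(table);
        ready = true;
    }
    std::string out;
    for (std::size_t i = 0; i < len; ++i)
        out += static_cast<char>(table[src[i]]);
    return out;
}
```

Writing works the same way in the other direction, with a second table indexed by the native character value.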

Our package requires several reference files, which are accessed on disk during processing. One of these files contains the list of yellow-pages product references under which advertisements may be purchased in our directory. Each reference, and its associated text heading, occupies 116 bytes. The entire table contains over 7000 entries (812 KB). We chose not to access this in RAM in order to avoid complications when using the package under DOS. We did not place this table in an indexed file, to avoid problems with file compatibility and byte order (Endianness) in our UNIX implementations. To obtain acceptable access times and portable behavior, the table was placed in a fixed-format file and sorted by product code. The entries are retrieved with a simple binary search. Although elementary, this technique provides satisfactory performance and instant code portability. Example 1 shows the technique we used within our class library.

Portability Gotchas

PC software tools are extremely rich in features and functionality when compared to programming environments on more powerful platforms. This richness sometimes comes back to haunt PC programmers when functions that they have routinely used turn out to be PC specific.

One such function was memmove() from the <string.h> library. Simple code like the function call in Example 2 did not port across our platforms. memmove() was not available with the UNIX compilers we used, but memcpy() does the job quite adequately as long as the source and destination buffers do not overlap. Since our earliest code relied heavily on function calls like this, #define macros are used to transform memmove() calls to memcpy() calls for our SunOS implementation.
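Substituting memcpy() for memmove() is only safe when the two buffers never overlap. Where that cannot be guaranteed, a small overlap-aware routine can stand in for the missing function. The following is a sketch under that assumption, not code from our library; moveBytes() is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical helper: copies n bytes from src to dst and gives the
// expected result even when the two regions overlap.
void *moveBytes(void *dst, const void *src, std::size_t n)
{
    assert(dst != 0 && src != 0);
    unsigned char *d = static_cast<unsigned char *>(dst);
    const unsigned char *s = static_cast<const unsigned char *>(src);
    // std::less gives a well-defined ordering even for unrelated pointers
    std::less<const unsigned char *> before;
    if (before(d, s)) {
        // destination starts first: copy front to back
        for (std::size_t i = 0; i < n; ++i)
            d[i] = s[i];
    } else if (before(s, d)) {
        // destination starts later: copy back to front
        for (std::size_t i = n; i > 0; --i)
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```

Copying back to front when the destination lies after the source is what keeps bytes from being overwritten before they are read.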

We needed case-insensitive string comparison from our string class. Under DOS/Windows, functions like stricmp() have always been available for most C and C++ compilers. Programmers are so used to them that they forget that stricmp() is not part of the C Standard Library. When a small portion of code is ported to another platform (such as a Sun workstation) to be compiled with a strictly conforming ANSI C or C++ compiler, it instantly breaks at link time. Programmers usually view truly portable code as a pipe dream. Listings One and Two contain our portable version of stricmp(), which is conditionally compiled if we are not working on a DOS/Windows platform. Listing Three is a program that tests the string library.

Designing generic classes to be used in all types of applications is difficult. The temptation to add that one last member function is always there. Trying to design a "kitchen-sink" class that does all things for all applications is always a danger. Our string class contains exactly what we need, across all of our applications, and no more. Generic class design is a lot like application-interface design in that the public member functions of a generic class constitute an API. The more pertinent the proposed functions, the more useful the API. Unused functions in a class interface are better implemented through inheritance, when they are truly needed.

A Generic String Class

The generic string class presented here contains the most basic string-manipulation functions that we needed. It has been immensely useful for encapsulating traditionally problem-prone code inside an intuitive interface. From this base class, we have since inherited an enhanced class with full support for the comparison of accented character strings. The enhanced class includes a more-sophisticated search function based on the Boyer-Moore algorithm. The class now provides excellent performance on extremely large character buffers. We are currently working on a Unicode implementation of a derived class for Windows NT. The beauty of these solutions is that the original base code keeps ticking away, providing constant service and a clean springboard to more complex solutions.

All of the classes in our data-format framework use the orthodox canonical form for their class declarations and definitions. A fine illustration of this form is presented in Advanced C++ Programming Styles and Idioms, by James O. Coplien (Addison-Wesley, 1992). The canonical form ensures that instances of declared classes will do exactly what you expect them to do when they are created, copied, passed by value as function arguments, used on the left side of an assignment operator, and destroyed. This canonical form requires a default constructor, a copy constructor, an assignment operator, and a destructor (almost always a virtual destructor).
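As a reminder of what the canonical form looks like in practice, here is a minimal sketch. The class Label and its members are hypothetical and exist only for illustration; the real string class appears in Listings One and Two.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical class showing the four members of the orthodox
// canonical form: default constructor, copy constructor,
// assignment operator, and (virtual) destructor.
class Label {
public:
    Label() : text(new char[1]) { text[0] = '\0'; }
    Label(const Label &other) : text(dup(other.text)) {}
    Label &operator=(const Label &other)
    {
        if (this != &other) {          // guard against self-assignment
            char *copy = dup(other.text);
            delete[] text;
            text = copy;
        }
        return *this;
    }
    virtual ~Label() { delete[] text; }

    void set(const char *s)
    {
        assert(s != 0);
        char *copy = dup(s);
        delete[] text;
        text = copy;
    }
    const char *get() const { return text; }

private:
    // allocate a private copy of a C string
    static char *dup(const char *s)
    {
        char *p = new char[std::strlen(s) + 1];
        std::strcpy(p, s);
        return p;
    }
    char *text;
};
```

Because every instance owns its own buffer, a copy made through the copy constructor or the assignment operator is unaffected by later changes to the original.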

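The string class has four different constructors: a default constructor, a copy constructor, a constructor that takes a C string, and a constructor that takes the size of the buffer used by the sprintf() member function.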

The destructor is declared as virtual. This guarantees that a derived destructor will be called if an instance of a derived class is deleted through a base-class pointer. Two assignment operators are supplied with the class: one for assigning string instances and one for assigning C strings. Note in Listings One and Two that the assignment operators only delete the internal storage for their instances if the right-hand value's string length is strictly greater than its own length.

Selectors

Selectors are functions used for getting inside a C++ class and having a look around without producing any side effects. In other words, selector functions access class variables without changing their values. Selector member functions are usually declared as const member functions. The compiler then guarantees that the function cannot modify the instance for which it is called. The length() function, which returns the string length (not including the binary zero terminator, as in C), is the most solicited selector. isEmpty() and the ! operator both return True if the string has a zero length. The ! operator is really an inlined call to isEmpty() and is provided for notational convenience.

The majority of selectors for the string class are comparison and concatenation operators that share the same arithmetical notation. Strings may be compared with the mathematical operators >, <, >=, <=, and the traditional C operators == and !=. To maximize code reuse, the string class uses its own operators at every opportunity. Listing Two shows that the operator += uses the operator + inside itself. We avoid rewriting existing functionality by building powerful operators from more basic ones. The class seems to bootstrap itself. Using the operators internally illustrates the usefulness of particular member functions. The operators >=, <=, and != are just simple calls to the <, >, and == operators, respectively. In Listing Two, the >= operator returns the logical negation of the < operator. In this way, symmetric behavior for opposing functions is guaranteed.
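The idea of deriving the whole family of comparison operators from two primitives can be sketched as follows. The class Word is hypothetical and deliberately tiny; it only shows the pattern, with < and == as the primitives and each of the others expressed as a swap or a negation.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical class: only < and == touch the data; every other
// comparison operator is written in terms of those two.
class Word {
public:
    explicit Word(const char *s) : text(s) { assert(s != 0); }

    bool operator<(const Word &o) const  { return std::strcmp(text, o.text) < 0; }
    bool operator==(const Word &o) const { return std::strcmp(text, o.text) == 0; }

    bool operator>(const Word &o) const  { return o < *this; }       // swap operands
    bool operator>=(const Word &o) const { return !(*this < o); }    // negate <
    bool operator<=(const Word &o) const { return !(*this > o); }    // negate >
    bool operator!=(const Word &o) const { return !(*this == o); }   // negate ==

private:
    const char *text;   // not owned; the caller keeps the string alive
};
```

Note that a >= b is the negation of a < b, not a swapped call to <. Swapping the operands yields a > b, which gives the wrong answer for equal strings.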

The comparison operators all return Boolean values. For processing that requires the exact value returned by functions like strcmp(), the member function cmp() is supplied. It simply calls strcmp() from the standard library (or stricmp() when comparisons are case insensitive) and passes back the value. The BASIC-style functions left(), mid(), and right() are supplied as part of the public interface. They all rely on the private-member workhorse function ncpy(), which encapsulates the actual extraction code. ncpy() returns any substring within the string instance for which it is called. The [] operator has been overloaded to allow access to individual characters. Note that it returns a reference to the selected character, not a copy of the character on the stack. In this way, the character may be used on the left side of the assignment operator. The locate() functions find an occurrence of a character or a substring within the instance for which they are called. They return a zero-based offset to the occurrence.

Manipulators

Manipulators are member functions that actually change the inner state of the class instance for which they are called. toUpper() and toLower() transform the string instance to all upper- or lowercase characters. The derived "enhanced" class redefines these member functions to do special processing for accented characters. The insert() and erase() functions are used for inserting and deleting substrings from a string instance. The fill() member function floods the buffer with a given character value and tags on a trailing binary zero. The iostreams classes are a great improvement over <stdio.h>, but sometimes nothing works like a call to sprintf(). Listing Two shows how we recovered sprintf() functionality using the <stdarg.h> library.
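For comparison, here is a sketch of the same variable-argument technique written against a newer standard library. It assumes a C++11 library that provides vsnprintf() and va_copy, neither of which was available to the compilers we used; formatString() is a hypothetical free function, not a member of our class.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: printf-style formatting into a std::string.
// The output is measured first, so no fixed-size buffer can overflow.
std::string formatString(const char *fmt, ...)
{
    assert(fmt != 0);
    va_list args;
    va_start(args, fmt);

    // First pass: ask vsnprintf() how many characters are needed.
    va_list probe;
    va_copy(probe, args);
    int needed = std::vsnprintf(0, 0, fmt, probe);
    va_end(probe);

    std::string result;
    if (needed >= 0) {
        // Second pass: format into a buffer of exactly the right size.
        std::vector<char> buf(static_cast<std::size_t>(needed) + 1);
        std::vsnprintf(&buf[0], buf.size(), fmt, args);
        result.assign(&buf[0], static_cast<std::size_t>(needed));
    }
    va_end(args);
    return result;
}
```

A call such as formatString("Line [%05d] : %s", 7, "abc") produces the same text as the sprintf() member function would, without the need to guess a buffer size in advance.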

Numerous applications and techniques have been simplified using the string class. In Example 3, a character buffer is scanned and a given substring is continuously located and replaced by a different character sequence. The function is tight and clear, the algorithm stands out, and all code dealing with allocating and deallocating character buffers and copying and concatenating character strings is neatly encapsulated within the string class. This search-and-replace function was put together quickly and is easily maintained.

Supporting New Character Sets

Political and economic changes in Europe have brought new partners to the directory, and we must now support character sets for more languages, including Slavic, Polish, and Slovenian. If all possible accented characters are to be published, we must manage many more character values than the 256 available in eight bits. Don't forget that all these characters will end up in an EBCDIC file and will have to be translated somewhere along the line. We are extremely interested in the possibilities of 16-bit character sets such as Unicode, which, I am sure, will profoundly change the way editorial data processing is carried out. Unfortunately, Unicode is only implemented on Windows NT for the time being, and we are still trying to clear out our legacy-application cobwebs.

Our C++ package uses a completely portable, albeit inelegant, mechanism for transmitting accented characters. Specific byte values are set aside as "floating-accent" values. These values can only be used as accents of other characters. For example, the hexadecimal value 0x06 is used for the circumflex accent "^". Floating-accent values are placed immediately before the character to be accented. A Europages file-format record will use the hexadecimal sequence 0x06, 0x85 to transmit the character "ê". On a DOS platform, this sequence will be read into memory as the ASCII hexadecimal value 0x88. Under Windows, it will be read into memory as the ANSI hexadecimal value 0xEA.
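The floating-accent mechanism can be illustrated with a small decoder. The sketch below is not our production code. It assumes that the base characters have already been translated to the native character set, that the target is the Windows ANSI code page, and it handles only the circumflex value 0x06 mentioned above. The names composeCircumflex() and composeAccents() are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

const unsigned char ACCENT_CIRCUMFLEX = 0x06;   // floating-accent value

// Hypothetical composition table for the ANSI code page:
// returns the accented form of a base letter, or the letter
// itself when no accented form is known.
static unsigned char composeCircumflex(unsigned char base)
{
    switch (base) {
        case 'a': return 0xE2;   // a with circumflex
        case 'e': return 0xEA;   // e with circumflex
        case 'i': return 0xEE;   // i with circumflex
        case 'o': return 0xF4;   // o with circumflex
        case 'u': return 0xFB;   // u with circumflex
        default:  return base;
    }
}

// Replace each (accent, letter) pair by a single accented character.
std::string composeAccents(const std::string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c == ACCENT_CIRCUMFLEX && i + 1 < in.size()) {
            // the accent applies to the character that follows it
            unsigned char base = static_cast<unsigned char>(in[++i]);
            out += static_cast<char>(composeCircumflex(base));
        } else {
            out += static_cast<char>(c);
        }
    }
    assert(out.size() <= in.size());   // composing never lengthens the text
    return out;
}
```

Supporting another platform is then a matter of supplying a different composition table, while the scanning loop stays the same.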

Despite the work currently aimed at establishing a Unicode standard and the premature announcement of the death of ASCII, the truly portable text-representation systems are all based on 7-bit ASCII codes: PostScript, Acrobat, SGML (and by extension, HTML).

Our character controls include run-time font-measurement calculations. This means that character strings too long to be published in one column are detected at file-composition time (not photocomposition time), generating important savings for us in terms of delay and overdue costs. Each editorial field in the package knows about the typeface used to compose it on a printed page. All accepted characters are assigned their corresponding typeface width in 200ths of Didot points. If a given field value overshoots the width of a page column, an error is returned.
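A width check of this kind reduces to summing per-character widths and comparing the total with the column width. The sketch below is illustrative only: the function names are hypothetical, and the width table is supplied by the caller rather than taken from any real typeface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sum the widths of all characters in text. widths[] holds one entry
// per character value, expressed in 200ths of a Didot point.
long measure(const std::string &text, const int widths[256])
{
    assert(widths != 0);
    long total = 0;
    for (std::size_t i = 0; i < text.size(); ++i)
        total += widths[static_cast<unsigned char>(text[i])];
    return total;
}

// Return true if text fits in a column of the given width
// (same unit as the width table).
bool fitsColumn(const std::string &text, const int widths[256], long columnWidth)
{
    return measure(text, widths) <= columnWidth;
}
```

Working in whole 200ths of a point keeps the arithmetic in integers, so the result is identical on every platform.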

Conclusion

The Europages file-format classes have been used extensively since the beginning of 1994. They serve as a 10,000-line C++ code repository for all programs used to verify and process Europages data files for the publication of our paper directory and electronic products. Results show that editorial applications put together using these classes have been developed up to ten times faster than previous applications, which were developed in C.

The true test of usefulness and durability is software maintenance. Our previous software tools were difficult to maintain when our product changed. This meant either an exhausting application rewrite to match the new product specifications, or a hasty, unsatisfying maintenance job that left the edifice as shaky as our nerves.

The Europages directory undergoes product modifications every year. Sometimes these changes are incremental. This year the changes ran deep and profoundly affected the structure of our data files. The C++ classes were easily maintained, and publishing programs were ready in October of 1994, seven months in advance of the next edition's publishing deadline. Using previous methods and tools, the publishing software was never available more than three months in advance. This software-maintenance success has validated our design decisions and proven that medium-to-large-scale project portability is possible if design goals and considerations are clearly defined and understood at the outset.

Example 1: Retrieving entries using a binary search.

BOOL F2HCode::verify(const char *src)
{
    const int nRecLen = 5;      // length of an individual field
    FFCursor *cursHead = new FFCursor("head13.dat", FALSE, nRecLen);
    char szBuf[nRecLen + 1];
    BOOL retval = FALSE;
    long nLow = 1;          // record numbers start at 1
    // return the number of records in the fixed format file:
    long nHigh = cursHead -> numRecs();
    while(nLow <= nHigh)    {
        long nMid = (nLow + nHigh) / 2;
        cursHead -> gotoRecord(nMid);
        cursHead -> getRecord(szBuf);
        szBuf[sizeof(szBuf) - 1] = '\0';
        if(strcmp(src, szBuf) < 0)
            nHigh = nMid - 1;
        else if(strcmp(src, szBuf) > 0)
            nLow = nMid + 1;
        else    {
            retval = TRUE;
            break;
            }
        }
    delete cursHead;
    return retval;
}

Example 2: Nonportable function call between UNIX and DOS.

// fixed-format field write function
BOOL FField::put(char *src)
{
    memmove(szBuf + nOffset, src, nLength);
    return TRUE;
}

Example 3: Using a string class to continuously replace substrings in a character buffer by another sequence of characters.

// replaces one substring by another for an entire string
// char *s : source character buffer
// const char *x : substring to find
// const char *y : substring to replace x with
// int len : string length of s
void replaceXbyY(char *s, const char *x, const char *y, int len)
{
    if(strcmp(x, y) == 0) return;
    MXString strSrc = s;
    MXString strX = x;
    MXString strY = y;
    int pos;    // position returned by locate() function
               // if pos == MXSTRING_LOCATENOTFOUND,
               // the substring was not found
    // find the offset of strX in strSrc
    while((pos = strSrc.locate(strX)) != MXSTRING_LOCATENOTFOUND)
    {
        strSrc.erase(pos, strX.length());   // erase this copy of StrX
        strSrc.insert(pos, strY);          // insert StrY in its place
    }
    strncpy(s, strSrc, len);
}

Listing One

// Class: MXString  class for managing zero terminated C-style strings
// Author: W Hill
#ifndef MXSTRING_HPP
#define MXSTRING_HPP
#include <string.h>
#include <assert.h>
#include <ctype.h>
enum BOOL   { FALSE, TRUE };
typedef const char *CSTR;
const int MAX_VARGS_BUFLEN = 1024;
const int MXSTRING_LOCATENOTFOUND = -1;
class MXString    {
//
public:
    // orthodox canonical form
    // see Advanced C++ Programming Styles and Idioms, James O. Coplien
    MXString();
    MXString(const MXString&);              // copy constructor
    MXString(const char *);
    MXString(unsigned int nSprintfSize);    // resize buffer for sprintf()
    virtual ~MXString();
    virtual MXString& operator=(const MXString&);   // assignment operator
    virtual MXString& operator=(CSTR);
    virtual MXString operator+(const MXString&) const;
    virtual MXString& operator+=(const MXString&);
    // type conversion
    operator CSTR() const;
    // duplication
    // user is responsible for deleting the returned pointer
    // just like ANSI C strdup() function
    char *strDup() const;
    // substring member functions/operators
    char& operator[](unsigned int index);
    // remember BASIC?
    MXString left(unsigned int len) const; // return first len characters
    MXString mid(unsigned int start, unsigned int len) const; 
                               // return len characters from offset start
    MXString right(unsigned int len) const; // return last len characters
    // substring/character functions return MXSTRING_LOCATENOTFOUND for
    // error. offset is 0 based
    virtual int locate(const MXString&) const;
    virtual int locate(const char c) const;
    // comparison operators
    //      > and < are used for alphabetical sorting operators
    virtual BOOL operator>(const MXString&) const;
    virtual BOOL operator>(CSTR) const;
    virtual BOOL operator>=(const MXString&) const;
    virtual BOOL operator>=(CSTR) const;
    virtual BOOL operator<(const MXString&) const;
    virtual BOOL operator<(CSTR) const;
    virtual BOOL operator<=(const MXString&) const;
    virtual BOOL operator<=(CSTR) const;
    virtual BOOL operator==(const MXString&) const;
    virtual BOOL operator==(CSTR) const;
    virtual BOOL operator!=(const MXString&) const;
    virtual BOOL operator!=(CSTR) const;
    virtual int cmp(const MXString&) const;
    // case conversion member functions
    virtual void toUpper();             // converts instance to uppercase
    virtual void toLower();             // converts instance to lowercase
    // check/toggle sensitivity setting for all MXStrings
    static void sensitivity(BOOL b);
    static BOOL sensitivity();
    // insertion; deletion members functions
    MXString& insert(unsigned int start, MXString&);
    MXString& erase(unsigned int start, unsigned int len);
    // handy printf formatting-type function;
    MXString& sprintf(CSTR fmt, ...);
    // returns length of zero terminated string
    // not length of allocated buffer
    unsigned int length() const;
    
    // return TRUE if string is empty 
    BOOL isEmpty() const;
    BOOL operator!() const;
    // fills string with single character 
    void fill(unsigned int len, const char c = ' ');
//
private:
    static BOOL bSensitive; // compares/searches are case sensitive ?
    char *rep;
    int nSprintfBufSize;
    MXString ncpy(unsigned int start, unsigned int len) const;
    };
inline MXString::operator CSTR() const
{
    return rep;
}
inline MXString MXString::left(unsigned int len) const
{
    return ncpy(0, len);
}
inline MXString MXString::mid(unsigned int start, unsigned int len) const
{
    return ncpy(start, len);
}
inline unsigned int MXString::length() const
{
    return strlen(rep);
}
inline MXString MXString::right(unsigned int len) const
{
    return ncpy(length() - len, len);
}
inline BOOL MXString::isEmpty() const
{
    return (*rep == '\0') ? TRUE : FALSE;
}
inline BOOL MXString::operator !() const
{
    return isEmpty();
}
inline void MXString::sensitivity(BOOL b)
{
    MXString::bSensitive = b;
}
inline BOOL MXString::sensitivity()
{
    return MXString::bSensitive;
}
class ostream;
ostream& operator<<(ostream& s, MXString& m);
#endif  // MXSTRING_HPP

Listing Two

#include "mxstring.hpp"
BOOL MXString::bSensitive;
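#ifdef  sunos
// portable stricmp(): case-insensitive comparison for platforms that lack it
int stricmp(const char *s1, const char *s2)
{
    // compare as unsigned char so toupper() never receives a negative value
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;
    // advance while the characters match and the end of s1 is not reached
    while(*p1 != '\0' && toupper(*p1) == toupper(*p2))  {
        p1++;
        p2++;
        }
    // negative, zero, or positive, like strcmp()
    return toupper(*p1) - toupper(*p2);
}
#endif  // portable stricmp()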
MXString::MXString()
{
    rep = new char[1];
    assert(rep);
    rep[0] = '\0';
    nSprintfBufSize = MAX_VARGS_BUFLEN;
}
MXString::MXString(unsigned int nSprintfSize)
{
    rep = new char[1];
    assert(rep);
    rep[0] = '\0';
    nSprintfBufSize = (nSprintfSize > 0) ? nSprintfSize : MAX_VARGS_BUFLEN;
}
MXString::MXString(const MXString& s)
{
    rep = new char[s.length() + 1];
    assert(rep);
    strcpy(rep, s.rep);
    nSprintfBufSize = MAX_VARGS_BUFLEN;
}
MXString::MXString(const char *s)
{
    rep = new char[strlen(s) + 1];
    assert(rep);
    strcpy(rep, s);
    nSprintfBufSize = MAX_VARGS_BUFLEN;
}
MXString::~MXString()
{
    delete[] rep;
}
// As for all operators and functions that require possible reassigning to the
// *rep pointer, a test is first made to verify that the existing string 
// buffer is larger than the incoming string. If so, make a straightforward
// copy. Buffer space is freed only if incoming string requires extra space.
MXString& MXString::operator=(const MXString& s)
{
    if(rep != s.rep)        {
        if(s.length() > length())   {
            delete[] rep;
            rep = new char[s.length() + 1];
            assert(rep);
            }
        strcpy(rep, s.rep);
        }
    return *this;
}
MXString& MXString::operator=(const char *s)
{
    if(rep != s)        {
        if(strlen(s) > length())    {
            delete[] rep;
            rep = new char[strlen(s) + 1];
            assert(rep);
            }
        strcpy(rep, s);
        }
    return *this;
}
MXString MXString::operator+(const MXString& s) const
{
    char *tmp = new char[length() + s.length() + 1];
    assert(tmp);
    strcpy(tmp, rep);
    strcat(tmp, s.rep);
    MXString retval = tmp;
    delete[] tmp;
    return retval;
}
MXString& MXString::operator+=(const MXString& s)
{
    *this = *this + s;
    return *this;
}
char *MXString::strDup() const
{
    char *tmp = new char[length() + 1];
    assert(tmp);
    strcpy(tmp, rep);
    return tmp;
}
MXString MXString::ncpy(unsigned int start, unsigned int len) const
{
    // length() is unsigned: testing start >= length() avoids the
    // wraparound of length() - 1 when the string is empty
    if(start >= length())       {
        MXString emptyString;
        return emptyString;
        }
    if(len > strlen(&rep[start]))
        len = strlen(&rep[start]);
    char *tmp = new char[len + 1];
    assert(tmp);
    strncpy(tmp, &rep[start], len);
    tmp[len] = '\0';
    MXString retval = tmp;
    delete[] tmp;
    return retval;
}
char& MXString::operator[](unsigned int index)
{
    // return the terminating '\0' if the index value is out of bounds
    // (index is unsigned, so it can never be negative)
    if(index > length())
        return rep[length()];
    return rep[index];
}
int MXString::locate(const MXString& s) const
{
    char *p;
    int off;
    if(MXString::sensitivity()) {
        p = strstr(rep, s.rep);
        off = p ? (int)(p - rep) : MXSTRING_LOCATENOTFOUND;
        }
    else    {
        MXString src = *this;
        src.toUpper();
        MXString tmp(s);
        tmp.toUpper();
        p = strstr(src.rep, tmp.rep);
        off = p ? (int)(p - src.rep) : MXSTRING_LOCATENOTFOUND;
        }
    return off;
}
int MXString::locate(const char c) const
{
    char *p;
    int off;
    if(MXString::sensitivity()) {
        p = strchr(rep, c);
        off = p ? (int)(p - rep) : MXSTRING_LOCATENOTFOUND;
        }
    else    {
        MXString src = *this;
        src.toUpper();
        char tmp = toupper(c);
        p = strchr(src.rep, tmp);
        off = p ? (int)(p - src.rep) : MXSTRING_LOCATENOTFOUND;
        }
    return off;
}
BOOL MXString::operator>(const MXString& s) const
{
    if(MXString::sensitivity())
        return strcmp(rep, s.rep) > 0 ? TRUE : FALSE;
    else
        return stricmp(rep, s.rep) > 0 ? TRUE : FALSE;
}
BOOL MXString::operator>(CSTR s) const
{
    MXString str = s;
    return (*this > str);
}
BOOL MXString::operator>=(const MXString& s) const
{
    // a >= b is the negation of a < b
    return (*this < s) ? FALSE : TRUE;
}
BOOL MXString::operator>=(CSTR s) const
{
    MXString str = s;
    return (*this < str) ? FALSE : TRUE;
}
BOOL MXString::operator<(const MXString& s) const
{
    if(MXString::sensitivity())
        return strcmp(rep, s.rep) < 0 ? TRUE : FALSE;
    else
        return stricmp(rep, s.rep) < 0 ? TRUE : FALSE;
}
BOOL MXString::operator<(CSTR s) const
{
    MXString str = s;
    return (*this < str);
}
BOOL MXString::operator<=(const MXString& s) const
{
    // a <= b is the negation of a > b
    return (*this > s) ? FALSE : TRUE;
}
BOOL MXString::operator<=(CSTR s) const
{
    MXString str = s;
    return (*this > str) ? FALSE : TRUE;
}
BOOL MXString::operator==(const MXString& s) const
{
    if(MXString::sensitivity())
        return strcmp(rep, s.rep) == 0 ? TRUE : FALSE;
    else
        return stricmp(rep, s.rep) == 0 ? TRUE : FALSE;
}
BOOL MXString::operator==(CSTR s) const
{
    MXString str = s;
    return (*this == str);
}
BOOL MXString::operator!=(const MXString& s) const
{
    return (*this == s) ? FALSE : TRUE;
}
BOOL MXString::operator!=(CSTR s) const
{
    return (*this == s) ? FALSE : TRUE;
}
int MXString::cmp(const MXString& s) const
{
    if(MXString::sensitivity())
        return strcmp(rep, s.rep);
    else
        return stricmp(rep, s.rep);
}
void MXString::toUpper()
{
    for(unsigned int i = 0; i < length(); i++)
        rep[i] = toupper(rep[i]);
}
void MXString::toLower()
{
    for(unsigned int i = 0; i < length(); i++)
        rep[i] = tolower(rep[i]);
}
MXString& MXString::insert(unsigned int start, MXString& s)
{
    if(start < (length() + 1))       {
        MXString strStart = ncpy(0, start);
        MXString strEnd = ncpy(start, length() - start);
        *this = strStart + s + strEnd;
        }
    return *this;
}
MXString& MXString::erase(unsigned int start, unsigned int len)
{
    if(start < (length() + 1) && len <= strlen(&rep[start]))       {
        MXString strStart = ncpy(0, start);
        MXString strEnd = ncpy(start + len, length() - (start + len));
        *this = strStart + strEnd;
        }
    return *this;
}
void MXString::fill(unsigned int len, const char c)
{
    if(len > length())    { 
        delete[] rep;
        rep = new char[len + 1];
        assert(rep);
        }
    memset(rep, c, len);
    *(rep + len) = '\0';
}
#include <stdio.h>
#include <stdarg.h>
MXString& MXString::sprintf(const char *fmt, ...)
{
    char *szBuf = new char[nSprintfBufSize];
    assert(szBuf);
    va_list args;
    va_start(args, fmt);
    int val = ::vsprintf(szBuf, fmt, args);
    va_end(args);
    // if val >= nSprintfBufSize then
    // we have written past the end of the buffer
    // memory is probably trashed; an exception should be thrown here
    assert(val < nSprintfBufSize);
    *this = szBuf;
    delete[] szBuf;
    return *this;
}
#include <iostream.h>
ostream& operator<<(ostream& s, MXString& m)
{
    s << (CSTR)m;
    return s;
}

Listing Three

#include <iostream.h>
#include <fstream.h>
#include "mxstring.hpp"
int main(int argc, char *argv[])
{
    MXString str("Hello, world!");
    cout << "instance [str] == " << str << "\n";
    cout << "MXString instances are " << sizeof(MXString) <<
        " bytes in size" << "\n";
    cout << "instance [str] is " << sizeof(MXString) <<
        " bytes in size" << "\n";
    cout << "instance [str] contains string representation of " <<
        str.length() << " bytes in length" << "\n\n";
    MXString strUp = "STRING";
    MXString strLow = "string";
    cout << "strUp == " << strUp << "\t" << "strLow == " << strLow << "\n";
    MXString::sensitivity(FALSE);
    cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
    cout << strUp << " == " << strLow << " : " <<
        (int)(strUp == strLow) << "\n";
    MXString::sensitivity(TRUE);
    cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
    cout << strUp << " == " << strLow << " : " <<
        (int)(strUp == strLow) << "\n";
    MXString::sensitivity(FALSE);
    cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
    cout << strUp << " != " << strLow << " : " <<
        (int)(strUp != strLow) << "\n";
    MXString::sensitivity(TRUE);
    cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
    cout << strUp << " != " << strLow << " : " <<
        (int)(strUp != strLow) << "\n\n";
    strUp = "UP";
    strLow = "low";
    cout << "strUp == " << strUp << "\t" << "strLow == " << strLow << "\n";

    MXString::sensitivity(FALSE);
    cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
    cout << strUp << " < " << strLow << " : " <<
        (int)(strUp < strLow) << "\n";
    MXString::sensitivity(TRUE);
    cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
    cout << strUp << " < " << strLow << " : " <<
        (int)(strUp < strLow) << "\n";
    MXString::sensitivity(FALSE);
    cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
    cout << strUp << " > " << strLow << " : " <<
        (int)(strUp > strLow) << "\n";
    MXString::sensitivity(TRUE);
    cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
    cout << strUp << " > " << strLow << " : " <<
        (int)(strUp > strLow) << "\n\n";
    MXString::sensitivity(FALSE);
    MXString strSrc("This string is for searching inside");
    MXString strLocate("SEARCH");
    char chLocate = 'G';
    int rc = strSrc.locate(strLocate);
    cout << "Using source string : " << strSrc << "\n";
    cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
    cout << strLocate << " subchain search result : " << rc << "\n";
    rc = strSrc.locate(chLocate);
    cout << "Using source string : " << strSrc << "\n";
    cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
    cout << chLocate << " subchain search result : " << rc << "\n\n";

    MXString::sensitivity(TRUE);
    rc = strSrc.locate(strLocate);
    cout << "Using source string : " << strSrc << "\n";
    cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
    cout << strLocate << " subchain search result : " << rc << "\n";
    rc = strSrc.locate(chLocate);
    cout << "Using source string : " << strSrc << "\n";
    cout << "Sensitivity == " << (int)MXString::sensitivity() << "\n";
    cout << chLocate << " subchain search result : " << rc << "\n\n";
    MXString strBegin = "Beginning ";
    MXString strMiddle = "Middle ";
    MXString strEnd = "End";
    cout << "strBegin == " << strBegin << " strMiddle == " <<
                 strMiddle << " strEnd == " << strEnd << "\n";
    MXString strCat;
    strCat = strBegin + strMiddle + strEnd;
    cout << "strCat == " << strCat << "\n";
    cout << "strCat.left(9) == " << (MXString)strCat.left(9) << "\n";
    cout << "strCat.mid(10, 6) == " << (MXString)strCat.mid(10, 6) << "\n";
    cout << "strCat.right(3) == " << (MXString)strCat.right(3) << "\n\n";

    strCat.erase(0, 10);
    cout << "strCat.erase(0, 10) == " << strCat << "\n";
    strCat.insert(0, (MXString)"Start ");
    cout << "strCat.insert(0, \"Start\") == " << strCat << "\n\n";
    MXString strBig;
    if(argc > 1)    {
        ifstream fin(argv[1]);
        if(fin.good())  {
            int nCount = 0;
            char szBuf[1024];
            while(fin.getline(szBuf, sizeof(szBuf)))    {
               strBig.sprintf("Line [%05d] : %s", ++nCount, szBuf);
               cout << strBig << "\n";
               }
            }
        cout << "\n";
        }
#if defined _CONSOLE || sunos
    strBig.fill(1000000, '#');  // this works under WinNT and SunOS
    cout << strBig << "\n\n";
#endif  // _CONSOLE || sunos
    // write over string used at beginning of program
    str = "Goodbye, world!";
    cout << str << endl;
    return 0;
}
End Listings


Copyright © 1995, Dr. Dobb's Journal