May 2001 C++ Experts Forum/The (B)Leading Edge

The (B)Leading Edge: Using IOStreams — locales and facets

Jack W. Reeves

In this installment of "The (B)Leading Edge," I intend to delve into locales and facets — as one friend of mine says: another obscure and esoteric C++ topic that nobody is interested in. Before I attempt to address that opinion, I want to go back and clean up a few points about my previous columns.

String_stream Revisited

Two columns back, I introduced a String_stream template class to demonstrate what's involved in deriving new IOStream classes [1]. What I showed in the examples of that column was correct, I believe, but I was a little lazy in my testing. As often seems to happen, shortly after I wrote the column, I had occasion to actually use the String_stream template on a real project. One of my original intents was that it be possible to instantiate an iString_stream with a const string object, or some similar read-only string type. When I tried this with the code I presented in the column, it would not compile.

The problem (in case you haven't run into this just yet) is that the output functions of the underlying String_streambuf (in the final version presented, this was the sync function) would generate compiler errors because they attempted to apply operations on the string type that were not supported. Originally, I assumed that because String_stream was a template, such functions would not be instantiated when I tried to create an iString_stream object. (Obviously, if I tried to create an oString_stream or a bi-directional String_stream with a read-only string, I would expect such errors.) I wasn't thinking clearly. The sync function is a virtual function. On most implementations, the compiler always has to instantiate such functions because it needs their addresses to put in the vtable. Whether this can be considered strictly compliant behavior is rather beside the point. The real point is that my String_streambuf class could not be instantiated with a read-only string type. Fortunately, the fix was trivial.

Most examples of derived IOStreams classes use a bi-directional derived streambuf class, but then create three different stream classes. There is no reason that an iString_stream class has to use a bi-directional String_streambuf class, however. I promptly created an iString_streambuf class that supports reading from a string, but no output to the string. I then changed the iString_stream class to use this new type of derived streambuf. The result is shown in Listing 1. As I said, it is trivial: the only functions that the derived iString_streambuf provides are a constructor and a destructor. The constructor sets up the pointers for the get area in the base streambuf class, and that is all that is required. There is no underflow function, no pbackfail function, and no put area to be maintained. The default base class implementation is acceptable for all functionality. That's what I call "easy." One note of caution has to be raised: if you specialize an iString_stream for a const string, you can get something like the following to compile:
iString_stream<const std::string> istrm("pi = 3.14159");
but this is definitely not a good idea. Because the constructor now has a const string& as its argument, the compiler will create a temporary string object from the string literal. Naturally, this object has gone out of scope and been destroyed at the end of the statement, leaving the istrm object to read data from a string object that no longer exists. In practice this doesn't seem to happen to me very often, but it is definitely something to be aware of. Once again, the rationale for using a String_stream object was to wrap an IOStream interface around an existing string without having to copy the data. The String_stream does no memory management; it depends on the supplied string object. For this reason, it has to be supplied with an object whose lifetime exceeds that of the String_stream object. If this is ever an issue, the Standard library stringstream class is a much better choice.

Another Look at Temperature

A couple of readers pointed out some problems with my Temperature class in the last column. First there was a minor error in the operator> function. It should have read:
inline bool operator> (Temperature lhs, Temperature rhs) { return rhs < lhs; }
A more serious problem was pointed out by reader Philip Hibbs. He observed that Temperature's equality comparison operators were directly comparing raw floating-point values. He correctly noted that such comparisons should always be done with reference to a tolerance. Under the circumstances, it seems pretty obvious that I had not had any reason to use Temperature's comparison operators. This seems like as good a time as any to note that good class design is a non-trivial process, even when it seems like the class being designed is trivial.

On an Ada project, I created a Temperature class that represented the data the same way it arrived from the sensor (as a 12-bit signed binary number, or something like that). This seemed reasonable for that project because the external sensor interface was specified by an ICD (Interface Control Document) that was part of a subcontractor relationship that I figured was reasonably immune to capricious changes. On the other hand, I have worked on projects where interface boards were being designed in house, and the only thing in common between two versions of such a board was that they both plugged into the same backplane. In that case, I was inclined to create a more general-purpose abstraction.

Once you start trying to create general purpose, reusable classes, you often find that you have to confront issues that might not be relevant for a particular project. This is often used as an argument against creating and using general-purpose software, especially for such simple classes as Temperature. I disagree. Every issue that the class designer confronts (and correctly solves, of course) is one less issue that a user has to worry about. Maybe a floating-point number is not the best representation for an "ordinary" temperature value. Or maybe I will just correct the comparison operators. While such changes might trigger recompilations, no code that uses Temperature will have to change; that code will simply get better automatically. And that is the whole point.

Exploring facets

In the last column, I showed how to use some of the special features of the IOStreams library to create an output operator for Temperature that allowed the user to specify some class-specific formatting information [2] — specifically the scale to use. In that column, I set things up so that the client had to specify a scale; otherwise the output operation would fail. I noted at the time that such behavior might not be considered acceptable. Now, I am going to tackle the problem of providing a default output format for a Temperature value while keeping with the spirit of the current Temperature class implementation. In other words, I want to be able to write
Temperature t;
// ... calculate t
ostrm << t << std::endl;
and get some reasonable output. Of course, what is reasonable depends upon the application context. In the previous column, I required the output be written as
ostrm << Temperature::asC << t << std::endl;
but this just changes where in the code the default gets hard coded. If this is at the top level of the application, that is fine, but what if this statement is part of some other class's output operator? If Temperature is a widely used class, we are likely to get temperature values output in different scales depending upon what one user or another thinks is appropriate. What we want is to provide one place where we can set the default. One obvious way to do this is via a static member of the Temperature class itself. A more flexible choice is to create a Temperature-specific facet that we can add to a locale. The latter approach has several advantages, which we will consider at the end.

My friend's observation about locales (and facets) being an obscure and esoteric feature of the Standard C++ library has a lot of truth in it. I am sure most programmers consider locales to be of interest only to programmers who are immediately involved in writing human interface code in applications that are intended for international markets. In fact, most of those programmers are more likely to be concerned with the internationalization features of MFC or X-Windows than with locales in C++. Like most things in C++, the vast majority of the time you do not need to know or care anything at all about locales. There is however a big difference between what you need to know and what you should know. In particular, I think everyone should know that statements such as:
ostrm << 123.4;
or
istrm >> x;
where x is a numeric or Boolean value, use a facet to do the actual output or input of the value. I also think it is reasonable to know that locales are an extensible framework that allows user-defined facilities to control just about any aspect of output or input that it might be reasonable to change for different locations or different cultures. If you know at least this much, then when a situation presents itself where it is reasonable to take advantage of the capabilities that locales provide, you at least will know enough to go looking for more information (see [3]). In our business, ignorance is not bliss; it just results in more work and poorer quality software.

Obviously, if you are really going to learn about locales and facets, you would probably start by first studying locales, then the standard facets, and finally explore the possibility of defining custom facets. Since I don't have time or space to do a full tutorial on locales, I am going to approach things in reverse. First, I am going to show you my Temperature facet class. Then I will explain the necessary details as I go along. My new header file for Temperature is shown in Listing 2. I made a few tweaks, but with the exception of the nested facet class, it is basically the same as the version I presented last time. (I moved the implementation of operator== out of the header and changed the name of the manipulators for displaying the scale.)

I decided to make my Temperature facet a nested class within Temperature. I named it temp_put. The Standard library uses the naming conventions:

xxx_put — facets that output something (usually via a put function)

xxx_get — facets that input something (usually via a get function)

xxxpunct — facets that provide formatting and punctuation information about something

My Temperature facet is publicly derived from the Standard base class locale::facet (as opposed to being derived from some other, existing facet class).

My new Temperature facet is intended to introduce a new facet family; in other words, it is not intended to be a member of one of the existing facet hierarchies. One of the requirements of a facet family is that the base class of the family must provide a static member object of type locale::id. A locale, which is fundamentally just a collection of facets, will use this id to look up a Temperature::temp_put facet in a locale when requested.

In addition to the static id member, my Temperature facet contains a data member (_fmtr) that is a pointer to an output helper function. This function is precisely the same output helper function that I would put in the ostream pword array, which I described in my last column [2]. The Temperature::temp_put constructor takes an argument that initializes this member. Finally, temp_put provides a member function put, which takes the same arguments as the helper function.

Because the temp_put facet just forwards its operation to the helper function, I have shown its entire implementation inline in the header. A few things need extra explanation. One of these is the magic number "1" that is passed to the constructor of the base class locale::facet. All of the Standard facets are designed to be created on the heap and be managed by a locale object. Constructing the locale::facet base class with an argument of "0" (the default) basically says that the locale is responsible for deleting the facet when it is no longer required. Constructing locale::facet with an argument of "1" disables this behavior and leaves memory management up to the client. Since I intend to create only static temp_put objects, this is necessary.

Another thing to note is that the put function is marked as const. Facets are intended to be immutable after they are created. All references to a facet are const references, so all functions have to be const functions. This is also the reason that the copy constructor and assignment operator are private.

After the declaration of Temperature::temp_put, I declare three public static members, temp_put_asK, temp_put_asC, and temp_put_asF. These are initialized in the implementation file (Listing 3) with their corresponding helper functions.

The only thing left in Temperature is to look at the changes to the operator<< function in Listing 3 necessary to support the custom locale. If a format is specified in the stream, then that is used instead of any locale. If no format is specified, then the function retrieves the locale for the stream and checks to see if there is a Temperature::temp_put facet in the locale (call to std::has_facet). If not, we have no default, so we indicate an error and quit. If the facet is available, we retrieve a reference to it by calling the std::use_facet function. Via this reference, we invoke the put function of the facet, which just invokes its helper function.

That's all there is to Temperature. In order to make use of this capability, we have to add the necessary facet to a locale that will become part of the ostream that Temperature's operator<< uses. The simplest thing to do is add the facet to the global locale. This is the locale used by default when a stream is constructed. Because of the way locales work, we cannot just add a facet to an existing locale. Locales, like facets, are treated as immutable objects; they cannot be changed after they are created. Instead we have to construct a new locale that is a copy of an existing locale plus the additional facet. Naturally, there is a locale constructor that does this. Then we make the new locale the global locale.

The code looks like this:
std::locale loc = std::locale(std::locale(),
    &Util::Temperature::temp_put_asF);
std::locale::global(loc);
The first line constructs a locale that is a copy of the existing global locale (std::locale()). This locale is then used to construct a new locale that additionally takes a new facet (passed as a pointer). This facet will either replace an existing facet or be added, depending upon its id. The next line replaces the existing global locale with a copy of the one just created. In this example, the facet will cause Temperature values to be displayed in Fahrenheit.

Once the global locale has been updated, any new stream objects that are created will contain a locale with a Temperature::temp_put facet by default. This will not affect any existing streams however, so you might want to also do the following:
std::cout.imbue(loc);
std::cerr.imbue(loc);
This will install the new locale in the existing iostream objects, replacing the locale each one held before. If you are using wide streams, you might want to do the same thing to them. Naturally, this code belongs at the very top level of an application.

At this point, I am pretty sure that readers who actually know something about locales and facets are probably choking in disgust. I have managed to avoid almost every convention that the Standard library uses in creating my Temperature facet. Just to make sure they know I did it on purpose, I will now outline a number of possible improvements to Temperature's I/O facility that I will leave as exercises for the reader (or for when I get around to them).

(1) The convention followed in the Standard library is for facet base classes to provide a standard interface with public functions (such as put), which invoke protected virtual functions to do the real work. These protected virtual functions are usually named do_xxx (e.g., do_put). I didn't follow this convention in Temperature's facet, primarily because I was using a direct dispatching scheme instead of inheritance. It is certainly possible to create three subclasses of Temperature::temp_put instead of three static objects.

(2) The standard facets that do input and output all work on iterators instead of directly on a stream. Furthermore, these facets are templates that take the type of iterator as a template argument. This is obviously more flexible than what I have shown here. I decided to keep things simple for this example, but you should consider what would be required to change the facet's put function to take an output iterator instead of an ostream as its target. Besides, there is reason (3) below.

(3) There is obviously a lot of redundancy in the helper functions. It would make more sense to replace the helper functions with a class that provided functions to return the floating-point value to be output and the character representing the scale (to be used if needed). This sounds precisely like the type of functionality that a punctuation-style facet could provide. Minor changes to the Temperature class could allow a new Temperature::punct facet to serve both as a facet in a locale and as the target for the manipulators that determine the scale to use to display the Temperature. I suggest that you consider what changes this would require in Temperature. My version is shown in Listings 4 and 5.

Summary

I noted above that using facets and locales provided a more general purpose solution than having a static default value in Temperature. The primary reason is flexibility. The static class Temperature default is pretty much all or nothing. Using a custom facet, however, makes it possible for the client to add it to the global locale, add it to a locale used just for one stream, add it to an application-specific locale, or even to have different Temperature defaults added to different locales, which might be selected on the fly.

Naturally, the flip side of flexibility is added complexity — the client potentially has to know about the facet, has to create a new locale to contain the facet, has to make that locale the global locale, etc. Fortunately, a lot of this can be hidden. One possibility is for Temperature to provide a static initialization function that would do all of this for the client. Such a function could even be invoked automatically at startup. This would provide the appearance of a global default output scale for Temperature, while still allowing the client the flexibility to change things. This little example just begins to show what might be necessary to create a truly reusable library component that makes use of a custom facet.

As I noted in the beginning, designing reusable, general purpose classes is not trivial. A well-designed class tends to be trivial to use, however. Hopefully, I have shown that adding custom facets to a class design is not that difficult or esoteric. Likewise, I also hope that perhaps this little introduction has given you some hint of how facets and locales can be useful. In the ultimate case, a site might want to add custom facets for a class like Temperature to existing locales so that they would automatically be there just like the C++ Standard facets.

References

[1] Jack Reeves. "The (B)Leading Edge: Using IOStreams — part I," The C/C++ User's Journal Experts Forum, January 2001, http://www.cuj.com/experts/1901/reeves.htm.

[2] Jack Reeves. "The (B)Leading Edge: Using IOStreams — part II," The C/C++ User's Journal Experts Forum, March 2001, http://www.cuj.com/experts/1903/reeves.htm.

[3] Angelika Langer and Klaus Kreft. Standard C++ Iostreams and Locales: Advanced Programmer's Guide and Reference (Addison-Wesley, 2000).

Jack W. Reeves is an engineer and consultant specializing in object-oriented software design and implementation. His background includes Space Shuttle simulators, military CCCI systems, medical imaging systems, financial data systems, and numerous middleware and low-level libraries. He currently is living and working in Europe and can be contacted via jack_reeves@bleading-edge.com.