C/C++ Contributing Editors


Standard C/C++: The Facet time_get

P. J. Plauger

Ever wish you could just read in a date or time without a lot of coding? Well now you can, possibly even in French.


Introduction

Last month, I showed how you can display various components of a time using a locale facet in the Standard C++ library. Template class time_put converts members of a struct tm object, obtainable from various functions declared in the Standard C header <time.h>, to printable text. (See "Standard C/C++: The Facet time_put," CUJ, June 1998.) The template class is, in fact, a locale facet, a creature that works with iostreams classes (among other things) to help you perform culture-specific input and output. In this sense, time_put is primarily a convenient way to package the functionality long provided by the Standard C function strftime.

The Standard C library has no corresponding provision for reading times and dates, however. You can obtain the current time and date in encoded form, of scalar type time_t, by calling the function time. You can convert between this scalar encoding and the structured encoding, of type struct tm, with various functions. And you can convert encoded times to text form with strftime and other functions. But the Standard C library offers no functions analogous to scanf for reading text and converting it to an encoded time.

This can be mildly annoying. I have often found the need to read in a date or time - either as a command-line argument, from the keyboard in response to a prompt, or from a field in a text file - and convert it to internal encoded form. To be nice to people, you'd like to support widespread custom, such as writing "December 2, 1979" instead of some barbarism such as "791202" (or "19791202" now that the millennium looms ever closer). The former is nicer to people, but much harder to code than the latter. To be even nicer, you'd like to support customs of other lands. If the function strftime can be coaxed into converting dates to French - and it can, at least on some implementations - it makes sense to have a date scanner also convert French dates to internal encoded form. We nevertheless elected not to add such a feature to the Standard C library.

The C++ Standard decided to correct this shortcoming. Its analogs to printf and scanf, for converting between text and numeric encodings, are the locale facets num_put and num_get. It offers a similar pair of template classes for converting between text and monetary encodings, with the locale facets money_put and money_get. (I have described all these facets over the last several months.) You will not be surprised to learn that time_put has its corresponding template class time_get, which I describe this month.

C++ Locale Facets

As I have described more than once in past columns, the Standard C++ library provides for multiple active locales within a program, not just one global locale as in the Standard C library. Each locale is encapsulated within a locale object. Every input or output stream based on the iostreams classes is "imbued with" (contains) a locale object. A stream consults its stored locale object to determine any locale-dependent behavior. (See "Standard C/C++: Introduction to Locales," CUJ, October 1997.)

A locale in the Standard C library is divided up into half a dozen categories. The category LC_TIME, for example, controls the behavior of the function strftime. A locale facet in the Standard C++ library is an object of a class derived from locale::facet, as defined in the header <locale>. Each locale object actually consists of an open-ended set of references to objects of different facet types. A locale category in C is represented by one or more facets in C++.

Template class time_get is one such facet. An object of a class specialized from this template class converts text sequences, by locale-dependent rules, into one or more components of a tm object. Typically, the facet extracts characters from an input stream. Its get functions convert different components of a time or date. Every locale object contains references to objects of type time_get<char> and time_get<wchar_t>. These two objects differ only in the type of the sequence elements they extract.

How can you use time_get to advantage? Listing 1 shows a small example program. It should work with any compiler that includes some version of the Dinkum C++ Library, such as Microsoft Visual C++ V4.2 or later. I built upon the example I showed last month, for printing time components. It demonstrates how you can create a time extractor, by overloading operator>> for input streams. For simplicity in defining the extractor overload, I make use here of the class Time_fmt that I used last month for output. An object of this class stores a tm object and the strftime format string that should be used to convert its components to a text sequence. In this simple example, the extractor makes no use of the format string, however.

The extractor obtains a reference _Fac to the appropriate time_get object associated with the input stream. It then calls _Fac.get to extract a date from the input stream. Thus, the two lines:

Time_fmt date;
cin >> date;

should read a date from the standard input stream, in the form favored by the time_get facet. In this case, the member function date_order() returns the enumeration value mdy to signal that the facet expects dates of the form "December 2, 1979."

The example program repeatedly reads a line of text as a date and prints it out. Running under Visual C++, I get:

December 2, 1979
2/12/79
Dec 02, 79
2/12/79

As you can see, the conversion is tolerant of several popular alternate date formats.

Listing 1 shows a template definition of operator>> for Time_fmt objects that works with any kind of input stream you care to contrive. (But please note that only the facets corresponding to the element types char and wchar_t occur in all locale objects.) As with earlier examples, I began with an existing template extractor from the header <istream> and reworked it as needed. You can write a simpler form if you're less ambitious, but I still recommend that you match an existing extractor closely, to get all the details right.
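For comparison, here is a minimal sketch of such a template extractor written with standard names only. DateField is a hypothetical stand-in for Time_fmt that holds just the tm object. The sketch accepts whatever form get_date favors in the stream's locale, and that form is implementation-defined: the implementation described here wants "December 2, 1979," while GCC's libstdc++, for one, expects the numeric "C"-locale form 12/02/79.

```cpp
#include <cassert>
#include <ctime>
#include <ios>
#include <istream>
#include <iterator>
#include <locale>
#include <sstream>

// Hypothetical stand-in for the column's Time_fmt class: it holds only the
// tm object that the extractor fills in.
struct DateField {
    std::tm tm_val;
    DateField() : tm_val() {}
};

// Date extractor for any character type. The sentry skips leading white
// space (and sets failbit itself if the stream is not good); the stream's
// time_get facet then parses a date in whatever form that facet favors.
template<class E, class Tr>
std::basic_istream<E, Tr>& operator>>(std::basic_istream<E, Tr>& is,
                                      DateField& d)
{
    typedef std::istreambuf_iterator<E, Tr> Iter;
    typedef std::time_get<E, Iter> Facet;

    typename std::basic_istream<E, Tr>::sentry ok(is);
    if (ok) {
        std::ios_base::iostate st = std::ios_base::goodbit;
        std::use_facet<Facet>(is.getloc())
            .get_date(Iter(is), Iter(), is, st, &d.tm_val);
        is.setstate(st);  // report failbit and/or eofbit to the stream
    }
    return is;
}
```

On such a library, reading "12/02/79" from an istringstream imbued with the classic locale should leave tm_mon at 11, tm_mday at 2, and tm_year at 79.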

I remind you again that the macro _USEFAC(loc, fac) is peculiar to my implementation of the Standard C++ library. It paves over the dialect differences between compilers that support different generations of template processing. The macro expands either to use_facet<fac>(loc) or use_facet(loc, (fac *)0), accordingly. Thus, the macro does what's needed to obtain a reference to a locale facet.
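On a compiler with full template support, _USEFAC(loc, fac) reduces to use_facet<fac>(loc). The sketch below wraps that call with has_facet, because use_facet throws bad_cast when the locale holds no facet of the requested type. That matters given the caution above that only the char and wchar_t specializations are guaranteed to be present. The names find_facet and UnusedFacet are hypothetical, chosen for illustration.

```cpp
#include <cassert>
#include <locale>

// Hypothetical facet that no locale contains unless it is added explicitly.
struct UnusedFacet : std::locale::facet {
    static std::locale::id id;
};
std::locale::id UnusedFacet::id;

// Standard spelling of the facet lookup: use_facet<Facet>(loc).
// has_facet is checked first, so a missing facet yields a null pointer
// instead of a std::bad_cast exception.
template<class Facet>
const Facet* find_facet(const std::locale& loc)
{
    return std::has_facet<Facet>(loc) ? &std::use_facet<Facet>(loc) : 0;
}
```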

Template Class time_get

Template class time_get behaves much like template class num_get. (See "Standard C/C++: The Facet num_get," CUJ, February 1998.) It is a facet like many of the others I've described these past several months. (For the basics of locales and facets, see "Standard C/C++: Introduction to Locales," CUJ, October 1997.) I repeat here just the description of how the extractor in Listing 1 makes use of time_get:

Listing 2 shows one way to implement (most of) template class time_get. As usual, I omit most of the implementation-specific magic code. Most of the interesting action, also as usual, occurs in the various virtual functions with names that begin with do_. (You override these virtual member functions, as needed, in a derived class to produce your own flavor of a time_get facet.)

I remind you, as usual, that the macro _WIDEN converts a member of the basic C character set, as type char, to the element type. Typically, this involves little or no work. Similarly, the macro _NARROW converts the other way, yielding a null character if the conversion is not possible.
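Outside this implementation, the portable way to do what _WIDEN and _NARROW do is to ask the locale's ctype facet. In this sketch, widen_char and narrow_char are hypothetical helper names; ctype::widen and ctype::narrow are the standard members they call.

```cpp
#include <cassert>
#include <locale>

// Portable counterpart of _WIDEN: convert a char to the element type E
// using the locale's ctype facet.
template<class E>
E widen_char(char c, const std::locale& loc)
{
    return std::use_facet<std::ctype<E> >(loc).widen(c);
}

// Portable counterpart of _NARROW: returns '\0' when c has no
// single-char equivalent in this locale.
template<class E>
char narrow_char(E c, const std::locale& loc)
{
    return std::use_facet<std::ctype<E> >(loc).narrow(c, '\0');
}
```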

Numeric input fields are converted by calling the private member function _Getint. It converts an integer by the usual rules, then checks to ensure that the result is in range. Text input fields are rather harder to convert. The code must accept either "Dec" or "December," for example. Obviously, it has to favor the longer match, or the shorter match will always queer the scan. But this template class is obliged to work with input iterators, which cannot back up. Moreover, the scan must stop on the first element that fails to match any of the candidate text patterns. It cannot overshoot.
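To make the "no overshoot" constraint concrete for numeric fields, here is a hypothetical get_int helper written in the spirit of _Getint. It is not the code in Listing 2; its name, parameters, and digit limit are assumptions, and it handles only char sequences. Because it consumes only digits, an iterator left at the colon in "12:30" is ready for the next field.

```cpp
#include <cassert>
#include <ios>
#include <string>

// Hypothetical helper in the spirit of _Getint (not the library's code).
// Reads at most max_digits decimal digits from [first, last), consuming only
// digits. If no digit is found or the value lies outside [lo, hi], it sets
// failbit in st and leaves val untouched. Sets eofbit if it reaches last.
template<class InIt>
InIt get_int(InIt first, InIt last, int lo, int hi, int max_digits,
             std::ios_base::iostate& st, int& val)
{
    int n = 0;
    int digits = 0;
    for (; digits < max_digits && first != last; ++first, ++digits) {
        const char c = *first;
        if (c < '0' || '9' < c)
            break;  // stop on the first non-digit without consuming it
        n = n * 10 + (c - '0');
    }
    if (digits == 0 || n < lo || hi < n)
        st |= std::ios_base::failbit;
    else
        val = n;
    if (first == last)
        st |= std::ios_base::eofbit;
    return first;
}
```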

Listing 3 shows the template function _Getloctxt that does this tricky operation. The code is not easy to puzzle out, but it does the job. It is designed to take a sequence of alternative inputs, such as:

":Jan:January:Feb:February:Mar:March"
":Apr:April:May:May:Jun:June"
":Jul:July:Aug:August:Sep:September"
":Oct:October:Nov:November:Dec:December"

and return the index of the first colon-delimited field that matches. If no field matches, the function returns a negative value.
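One way to meet those constraints is sketched below. It is a hypothetical get_loc_txt, not a transcription of Listing 3, and it handles only char sequences. It keeps a flag for every candidate field and consumes an element only while some candidate still accepts it, so the scan never reads past the first element that no field can match.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical longest-match scanner in the spirit of _Getloctxt (not the
// library's code). The first character of fields is the delimiter, as in
// ":Jan:January:Feb:February". On return, first designates the element where
// scanning stopped; the result is the index of the first field matched in
// full at that point, or -1 if there is none.
template<class InIt>
int get_loc_txt(InIt& first, InIt last, const std::string& fields)
{
    if (fields.empty())
        return -1;
    std::vector<std::string> names;
    const char delim = fields[0];
    for (std::string::size_type start = 1; ; ) {
        const std::string::size_type end = fields.find(delim, start);
        if (end == std::string::npos) {
            names.push_back(fields.substr(start));
            break;
        }
        names.push_back(fields.substr(start, end - start));
        start = end + 1;
    }

    std::vector<bool> alive(names.size(), true);
    for (std::size_t pos = 0; ; ++pos, ++first) {
        // Does any surviving field continue with the next element?
        bool prefix = false;
        char c = '\0';
        if (first != last) {
            c = *first;
            for (std::size_t i = 0; i < names.size(); ++i)
                if (alive[i] && pos < names[i].size() && names[i][pos] == c)
                    prefix = true;
        }
        if (!prefix) {
            // Stop without consuming *first; accept a field of exactly pos chars.
            for (std::size_t i = 0; i < names.size(); ++i)
                if (alive[i] && names[i].size() == pos)
                    return static_cast<int>(i);
            return -1;
        }
        // Consume c: drop every field that has ended or disagrees with it.
        for (std::size_t i = 0; i < names.size(); ++i)
            if (alive[i] && !(pos < names[i].size() && names[i][pos] == c))
                alive[i] = false;
    }
}
```

With the month table above, "January 2" yields 1 and "Jan 2" yields 0, each leaving the iterator at the space. The input "Janu" followed by end of input yields -1: four elements have been consumed, and an input iterator cannot give them back to the shorter "Jan."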

Virtual Functions

Here is a brief description of each of the virtual functions in template class time_get. As in the other facets, they do all the interesting work:

time_get::do_date_order

virtual dateorder do_date_order() const;

The virtual protected member function returns a value of type time_base::dateorder, which describes the order in which date components are matched by do_get_date. In this implementation, the value is time_base::mdy, corresponding to dates of the form December 2, 1979.
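Since you customize a facet by deriving from it and overriding its do_ functions, here is a sketch of a hypothetical DmyTimeGet that overrides only do_date_order. It is installed with the two-argument locale constructor, which takes ownership of the facet. Overriding do_date_order changes only what date_order reports; whether get_date follows suit depends on the implementation, so a real day-month-year facet would probably override do_get_date too.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>

// Hypothetical facet: identical to time_get<char> except that it reports
// day-month-year order from date_order().
class DmyTimeGet : public std::time_get<char> {
public:
    explicit DmyTimeGet(std::size_t refs = 0) : std::time_get<char>(refs) {}

protected:
    dateorder do_date_order() const { return dmy; }
};

// Returns a copy of loc whose time_get<char> facet is a DmyTimeGet.
// With refs == 0, the new locale takes ownership of the facet.
std::locale with_dmy_dates(const std::locale& loc)
{
    return std::locale(loc, new DmyTimeGet);
}
```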

time_get::do_get_date

virtual iter_type
do_get_date(iter_type first, iter_type last,
    ios_base& x, ios_base::iostate& st, tm *pt) const;

The virtual protected member function endeavors to match sequential elements beginning at first in the sequence [first, last) until it has recognized a complete, nonempty date input field. If successful, it converts this field to its equivalent value as the components tm::tm_mon, tm::tm_mday, and tm::tm_year, and stores the results in pt->tm_mon, pt->tm_mday, and pt->tm_year, respectively. It returns an iterator designating the first element beyond the date input field. Otherwise, the function sets ios_base::failbit in st. It returns an iterator designating the first element beyond any prefix of a valid date input field. In either case, if the return value equals last, the function sets ios_base::eofbit in st.

In this implementation, the date input field has the form MMM DD, YYYY, where:

time_get::do_get_monthname

virtual iter_type
do_get_monthname(iter_type first, iter_type last,
    ios_base& x, ios_base::iostate& st, tm *pt) const;

The virtual protected member function endeavors to match sequential elements beginning at first in the sequence [first, last) until it has recognized a complete, nonempty month input field. If successful, it converts this field to its equivalent value as the component tm::tm_mon, and stores the result in pt->tm_mon. It returns an iterator designating the first element beyond the month input field. Otherwise, the function sets ios_base::failbit in st. It returns an iterator designating the first element beyond any prefix of a valid month input field. In either case, if the return value equals last, the function sets ios_base::eofbit in st.

The month input field is a sequence that matches the longest of a set of locale-specific sequences, such as: Jan, January, Feb, February, etc. The converted value is the number of months since January.
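In the C++ Standard, the public member that calls this virtual is get_monthname. A small sketch, assuming the classic "C" locale (English month names) and a hypothetical wrapper called parse_month:

```cpp
#include <cassert>
#include <ctime>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Returns the month extracted from text (0 for January), or -1 on failure.
// Assumes the classic "C" locale, so English month names are expected.
int parse_month(const std::string& text)
{
    typedef std::istreambuf_iterator<char> Iter;
    std::istringstream in(text);
    in.imbue(std::locale::classic());

    std::ios_base::iostate st = std::ios_base::goodbit;
    std::tm t = std::tm();
    std::use_facet<std::time_get<char> >(in.getloc())
        .get_monthname(Iter(in), Iter(), in, st, &t);
    return (st & std::ios_base::failbit) != 0 ? -1 : t.tm_mon;
}
```

With a conforming library, parse_month("December") should return 11, and a name that matches no month, such as "Brumaire", should return -1.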

time_get::do_get_time

virtual iter_type
do_get_time(iter_type first, iter_type last,
    ios_base& x, ios_base::iostate& st, tm *pt) const;

The virtual protected member function endeavors to match sequential elements beginning at first in the sequence [first, last) until it has recognized a complete, nonempty time input field. If successful, it converts this field to its equivalent value as the components tm::tm_hour, tm::tm_min, and tm::tm_sec, and stores the results in pt->tm_hour, pt->tm_min, and pt->tm_sec, respectively. It returns an iterator designating the first element beyond the time input field. Otherwise, the function sets ios_base::failbit in st. It returns an iterator designating the first element beyond any prefix of a valid time input field. In either case, if the return value equals last, the function sets ios_base::eofbit in st.
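Calling get_time directly looks much the same. This sketch assumes the classic locale, which in common libraries expects a time of the form 13:45:30; parse_time is a hypothetical wrapper.

```cpp
#include <cassert>
#include <ctime>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Fills the hour, minute, and second of *pt from text such as "13:45:30".
// Returns false if the facet reports failure. Assumes the classic locale.
bool parse_time(const std::string& text, std::tm* pt)
{
    typedef std::istreambuf_iterator<char> Iter;
    std::istringstream in(text);
    in.imbue(std::locale::classic());

    std::ios_base::iostate st = std::ios_base::goodbit;
    std::use_facet<std::time_get<char> >(in.getloc())
        .get_time(Iter(in), Iter(), in, st, pt);
    return (st & std::ios_base::failbit) == 0;
}
```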

In this implementation, the time input field has the form HH:MM:SS, where:

time_get::do_get_weekday

virtual iter_type
do_get_weekday(iter_type first,
  iter_type last, ios_base& x,
  ios_base::iostate& st, tm *pt) const;

The virtual protected member function endeavors to match sequential elements beginning at first in the sequence [first, last) until it has recognized a complete, nonempty weekday input field. If successful, it converts this field to its equivalent value as the component tm::tm_wday, and stores the result in pt->tm_wday. It returns an iterator designating the first element beyond the weekday input field. Otherwise, the function sets ios_base::failbit in st. It returns an iterator designating the first element beyond any prefix of a valid weekday input field. In either case, if the return value equals last, the function sets ios_base::eofbit in st.

The weekday input field is a sequence that matches the longest of a set of locale-specific sequences, such as: Sun, Sunday, Mon, Monday, etc. The converted value is the number of days since Sunday.
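The same calls work for wide characters through time_get<wchar_t>, which every locale also contains. A sketch, again assuming the classic locale and a hypothetical wrapper named parse_weekday:

```cpp
#include <cassert>
#include <ctime>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Wide-character variant: extracts a weekday name with time_get<wchar_t>.
// Returns the number of days since Sunday, or -1 on failure.
int parse_weekday(const std::wstring& text)
{
    typedef std::istreambuf_iterator<wchar_t> Iter;
    std::wistringstream in(text);
    in.imbue(std::locale::classic());

    std::ios_base::iostate st = std::ios_base::goodbit;
    std::tm t = std::tm();
    std::use_facet<std::time_get<wchar_t> >(in.getloc())
        .get_weekday(Iter(in), Iter(), in, st, &t);
    return (st & std::ios_base::failbit) != 0 ? -1 : t.tm_wday;
}
```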

time_get::do_get_year

virtual iter_type
do_get_year(iter_type first,
  iter_type last, ios_base& x,
  ios_base::iostate& st, tm *pt) const;

The virtual protected member function endeavors to match sequential elements beginning at first in the sequence [first, last) until it has recognized a complete, nonempty year input field. If successful, it converts this field to its equivalent value as the component tm::tm_year, and stores the result in pt->tm_year. It returns an iterator designating the first element beyond the year input field. Otherwise, the function sets ios_base::failbit in st. It returns an iterator designating the first element beyond any prefix of a valid year input field. In either case, if the return value equals last, the function sets ios_base::eofbit in st.

The year input field is a sequence of decimal digits whose corresponding numeric value must be in the range [1900, 2036). The stored value is this value minus 1900. In this implementation, a numeric value in the range [0, 136) is also permissible. It is stored unchanged.
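The year rule is easy to state in code. This is a hypothetical sketch of the rule as described for this implementation, not something the C++ Standard requires; store_year is an assumed name.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical sketch of the year rule described above: a value in
// [1900, 2036) is stored minus 1900, a value in [0, 136) is stored
// unchanged, and anything else is rejected without touching t.
bool store_year(int value, std::tm& t)
{
    if (1900 <= value && value < 2036)
        t.tm_year = value - 1900;
    else if (0 <= value && value < 136)
        t.tm_year = value;
    else
        return false;
    return true;
}
```

Because both ranges map onto tm_year values in [0, 136), the inputs 1979 and 79 store the same value, 79.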

Conclusion

I have my usual reservations about this facet, as with several others I have described recently. It performs a complex job that many would consider esoteric. That wouldn't be so bad, except that it takes a heroic effort on the part of implementors to avoid the overheads it causes. Without that effort, every program that uses the iostreams classes drags in all the code for template classes time_get<char> and time_get<wchar_t>. (VC++ avoids loading these classes, I hasten to report.)

On the other hand, I confess that writing the extractor in Listing 1 was remarkably easy. It was a real pleasure to be able to type in dates with very little coding and debugging. I could get used to having time_get around. I wouldn't be surprised to find that I use it more and more in the coming years. You just might too.

P.J. Plauger is Senior Editor of C/C++ Users Journal and President of Dinkumware, Ltd. He is the author of the Standard C++ Library shipped with Microsoft's Visual C++, v5.0. For eight years, he served as convener of the ISO C standards committee, WG14. He remains active on the C++ committee, J16. His latest books are The Draft Standard C++ Library, Programming on Purpose (three volumes), and Standard C (with Jim Brodie), all published by Prentice-Hall. You can reach him at pjp@plauger.com.