It's tiresome enough to parse a command line, but harder still if you have to deal with the varying character representations of Win32.
Introduction
Win32 console applications are ideal for many programming tasks. When writing such an application, a programmer faces the initial hurdle of deciding on command-line syntax and then writing a command-line parser to extract program arguments. In this article, I present some simple C++ classes that wrap main (or wmain) and parse the command line into switches, options, and strings.
These are "lightweight" classes. They are meant to be easy to understand and use, rather than a bulky application framework. A user of these classes starts a new console application by deriving from CConsoleApp. Calls to member functions GetSwitch, GetOption, and GetString retrieve command-line arguments. The user completes the application by "fleshing out" the virtual member functions init, main, deinit, and usage.
This implementation relies on the C++ Standard library and uses the built-in support for string, wstring, list, and iterators. The generic text mappings in tchar.h (a Microsoft-specific header) handle strings and characters. Consequently, my implementation supports console applications using either SBCS (single-byte character sets) or Unicode. The library is built as a static library (conlib.lib) and is targeted for Windows 9x/NT.
The Console Application Framework
The core class for the framework is CConsoleApp. This class is defined in conapp.h (see Figure 1). A CConsoleApp application derives a class from CConsoleApp and creates a single global instance of the derived class. Here is an application skeleton:
    class CTrivialApp : public CConsoleApp {};
    CTrivialApp App;
    inline CConsoleApp* GetConApp() { return &App; }

Instead of implementing a main function and fiddling with argc and argv, the class user overrides the CConsoleApp member functions init, main, deinit, and usage. _tmain, the console app's entry point, calls these functions. _tmain, a friend function of CConsoleApp, is implemented in the library. _tmain maps to main for SBCS apps and to wmain for Unicode apps. Figure 2 shows the implementation of _tmain.
_tmain first retrieves a pointer to the derived class via a call to GetConApp, a global function defined by the class user. This pointer allows _tmain to access member functions of CConsoleApp and its derived class.
The remainder of _tmain is straightforward. ParseCmdline, the base-class member function, parses the command line. Then, the application's init function is called. If init returns false, the application exits with the return value m_ExitCode. If the flag m_ShowUsage is true, the application's usage function is called before exit. If init succeeds and returns true, the application's main function executes. When main completes and returns, the deinit function is called before the application exits with the return code m_ExitCode.
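The control flow just described can be sketched as follows. This is only an illustration, not the code from Figure 2: CConsoleAppSketch, CFailingApp, and RunApp are hypothetical names, the return type of main is an assumption, and the call to ParseCmdline is omitted.

```cpp
#include <iostream>

// Hypothetical stand-in for CConsoleApp. Only the members named in the
// text appear here; signatures and return types are assumptions.
class CConsoleAppSketch {
public:
    CConsoleAppSketch() : m_ShowUsage(false), m_ExitCode(0) {}
    virtual ~CConsoleAppSketch() {}
    virtual bool init()   { return true; }
    virtual void main()   {}
    virtual void deinit() {}
    virtual void usage()  { std::cout << "usage: app [switches] [strings]\n"; }
    bool m_ShowUsage;
    int  m_ExitCode;
};

// Example derived class: init rejects the command line and asks for usage.
class CFailingApp : public CConsoleAppSketch {
public:
    virtual bool init() { m_ShowUsage = true; m_ExitCode = 2; return false; }
};

// Stand-in for the body of _tmain after the command line has been parsed.
int RunApp(CConsoleAppSketch& app) {
    if (!app.init()) {
        // init failed: optionally show usage, then exit with m_ExitCode
        if (app.m_ShowUsage)
            app.usage();
        return app.m_ExitCode;
    }
    app.main();    // the application's real work
    app.deinit();  // cleanup runs only after a successful init
    return app.m_ExitCode;
}
```

With this arrangement, a derived class reports failure simply by setting m_ExitCode (and, if appropriate, m_ShowUsage) and returning false from init.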
CConsoleApp also inherits several member functions from its base class, CModule. The constructor for CModule calls the Win32 API GetModuleFileName to determine the complete pathname of the application. Various parts of the pathname can be accessed using CModule's member functions (see Figure 3).
Parsing the Command Line
Prior to calling the application entry points (init, main, deinit, and usage), ParseCmdline parses the command line (see Figure 4). This member function parses the application command line into a list of CCmdArg tokens (see Figure 5). This list is stored as the class member arglist. arglist will be empty if the application does not have any command-line arguments.
ParseCmdline does not use argc or argv from _tmain; instead, it gets the "raw" command line using the Win32 API, GetCommandLine. This character string is converted to class tstring, where tstring maps to class string for SBCS apps and class wstring for Unicode apps. As far as ParseCmdline is concerned, a token is any substring of non-whitespace characters that is delimited by whitespace characters. A token may also be delimited by a pair of double quotes, in which case it may include embedded whitespace. AddToArglist constructs a CCmdArg for each extracted tstring that is passed to it and then adds it to arglist. The very first token in the command line is excluded from the list, since it corresponds to the executable's pathname.
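The tokenizing rules above can be sketched in portable C++. This is not the ParseCmdline of Figure 4: TokenizeCmdline is a hypothetical name, the input is a plain std::string rather than the tstring built from GetCommandLine, and the sketch assumes a double quote is recognized only at the start of a token.

```cpp
#include <cctype>
#include <list>
#include <string>

// Splits a raw command line into tokens and drops the first one
// (the executable's pathname), following the rules in the text.
std::list<std::string> TokenizeCmdline(const std::string& cmdline) {
    std::list<std::string> tokens;
    std::string::size_type i = 0;
    const std::string::size_type n = cmdline.size();

    while (i < n) {
        // Skip whitespace between tokens.
        while (i < n && std::isspace(static_cast<unsigned char>(cmdline[i])))
            ++i;
        if (i >= n)
            break;

        std::string token;
        if (cmdline[i] == '"') {
            // Quoted token: everything up to the closing quote,
            // embedded whitespace included.
            ++i;
            while (i < n && cmdline[i] != '"')
                token += cmdline[i++];
            if (i < n)
                ++i;  // step over the closing quote
        } else {
            // Plain token: a run of non-whitespace characters.
            while (i < n && !std::isspace(static_cast<unsigned char>(cmdline[i])))
                token += cmdline[i++];
        }
        tokens.push_back(token);
    }

    // The very first token is the executable's pathname; exclude it.
    if (!tokens.empty())
        tokens.pop_front();
    return tokens;
}
```

For example, the command line `app.exe -v "a b" c` yields the three tokens `-v`, `a b`, and `c`.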
Once a token is extracted, it is converted into a CCmdArg object and added to the arglist. Each CCmdArg object is given a type of CmdSwitch, CmdOption, or CmdString. Along with the type, all or a portion of the token string is stored as well. If a token starts with a '-' or '/' character, it is either a CmdSwitch or a CmdOption. A CmdOption is a CmdSwitch with a trailing ':' character followed by an option value. A CmdString is any token that does not start with a switch character ('-' or '/'). Table 1 shows some examples.
AddToArglist, shown in Figure 6, converts a tstring into a CCmdArg object. The constructors for class CCmdArg take a type and either one or two tstrings. If the input string is a CmdSwitch, then the tstring stored with the CCmdArg object does not include the switch character. If the input string is a CmdOption, then two tstrings, the option name tstring (without the switch character) and an option value tstring (without the option delimiter), are stored with the CCmdArg object. For a CmdString, the entire input tstring is stored in the CCmdArg object.
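As a rough illustration of these classification rules (not the AddToArglist of Figure 6), the sketch below turns one token into a small record. CmdArgSketch and ClassifyToken are hypothetical names, std::string stands in for tstring, and the sketch assumes the first ':' after the switch character is the option delimiter.

```cpp
#include <string>

enum CmdArgType { CmdSwitch, CmdOption, CmdString };

// Hypothetical, pared-down stand-in for CCmdArg.
struct CmdArgSketch {
    CmdArgType  type;
    std::string name;   // switch or option name, or the whole string
    std::string value;  // option value; empty for switches and strings
};

CmdArgSketch ClassifyToken(const std::string& token) {
    CmdArgSketch arg;
    if (!token.empty() && (token[0] == '-' || token[0] == '/')) {
        // Drop the switch character, then look for the option delimiter.
        const std::string body = token.substr(1);
        const std::string::size_type colon = body.find(':');
        if (colon == std::string::npos) {
            arg.type = CmdSwitch;
            arg.name = body;
        } else {
            arg.type  = CmdOption;
            arg.name  = body.substr(0, colon);
            arg.value = body.substr(colon + 1);  // delimiter not stored
        }
    } else {
        // No switch character: store the entire token.
        arg.type = CmdString;
        arg.name = token;
    }
    return arg;
}
```

Splitting on the first ':' means an option value may itself contain a colon, as in `-o:c:\out.txt`.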
The CCmdArg objects, which represent the tokens of the command line, are stored in a list, the arglist member of CConsoleApp. Elements are appended to the end of the list using the push_back function, so arglist preserves the order in which the tokens appear on the command line.
Accessing the Tokens
Once the command line has been parsed into a list of CCmdArg objects, it is easy to perform operations on the list. One common operation is to test for the presence of a particular switch in the command line. CConsoleApp supplies the member function TestSwitch for this purpose. TestSwitch takes two arguments, the name of the switch and a Boolean instructing whether or not to remove a matching CCmdArg from arglist. TestSwitch returns true if the named switch is present in the command line.
Figure 7 shows the implementation of TestSwitch. The iterators iter_begin and iter_end are initialized to the start and end of arglist. The generic algorithm find_if is applied to arglist over the range of these iterators. EqualArg, a function object, tests the equality of two CCmdArg objects (see Figure 8). If it returns true, then find_if returns an iterator pointing at the matching CCmdArg in the arglist.
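The find_if-plus-function-object pattern can be sketched as follows. This is not the code of Figures 7 and 8: ArgSketch, EqualArgSketch, and TestSwitchSketch are hypothetical names, and the sketch assumes that equality means an exact, case-sensitive match on type and name.

```cpp
#include <algorithm>
#include <list>
#include <string>

enum CmdArgType { CmdSwitch, CmdOption, CmdString };

// Hypothetical, pared-down stand-in for CCmdArg.
struct ArgSketch {
    CmdArgType  type;
    std::string name;
};

// Function object in the spirit of EqualArg: true when type and name match.
class EqualArgSketch {
public:
    explicit EqualArgSketch(const ArgSketch& wanted) : m_wanted(wanted) {}
    bool operator()(const ArgSketch& arg) const {
        return arg.type == m_wanted.type && arg.name == m_wanted.name;
    }
private:
    ArgSketch m_wanted;
};

// Returns true if the named switch is in arglist; optionally removes it.
bool TestSwitchSketch(std::list<ArgSketch>& arglist,
                      const std::string& name, bool remove) {
    ArgSketch wanted = { CmdSwitch, name };
    std::list<ArgSketch>::iterator hit =
        std::find_if(arglist.begin(), arglist.end(), EqualArgSketch(wanted));
    if (hit == arglist.end())
        return false;          // no matching switch
    if (remove)
        arglist.erase(hit);    // list iterators make removal cheap
    return true;
}
```

Because the predicate carries the value being sought, the same find_if call works for any switch name without a hand-written loop.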
Another CConsoleApp member function that is useful for retrieving command-line switches is GetSwitch. This function takes an additional argument, a reference to a bool. This makes it ideal for initializing a set of variables to reflect the presence of command-line switches. For example, to set three variables corresponding to the switches "noise", "noexec", and "exact", a programmer might use these calls:
    GetSwitch(bNoise,"noise",true);
    GetSwitch(bNoexec,"noexec",true);
    GetSwitch(bExact,"exact",true);

If the switch is not present in the command line, the corresponding variable will be set false. If the switch is present, the variable is set true, and the CCmdArg is removed from arglist (since the third argument is true). For example, after executing the above GetSwitch calls with the command line "-noise -exact", the variables will have the values bNoise=true, bNoexec=false, and bExact=true.
The CConsoleApp member function, GetOption, queries and retrieves command-line options. This function takes four arguments: a reference to a bool that is set true if the requested option is present, a reference to a tstring that is the name of the requested option, a reference to a tstring that will contain the value of the option, and a bool that indicates whether or not to remove the matching CCmdArg from arglist. The following example retrieves the command-line option "-o:outfile.txt":
    GetOption(bOut,"o",oval,false);

On return, bOut is true, oval has the value "outfile.txt", and the matching CCmdArg remains in arglist.
The CConsoleApp member function GetString retrieves command-line strings. Unlike the previous access functions, which use the function object EqualArg to find a matching CCmdArg, GetString uses the function object EqualType (see Figure 9) to find a matching CCmdArg type. Figure 10 shows the source for GetString. Note that when the generic algorithm find_if is applied to arglist, it will return when it encounters the first CCmdArg that is a CmdString. Subsequent command-line strings will be returned only if the strings are removed from arglist as they are enumerated.
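The point about enumeration is easier to see in code. The sketch below is not the GetString of Figure 10: the signature, ArgSketch, EqualTypeSketch, and CollectStrings are all assumptions made for illustration. It shows why each string must be removed as it is read if the next one is to be found.

```cpp
#include <algorithm>
#include <list>
#include <string>
#include <vector>

enum CmdArgType { CmdSwitch, CmdOption, CmdString };

// Hypothetical, pared-down stand-in for CCmdArg.
struct ArgSketch {
    CmdArgType  type;
    std::string text;
};

// Function object in the spirit of EqualType: matches on type alone.
class EqualTypeSketch {
public:
    explicit EqualTypeSketch(CmdArgType type) : m_type(type) {}
    bool operator()(const ArgSketch& arg) const { return arg.type == m_type; }
private:
    CmdArgType m_type;
};

// Fetches the first CmdString in arglist; optionally removes it.
bool GetStringSketch(std::list<ArgSketch>& arglist,
                     std::string& value, bool remove) {
    std::list<ArgSketch>::iterator hit =
        std::find_if(arglist.begin(), arglist.end(), EqualTypeSketch(CmdString));
    if (hit == arglist.end())
        return false;
    value = hit->text;
    if (remove)
        arglist.erase(hit);
    return true;
}

// Enumerates every command-line string by removing each one as it is read.
std::vector<std::string> CollectStrings(std::list<ArgSketch>& arglist) {
    std::vector<std::string> found;
    std::string value;
    while (GetStringSketch(arglist, value, true))
        found.push_back(value);
    return found;
}
```

Calling GetStringSketch repeatedly with remove set to false would return the same first string every time, since find_if always restarts from the beginning of the list.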
Finally, CConsoleApp provides the utility function DumpArglist for dumping all CCmdArg tokens to standard output.
Dealing with SBCS and Unicode
The Microsoft compiler libraries use character-set mappings to allow applications to be recompiled for MBCS (multi-byte character sets), SBCS, or Unicode. To take advantage of this encoding portability, applications must include the header file tchar.h and use encoding-portable types (TCHAR, LPTSTR, etc.) and macros (_T("...")). C applications must also use encoding-portable string functions, such as _tcscpy instead of strcpy or wcscpy and _tmain instead of main or wmain.
This character-set mapping has been extended to the string classes of the C++ Standard library by including the following macros in conapp.h:
    #ifdef _MBCS
    #undef _MBCS   // enforce SBCS
    #endif
    #include <tchar.h>
    #include <string>
    #include <list>
    using namespace std;
    #ifdef _UNICODE
    #ifndef UNICODE
    #define UNICODE 1
    #endif
    #define tcout wcout
    #define tcerr wcerr
    #define tstring wstring
    #else
    #undef UNICODE
    #define tcout cout
    #define tcerr cerr
    #define tstring string
    #endif

Note that _MBCS is explicitly undefined. This is necessary because the C++ Standard library does not support MBCS. This leaves us with either SBCS, the default, or Unicode. To get Unicode support, the manifest constants _UNICODE and UNICODE must both be defined. The first constant enables Unicode types in tchar.h; the second constant selects the Win32 APIs that have a 'W' suffix (as opposed to the 'A' suffix for SBCS applications). I've also added appropriate mappings for tcout, tcerr, and tstring.
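To illustrate what the tstring mapping buys, here is a small sketch of a function written once and compiled for either character set. It deliberately avoids tchar.h so that it builds with any compiler: SKETCH_T is a hypothetical stand-in for _T, MakeOptionToken is a made-up helper, and tstring is a typedef here rather than the macro used in conapp.h.

```cpp
#include <string>

// Portable stand-ins so the sketch builds without tchar.h.
// SKETCH_T plays the role of _T; both names are assumptions.
#ifdef _UNICODE
typedef std::wstring tstring;
#define SKETCH_T(x) L##x   // wide literal for Unicode builds
#else
typedef std::string tstring;
#define SKETCH_T(x) x      // narrow literal for SBCS builds
#endif

// Written once against tstring; the same source compiles either way.
tstring MakeOptionToken(const tstring& name, const tstring& value) {
    return tstring(SKETCH_T("-")) + name + SKETCH_T(":") + value;
}
```

Defining _UNICODE before compiling switches every literal and string object in this code to wide characters without touching the function body.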
When a Unicode-compiled version of a CConsoleApp-derived application is executed on a non-Unicode platform (Windows 9x), an error message displays, and the application terminates. This check is made in the constructor for CModule when the Win32 API GetModuleFileName is called. Because the UNICODE manifest constant is defined when the app is built, GetModuleFileNameW is called, which generates the error ERROR_CALL_NOT_IMPLEMENTED. A string resource loads the error message using an explicit LoadStringA call. Incidentally, this string resource is the reason conlib.rc and conres.h need to be included in an application project.
Building a Test Application
My library includes an example application called Testapp, which is available from the CUJ ftp site (see p. 3 for downloading instructions). Visual C++ 5.0 and 6.0 project files are provided for Testapp. Once you add conlib.lib to your library search path and conapp.h, conres.h, and conlib.rc to your include file search path, these projects are ready to be built. I've also supplied project files for building several variants of conlib.lib to work with different run-time libraries. For details, see usage.doc, which accompanies the source code.
For more extensive examples using conlib.lib, see source code for the utility programs preMake and preBuild at my website (http://www.sourcequest.com). These utilities act as command-line wrappers to the Microsoft Nmake and Build tools.
Stan Mitchell founded SourceQuest, Inc., which provides Windows system programming services to clients in the Silicon Valley and beyond. He is author of Inside the Windows 95 File System, published by O'Reilly & Assoc. Stan can be reached at stanm@sourcequest.com and at http://www.sourcequest.com.