Most C/C++ programmers, at one time or another, have written code to perform elementary command-line parsing. This code, when dropped into a program, enables the program to interpret the command line with which it was invoked. This article is not about a command line processor, but rather, its direct opposite: a command line builder. To understand why you might need such a thing, consider the standard command-line interface provided to programs by C and C++:
int main(int argc, char **argv);
If your program calls another program (say, for test purposes) or spawns a child process, it will most likely have to simulate invoking that program or child process from a command line. Internal functions may also require a simulated command line if they use the same argv-style interface. Since building a command line is a fairly involved process, it makes sense to write a routine that automates it. This article presents a C++ class that greatly simplifies this task.
What the OS Does
When the operating system invokes a C/C++ program, it passes the command line to main as an argument list called argv. The host system and the compiler's startup code handle the construction of this structure.
For example, assume that a program called test resides on a DOS machine in the directory C:\EXE. You could execute this program by typing:
test p1 p2 p3 p4
test's entry point looks like this:
int main(int argc, char **argv, char **env) { }
On my machine, these parameters contain the following information:
argc = 5;
argv[0] = "C:\EXE\TEST.EXE";
argv[1] = "p1";
argv[2] = "p2";
argv[3] = "p3";
argv[4] = "p4";
argv[5] = 0;
env[0] = "C:\COMMAND.COM";
env[1] = "PROMPT=$p$g";
env[2] = "PATH=C:\DOS;C:\EXE";
env[3] = 0;
argv is an array of pointers to the parameters, terminated by a null pointer; env is a similar null-terminated array of environment strings. (The third parameter, env, is a common extension supported by DOS and UNIX compilers rather than part of standard C or C++.) The programmer does not fill in argc or construct argv and env; the compiler and host operating system take care of these chores.
Mimicking the OS
Let's assume that we want to call the program test from inside another program (for example, an automatic test program). The automated test program reads in the following script file:
// an actual file read into the automatic tester
test p1 p2 p3 p4
test1 a b c d
test2 xyz
The automated test program reads these lines, one at a time, into a string called command_buffer. (How the file is parsed is not of concern at this point.)
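Although the parsing details do not matter here, a minimal sketch may make the script format concrete. It assumes the script is plain text with one command per line, and it skips blank lines and lines beginning with "//" (treating the first line of the example as a comment is an assumption). The functions read_script and read_script_text are hypothetical names.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects one command string per script line,
// skipping blank lines and lines that start with "//".
std::vector<std::string> read_script(std::istream &in)
{
    std::vector<std::string> commands;
    std::string line;
    while (std::getline(in, line)) {
        std::size_t first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos) continue;          // blank line
        if (line.compare(first, 2, "//") == 0) continue;    // comment line
        commands.push_back(line.substr(first));             // one command_buffer's worth
    }
    return commands;
}

// Convenience overload for a script hard-coded in the test program.
std::vector<std::string> read_script_text(const std::string &text)
{
    std::istringstream in(text);
    return read_script(in);
}
```

For a script on disk, the same function can be handed a std::ifstream opened on the script file.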
The automated test program must convert each string into the elements necessary to build a command line and then call the specified program. To facilitate this conversion, I've created a special command-line class.
The Command Line Class
The class that converts a character string into an argv list is called command_line. This class is presented in the file cline.hpp (Listing 1). command_line has three data members:
char *command_buffer;
char **Cargv;
int Cargc;
command_buffer has already been introduced; it holds the character string representation of the command line. In the example above, the representation originated in a script file, but it could also be hard-coded in the test program. Cargv represents the internal version of an argv list. The integer Cargc indicates the number of tokens found in the character string, analogous to the argc parameter in main. Note that command_line declares a class called child_process as a friend. I explain why later, in the discussion of using the command_line class.
Special-Purpose Member Functions
Class command_line's member functions are implemented in file cline.cpp (Listing 2). Before I describe the constructors, destructor, and assignment operators, I want to prepare the way by explaining the behavior of two other member functions. The first of these is called strtokens:
void command_line::strtokens(const char *command_string)
This function counts how many tokens are in the command string. Given the command string:
test p1 p2 p3 p4
strtokens would determine that it contains five tokens. The implementation of strtokens is not shown here (full source is available on the code disk and online; see page 3 for details). In a nutshell, though, strtokens counts tokens by finding the white-space boundaries between them. It places the token count in the data member Cargc, which is available to the other member functions in the class.
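Since strtokens itself is not reproduced here, the following is only a sketch of the white-space counting it is described as doing. The free function count_tokens is a hypothetical stand-in; it treats any run of white space as a single boundary and does not handle quoted arguments.

```cpp
#include <cctype>

// Hypothetical stand-in for command_line::strtokens: counts tokens
// separated by runs of white space (quoted arguments are not handled).
int count_tokens(const char *s)
{
    int count = 0;
    bool in_token = false;
    for (; *s != '\0'; ++s) {
        if (std::isspace(static_cast<unsigned char>(*s))) {
            in_token = false;        // white space ends the current token
        } else if (!in_token) {
            in_token = true;         // first character of a new token
            ++count;
        }
    }
    return count;
}
```

Applied to "test p1 p2 p3 p4", this yields 5, the value strtokens stores in Cargc.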
The second member function I want to explain as preparation is build_command:
void build_command(const char *command)
This function takes the string command, breaks it up into an argv-like structure (char **), and stores the result in Cargv, where it can be used just as if it were a system-defined argv list.
build_command relies upon two main data structures:
char *command_buffer;
char **Cargv;
The variable Cargv will be used later in the same manner as the standard argv. The character string command_buffer temporarily holds a single token after build_command extracts it from the input string. build_command subsequently copies this token to the appropriate place in the Cargv structure.
In the token extraction process, build_command maintains two counters:
int index;
int pos;
index is the current location being filled within the Cargv structure. For example, when index is zero, build_command is extracting the first token in the input string (the actual command name) and preparing to store it in Cargv[0]. pos is just a character index into the temporary string buffer, which build_command uses as it copies tokens.
Before build_command begins extracting tokens, it allocates memory for command_line::command_buffer, deleting any old memory first. build_command then calls strtokens to determine the number of tokens in the string.
Note that the error handling here is very primitive: the program exits as soon as a memory allocation error is detected. In specific applications, this error-handling technique may not be sufficient.
build_command must also allocate space for the structure pointed to by command_line::Cargv. build_command uses the value in Cargc to determine how much space to get. Since Cargv points to an array of pointers, build_command must allocate enough space to hold the entire array:
Cargv = new char *[Cargc + 1];
It actually allocates one extra pointer so that the array can be terminated by a null pointer. Note that at this point only the space for the Cargv pointers is allocated; the space for the individual parameters has not yet been obtained.
If all is well, build_command then begins picking off tokens and stuffing them into the Cargv structure. For each token encountered, build_command allocates sufficient memory to hold the token, sets one of Cargv's pointers to that memory, and copies the token over. Again, if a memory allocation fails at any point, build_command kills the program with an abrupt message.
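The allocate-then-copy sequence can be sketched as a pair of free functions. This is not build_command from Listing 2: build_argv and free_argv are hypothetical names, and the sketch counts tokens in a first pass much as strtokens does. In standard C++, new throws std::bad_alloc on failure rather than returning a null pointer, so the sketch does not test the result of each allocation.

```cpp
#include <cctype>
#include <cstring>

static bool is_space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }

// Hypothetical free-function version of build_command: splits `command`
// on white space and returns a NULL-terminated argv-style array.
// The caller owns the result and must release it with free_argv().
char **build_argv(const char *command, int *argc_out)
{
    int count = 0;                               // pass 1: count tokens
    for (const char *p = command; *p; ) {
        while (*p && is_space(*p)) ++p;
        if (!*p) break;
        ++count;
        while (*p && !is_space(*p)) ++p;
    }

    char **argv = new char *[count + 1];         // one extra slot for the terminator

    int index = 0;                               // pass 2: copy each token
    for (const char *p = command; *p; ) {
        while (*p && is_space(*p)) ++p;
        if (!*p) break;
        const char *start = p;
        while (*p && !is_space(*p)) ++p;
        std::size_t len = static_cast<std::size_t>(p - start);
        argv[index] = new char[len + 1];         // memory for this token only
        std::memcpy(argv[index], start, len);
        argv[index][len] = '\0';
        ++index;
    }
    argv[count] = nullptr;                       // terminate like a system argv
    if (argc_out) *argc_out = count;
    return argv;
}

// Releases every token and then the pointer array itself.
void free_argv(char **argv)
{
    for (char **p = argv; *p; ++p) delete[] *p;
    delete[] argv;
}
```

A caller can pass the returned array to any function expecting argv and release it afterwards with free_argv.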
This process continues until the null character at the end of the input string is encountered. At this point, a null pointer is placed in the final Cargv slot:
Cargv[Cargc] = NULL;
Construction, Destruction, and Assignment
The command_line class has three constructors. All three have one thing in common: they acquire memory for the string command_buffer and for the command-line structure itself. This is important because build_command assumes that memory has already been allocated to these structures; it deletes whatever is there before it acquires new memory to construct the new command line. This slightly convoluted logic would not be necessary if build_command were only called once; however, as I show later, it is possible to assign a new command line to an already existing command_line object.
The first of the constructors is the default constructor. This constructor puts an ASCII NUL in command_buffer and a null pointer in Cargv[0]. The initialization constructor takes a character string as input and calls build_command with this string as a parameter. The copy constructor takes a command_line object as input and uses its command_buffer as the parameter to build_command. In all cases, Cargc is initialized to 0 and then set to its proper value when build_command calls strtokens. Use of these constructors is illustrated later in this article.
The destructor simply deletes all the memory acquired by the command_line object. This includes the memory for command_line::Cargv (both the pointer array and the individual tokens) and for command_buffer.
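To show how the constructors, the destructor, and build_command fit together, here is a reduced sketch that reuses the article's member names. The bodies are assumptions, not the code in Listing 2, and the accessors argc() and argv() are hypothetical additions. The two assignment operators are included as well, because a class that owns raw memory needs its copy constructor, assignment operator, and destructor to agree. One design detail: build_command copies its input before deleting the old buffers, so assigning a string that points into the object's own storage stays safe.

```cpp
#include <cctype>
#include <cstring>

// Reduced sketch of command_line; member names follow the article,
// but the bodies are assumptions, not the code in Listing 2.
class command_line {
public:
    command_line() { build_command(""); }                  // empty command line
    command_line(const char *s) { build_command(s); }      // initialization constructor
    command_line(const command_line &other) { build_command(other.command_buffer); }
    ~command_line() { release(); }

    command_line &operator=(const char *s) { build_command(s); return *this; }
    command_line &operator=(const command_line &other)
    {
        if (this != &other) build_command(other.command_buffer);
        return *this;
    }

    int argc() const { return Cargc; }                     // hypothetical accessors
    char **argv() const { return Cargv; }

private:
    char *command_buffer = nullptr;   // copy of the whole command string
    char **Cargv = nullptr;           // NULL-terminated token array
    int Cargc = 0;                    // number of tokens

    void release()
    {
        if (Cargv != nullptr) {
            for (int i = 0; i < Cargc; ++i) delete[] Cargv[i];
            delete[] Cargv;
        }
        delete[] command_buffer;
        command_buffer = nullptr;
        Cargv = nullptr;
        Cargc = 0;
    }

    static bool is_space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }

    void build_command(const char *s)
    {
        // Copy first: s may point into this object's own buffers.
        std::size_t n = std::strlen(s);
        char *copy = new char[n + 1];
        std::memcpy(copy, s, n + 1);
        release();                    // delete the old command line
        command_buffer = copy;

        int count = 0;                // count tokens (the job of strtokens)
        bool in_token = false;
        for (const char *p = copy; *p != '\0'; ++p) {
            if (!is_space(*p) && !in_token) ++count;
            in_token = !is_space(*p);
        }

        Cargv = new char *[count + 1];
        const char *p = copy;
        for (int index = 0; index < count; ++index) {
            while (is_space(*p)) ++p;                 // skip to the next token
            const char *start = p;
            while (*p != '\0' && !is_space(*p)) ++p;  // find its end
            std::size_t len = static_cast<std::size_t>(p - start);
            Cargv[index] = new char[len + 1];
            std::memcpy(Cargv[index], start, len);
            Cargv[index][len] = '\0';
        }
        Cargv[count] = nullptr;
        Cargc = count;
    }
};
```

With this sketch, A = "hello 1 2 3"; followed by B = A; leaves B holding its own copies of the four tokens, so destroying A does not invalidate B.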
The remaining two member functions are overloaded assignment operators. The first operator= takes a character string and passes it as an argument to build_command. The second takes a command_line object and passes its command_buffer in the call to build_command. These two functions are analogous to the initialization and copy constructors, except that they replace the contents of an already existing command_line object. This is why I use a strcpy in these functions. The following illustrates how these assignments are used:
// a character string assignment
A = "hello 1 2 3";
// an object assignment
B = A;
Using the Command-Line Class
As an example of how to use the command-line class, I return to the child-process scenario mentioned in the introduction. I won't go into detail about how to create a child process; reference [1] at the end of this article is a good source on this subject. Here I concentrate on the command_line class. Since MS-DOS provides the simplest means of creating a child process (albeit also the least functional), I use the MS-DOS spawnv function in this example. The intent is to issue a single line of code that executes the child process and returns its status. Consider the following example:
int status;
child_process A("hello 1 2 3");
// execute the child process and return its status
status = A;
The class child_process (not shown, but on the code disk) contains three constructors, a destructor, and two assignment operators that correspond to the similar constructs in the command_line class. Its private interface contains two items:
command_line *CLINE;
char command_extension[100];
The first item is a pointer to the command_line object used to spawn the child process. The second is a buffer, which I explain shortly. All three child_process constructors perform the same task: they create a command_line object and initialize the buffer command_extension (see Listing 3). A call to the default constructor looks like this:
CLINE = new command_line;
command_extension[0] = '\0';
The child_process destructor deletes the command_line object pointed to by CLINE.
The child_process assignment operators simply rebuild the command line held by CLINE by calling build_command, and then return a reference to *this. Note that the command_line assignment operators are not used here. As an illustration, consider the character assignment operator=:
child_process& child_process::operator=(const char *ptr)
{
    CLINE->build_command(ptr);
    return *this;
}
child_process's three remaining functions handle execution of the child process. To spawn a child process, a command must include the directory path to the executable as well as the file extension; in the case of MS-DOS, this extension is .exe. Function add_extension uses strcpy and strcat to take a command string such as "hello" and turn it into "c:\bin\hello.exe". I define PATH in the file child_process.hpp (on the code disk). Different applications may require this value to be read in from a file, as in the automated testing tool example. Also be aware that this path will change depending on the executable's location and the operating system.
The function exec_function calls add_extension and then performs the spawn call.
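add_extension is only described in prose, so the following is a sketch under stated assumptions: EXE_DIR stands in for the article's PATH macro (its value c:\bin\ is taken from the example above), and the signature, which takes the destination buffer and its size, is hypothetical. The length check guards the strcpy/strcat calls against overrunning a fixed buffer such as command_extension[100].

```cpp
#include <cstring>

// Assumed values: EXE_DIR stands in for the article's PATH macro.
#define EXE_DIR "c:\\bin\\"
#define EXE_EXT ".exe"

// Hypothetical version of add_extension: builds EXE_DIR + command + EXE_EXT
// into dest with strcpy/strcat. Returns false (leaving dest untouched)
// if the result would not fit in dest_size bytes.
bool add_extension(char *dest, std::size_t dest_size, const char *command)
{
    std::size_t needed = std::strlen(EXE_DIR) + std::strlen(command)
                       + std::strlen(EXE_EXT) + 1;   // +1 for the terminating NUL
    if (needed > dest_size)
        return false;
    std::strcpy(dest, EXE_DIR);
    std::strcat(dest, command);
    std::strcat(dest, EXE_EXT);
    return true;
}
```

Called with a 100-byte buffer and the command name "hello", this produces c:\bin\hello.exe, the full path a spawn call needs.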
The final function is another overloaded operator which calls exec_function:
child_process::operator int()
{
    int status;
    status = exec_function();
    return status;
}
This code allows the simple syntax described at the beginning of this section. Note that any class that needs access to command_line's private members, as child_process does, must be declared a friend in command_line. In this case, being a friend allows the class child_process to access the private interface of command_line.
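spawnv is specific to MS-DOS (and Windows) compilers. On a POSIX system the same conversion-operator idiom can be sketched with fork, execv, and waitpid instead; this is a swapped-in technique, not the article's code. The class name child_process_posix is hypothetical, and to keep the sketch short the arguments arrive already split (in practice they would come from a command_line object). As with the spawn call, the first argument must be a full path to the executable.

```cpp
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <utility>
#include <vector>

// Hypothetical POSIX counterpart of child_process: fork/execv/waitpid
// replace the MS-DOS spawnv call used in the article.
class child_process_posix {
public:
    explicit child_process_posix(std::vector<std::string> args)
        : args_(std::move(args)) {}

    // Converting the object to int runs the child and yields its exit
    // status, which enables the "status = A;" syntax.
    operator int() { return exec_function(); }

private:
    std::vector<std::string> args_;   // args_[0] must be a full path

    int exec_function()
    {
        if (args_.empty()) return -1;

        // Build a NULL-terminated argv pointing into args_.
        std::vector<char *> argv;
        for (std::string &a : args_) argv.push_back(&a[0]);
        argv.push_back(nullptr);

        pid_t pid = fork();
        if (pid < 0) return -1;                // fork failed
        if (pid == 0) {                        // child: replace the process image
            execv(argv[0], argv.data());
            _exit(127);                        // reached only if execv failed
        }
        int wstatus = 0;                       // parent: wait for the child to finish
        if (waitpid(pid, &wstatus, 0) < 0) return -1;
        return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
    }
};
```

Declaring the conversion explicit (explicit operator int(), available since C++11) would force callers to write static_cast<int>(A), trading the compact syntax for less chance of running a child process by accident; that is the same trade-off the article weighs for overloaded operators.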
Testing the Command Line Class
I've written a test program (Listing 4) that utilizes the child_process class just described to test the command_line class. This program tests all three command_line and child_process constructors as well as the child_process assignment operators. Note that the command_line class is transparent to the programmer, being encapsulated in the child_process class. Consider the first example:
child_process A("hello 1");
status = A;
This code defines the child_process object and then executes the child process, returning the status. In this case, the programmer has no knowledge that a command line is even being constructed. Considering all the overhead involved in executing a child process, this code is more intuitive and saves a lot of typing.
Some may not agree that this approach is more intuitive. As always, overloading operators has drawbacks in that it creates potential for confusion. However, if the logic behind the design is understood, the overloaded operators can provide much added functionality.
Conclusion
The uses for a command-line builder depend on the specific application. I developed these classes in an effort to write a portable, automated test program. Although the child process presented here dealt specifically with the MS-DOS spawnv call, I've used the command_line code itself on multiple UNIX platforms with the g++ compiler and on OpenVMS on the Alpha. To use another operating system, replace the spawnv call with the appropriate code for the host operating system.
In the automated testing tool, I found it helpful to use a command-line interface with the internal functions as well. In any event, these classes make creating a command line a much simpler process.
References and Further Reading
[1] Matt A. Weisfeld. "A Portable Library for Creating Child Processes," Dr. Dobb's Journal, vol. #200, p. 46.
[2] Ted Faison. Borland C++ 3.1 Object-Oriented Programming, 2nd ed. (SAMS Publishing, 1992).
[3] James O. Coplien. Advanced C++ (Addison-Wesley, 1992).
[4] Matt A. Weisfeld. Developing C Language Portable System Call Libraries (John Wiley & Sons Inc, 1994).
Matt Weisfeld is a Programmer Engineer at the Allen-Bradley Co. in Cleveland, Ohio. He has published several programming articles over the past few years, as well as a book, Developing C Language Portable System Call Libraries, published by John Wiley & Sons. Matt can be reached via the Internet at matt.weisfeld@cle.ab.com or on CompuServe [71620,2171].