Command-Line Argument Processing & the Mgetopt Library

C/C++ Users Journal November, 2004

A tool for C and shell programs

By Eugene Surman

Eugene Surman received a degree in Radio Engineering from Moscow College of Electro Communication. He currently is a senior analyst developer for Goldman Sachs. He can be reached at esurman@inch.com.

Command-line options represent the "face" of the program, and tools for option parsing should be convenient and easy to use. For me, writing the awkward getopt loops was always a tedious job, especially in shell scripts. Frankly, I never understood why the getopt function was not designed in the manner of getenv. While libraries such as getopt, getopt_long, popt, and argp are powerful and rich, they don't provide utilities for parsing command options in shell scripts.

With this in mind, I wrote my own getopt library, which I call "mgetopt." I was trying to implement a solution that is equally easy to use in C and shell programs. Mgetopt supports short options (see Listing 1) as well as GNU-style long and longshort options. It also has a number of convenient features, such as:

Mgetopt's distribution is available at http://www.cuj.com/code/ and http://www.inch.com/~esurman. Mgetopt is written in C and is available under the GNU license.

In general, you need only the functions mgetopt_parse and mgetopt to get the job done. And most important, you don't need to write getopt loops in C or shell programs anymore.

As you have probably already guessed, I use a hash table to hold and retrieve options. In fact, I use the standard UNIX hsearch library, more precisely the GNU version. For non-Linux platforms, I include the file hsearch_r.c from GNU glibc in the mgetopt distribution.
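To make the hash-table idea concrete, here is a minimal sketch built on the POSIX hcreate and hsearch calls from <search.h>. The helper names opt_table_init, opt_table_put, and opt_table_get are hypothetical and are not taken from the mgetopt sources; the sketch also assumes that a repeated option simply overwrites the earlier value.

```c
#include <assert.h>
#include <search.h>   /* POSIX hcreate, hsearch, ENTRY */
#include <stddef.h>

/* Hypothetical helpers for illustration; not mgetopt's own functions. */

/* Create the single process-wide table. Returns nonzero on success. */
int opt_table_init(size_t max_options)
{
    return hcreate(max_options);
}

/* Record an option. hsearch stores the pointers, not copies, so name
   and value must stay valid for as long as the table is in use. */
int opt_table_put(const char *name, const char *value)
{
    ENTRY item;
    ENTRY *slot;

    assert(name != NULL);
    item.key = (char *)name;
    item.data = (void *)value;
    slot = hsearch(item, ENTER);
    if (slot == NULL)
        return 0;                   /* table is full */
    slot->data = (void *)value;     /* assumption: last occurrence wins */
    return 1;
}

/* getenv-style lookup: the stored value, or NULL if the option is absent. */
const char *opt_table_get(const char *name)
{
    ENTRY item;
    ENTRY *found;

    assert(name != NULL);
    item.key = (char *)name;
    item.data = NULL;
    found = hsearch(item, FIND);
    return found != NULL ? (const char *)found->data : NULL;
}
```

A caller would invoke opt_table_init once, call opt_table_put for each parsed option, and then use opt_table_get wherever a getenv-like query is wanted.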

There is also an mgetopt utility for use in shell scripts; it is simply an executable wrapper around the mgetopt library. The shell's eval built-in evaluates the output of mgetopt into shell variables of the following format:

$opt_<name>   (like:  $opt_a, $opt_d)

As Listing 2 illustrates, I use a Perl-style convention for the shell's option names. All you need to do is check whether an option variable exists.
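The kind of text a wrapper could hand to eval can be sketched as follows. This is an illustration only: the function format_shell_assignment is hypothetical, and both the single-quote escaping and the mapping of characters such as "-" to "_" in variable names are assumptions, not documented behavior of the mgetopt utility.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Append text to buf, always leaving room for the terminating '\0'. */
static int append_text(char *buf, size_t size, size_t *used, const char *text)
{
    size_t len = strlen(text);

    if (*used + len + 1 > size)
        return 0;
    memcpy(buf + *used, text, len + 1);   /* copies the '\0' as well */
    *used += len;
    return 1;
}

/* Hypothetical helper: writes  opt_<name>='<value>'  into buf.
   Returns 1 on success, 0 if buf is too small. */
int format_shell_assignment(char *buf, size_t size,
                            const char *name, const char *value)
{
    size_t used = 0;
    const char *p;
    char one[2] = { '\0', '\0' };

    assert(buf != NULL && size > 0 && name != NULL && value != NULL);
    buf[0] = '\0';
    if (!append_text(buf, size, &used, "opt_"))
        return 0;
    for (p = name; *p != '\0'; p++) {
        /* Assumption: characters not valid in a shell name become '_'. */
        one[0] = isalnum((unsigned char)*p) ? *p : '_';
        if (!append_text(buf, size, &used, one))
            return 0;
    }
    if (!append_text(buf, size, &used, "='"))
        return 0;
    for (p = value; *p != '\0'; p++) {
        if (*p == '\'') {
            /* Close the quote, emit an escaped quote, reopen the quote. */
            if (!append_text(buf, size, &used, "'\\''"))
                return 0;
        } else {
            one[0] = *p;
            if (!append_text(buf, size, &used, one))
                return 0;
        }
    }
    return append_text(buf, size, &used, "'");
}
```

Printing one such line per option gives the shell something it can pass to eval safely, even when a value contains spaces or quotes.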

Mgetopt Library Functions

In C, argument parsing is managed by the mgetopt_parse function:

int mgetopt_parse( const char* shortopts,
                   const char* longopts,
                   const char* longshortopts,
                   int argc, char** argv);

mgetopt_parse returns 1 if a parse is successful. Parsing stops as soon as the first nonoption argument is encountered (POSIX style). mgetopt_parse prints an error message on the standard error and returns 0 when the parse fails.

The first parameter of mgetopt_parse is shortopts, a string of short option letters. The format of the short option string is the same as in the standard getopt. If a letter is followed by the flag ":", the option is expected to have an argument. I also implemented an additional flag, "@", for numeric arguments. If a letter is followed by "@", the option is expected to have an argument with a numeric value. mgetopt_parse prints an error message on the standard error and returns 0 when the argument of an "@" option is not numeric.
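One way such a numeric check could be written is with the standard strtol function. The sketch below is not mgetopt's code: the helper name is_numeric_argument is hypothetical, and it assumes that "numeric" means a decimal integer that fits in a long, which the library may define differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical check: returns 1 when text is entirely a decimal integer
   representable as long, 0 otherwise. strtol itself skips leading white
   space and accepts an optional sign. */
int is_numeric_argument(const char *text)
{
    char *end;

    assert(text != NULL);
    if (*text == '\0')
        return 0;                   /* empty argument is not a number */
    errno = 0;
    (void)strtol(text, &end, 10);
    /* Reject overflow, "no digits at all", and trailing garbage. */
    return errno == 0 && end != text && *end == '\0';
}
```

A parser would call this on the argument of each "@" option and report an error when it returns 0.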

An example of a short option string definition is:

mgetopt_parse( "abc:d@", 0, 0, argc, argv);

where a and b are short options with no arguments, c: is a short option with a string argument, and d@ is a short option with a numeric argument.
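Reading such a string amounts to looking at the character that follows each letter. The following sketch shows that lookup under simple assumptions: the names opt_kind and short_option_kind are hypothetical, and the code does not handle embedded help text in curly braces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of one short option letter. */
enum opt_kind { OPT_UNKNOWN = -1, OPT_FLAG, OPT_STRING, OPT_NUMERIC };

/* Look up letter in a short option string such as "abc:d@". */
enum opt_kind short_option_kind(const char *spec, char letter)
{
    const char *p;

    assert(spec != NULL);
    /* The flag characters themselves can never be option letters. */
    if (letter == '\0' || letter == ':' || letter == '@')
        return OPT_UNKNOWN;
    for (p = spec; *p != '\0'; p++) {
        if (*p != letter)
            continue;
        if (p[1] == ':')
            return OPT_STRING;      /* letter followed by ':' */
        if (p[1] == '@')
            return OPT_NUMERIC;     /* letter followed by '@' */
        return OPT_FLAG;            /* plain letter, no argument */
    }
    return OPT_UNKNOWN;
}
```

With the string "abc:d@", the letters a and b classify as flags, c as a string option, and d as a numeric option.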

The second longopts parameter is a string of long option names separated by commas, ":", or "@" characters. The flags ":" and "@" placed after the option name indicate the option with a string or numeric argument, respectively. Space characters are ignored.

An example of a long option string definition is:

mgetopt_parse( 0, "ignore-case, file: delay@", 0, argc, argv);

where ignore-case is a long option with no argument, file: is a long option with a string argument, and delay@ is a long option with a numeric argument.
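The separator rules for long option strings can be illustrated with a small tokenizer. This is a sketch under stated assumptions, not the library's parser: the struct long_option, the 32-byte name limit, and the function parse_long_spec are all hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum long_kind { LONG_FLAG, LONG_STRING, LONG_NUMERIC };

/* Hypothetical record for one long option; names over 31 characters
   are truncated in this sketch. */
struct long_option {
    char name[32];
    enum long_kind kind;
};

/* Split spec into at most max entries and return how many were found. */
size_t parse_long_spec(const char *spec, struct long_option *out, size_t max)
{
    const char *p = spec;
    size_t count = 0;

    assert(spec != NULL && out != NULL);
    while (count < max) {
        const char *start;
        size_t len;

        /* Commas and space characters between names are skipped. */
        while (*p == ',' || isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;
        start = p;
        while (*p != '\0' && *p != ',' && *p != ':' && *p != '@' &&
               !isspace((unsigned char)*p))
            p++;
        len = (size_t)(p - start);
        if (len >= sizeof out[count].name)
            len = sizeof out[count].name - 1;
        memcpy(out[count].name, start, len);
        out[count].name[len] = '\0';
        /* A flag directly after the name selects the argument type. */
        if (*p == ':') {
            out[count].kind = LONG_STRING;
            p++;
        } else if (*p == '@') {
            out[count].kind = LONG_NUMERIC;
            p++;
        } else {
            out[count].kind = LONG_FLAG;
        }
        if (len > 0)
            count++;                /* ignore a stray flag with no name */
    }
    return count;
}
```

Applied to "ignore-case, file: delay@", the tokenizer yields three entries, matching the three options described above.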

The third longshortopts parameter is a string of long option names, where the first letter in each name automatically becomes a short option. Names are separated by commas, ":", or "@" characters. You may also explicitly define a short option letter by placing it into parentheses "(x)" after the long option name.

An example of a longshort option string definition is:

mgetopt_parse( 0, 0, "ignore-case, input-file(f):", argc, argv);

where ignore-case is a longshort option with no argument (its short form, i, is taken from the first letter), and input-file is a longshort option with a string argument and the explicitly defined short form f.
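The rule for choosing the short letter can be sketched for a single entry. The helper longshort_letter below is hypothetical and handles only the two forms described in the text: a bare name, or a name followed by one letter in parentheses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for one longshort entry such as "ignore-case" or
   "input-file(f):". Copies the long name into long_name and returns the
   short letter. Returns '\0', leaving long_name untouched, when the
   entry is malformed or the name does not fit. */
char longshort_letter(const char *entry, char *long_name, size_t size)
{
    const char *paren;
    size_t len;
    char letter;

    assert(entry != NULL && long_name != NULL && size > 0);
    paren = strchr(entry, '(');
    /* The name ends at '(' if present, else at a ':' or '@' flag. */
    len = paren != NULL ? (size_t)(paren - entry) : strcspn(entry, ":@");
    if (len == 0 || len >= size)
        return '\0';
    if (paren != NULL) {
        /* Explicit form: exactly one letter followed by ')'. */
        if (paren[1] == '\0' || paren[2] != ')')
            return '\0';
        letter = paren[1];
    } else {
        letter = entry[0];          /* default: first letter of the name */
    }
    memcpy(long_name, entry, len);
    long_name[len] = '\0';
    return letter;
}
```

For "ignore-case" this gives the letter i, and for "input-file(f):" it gives f, so a call could accept -i, --ignore-case, -f, and --input-file.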

After mgetopt_parse successfully completes, it saves all command-line options in a hash table. The option values are then accessible by the mgetopt function:

const char* mgetopt( const char* opt_name);

The mgetopt function returns a pointer to the option value if the option is found in the hash table; otherwise, it returns NULL.

There are also a number of useful, predefined option names: NAME, the name of the program invoked (argv[0]); BAD, a list of "bad" option names detected; IND, an index of the next nonoption argument to be processed; SHIFT, the number of options processed, which could be used as an argument in shell's shift command; and HELP, which holds a help text if it is defined.

My solution for help text is to embed text inside of the option string by placing it into curly braces. mgetopt_parse removes braces and text from option strings and saves them into the hash entry HELP. This approach lets me implement the help facility both for C and shell programs.

An example of a short option string definition with help text is:

mgetopt_parse( "a { -a print all} s { -s silent} h { -h help}",
               0, 0, argc, argv);
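Separating the help text from the option letters can be done in one pass over the string. The sketch below is an illustration only: the function split_help is hypothetical, and it assumes that spaces outside braces are dropped, that each help fragment ends with a newline, and that braces are not nested.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies spec into opts without the {...} groups
   and without spaces, and appends the text of each group to help,
   followed by a newline. Returns 1 on success, 0 on an unclosed brace
   or when either buffer is too small. */
int split_help(const char *spec, char *opts, size_t opts_size,
               char *help, size_t help_size)
{
    const char *p = spec;
    size_t o = 0, h = 0;

    assert(spec != NULL && opts != NULL && help != NULL);
    assert(opts_size > 0 && help_size > 0);
    while (*p != '\0') {
        if (*p == '{') {
            const char *close = strchr(p, '}');
            size_t len;

            if (close == NULL)
                return 0;           /* unclosed brace */
            p++;                    /* step over '{' */
            len = (size_t)(close - p);
            if (h + len + 2 > help_size)
                return 0;           /* need room for text, '\n', '\0' */
            memcpy(help + h, p, len);
            h += len;
            help[h++] = '\n';
            p = close + 1;          /* continue after '}' */
        } else {
            if (*p != ' ') {
                if (o + 2 > opts_size)
                    return 0;       /* need room for char and '\0' */
                opts[o++] = *p;
            }
            p++;
        }
    }
    opts[o] = '\0';
    help[h] = '\0';
    return 1;
}
```

For the option string in the example above, opts would end up as "ash" and help would hold the three help lines in order.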

There are two additional, convenient functions that are available only for C programs:

Listing 3 illustrates most of mgetopt's features just presented, while Listing 4 is an example of the shell script.

Conclusion

The mgetopt library I presented here utilizes a hash table to simplify the user interface and eliminate parsing loops. Most of mgetopt's features are available both for C and shell programs.