Rex Jaeschke is an independent computer consultant, author and seminar leader. He participates in both ANSI and ISO C Standards meetings and is the editor of The Journal of C Language Translation, a quarterly publication aimed at implementors of C language translation tools. Readers are encouraged to submit column topics and suggestions to Rex at 2051 Swans Neck Way, Reston, VA 22091 or via UUCP at uunet!aussie!rex or aussie!rex@uunet.uu.net.
Operating systems such as UNIX, MS-DOS, and VAX/VMS support defining symbols that a user program can access. The program supplies the name of such a symbol and is given the string value currently assigned to that symbol. All examples in this article were run using Microsoft's C compiler v6 under MS-DOS v3.3. (Other systems may require different commands to achieve the same results.)
I will refer to these symbols as environment variables. The set of such variables defined at one time makes up the program's environment.
Defining An Environment Variable
The simplest way to define an environment variable is via an operating system command. To define a variable ABC under MS-DOS, you use the SET command.
    set abc=def
To view all variables defined in the current environment, use SET without arguments. My system produced the output
    COMSPEC=C:\COMMAND.COM
    PROMPT=$p $g
    PATH=c:\dos;c:\util;
    ABC=def
Note that while MS-DOS preserved the case of the variable's definition, it converted the variable name to upper-case (from abc to ABC). (There is one exception to this, however.) You can define the PATH variable in one of two ways and the results from each can differ. For example, if you enter
    PATH c:\;
the definition of the variable is converted to upper-case. However, in the following case, it is not.
    SET PATH=c:\;
This difference might be an issue if your program looks for a variable's definition in one specific case only. In addition to SET, the library function putenv can also be used to define a variable.
Accessing The Environment
Many systems support a third argument to main, and traditionally that argument has been called envp. Like argv, envp is an array of pointers to char where each entry points to a null-terminated string containing a variable and its definition, separated by an equals sign. An entry containing NULL indicates the end of this array. Listing 1 displays the contents of the current environment given the previous definitions.
Since envp is an array passed by address, it is the address of the first pointer that is actually being passed. Therefore, I could also have declared envp as
    char **envp
(just as you can declare argv).
ANSI C does not include envp, although this feature is mentioned under "Common Extensions." envp, however, is part of UNIX SVID and POSIX.
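The table layout described above can be walked with a simple loop. The following is a minimal sketch rather than a reproduction of Listing 1: the function name show_table is hypothetical, and the output format is modelled on the sample output shown later in this article.

```c
#include <stdio.h>

/* Print each "NAME=definition" entry of a NULL-terminated table of
   strings and return the number of entries found. */
int show_table(char **table, FILE *out)
{
    int i;

    /* The entry containing NULL marks the end of the table. */
    for (i = 0; table[i] != NULL; ++i)
        fprintf(out, "envp[%d] ==> |%s|\n", i, table[i]);
    return i;
}
```

On a system that supports the third argument, a main declared as main(int argc, char *argv[], char *envp[]) could call show_table(envp, stdout).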
Another UNIX feature sometimes available under MS-DOS is a global object called environ. This is typically declared as
    /* stdlib.h */
    extern char **environ;
This declaration allows the startup code to point environ to the beginning of an array of pointers to char (representing the environment table). The user can access the table by subscripting environ (see Listing 2). If the environment table is moved for any reason (as discussed later), environ is simply made to point to the new location.
Note that while numerous compilers define environ as previously shown, they document it as
    extern char *environ[];
implying instead that environ is an array. It is not; it is a pointer.
environ is not part of ANSI C. You should conditionally omit its declaration from stdlib.h when compiling in strict ANSI mode.
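Subscripting environ might look like the sketch below. It is not Listing 2; count_environment is a hypothetical name, and the code assumes a system (UNIX, or an MS-DOS compiler offering the extension) whose library actually defines environ. The declaration is written by hand because, as noted, the object is not part of ANSI C and a header may not supply it.

```c
#include <stddef.h>

/* Extension: supplied by the library's startup code, not by ANSI C. */
extern char **environ;

/* Count the entries in the current environment table by subscripting
   environ until the terminating NULL pointer is reached. */
int count_environment(void)
{
    int n = 0;

    if (environ == NULL)        /* defensive: no table at all */
        return 0;
    while (environ[n] != NULL)
        ++n;
    return n;
}
```

Because environ is a pointer rather than an array, the library is free to repoint it at a relocated table; code that rereads environ on every call, as here, always sees the current one.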
Defining Variables Via The Library
ANSI C defines a library routine getenv to access environment variables from within a program. The getenv in Listing 3 finds a defined variable by performing a case-insensitive search on the variable name.
It is interesting that ANSI C contains this function, since the standard does not define what an environment is or how it works. If nothing like an environment exists for a given standard C system, getenv can simply return a NULL value for any given variable to indicate no match was found (or in this case, that no environment exists).
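Since getenv may return NULL, whether because the variable is undefined or because no environment exists at all, callers should test the result before using it. A minimal sketch, in which lookup_or is a hypothetical helper name:

```c
#include <stdlib.h>

/* Return the definition of name, or fallback when getenv reports
   that no match was found. */
const char *lookup_or(const char *name, const char *fallback)
{
    const char *value = getenv(name);

    return (value != NULL) ? value : fallback;
}
```

For example, lookup_or("ABC", "(undefined)") yields the current definition of ABC if there is one and the fallback text otherwise. Whether the name is matched case-insensitively depends on the implementation.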
Changing The Environment
UNIX and some other systems provide the putenv function to let the user add a new variable definition or change an existing one. However, putenv is not part of ANSI C.
Listing 4 defines a new variable test and then invokes a text editor called see using system. I make the editor load a copy of COMMAND.COM to display the current environment table. As expected, it contained
    envp[4] ==> |test=1234|
However, when the original program terminated, test was no longer defined. That is, when you use putenv to define a variable, that definition remains in force only while that program is running. Any work putenv does is visible within that program and in any programs invoked from it using system, exec, or spawn. Each spawned subprogram inherits its parent's environment. You cannot, however, use putenv to change your top-most level environment.
When given a variable definition, putenv either adds the definition to the environment (if no such variable exists) or changes the existing definition. Consequently, the environment can grow. To extend the environment table pointed to by environ, the library may have to allocate new space at runtime and copy over the existing contents. (It could use realloc, for example.) As a result, the table pointed to by envp will no longer be the current table. Therefore, once you use putenv, you should access variable definitions using getenv rather than envp.
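The advice above can be sketched as follows. This is not Listing 4; define_test and run_child are hypothetical names, and the _XOPEN_SOURCE line is an assumption that only matters on POSIX-style systems, where it asks the headers to declare putenv (MS-DOS compilers that supply putenv do not need it).

```c
#define _XOPEN_SOURCE 700   /* assumption: POSIX-style headers */
#include <stdlib.h>

/* Define test=1234 for this program and for anything it launches.
   The string is static so that it outlives this call, since putenv
   may keep the pointer itself rather than a copy.
   Returns 0 on success. */
int define_test(void)
{
    static char definition[] = "test=1234";

    return putenv(definition);
}

/* Run a command through system after defining the variable; the
   child program inherits the parent's environment. */
int run_child(const char *command)
{
    if (define_test() != 0)
        return -1;
    return system(command);
}
```

After define_test returns, getenv("test") finds the new definition even if the library had to move the table, whereas a saved copy of envp may still refer to the old table.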
The strings representing the definitions are actually part of the user's program. For example, in Listing 5, ev is an automatic array containing the definition for test. The definition in ev is installed via putenv, and one of the environ table entries is set to &ev[0]. As a result, changing the contents of the array ev indirectly changes the definition of the variable test. (As shown, the 2 is changed to x.)
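This aliasing can be sketched as shown here. The code is not Listing 5: the array is given static storage so that it remains valid after the function returns, and alias_demo is a hypothetical name. It assumes a putenv that stores the caller's pointer rather than a copy, which is the behavior described above; an implementation that copies the string would leave the definition unchanged.

```c
#define _XOPEN_SOURCE 700   /* assumption: POSIX-style headers */
#include <stdlib.h>

/* The environment table will point directly at this array. */
static char ev[] = "test=1234";

/* Install the definition, modify the array, and return the
   definition that getenv reports afterward. */
const char *alias_demo(void)
{
    if (putenv(ev) != 0)
        return NULL;
    ev[6] = 'x';            /* the 2 in "test=1234" becomes x */
    return getenv("test");
}
```

Where putenv keeps the pointer, the value returned is 1x34 even though putenv was never called a second time.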
If ev is changed to contain "xyz", the pointer in environ points to that string and the definition of test is lost. When the environment table is displayed, the following entry is shown:
    envp[4] ==> |xyz|
As a result, you have an invalid format entry in the table.
You've seen now that modifying such strings directly in your program might indirectly affect the contents of environ. In particular, if you pass an automatic array to putenv and then execute a return statement, that array will no longer exist and its environ entry will point to where it used to be. A subsequent attempt to access that variable using getenv will result in undefined behavior.
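One way to avoid the dangling entry is to give every definition storage that outlives the calling function. The sketch below, with the hypothetical name define_variable, builds the string in heap storage before passing it to putenv. It assumes a putenv with the pointer-keeping behavior described above, so the block is deliberately never freed on success.

```c
#define _XOPEN_SOURCE 700   /* assumption: POSIX-style headers */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build "name=value" in heap storage and hand it to putenv.
   Returns 0 on success and -1 on failure. The storage is not freed
   on success because the environment table may point straight at it. */
int define_variable(const char *name, const char *value)
{
    /* name + '=' + value + terminating null character */
    size_t len = strlen(name) + 1 + strlen(value) + 1;
    char *entry = malloc(len);

    if (entry == NULL)
        return -1;
    sprintf(entry, "%s=%s", name, value);
    if (putenv(entry) != 0) {
        free(entry);
        return -1;
    }
    return 0;
}
```

The cost of this design is that redefining the same variable repeatedly abandons the earlier strings, so it suits programs that set only a handful of variables.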
If ev is an automatic pointer instead, the problem of table corruption does not occur. For example:
    char *ev = "test=1234";
    ev = "xyz";
produced the output
    string defined as >1234<
    string defined as >1234<
The string literal "test=1234" is a static array of char allocated at compile-time. When the address of the string is given to putenv, it is placed in environ. When ev is made to point elsewhere, the entry in environ remains intact. This would not be true if the contents of the memory to which ev points were changed.
Constructing Full Directory Paths
Some MS-DOS compilers provide a library function called _searchenv, which allows traversing a variable definition containing a list of directory path names. Given a filename and a variable name, _searchenv searches all directories specified by the variable to construct the full pathname for the specified file.
The _searchenv in Listing 6 is case-sensitive. The variable PATH does not match path. The file link.exe is found and has a full file specification of c:\dos\link.exe. As you can see, the listing preserves the filename's case in the path generated. While this is not a problem on MS-DOS (since the filenames are not case-sensitive), you should check case-sensitivity on other systems.
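On systems without _searchenv, a similar search can be approximated in portable C. The sketch below is not the library routine and is not Listing 6: search_list and search_dirs are hypothetical names, the separator is passed in by the caller (';' on MS-DOS, ':' on UNIX), a '/' is used to join directory and filename, and a file is treated as found if fopen can open it for reading. It looks only in the listed directories.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Search each directory in list (entries separated by sep) for file.
   On success, store the full path in out and return 1; otherwise
   store an empty string and return 0. */
int search_list(const char *file, const char *list, char sep,
                char *out, size_t outsize)
{
    const char *p = list;

    if (outsize == 0)
        return 0;
    out[0] = '\0';
    if (list == NULL)
        return 0;

    while (*p != '\0') {
        const char *end = strchr(p, sep);
        size_t dirlen = (end != NULL) ? (size_t)(end - p) : strlen(p);

        /* Skip empty entries and paths that would not fit in out:
           directory + separator + filename + null character. */
        if (dirlen > 0 && dirlen + 1 + strlen(file) + 1 <= outsize) {
            FILE *fp;
            char last = p[dirlen - 1];

            memcpy(out, p, dirlen);
            out[dirlen] = '\0';
            if (last != '/' && last != '\\')
                strcat(out, "/");
            strcat(out, file);      /* filename case is preserved */

            fp = fopen(out, "rb");
            if (fp != NULL) {
                fclose(fp);
                return 1;
            }
        }
        if (end == NULL)
            break;
        p = end + 1;
    }
    out[0] = '\0';
    return 0;
}

/* Same search, taking the directory list from the variable var. */
int search_dirs(const char *file, const char *var, char sep,
                char *out, size_t outsize)
{
    return search_list(file, getenv(var), sep, out, outsize);
}
```

Under MS-DOS, a call such as search_dirs("link.exe", "PATH", ';', buf, sizeof buf) would correspond to the search described above. Splitting the work into two functions lets the directory walk be exercised with a literal list, independent of the current environment.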