Matt is a computer-science student and a WWW consultant. He can be reached at mkruse@saunix.sau.edu.
If you've done any browsing at all on the World Wide Web, you've probably seen pages that contain information or images generated on the fly that change each time you load the page. Whether it's a counter to tell you that you're visitor #1000, a greeting that says "Welcome, visitor from yoursite.com," or an up-to-the-minute schedule for a tourist attraction, it can be created using server-side includes.
Server-side includes (SSIs) are simply commands embedded inside regular HTML documents that make your page do something different each time it is loaded. By using SSIs, you can create dynamic pages that will make your site stand out from the WWW pack. In this article, I'll describe the format of these commands, show how they work, and discuss how you can write programs to work with your Web pages.
Most WWW servers have a fairly simple job: When a request for a page comes in, they load the correct file from disk and send it out to the right place. If you put SSI commands inside your HTML document, however, things must happen a little differently. First, you need to tell the server to parse the document for embedded SSI commands, which are to be executed before the document is sent. By default, most servers will not parse documents at all, so any SSIs will simply be ignored. If you haven't put any commands inside your documents, there's no reason for the server to search every single file before sending it to the client. Likewise, if you use SSIs in only a few pages, the server needn't parse those pages that don't have them. So, most servers have a way to distinguish which pages should be parsed and which should not.
Usually, the server decides what to parse based on the filename. If the file ends in .html, it is delivered unparsed; if it ends in .shtml, it is searched for SSI commands. Of course, you can change this convention. For those servers that can handle the extra load of parsing all documents, you may wish to enable SSI parsing for all .html files.
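On the NCSA httpd, this convention is set in the server's configuration files. The fragment below is a sketch of the usual arrangement rather than a drop-in configuration: the directory path is a placeholder, and other servers use different mechanisms, so check your own server's documentation.

```
# srm.conf: treat files ending in .shtml as server-parsed HTML
AddType text/x-server-parsed-html .shtml

# access.conf: permit includes within the document tree
# (the path below is a placeholder for your own document root)
<Directory /usr/local/etc/httpd/htdocs>
Options Includes
</Directory>
```

Changing the extension on the AddType line from .shtml to .html would cause every HTML file to be parsed, at the cost of the extra load described above.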
Some servers (the CERN server, for instance) do not yet support SSIs. Also, the commands and formats are not a standard and may vary from one server to another. The examples I present here work correctly with both the NCSA httpd and Netscape servers, and should also work with other servers, or at least be close enough to modify easily.
What do SSI commands look like, and how does the server handle a command once it is found? All SSI commands are conveniently placed inside standard HTML comments so that the commands will not be displayed if a page ever gets to users without being parsed. In HTML, any text between "<!--" and "-->" is considered a comment. Consequently, the general format for an SSI command inside an HTML document is as in Example 1(a). Several different commands are available, each taking its own arguments using either one or two tags.
The #exec command executes an external program and includes its output in your document. It takes one of two tags: cmd, which executes a single command using /bin/sh; or cgi, which gives a virtual path to a program to be executed. For example, to call a program in the same directory as the HTML page that contains the command, you would use the command in Example 1(b).
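To make the difference between the two tags concrete, here is a small sketch showing both forms side by side. The /cgi-bin/script.cgi path and the choice of date as the shell command are only illustrations.

```html
<!-- cgi: a virtual path to a CGI program -->
<!--#exec cgi="/cgi-bin/script.cgi"-->

<!-- cmd: a command line handed to /bin/sh -->
<!--#exec cmd="date"-->
```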
Once you understand how commands are placed into HTML documents, the next step is, of course, to write a program to be executed. Your program can be written in any language that is executable on the server. This means C, Pascal, awk, Fortran, and so on, although the most common choice is Perl. Most scripts are written in Perl because it is easy to write, has great text-manipulation capabilities, and doesn't require lengthy compiles. Also, many Perl scripts and libraries are available on the Internet for downloading and use. All the examples in this article are in Perl.
The interaction between the program and the HTML page is simple. Whatever information is sent to STDOUT from your script is inserted into the page and treated just as if it were there to begin with. On some servers, anything sent to STDERR will get appended to the error log, which is helpful in debugging.
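A minimal sketch of that split between the two output streams follows. The fragment() helper and the wording of the log message are my own illustrative choices, and whether the STDERR line actually reaches the error log depends on your server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the text that will be spliced into the page.
# (fragment is a hypothetical helper name.)
sub fragment {
    my ($epoch) = @_;
    return "Page generated " . scalar(localtime($epoch));
}

# STDOUT: the header, then the text the server inserts into the document.
print "Content-type: text/html\n\n";
print fragment(time);

# STDERR: a diagnostic that some servers append to the error log.
print STDERR "$0: fragment generated at ", time, "\n";
```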
Every program used in an SSI must return two things: a MIME type identifier and any output to be included in the document. As you can see from Example 1(c), the first line output by the program is the MIME type; this tells the browser how to handle the data you'll be sending. In the example, this is "Content-type: text/html", which says that the information that follows is of type text and subtype html. The two newlines at the end of the MIME type header are always required. (One of the most common mistakes that new programmers make is not returning the MIME header in the correct format, including the empty line, which results in an error.)
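One way to avoid the header mistake is to build the header in a single place. The sketch below does this with a small helper; mime_header is a name I made up for illustration, not part of Perl or of any server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the header for the given MIME type. The header line must be
# followed by an empty line, hence the two newlines.
sub mime_header {
    my ($type) = @_;
    return "Content-type: $type\n\n";
}

print mime_header("text/html");
print "Hello, World";
```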
After you've told the browser how to treat your data, any output from your program will be inserted into the document. The server doesn't check whether you've returned anything that it isn't ready to handle--for example, an image or other binary data--so be careful about what you send. The third line in Example 1(c) simply places the text "Hello, World" into your document wherever the SSI call is located. Example 2(a) is the original HTML document. Once the server parses the document and executes "script.cgi", the actual HTML code is sent to the user's browser; see Example 2(b).
If users view the source code for the just-received HTML page, they will see the processed HTML, not the source that contains the SSI command. Users don't even know that a command was executed. A script that inserts the same text on every load has little practical value, of course. SSIs are usually used to return output that changes with each load of the page. A better example, one that represents perhaps the most common use for SSIs, is a "counter" script. You've probably encountered this many times: a Web page with a counter that keeps track of how many times the page has been loaded and displays the current count to each visitor.
Example 3(a) presents a simple counter script that uses a file to store the current number of accesses to the page. Each time the script is run inside an SSI, it reads the current count from the file, increments the count, writes the new value back to the file, and prints the value to STDOUT. The script also uses file locking, which is vital in a script such as this. Without file locking, two people loading your page at the same moment could both read the same count before either wrote the new value back; both would then store the same number, and one of the visits would be lost.
After printing out the MIME header in line 2 of the script, the file containing the counter value is opened with both read and write access. The program then gets an exclusive lock on the file using flock(), reads the first line of the file into the $count variable, resets the file pointer to the beginning of the file, and increments the variable. The program then prints the variable twice--once to STDOUT to insert it into the HTML document, and once to the file to store the new value. The exclusive lock is then given up, the file is closed, and the program is finished.
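If you want to convince yourself that the locking does its job, you can simulate simultaneous visitors from the command line. The sketch below is a test harness rather than a CGI script: bump() and the race_test.txt scratch file are hypothetical names, and it assumes a UNIX-like system where fork() and flock() are available. It starts five processes that each perform 20 locked read-increment-write cycles on the same file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);

# Lock the file, read the count, add one, and write it back.
# Returns the new count.
sub bump {
    my ($file) = @_;
    open(my $fh, '+<', $file) or die "Cannot open $file: $!";
    flock($fh, LOCK_EX)       or die "Cannot lock $file: $!";
    my $line  = <$fh>;
    my $count = (defined $line && $line =~ /^(\d+)/) ? $1 : 0;
    $count++;
    seek($fh, 0, SEEK_SET)    or die "Cannot rewind $file: $!";
    print {$fh} $count;       # the count only grows, so no truncate is needed
    close($fh)                or die "Cannot close $file: $!";  # flushes and releases the lock
    return $count;
}

my $file = "race_test.txt";   # hypothetical scratch file
open(my $init, '>', $file) or die "Cannot create $file: $!";
print {$init} "0";
close($init) or die "Cannot close $file: $!";

# Five "visitors", each loading the page 20 times.
my @kids;
for my $n (1 .. 5) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        bump($file) for 1 .. 20;
        exit 0;
    }
    push @kids, $pid;
}
waitpid($_, 0) for @kids;

open(my $in, '<', $file) or die "Cannot read $file: $!";
my $final = <$in>;
close($in);
unlink $file;
print "final count: $final\n";
```

With the lock in place, the final count should come out to 100, one for every simulated load. Commenting out the flock() line is an easy way to see counts go missing.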
Example 3(b) includes a counter in an HTML file. When the count is returned from the script, the count is placed between the <B> and </B> tags, making the count appear bold in the browser. Using this script inside of a WWW document will let you keep an ongoing page-access counter.
Although the previous counter script will do the job just fine, some improvements can make it more flexible. Rather than simply keeping track of the count for a single page, it would be nice to modify the script to keep track of counts for any page that calls it. You could then use a counter on any page on a WWW site by inserting the same command and have it automatically keep separate counts for each page.
You can accomplish this with only a few modifications to the original program. One environment variable available to scripts running under the NCSA (and some other) servers is DOCUMENT_URI, which contains the virtual path and filename of the HTML page that the script is being called from. You can then use this to detect which page is making the counter request and load the appropriate file.
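The mapping from page to counter file can be sketched as a small function. Note one deliberate difference from the substitution described in this article: as an extra precaution of my own, this version replaces every character other than letters, digits, ".", "_", and "-" with an underscore, not just "/". The counter_file name and the sample path are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map a page's virtual path to a flat counter filename.
# Assumption: anything outside letters, digits, ".", "_", and "-"
# becomes "_", which is stricter than replacing "/" alone.
sub counter_file {
    my ($uri) = @_;
    (my $name = $uri) =~ s/[^A-Za-z0-9._-]/_/g;
    return $name;
}

# Outside a server DOCUMENT_URI is unset, so fall back to a sample path.
my $uri = defined $ENV{DOCUMENT_URI} ? $ENV{DOCUMENT_URI} : "/docs/index.shtml";
print counter_file($uri), "\n";
```

Because every page maps to its own filename, two pages never share a counter unless their paths differ only in the characters that get replaced.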
Example 4 is a modified version of the counter script that incorporates these changes. First, it replaces all "/" characters in the calling page's path with underscores to create a valid filename. If a file with that name exists, it is used as the counter file, just as before. If it doesn't exist, the file is created and 0 is inserted as the current count. This type of counter script would be especially helpful for a commercial Web-presence provider who wants an easy way for customers to add counters.
While the #exec command may be the most useful and flexible, other SSI commands are useful in HTML documents. Table 1 presents some additional SSI commands found in the NCSA server.
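As a quick illustration, the fragment below uses the commands from Table 1 in a page footer. The file paths are placeholders; DOCUMENT_NAME and DATE_LOCAL are variables that the NCSA server makes available to #echo, and other servers may offer a different set.

```html
<hr>
<p>You are reading <!--#echo var="DOCUMENT_NAME"-->,
served on <!--#echo var="DATE_LOCAL"-->.</p>
<p>The price list (<!--#fsize virtual="/prices.html"-->) was
last changed on <!--#flastmod virtual="/prices.html"-->.</p>
<!--#include virtual="/footer.html"-->
```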
This access-counter example is a useful script, but it is only one of many possible uses for SSIs. For example, SSIs can display a random image, present a welcome message to a user, update a private log of access statistics, output different text depending on the user's browser, or automatically redirect the user to a different page. SSIs can make your Web pages more dynamic and interesting, and those who visit your site will have another reason to come back.
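As one more illustration, here is a sketch of the welcome-message idea. It relies on the standard CGI variables REMOTE_HOST and REMOTE_ADDR; the greeting() helper and its fallback wording are my own, and REMOTE_HOST is often empty on servers that do not look up hostnames.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a greeting from the CGI environment, passed as a hash reference.
# (greeting is a hypothetical helper name.)
sub greeting {
    my ($env) = @_;
    my $who = $env->{REMOTE_HOST} || $env->{REMOTE_ADDR} || '';
    $who =~ s/[^A-Za-z0-9.:-]//g;    # keep only hostname/address characters
    return length $who ? "Welcome, visitor from $who" : "Welcome, visitor";
}

print "Content-type: text/html\n\n";
print greeting(\%ENV);
```

Called from a page with an #exec cgi command, just like the counter, this would insert the greeting wherever the command appears.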
Example 1
(a)
<!--#command tag1="value1"
tag2="value2"-->
(b)
<!--#exec cgi="script.cgi"-->
(c)
#!/usr/bin/perl
print "Content-type: text/html\n\n";
print "Hello, World";
Example 2
(a)
<html>
<h1>Sample output</h1>
<p>Here is the script output:
<!--#exec cgi="script.cgi"-->
</p>
</html>
(b)
<html>
<h1>Sample output</h1>
<p>Here is the script output:
Hello, World
</p>
</html>
Example 3
(a)
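#!/usr/bin/perl
print "Content-type: text/html\n\n";
# Open the counter file for both reading and writing.
open(COUNTER, "+<", "counter.txt") or die "Cannot open counter.txt: $!";
flock(COUNTER, 2);          # 2 = LOCK_EX: wait for an exclusive lock
$count = <COUNTER>;         # read the current count
seek(COUNTER, 0, 0);        # rewind so the new value overwrites the old
$count++;
print "$count";             # to STDOUT: inserted into the page
print COUNTER "$count";     # to the file: stored for the next visitor
flock(COUNTER, 8);          # 8 = LOCK_UN: release the lock
close(COUNTER);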
(b)
<html>
<h1>Welcome</h1>
You are visitor #<b><!--#exec
cgi="counter.cgi"--></b>.
<p>
</html>
Example 4
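#!/usr/bin/perl
print "Content-type: text/html\n\n";
# Turn the calling page's virtual path into a flat filename.
($PAGE = $ENV{'DOCUMENT_URI'}) =~ s|/|_|g;
# First request for this page: create its counter file holding 0.
unless (-e $PAGE) {
    open(NEW, ">", $PAGE) or die "Cannot create $PAGE: $!";
    print NEW "0";
    print "0";
    close(NEW);
    exit(0);
}
# Otherwise, update the existing count as in the single-page counter.
open(COUNTER, "+<", $PAGE) or die "Cannot open $PAGE: $!";
flock(COUNTER, 2);          # 2 = LOCK_EX: wait for an exclusive lock
$count = <COUNTER>;         # read the current count
seek(COUNTER, 0, 0);        # rewind so the new value overwrites the old
$count++;
print "$count";             # to STDOUT: inserted into the page
print COUNTER "$count";     # to the file: stored for the next visitor
flock(COUNTER, 8);          # 8 = LOCK_UN: release the lock
close(COUNTER);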
Table 1: Additional SSI commands found in the NCSA server.

Command        Format
include file   <!--#include virtual="path/to/file"-->
echo           <!--#echo var="Environment Variable"-->
file size      <!--#fsize virtual="path/to/file"-->
last modified  <!--#flastmod virtual="path/to/file"-->