Simplifying Web Production
Reinhard Voglmaier

In the beginning of the era of Web servers, everything was easy. A few people, mostly in their spare time, put documents on a Web server to make them available to the world. Now, however, Web production has become highly specialized. IT professionals don't have time to participate in the production of every Web page and, for security reasons, it is unwise to let everyone who is working on a Web page have the access privileges necessary to put that page into production. Our organization uses a methodical Web production process that provides clear lines of authority, lets Web designers operate in a test environment without security restrictions, and minimizes the number of users who can access the production systems. We divide the production process into the following roles: Web editors, who create and modify the documents of a site on the test server; a publisher, who reviews a site and decides when it goes into production; and the Webmaster (the Web owner), under whose userid the automatic procedures update the production site.
An individual Web editor can only put his documents on the test site, so he does not need a userid with special access rights. The production site is normally accessed only by the automatic procedures, which run under the userid of the Web owner (normally the Webmaster).

This article describes our process for bringing Web pages into production. The process is largely automated, and I will describe the scripts we use for automating each step. For my purposes here, a Web site is an entity consisting of a collection of documents produced by one group and headed by a publisher, regardless of whether this entity is located on a server of its own or shares a Web server with other sites. The scripts described in this article are available from Sys Admin at: http://www.sysadminmag.com and also from my company at: http://www.GlaxoWellcome.it/Webmaster/Sysadmin.

The Design of the Publishing System

The intranet is divided into test and production stages; publishing a site refers to the transition from test into production. The publishing system is designed around the organization described above. The Web editors put their pages in an area on the test Web server mounted via the SMB protocol (a Samba file server). Several people may put documents on the same Web site, and every Web site corresponds to one file server. Once the Web editors have finished, the publisher declares the site frozen for inspection and examines the new or modified site. This review covers style, layout, and, most importantly, the content. If the publisher is happy with the documents, he opens an application (called P2P) in his Web browser that helps move the site into production. This application presents a list of all the Web sites he supervises; clicking on a Web site shows the date of its last publication and who executed the action. The publisher then only needs to submit the form.
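A test area of this kind might be exported to the editors with a Samba share definition along these lines. The share name, path, and group are assumptions for illustration; the article does not show the actual configuration:

```ini
; Hypothetical smb.conf fragment: one writable share per Web site,
; restricted to the editors of that site.
[research-micro]
    comment     = Test area for the Research/Microbiology Web site
    path        = /htdocs/TestServer/Research/Microbiology
    valid users = @research-editors
    writable    = yes
```

Because the editors write only to this test share, none of them needs an account with access to the production tree.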
The P2P procedure actually does very little. It writes a file containing the Web site to be put into production, the userid and name of the publisher who made the request, and a timestamp of when the request was generated. This file is written to a directory that holds all the requests (details on the directory structure later). After generating this file, the procedure prompts for confirmation.

The real work is done much later, by automatic procedures. A cron-controlled procedure springs into life every five minutes and consults this order book. For each job, it takes a photograph of the Web site: it produces a compressed tar of the site and puts the result in a convenient place (see below). It then sends email to the publisher informing him that the test Web site can be accessed again. A shell script to be executed for the update of the Web site is generated at the same time. If the publisher decides to update further pages, it is sufficient to launch the P2P application again; the photograph and the entry in the order book are simply overwritten. The actual update is done late at night, when the server is less stressed: a job scheduled by cron wakes up, looks in the directory where the shell scripts are kept, executes every script, and cancels the corresponding entry in the order book.

The Directory Structure

Several directories are used to hold information and data, and every request is described by a one-line file:

<User Name>,<User Id>,<Date>,<Location of WebSite>
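The two cron-driven stages just described could be sketched as follows. This is not the author's actual code: the directory names (REQ, SCHED, DATA) and the generated script body are assumptions; only the one-line request format comes from the article.

```shell
#!/bin/sh
# Sketch of the two cron stages: the five-minute "photograph" job and
# the nightly launcher. Paths are hypothetical.

REQ=/tmp/p2p/accepted      # the order book: one request file per job
SCHED=/tmp/p2p/scheduled   # generated publishing scripts wait here
DATA=/tmp/p2p/data         # compressed tar photographs of the sites

# Stage 1 -- every five minutes: photograph each requested site and
# generate the script that will publish it at night.
process_requests() {
    for req in "$REQ"/*; do
        [ -f "$req" ] || continue
        # request format: <User Name>,<User Id>,<Date>,<Location of WebSite>
        IFS=, read -r name uid date site < "$req"
        job=$(basename "$req")
        # take the photograph: a compressed tar of the whole test site
        ( cd "$site" && tar cf - . ) | gzip > "$DATA/$job.tgz"
        # generate the publishing script (variables already substituted)
        cat > "$SCHED/$job.sh" <<EOF
#!/bin/sh
# unpack the photograph over the production copy of the site
gzip -cd $DATA/$job.tgz | tar xf -
EOF
        rm "$req"            # cancel the entry in the order book
        # mail "$uid" ...    # tell the publisher the photo has been taken
    done
}

# Stage 2 -- at night: execute every scheduled script, cancel its entry.
run_scheduled() {
    for script in "$SCHED"/*.sh; do
        [ -f "$script" ] || continue
        sh "$script" && rm "$script"
    done
}
```

Because every request is a self-describing one-line file, overwriting it (when the publisher resubmits) automatically replaces the pending job.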
The application is made up of several parts: a library of utility functions for the DBM database (DBMUtil.pm), the CGI script, and the JavaScript included in the generated page.
The CGI script is located in a protected directory, and the Web server delivers the userid of the user who launched the request in an environment variable. The CGI script looks in the DBM database for the sites this user may put into production, who last published each one, and when. It generates the static HTML code together with the object and constructor definitions of the JavaScript; the JavaScript handles all the dynamic features of the page. With the information from the database, the script then constructs the dynamic part of the JavaScript.
#!/usr/bin/perl -w
require "DBMUtil.pm"; # Utility functions to use the DBM database
use CGI ; # The CGI Library from Lincoln Stein
##################################################################
# S e t P a r a m e t e r s
#
my $DirRichieste = "/d01/Web/transfer/accepted/" ;
my $ConfigFile = "Editors.conf" ;
. . .
# Create CGI Object
$query = new CGI ;
print $query->header() ;
print $query->start_html(-title=>'Web Site Publisher',
-onLoad=>'InitForm()',
-script=>{ -language => 'JavaScript',
-src=>'./admin.js' },
-author=>'WebAdmin@Intranet.GlaxoWellcome.it');
&open_databases($DataBase);
&ReadData($Publisher) ;
($NamePublisher,$Role) = &getUserInfo($Publisher) ;
&close_databases();
##################################################################
##################################################################
print $query->start_multipart_form(-name=>'Administration',
-target=>'ProdWindow',
-onSubmit=>'PutInProd.cgi'
);
. . .
# Here we create a scrolling list that contains only a placeholder,
# to reserve enough space for the entries still to come.
print $query->scrolling_list(-name=> 'PathNames',
-values=> ['DUMMYDUMMYDUMMYDUMMYDUMMYDUMMY'],
-onChange=> 'ShowProperties(PathNames)',
-size=> 10 ) ;
. . .
The ReadData function actually generates this JavaScript:
function InitForm() {
InitDirList();
AddEntry($Directory,$Type,$Date,$UserId,$UserName) ;
. . . .
}
The AddEntry call is repeated for every Web site the publisher can put into production. The JavaScript program included in the HTML page contains all the object declarations and all the function definitions, such as InitDirList() and AddEntry(). When the user submits the form, the script creates a file in the requests directory and displays a confirmation message. The file contains the information necessary to put the Web site into production: user name, userid, date, and Web site.

The Administration Utility

Everything is defined in a database. For user and directory administration, there is a Web-enabled application that allows updates to the database. The administration utility is written in the same way as the procedure that puts a Web site into production: the CGI script generates the static HTML code and the JavaScript object definitions, and the JavaScript objects are then filled with data from the database using the previously defined constructors. Administration can be delegated, because the user table defines whether a user is a publisher or an administrator. The form of the P2P application is generated on the fly by a CGI script from the information found in the database; if the user is defined as an administrator, the CGI script generates a button that permits entering admin mode.

The cron Job

The cron job is executed every five minutes. It reads the requests directory, produces a compressed tar file of the Web site to be put into production, cancels the file from the requests directory, generates a file in the scheduled directory, and sends email to the publisher indicating that the photograph of the test site has been taken. The file in the scheduled directory records where the compressed tar file is kept, along with the userid, user name, date, and Web site. Finally, the job generates the shell script that actually puts the Web site into production.

The Automatically Generated Shell Script

The shell script that is generated automatically is very simple.
Here is an example:
#!/bin/sh
cd /htdocs/ProdServer/Research/Microbiology
tar cvf - . | gzip > /d01/Hobbit/OldData/Research_Microbiology.tgz
mkdir /htdocs/ProdServer/Research/Microbiology_new_<JobNr>
cd /htdocs/ProdServer/Research/Microbiology_new_<JobNr>
gzip -cd /d01/Hobbit/Data/Research_Microbiology.tgz | tar xvf -
mv /htdocs/ProdServer/Research/Microbiology /htdocs/ProdServer/Research/Microbiology_old
mv /htdocs/ProdServer/Research/Microbiology_new_<JobNr> /htdocs/ProdServer/Research/Microbiology
rm -rf /htdocs/ProdServer/Research/Microbiology_old

Since the generated shell script contains all variables substituted with their actual values, it is very easy to debug. The new site is populated while users can still see the old site in production; the window in which the state of the production site is undefined is limited to the execution of the last two mv instructions.

Last but not Least: the cron Job That Launches the Shell Scripts

The last thing to do is to launch these shell scripts. The cron job looks in the directory of scheduled jobs, launches the automatically generated script for every job, sends an email confirmation to the publisher, and cancels the entry in the scheduled jobs directory.

Extending for Urgent Updates

Some pages need several updates a day, so it is not possible to wait until night to update them. However, this can easily be accommodated. Since it concerns only single pages, it is sufficient to substitute the procedure that takes the photograph (the compressed tar) with a procedure that copies the relevant file from test into production; obviously, no further steps are needed. This means that we have two independent applications, each using its own set of scripts and its own database.

Future Development

The current solution still has some weak points. The first one is the way the pages are put on the test server.
In order for every Web site to be independent of the others (while still allowing sharing among several editors), each must be a file server of its own. With the rapid growth in the number of Web sites, the administration overhead grows as well; if we take into account that Web sites are also dismissed, we can imagine what will happen in the next 12 months. One solution would be for Samba to support access control lists: this would allow a few independent file servers, with the administration of access rights delegated to the admin responsible for each site.

Another problem is the great waste of disk space. As Web sites become larger and larger, it is no longer practical to use this procedure; consider, for example, an 800-MB Web site containing many images and PDF files, where every publication produces another full compressed copy of the entire site.
Conclusion

In this article, I presented a system that makes it possible to move documents from a test to a production server in a multi-user environment. The procedure that puts the pages into production and the administration utility are both Web enabled, and the access rights are defined in a database. The Web server does not have to run under a particular userid, because it only writes into a staging area; that area is then read by a process running under a privileged userid. This system offers the advantage that there is only one entry point into the production system, which allows registration of the documents (e.g., in a database connected to a search engine) and ensures that only documents respecting company standards, such as naming conventions, are put into production. The system has its limits where large numbers of editors or a huge amount of disk space are involved; a new project will overcome these limitations.
About the Author
Reinhard Voglmaier studied physics at the University of Munich in Germany and graduated from the Max Planck Institute for Astrophysics and Extraterrestrial Physics in Munich. After working in the IT department at the German University of the Army in the field of computer architecture, he was employed as a specialist for automation at Honeywell and then as a UNIX systems specialist for performance questions in database and network installations at Siemens Nixdorf. Currently, he is the Internet and Intranet Manager at GlaxoWellcome, Italy. He can be reached at: rv33100@GlaxoWellcome.co.uk.