Article	Figure 1	Figure 2	Figure 3	Listing 1
Listing 2	Listing 3	Listing 4	Listing 5	mar2006.tar

Automatic Program Generation with MySQL and PHP

Dick Munroe

It's all about leverage. During the most recent presidential election, I found myself "between engagements" and decided to go to Florida and work for the Kerry campaign. After spending three days with another volunteer building PCs and installing networks for the Florida Democratic Party headquarters and the north Florida offices, I found myself behind the keyboard building Web applications with the rest of the IT department. I should qualify that. I was the IT department, at least the technical staff part of it.

My responsibilities including designing, implementing, and deploying new Web applications for everything from e-commerce (donations, etc.), volunteer and candidate management, and results reporting and statistical analysis. All this had to be done as inexpensively as possible.

This article is one part praise for the free software community, another part discussion of one of the techniques I used to create applications quickly enough to be useful before the election, and one part praise for the people who staff state and local political offices. It's a tough job; they don't get paid much, and most of them do it out of the sheer hope that something better will result from the process.

The Computing Environment

What I did, mostly, for the Florida Democratic Party was to grind out Web applications. At the state and local level of political parties, cost is everything, and in the Unix environment, the free software movement has provided most generously. When I arrived, the principal Web development tools were MySQL and PHP used to deploy content from an Apache Web server running on FreeBSD.

MySQL is an SQL-compliant database capable of scaling smoothly from very small to very large data stores, supporting transactions and queries over the Internet, etc. It's free, very well supported, and performs well. We used it in every Web application deployed during my work with the FDP.

PHP (Personal Home Page) is yet another "Swiss army knife" language for developing Web pages. PHP version 4 is basically a third-generation programming language with object-oriented extensions allowing inheritance, polymorphism, and introspection. PHP also has an extension interface, and extensions (dynamically linked libraries) to the language have been written for everything from graphics to database interfaces. The FDP used PHP 4. Since the election, PHP 5 has been released with significantly improved object-oriented capabilities. All the work discussed here is implemented in PHP 4 and easily ported to PHP 5.

A full discussion of Apache and FreeBSD is beyond the scope of this article. However none of the work we did was specific to either Apache or FreeBSD. All that was needed was a server capable of running PHP and a platform that allowed integration of PHP with a Web server and communication with a MySQL server. Technically speaking, the server running Apache didn't need to be capable of running MySQL, but it was convenient that it could.

Each application at the FDP required the design of one or more tables in the FDP primary database. MySQL is pretty easy to deal with when doing rapid, interactive database design, and initially that's what we did. Eventually, we started using a terrific MySQL-specific database design and maintenance tool called phpMyAdmin. All the database table examples and figures for this article were created using phpMyAdmin. If you're going to be responsible for any aspect of MySQL database administration, phpMyAdmin is a must.

The development workstations were all Windows boxes running Windows/2000. It's not my favorite platform, but all the hardware, and I do mean all, was donated and workstations were constructed from whatever parts could be salvaged. Fortunately, there are free (or at least inexpensive) editors with language-specific extensions for the Windows environment. I also downloaded Cygwin (a Unix CLI layer for Windows) and used EMACS and other standard Unix tools for development on the Windows box.

Application Development at the FDP

The first job I tackled was fixing up an application written by a couple of college student volunteers doing volunteer management. The overall structure of the volunteer management application was done as follows:

[Present data to the user]

Get data from the user
Validate user data
Store/Update using MySQL

Each "layer" of the application was done in an ad hoc manner. Each page of the application basically duplicated whatever code was needed to interact with the user, the database, data validation, etc. An enormous amount of the code written for this application dealt with the interface to the MySQL database. Starting sessions, validating data, storing or updating data, and closing sessions were all coded explicitly. No attempt had been made to factor out the database access details into a separate function library or class, and the quantity of the code dedicated to dealing with the database obscured the details of the application under development, making it a lot harder to extend. Eventually, I finished this task and moved on to the next.

Accepting Donations

The next application was for accepting donations. The details of the application aren't important; what is important is another Web-enabled database application similar in kind, if not in detail, to the volunteer management application.

Given what I had seen in the volunteer management application, I wanted to come up with a more general view of the FDP's Web-enabled applications and then use that view to drive the creation of tools that would make it easy to implement those applications. After some thought, I realized that most of the applications used regularly by the FDP could be modeled as follows:

[Query MySQL Database]
Display data to the user
Collect data from the user
Organize collected data
Validate collected data
Store/Update data in MySQL

Overall, an application should do the following:

If data were being changed, one or more queries would be made to the appropriate database.
Data (if any) would be displayed to the user.
Data would be collected from the user.
Data would be organized in structures making it simple for validation and eventual storing into the appropriate database tables.
Data would be validated and any additional interactions with the user would be done to correct any errors.
Data would be stored or updated in the MySQL database.

Given that data was being stored in a MySQL database, I thought the underlying data abstractions should closely model the relational storage model (i.e., tables) while making the usual sorts of queries done by the applications easy and not prohibiting the writing of substantially more complex queries. Ideally, most of the "what data needs to be read/written" from the database would be "automagically" determined so that the applications could say things like "update the data" and the correct action would occur.

Clearly the place to start was a decent database abstraction layer that hid most of the tedious details of dealing with MySQL, while not overly restricting access to MySQL's lower-level features. This problem has been solved many times in the past, and a quick Google search turned phpclasses.org, an enormously useful site if you're into PHP programming.

The phpclasses.org site is dedicated solely to the collection and distribution of PHP class libraries. The code distributed by the site comes from all over the world and varies in purpose from the sublime to the ridiculous and in quality from completely professional to totally amateur. I've saved myself a lot of time and my clients a lot of money by using things I've found there either in whole or in part. If you're doing any serious PHP software development, you owe it to yourself to join phpclasses.org. It is free, but donations are accepted.

On phpclasses.org, I found the DB class. DB provided an object-oriented interface to MySQL. Note that I didn't care that the only database supported was MySQL. The FDP had standardized on a Linux, Apache, MySQL, PHP (a.k.a. "LAMP" or LAMP-like anyway since the platform was actually FreeBSD) environment so portability was not of immediate concern. One reason I chose the DB class was that, if necessary, DB could be ported easily to support other databases.

This solved the problem of cleaning up the tedium of accessing the database but did not address the problem of a general database table-oriented data collection that could easily interact with any database. Thus, I designed and wrote the SQLData class. This class has evolved over time, but basically it does the following:

Associates an instance of the class with a specific table in a database.
Is organized so that data to be stored in or fetched from the database can be manipulated by the name of the field.
Tracks the state of data in the instance, so that minimal updates can be automatically performed.
Provides iterators so that loops processing entire tables can be easily written.
Provides hooks for structuring data as arrays or object, potentially arrays or objects stored in other tables.

After designing the table for the donations application, I began the coding process. The table design for the donations application is shown in Figure 1. It's pretty straightforward, with much of the complexity resulting from the e-commerce interface rather than anything fundamental in the nature of the problem of donating money.

For every field defined in the table, we wanted a way to access, modify, and store the field's data. Even using SQLData, I had a lot of work to do. Each field needed a read/modify interface (plus additional ones that came in handy later, like controlling the "dirty" state of fields). This involved a lot of error-prone typing. Much of the work also looked to be "cut and paste", so I thought maybe I could get a computer to do it.

One of the greatest things about the architecture of SQL databases is that the meta-data, the data describing the database and its content, is stored in an SQL database. Simple queries with results like those shown in Figure 2 allow easy introspection of the tables in the database and the contents of those tables. Given that the meta-data is available to programs in general and to PHP in particular, it becomes relatively easy to write programs to process database table structures in a very general fashion. Table names, fields within tables, data types of fields, use of fields as keys, etc. are all available for processing.

Once I realized that this data was available, I quickly designed and wrote a simple PHP program to generate classes from the meta-data of a MySQL table. A day later, I had a program, buildClass.php, which reads the meta-data of a MySQL table in a database and emits a class derived from SQLData, which provides the framework for manipulating data within a single table of a database. Listing 1 is a partial listing of the generated class. Here is an explanation of the numbered sections of the listing.

1. The various underlying components of the application generation, in particular the SQLData class from which all specific table classes are derived and the Structured Data Dumper class that is used. To make the code generated by the programs referred to here, SQLData and SDD must be installed in your PHP include path. See the PHP documentation for details.

2. Every table-specific class is derived from SQLData, a class supporting generic table data storage and updating. Essentially, the table-specific classes are convenience classes to make dealing with specific tables easier. Note that the first character of the table name is in uppercase to make the class name.

3. This provides the binding between this class and the underlying MySQL table.

4. The constructor for the table-specific class. Because the table name is wired into the class, the remainder of the MySQL database access information must be provided when the class is instantiated. No data-oriented constructors are provided, because most of the table-specific class data initialization occurs either as a side effect of accessing the table through the underlying SQLData interfaces or from interactions with a user through Web forms.

5. The set* member functions set the named field to a value and note that the value is now "dirty" and should be flushed to the database when the next update or insert operation is issued. A set* member function will be created for each field in the table meta-data.

6. The get* member functions get the named field value and return it to the caller. A get* member function will be created for each field in the table meta-data.

7. The init* member functions are the same as set* functions, but the data is not set as "dirty" and will not be flushed to the database when the next update or insert operation is issued. An init* member function will be created for each field in the table meta-data.

8. The un_set* member fu nctions delete data for a field from the underlying SQLData class. Once deleted, the field data is no longer considered in insert or update operations. An un_set* member function will be created for each field in the table meta-data.

9. The is_set* member functions are predicates returning true if the field has data in the underlying SQLData class. Data is stored for a field using either the set* or init* member functions. An is_set* member function will be created for each field in the table meta-data.

10. The insert and update member functions (available via SQLData) both take an optional selector to indicate which record in the table should be modified. For properly designed tables with keys, it is generally possible to provide a set of default selectors, depending upon which keys in the table currently have data associated with them, to be used when a selector is not provided. The needUpdateSelector member function is overridden when possible to provide the default selectors.

11. For tables with indices that are auto_increment fields (see Figure 2), the auto_increment fields are updated automatically by MySQL after an insert operation has succeeded. The insert member function is overridden as necessary to make sure that the value of the auto_increment index is maintained when a new row is inserted in the table.

12. Last, but not least, debugging. The print_r member function dumps the content of the object (and its base objects) in a structured format that makes it easy (or at least possible) to see what's happening within the table-specific object. If the execution environment is a Web server, then the data is dumped into HTML format.

This process simplified the job of developing the donations application enormously, and I then began building the user interface that would use the mechanically generated table class.

Data Collection and Validation

Very shortly, it became clear that the user interface was a problem similar to that of the table classes (i.e., there were one or more tables for which forms had to be generated). Furthermore, the data was to be syntactically and semantically verified, any errors were to be corrected by further interactions with the user, and the data stored in the appropriate tables in the MySQL database. Because I had a pattern for generating executable "stuff" from MySQL meta-data, I decided to see what, if anything, could be done to generate rough drafts of the forms necessary to collect data to be stored in the necessary table.

To keep the collected data from corrupting the tables (garbage-in, garbage-out applies here), the collected data had to be verified. I partitioned the data verification into two distinct types:

Syntactic
Semantic

Verification could be done in two places, the client or the server. To improve responsiveness, syntactic validation using JavaScript (now EMACScript) is done at the client (i.e., the Web browser and semantic validation at the server).

Examples of syntactic validation are:

Zip codes must contain only digits and hyphens and can be either of Zip (01234) or Zip+4 (01234-5678) format.
Telephone numbers must be digits and must be 10 digits long.
Required files must be non-blank.

Occasionally, there would be a field that implied that others would no longer be optional, and I defined those to belong to syntactic verification.

I defined semantic validation to be "checking that data is meaningful in a given context". Examples are:

Is the county or state name real?
Does the city exist?
Is the credit card number valid?

Basically, semantic verification answers the question "does the data represent reality" in the context of the application. In many cases, semantic validation can be built in by restricting the input values to a particular range (i.e., forcing the user to select from a list of values such as county or state names).

So, there were three additional pieces to be examined for automation opportunities:

User Interface
Syntactic validation
Semantic validation

User Interface

As shown in Figure 2, the table meta-data has the basics, field name, data type, size of data, and whether data is required (not null). Given this information, it was easy to write another program much like the buildClass.php program to construct a simple user interface using HTML forms to display and capture data and to link that interface to the client-side syntactic validation framework and the server-side semantic validation. Figure 3 has been edited for space reasons and shows part of the user interface generated for the Example table.

It is important to remember that my goal for the FDP was not to produce a completely polished and fully functional Web application solely from MySQL meta-data. It was only to produce something that, with not much effort, could be turned into a polished functional Web application.

A quick look at the HTML generated by buildForm.php (Listing 2) is instructive. Here is an explanation of that listing:

1. These are the hooks to the "syntactic validation framework", discussed more fully in the next section. Unless buildSyntacticValidation.php has been run before buildForm.php, the JavaScript components of the syntactic validation will not be included.

2. This is where the syntactic validation framework is actually invoked. When a Submit button is clicked, the onSubmit code is called, and the form is not actually submitted unless or until all syntactic errors have been corrected. If buildSyntacticValication.php has not been run, the onSubmit code is omitted and no syntactic validation will be done.

3. Another hook for the syntactic validation framework. This row is where the error information (if any) will be displayed by the framework.

4. This is a typical required form field. Validation for the field is provided by the validation attribute. Note the required attribute for required fields is non-null.

5. This is a typical optional form field. Validation is still required (the validation always succeeds by default) and is done for consistency. Note that the required attribute is the null string for optional fields.

6. The action portion of the user interface. Save causes the captured data to be [optionally] syntactically validated and sent to the server for further processing. New clears the form and starts the data capture process over. Reload discards any data changes and starts the data capture process over.

Syntactic Validation

The syntactic validation requires JavaScript (now known as ECMAScript). Most modern browsers support JavaScript, so this requirement isn't all that restrictive, excluding text-only browsers such as lynx or links. It relies on the Document Object Model (DOM) as defined by the World Wide Web Consortium.

This gets a little more troublesome because Microsoft's Internet Explorer, in particular, is not very compliant with the DOM. It was possible to design an adequate browser-independent framework providing primitives to handle collection and displaying of error data, validation of required fields, and a driver to validate a form upon submission.

Since my goal was to generate most, but not all, of a Web-enabled application automatically, the validation hooks had to be associated with the individual fields rather than be generated monolithically. Furthermore, not every form would necessarily need syntactic validation.

This is handled by running (or not running) buildSyntacticValidation.php. This program generates the JavaScript routines to do the syntactic validation for each field in the form. If buildSyntacticValidation.php is not run, when the form is generated by buildForm.php no syntactic validation will occur at the time the form is submitted.

The syntactic validation framework is provided in a separate JavaScript file, syntacticValidationFramework.js (available from the Sys Admin Web site: http://www.sysadminmag.com/code/). Understanding the details of the DOM that allow the framework to work is left as an exercise for the reader.

Listing 3 shows the hook between the user interface (form) and the syntactic validation framework. The onSubmit action is taken when a submit button is clicked. A pointer to the form object requiring validation is passed to the framework and the contents of the form used to determine which validation needs to be done. If the validation framework returns false, the form's data is not sent to the server.

The Document Object Model requires all attributes of a tag to be represented. This means that the page designer can put anything into an arbitrarily named attribute. Each field to be validated must have a unique id, an indication of whether the field is required, and a pointer to the routine to be used to validate the field's contents.

Listing 4 shows a typical input tag with the hooks for the syntactic validation framework. Here is a quick explanation of the listing:

1. The name of the field received by the server during semantic validation and processing.

2. The unique id of the field. By convention, it is always the same as the name field. The DOM interface is most easily used if each tag has an id by which the field can be found.

3. If the field in the database must have a value, the required field must be non-null and must contain the message to be presented to the user should the contents of the field be omitted.

4. Invocation of the validation framework routine for the field.

BuildSyntacticValidation.php generates a JavaScript routine identical in all but name for each field in the user interface. The validation framework calls the validation routines with a pointer to the field to be validated. Each validation routine must return either false (validation failed) or true (validation succeeded) and prepare any error text to be displayed at the end of validation.

The current syntactic validation framework can be easily modified to fit a variety of error reporting and interaction styles without modification to any of the generation code. Or, something completely different can be written to meet site-specific requirements. After all, the generating code is in the public domain.

Semantic Validation

The data hits the database in semantic validation. Assuming that the form passes syntactic validation when the user presses the Save (Submit) button, the contents of the form are sent via HTTP to the server. For the purposes of the FDP, semantic validation basically took the data from the form and put it in the databases.

The semantic validation code is generated by buildProcessForm.php. The semantic validation actually doesn't happen by default. As generated by buildProcessForm.php, data was either inserted into or updated in the database and then control returned to the user for further work. Any semantic validation was considered custom code to be written by hand. This is consistent with my goal of generating most but not all of the application.

The Complete Process

I have cleaned up the application generation code somewhat for this article, so while all these pieces happened, they didn't happen as distinctly as I describe here.

At this point in the process, we had a set of tools that allowed the FDP applications to follow a very well-defined, data-driven software development process. All of these tools were either free software, shareware, or easily developed in-house. The tools and process were:

1. (phpMyAdmin) -- Design the database schema for the application. Database design was frequently a group interactive event, starting on a blackboard and quickly moving to a Web browser or terminal. Only one database design only took more than an hour from discussion to completed design.

2. (buildDatabaseConf.php) -- Generate the configuration file containing the information necessary to access the MySQL database.

3. (buildClass.php) -- Create the PHP class to make it easier to manipulate the data. All the PHP code generated by the build* programs use the generated class.

4. (buildSyntacticValidation) -- Create the individual field validation routines for the syntactic validation framework.

5. (buildForm) -- Create the user interface.

6. (buildProcessForm) -- Create the PHP code to store/update data in the database.

7. (Apache, FreeBSD) -- Install the generated application on the development Web server.

8. (Netscape, Internet Explorer, MySQL) -- Test the user interface/database interaction.

9. (emacs, UltraEdit, Putty, WinSCP) -- Modify the generated application to fully function and integrate with the production FDP Web site.

With these tools, we could go from the completion of a database design to a prototype application in a matter of minutes.

Conclusion

The techniques associated with automatic program generation are well understood and widely used in many application generators on any number of platforms. The general case is extremely difficult to solve, even if the problem is restricted to simple Web pages with a database as a backing store. I'm working on more general versions of the tools I developed for the FDP, but progress is slow at the moment, with clients and other interests interfering with the process.

My goal at the FDP was to do most of the work, and I spent a few days writing tools that would make turning out finished applications much faster and easier. Experience showed that 80% or so of the finished application could be automatically generated. Finished applications, fully integrated with the production FDP Web site, were frequently produced in a day with the vast majority completed in less than two days. In all likelihood, similar productivity levels can be achieved elsewhere with site-specific versions of the tools described here.

All code developed for the FDP discussed in this paper is available from:

http://www.phpclasses.org

You will have to join phpclasses.org to download the code, but the membership is free.

PHP is available from:

http://www.php.net

MySQL is available from:

http://www.mysql.com.

These tools can be built and run on Unix, Linux, and Windows platforms.

If you want a Unix-like environment for your Windows box, Cygwin is available from:

http://www.cygwin.com

If you're going to do any serious MySQL database development, phpMyAdmin is a must and can be downloaded from:

http://www.phpMyAdmin.org.

All of these are free software.

Acknowledgements

I thank Chris Sands, the director of IT for the Florida Democratic Party, for permission to publish this work. Many thanks to all the folks who supported the FDP during the 2004 campaign; lots of nights wouldn't have been possible without the free Diet Coke and junk food. I also thank Manuel Lemos, the creator of http://www.phpclasses.org. My job at the FDP would have been substantially more difficult if his site didn't exist. And, last but definitely not least, thanks to the many unnamed developers who have created so much fine software and put it into the public domain, particularly those folks who have contributed to PHP, MySQL, Apache, Cygwin, and FreeBSD. Without those tools, the work we did in Florida would have been impossible.

Dick Munroe is a software engineer, architect, and consultant with nearly 40 years of software and project experience ranging from the sublime to the ridiculous. He grinds code and wood from his offices at Cottage Software Works in Belmont, Massachusetts, Havana, Florida, and Guanaja, Honduras. When playing he can frequently be found at the top of mountains wondering whether they will find the pieces come springtime, or deep in the water worrying whether that shark is really as hungry as it looks. If you need more information about the work described herein or just want to touch base, contact the author at: munroe@csworks.com.