Dr. Dobb's Digest July 2009

Debugging MySQL Stored Procedures

Being able to do low- or no-cost logging with Stored Procedures is an extremely useful technique

By Brian Tarbox

Brian Tarbox is a Principal Staff Engineer in Motorola's Home and Network Mobility group.


Stored Procedures are programs that execute within a database server. They are usually written in a database language such as PL/SQL or ANSI SQL:2003 SQL/PSM. (Granted, some database servers do support Java stored procedures, but I don't examine them here.) There are any number of books for learning to write Stored Procedures (MySQL Stored Procedure Programming, by Guy Harrison, and Teach Yourself PL/SQL in 21 Days, by Jonathan Gennick and Tom Luers, come to mind), and any number of good reasons to write code in a stored procedure in the first place.

Regardless of the reasons for choosing to write a stored procedure, the problem remains of how to debug one. What if you could debug in both development and production at little to no cost to the performance of the Stored Procedure? Traditional debuggers generally do not work with stored procedures, which can leave a developer with a fast but broken procedure executing within the database server.

Approaches That Don't Work

Debug the SQL in your Stored Procedure. This approach works on the assumption that the main logic of your Stored Procedure is the actual DDL and DML within the procedure -- in other words, the queries, inserts, and so on. It assumes that the rest of the Stored Procedure is largely scaffolding to support the database operations. In many cases this is a valid assumption; after all, if the Stored Procedure weren't manipulating the database you probably wouldn't have written it as a Stored Procedure.

It goes without saying that regardless of how much non-SQL code you have in your Stored Procedure, you need to validate the SQL itself, especially since this level of testing can be relatively straightforward. It can be as simple as starting your database command-line tool (or query browser, for the GUI inclined) and pasting in the guts of your SQL statements to verify correctness. This goes beyond simple syntactic correctness, of course; you must validate the semantic correctness as well.

In some cases, however, it's not quite that simple, for a couple of classes of reasons. First, your SQL code can (and usually will) rely on variables and parameters that have been defined and/or manipulated by the Stored Procedure. If you have a select statement that stores its results into a variable, and a later SQL statement that uses that variable, then the "paste the SQL into the command line" approach to testing gets a bit harder. You have to paste in one or more statements, execute them, perhaps create temporary variables along the way, and possibly modify the SQL you are actually trying to test. This happens by degree, but you can certainly reach a point where it's clear that you're no longer testing the SQL you started with.
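As a sketch of what this looks like in practice (the table and column names here are hypothetical, invented for illustration), a procedure's local variable can be approximated at the command line with a MySQL user variable:

```sql
-- Inside the procedure, countryId would be a DECLAREd local variable.
-- At the command line we stand it in with a user variable:
SET @countryId = NULL;
SELECT id INTO @countryId FROM country WHERE name = 'France';

-- Now the statement actually under test can be pasted in,
-- with the local variable replaced by the stand-in:
SELECT city FROM cities WHERE country_id = @countryId;
```

Each substitution like this moves the pasted SQL a little further away from the SQL that actually runs inside the procedure.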

The second class of problem with this approach is that often the logic of the Stored Procedure lives in the procedural code of the procedure and not in the SQL statements themselves. SPs are commonly used to implement business logic, and this is usually embodied in the flow of the code through the procedure or procedures. In this situation, simply bench-testing the SQL statements does not really test the procedure.

Insert print statements in your Stored Procedure. Another common approach is to sprinkle print statements throughout your procedure. This has also been described as "Sherman, set the Wayback Machine to 1980" or so, when print statements were about the only game in town. This approach can actually be very useful, especially during the early stages of development. Each database server tends to have its own way of doing print statements, and each has its own idiosyncrasies. For example, when using MySQL concat() calls to build up a string to output, you have to guard against null values, which turn your entire string to null. The following code can be dangerous:

select someColumn into myVar from someTable where someCondition;
select concat('better hope myVar is not null: ', myVar);

If the where condition results in no rows being selected, then myVar might be null and the output of the concat will also be null. It is better to use concat_ws('delimiter', 'text to store', ...), which skips null arguments rather than letting them nullify the whole string.
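The difference is easy to see at the command line. CONCAT returns NULL if any argument is NULL, while CONCAT_WS skips NULL arguments (only a NULL separator nullifies the result):

```sql
SELECT CONCAT('myVar is: ', NULL);        -- yields NULL: the whole message is lost
SELECT CONCAT_WS(' ', 'myVar is:', NULL); -- yields 'myVar is:' -- the NULL is skipped
```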

There are two main drawbacks to using print statements in this way. First, the print statements are live in production (unless you guard each one with a conditional flag), meaning that you pay the performance penalty for logging all the time.
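The conditional-flag guard might look like the following sketch, using a session variable as the debug switch (the variable name and message are made up; the IF statement must appear inside a stored program body):

```sql
-- Set in the session only when debugging: SET @debugOn = 1;
IF @debugOn = 1 THEN
    SELECT 'about to update the orders table' AS debug_msg;
END IF;
```

This avoids the output cost when the flag is off, but every logging site still carries the clutter of the guard.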

Second, and more serious: if your stored procedures are invoked from a Java application, the print statements don't go anywhere. They can only be seen if you execute your Stored Procedure from the command line. What's the point of log messages that you can't see?

Develop a rigorous set of return codes. In this approach you define a detailed set of return codes to cover all interesting cases. The implied contract here is that a given return code tells you everything you need to know about the execution of the procedure. Theoretically this is a fine approach, but in the real production world it tends to fall apart. A return code might tell you what finally went wrong with a procedure, but it's just too easy to imagine needing to know more about how the procedure got to that failure condition.

Put another way: if you get a support call from your most important customer at 3:00 AM, do you want a grand total of one return code to tell you what went wrong?

The Proposed Approach

The approach proposed here uses several of MySQL's special features to create a logging scheme that is both robust and practically free. We create two tables: a temporary log table using the Memory engine and a permanent log table using the MyISAM engine. MySQL supports several storage engines that act as handlers for different table types, including both engines that handle transaction-safe tables and engines that handle non-transaction-safe tables. The Memory engine performs all operations in memory and never writes to disk; as such it is very fast, though transient. The MyISAM engine is non-transactional. A transaction can touch both transactional and non-transactional tables; the non-transactional tables simply ignore the transactional commands.

Log messages are inserted into the tmplog table, which is a practically free operation since that table lives in memory. The extremely low cost of this operation means that a developer can be very verbose in their use of logging. Rather than agonize over whether to log something or not, you can simply log at will.

In the (hopefully) usual case where nothing goes wrong, the Stored Procedure does not have to do anything. Temporary tables only exist for the duration of a connection. In the typical J2EE usage pattern, an external request arrives at the system, a connection is retrieved from the connection pool, used, and then returned to the pool. When the connection is returned to the pool the temporary table is effectively dropped -- the code does not have to do anything to remove the log messages that were inserted during the successful invocation of the Stored Procedure. In this way the system pays little to no performance cost for a successful operation.

When the Stored Procedure detects a failure condition, it does an INSERT ... SELECT from the temporary Memory table into the permanent MyISAM table, preserving all of the log messages that were written to the memory table. In this way the system records all of the information it needs, but only in the cases where it needs it.

Log4J users can imagine running their production system at DEBUG level for all the failure cases while paying only the overhead of WARN level for all the successful cases -- without having to know ahead of time which cases were which.

An important point is the choice of the MyISAM engine for the permanent log table. Remember that rows are typically written to that table only when things are going badly, which usually means the current transaction is about to be rolled back -- yet we want the inserts into the log table to survive. Because MyISAM is not a transactional engine, the inserts into the log table are preserved even if the transaction is rolled back -- which is exactly the desired behavior.
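This behavior can be verified directly at the command line. The sketch below assumes the permanent log table described later, plus a hypothetical InnoDB table orders(id int) standing in for the application's transactional data:

```sql
START TRANSACTION;
INSERT INTO orders VALUES (1);            -- InnoDB: undone by the rollback
INSERT INTO log(thingID, msg)
    VALUES (42, 'order 1 failed validation');  -- MyISAM: not undone
ROLLBACK;

-- The orders row is gone, but the log row survives:
SELECT msg FROM log WHERE thingID = 42;
```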

The Code

There are four SPs defined in the debugLogging.sql package. Two SPs help set up the tables to be used, one performs temporary logging, and the last copies messages to the permanent table.

This first procedure creates both the temporary and permanent tables. Note the use of the engine specifier to distinguish the two tables. The temporary log contains a single interesting column, msg. The permanent table adds an automatic timestamp and a thingID. The assumption is that the logs are written during an operation that creates, destroys, or modifies some object, and that that object has a unique identifier. In the video-on-demand space that I work in, this might be the identifier of a movie being streamed to a customer's set-top box.

Create Procedure setupLogging()
BEGIN
    create temporary table if not exists tmplog (msg varchar(512)) engine = memory;
    create table if not exists log (ts timestamp default current_timestamp,
        thingID bigint, msg varchar(512)) engine = myisam;
END;

This next procedure creates just the temporary log table. We have seen cases in the field where the temporary table does not exist at the moment we need to insert into it, so we have a procedure to recreate the table.

Create Procedure setupTmpLog()
BEGIN
    create temporary table if not exists tmplog (msg varchar(512)) engine = memory;
END;

The next procedure is the one that is called the most and performs the actual logging. A message parameter is written to the temporary table. There is a continue handler that creates the temporary table if it does not already exist.

Create Procedure doLog(in logMsg varchar(512))
BEGIN
    Declare continue handler for 1146 -- Table not found
    BEGIN
        call setupTmpLog();
        insert into tmplog values('resetup tmp table');
        insert into tmplog values(logMsg);
    END;

    insert into tmplog values(logMsg);
END;

The last procedure is the one to call when an error is detected within the user's SP and all of the logging needs to be saved for later analysis. This procedure takes a final message, presumably one saying what the final error condition was. It then inserts all of the rows from the temporary table into the permanent table, setting the timestamp along the way. To handle the situation where the user of the logging system chooses to continue after an error and then runs into another error, we make sure to delete all rows from the temporary table after inserting them into the permanent table.

Create Procedure saveAndLog(in thingId int, in lastMsg varchar(512))
BEGIN
    call doLog(lastMsg);
    insert into log(thingID, msg) (select thingId, msg from tmplog);
    truncate table tmplog;
END;

Sample Use

The following code sample shows a possible usage of the logging procedures. The example is based on the notion of a Stored Procedure that would be called from within another stored procedure. The mission of the inner procedure is to parse a comma-separated list of values and insert each value into a cache table for use in a query by the calling procedure. It illustrates the value of being able to log not just the final results but also the steps along the way, since it might be very valuable to know which iteration of the loop contained the failure.

Create Procedure parseAndStoreList(in thingId int, in i_list varchar(128), out returnCode smallint)
BEGIN
    DECLARE v_loopIndex int default 0;
    DECLARE Exit Handler for SQLEXCEPTION
    BEGIN
        call saveAndLog(thingId, 'got exception parsing list'); -- save the logs if things go badly
        set returnCode = -1;
    END;

    call doLog(concat_ws(' ', 'got list:', i_list)); -- say we got to the start
    parse_loop: LOOP
        set v_loopIndex = v_loopIndex + 1;
        call doLog(concat_ws(' ', 'at loop iteration', v_loopIndex)); -- say we got to the nth iteration
        -- actually do the parsing, or whatever;
        -- LEAVE parse_loop when the list is exhausted
    END LOOP parse_loop;
    set returnCode = 0;
END;
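Exercising the procedure from the command line might look like the following sketch (the thingId value and list contents are made up for illustration):

```sql
CALL parseAndStoreList(42, 'red,green,blue', @rc);
SELECT @rc;   -- 0 on success, -1 if the exit handler fired

-- After a failure, the preserved trail of messages can be inspected:
SELECT ts, thingID, msg FROM log WHERE thingID = 42 ORDER BY ts;
```

On success the temporary log simply evaporates with the connection; on failure the permanent table shows exactly how far the loop got before the exception.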

Conclusion

The ability to do low- or no-cost logging with Stored Procedures has proven to be an extremely useful technique. It lets you instrument your code with extensive information, which is available at production runtime, but only when needed.