Article

mar2006.tar

Debugging Web Applications

Ryan Matteson

During the past 10 years, companies have increasingly turned to the World Wide Web to sell products and provide quick access to a variety of support forums. The infrastructure used to support these systems is typically split into vertical tiers based on the type of infrastructure deployed (e.g., load balancers, Web proxies, Web servers, application servers, database servers). This tiered architecture approach allows companies to deploy multiple systems at each tier to ensure Web site resiliency, and it eases the process of upgrading systems because each tier can be scaled as demand for the companies' services increases.

There are numerous performance and availability benefits associated with tiered Web-based infrastructure, but there are also a few drawbacks. The biggest drawback is the complexity added by multiple systems, since pinpointing faulty systems in large deployments can be difficult. This article will provide an introduction to three open source tools that can assist with debugging Web-applications and pinpointing problems. A case study will also be presented to show how these tools can be used to solve a real-world problem.

Debugging with Curl

One tool that is invaluable for debugging Web-based applications is curl. Curl provides a full-featured, command-line environment that can be used to retrieve files, download Web-based content, and view the application-layer headers that are sent between clients and servers. To get started with curl(1m), the curl binary can be executed with the "-h" (print help menu) option to print the available options and values that can be passed to those options.

Curl will retrieve the resource passed as an argument and print the contents to standard out, or to the file passed to the "-o" option. The resource can be in the form of an http://, https:// or ftp:// style URL. The following example shows how curl can be used to retrieve the the curl source code:

$ curl -o curl.tar http://curl.haxx.se/download/curl-7.15.0.tar.gz
   % Total    % Received % Xferd  Average Speed   Time   Time    Time  Current
                                  Dload   Upload  Total  Spent   Left   Speed
  23 1709k   23  396k    0     0  40671      0  0:00:43 0:00:09  0:00:34 23822

Curl also contains numerous advanced debugging options, which can be used to test HTTP features, display verbose output, and retrieve protocol headers. The following example uses curl's "-v" (verbose output) option to display the HTTP request and HTTP response headers for a connection to the daemons.net Web server:

$ curl -k -v https://mail.daemons.net
* About to connect() to mail.daemons.net port 443
*   Trying 206.222.17.179... * connected
* Connected to mail.daemons.net (206.222.17.179) port 443
* successfully set certificate verify locations:
*   CAfile: /usr/share/curl/curl-ca-bundle.crt
  CApath: none
* SSL connection using DHE-RSA-AES256-SHA
* Server certificate:
*        subject: /C=US/O=mail.daemons.net/OU= \
         https://services.choicepoint.net/get.jsp?1605445126 \
         /OU=See www.rapidssl.com/cps (c)04/OU=Domain Control \
         Validated - StarterSSL(TM)/CN=mail.daemons.net
*        start date: 2005-05-20 22:09:48 GMT
*        expire date: 2006-06-20 22:09:48 GMT
*        common name: mail.daemons.net (matched)
*        issuer: /C=US/O=Equifax Secure Inc./CN=Equifax Secure eBusiness CA-1
* SSL certificate verify result: error number 1 (20), continuing anyway.
> GET / HTTP/1.1
User-Agent: curl/7.13.1 (powerpc-apple-darwin8.0) \
  libcurl/7.13.1 OpenSSL/0.9.7g zlib/1.2.3
Host: mail.daemons.net
Pragma: no-cache
Accept: */*

< HTTP/1.1 302 Found
< Date: Thu, 20 Oct 2005 18:49:20 GMT
< Server: Apache
< Set-Cookie: Horde=7ebcae69e30287045dd3d3d1fe1dd31f; \
  path=/; domain=mail.daemons.net; secure
< Expires: Thu, 19 Nov 1981 08:52:00 GMT
< Cache-Control: no-store, no-cache, must-revalidate, post-check=0, \
  pre-check=0
< Pragma: no-cache
< Location: https://mail.daemons.net/login.php?Horde=7ebcae69e30287045dd3d3d1fe1dd31f
< Content-Length: 0
< Content-Type: text/html; charset=ISO-8859-1
* Connection #0 to host mail.daemons.net left intact
* Closing connection #0

If reverse proxies or load balancers are used to distribute requests across multiple Web servers, it can sometimes be difficult to determine which Web server responded to a client's request. This is especially true when the client's source IP address is NAT'ed, or when the Web server is handling hundreds of requests simultaneously. When these issues arise, you can use curl's "--user-agent" (user custom user agent) and "-H" (send custom request header) options to send a unique user-agent and request header to the server:

$ curl -v --user-agent "CURL DEBUG (`date`)" -H "X-foo: yikes"  \
  --show-error daemons.net

If the Web server is configured to log the user-agent attribute, the string "CURL DEBUG" will be logged along with a date timestamp:

$ tail -1 access_log
10.10.10.10 - - [26/Oct/2005:19:11:24 -0400] "GET / HTTP/1.1" \
  301 252 "-" "CURL DEBUG (Wed Oct 26 13:11:24 EDT 2005)"

When a problem is detected with the content returned from a server, this information can be used to easily find the server that is returning the errant content. The case study below, "Debugging sporadic Website behavior", will show how this feature was used to debug problems with an Apache Web server.

Debugging with Chaosreader

When debugging complex Web applications, the ability to view the complete client-server interaction and drill down to specific requests and responses can be invaluable. This capability is available with the freeware Ethereal and chaosreader utilities. Since Ethereal is covered thoroughly in numerous books and online articles, I will focus on the capabilities of chaosreader in this section.

Chaosreader is written in Perl and produces reports from tcpdump or snoop capture files. These reports include information on TCP, UDP, ICMP, and IP traffic, and they contain the application layer data from several well-known protocols. To get started with chaosreader, you'll need to download the Perl script from Sourceforge:

http://chaosreader.sourceforge.net/

Once the script has been downloaded to a local file system, the script can be executed with the "--help" option to display the available options and several practical examples.

Tcpdump and snoop are used to collect network traffic that can be analyzed by chaosreader. The following example uses tcpdump to write all packets with a source or destination port of 80 to a file named chaosreader.dump:

$ tcpdump -i en0 -s 1518 -w chaosreader.dump port 80

This example uses a snap length of 1518 bytes to ensure that all protocol headers and application data are captured. To get the most benefit from chaosreader, the packet captures should be taken when a problem is detected with a Web or application server. To analyze the packet capture with chaosreader, the file with the saved packets can be passed as an option to the chaosreader script:

$ chaosreader.pl -D html chaosreader.dump
  0003  192.168.1.8:55510,209.249.116.195:80           http
  0008  192.168.1.8:55515,209.249.116.197:80           http
  0004  192.168.1.8:55511,209.249.116.195:80           http
  0002  192.168.1.8:55509,209.249.116.195:80           http
  0005  192.168.1.8:55512,209.249.116.195:80           http
  0011  192.168.1.8:55518,209.249.116.197:80           http
  0001  192.168.1.8:55508,209.249.116.195:80           http
  0009  192.168.1.8:55516,216.52.17.116:80             http
  0006  192.168.1.8:55513,209.249.116.197:80           http
  0007  192.168.1.8:55514,216.52.17.116:80             http

Once chaosreader finishes processing the packet capture file, the results of the analysis can be viewed by changing to the directory passed to the "-D" (output all files to this directory) option and opening the file named index.html with a Web browser. This page contains all of the connections that were detected displayed in chronological order.

Each connection contains a unique connection descriptor, the date the request was issued, the number of bytes sent between the two end-points, and the source and destination IP addresses and port numbers. Each connection also contains a table with hyperlinks to the individual objects (e.g., images, HTML) transmitted between the client and server.

To view the protocol headers along with the results of the requests, the "as_html" link can be used. The "as_html" link is a great tool for debugging Web applications, since the requests and the results of those requests are displayed in chronological order. Two sample screenshots are included in Figure 1 and 2 to show the connection screen along with the screen displayed when the "as_html" link is clicked.

Viewing Content and Headers with HTTP Live Headers

The curl and chaosreader utilities are great tools for debugging Web applications, but they require a Unix shell and Perl interpreter to utilize their full capabilities. This is not ideal for all users, because some administrators are unable to get shell access to a Unix system or are unable to install a Perl interpreter on their desktop. In such cases, you can use the Firefox live headers plug-in to debug Web-based applications. The Live HTTP Headers plug-in will display the request and response headers as a page is loaded in Firefox, and it provides numerous options to filter results and control which data is collected.

To get started with the HTTP live headers plug-in, you can point your Firefox browser to the main live headers Web site:

http://livehttpheaders.mozdev.org/

Once your browser renders the page, you can click the "Installation" tab, then click on the version that matches the version of Firefox you are using. Firefox will then install the plug-in from a remote location and will add a new menu titled "HTTP Live Headers" to the Tools drop-down menu.

If you would like more control over the installation process, you can download the plug-in by clicking the "download it" link on the live headers Installation page, or by right-clicking on the file and using the "save as" option. Once the plug-in has been downloaded to the local drive, you can use Firefox's "File -> Open File" menu to open the file and begin the installation. Once the installation completes, a new menu titled "HTTP Live Headers" will be available under the Tools drop-down menu.

To open the Live HTTP Headers plug-in, you can click "Tools -> Live HTTP Headers". This will open a new window with four tabs and a large text box. Each time a Web site is visited in the main Firefox window, the HTTP request and response headers will be displayed in the large text box. To display the headers from specific pages, you can click on the Config tab and add a regular expression to the "Filter URLs with regexp" option. A Live HTTP Headers screenshot is included in Figure 3.

Case Study: Debugging Sporadic Web Site Behavior

While I was working at my desk one Friday afternoon, a colleague came by to discuss a problem he was experiencing. When he visited a specific Web site, he was periodically receiving messages in his browser stating that the "maximum number of redirects had been reached." He asked whether I could recreate the problem, so I placed my current task on hold and started debugging the issue to find the source of the problem.

I began by connecting to the site with Firefox and repeatedly refreshing the site. After refreshing the site 10-20 times, I received the error message mentioned above. Because this appeared to be an issue with redirects on one of a slew of Web servers behind a load balancer, I needed a way to accurately pinpoint which server or servers were sending the faulty redirects. I also needed a way to capture the redirect location, which is reflected in the "Location" attribute in the HTTP header.

After some careful thought, I decided to use a Bourne shell loop and curl's "--user-agent" option to address both issues. The loop would allow me to send multiple requests to the server with curl, which I could parse to retrieve the Location header. Curl's "--user-agent" option would allow me to set a unique string identifier, which could be parsed out of the Web server access_log once I detected a failure. The following loop is what resulted:

while :
do
   DATE=`/bin/date`

   echo "** Processing request at ${DATE} **" >> badserver.txt
   curl -v --user-agent "CURL DEBUG (${DATE})" \
     http://mysite.com 2>&1 | egrep "Location" >> badserver.txt
   sleep 5
done

This loop will send one HTTP GET request to the server mysite.com every five seconds, and the Location attribute and time of the request will be logged to the file badserver.txt. I let this loop run 10-20 times, which seemed to be the number of connections required to trigger the problem. Once I exited the loop, I saw the following entries in the file badserver.txt:

** Processing request at Wed Oct 26 19:11:18 EDT 2005 **
< Location: https://mysite.com/

** Processing request at Wed Oct 26 19:11:24 EDT 2005 **
< Location: https://mysite.com/

** Processing request at Wed Oct 26 19:11:29 EDT 2005 **
< Location: http://mysite.com/

** Processing request at Wed Oct 26 19:11:34 EDT 2005 **
< Location: https://mysite.com/

The Location directives indicated that non-secure requests were being redirected to a secure site on all but one of the servers. The one server that was behaving differently was sending clients a non-secure redirect. When the client followed the redirect to the non-secure Web site, the Web server would reply with another non-secure redirect, which was causing a redirect loop to occur. The browser noticed the loop, and terminated the connection after a specific number of redirects were performed. Once I had this information, I used the string "CURL DEBUG" along with the date to identify the server that was sending the broken redirect. This was accomplished by tailing the access_log on each server and searching for the string "CURL DEBUG (Wed Oct 26 19:11:29 EDT 2005)":

$ tail -10000 access_log | egrep "CURL DEBUG (Wed Oct 26 19:11:29 EDT 2005)"
10.10.10.10 - - [26/Oct/2005:19:11:29 -0400] "GET / HTTP/1.1" 301 252 \
  "-" "CURL DEBUG Wed Oct 26 13:11:29 EDT 2005"

Once I found the Web server that was returning the incorrect redirect information, I fired up vi to review the Apache httpd.conf configuration file. A quick search for the string "Redirect" revealed the following configuration:

<VirtualHost *:80>
        Redirect permanent / http://mysite.com/
</virtualhost>

The problem turned out to be a typographical error in the httpd.conf configuration file, which was easily fixed by changing the Redirect string to the following:

<VirtualHost *:80>
        Redirect permanent / https://mysite.com/
</virtualhost>

Once this change was made, all of the servers began working as expected. This problem could have been debugged from a number of angles and with numerous software utilities and load-balancer features.

Conclusion

When Web-based applications break or become unresponsive, it is essential to have a set of software tools to troubleshoot and pinpoint problems. In this article, I presented a brief overview of three tools that can be used to debug problems with Web-based applications. For additional information on each utility discussed in this article, and for references to additional utilities, please see the References section of this article.

References

Chaosreader -- http://chaosreader.sourceforge.net/

Curl -- http://curl.haxx.se/

Ethereal -- http://www.ethereal.com/

Firefox Live HTTP Headers -- http://livehttpheaders.mozdev.org/

Siege -- http://joedog.org/

ssl-cert-check -- http://daemons.net/~matty/

Wget -- http://www.gnu.org/software/wget/

Acknowledgments

Ryan thanks the developers of chaosreader, curl, Ethereal, Firefox, siege, and the wget software utilities!

Ryan Matteson works as a systems engineer and specializes in Web technologies, SANs, and the OpenBSD, Linux, and Solaris operating systems. When Ryan isn't busy working, he enjoys playing guitar and maintaining his blog at: daemons.net/~matty. Questions and comments about this article can be addressed to: matty@daemons.net.