Debugging Web Applications
Ryan Matteson
During the past 10 years, companies have increasingly turned to
the World Wide Web to sell products and provide quick access to
a variety of support forums. The infrastructure used to support
these systems is typically split into vertical tiers based on the
type of infrastructure deployed (e.g., load balancers, Web proxies,
Web servers, application servers, database servers). This tiered
architecture approach allows companies to deploy multiple systems
at each tier to ensure Web site resiliency, and it simplifies upgrades,
since each tier can be scaled independently as demand for the
companies' services increases.
There are numerous performance and availability benefits associated
with tiered Web-based infrastructure, but there are also a few drawbacks.
The biggest drawback is the complexity that multiple systems add:
pinpointing a faulty system in a large deployment can be difficult.
This article will provide an introduction to three open source tools
that can assist with debugging Web applications and pinpointing
problems. A case study will also be presented to show how these
tools can be used to solve a real-world problem.
Debugging with Curl
One tool that is invaluable for debugging Web-based applications
is curl. Curl provides a full-featured command-line client
that can be used to retrieve files, download Web-based content,
and view the application-layer headers that are sent between clients
and servers. To get started with curl(1), run the curl binary
with the "-h" (print help menu) option to list the available
options and the values that can be passed to them.
Curl will retrieve the resource passed as an argument and print
the contents to standard output, or to the file passed to the "-o"
option. The resource can be in the form of an http://, https://,
or ftp:// style URL. The following example shows how curl can be
used to retrieve the curl source code:
$ curl -o curl-7.15.0.tar.gz http://curl.haxx.se/download/curl-7.15.0.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 23 1709k   23  396k    0     0  40671      0  0:00:43  0:00:09  0:00:34 23822
Curl also contains numerous advanced debugging options, which can
be used to test HTTP features, display verbose output, and retrieve
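protocol headers. The following example uses curl's "-v" (verbose
output) option to display the HTTP request and HTTP response headers
for a connection to the daemons.net Web server; the "-k" (insecure)
option tells curl to proceed even though the server's certificate
cannot be verified: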
$ curl -k -v https://mail.daemons.net
* About to connect() to mail.daemons.net port 443
* Trying 206.222.17.179... * connected
* Connected to mail.daemons.net (206.222.17.179) port 443
* successfully set certificate verify locations:
* CAfile: /usr/share/curl/curl-ca-bundle.crt
CApath: none
* SSL connection using DHE-RSA-AES256-SHA
* Server certificate:
* subject: /C=US/O=mail.daemons.net/OU= \
https://services.choicepoint.net/get.jsp?1605445126 \
/OU=See www.rapidssl.com/cps (c)04/OU=Domain Control \
Validated - StarterSSL(TM)/CN=mail.daemons.net
* start date: 2005-05-20 22:09:48 GMT
* expire date: 2006-06-20 22:09:48 GMT
* common name: mail.daemons.net (matched)
* issuer: /C=US/O=Equifax Secure Inc./CN=Equifax Secure eBusiness CA-1
* SSL certificate verify result: error number 1 (20), continuing anyway.
> GET / HTTP/1.1
> User-Agent: curl/7.13.1 (powerpc-apple-darwin8.0) \
libcurl/7.13.1 OpenSSL/0.9.7g zlib/1.2.3
> Host: mail.daemons.net
> Pragma: no-cache
> Accept: */*
< HTTP/1.1 302 Found
< Date: Thu, 20 Oct 2005 18:49:20 GMT
< Server: Apache
< Set-Cookie: Horde=7ebcae69e30287045dd3d3d1fe1dd31f; \
path=/; domain=mail.daemons.net; secure
< Expires: Thu, 19 Nov 1981 08:52:00 GMT
< Cache-Control: no-store, no-cache, must-revalidate, post-check=0, \
pre-check=0
< Pragma: no-cache
< Location: https://mail.daemons.net/login.php?Horde=7ebcae69e30287045dd3d3d1fe1dd31f
< Content-Length: 0
< Content-Type: text/html; charset=ISO-8859-1
* Connection #0 to host mail.daemons.net left intact
* Closing connection #0
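Because curl -v marks response header lines with "< " (request headers with "> " and informational lines with "* ") and writes this trace to standard error, the response headers alone can be pulled out with a one-line filter. The following is a minimal sketch that assumes the trace format shown above; the function name response_headers is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "curl -v" trace output on stdin and print only
# the response headers. curl marks response header lines with "< ", request
# header lines with "> ", and informational lines with "* ".
response_headers() {
    # Print only lines that start with "< ", with that prefix removed.
    sed -n 's/^< //p'
}

# Example (needs network access; -s hides the progress meter, -o discards
# the body, and 2>&1 sends the verbose trace into the pipe):
#   curl -s -v -o /dev/null http://daemons.net/ 2>&1 | response_headers
```

Fed the trace above, it would print the "HTTP/1.1 302 Found" status line followed by the Date, Server, Set-Cookie, and remaining response headers, without the request or connection details.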
If reverse proxies or load balancers are used to distribute requests
across multiple Web servers, it can sometimes be difficult to determine
which Web server responded to a client's request. This is especially
true when the client's source IP address is NAT'ed, or when the Web
server is handling hundreds of requests simultaneously. When these
issues arise, you can use curl's "--user-agent" (use a custom user
agent) and "-H" (send a custom request header) options to send a unique
User-Agent string and request header to the server:
$ curl -v --user-agent "CURL DEBUG (`date`)" -H "X-foo: yikes" \
--show-error daemons.net
If the Web server is configured to log the User-Agent request header
(as Apache's combined log format does), the string "CURL DEBUG" will
be logged along with the date:
$ tail -1 access_log
10.10.10.10 - - [26/Oct/2005:19:11:24 -0400] "GET / HTTP/1.1" \
301 252 "-" "CURL DEBUG (Wed Oct 26 13:11:24 EDT 2005)"
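To list every tagged request in a log at once, rather than reading it line by line, a short awk filter can print the client address, the server-side timestamp, and the marker for each request whose User-Agent starts with "CURL DEBUG". This is a sketch that assumes Apache's combined log format, in which splitting a line on double quotes leaves the User-Agent in the sixth field; the function name debug_requests is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list requests whose User-Agent starts with
# "CURL DEBUG". Assumes Apache's combined log format, where splitting a
# line on double quotes puts the User-Agent in field 6.
debug_requests() {
    awk -F'"' '$6 ~ /^CURL DEBUG/ {
        # Field 1 holds "IP ident user [date:time zone] "; split it on spaces.
        split($1, pre, " ")
        # Print client IP, request time (without the leading "["), and marker.
        print pre[1], substr(pre[4], 2), $6
    }' "$@"
}

# Usage: debug_requests access_log
```

Applied to the access_log entry above, it would print 10.10.10.10, the time 26/Oct/2005:19:11:24, and the full "CURL DEBUG (...)" string on one line.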
When a problem is detected with the content returned from a server,
this information can be used to easily find the server that is returning
the errant content. The case study below, "Debugging Sporadic Web Site
Behavior", shows how this technique was used to debug problems with
an Apache Web server.
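When several servers sit behind a load balancer, the same search has to be repeated against each server's log. As a minimal sketch, assuming copies of each server's access_log are available locally (the function name find_marker and any log file names are hypothetical), grep's "-l" option can report which files contain a given marker, while "-F" makes grep treat the marker as a fixed string, so the parentheses produced by the --user-agent example are matched literally instead of as regular-expression grouping.

```shell
#!/bin/sh
# Hypothetical helper: print the name of each log file that contains the
# given debug marker. -F matches the marker as a fixed string, so the
# parentheses in "CURL DEBUG (...)" are not treated as regex grouping;
# -l prints matching file names; -- guards against a leading "-".
# Usage: find_marker MARKER LOGFILE...
find_marker() {
    marker=$1
    shift
    grep -l -F -- "$marker" "$@"
}

# Example with hypothetical per-server copies of access_log:
#   find_marker "CURL DEBUG (Wed Oct 26 19:11:29 EDT 2005)" web1_access_log web2_access_log
```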
Debugging with Chaosreader
When debugging complex Web applications, the ability to view the
complete client-server interaction and drill down to specific requests
and responses can be invaluable. This capability is available with
the open source Ethereal and chaosreader utilities. Since Ethereal
is covered thoroughly in numerous books and online articles, I will
focus on the capabilities of chaosreader in this section.
Chaosreader is written in Perl and produces reports from tcpdump
or snoop capture files. These reports include information on TCP,
UDP, ICMP, and IP traffic, and they contain the application layer
data from several well-known protocols. To get started with chaosreader,
you'll need to download the Perl script from SourceForge:
http://chaosreader.sourceforge.net/
Once the script has been downloaded to a local file system, the script
can be executed with the "--help" option to display the available
options and several practical examples.
Tcpdump and snoop are used to collect network traffic that can
be analyzed by chaosreader. The following example uses tcpdump to
write all packets with a source or destination port of 80 to a file
named chaosreader.dump:
$ tcpdump -i en0 -s 1518 -w chaosreader.dump port 80
This example uses a snap length of 1518 bytes to ensure that all protocol
headers and application data are captured. To get the most benefit
from chaosreader, the packet captures should be taken when a problem
is detected with a Web or application server. To analyze the packet
capture, pass the file containing the saved packets as an argument
to the chaosreader script:
$ chaosreader.pl -D html chaosreader.dump
0003 192.168.1.8:55510,209.249.116.195:80 http
0008 192.168.1.8:55515,209.249.116.197:80 http
0004 192.168.1.8:55511,209.249.116.195:80 http
0002 192.168.1.8:55509,209.249.116.195:80 http
0005 192.168.1.8:55512,209.249.116.195:80 http
0011 192.168.1.8:55518,209.249.116.197:80 http
0001 192.168.1.8:55508,209.249.116.195:80 http
0009 192.168.1.8:55516,216.52.17.116:80 http
0006 192.168.1.8:55513,209.249.116.197:80 http
0007 192.168.1.8:55514,216.52.17.116:80 http
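In a load-balanced setup it can be useful to see how the captured sessions were spread across back-end addresses before drilling into index.html. The sketch below counts sessions per server address; it assumes the session lines have the layout shown above (session number, then client and server endpoints separated by a comma) and that chaosreader writes them to standard output. The function name sessions_per_server is hypothetical, and lines that do not look like session entries are skipped.

```shell
#!/bin/sh
# Hypothetical helper: count chaosreader sessions per server address.
# Assumes session lines look like the listing above:
#   0003 192.168.1.8:55510,209.249.116.195:80 http
sessions_per_server() {
    awk '
        # Skip anything that is not a numbered session line.
        $1 ~ /^[0-9]+$/ && $2 ~ /,/ {
            split($2, ends, ",")           # ends[2] is "server-ip:port"
            sub(/:[0-9]+$/, "", ends[2])   # drop the port number
            count[ends[2]]++
        }
        END { for (ip in count) print count[ip], ip }
    ' "$@" | LC_ALL=C sort -k2
}

# Usage (assumes the listing goes to standard output):
#   chaosreader.pl -D html chaosreader.dump | sessions_per_server
```

For the ten sessions listed above, it would report 5 sessions to 209.249.116.195, 3 to 209.249.116.197, and 2 to 216.52.17.116.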
Once chaosreader finishes processing the packet capture file, the
results of the analysis can be viewed by changing to the directory
passed to the "-D" (output all files to this directory) option and
opening the file named index.html with a Web browser. This page lists
all of the detected connections in chronological order.
Each connection contains a unique connection descriptor, the date
the request was issued, the number of bytes sent between the two
end-points, and the source and destination IP addresses and port
numbers. Each connection also contains a table with hyperlinks to
the individual objects (e.g., images, HTML) transmitted between
the client and server.
To view the protocol headers along with the results of the requests,
the "as_html" link can be used. The "as_html" link is a great tool
for debugging Web applications, since the requests and the results
of those requests are displayed in chronological order. Figures 1
and 2 show the connection screen along with the screen displayed when
the "as_html" link is clicked.
Viewing Content and Headers with Live HTTP Headers
The curl and chaosreader utilities are great tools for debugging
Web applications, but they require a Unix shell and Perl interpreter
to utilize their full capabilities. This is not ideal for all users,
because some administrators are unable to get shell access to a
Unix system or are unable to install a Perl interpreter on their
desktop. In such cases, you can use the Firefox Live HTTP Headers plug-in
to debug Web-based applications. The Live HTTP Headers plug-in will
display the request and response headers as a page is loaded in
Firefox, and it provides numerous options to filter results and
control which data is collected.
To get started with the Live HTTP Headers plug-in, point
your Firefox browser to the Live HTTP Headers project Web site:
http://livehttpheaders.mozdev.org/
Once your browser renders the page, you can click the "Installation"
tab, then click on the version that matches the version of Firefox
you are using. Firefox will then install the plug-in from a remote
location and will add a new item titled "Live HTTP Headers" to the
Tools drop-down menu.
If you would like more control over the installation process,
you can download the plug-in by clicking the "download it" link
on the Live HTTP Headers Installation page, or by right-clicking the
file and using the "save as" option. Once the plug-in has been downloaded
to the local drive, you can use Firefox's "File -> Open File"
menu to open the file and begin the installation. Once the installation
completes, a new item titled "Live HTTP Headers" will be available
under the Tools drop-down menu.
To open the Live HTTP Headers plug-in, you can click "Tools ->
Live HTTP Headers". This will open a new window with four tabs and
a large text box. Each time a Web site is visited in the main Firefox
window, the HTTP request and response headers will be displayed
in the large text box. To display the headers from specific pages,
you can click on the Config tab and add a regular expression to
the "Filter URLs with regexp" option. A Live HTTP Headers screenshot
is included in Figure 3.
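Before typing a pattern into the "Filter URLs with regexp" box, it can help to preview it against a few URLs from the shell. The sketch below assumes the plug-in searches for the pattern anywhere in the full request URL (scheme included) and that the pattern sticks to syntax shared by JavaScript and POSIX extended regular expressions; the pattern, which keeps only requests to mail.daemons.net, and the function name preview_filter are hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: keep only requests to mail.daemons.net, over HTTP or
# HTTPS. The dots are escaped so they match a literal "." only.
FILTER='://mail\.daemons\.net/'

# Print the URLs (one per line on stdin) that the filter would match.
preview_filter() {
    grep -E -- "$FILTER"
}

# Usage:
#   printf '%s\n' https://mail.daemons.net/login.php http://daemons.net/ | preview_filter
```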
Case Study: Debugging Sporadic Web Site Behavior
While I was working at my desk one Friday afternoon, a colleague
came by to discuss a problem he was experiencing. When he visited
a specific Web site, he was periodically receiving messages in his
browser stating that the "maximum number of redirects had been reached."
He asked whether I could recreate the problem, so I placed my current
task on hold and started debugging the issue to find the source
of the problem.
I began by connecting to the site with Firefox and repeatedly
refreshing the site. After refreshing the site 10-20 times, I received
the error message mentioned above. Because this appeared to be an
issue with redirects on one of a slew of Web servers behind a load
balancer, I needed a way to accurately pinpoint which server or
servers were sending the faulty redirects. I also needed a way to
capture the redirect target, which is carried in the "Location"
response header.
After some careful thought, I decided to use a Bourne shell loop
and curl's "--user-agent" option to address both issues. The loop
would allow me to send multiple requests to the server with curl,
whose output I could parse to extract the Location header. Curl's "--user-agent"
option would allow me to set a unique string identifier, which could
be parsed out of the Web server access_log once I detected a failure.
The following loop is what resulted:
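while :
do
    # Record the timestamp that is embedded in the User-Agent string
    DATE=`/bin/date`
    echo "** Processing request at ${DATE} **" >> badserver.txt
    # curl -v writes headers to stderr, so merge stderr into the pipe;
    # -s hides the progress meter and -o /dev/null discards the body
    curl -s -v -o /dev/null --user-agent "CURL DEBUG (${DATE})" \
        http://mysite.com 2>&1 | grep "^< Location:" >> badserver.txt
    sleep 5
done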
This loop sends one HTTP GET request to mysite.com every five
seconds and logs the time of each request, along with the Location
header it received, to the file badserver.txt. I let this loop run 10-20
times, which seemed to be the number of connections required to trigger
the problem. Once I exited the loop, I saw the following entries in
the file badserver.txt:
** Processing request at Wed Oct 26 19:11:18 EDT 2005 **
< Location: https://mysite.com/
** Processing request at Wed Oct 26 19:11:24 EDT 2005 **
< Location: https://mysite.com/
** Processing request at Wed Oct 26 19:11:29 EDT 2005 **
< Location: http://mysite.com/
** Processing request at Wed Oct 26 19:11:34 EDT 2005 **
< Location: https://mysite.com/
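With only a handful of entries the odd redirect is easy to spot by eye, but over a longer run a small filter can pull out the timestamps of requests that were redirected to http:// rather than https://. This is a sketch that assumes badserver.txt has exactly the layout shown above; the function name insecure_redirects is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the request timestamp for every entry in
# badserver.txt whose Location header points at http:// instead of https://.
insecure_redirects() {
    awk '
        # Remember the timestamp from the most recent marker line.
        /^\*\* Processing request at / {
            ts = $0
            sub(/^\*\* Processing request at /, "", ts)
            sub(/ \*\*$/, "", ts)
        }
        # "https://" does not match, because "http:" must be followed by "//".
        /^< Location: http:\/\// { print ts }
    ' "$@"
}

# Usage: insecure_redirects badserver.txt
```

Run against the four entries above, it would print only "Wed Oct 26 19:11:29 EDT 2005", the timestamp to search for in the access logs.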
The Location headers indicated that non-secure requests were being
redirected to a secure site on all but one of the servers. The one
server that was behaving differently was sending clients a non-secure
redirect. When the client followed the redirect to the non-secure
Web site, the Web server would reply with another non-secure redirect,
which caused a redirect loop. The browser detected the
loop and gave up after following its maximum number of redirects.
Once I had this information, I used the string "CURL
DEBUG" along with the date to identify the server that was sending
the broken redirect. This was accomplished by tailing the access_log
on each server and searching for the string "CURL DEBUG (Wed Oct 26
19:11:29 EDT 2005)":
$ tail -10000 access_log | grep -F "CURL DEBUG (Wed Oct 26 19:11:29 EDT 2005)"
10.10.10.10 - - [26/Oct/2005:19:11:29 -0400] "GET / HTTP/1.1" 301 252 \
"-" "CURL DEBUG (Wed Oct 26 19:11:29 EDT 2005)"
Once I found the Web server that was returning the incorrect redirect
information, I fired up vi to review the Apache httpd.conf configuration
file. A quick search for the string "Redirect" revealed the following
configuration:
<VirtualHost *:80>
Redirect permanent / http://mysite.com/
</VirtualHost>
The problem turned out to be a typographical error in the httpd.conf
configuration file, which was easily fixed by changing the Redirect
string to the following:
<VirtualHost *:80>
Redirect permanent / https://mysite.com/
</VirtualHost>
Once this change was made, all of the servers began working as expected.
This problem could have been debugged from a number of angles and
with numerous software utilities and load-balancer features.
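A typo like this can also be caught before a configuration reaches production. The sketch below is a hypothetical heuristic, not a complete Apache parser: it flags Redirect directives inside port-80 VirtualHost sections whose target still begins with http://, the pattern that caused the loop here. It ignores RedirectMatch, mod_rewrite rules, and Include files, and the function name check_redirects is illustrative.

```shell
#!/bin/sh
# Hypothetical lint: print "line-number: directive" for each Redirect inside a
# port-80 <VirtualHost> whose target URL starts with http:// (a possible loop).
# Apache section and directive names are case-insensitive, hence tolower().
check_redirects() {
    awk '
        tolower($0) ~ /^[ \t]*<virtualhost[ \t][^>]*:80>/ { in80 = 1 }
        tolower($0) ~ /^[ \t]*<\/virtualhost>/            { in80 = 0 }
        in80 && tolower($1) == "redirect" && $NF ~ /^http:\/\// {
            print FNR ": " $0
        }
    ' "$@"
}

# Usage: check_redirects httpd.conf
```

Against the original configuration shown above, it would flag the "Redirect permanent / http://mysite.com/" line, and it stays silent once the target is changed to https://.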
Conclusion
When Web-based applications break or become unresponsive, it is
essential to have a set of software tools to troubleshoot and pinpoint
problems. In this article, I presented a brief overview of three
tools that can be used to debug problems with Web-based applications.
For additional information on each utility discussed in this article,
and for references to additional utilities, please see the References
section of this article.
References
Chaosreader -- http://chaosreader.sourceforge.net/
Curl -- http://curl.haxx.se/
Ethereal -- http://www.ethereal.com/
Firefox Live HTTP Headers -- http://livehttpheaders.mozdev.org/
Siege -- http://joedog.org/
ssl-cert-check -- http://daemons.net/~matty/
Wget -- http://www.gnu.org/software/wget/
Acknowledgments
Ryan thanks the developers of chaosreader, curl, Ethereal, Firefox,
Siege, and Wget software utilities!
Ryan Matteson works as a systems engineer and specializes in
Web technologies, SANs, and the OpenBSD, Linux, and Solaris operating
systems. When Ryan isn't busy working, he enjoys playing guitar
and maintaining his blog at: daemons.net/~matty. Questions
and comments about this article can be addressed to: matty@daemons.net.