Java Web Application First Aid
Alan Berg
Monday evening maintenance slots are fun, rushing to provide deployments and upgrades within a narrow window of time. And, although you may have an excellent development and acceptance environment, reality teaches that failure can occur. Not every application clicks right into place the first time, and problems seem most likely to occur when the developers are nicely tucked into bed or on extended holiday in the nether regions of Siberia.
In this article, I will detail how to find the cause of some common issues arising when deploying Web applications with Tomcat [8] in combination with an Apache Server [1] connected by mod_jk [9]. Additionally, I will offer some words on the subject of common-sense practices.
The basic concept underlying this article is that by breaking a test environment, you gain a solid insight into potential failure modes. Such failure points for Apache, mod_jk, Tomcat, and Web application combinations are often inconvenient. As a systems administrator, you may have control over generic configuration settings but not the Java code itself or the hidden frameworks called by the code. Worse still, the configuration details have the potential to be deeply related to the particular Java frameworks used. For example, many frameworks such as Struts [7], Cocoon [2], and Hibernate [4] use numerous XML files to deliver massively enhanced functionality with a high degree of flexibility.
In this article, I will ignore this framework issue other than to note that understanding this type of application-focused configuration requires a quiet place and reasonable amount of time to learn. I will, however, deal with the more exposed breaking points, which are more valid for a systems administrator to resolve under time pressure.
An Example Infrastructure
In this section, I describe how to create a test infrastructure. You will not need the infrastructure to understand the following sections, but personally, I find it easier to learn by doing.
I based the test environment on Ubuntu 6.10 [10]. To install Apache2 and Mod_jk, as an empowered user:
sudo apt-get install apache2
sudo apt-get install libapache2-mod-jk
Installing Tomcat 5.5 (I used Tomcat 5.5.20) requires downloading and unpacking an archive. By default Tomcat uses ports above 1024, so you may use a user account for installing starting/stopping Tomcat, just make sure that Java 1.5 is installed and that the JAVA_HOME environment variable of the Tomcat user points to the correct directory.
At this point, we have a three-piece jigsaw puzzle with the Apache server, Mod_jk, and Tomcat needing to talk with each other properly. Let's write a minimum configuration with one virtual host listening at Port 80 and JKmounting the Tomcat server that is configured by default.
In Figure 1, the top region explains how the configuration is set up. The /etc/apache2/Apache2.conf file points to workers.properties, which configures mod_jk. Mod_jk talks via the ajp1.3 protocol with Tomcat. You will find the configuration for the Tomcat half of the communication under Tomcat home /conf/server.xml. The Web application has it own Java-specific logging found under the Tomcat home /webapps/1_4/WEB-INF/classes/logger.properties, where 1_4 is the name of the Web application.
To begin, let's modify the virgin apache2.conf file. At the end of the file, you should see the line:
Include /etc/apache2/sites-enabled/[^.#]*
Comment out that line and add:
JkWorkersFile /etc/apache2/workers.properties
JkLogFile /var/log/apache2/mod_jk.log
JkLogLevel error
Include /etc/apache2/test_sites/[^.#]*
This configuration tells the mod_jk-enabled Apache server to pick up the mod_jk configuration from workers.properties and create a log file in /var/log/apache2 that is called mod_jk.log. Then, Apache will load configuration files from the test_sites directory.
The workers.properties content is as follows:
worker.list=dev
worker.dev.type=ajp13
worker.dev.host=localhost
worker.dev.port=8009
The name of the worker threads is dev, and this name will be used to reference the workers in the virtual host later. The protocol used is ajp13 and is looking to talk to Tomcat on the localhost at port 8009; these settings are correct by default in the Tomcat deployment. In the Tomcat conf/server.xml file, you should see the corresponding entry:
<Connector port="8009" enableLookups="false" \
redirectPort="8443" protocol="AJP/1.3" />
Make the test_sites directory and add a virtual host in the newly made directory in a file named 000-default:
NameVirtualHost *:80
<VirtualHost *:80>
Servername test.test.org
DocumentRoot /var/www/dev
CustomLog /var/log/apache2/access_000_default.log combined
JkMount /* dev
JkMount / dev
</VirtualHost>
The virtual host listens at port 80 and JKmounts Tomcat from the Apache URL onwards. The virtual host's access logs go to /var/log/apache2/access_000_default.log in the combined format. The configuration that defines the combined format resides in apache2.conf:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \
\"%{User-Agent}i\"" combined
You can stop and start apache via:
sudo /etc/init.d/apache2 stop
sudo /etc/init.d/apache2 start
Next, start Tomcat, as the Tomcat user that you installed as, via bin/startup.sh under the Tomcat home. At this point, if you browse http://localhost, you should see the main page of Tomcat.
Before deploying the war file, you will need to create the directory /var/log/apps with permissions that the Tomcat user can write a file. To deploy the example war file as mentioned in the resources, simply dump the file in the webapps directory. Then wait a few seconds, and you will see that Tomcat has created a new directory named 1_4. If you browse to http://localhost/1_4, you will see our very primitive application.
Note that the application is deliberately simple so as not to distract from the story of failure points. The source code is included in the sub-directories under the webapps/1_4/WEB-INF/classes directory.
In the next section, we will break this somewhat primitive installation in various ways that I have seen in heavy-usage production environments. We will note how the system logs these events and look at some positive examples of how to respond.
A Word of Warning
Please take any suggestions given here with a pinch of salt. Although I have based the suggestions on personal experience from deployments for large user bases, you may have other and better ideas. With that warning in mind, however, I do believe that deliberately breaking your test infrastructure is good methodology for gaining experience.
Example Failures
Production environments tend to be different from development environments, which is why we normally use an acceptance environment as a buffer zone. However, even acceptance environments may be subtly or not so subtly different, and the differences tend to cause problems. In this section, I'll mention some failures I have seen during production deployments. The list is only a starting point, and no doubt over time you will add to it. I would even suggest that if you have the time and motivation, you should try to break this test infrastructure in new ways, while recording the effects on logging. This systematic methodology may help you later.
Unavailable Services
If the Apache server is not connected to the Tomcat server for whatever reason, when you browse the JKmounted URL, you will get a message similar to the following:
Service Temporarily Unavailable
The server is temporarily unable to service your request due to
maintenance downtime or capacity problems. Please try again later.
This translates to a 503 error in the access_000-default.log for example:
127.0.0.1 - - [03/Jan/2007:12:47:58 +0100] "GET / HTTP/1.1" \
503 323 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; \
rv:1.8.1.1) Gecko/20060601 Firefox/2.0.0.1 (Ubuntu-edgy)"
If your Web site is not to busy, you can lazily watch the event live on the access_000-default.log via:
tail -f access_000-default.log.
In the mod_jk.log, you will get a sensible answer with the same timestamp. Therefore, a grep on access log timestamp is a powerful method of accumulating related entries, for example:
grep -i 12:47:58 *
[Wed Jan 03 12:47:58 2007] [5179:0000] [error]
ajp_service::jk_ajp_common.c (1794): Error connecting to tomcat.
Tomcat is probably not started or is listening on the wrong
port. worker=dev failed
The first thing you may consider is to see whether the Tomcat server is up via:
ps -ef | grep java
If the server is up you should see an entry similar to:
alan 6076 1 9 13:27 pts/2 00:00:03 \
/home/alan/java5/bin/java \
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager \
-Djava.util.logging.config.file=/home/alan/tomcat55/conf/ \
logging.properties \
-Djava.endorsed.dirs=/home/alan/tomcat55/common/endorsed \
-classpath :/home/alan/tomcat55/bin/bootstrap.jar:/home/alan/ \
tomcat55/bin/commons-logging-api.jar \
-Dcatalina.base=/home/alan/tomcat55 \
-Dcatalina.home=/home/alan/tomcat55 \
-Djava.io.tmpdir=/home/alan/tomcat55/ \
temp org.apache.catalina.startup.Bootstrap start
Check which workers files exist via grep -ir JKWorkersFile * in the /etc/apache2 directory.
Okay, so the server is up. Next, check the workers.properties file to make sure that the right port and host are being pointed to, for example:
worker.dev.port=8009
Finally, verify that the conf/server.xml file under the Tomcat home is listening under the correct port, for example:
<Connector port="8009" enableLookups="false" \
redirectPort="8443" protocol="AJP/1.3" />
If all else fails, stop the Tomcat server, verify that there are no unexpected processes still running, and restart.
Potential Issues
What if the application is not at the expected URL? Apache has powerful URL rewriting and redirection possibilities via modules such as mod_rewrite [6]. Due to the power and the potential to use REGEX, it is quite easy to make generic rewrite rules that work across a series of servers. However, with the power also comes the opportunity for error. As usual, the KISS principle is best for REGEX. Furthermore, it is common practice to have redirects in HTML documents at the top of applications, for example, pointing to the currently best-skinned login page.
What if your war file name is different than expected? As an example, change the name of the example war file to 1_other and place back in the webapps directory. You will see a second deployed Webapp at http://localhost/1_other.
Another potential issue may be that JKmount or JKunmount points are incorrect. A subtle issue is that you look for everything under the root of the Tomcat server and not the root of the Webapp. To see this effect, try changing the Jkmount point in 000-default to:
JkMount /1_4/* dev
JkMount /1_4/ dev
If you then browse http://localhost, you will no longer see the raw root of Tomcat.
Note: JKunmount points do exactly what they sound like. This verb tells mod_jk to ignore the URL and allow Apache to deal with the request.
Logging Locations
A perennial issue for sys admins is log rotation and finding the right entries in the relevant log files to debug. Correctly configured logging is the most important source of debug information. However, sadly for Web applications, configuration information generally resides in more than one directory. The following information is useful for the smooth running of logging.
Logging exists in three directories for a typical deployed application. The first is Apache-related and, under our test infrastructure, resides under /var/log/apache2. The configuration for this resides in /etc/apache2/apache2.conf for the error log and the mod_jk log, 000-default for an access log. The relevant lines are:
Apache2.conf
ErrorLog /var/log/apache2/error.log
JkLogFile /var/log/apache2/mod_jk.log
JkLogLevel error
000-default
CustomLog /var/log/apache2/access_000_default.log combined
The developers have defined the configuration for Tomcat 5.5-specific errors in /conf/logger.properties under the Tomcat home. Personally, I would not change the Tomcat default values unless there is a strong reason to do so.
Applications usually log either using a specific Java library or printing to stdout via a System.out.println call. Tomcat then redirects Stdout to Tomcat home/logs/catalina.out. If something serious has happened in your application, the catalina.out is the first place to look.
In our example, if you look under the WEB-INF/classes, you will see a file named logger.properties. This file configures the logging for our application, which is deliberately using the highly popular log4j framework [5]. In general, the WEB-INF directory or sub-directories are the locations in which you can expect application-specific configuration to reside.
Log4j is highly configurable and can send logging information to files, mail, SMS, syslog, etc. and even rotates or truncates files depending on the details of logger.properties. In the current case, the entry log4j.appender.R.File defines the location of the files:
log4j.appender.R.File=/var/log/app/test.log
Note that we are rotating through ten log files of maximum size of 3000KB.
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.MaxFileSize=3000KB
log4j.appender.R.MaxBackupIndex=10
The configuration risks losing entries if there is a lot of logging, but avoids swamping a partition if the application runs for a long time without administration.
It is well worth modifying the logger.properties file in ways mentioned in reference [5] and observes the effect. I must admit that over the past few years I've spent hours searching for the right settings.
Logging requires that the logging directory exists and has the correct permissions so that the Tomcat user can write. If these conditions are not met, you will not see logging, and the failure events will be noted in catalina.out:
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /var/log/app/test.log \
(No such file or directory)
java.io.FileNotFoundException: /var/log/app/test.log \
(Permission denied)
If you find a reproducible error in your application, then you can tail -f on catalina.out and the application-specific logging. At the same time, you can browse the application and watch events unfold live.
Missing Jar Files
Java extends itself via the addition of Libraries. For a Web application, the most common location for the libraries to reside is in the Webapps WEB-INF/lib directory. If a jar file is missing, you may see an error similar to:
java.lang.NoClassDefFoundError: org/apache/log4j/Logger
In our case the log4j.jar file was missing.
Jar files are zip files with extra information and structure contained within. To verify that you have a missing library, you have the possibility to unzip the files. If you try that with the log4j.jar file, you will see the directory structure org/apache/log4j and file Logger.class. Note: All Web applications see libraries contained under the Tomcat home as /common/endorsed, lib directories.
Broken web.xml File
The webapps WEB-INF/web.xml file defines the context in which the application runs. There are two main ways that random events can damage the file. The first is that web.xml becomes logically inconsistent, and the second and more probable in production is that web.xml contains typos causing incorrectly formed XML. At a glance, the following file seems well formed, but it is not:
<web-app>
<display-name>1_4 demo</display-name>
<description>A simple Servlet</description>
<servlet>
<servlet-name>DemoServlet</servlet-name>
<servlet-class>berg.servlet.SimpleServlet</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>SimpleServlet</servlet-name>
<url-pattern>/*</url-pattern>
</servlet-mapping>
</web-app>
The error seen in catalina.out is clear:
2/01/2007 13:57:16 org.apache.catalina.startup.ContextConfig \
applicationWebConfig
SEVERE: Parse error in application web.xml file at \
jndi:/localhost/1_4/WEB-INF/web.xml
java.lang.IllegalArgumentException: Servlet mapping specifies \
an unknown servlet name SimpleServlet
The two <Servlet-name> tags should have the same names contained within.
If you remove the > from a tag in a well-formed XML document, the document is instantly broken. Viewing the XML file within Firefox will give you an easy to understand error message and a line number. For example, if in our working application, we remove the > from </web-app> we would get:
XML Parsing Error: unclosed token
And in catalina.out:
2/01/2007 14:21:38 org.apache.catalina.startup.ContextConfig \
applicationWebConfig
SEVERE: Occurred at line 14 column 21
Incorrectly Compiled Java Version
In a diverse production environment with legacy applications running on older versions of Tomcat, you can expect Java version issues occasionally. For the test infrastructure, compiling our Web app under Java 1.4 or Java 1.5 will not break it. However, if you deploy the Java 1.5 version of the Web application to a Tomcat 4 server that uses Java 1.4, then the application will fail with an error message mentioning unsupported major.minor version 49.0:
exception in thread "main" java .lang.UnsupportedClassVersionError:AlansTest \
(unsupported major.minor version 49.0)
This issue can be expected to arise from generation to generation, and we can expect issues compiling to Java 1.6 in the near future. Note: Java 1.4 has major.minor version 48.0.
Stack Traces
Stack traces in catalina.out are ugly, verbose, and mostly an obscure issue for those of us who have not written the misbehaving application in the first place. I would advise you to ignore most of the unnecessary words and grep on the terms error or "root cause".
Out of Memory
So, now the application works in your nice warm test lab. However, it may fail or be slow at the busiest points of the busiest days, and you may be getting memory errors. In this situation, you may need to quickly up the memory before you have time to actually figure the exact failure details out.
The Java virtual machine manages its own memory and looks for configuration in the environment variable JAVA_OPTS.
To find out the current settings:
ps -ef | grep java
If you are lucky, you will see as part of the output values similar to:
-Xms256m -Xmx512m
The setting is the minimum and maximum value for the heap size. If the server has the necessary resources try (in the short term) to double the numbers and then ask the developers to put in proper settings. There are clear limits on 32-bit virtual machines and, when you are reaching the 3-4 GB zone, this trick will not help you much and could make issues worse.
Memory management for large-scale deployments with Java has the potential to require detailed work from the responsible party. Memory management involves a garbage collector that removes resources that are no longer used. The way memory is allocated and used is complex and, for the very best settings, may require a nice fat JAVA_OPTS string. Therefore any changes during the maintenance slot should always be referenced back to the source providers.
JVM Verbosity
Okay, so your application is slow and fails sometimes under peak load, but it performs well under test situations. Now it is time to make the developers responsible by giving them the necessary memory management logging information in an extra file. To achieve this, add the following settings to JAVA_OPTS of the Tomcat user and restart Tomcat:
-Xloggc:<file> -XX:+PrintGCDetails
Where <file> is a value similar to /var/log/app/jvm.log. To graphically view the resulting output, I recommend the tool gcviewer [3].
Note that during full garbage collection, the whole application stops. If the stopping time is long, then different parts of the problem application may have real issues. For example, persistence managers keep pools of database connections open and have the potential to get confused. Knowing how many full GCs occur and for how long is essential, hence the need to push the correct logging back to the developers.
If your organization has had failures because of high real-life usage, then tools such as Jmeter can act as stress-testing sanity checks of new applications before they are placed into the wild and dangerous waters of production use.
Load Balancers and Round Robin DNS
Large applications must be able to scale up or scale out. Scaling out may involve a series of instances of Web applications on different machines pointing to the same database with a load balancer (popular) or Round Robin DNS (much less popular) to distribute the load. One irritating and trivial issue is knowing which application has failed. Personally, I like to see a viewable path in each application that points to the root of the Web application. In Java, this may be achieved in the relevant servlet or JSP via printing the string prefix:
prefix = getServletContext().getRealPath("/");
Of course, one might consider the giving away of any file information an unnecessary security risk. You are always welcome to print a random string per application that has meaning only to you.
Conclusions
The Apache infrastructure mentioned here is a popular, stable, brilliantly coded and heavily deployed structure. Still, things can and do go wrong. With enough gremlins typing at the keyboard long enough, somewhere a configuration file will have a typo or a bit of Java code will be incorrect. Additionally, under load, Java memory management is somewhat of an art.
Practicing failure is a handy methodology for decreasing the pain of a Monday evening maintenance slot. Eliminating the obvious issues, allows sys admins to focus on the trickier and harder to define problems.
References
1. Apache2 -- http://httpd.apache.org/
2. Cocoon -- http://cocoon.apache.org/
3. GCViewer -- http://www.tagtraum.com/gcviewer.html
4. Hibernate -- http://www.hibernate.org/
5. Log4j -- http://logging.apache.org/log4j/docs/documentation.html
6. Mod_rewrite -- http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html
7. Struts -- http://struts.apache.org/
8. Tomcat -- http://tomcat.apache.org/
9. Mod_jk -- http://tomcat.apache.org/connectors-doc/
10. Ubuntu -- http://www.ubuntu.com/
Alan Berg, Bsc. MSc. PGCE, has been a lead developer at the Central Computer Services at the University of Amsterdam for the past seven years. In his spare time, he writes computer articles. In previous incarnations, he was a technical writer, an Internet/Linux course writer, and a science teacher. He likes to get his hands dirty with the building and gluing of systems. He remains agile by playing computer games with his kids who (sadly) consistently beat him. You may contact him at: reply.to.berg@chello.nl.
|