Open Source Build Management for Java Projects -- Part 1

Craig Caulfield

The field of software engineering is now reaching a stage of maturity such that developers and systems administrators have a significant body of knowledge, best practices, and patterns to draw upon at all stages of the development process. For example, when it comes to setting up the infrastructure for a software development project, best practice suggests the following elements:

  • There should be a central code repository that represents the latest state of the project.
  • Developers should have a local workspace, isolated from the main development line, where they develop and test their assigned tasks. When these changes are stable, they can be committed to the main development line, preferably in small, regular chunks.
  • The project should be automatically built from source code on a regular basis (possibly daily or even multiple times during the day) so that any newly committed code that breaks the build can be detected and rectified quickly. An automatic build also means it should be possible for someone new to the project to check out a fresh version from the repository, build it from sources, and end up with a functioning product.

In a Java development environment, implementing this sort of infrastructure has typically meant using repositories such as CVS (http://www.nongnu.org/cvs/) or Subversion (http://subversion.tigris.org/), and perhaps a build management tool like CruiseControl (http://cruisecontrol.sourceforge.net/), and then crafting various Ant build files to explicitly orchestrate each step of the build cycle. However, anyone who has had to maintain these build files over time will know that the task can quickly become unwieldy.

Fortunately, open source tools such as Maven 2 (http://maven.apache.org) can help. Rather than hand-cranking an Ant build file, build administrators define a sequence of Maven goals to, for example, clean out the build artifacts directory, recompile the project, run the unit tests, generate any documentation, and publish the results of the build. So that each goal knows where to go and what to do, Maven assumes there will be a standard directory structure in much the same way the J2EE specification defines a standard directory structure for Web applications. When used in conjunction with build management tools such as CruiseControl, which is Maven-aware, build administrators can provide their development teams with a simple but complete build environment for just the cost of a low-range box on which to run it.

The purpose of these articles is to describe how to create an open source Java software development build environment using Maven version 2 and CruiseControl version 2.4, like that shown in Figure 1. In this first part, I will focus on Maven, and I will cover CruiseControl next month. In my case, the build machine is an unremarkable Intel box running SUSE Linux 10.0, and it's assumed that the source code repository is already up and running (here, I'll be using Subversion).

Maven

Even though every software development project is unique, at a high level they all need to perform some common infrastructure tasks like compiling the source code, running tests, creating metrics and documentation, and deploying the final product. Naturally, there's variation in detail, but these are the tasks generally done.

Across several Apache projects, it was recognized that these common tasks could be abstracted in a way that provided some tangible benefits. For example, having a common development structure meant developers would be able to move between projects without having to learn each project's specific routine for getting things done, so they could become productive much sooner. Furthermore, all the rules for orchestrating the steps of a build were usually captured in make files or Ant build files. Maintaining these files for even a project of moderate size required a certain skill and care.

In response to these and other lessons, Maven creates a pattern-informed development infrastructure with these features:

  • A standard directory structure is defined along with a way to easily generate it. Then, across different projects, everyone will know where to find the source files, the test cases, and any project-specific resources, such as configuration files, images, or documents.
  • Dependencies are managed through remote and local repositories. A remote repository is a central warehouse for common dependencies like junit or the Apache Commons utilities (by default, Maven uses Ibiblio, http://www.ibiblio.org/maven2/), while a local repository is a developer's cached collection of downloads from the remote repository. When a project declares a dependency, Maven checks whether a copy is already held in the local repository; if not, it is automatically downloaded from the remote repository. As I'll show later, it's also possible to create "internal" remote repositories where commercial or proprietary products can be stored, along with any other dependencies that can't be satisfied by the usual central repository.

  • It is easy to create up-to-date project documentation (javadoc, metrics, project information, dependencies, mailing lists, repository activity, etc.) and to publish that information as a Web site. Team members can quickly find project information and see what everyone else is doing so there's more chance of coordinating work, reusing code, sharing knowledge, and, most importantly, there are fewer surprises.
  • Each project declares how the final product will be packaged, for example, as a JAR, WAR, or EAR. Besides actually creating such an artifact, the packaging declaration indicates that a specific build lifecycle is followed. For JARs, the default packaging, the build lifecycle is the sequence: process-resources, compile, process-test-resources, test-compile, test, package, install, deploy. If any specific goal is invoked, Maven will ensure that any relevant previous goals will be invoked first.
  • In the real world, it's unlikely that all development will happen within a single Maven project: there may be a number of projects producing JAR files, with these perhaps being used in a Web application in yet another project, and maybe some EJBs in the mix as well. To Maven, this type of project structure is simply configured as another type of dependency.
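
The lifecycle rule in the list above (invoking one step causes every earlier step to run first) can be sketched in a few lines of shell. This is only an illustration of the ordering, using the JAR sequence quoted above; the helper name phases_up_to is hypothetical and nothing here calls Maven itself.

```shell
# Illustrative only: the JAR lifecycle sequence listed above, in order.
JAR_LIFECYCLE="process-resources compile process-test-resources test-compile test package install deploy"

# phases_up_to PHASE
# Prints every step that would run, in order, when PHASE is requested.
# (Hypothetical helper; it mimics the ordering rule and does not call Maven.)
phases_up_to() {
    target="$1"
    result=""
    for phase in $JAR_LIFECYCLE; do
        result="${result:+$result }$phase"
        if [ "$phase" = "$target" ]; then
            echo "$result"
            return 0
        fi
    done
    echo "unknown phase: $target" >&2
    return 1
}

phases_up_to test
```

Asking for test therefore lists process-resources, compile, process-test-resources, and test-compile before test itself, which is why a single command is enough to bring a project up to a given point in the build.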

So, in a nutshell, Maven is the realization of a number of fundamental configuration management and project management patterns.

Installing Maven

Download the binary Maven distribution from http://maven.apache.org and unpack it to a location such as /usr/local/maven-2.0. Then create an environment variable called MAVEN_HOME, point it to the installation directory, and add $MAVEN_HOME/bin to $PATH.
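
For a Bourne-style shell, the environment setup might look like the following sketch. It assumes Maven was unpacked to /usr/local/maven-2.0 as suggested above; adjust the path if you chose another location.

```shell
# Assumes Maven was unpacked to the location suggested above.
MAVEN_HOME=/usr/local/maven-2.0
# Put Maven's launcher scripts ahead of anything else on the search path.
PATH="$MAVEN_HOME/bin:$PATH"
export MAVEN_HOME PATH
```

To make the change permanent, these lines would normally go into the build user's shell startup file (for example ~/.profile or ~/.bashrc, depending on the shell in use).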

Checking the version should show whether the installation has been successful:

cruise@snagglepuss:~> mvn -version
Maven version: 2.0

Configuring Maven

The general behavior of Maven is controlled by a settings.xml file (Listing 1), which can be placed in the .m2 directory under the user's home directory (~/.m2/settings.xml) for a per-user configuration or in the $MAVEN_HOME/conf directory for global configuration.

For example, the <localRepository> element can be used to change the default location of the locally cached repository. Meanwhile, the <servers> element can be used to define the authentication profiles Maven needs when connecting to particular servers. Matching <id> elements in the project object model (POM, discussed next) will use this authentication information when connecting to the server to deploy the project documentation or connect to the internal remote repository.
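
As a rough illustration of the two elements just described, the sketch below writes a minimal settings.xml to a scratch directory. It is not a reproduction of Listing 1: the local repository path, username, and password are placeholders I have assumed for the example, and the server id simply reuses the internalRepository id that appears later in this article.

```shell
# Hypothetical sketch: write a minimal settings.xml to a scratch directory
# so that no existing configuration is overwritten.
SETTINGS_DIR="$(mktemp -d)"
cat > "$SETTINGS_DIR/settings.xml" <<'EOF'
<settings>
  <!-- Assumed path: relocates the locally cached repository -->
  <localRepository>/opt/maven/local</localRepository>
  <servers>
    <server>
      <!-- Must match the <id> used for this server in the POM -->
      <id>internalRepository</id>
      <!-- Placeholder credentials -->
      <username>cruise</username>
      <password>changeme</password>
    </server>
  </servers>
</settings>
EOF
echo "Wrote $SETTINGS_DIR/settings.xml"
```

Once the contents look right, the file can be copied into place for either the per-user or the global configuration.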

Creating and Configuring a Maven Project

Maven uses the idea of an archetype to define a standard directory structure for new projects. For example:

mvn archetype:create -DgroupId=sopwith -DartifactId=SunDeveloper
    
This command creates a Maven project called SunDeveloper with a top-level package called sopwith and some skeleton main and test source files, all inside the standard Maven directory structure:

SunDeveloper 
|-- pom.xml 
`-- src 
    |-- main 
    |   `-- java 
    |       `-- sopwith 
    |           `-- App.java 
    `-- test 
        `-- java 
            `-- sopwith 
                `-- AppTest.java

Besides laying out the directory structure, Maven creates a project object model file (POM), pom.xml -- the single source of truth for all project information. The main elements of the out-of-the-box version (Listing 2) are:

  • The <project> element is the root element of all POM files.
  • The <modelVersion> element indicates which version of the object model this POM is using. Maven is still a work-in-progress, so the first step in debugging any problems is making sure that everything is at the right version level.
  • The <groupId> is a unique identifier (often fully qualified) for the organization or group that created the project.
  • The <artifactId>, <packaging>, and <version> elements work together to name the project's final product. The <artifactId> is a unique name for the product, the <packaging> element declares how the product will be packaged (as a JAR, WAR, EAR, etc.), and the <version> element gives the version number. The format is <artifactId>-<version>.<extension>, so this project will produce SunDeveloper-1.0-SNAPSHOT.jar.

As mentioned before, the <packaging> element also does double duty by indicating that a specific build lifecycle is followed. If any specific goal is invoked, Maven will ensure that any preceding lifecycle goals will be invoked first.

Maintaining the build process now means working with coarse-grained lifecycle goals rather than fine-grained Ant tasks.

  • The <name>, <url>, and <description> elements are all used by Maven's generated documentation (see the section on creating a documentation Web site for the details).

As you develop and refine your project by editing the POM, you influence how and when Maven's goals are executed. See the Maven online documentation and the XML Schema at http://maven.apache.org/maven-v4_0_0.xsd for the complete description of the project object model.
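
To make the naming rule concrete, here is a small sketch that writes a pared-down POM containing only the elements discussed above and then derives the artifact file name from it. It is not Listing 2: the <name> and <url> values are placeholders, the pom_value helper is hypothetical, and the sed extraction is a crude shortcut that only suits a flat file like this one.

```shell
# Sketch only: a pared-down POM with the elements discussed above.
# Coordinates follow the article's example; <name> and <url> are placeholders.
POM_DIR="$(mktemp -d)"
cat > "$POM_DIR/pom.xml" <<'EOF'
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>sopwith</groupId>
  <artifactId>SunDeveloper</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>SunDeveloper</name>
  <url>http://example.com/</url>
</project>
EOF

# pom_value ELEMENT
# Prints the text of the first <ELEMENT> in the sketch POM.
# A crude sed extraction, adequate for this flat file but not a real XML parser.
pom_value() {
    sed -n "s:.*<$1>\(.*\)</$1>.*:\1:p" "$POM_DIR/pom.xml" | head -n 1
}

# <artifactId>-<version>.<extension>; for jar packaging the extension is "jar".
echo "$(pom_value artifactId)-$(pom_value version).$(pom_value packaging)"
```

For this POM the last line prints SunDeveloper-1.0-SNAPSHOT.jar, matching the file name given earlier. Note that the extension happens to equal the packaging value here; that shortcut is an assumption of the sketch and does not hold for every packaging type.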

The standard directory structure can be changed by defining new archetypes, but unless there's a compelling reason, it's best to work within Maven's bounds. One valid way in which the standard directory structure can be extended is by adding a resources directory under src/main to contain any project-specific resources such as configuration files, images, or documents. In this case, the directory structure would now look like:

SunDeveloper
|-- pom.xml
`-- src
    |-- main
    |   |-- java
    |   |   `-- sopwith
    |   |       `-- App.java
    |   `-- resources
    |       `-- instructions.txt
    `-- test
        `-- java
            `-- sopwith
                `-- AppTest.java

When the final project artifact (JAR, WAR, EAR, etc.) is created, the contents of the resources directory will be copied to the base directory.

Executing Maven Goals

Even at this early stage, the project is ready to start responding to Maven goals. But, there's something to be aware of for new Maven installations -- each time Maven is invoked, it checks which plug-ins it needs to execute the goal at hand and downloads from the central repository anything not already held locally. Initially, this can take a few minutes while Maven gets itself set up, but thereafter things will happen more quickly. With this forewarning, compiling the project, which at this point consists of just the skeleton main program, simply means invoking the following Maven goal:

mvn compile

This single goal will compile any source files and place the classes in a newly created target/classes directory. Compiling and running the test cases is equally simple:

mvn test

Even though Maven's goals are a convenient way to execute most project tasks (see the sidebar "Some Useful Maven Plug-ins" for some of these), Maven doesn't have goals for everything. For example, a common requirement in an automated build environment would be updating the local working copy with any changes from the source code repository before starting a build, and perhaps creating a tagged version if the build completes successfully.

At present, there is no Maven or native Ant task that will do this. So, one solution is to create some custom Ant tasks such as those in Listing 3, and then take advantage of Maven's build lifecycle to make sure the tasks are invoked at the appropriate time. In my case, I have bolted the Subversion update onto the validate lifecycle phase as shown in the <build> element in Listing 4.
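
To show the ordering this requirement implies (update first, build second, tag only on success), here is a plain shell wrapper. Note that this is a different technique from the Ant tasks of Listing 3 and the lifecycle binding of Listing 4, which are not reproduced here; the function name build_from_head, the choice of mvn package as the build step, and the trunk and tag URLs passed in are all assumptions made for the example.

```shell
# build_from_head WORKING_COPY [TRUNK_URL TAG_URL]
# Hypothetical wrapper: update the working copy, run the build, and tag the
# sources only if the build succeeds. The SVN and MVN variables can be
# overridden, which is handy for a dry run.
build_from_head() {
    svn_cmd="${SVN:-svn}"
    mvn_cmd="${MVN:-mvn}"
    (
        cd "$1" || exit 1
        # Bring the working copy up to date before building.
        "$svn_cmd" update || exit 1
        # Any failure in the build stops the sequence here.
        "$mvn_cmd" package || exit 1
    ) || return 1
    # Tag only when both URLs were supplied and the build succeeded.
    if [ -n "$2" ] && [ -n "$3" ]; then
        "$svn_cmd" copy "$2" "$3" -m "Tagging successful build"
    fi
}
```

A call such as build_from_head /path/to/working/copy would update and build without tagging. Binding the update into the Maven lifecycle, as described above, has the advantage that the step runs no matter how the build is started.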

Maven goals can also be executed from within your favorite IDE. See Sidebar 2 for how this can be done with Eclipse.

Creating a Documentation Web Site

Archetypes are also used to create Maven's built-in documentation Web site:

mvn archetype:create -DgroupId=sopwith -DartifactId=SunDeveloperSite \
  -DarchetypeGroupId=org.apache.maven.archetypes \
  -DarchetypeArtifactId=maven-archetype-site 
    
This command creates another standard directory structure. For convenience, I copied the just-created site directory into the main project, so my directory structure now looks like:

SunDeveloper 
|-- pom.xml 
`-- src 
    |-- main 
    |   |-- java 
    |   |   `-- sopwith 
    |   |       `-- App.java 
    |   `-- resources 
    |       `-- instructions.txt 
    |-- site
    |   |-- site.xml
    |   |-- apt
    |   |-- fml
    |   `-- xdoc
    `-- test
        `-- java
            `-- sopwith
                `-- AppTest.java

Once the documentation Web site has been generated, it can be customized in a number of ways:

  • The site's banners and menus can be changed via the site descriptor file, site/site.xml (Listing 5).
  • The site generator creates three main subdirectories under the site directory (apt, fml, and xdoc) that can be used for FAQs, technical documents, and any other project-specific content.
  • Lots of information in the POM is used as fodder for the documentation Web site. This includes the project name, URL, and description, along with information about the development team members, mailing lists, issue tracking system, and source control tools. Listing 4 has some of the usual inclusions.

The basics of Maven's documentation Web site have now been created; it just needs to be fleshed out with some reports. Maven knows which reports it should generate by the contents of the <reporting> element in the POM. A typical collection is shown in Listing 4.

To generate the Web site, the following command is used:

mvn site

Maven generates the documentation under the target directory, but naturally it can't just be left there.

To then deploy the documentation Web site, the <distributionManagement> element is added to the POM. Listing 4 shows a <distributionManagement> element that points to a directory at /opt/maven/sites/developer on http://snagglepuss, which is served by Apache. As mentioned previously, if a username and password are needed to connect to the deployment server, they can be specified in the <server> element of $MAVEN_HOME/conf/settings.xml.

The following command is then run to deploy the site to its final destination:

mvn site-deploy

When calling this goal, an error message like "The authenticity of host 'snagglepuss' can't be established..." may come up. The solution is to create a directory called .ssh in the user's home directory. The next time the goal is called, Maven will create a known_hosts file in this new directory, and the problem won't happen again.
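
Creating the directory takes a single command, run as the user that invokes Maven. The chmod line is my own addition rather than part of the fix described above; it simply applies the permissions conventionally expected on an SSH configuration directory.

```shell
# Create the directory in which the known_hosts file will be stored.
mkdir -p "$HOME/.ssh"
# Conventional SSH permissions: accessible by the owner only.
chmod 700 "$HOME/.ssh"
```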

The documentation Web site should now be up and running (Figure 2).

Setting Up and Using Repositories

As mentioned before, Maven uses a number of repositories to manage each project's dependencies and build artifacts:

  • Remote repositories are central warehouses for commonly used dependencies like junit and are typically accessed using protocols such as http://, ftp://, scp:// or file://. By default, Maven uses Ibiblio (http://www.ibiblio.org/maven2) as its remote repository.
  • The local repository is a local cache of downloads from a remote repository and also contains build artifacts that have not yet been released.

When the POM declares a required dependency, such as junit in Listing 4, Maven first checks if a cached copy is held in the local repository; if not, the dependency is downloaded from the remote repository and then becomes available to the project.

So that all this happens seamlessly, all Maven repositories have a certain directory structure and naming convention, which maps to particular parts of the <dependency> element in the POM. The <groupId> tells Maven to look in a specific repository directory, <artifactId> then points to a specific subdirectory, <version> is yet another subdirectory down, and <type> points to a named file of this type. Browse through http://www.ibiblio.org/maven2/ to see this directory structure in action.
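
The mapping from a <dependency> element to a repository path, and the local-then-remote lookup order, can be sketched as follows. The helper names artifact_path and resolve are hypothetical, the junit version 3.8.1 is assumed for the example rather than taken from Listing 4, and ~/.m2/repository is Maven's default local repository location. The sketch only reports where an artifact would come from; it downloads nothing.

```shell
# artifact_path GROUP_ID ARTIFACT_ID VERSION TYPE
# Prints the repository-relative path for a dependency, following the Maven 2
# layout: group directories, then artifactId, then version, then the file.
artifact_path() {
    group_dir=$(echo "$1" | tr '.' '/')   # dots in a groupId become directories
    echo "$group_dir/$2/$3/$2-$3.$4"
}

# resolve LOCAL_REPO REMOTE_URL GROUP_ID ARTIFACT_ID VERSION TYPE
# Reports where the artifact would come from: the local cache if the file is
# present, otherwise the remote repository. Nothing is downloaded.
resolve() {
    rel=$(artifact_path "$3" "$4" "$5" "$6")
    if [ -f "$1/$rel" ]; then
        echo "local: $1/$rel"
    else
        echo "remote: $2/$rel"
    fi
}

artifact_path junit junit 3.8.1 jar
resolve "$HOME/.m2/repository" http://www.ibiblio.org/maven2 junit junit 3.8.1 jar
```

The first call prints junit/junit/3.8.1/junit-3.8.1.jar. What the second call prints depends on whether that file is already in your local repository.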

For the most part, Maven will handle any repository activity for you, but there may be times when you manually need to install a certain resource into your local repository. For this, Maven provides an install goal:

mvn install:install-file -Dfile=<path-to-file> -DgroupId=<group-id> \
    -DartifactId=<artifact-id> -Dversion=<version> -Dpackaging=<packaging>
    
Afterwards, the resource will be available to your project just like anything else in the local repository.

Normally, developers manually add resources to their local repository only when those resources can't be found in the remote repository. This practice can lead to subtle version conflicts and inconsistencies between developers' workspaces, because each developer is actively manipulating his local repository. It also means that developers can no longer clean out their local repository and rely on Maven to do a complete refresh on their behalf. A better way is to create an "internal" remote repository -- one that contains jars and resources that can't be found in Ibiblio, that are locally created, or which are commercially licensed. All developers can then point to this alternative repository.

An "internal" remote repository is just a local directory served by an appropriate protocol. In my case, I've created a repository directory at /opt/maven/remote and set it up to be served by Apache. So that Maven knows what is happening, a <repository> element is added to the Maven configuration file, settings.xml (Listing 1). A <repository> element with a matching <id> in the POM (Listing 4) then lets each project know it has an alternative place to search for dependencies.

With the "internal" remote repository set up, artifacts can be added to it so they can be drawn upon when necessary. For example, to deploy the MySQL Connector/J JAR file:

mvn deploy:deploy-file -DgroupId=MySQL -DartifactId=MySQLConnectorJ \
  -Dversion=3.1.12 -DgeneratePom=true -Dpackaging=jar \
  -Dfile=mysql-connector-java-3.1.12-bin.jar -DrepositoryId=internalRepository \
  -Durl=scp://snagglepuss/opt/maven/remote

Maven Wrap-Up

Maven is a tool that developers and build administrators have long been waiting for. But, it is still a work in progress, so be prepared to do some research (read: trial and error) and to hit the newsgroups when things don't work as expected. Also, there have been some major changes between versions 1 and 2, which means finding good, version-right examples and documentation, or just any documentation in general, can be a problem. Even so, Maven is worth the perseverance.

With Maven up and running, we're halfway towards the goal of an automated build environment. In part 2 of this series, I'll show how CruiseControl completes the setup.

Resources

Apache Maven Project -- http://maven.apache.org/

Beck, K. 2000. Extreme Programming Explained: Embrace Change. Boston: Addison-Wesley.

Berczuk, S. P. and Appleton, B. 2003. Software Configuration Management Patterns. Boston: Addison-Wesley.

Clark, M. 2004. Pragmatic Project Automation: How to Build, Deploy, and Monitor Java Applications. Raleigh, NC: The Pragmatic Programmers.

CruiseControl -- http://cruisecontrol.sourceforge.net/

Fowler, M. and Foemmel, M. "Continuous Integration". Available at: http://www.martinfowler.com/articles/continuousIntegration.html

Maraia, V. 2005. The Build Master: Microsoft's Software Configuration Management Best Practices. Upper Saddle River: Addison-Wesley.

Massol, V. and O'Brien, T. M. 2005. Maven: A Developer's Notebook. Sebastopol, CA: O'Reilly Media.

McConnell, S. 1996. Rapid Development: Taming Wild Software Schedules. Redmond, WA: Microsoft Press.

O'Brien, T. M. 2006. "Maven Project Info Reports Considered Dangerous". Available at: http://www.oreillynet.com/onjava/blog/2006/03/maven_project_info_reports_con.html

Craig Caulfield is a senior software engineer and build manager for a defense and commercial software house in Perth, Western Australia. He holds a Bachelor's degree in Computer Science, a Master's degree in Software Engineering, and certifications in Java, XML, DB2, UML, MySQL, and WebSphere. He can be contacted at: ccaulfi1@bigpond.net.au.