G1: Java's Garbage First Garbage Collector

Dr. Dobb's Digest September 2009

G1: Java's Garbage First Garbage Collector

A garbage collector works to reclaim areas of memory within an application that will never be accessed again

By Eric J. Bruno

Eric Bruno is a contributing editor for Dr. Dobb's. He can contacted at eric@ericbruno.com


At JavaOne 2009, Sun released Java SE 6 Update 14, which included a version of the much-anticipated Garbage First (G1) garbage collector. G1 is a low-pause, low-latency, sometimes soft real-time, collector that allows you to set max pause time goals and collection intervals through suggestions on the Java VM command line. Although it cannot guarantee it, G1 will attempt to meet your goals, and hence introduce as little latency as possible into your application. This in turn may also make the VM run more predictably as it attempts to meet the pause time goals you provide.

What Is Garbage Collection?

Many dynamic languages, such as C, C++, Pascal, and so on, require you to manage memory explicitly. This includes memory allocation, de-allocation, and all of the accounting that occurs in between. In this time frame, you must be sure to not lose track of the memory (thereby failing to ever free it), or the result will be a memory leak. Just as dangerous is the attempt to use an object (or access memory) after it has been de-allocated, through what is called a dangling pointer. Either one of these situations can result in undefined behavior, the accidental overwriting of other data, a security hole, or an abrupt crash.

Automatic memory management (garbage collection) removes the likelihood that these issues will occur since it's no longer left up to you to account for memory allocations. In C++, the concept of smart pointers is one solution, and in other languages, such as Lisp, SmallTalk, and Java, a full-featured garbage collector tracks the lifetimes of all objects in a running program. The history of garbage collection can be traced back to John McCarthy, who invented the concept as part of the Lisp programming language [McCarthy58].

In short, a garbage collector works to reclaim areas of memory within an application that will never be accessed again. At the most fundamental level, garbage collection involves two deceivingly simple steps:

Determine which objects can no longer be referenced by an application. This is done either via object reference counting, or object graphs (tracing). Reclaim the memory occupied by dead objects (the garbage).

Until recently, Java SE came with two main collectors: the parallel collector, and the concurrent-mark-sweep (CMS) collector -- see the sidebar Parallelism and Concurrency. As of the latest Java SE 6 update release, the G1 collector is another option. The plan is for G1 to eventually replace CMS as a low-pause, soft real-time collector. Let's take a look at how it works.

Parallelism and Concurrency

When speaking about garbage collection algorithms, parallelism describes the collector's ability to perform its work across multiple threads of execution. Concurrency describes its ability to do work while application threads are still running. Hence, a collector can be parallel but not concurrent, concurrent but not parallel, or both parallel and concurrent.

The Java parallel collector (the default) is parallel but not concurrent as it pauses application threads to do its work. The CMS collector is parallel and partially concurrent as it pauses application threads at many points (but not all) to do its work. The G1 collector is fully parallel and mostly concurrent, meaning that it does pause applications threads momentarily, but only during certain phases of collection. For more information on garbage collection, and common algorithms used, read my latest book entitled Real-Time Java Programming with Java RTS, available from Pearson Publishing.

—EJB

How Does G1 Work?

Garbage-First is a server-style garbage collector, targeted for multi-processors with large memories, that meets a soft real-time goal with high probability [Detlefs04]. It does this while also achieving high throughput, which is an important point when comparing it to other real-time collectors.

The G1 collector divides its work into multiple phases, each described below, which operate on a heap broken down into equally sized regions (see Figure 1). In the strictest sense, the heap doesn't contain generational areas, although a subset of the regions can be treated as such. This provides flexibility in how garbage collection is performed, which is adjusted on-the-fly according to the amount of processor time available to the collector.

Figure 1: With garbage-first, the heap is broken into equally sized regions.

Regions are further broken down into 512 byte sections called cards (see Figure 2). Each card has a corresponding one-byte entry in a global card table, which is used to track which cards are modified by mutator threads. Subsets of these cards are tracked, and referred to as Remembered Sets (RS), which is discussed shortly.

Figure 2: Each region has a remembered set of occupied cards.

The G1 collector works in stages. The main stages consist of remembered set (RS) maintenance, concurrent marking, and evacuation pauses. Let's examine these stages now.

RS Maintenance

Each region maintains an associated subset of cards that have recently been written to, called the Remembered Set (RS). Cards are placed in a region's RS via a write barrier, which is an efficient block of code that all mutator threads must execute when modifying an object reference. To be precise, for a particular region (i.e., region a), only cards that contain pointers from other regions to an object in region a are recorded in region a's RS (see Figure 3). A region's internal references, as well as null references, are ignored.

Figure 3: A region's RS tracks live references from outside the region.

In reality, each region's remembered set is implemented as a group of collections, with the dirty cards distributed amongst them according to the number of references contained within. Three levels of courseness are maintained: sparse, fine, and course. It's broken up this way so that parallel GC threads can operate on one RS without contention, and can target the regions that will yield the most garbage. However, it's best to think of the RS as one logical set of dirty cards, as the diagrams show.

Concurrent Marking

Concurrent marking identifies live data objects per region, and maintains the pointer to the next free byte, called top. There are, however, small stop-the-world pauses (described further below) that occur to ensure the correct heap state. A marking bitmap is maintained to create a summary view of the live objects within the heap. Each bit in the bitmap corresponds to one word within the heap (an area large enough to contain an object pointer; see Figure 4). A bit in the bitmap is set when the object it represents is determined to be a live object. In reality there are two bitmaps: one for the current collection, and a second for the previously completed collection. This is one way that changes to the heap are tracked over time.

Figure 4: Live objects are indicated with a marking bitmap.

Marking is done in three stages: