NoodleGlue: Bridging C/C++ and Java

C/C++ Users Journal July, 2005

C++/Java integration made easy

By Tree

Tree is the lead programmer at NoodleHeaven. He can be contacted at tree@noodleheaven.net.

Integrating Java with C/C++ code can be a painful process. In the early years of Java, a multitude of solutions were proposed, but the Java Native Interface was the official standard and other solutions disappeared. Granted, JNI certainly wasn't the easiest to use, but it did have the considerable advantage of being proposed by Sun, and it was designed to work on any platform, whereas other solutions were mostly platform specific. JNI provides a mechanism for Java code to call C routines, which are identified with specially constructed names. However, it doesn't have built-in support for C++ or for bridging complex types. It's no surprise that working with C++ code generally involves writing a considerable amount of very tedious code simply to deal with passing data across the JNI.

When developing the Noodle interactive music player [1], I needed to write the core of my engine in C++ and wanted to use several large open-source libraries written in C and C++. However, I still wanted to write most of my own code in Java. I identified these key requirements for my needs:

To ensure that bridging would be invisible to Java programmers, so that a bridged C++ class could be a first-class Java citizen, allowing subclassing and the like.
To bridge as much of the complete C++ feature set as possible, including the difficult stuff like templates.
Changing C/C++ APIs should cause minimal rebuild effort for the Java bindings.
Memory management and garbage collection should be as transparent as possible.
Must work cross-platform (in my case, Windows and OS X).
Should simplify JavaBean serialization for native libraries where possible.
Require no modification (or as little as possible) to the existing C++ code.

I looked at a number of solutions, notably SWIG [2] and GCJ [3]. While both have merit, they didn't fit all my needs. Bridging Java and C++ isn't easy. There are fundamental structural differences between C and C++ and Java—the implementation of memory management, for instance—so there probably isn't a single solution that fits all needs. Consequently, I developed NoodleGlue [4], a solution that automatically wraps C/C++ class libraries in JNI code for access from Java. Since I originally designed NoodleGlue to be able to use the OpenSceneGraph library [5], it works especially well for C++ code that implements a reference-counting object management strategy (as OSG does). It is still possible to work with manually deleted objects, albeit with a bit more hassle. NoodleGlue can also be useful for straight C APIs, too—I bridged the OpenGL API in an afternoon, for instance.

NoodleGlue

Although a perfect bridge using JNI is impossible, I realized that I could get close by creating proxy classes for the C++ and Java objects on either side of the JNI divide and by generating code to automatically marshal/ unmarshal the parameters to be passed across the bridge. I further realized that I could generate code that would even let Java classes inherit from C++ classes and override C++ methods. Gradually, as the need arose, I ended up bridging more and more C++ features and quirks, including templates and STL, enumerations, function pointers, and other exotic stuff that I never thought would be possible. What started out as a fairly modest project eventually became NoodleGlue, which is now probably the most comprehensive JNI bridging solution. It wasn't an easy ride.

NoodleGlue is an open-source project that includes a JNI code generator and JNI object manager. The name is perhaps a bit abstract, but entirely appropriate, given the sometimes hair-raising complexity of binding the seemingly endless subtleties of C and C++ to Java. I handle a vast amount of native syntax, from simple functions taking built-in types, to typedefed instances, to blood curdling nested template classes. I also handle enums, primitive pointers (int*, for instance), and abstract classes. There's still room for improvement; for example, I don't currently generate accessor functions for public fields and global variables, but NoodleGlue is still developing, so I expect these and more to come over the next few months.

Here's a partial list of NoodleGlue's features:

GCC and Visual Studio compiler support.
Code generator tools are totally cross platform.
C and C++ code library support.
Requires only the API header files.
Generated code can be extensively customized using a single XML file.
Generates Java code with support for Java 1.4 or better.
Classes, structs, and global methods all supported as static and local methods.
Enums (optionally generated as type-safe class objects or simple integer values).
Typedefs (for basic types, classes, and templates).
STL (list, map, set, and so on) and custom templates fully supported as instances, not as an abstract template structure.
Operator methods supported (operator==, operator+, for instance).
When C++ classes inherit from the Bridgable base class and use reference counting, it provides almost automatic object management.
The Bridgable superclass also lets Java classes inherit from and override C++ classes; even protected methods in C++ can be accessed and overridden.
Use of java.nio buffers for efficient access to raw data blocks in Java.
Custom translators can be defined to optimize the bridging of any class. For example, a Float Vector class can be passed as an array of floats, rather than as an object pointer.
Optional method parameters in C are supported with multiple generated methods in Java.
Complex method parameter types, such as typedefs or template types.
Class generation can be suppressed per class.
Pointers and references to simple types (int or float) supported as parameters and return values.
Class and method comments transfered in Doxygen format into JavaDoc.
Generates the Java package hierarchy based on class folder hierarchy, or the Java package can be specified per class
Allows renaming of classes and methods in Java.
Ordered code analysis to support library dependencies.
Pure abstract classes are bridged as Java interfaces.
Multiple inheritance can be handled when using pure abstract classes.
Abstract classes can also be subclassed in Java (when using Bridgable support).
Efficient caching to improve speed performance of bridging code.
Fast proxy object creation and destruction.
Identifies JavaBean-style property accessors (getters and setters) and generates xdoclet JavaBean tags in comments where appropriate (optional).
Allows customization of JavaBean description for classes, methods, and parameters.
Thread-safe instantiation and destruction of objects.

In essence, NoodleGlue comes in two parts:

A set of tools, collectively called "NoodleGen," to generate code to bridge between C/C++ and Java. I use Doxygen to parse the C++ header files, then transform this into a more data-rich "binding file," which I use to generate Java and C++ code to bridge the libraries classes, enums, functions, defines, and the like.
Runtime libraries (both Java and C++) to manage Java references natively, garbage collect native-side referenced objects, and provide caching and optimizations required by the generated JNI code.

How NoodleGlue Works

NoodleGlue works by generating three sets of code files called the "glueball":

A set of Java classes that mirror the C++ classes being bridged. These wrapper classes have all the same methods as their C++ peers, and can be treated as if they were normal Java objects.
A JNI file for each class, which is a standard C file containing the actual C routines that the Java JNI interface will call.
"Sham" C++ classes that subclass the original C++ classes. These classes are used to allow Java code to subclass the C++ class.

As Figure 1 illustrates, whenever Java code calls a method on a wrapper object, any parameters are first marshaled using a process called "translation," which converts the parameters into primitives that can be passed across the JNI interface. Control is then passed through the JNI to a C routine called the "JNI Handler." NoodleGlue generates a JNI Handler for each method in a class, whose job is to unmarshal the parameters back into a form that the original C++ method takes. The JNI Handler then calls the original method on the original C++ object, marshals the return value, and returns control back to Java. The Java wrapper object then unmarshals the return value and returns control to the original calling code.

The key to the process is the "translation" of parameters and return values. Because only primitive types can be passed through the JNI, whenever a reference (or pointer) to a C++ object is passed to Java, either as a return value or as a parameter to a method call, a Java wrapper object is created to represent it. If that object is then passed back to native code, a reference to the original C++ object is extracted from it. NoodleGlue allows extensive customization and optimization of the translation process for each type, thereby reducing the overhead of bridged method calls, but that's beyond the scope of this article. If you want to write your own, it's fairly easy to understand how translators work by looking at the built-in translators.

To allow Java classes to successfully subclass C++ classes, a special "Sham" class is created for each C++ class that's bridged. When an instance of a Java subclass of the original C++ class is created, one of these sham objects is instantiated. Whenever a C++ class calls a method on the sham object, it first checks whether that method has been overridden by the Java subclass, and if so, it passes control to Java. This results in a slight, but negligible, overhead on each method call to the sham object, even if a method isn't overridden.

The biggest headache when bridging Java and C++ code is the difference in object-management strategies. Java provides automatic garbage collection of objects, whereas C++ relies on manual deletion. If a Java object holds a reference to a C++ object, it's important that the native object isn't deleted, and vice versa. The only perfect solution to this problem is to integrate the Java garbage collector with C++ at the lowest level, as GCJ does. However, this requires a nonstandard JVM and lots of other compromises. My compromise is to use reference counting for C++ objects. The easiest way to use this is for your C++ classes to inherit from our Bridgable base class, and to use the reference and unreferenced methods it defines for object management. You can use classes that don't inherit from Bridgable, but you'll have to deal with object deletion yourself, and you also won't be able to subclass them.

Reference counting isn't a perfect object-management solution, and there are all the usual gotchas to watch out for when using Bridgable. As usual, the main issue is to avoid circular references, but when using NoodleGlue, you must also be aware that circular references can happen across the bridge. It's possible for a C++ object to hold a reference to a Java object, which in turn may directly or indirectly reference another Java object, which references the original C++ object. If this happens, you'll leak memory, so be aware. Obviously, it's best to avoid such problems as much as possible, but I do have a workaround for manually releasing "islands" like this when they're unavoidable.

The NoodleGlue runtime doesn't delete objects immediately as they become unreferenced, but instead puts them on a queue and a separate thread deletes them shortly afterwards. This avoids the problem of reference-counted objects being deleted when they are momentarily unreferenced as one object passes it to another, for instance.

The NoodleGen Code Generator

NoodleGen consists of two applications, Wrapgen and Doxygen2Wrapgen. Wrapgen generates all the actual source code for a glueball—the necessary JNI handlers, the sham C++ class, and the Java wrapper classes. As input, Wrapgen uses an XML file called the "Binding file" that describes all of the C++ types that need to be bridged. I used to write the binding files by hand, but it was an error-prone process and the files were extremely tedious to maintain—especially frustrating since they just duplicated information already present in the C++ header files. Eventually, I found a way to use Doxygen [6] to parse C++ header files directly and provide us with a structural description of the classes. I then wrote Doxygen2Wrapgen to transform the output from Doxygen into suitable input for Wrapgen. Although it's still theoretically possible to write the XML descriptors by hand, I now use the three-step process in Figure 2:

Doxygen parses the header files.
Doxygen2Wrapgen then analyzes the Doxygen output and processes it according to a Build Configuration file to create the Binding file (used to be written by hand).
The Binding file then drives Wrapgen as before.

Although a little hairy, this process can be automated with a script.

Doxygen2Wrapgen uses declarations and commands in a Build Configuration file to define important aspects, such as package names for classes, custom type translators, exclusion or renaming of entities (to avoid name clashes with Java built-in types, for example), and the like.

The Build Config file is where you'll do most of your work to customize NoodleGlue to the specific requirements of your code. The most important elements of the Build Config file specify where to find the source header files and what directory to use for writing out the generated code. You'll occasionally need to provide hints to help the bridging process and avoid potential problems. For instance, C and C++ have scope for implicit references such as "using namespace..." that must be declared to NoodleGlue with UsingNamespace declarations in the Build Configuration. Your code may use specially defined keywords for calling conventions or export specifications like _cdecl or CALLBACK. Because these are irrelevant to NoodleGlue and will only confuse it, I let you tell the parser to ignore them with the SuppressedWord command in the Build Config file. These examples can be seen in Listing 1.

To generate this translation code, NoodleGlue must know about all of the types used in your code. It gets most of this information from your header files, but there may be times (for instance with platform-specific types in system header files) when you have to help. It is in situations like this where generating java bindings can become most complex. The Build Configuration file lets you explicitly define types, rather than have them parsed from the headers files; it also lets you rename classes or methods, and provide extra information so that the generated Java code can, for instance, contain tags for JavaBean descriptors.

For especially awkward situations, or in places where optimization is important, I also let you write custom translators, which let you specify exactly how NoodleGlue passes a particular type across the JNI. Custom translators can let you use your knowledge of a particular class to make very significant optimizations. For instance, it's often much more efficient to copy immutable objects across the bridge (creating a new Java object that implements all the behavior of the original) than to pass a wrapped reference to the original object. The process of writing translators is beyond the scope of this article, but it isn't too difficult and you can learn a lot by looking at the default translators.

Once you've written a Build Configuration file, you simply need to run the NoodleGlue build script. It generates the glueball that contains the generated C++ and Java source code in the directories you specify in the Build Configuration file. The whole process is easily automated using a build system like Apache Ant or Make. Intervention should only be required when NoodleGen fails to understand some previously unknown syntax or finds genuine problems with the code it is wrapping (usually when methods are declared but have no body).

Runtime Support

NoodleGlue bindings contain code that depends upon the runtime library. While it's possible to make JNI code that doesn't need this type of external support, the NoodleGlue Runtime gives you a number of key benefits:

Garbage-collection thread for orphaned C++ proxy objects.
Reference-counted C++ entities can be memory-managed across the JNI bridge.
Pointers to raw memory can be accessed efficiently through Java's NIO buffers.
Support for pointers and references to primitive types.
Efficient caching of java class references can be done at the level of the app, instead of per function.
Transparent usage.

Starting up the runtime engine and loading the native libraries and Glueball libraries is transparent to Java programmers. All you need to do is make sure that the native libraries are accessible to the JVM on whatever platform you're using (for instance, they're in the working directory), then import and use the bridged Java classes just as you would any other Java class.

HelloWorld! Example

Listing 2 illustrates NoodleGlue in the simplest terms. The HelloWorldExample app displays a JFrame containing a Swing JTextArea that calls across JNI to be filled with the string "Hello World." All I am wrapping is a helloWorld function that returns a string as a const char* in Listing 2(a). First, I need to run Doxygen on the header file and request XML output. This output is then run though NoodleGen, with a build script customized for this library (Listing 1). In this script, I have given NoodleGen some hints to ignore some keywords such as export directives found in STL and NoodleGlue, and potential namespaces that may be implicit at times.

I provide the name of the output file to pass to the code generator (in this case, Definitions_HelloWorld.xml), then give three directories that NoodleGen must parse and extract definitions from. By default you need to add the STL and NoodleGlue directories. They are not implicit to give NoodleGlue maximum customizability. The final XML I want to parse is for the HelloWorld library with the single method, helloWorld. I give the location of the Doxygen XML output, give the name the library to be called, and also the base package to put generated Java code into. This file is then quickly processed and the output fed into the code generator. Listing 3 is the output for the HelloWorld library (HelloWorldJNI.cpp, HelloWorld.java, and the library initialization file, HelloWorldJNILibrary.java). These must build into native and java libraries, with respective dependencies on NoodleGlue libraries. Running HelloWorldExample presents a window with the text "Hello World!". This text came from the native function! That's all I needed—a quick cut-and-paste job on a Doxygen script followed by a small edit to a template NoodleGen build script.

JavaOSG Example

A real-world example of NoodleGlue usage is the open-source bindings for the OpenSceneGraph (OSG) library. This excellent OpenGL scene-graph library project, led by Robert Osfield, is the graphical basis of much of the software developed at NoodleHeaven. It lets us do virtually all of our 3D development in Java. NoodleGlue was developed very much in tandem with the company's need to bind to the OSG libraries. It provided many complex C++ constructs to handle, and has thrown up new ones over the three years we have worked with it. I'm pleased to say that we have always found a solution to whatever OSG has thrown our way (except for multiple inheritance so far!). Because OSG has smart pointer support, we have been able to make small changes to a single base class to provide the kind of runtime support mentioned previously—to allow for subclassing into Java and shared memory management. You can download the source for this project at [7].

Conclusion

NoodleGlue is designed to ease Java access to the many features of C and C++. It occasionally needs a few generic hints, but provides a controlled and comprehensive solution to quickly provide access to native libraries in Java. The runtime elements of NoodleGlue are also designed to be as unobtrusive as possible. So working with a NoodleGlue bridged library should be just like working with a pure Java library. For the most part, I believe NoodleGlue has achieved that aim.

References

[1] http://www.noodleheaven.com/.
[2] http://www.swig.org/.
[3] http://gcc.gnu.org/java/.
[4] http://www.noodleheaven.net/NoodleGlue/ or http://www.noodleglue.org/.
[5] http://www.OpenSceneGraph.org/.
[6] http://www.doxygen.org/.
[7] http://www.noodleglue.org/JavaOSG/.