EPerl: Perl, C++, and Java

C/C++ Users Journal July, 2005

Making it easy to execute Perl subroutines from within C++

By Kevin Wittmer

Kevin Wittmer earned a Bachelors of Science in Computer Engineering from Michigan State University. He can be reached at kevinwittme7@hotmail.com.

Perl is a powerful programming language supported by a large and active developer community. As a result, thousands of Perl modules and scripts have been developed, many of which are archived at the Comprehensive Perl Archive Network (CPAN) (http://www.cpan.org/). With an extensive repository of code available for reuse, opportunities to utilize this code clearly abound.

One way to reuse existing Perl code is to embed it in other languages. For example, you might be in the midst of developing a Java application and need to scrape some data off the Web. Rather than writing your own custom Java (or C++) web scraping routine, you can instead reach for one of several Perl modules freely available for web-page scraping. The small amount of Perl code required for invocation would be embedded in the source of a Java method in the same fashion that a SQL statement is embedded using the standard JDBC call-level interface. The results are automatically retrieved back into the Java runtime environment for manipulation and possible display.

In this article, I introduce EPerl, short for "Embedded Perl," a collection of C++ and Java classes that wrap around the Perl interpreter API. The motivation behind EPerl is straightforward—to make it easy to execute a Perl subroutine, script, or statement block from within C++ or Java. The underpinnings of the Java classes included in the EPerl distribution utilize JNI to invoke the C-based embedded Perl API, which are encapsulated by a few key C++ classes, also part of the EPerl implementation. The source code and other relevant EPerl files are available at http://www.cuj.com/code/.

Because all of the development of EPerl was done using ActiveState's ActivePerl 5.8.x for Windows, EPerl is currently available only for Windows. That said, I would consider future support for Linux or Solaris, and the motivation for this will be largely driven by the feedback and requests I receive in regards to EPerl. To date, my focus has been on realizing the following features using C++, Java, and Perl with ActivePerl in the Windows XP environment:

Executing (or evaluating) Perl statement blocks.
Invoking Perl subroutines.
Getting and setting (global) variables based on data type. The three types supported are integer, float, and string.
Capturing of standard output and error through redirection of the output stream.
State and statement execution is isolated on a Perl interpreter instance, this is key for supporting a single thread-interpreter execution model.

To develop EPerl, I used IntelliJ's IDEA editor and Microsoft's Visual Studio .NET 2003 IDE, for Java and C++, respectively. I created EPerl in the same spirit as other embedded languages for Java and .NET, a list that includes Forth, JavaScript, Python, and Tcl, to name a few. This concept of calling one programming language from another is a powerful mechanism, and surprisingly easy to accomplish.

The Basics

The simplest case for using EPerl is to invoke a subroutine that takes no arguments and returns nothing. Therefore, to invoke the Perl subroutine Bar:

sub Bar()
{
   print "Called subroutine Bar\n";
}

requires the Java code:

interpreter.eval("use Foo;");
interpreter.redirectOutput();
interpreter.callVoid("Bar");
String output = 
    interpreter.readStdoutBuffer();
interpreter.restoreOutput();

To spice this up, I redirected the output of this statement to a string. The ability to capture standard output (or error) and access in Java can be useful when working with legacy Perl code. In many cases, an existing piece of Perl code may emit results or errors to standard out or error. Capturing this output is critical to a robust Java-Perl implementation, and one of EPerl's most useful capabilities beyond the more basic subroutine and statement execution.

In many cases, passing data to a Perl subroutine from Java might be called for. In the next example, string arguments are passed. The DoubleFoo subroutine simply accepts the two inputs provided, concatenates these arguments, and returns them back to Java.

sub DoubleFoo()
{
   my $argument1 = shift;
   my $argument2 = shift;
   return $argument1.$argument2;
}

The Java code that executes this routine is:

String argument1 = 
    "Hello Perl, ", argument2 = 
         " said Java.";
String returnArgument = 
    interpreter.call
        ("DoubleFoo", argument1,
              argument2);

In the two previous examples, I specified what Perl module to use to locate the subroutine being called. Doing this across several subroutines becomes redundant when the routines reside in the same module. EPerl lets you avoid this redundancy through the use of interpreter properties, which are nothing more than a hash of key-value pairings. In the case of modules, the MODULE_LIST key along with an array of modules is passed to the interpreter. This array-based property can then be placed in a static const declaration and reused throughout the Java code that executes Perl statements:

ArrayList moduleList = new ArrayList();
moduleList.add("Foo");

Map properties = new HashMap();
properties.put(PerlInterpreter
     .MODULE_LIST, moduleList);

interpreter
     .setRuntimeProperties(properties);
interpreter.call
     ("DoubleFoo", "arg1", "arg2");

The following code fragment defines a JUnit test that helps validate marshaling of an array of string items passed into Perl, then back to Java. This type of class interface makes it easy to pass a variable number of arguments and receive back a variable number of arguments:

StringArray arrayOfString = 
    new StringArray("fooArray");
arrayOfString.add("abc");
arrayOfString.add("efg");
StringArray returnArrayOfString = 
   interpreter.call
      ("ArrayFoo", arrayOfString);
assertEquals("abc", 
      returnArrayOfString.get
           (0).toString());
assertEquals("efg", 
     returnArrayOfString.get
           (1).toString());

Of course, subroutines are not the only thing you'll want to execute. Sometimes, you may want to simply execute Perl scripts from within Java. To help accomplish this task in EPerl, the Java class PerlScript can be used:

PerlScript script = 
   new PerlScript(new File("comma.pl"));
script.execute(interpreter);

As was the case for subroutine execution, output redirection can also be specified for script execution. To redirect output, the interpreter methods redirectOuput, readStdoutBuffer, and restoreOutput can be invoked in the same manner as the first subroutine invocation example.

In addition to Perl scripts and subroutines, statement blocks of Perl code can be executed as an evaluation operation. This EPerl example provides a simple mathematical illustration of how to execute a Perl statement that squares the number "16," assigns the result to a Perl variable, then reads it back to Java as a statement block:

String varId = "x";
interpreter.eval("$" + varId + 
     " = 16; $" + varId + " **= 2;");
double result = 
     interpreter.getDouble(varId);

To help encapsulate this behavior in EPerl, the PerlStatement class helps make executing a Perl code block a matter of calling the execute method followed by a get to obtain the result from the interpreter:

String varId = "y";
PerlStatement statement = 
       new PerlStatement(
       '$' + varId + "= 3.14; $" + 
       varId + " **= 2;");
statement.execute(interpreter);
double result = 
       interpreter.getDouble(varId);

C++ Details

The Perl examples I've presented so far demonstrate many of the key features of EPerl. However, EPerl's underpinnings are based on C++, and includes the C-based Java Native Interface (JNI) that encapsulates interaction with the C-based Perl API. Listing 1 is the abstract C++ class for the embedded interpreter interface EPerl supports. This interface is implemented by the CPerlInterpreter class, which is ultimately responsible for invoking various functions in the C-based Perl API.

The composition of this interface specifies methods for calling Perl subroutines with different arrangements of type-specific arguments, as well as evaluating Perl statement blocks, which are more open-ended and thus more flexible. The getters and setters related to scalar access are specified here as well. Array and hash access is also supported, but currently only for strings. Finally, support for output redirection of the Perl interpreter is also specified in the base interpreter interface.

It's worth nothing that I have used namespaces in the C++ implementation of EPerl. More precisely, the EPerl classes reside in the eperl namespace. In addition to this namespace, several JNI utility functions reside in the jni namespace. The JNI-based function definitions, and the functional prototypes generated by the Java SDK javah tool, reside in the default global namespace.

For a better understanding of the flow between Java and C++, Listing 2 shows the JNI definition that is the entry point to the native implementation of the Java eval method. This exported JNI function resides in the Windows DLL (see Figure 1). This C function takes a number of arguments, some of which are the normal overhead included in JNI. The primary arguments include the interpreter ID, which is used to retrieve the C++ interpreter instance, and the Perl statement block.

After the runtime checks included in the if expression are evaluated, the contents of the Java string are copied to a C++ string. Next, the interpreter associated with the specified ID is retrieved and the Perl statement block passed in is executed by invoking the C++ Eval method.

As Listing 3 shows, executing the C++ Eval method on an instance of ActivePerlInterpreter is the point at which the C-based Perl API is invoked. In the case of eval, it involves executing the perl_eval_pv function. The arguments passed to this routine include the Perl statement block, cast to a const char*, and the Perl flag G_KEEPERR. The flag G_KEEPERR indicates that the Perl interpreter should retain any error so that it can be retrieved by accessing the ERRSV interpreter variable.

Perl for Output Redirection

Although the bulk of the implementation of EPerl is implemented in C++ and Java, there are some supporting Perl routines contained in the eperl.pm file. These subroutines help support output redirection, which allows for redirection of standard output and standard error to a scalar variable. Output redirection is enabled by invoking the Java method redirectOutput on the Perl interpreter. This operation is reduced to invoking the redirect_output Perl subroutine that manipulates the output streams and uses I/O scalars (perlio:scalar) to redirect future output to a scalar. Finally, subroutines also defined in eperl.pm clear and restore the output streams.

Conclusion

Future development of EPerl will be partially guided by community feedback. Ideas on the drawing table include an ATL implementation with supporting .NET classes, as well as a mechanism for invoking Java methods from Perl. Mixed-language development is a powerful concept that was recently best summarized by Robert Pike, an early developer of UNIX at Bell Labs and now with Google, when he said "OO is great for problems where an interface applies naturally to a wide range of types, not so good for managing polymorphism, and remarkably ill-suited for network computing. That's why I reserve the right to match the language to the problem, and even—often—to coordinate software written in several languages towards solving a single problem." As programming languages, C++, Java, and Perl all have their strengths and weaknesses. My goal with EPerl is to provide a new alternative for blending these languages together.