Distributed Computing with Ruby

Dr. Dobb's Journal September 2002

Using distributed Ruby to distribute task objects

By Phil Tomson

Phil is a freelance software engineer and the creator of the TaskMaster framework. He can be contacted at rubyfan@programmer.net.

Often it is necessary to parallelize a set of tasks so that instead of running on one machine serially, the tasks are spread across several machines to save time. For example, you might need to test how well a compiler's parser adheres to a particular language's specified grammar. The test suite for testing the language might contain several thousand test cases that normally take, say, eight hours to complete if all of the test cases are run serially on one machine. Of course, you would like to know if the latest changes you made to the parser broke anything, and you would like to not have to wait eight hours to get results. What if you could run the test cases on eight machines and get results in one hour, or even better, if you could run them on 16 machines and get answers in 30 minutes? I recently faced a similar problem. To solve it, I created TaskMaster, a task distribution framework using DRb, Ruby's distributed object system.

Ruby is a dynamic, object-oriented scripting language available on platforms such as Windows, Linux, and Mac OS X (see "Programming with Ruby," by Dave Thomas and Andy Hunt, DDJ, January 2001). Because Ruby's DRb (sometimes referred to as dRuby) is straightforward to use, TaskMaster was easy to implement. Ruby is available at http://www.ruby-lang.org/en/download.html and DRb at http://www.rubylang.org/en/raa-list.rhtml?name=druby+distributed+ruby and http://www2a.biglobe.ne.jp/~seki/ruby/drb-1.3.4.2.tar.gz. (DRb comes with the Windows version of Ruby; on UNIX, you may have to install it separately.) TaskMaster is available at http://www.aracnet.com/~ptkwt/ruby_stuff/TaskMaster/ and from DDJ (see "Resource Center," page 5).

DRb lets you bind an object to a port on one machine and access that object's methods from another machine on the network, a form of Remote Procedure Call (RPC). However, DRb does more than simple RPC: objects themselves can be passed efficiently from one machine to another. It works by using Ruby's built-in Marshal module to serialize objects and transfer them over the network to other computers. TaskMaster uses this ability to pass task objects to clients.
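To make the serialization step concrete, here is a minimal sketch of a Marshal round trip with no network involved. SampleTask is a hypothetical class invented for this illustration; it is not part of DRb or TaskMaster.

```ruby
# SampleTask is a hypothetical class used only for illustration.
class SampleTask
  attr_reader :name, :iterations

  def initialize(name, iterations)
    @name = name
    @iterations = iterations
  end
end

original = SampleTask.new("pi-estimate", 250_000)

# Marshal.dump turns the object into a byte string...
bytes = Marshal.dump(original)

# ...and Marshal.load rebuilds an equivalent object from those bytes.
copy = Marshal.load(bytes)

puts copy.name              # pi-estimate
puts copy.iterations        # 250000
puts copy.equal?(original)  # false: the copy is a distinct object
```

Marshal.load can only rebuild an object whose class is already defined in the loading process, which is why the receiving machine needs the task's class definition before it can accept task objects.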

Listings One and Two illustrate the simplicity of using DRb. The DRb server in Listing One defines a simple class called HelloFrom with a single method, say_hello, which returns a string composed of "hello from " and the name of the machine it is running on. A HelloFrom object is instantiated and referenced by the variable h. On the next line, a DRbServer is instantiated with h bound to port 5555 of the machine running the server. Listing Two is the client. It instantiates a DRbObject that refers to the object bound to port 5555 of a machine called "Remote." (Of course, you need to have Ruby and dRuby installed on both the server and client machines.)

Suppose the code from Listing One is started on the machine Remote. If Listing Two is then run on another machine that can access Remote, say a machine called "Local," then on Local you'll see the message:

hello from remote

Clearly, DRb is easy to use and there are no web servers to set up as required with XML-RPC and SOAP-based solutions. DRb forms the basis for object-oriented communications between server and clients in the TaskMaster framework.

The TaskMaster Framework

TaskMaster uses DRb to distribute user-defined runnable task objects from a master machine to several slave machines. I call these "runnable" objects because they must define a run method in the class definition. Since Ruby is dynamically typed, it doesn't matter what type your runnable object is; it only matters that the object has a run method defined in its public interface. The run method should define some task that the user wants to accomplish on the remote slave machines. For example, you could define a Testcase class whose run method runs a test case for a compiler. Your Testcase class would then be able to construct runnable objects in the context of the TaskMaster framework. Testcase objects are created on the master machine and passed to your remote slaves. When a slave receives a runnable object, it invokes the run method on that object and the task is then run on the remote slave machine.
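The "runnable" convention can be sketched without any networking. In the example below, the classes CompileCheck and WordCount and the helper execute are all hypothetical names made up for illustration; execute is only a stand-in for the kind of check a receiver could perform and is not taken from the TaskMaster source.

```ruby
# Two unrelated classes; neither inherits from a common "Task" base.
# Both names are hypothetical and exist only for this illustration.
class CompileCheck
  def run
    "compiled"
  end
end

class WordCount
  def initialize(text)
    @text = text
  end

  def run
    @text.split.length
  end
end

# A stand-in for what a receiver might do: check only that the object
# responds to run, then call it. This is not TaskMaster code.
def execute(task)
  raise ArgumentError, "not runnable: #{task.class}" unless task.respond_to?(:run)
  task.run
end

puts execute(CompileCheck.new)        # compiled
puts execute(WordCount.new("a b c"))  # 3
```

Neither class declares that it is a task; having a public run method is the whole contract.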

The classes that make up the TaskMaster framework include:

Users need to supply the following classes:

An Example TaskMaster Application

A typical TaskMaster application has four components:

Listing Three, which illustrates how you can use the TaskMaster framework, presents PiTask, an application that calculates an estimate of pi using the Monte Carlo method.

PiTask defines a run method that takes a single argument, iterations. On each iteration, random x- and y-coordinates are chosen (with the values of x and y between 0 and 1) and the distance of that point from the origin (0,0) is calculated. If the distance is less than or equal to 1, the random point falls within one quadrant of the unit circle and the hits variable is incremented. After thousands (or millions) of iterations, a value for pi can be calculated using the formula: pi_est = 4 * (hits/iterations). This is exactly what PiTask's run method does. After calculating the estimate, it assigns it to an instance variable (@my_pi).
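The factor of 4 comes from comparing areas: the quarter circle of radius 1 has area pi/4 and the unit square has area 1, so hits/iterations approaches pi/4. The standalone sketch below shows the same calculation outside the framework. The method name estimate_pi and its seed parameter are assumptions added here for repeatable runs; they are not part of PiTask.

```ruby
# Estimates pi by sampling points in the unit square and counting those that
# land inside the quarter circle of radius 1. The seed parameter is an
# addition for repeatable runs; it is not part of PiTask.
def estimate_pi(iterations, seed = 42)
  rng = Random.new(seed)
  hits = 0
  iterations.times do
    x = rng.rand
    y = rng.rand
    # Comparing the squared distance to 1 is equivalent to comparing the
    # distance to 1, so the square root can be skipped.
    hits += 1 if x * x + y * y <= 1.0
  end
  4.0 * hits / iterations
end

estimate = estimate_pi(200_000)
puts "estimate = #{estimate}, error = #{(estimate - Math::PI).abs}"
```

The error shrinks slowly, roughly in proportion to one over the square root of the number of iterations, which is why splitting a large iteration count across machines is attractive.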

The PiTask class also defines a harvest method. A task's harvest method is called from the master side when the Distributor finds that the task has completed. In this case, you simply return the value of pi that was estimated; the instance variable @my_pi is returned. Listing Four is the master script.

After setting up the list of slaves, the PiReporter class is defined. The PiReporter's report method receives the value returned by a PiTask's harvest method. In this case, the report method just stores the estimated value of pi coming from a slave in a list. PiReporter's final_report method is called after all of the slave machines have completed their estimates; it calculates the average of the estimates and prints the results.

After setting a start time (so you can calculate how long the program takes to run) and calling require on the PiTask.rb file, the Distributor is instantiated with the slaveList. Distributor's remote_require method is then called with a list containing the string PiTask.rb — this sends the contents of that file to each of the slave machines. It essentially gives the slave machines the definition for the PiTask class so that they can receive and act upon PiTask objects, which will be sent to them.

A PiReporter is then instantiated, referenced by the reporter variable, and assigned to the Distributor (distrib.reporter = reporter) so that harvested results are passed to it.

Now comes the time to divide up the tasks between the available slave machines. You send a task for each available slave machine. You divide the total number of iterations (in this case, 1,000,000) by the number of slave machines. In this example, there are four slaves, so each one runs 250,000 iterations. After all the tasks are sent, you then wait for all of the tasks to complete (distrib.wait) and then call reporter.final_report to give the average for all machines.
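The division works out evenly here, but integer division silently drops the remainder when the total is not a multiple of the number of machines. The sketch below shows one way to spread the remainder so that no iterations are lost. The helper split_iterations is hypothetical and not part of TaskMaster.

```ruby
# Splits total work into one share per worker, spreading any remainder so
# that the shares always add up to the total. split_iterations is a
# hypothetical helper, not part of TaskMaster.
def split_iterations(total, workers)
  base, extra = total.divmod(workers)
  Array.new(workers) { |i| i < extra ? base + 1 : base }
end

even = split_iterations(1_000_000, 4)
uneven = split_iterations(1_000_000, 3)

puts even.inspect    # [250000, 250000, 250000, 250000]
puts uneven.inspect  # [333334, 333333, 333333]
puts uneven.sum      # 1000000
```

With unequal shares, a plain average of the per-machine estimates is only approximately right; weighting each estimate by its iteration count would give the same result as a single run over all the points.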

In practice, if you wanted to speed this up further, you would probably write the computationally intensive part of PiTask's run method in C and build it as a Ruby extension in a shared library file. You could then call that C function from your run method. (TaskMaster includes an example of this.)

Listing Five is the slave script that should be running on your slave machines before starting the master.rb script in Listing Four.

This example illustrates the main points you need to be aware of when creating your own distributed applications with TaskMaster:

Conclusion

Using Ruby's DRb module to create distributed applications is relatively easy when compared to the alternatives. XML-RPC and SOAP tend to require more setup than DRb does due to their reliance on XML and HTTP. The downside to using DRb is that you won't be able to interoperate with other languages as you can with XML-RPC or SOAP (where XML acts as an intermediate translator of intent). However, if you are starting a new project from scratch and your distributed application doesn't need to interoperate with other languages, Ruby offers a compelling alternative for getting the job done quickly and easily. As an alternative, you can also use DRb to prototype your distributed application and then migrate the project to XML-RPC or SOAP using the available Ruby modules that support those protocols. For applications where you need to distribute tasks to multiple machines, TaskMaster offers a flexible off-the-shelf solution.

DDJ

Listing One

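# ----------- hello server ------------
require "drb/drb"
require "socket"   # for Socket.gethostname

class HelloFrom
  # Returns a greeting that names the machine this object lives on.
  def say_hello
    return "hello from " + Socket.gethostname
  end
end

h = HelloFrom.new
# Bind h to port 5555 on all network interfaces of this machine.
h_server = DRb::DRbServer.new("druby://0.0.0.0:5555", h)
# Block on the server thread so the script keeps serving requests.
h_server.thread.join
# ----------- end of hello server -----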


Listing Two

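# ----------- client ------------
require "drb/drb"
# A local server lets the remote side call back into this process.
primary = DRb::DRbServer.new
# Create a proxy for the object bound to port 5555 on the machine "remote".
ro = DRb::DRbObject.new(nil, "druby://remote:5555")
# The call is forwarded to the HelloFrom object on the server.
puts ro.say_hello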


Listing Three

######################################################
# PiTask.rb
# Estimates pi using the Monte Carlo method
# more iterations should produce more accurate results
#######################################################
require "TaskMaster"
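class PiTask

  # Runs the Monte Carlo estimate for the given number of iterations.
  def run(iterations)
    hits = 0
    puts "iterations = #{iterations} iterations.class = #{iterations.class}"
    iterations.times { |i|
      # pick a random x,y in the unit square
      x = rand
      y = rand
      dist = Math.sqrt(x*x + y*y)
      # count points that fall inside the quarter circle of radius 1
      if dist <= 1.0
        hits += 1
      end
    }
    @my_pi = 4*(hits.to_f/iterations.to_f)
    puts "PiTask::run finished -> estimated pi = #@my_pi"
  end

  # Called on the master side to collect the finished estimate.
  def harvest
    return @my_pi
  end

end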


Listing Four

#####################################################
# master.rb - run on your master machine
#####################################################
require "TaskMaster"

iterations = 1000000
slaveList = ['frodo','sam','merry','pippin']

# Collects the estimate harvested from each task and averages them.
class PiReporter
  def initialize
    @estimates = []
  end

  # Called with the value returned by a task's harvest method.
  def report(pi_est)
    @estimates << pi_est
  end

  # Called once all tasks have completed.
  def final_report
    sum = 0
    @estimates.each { |est|
      sum += est
    }
    print "The average pi estimate was: "
    print "#{sum/(@estimates.length)}\n"
  end
end

startTime = Time.now
filesToRequire = ["PiTask.rb"]
filesToRequire.each { |file| require file }
distrib = TaskMaster::Distributor.new(slaveList)
# Send the task class definition to every slave.
distrib.remote_require(filesToRequire)
reporter = PiReporter.new
distrib.reporter = reporter
# Send one task per available slave, each with an equal share of the iterations.
(distrib.availableSlaves).each {
  distrib.send_task(PiTask.new(),(iterations/distrib.availableSlaves.length))
}
# Wait until every task has completed.
distrib.wait
reporter.final_report
endTime = Time.now
puts "Total time: #{endTime-startTime} seconds"


Listing Five

#######################################
# slave.rb - run on your slave machines
#######################################
require "TaskMaster"
slaveObj = TaskMaster::Slave.new()
puts "Slave #{Socket::gethostname} started"
slaveObj.wait
