begin
process mother.conf
start nannies /* one per node in the client system */
perform capability discovery
download work unit from assignment server

while the work unit is not complete {
   prepare work unit
   start children and assign to nannies
   distribute work unit to children
   tell children to start scientific core
   
   while children are running {
      process checkpoint messages and transfer files
      process error messages and restart from the previous checkpoint
      if the number of retries exceeds the configured limit, stop
   }
}

send results or error to assignment server
release children
release nannies
clean up local environment

Figure 2: Pseudocode for the mother's core algorithm.
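
The retry logic in the two nested loops can be sketched in Python as follows. This is a minimal illustration and not the mother's actual implementation: the names Event, run_work_unit, and max_retries are hypothetical, messages from the children are simulated as lists of events (one list per launch), and "exceeds" is assumed to mean strictly greater than the configured limit. Preparing the work unit, starting the children, and transferring files are elided.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical message from a child: 'checkpoint', 'error', or 'done'."""
    kind: str
    step: int = 0  # checkpoint number; unused for other kinds


def run_work_unit(event_batches, max_retries):
    """Drive one work unit; return (status, last_checkpoint, retries).

    event_batches yields one list of Events per launch of the children.
    max_retries stands in for the configuration variable in the figure.
    """
    last_checkpoint = 0
    retries = 0
    batches = iter(event_batches)

    while True:  # while the work unit is not complete
        # Prepare the work unit and start the children (elided here).
        events = next(batches, None)
        if events is None:
            return ("error", last_checkpoint, retries)

        for ev in events:  # while children are running
            if ev.kind == "checkpoint":
                last_checkpoint = ev.step  # file transfer elided
            elif ev.kind == "error":
                retries += 1
                if retries > max_retries:
                    return ("error", last_checkpoint, retries)
                break  # restart from the previous checkpoint
            elif ev.kind == "done":
                return ("ok", last_checkpoint, retries)
        else:
            # Children stopped without reporting 'done' or 'error'.
            return ("error", last_checkpoint, retries)
```

For example, run_work_unit([[Event("checkpoint", 1), Event("error")], [Event("checkpoint", 2), Event("done")]], max_retries=1) returns ("ok", 2, 1): the first launch fails after checkpoint 1, and the single permitted retry completes the work unit.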
