begin
    process mother.conf
    start nannies /* one per node in the client system */
    perform capability discovery
    download work unit from assignment server
    while the work unit is not complete {
        prepare work unit
        start children and assign to nannies
        distribute work unit to children
        tell children to start scientific core
        while children are running {
            process checkpoint messages, transfer files
            process error messages, restart from previous checkpoint
            if number of retries exceeds configuration variable, stop
        }
    }
    send results or error to assignment server
    release children
    release nannies
    clean up local environment
end
Figure 2: Pseudocode for the mother's core algorithm.
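
The checkpoint-and-retry portion of Figure 2 can be sketched in Python as follows. This is only an illustrative sketch, not the mother's actual implementation: `WorkUnit`, `ChildFailure`, `run_work_unit`, `make_flaky_children`, and the `max_retries` parameter (standing in for the configuration variable) are all hypothetical names, and the children are simulated by a plain function rather than by processes managed by nannies.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class WorkUnit:
    """Hypothetical work unit: progress is measured in abstract steps."""
    total_steps: int
    checkpoint: int = 0  # last step known to be safely saved


class ChildFailure(Exception):
    """Hypothetical error message from a child, carrying its last checkpoint."""

    def __init__(self, last_checkpoint: int) -> None:
        super().__init__(f"child failed after checkpoint {last_checkpoint}")
        self.last_checkpoint = last_checkpoint


def run_work_unit(
    unit: WorkUnit,
    run_children: Callable[[int, int], int],
    max_retries: int,
) -> str:
    """Run until the work unit completes or the retry budget is exhausted.

    run_children(start, total) is assumed to return the step it reached,
    or to raise ChildFailure carrying the last checkpoint it saved.
    Returns "results" on success and "error" when retries exceed max_retries.
    """
    retries = 0
    while unit.checkpoint < unit.total_steps:
        try:
            # Normal path: record the progress the children report.
            unit.checkpoint = run_children(unit.checkpoint, unit.total_steps)
        except ChildFailure as failure:
            # Error path: fall back to the previous checkpoint and retry.
            unit.checkpoint = failure.last_checkpoint
            retries += 1
            if retries > max_retries:
                return "error"
    return "results"


def make_flaky_children(failures: int) -> Callable[[int, int], int]:
    """Build a simulated child run that fails `failures` times, then succeeds."""
    remaining = {"count": failures}

    def run_children(start: int, total: int) -> int:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            # Pretend half of the remaining work was checkpointed before the crash.
            raise ChildFailure(start + (total - start) // 2)
        return total

    return run_children


if __name__ == "__main__":
    unit = WorkUnit(total_steps=100)
    print(run_work_unit(unit, make_flaky_children(2), max_retries=3))
```

In this sketch the retry counter spans the whole work unit, which is one reading of the pseudocode; counting retries per checkpoint would be an equally plausible interpretation.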