PBQ: The Python Batch Queue
PBQ is a simple batch queue system. Its goal is to complete a list of jobs on a bunch of machines with a shared file system without interfering with interactive users or more important batch jobs.
You need a working Python installation (tested only with 1.5.1 and later, but it should work with earlier Python versions), but you do not have to be root to run it, you do not need any other special privileges, nor do you need excessive amounts of time for installation.
Disclaimer: There are tons of packages which do more, do the same thing better, are more stable (I would presume), and even come with a fancy logo. PBQ is nothing like that.
PBQ is distributed under the GNU General Public License, btw. It does what I wanted it to do: distribute a large number (25,000) of small jobs across a bunch of machines with a shared file system, for a total of over 1,000 CPU-days (on an UltraSparc).
Installation
Download PBQ.tar.gz, containing PBQSingle.py, PBQ.py and JobFileAppend, sample files and the GPL. Put everything in a directory of your choice, and edit the "#!"-path to reflect the location of your Python installation if you don't have /usr/bin/env. Done. That is, done if you are running Solaris, Irix or AIX. You may need to change bits and pieces to make it work on your box.
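On a typical Unix box the installation might look like this (a sketch; it assumes the tarball unpacks into the current directory and that the scripts need their execute bits set):

gunzip PBQ.tar.gz
tar xf PBQ.tar
chmod +x PBQSingle.py PBQ.py JobFileAppend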
Specifying Machines
You need to specify the machines you want to use in a so-called hostfile. A sample hostfile looks like this:
# Machine name arch max wday interactive
octopussy sol2 8 8 0
twister sol2 2 2 0
sisyphos sol2 4 4 0
sparcy13 sol2 1 0 0
sparcy16 sol2 1 0 0
Here max is the maximum load you ever want to see on the machine (it should be the number of CPUs you have), wday the load you want to see during a workday (specified in PBQSingle.py in class Machine, method MaxLoad; currently M-F 8am - 8pm), and interactive specifies how many CPUs you want to surrender to interactive users.
The arch is one of the following: sol2 for Solaris 2.x, aix4 for AIX 4.x and sgi for Silicon Graphics machines.
Specifying Jobs
You need to specify the jobs you want to run in a so-called jobfile. A sample jobfile looks like this:
mcpdDriver.sh fp10_fn15.dsg fp10_fn15-14 0.1 0.15
mcpdDriver.sh fp10_fn15.dsg fp10_fn15-15 0.1 0.15
mcpdDriver.sh fp10_fn15.dsg fp10_fn15-16 0.1 0.15
mcpdDriver.sh fp10_fn15.dsg fp10_fn15-17 0.1 0.15
mcpdDriver.sh fp10_fn15.dsg fp10_fn15-18 0.1 0.15
mcpdDriver.sh fp10_fn15.dsg fp10_fn15-19 0.1 0.15
Each line contains exactly one command to run, with all required parameters. NOTE: I/O redirection is NOT supported. If your executable writes to stdout, use a shell wrapper to handle this.
Also, the executable or the wrapper should return meaningful exit values: a non-zero exit value indicates an error condition. If you always return zero, PBQ cannot detect whether an error occurred.
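A minimal wrapper along these lines might look as follows (a sketch; the name myJobWrapper.sh, the leading "label" argument and the .out naming scheme are made up for illustration):

#!/bin/sh
# myJobWrapper.sh (hypothetical): run a command with stdout and
# stderr redirected to label.out and hand the command's exit
# status back to PBQ.
# Usage: myJobWrapper.sh label command [args ...]
label=$1
shift
"$@" > $label.out 2>&1
exit $?

A jobfile line would then read, e.g., myJobWrapper.sh fp10_fn15-14 myProgram fp10_fn15.dsg 0.1 0.15 (myProgram being your actual executable).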
Starting Up
Once you have a hostfile and a jobfile you can simply start
PBQSingle.py hostfile jobfile
on each of the machines. This will start the scheduler on one machine. To start the scheduler on machines other than the one you are logged in to, use rsh (I will provide a nicer way of doing that later on).
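A sketch of such a starter script (in the spirit of the PBQStart script used in the sample session below; the awk line assumes the hostfile format shown above, with comment lines starting with #):

#!/bin/sh
# PBQStart (sketch): start one scheduler per host in the hostfile.
# Assumes a shared file system, so the same directory exists
# on every machine.
dir=`pwd`
for host in `awk '$1 !~ /^#/ { print $1 }' hostfile`
do
    rsh $host "cd $dir; ./PBQSingle.py hostfile jobfile" &
done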
Did it work?
If starting PBQSingle.py was successful, you will find new files in the directory you were in:
Filename | Contents
jobfile.[hostname].pid | the pid of the scheduler
jobfile.[hostname].log | log messages
jobfile.[hostname].running | running jobs
jobfile.[hostname].stopped | stopped jobs
jobfile.[hostname].error | jobs which exited abnormally
The logfile will give you pretty detailed information about jobs started, completed and aborted. The other files simply contain jobs, one line from the jobfile each, so you can simply catenate all .error files and use them to re-run aborted jobs.
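For example (rerunJobs is just a scratch file name):

cat jobfile.*.error > rerunJobs
JobFileAppend jobfile rerunJobs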
Appending jobs
Since several schedulers will read from and write to the jobfile, all accesses are done with blocking file locking. To append all jobs in a file moreJobs to the jobfile, use
JobFileAppend jobfile moreJobs
Controlling Execution
PBQ.py allows you to control all running schedulers simultaneously. The synopsis is:
PBQ.py hostfile jobfile command [host1 ... ]
If no hosts are specified, the command will be sent to all schedulers (as defined by the existence of jobfile.[host].pid). The commands are:
Command | Synopsis
stop | stop all schedulers and all jobs (done by sending signal USR1)
cont | continue all schedulers and all jobs
quit | let running jobs finish, start no new ones, and then quit the PBQ managers (done by sending signal USR2)
term | terminate all schedulers and all jobs immediately; takes effect at the next wake-up (typically 15 secs). Terminated jobs are not recorded separately, but jobfile.[host].running and jobfile.[host].stopped reflect the jobs on the machine when termination occurred
kill | kill all schedulers; does not kill jobs
load | display the load of the machines
check | check the schedulers on all machines
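For example, to stop only the schedulers on octopussy and twister from the sample hostfile:

PBQ.py hostfile jobfile stop octopussy twister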
Recovering from errors
If systems are rebooted, or jobs and/or schedulers are killed, you can most likely recover without big problems.
Reboots: If machines went down, you can simply remove their .pid files, manually concatenate the .running and .stopped files of all hosts that were down (the jobs listed in them were on the machine at the time of the reboot and hence did not complete) and append everything to the jobfile. Start the schedulers and you are up and running again. You might want to use
PBQ.py hostfile jobfile check
to find out which schedulers survived the crash.
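For a single rebooted host, say sparcy16 from the sample hostfile, the recovery might look like this (a sketch; lostJobs is just a scratch file name):

rm jobfile.sparcy16.pid
cat jobfile.sparcy16.running jobfile.sparcy16.stopped > lostJobs
JobFileAppend jobfile lostJobs

Then start the scheduler on sparcy16 again as described under Starting Up.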
Schedulers killed: You can only notice this kind of error by the time stamps in the log file getting old. If you suspect a scheduler has died, proceed as above.
Jobs killed: You notice this kind of error by looking at the .error files. If the errors are not due to a bug in your code and you assume that the jobs should have completed, you can take the jobs listed in the .error files and append them to the jobfile.
Sample session
I wrote a little shell script, PBQStart, to start PBQSingle.py hostfile jobfile on all machines (a sketch of such a script is given above under Starting Up) and run it as
nohup PBQStart
The nohup attaches the stdout and stderr of all started PBQSingle.py instances etc. to nohup.out, so you can use that information to track errors.
By looking at the .pid, .log and .running files you can immediately see if everything is working. The last line of the log file will tell you the residual load (== max load - current load) and the number of running and stopped jobs.
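For example, for host octopussy from the sample hostfile:

tail -1 jobfile.octopussy.log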
If you suspect that something is going wrong (or users are complaining) you can stop everything with
PBQ.py hostfile jobfile stop
Note that kill -STOP for PBQSingle.py won't cut it, since it would not stop the jobs started by PBQSingle.py.
In case you need to terminate everything (say for system maintenance, reboots etc.) you have two options. If you have enough time to wait for running jobs to complete use
PBQ.py hostfile jobfile quit
The time it needs depends on the jobs you have submitted. I usually try to break up problems into 30 minute jobs. If you need everything stopped right away use
PBQ.py hostfile jobfile term
and wait for about a minute (a wide safety margin: PBQSingle.py wakes up every 15 secs or so, and needs just one wake-up to terminate) and make sure all .pid files are gone (you can also signal just specific machines: see above).
If you want to restart everything, catenate all .error, .running, and .stopped files into, say, moreJobs and append that to the jobfile with
JobFileAppend jobfile moreJobs
If you have no more schedulers running you can use
cat *.error *.running *.stopped >> jobfile
directly. Start again.
Scheduling Policy
The maximal load allowed on a machine depends on:
- the value of max (typically the number of CPUs), as specified in the hostfile
- wday, as specified in the hostfile, during working hours on work days (M-F 8am - 8pm)
- interactive, as specified in the hostfile
- the number of interactive users
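For example, sparcy13 from the sample hostfile (max 1, wday 0, interactive 0) will run at most one job at night and on weekends and none during working hours, while octopussy (max 8, wday 8, interactive 0) keeps all eight CPUs busy around the clock.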
If the current load (the moving average over the shortest interval reported by uptime) exceeds the maximal load (as determined by the factors above) by more than 0.5, the scheduler will stop as many jobs as necessary to reduce the load to below the maximal load.
If the load is below the maximal load, stopped jobs will be continued and/or new jobs will be started.
Interactive jobs, and even background jobs running at the same nice level (jobs started by the scheduler run at nice 19), will always have priority over jobs started by the scheduler.
Issues
Huge logfiles: Log files are created at a rate of about 300 KByte/day per machine. If you run out of space, just delete them. PBQSingle.py will create new files with the same name. If you are paranoid, you might want to stop all schedulers before deleting the files, and cont everything afterwards.
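The paranoid sequence would then be:

PBQ.py hostfile jobfile stop
rm jobfile.*.log
PBQ.py hostfile jobfile cont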
Bugs etc.: None known at this time
Missing features:
- Not totally configurable without hacking source
- No support for executables for different architectures
- No automatic recovery
For further information contact Alexander Schliep (alexander@schlieplab.org).
Team
Members: Alexander Schliep.
Publications
Bolten et al. Clustering protein sequences-structure prediction by transitive homology. Bioinformatics 2001, 17:10, 935-941.
Pipenbacher. Evaluation and extension of a graph based clustering approach for the detection of remote homologs. Master's Thesis, University of Cologne, 2001.
Bolten. A graph based clustering method for the detection of remote homolog protein sequences. Master's Thesis, University of Cologne, 2000. In German.