PBQ: The Python Batch Queue
PBQ is a simple batch queue system. Its goal is to complete a list of jobs on a bunch of machines with a shared file system without interfering with interactive users or more important batch jobs.
You need a working Python installation (tested only with 1.5.1 and later, but it should work with earlier Python versions), but you do not have to be root to run it, you do not need any other special privileges, nor do you need excessive amounts of time for installation.
Disclaimer: There are tons of packages which do more, do the same thing better, are more stable (I would presume), and even come with a fancy logo. PBQ is nothing like that.
PBQ is distributed under the GNU General Public License, btw. It does what I wanted it to do: distribute a large number (25,000) of small jobs across a bunch of machines with a shared file system, for a total of over 1,000 CPU-days (on an UltraSparc).
Installation
Download PBQ.tar.gz, containing PBQSingle.py, PBQ.py and JobFileAppend, sample files and the GPL. Put everything in a directory of your choice, and edit the "#!"-path to reflect the location of your Python installation if you don't have /usr/bin/env. Done. That is, done if you are running Solaris, Irix or AIX. You may need to change bits and pieces to make it work on your box.
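On a typical Unix box the installation might look like this (a sketch; it assumes the tarball unpacks into the current directory and that the scripts need their execute bits set):

gunzip PBQ.tar.gz
tar xf PBQ.tar
chmod +x PBQSingle.py PBQ.py JobFileAppend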
Specifying Machines
You need to specify the machines you want to use in a so-called hostfile. A sample hostfile looks like this:
# Machine name arch max wday interactive
octopussy sol2 8 8 0
twister sol2 2 2 0
sisyphos sol2 4 4 0
sparcy13 sol2 1 0 0
sparcy16 sol2 1 0 0
Here max is the maximum load you ever want to see on the machine (it should be the number of CPUs you have), wday the load you want to see during a workday (specified in PBQSingle.py in class Machine, method MaxLoad; currently M-F 8am - 8pm), and interactive specifies how many CPUs you want to surrender to interactive users.
The arch is one of the following: sol2 for Solaris 2.x, aix4 for AIX 4.x and sgi for Silicon Graphics machines.
Specifying Jobs
You need to specify the jobs you want to run in a so-called jobfile. A sample jobfile looks like this:
mcpdDriver.sh fp10_fn15.dsg fp10_fn15-14 0.1 0.15
mcpdDriver.sh fp10_fn15.dsg fp10_fn15-15 0.1 0.15
mcpdDriver.sh fp10_fn15.dsg fp10_fn15-16 0.1 0.15
mcpdDriver.sh fp10_fn15.dsg fp10_fn15-17 0.1 0.15
mcpdDriver.sh fp10_fn15.dsg fp10_fn15-18 0.1 0.15
mcpdDriver.sh fp10_fn15.dsg fp10_fn15-19 0.1 0.15
Each line contains exactly one command to run, with all required parameters. NOTE: I/O redirection is NOT supported. If your executable writes to stdout, use a shell wrapper to handle this.
Also, the executable or the wrapper should return meaningful exit values: a non-zero exit value indicates an error condition. If you always return zero, PBQ cannot detect whether an error occurred.
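A minimal wrapper along these lines might look as follows (a sketch; the name myJobWrapper.sh, the leading "label" argument and the .out naming scheme are made up for illustration):

#!/bin/sh
# myJobWrapper.sh (hypothetical): run a command with stdout and
# stderr redirected to label.out and hand the command's exit
# status back to PBQ.
# Usage: myJobWrapper.sh label command [args ...]
label=$1
shift
"$@" > $label.out 2>&1
exit $?

A jobfile line would then read, e.g., myJobWrapper.sh fp10_fn15-14 myProgram fp10_fn15.dsg 0.1 0.15 (myProgram being your actual executable).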
Starting Up
Once you have a hostfile and a jobfile you can simply start
PBQSingle.py hostfile jobfile
on each of the machines. This will start the scheduler on one machine. To start the scheduler on machines other than the one you are logged in to, use rsh (I will provide a nicer way of doing that later on).
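A sketch of such a starter script (in the spirit of the PBQStart script used in the sample session below; the awk line assumes the hostfile format shown above, with comment lines starting with #):

#!/bin/sh
# PBQStart (sketch): start one scheduler per host in the hostfile.
# Assumes a shared file system, so the same directory exists
# on every machine.
dir=`pwd`
for host in `awk '$1 !~ /^#/ { print $1 }' hostfile`
do
    rsh $host "cd $dir; ./PBQSingle.py hostfile jobfile" &
done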
Did it work?
If starting PBQSingle.py was successful, you will find new files in the directory you were in:
Filename | Contents
jobfile.[hostname].pid | the pid of the scheduler
jobfile.[hostname].log | log messages
jobfile.[hostname].running | running jobs
jobfile.[hostname].stopped | stopped jobs
jobfile.[hostname].error | jobs which exited abnormally
The logfile will give you pretty detailed information about jobs started, completed and aborted. The other files simply contain jobs, one line from the jobfile each, so you can simply catenate all .error files and use them to re-run aborted jobs.
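For example (rerunJobs is just a scratch file name):

cat jobfile.*.error > rerunJobs
JobFileAppend jobfile rerunJobs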
Appending jobs
Since several schedulers will read from and write to the jobfile, all accesses are done with blocking file locking. To append all jobs in a file moreJobs to the jobfile, use
JobFileAppend jobfile moreJobs
Controlling Execution
PBQ.py allows you to control all running schedulers simultaneously. The synopsis is:
PBQ.py hostfile jobfile command [host1 ... ]
If no hosts are specified, the command will be sent to all schedulers (as defined by the existence of jobfile.[host].pid). The commands are:
Command | Synopsis
stop | stop all schedulers and all jobs (done by sending signal USR1)
cont | continue all schedulers and all jobs
quit | let running jobs finish, start no new ones, and then quit the PBQ managers (done by sending signal USR2)
term | terminate all schedulers and all jobs immediately; takes effect at the next wake-up (typically 15 secs). Terminated jobs are not recorded separately, but jobfile.[host].running and jobfile.[host].stopped reflect the jobs on the machine when termination occurred
kill | kill all schedulers; does not kill jobs
load | display the load of the machines
check | check the schedulers on all machines
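For example, to stop only the schedulers on octopussy and twister from the sample hostfile:

PBQ.py hostfile jobfile stop octopussy twister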
Recovering from errors
If systems are rebooted, or jobs and/or schedulers are killed, you can most likely recover without big problems.
Reboots: If machines went down, you can simply remove their .pid files, manually concatenate the .running and .stopped files of all hosts that were down (the jobs listed in them were on the machine at the time of the reboot and hence did not complete) and append everything to the jobfile. Start the schedulers and you are up and running again. You might want to use
PBQ.py hostfile jobfile check
to find out which schedulers survived the crash.
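For a single rebooted host, say sparcy16 from the sample hostfile, the recovery might look like this (a sketch; lostJobs is just a scratch file name):

rm jobfile.sparcy16.pid
cat jobfile.sparcy16.running jobfile.sparcy16.stopped > lostJobs
JobFileAppend jobfile lostJobs

Then start the scheduler on sparcy16 again as described under Starting Up.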
Schedulers killed: You can only notice this kind of error by the time stamps in the log file getting old. If you suspect a scheduler has died, proceed as above.
Jobs killed: You notice this kind of error by looking at the .error files. If the errors are not due to a bug in your code and you assume that the jobs should have completed, you can take the jobs listed in the .error files and append them to the jobfile.
Sample session
I wrote a little shell script, PBQStart, to start PBQSingle.py hostfile jobfile on all machines (a sketch of such a script is given above under Starting Up) and run it as
nohup PBQStart
The nohup attaches the stdout and stderr of all started PBQSingle.py instances etc. to nohup.out, so you can use that information to track errors.
By looking at the .pid, .log and .running files you can immediately see if everything is working. The last line of the log file will tell you the residual load (== max load - current load) and the number of running and stopped jobs.
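For example, for host octopussy from the sample hostfile:

tail -1 jobfile.octopussy.log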
If you suspect that something is going wrong (or users are complaining) you can stop everything with
PBQ.py hostfile jobfile stop
Note that kill -STOP for PBQSingle.py won't cut it, since it would not stop the jobs started by PBQSingle.py.
In case you need to terminate everything (say for system maintenance, reboots etc.) you have two options. If you have enough time to wait for running jobs to complete use
PBQ.py hostfile jobfile quit
The time it needs depends on the jobs you have submitted. I usually try to break up problems into 30 minute jobs. If you need everything stopped right away use
PBQ.py hostfile jobfile term
and wait for about a minute (a wide safety margin: PBQSingle.py wakes up every 15 secs or so, and needs just one wake-up to terminate) and make sure all .pid files are gone (you can also signal just specific machines: see above).
If you want to restart everything, catenate all .error, .running, and .stopped files into, say, moreJobs and append that to the jobfile with
JobFileAppend jobfile moreJobs
If you have no more schedulers running you can use
cat *.error *.running *.stopped >> jobfile
directly. Start again.
Scheduling Policy
The maximal load allowed on a machine depends on:
- the value of max (typically the number of CPUs), as specified in the hostfile
- wday, as specified in the hostfile, during working hours on work days (M-F 8am - 8pm)
- interactive, as specified in the hostfile
- the number of interactive users
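For example, sparcy13 from the sample hostfile (max 1, wday 0, interactive 0) will run at most one job at night and on weekends and none during working hours, while octopussy (max 8, wday 8, interactive 0) keeps all eight CPUs busy around the clock.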
If the current load (the moving average over the shortest interval reported by uptime) exceeds the maximal load (as determined by the factors above) by more than 0.5, the scheduler will stop as many jobs as necessary to reduce the load to below the maximal load.
If the load is below the maximal load, stopped jobs will be continued and/or new jobs will be started.
Interactive jobs, and even background jobs running at the same nice level (jobs started by the scheduler run at nice 19), will always have priority over jobs started by the scheduler.
Issues
Huge logfiles: Log files are created at a rate of about 300 KByte/day per machine. If you run out of space, just delete them. PBQSingle.py will create new files with the same name. If you are paranoid, you might want to stop all schedulers before deleting the files, and cont everything afterwards.
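The paranoid sequence would then be:

PBQ.py hostfile jobfile stop
rm jobfile.*.log
PBQ.py hostfile jobfile cont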
Bugs etc.: None known at this time
Missing features:
- Not totally configurable without hacking source
- No support for executables for different architectures
- No automatic recovery
For further information contact Alexander Schliep (alexander@schlieplab.org).
Team
Members: Alexander Schliep.
Publications
Bolten et al. Clustering protein sequences-structure prediction by transitive homology. Bioinformatics 2001, 17:10, 935-941.
Pipenbacher. Evaluation and extension of a graph based clustering approach for the detection of remote homologs. Master's Thesis, University of Cologne, 2001.
Bolten. A graph based clustering method for the detection of remote homolog protein sequences. Master's Thesis, University of Cologne, 2000. In German.