PBQ: The Python Batch Queue
PBQ is a simple batch queue system, with the goal of completing a list of jobs on a bunch of machines with a shared file system without interfering with interactive users and/or more important batch jobs.
You need a working Python installation (only tested with 1.5.1 and upwards, but it should work with earlier Python versions). You do not have to be root to run it, you do not need any other special privileges, nor do you need excessive amounts of time for installation.
Disclaimer: There are tons of packages that do more, do the same thing better, are more stable (I would presume), and even come with a fancy logo. PBQ is nothing like that.
PBQ is distributed under the GNU General Public License, btw. It does what I wanted it to do: distribute a large number (25,000) of small jobs on a bunch of machines with a shared file system, for a total of over 1,000 CPU-days (on UltraSparcs).
Download PBQ.tar.gz, containing JobFileAppend, sample files and the GPL. Put everything in a directory of your choice and edit the "#!"-path to reflect the location of your Python installation if you don't have /usr/bin/env. Done. That is, done if you are running Solaris, IRIX or AIX. You may need to change bits and pieces to make it work on your box.
You need to specify the machines you want to use in a so-called hostfile. A sample hostfile looks like this:
# Machine name   arch  max  wday  interactive
octopussy        sol2  8    8     0
twister          sol2  2    2     0
sisyphos         sol2  4    4     0
sparcy13         sol2  1    0     0
sparcy16         sol2  1    0     0
max is the maximum load you ever want to see on the machine (it should be the number of CPUs you have), wday is the load you want to see during a workday (specified in class Machine, method MaxLoad; currently M-F 8am - 8pm), and interactive specifies how many CPUs you want to surrender to interactive users.
The arch is one of the following:
- sol2 for Solaris 2.x
- aix4 for AIX 4.x
- sgi for Silicon Graphics machines
You need to specify the jobs you want to run in a so-called jobfile. A sample jobfile looks like this:
mcpdDriver.sh fp10_fn15.dsg fp10_fn15-14 0.1 0.15
mcpdDriver.sh fp10_fn15.dsg fp10_fn15-15 0.1 0.15
mcpdDriver.sh fp10_fn15.dsg fp10_fn15-16 0.1 0.15
mcpdDriver.sh fp10_fn15.dsg fp10_fn15-17 0.1 0.15
mcpdDriver.sh fp10_fn15.dsg fp10_fn15-18 0.1 0.15
mcpdDriver.sh fp10_fn15.dsg fp10_fn15-19 0.1 0.15
Each line contains exactly one command to run, with all required parameters. NOTE: I/O redirection is NOT supported. If your executable writes to stdout, use a shell wrapper to handle this.
Also, the executable or the wrapper should return meaningful exit values: a non-zero exit value is taken to indicate an error condition. If you always return zero, PBQ cannot detect whether an error occurred.
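For illustration, such a wrapper could look like the following sketch (the wrapper name mcpdWrapper.sh is made up and not part of PBQ; it redirects the job's output to a file and passes the real exit status back):

```shell
# Hypothetical wrapper: redirect the job's stdout/stderr to a file and
# propagate the job's exit status, so PBQ can still detect failures.
cat > mcpdWrapper.sh <<'EOF'
#!/bin/sh
# Usage: mcpdWrapper.sh outfile command [args ...]
out=$1
shift
"$@" > "$out" 2>&1
exit $?
EOF
chmod +x mcpdWrapper.sh

# A jobfile line would then read, e.g.:
#   mcpdWrapper.sh fp10_fn15-14.out mcpdDriver.sh fp10_fn15.dsg fp10_fn15-14 0.1 0.15
./mcpdWrapper.sh demo.out echo hello
```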
Once you have a hostfile and a jobfile you can simply start
PBQSingle.py hostfile jobfile
on each of the machines. This will start the scheduler on one machine. To start the scheduler on a machine other than the one you are on, use rsh (I will provide a nicer way of doing that later on).
Did it work?
If PBQSingle.py was successful, you will find new files in the directory you were in:
- a .pid file, containing the pid of the scheduler
- a log file, containing log messages
- a .running file, listing the running jobs
- a .stopped file, listing the stopped jobs
- an .error file, listing the jobs which exited abnormally
The log file will give you pretty detailed information about jobs started, completed and aborted. The other files simply contain jobs - a single line each, in jobfile format - so you can simply concatenate all .error files and use the result to re-run aborted jobs.
Since several schedulers will read and write from/to the jobfile, all accesses are done with blocking file locking. To append all jobs in a file moreJobs to the jobfile use
JobFileAppend jobfile moreJobs
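The locking idea can be sketched in modern Python using fcntl (a minimal sketch with names of my own; this is not PBQ's actual JobFileAppend code):

```python
import fcntl

def append_jobs(jobfile, morejobs):
    """Append every line of `morejobs` to `jobfile` under an exclusive,
    blocking file lock, so concurrent schedulers never see a torn write.
    Illustrative sketch only."""
    with open(morejobs) as src:
        lines = src.read()
    with open(jobfile, "a") as dst:
        fcntl.flock(dst.fileno(), fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            dst.write(lines)
            dst.flush()
        finally:
            fcntl.flock(dst.fileno(), fcntl.LOCK_UN)
```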
PBQ.py allows you to control all running schedulers simultaneously. The synopsis is:
PBQ.py hostfile jobfile command [host1 ... ]
If no hosts are specified, the command will be sent to all schedulers (as defined by the existence of jobfile.[host].pid). Commands are:
- stop: stop all schedulers and all jobs. This is done by sending signal STOP.
- cont: continue all schedulers and all jobs.
- quit: let running jobs finish and then quit the PBQ-Managers. Do not start new jobs.
- term: terminate all schedulers and all jobs immediately. Needs to wait until the next wake-up (typically 15 secs). Terminated jobs are not recorded separately.
- kill: kill all schedulers. Does not kill jobs.
- load: display the load of the machines.
- check: checks the schedulers on all machines.
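The pid-file convention above is what lets PBQ.py find its schedulers; a sketch of that discovery step (the helper name is my own, not PBQ's) might look like:

```python
import glob

def scheduler_hosts(jobfile):
    # A scheduler on host H advertises itself through jobfile.H.pid,
    # so the default host list can be derived from the pid files present.
    hosts = []
    for path in glob.glob(jobfile + ".*.pid"):
        host = path[len(jobfile) + 1 : -len(".pid")]
        hosts.append(host)
    return sorted(hosts)
```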
Recovering from errors
If systems are rebooted, or jobs and/or schedulers are killed, you can most likely recover without big problems.
Reboots: If machines went down, you can simply remove their .pid files, manually concatenate the .stopped files of all hosts that were down (the jobs listed in there were on the machine at the time of the reboot and hence were not completed) and append everything to the jobfile. Start the schedulers and you are up and running again. You might want to use
PBQ.py hostfile jobfile check
to find out which schedulers survived the crash.
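A recovery session for two rebooted hosts from the sample hostfile might look like the sketch below. The leftover files are fabricated here so the commands have something to act on; if other schedulers are still running, append with JobFileAppend rather than cat, since only JobFileAppend takes the file lock:

```shell
# Demo setup only: fabricate the leftovers of two crashed schedulers
# (host names from the sample hostfile, jobfile literally named "jobfile").
touch jobfile
echo "mcpdDriver.sh fp10_fn15.dsg fp10_fn15-14 0.1 0.15" > jobfile.sparcy13.stopped
echo "mcpdDriver.sh fp10_fn15.dsg fp10_fn15-15 0.1 0.15" > jobfile.sparcy16.stopped
touch jobfile.sparcy13.pid jobfile.sparcy16.pid

# The actual recovery: drop the stale pid files and feed the stopped jobs back.
for host in sparcy13 sparcy16; do
    rm -f jobfile.$host.pid
    cat jobfile.$host.stopped >> jobfile   # use JobFileAppend if schedulers are live
    rm jobfile.$host.stopped
done
```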
Schedulers killed: You can only notice this kind of error by the time stamps in the log file getting old. If you suspect a scheduler has died, proceed as above.
Jobs killed: You notice this kind of error by looking at the .error files. If the errors are not due to a bug in your code and you assume that the jobs should have completed, you can take the jobs listed in the .error files and append them to the jobfile.
I wrote a little shell script to start PBQSingle.py hostfile jobfile on all machines. The nohup attaches the stderr of all started PBQSingle.py instances to nohup.out, so you can use that information to track errors.
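The script is not reproduced here, but given the hostfile format above it presumably looks something like this sketch (the script name startPBQ.sh and the details are my guesses):

```shell
# Hypothetical startup script: launch PBQSingle.py on every machine listed
# in the hostfile via rsh, detached with nohup so stderr lands in nohup.out.
cat > startPBQ.sh <<'EOF'
#!/bin/sh
# Usage: startPBQ.sh hostfile jobfile
dir=`pwd`
# Skip comment lines; the first column of the hostfile is the machine name.
for host in `grep -v '^#' "$1" | awk '{print $1}'`; do
    nohup rsh "$host" "cd $dir; ./PBQSingle.py $1 $2" &
done
EOF
chmod +x startPBQ.sh
```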
By looking at the
.running file you can immediately see if everything is working. The last line of the log-file will tell you the residual load (
== max load - current load) and the number of running and stopped jobs.
If you suspect that something is going wrong (or users are complaining) you can stop everything with
PBQ.py hostfile jobfile stop
A plain kill -STOP on PBQSingle.py won't cut it, since it would not stop the jobs the scheduler has started.
In case you need to terminate everything (say for system maintenance, reboots etc.) you have two options. If you have enough time to wait for running jobs to complete use
PBQ.py hostfile jobfile quit
The time it needs depends on the jobs you have submitted. I usually try to break up problems into 30 minute jobs. If you need everything stopped right away use
PBQ.py hostfile jobfile term
and wait for about a minute (a wide safety margin: PBQSingle.py wakes up every 15 secs or so and needs just one wake-up to terminate), then make sure all .pid files are gone (you can also signal just specific machines: see above).
If you want to restart everything, concatenate all .stopped files into, say, moreJobs and append that to the jobfile with
JobFileAppend jobfile moreJobs
If you have no more schedulers running you can use
cat *.error *.running *.stopped >> jobfile
directly. Start again.
The maximal load allowed on a machine depends on:
- max (typically the number of CPUs), as specified in the hostfile,
- wday, as specified in the hostfile, during working hours on work days (M-F 8am - 8pm),
- interactive, as specified in the hostfile, minus the number of interactive users.
If the maximal load (which depends on the factors above) is exceeded by more than 0.5 - load being measured as the moving average over the shortest interval reported by uptime - the scheduler will stop as many jobs as necessary to reduce the load to below the maximal load.
If the load is below the maximal load, stopped jobs will be continued and/or new jobs will be started.
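My reading of that policy, as a sketch (the function names and the exact arithmetic are my interpretation, not PBQ's code):

```python
import time

def max_allowed_load(max_load, wday_load, interactive, n_users, now=None):
    # During working hours (Mon-Fri, 8am-8pm) the wday limit applies,
    # otherwise the absolute max; CPUs reserved for interactive use are
    # surrendered one per logged-in interactive user (my interpretation).
    t = time.localtime(now)
    workday = t.tm_wday < 5 and 8 <= t.tm_hour < 20
    limit = wday_load if workday else max_load
    return limit - min(interactive, n_users)

def should_stop_jobs(current_load, allowed):
    # Jobs are stopped only once the load exceeds the limit by more than 0.5.
    return current_load > allowed + 0.5
```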
Interactive jobs and even background jobs running on the same nice level (jobs started by the scheduler run on
nice 19) will always have priority over jobs started by the scheduler.
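Starting such a low-priority, stoppable job from modern Python might look like this sketch (PBQ's own Python 1.5-era code will differ; the function name is mine):

```python
import os
import signal
import subprocess

def start_niced(cmd_line):
    # Run one jobfile line at the lowest priority (nice 19) so interactive
    # work always wins; the returned process can later be suspended and
    # resumed with SIGSTOP/SIGCONT, which is what stop/cont amount to.
    return subprocess.Popen(cmd_line.split(), preexec_fn=lambda: os.nice(19))

job = start_niced("sleep 1")
os.kill(job.pid, signal.SIGSTOP)   # "stop": suspend the job
os.kill(job.pid, signal.SIGCONT)   # "cont": resume it
```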
Huge logfiles: Log files grow at a rate of about 300 KByte/day per machine. If you run out of space, just delete them; PBQSingle.py will create new files with the same name. If you are paranoid, you might want to stop all schedulers before deleting the files and cont everything afterwards.
Bugs etc.: None known at this time
- Not totally configurable without hacking source
- No support for executables for different architectures
- No automatic recovery
Members: Alexander Schliep.
Publications:
- Bolten et al. Clustering protein sequences - structure prediction by transitive homology. Bioinformatics 2001, 17:10, 935-941.
- Pipenbacher. Evaluation and extension of a graph based clustering approach for the detection of remote homologs. Master's Thesis, University of Cologne, 2001.
- Bolten. A graph based clustering method for the detection of remote homolog protein sequences. Master's Thesis, University of Cologne, 2000. In German.