PBS

PBS (Portable Batch System) is installed on obelix under /opt/OpenPBS/. The package contains several small programs, most of whose names start with "q". The most useful ones are described below.

qsub

Submitting a job

qsub is used to submit jobs to the queue on the cluster. It can take several arguments.

Running
qsub programname

from the home directory puts the program "programname" in the cluster queue. When there is time on the cluster, qsub logs on to a node and starts the program from there. Note that the job by default starts in the user's home directory on that node, so if the program is not in the home directory, the job first has to change to the program's directory. qsub stores the directory it was launched from in an environment variable named PBS_O_WORKDIR, so a small script can take care of this. It could look like this:

#!/bin/sh
# Change to the directory qsub was run from; stop if that fails
cd "$PBS_O_WORKDIR" || exit 1
./name_of_program

If my program is in /b/jstaff, I would save this script as /b/jstaff/script, make it executable (chmod +x script) and run

cd /b/jstaff/
qsub script

This puts the job in the queue, and it runs as soon as there are free nodes.
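Because PBS_O_WORKDIR is an ordinary environment variable, a job script can be tried out without the queue by setting it by hand. A minimal sketch, assuming a POSIX shell; the temporary directory and the stand-in program name_of_program are hypothetical:

```shell
#!/bin/sh
# Dry run of a job script outside PBS, with PBS_O_WORKDIR set by hand.
# The temporary directory and the stand-in name_of_program are hypothetical.
workdir=$(mktemp -d)

# Stand-in for the real program: reports the directory it runs in
cat > "$workdir/name_of_program" <<'EOF'
#!/bin/sh
echo "running in $(pwd)"
EOF
chmod +x "$workdir/name_of_program"

# The job script described above
cat > "$workdir/script" <<'EOF'
#!/bin/sh
cd "$PBS_O_WORKDIR" || exit 1
./name_of_program
EOF
chmod +x "$workdir/script"

# Start from another directory (a PBS job starts in the home directory)
# and provide the variable that qsub would normally set
(cd / && PBS_O_WORKDIR="$workdir" sh "$workdir/script")
```

The stand-in program reports the temporary directory even though the script was started from /, which shows that the cd to PBS_O_WORKDIR did its job.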

Interactive jobs

The method described above runs everything in the background. Any output the program generates ends up in a file named "script.o" followed by the job number, and any error messages in a corresponding "script.e" file, both written to the directory qsub was run from. Sometimes it is preferable to run a job interactively, so that the program's output appears in the shell and the program can be given input while it runs. This can be done by invoking

qsub -I

This logs on to a node as soon as one is free and leaves you at a shell prompt there. One can then change to the correct directory and start the program as on any other machine, without thinking about the cluster, so no script is needed.

Multi CPU jobs

To run a job on several CPUs, possibly spread over several machines, qsub has to be told how many to reserve. This can be done with either of the following commands:

qsub -l nodes=6 script

This asks for 6 CPUs. They can be on six different nodes, or two of them can be on the same node.
The following command:

qsub -l nodes=3:ppn=2 script

asks for 3 nodes and two processors per node (ppn), so 6 CPUs altogether.

The first command has a greater chance of success, since six CPUs might be free without there being three nodes that each have both CPUs free.
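Inside a job, PBS lists the allocated CPUs in a file whose name is stored in the environment variable PBS_NODEFILE, one line per CPU, so a node that contributes two CPUs appears twice. A minimal sketch of a job script, assuming the OpenPBS installation on obelix sets PBS_NODEFILE this way; it puts the resource request in a #PBS directive instead of on the qsub command line and reports how the CPUs were spread. The helper cpus_per_node is hypothetical:

```shell
#!/bin/sh
#PBS -l nodes=3:ppn=2
# The #PBS line above makes the same request as "qsub -l nodes=3:ppn=2".
# PBS reads such directives before the first command; outside PBS they
# are ordinary comments.

# Count CPUs per node in a PBS node file (one line per allocated CPU)
cpus_per_node() {
  sort "$1" | uniq -c
}

# Only do the real work when running as a PBS job
if [ -n "$PBS_NODEFILE" ]; then
  cd "$PBS_O_WORKDIR" || exit 1
  echo "CPUs allocated: $(wc -l < "$PBS_NODEFILE")"
  cpus_per_node "$PBS_NODEFILE"
  ./name_of_program
fi
```

With nodes=3:ppn=2 the listing should show three nodes with two CPUs each, while nodes=6 may show a mix of nodes with one and two CPUs.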

qstat

qstat is a program that shows the queue for the cluster. Simply running "qstat" shows which users have jobs running and how long they have run. More information can be obtained with the following options:
"qstat -a"
also shows the resources each job has requested (such as memory and run time) and how many nodes it uses.
"qstat -n"
shows all of that and, in addition, which nodes each job is using.
"qstat -s"
acts like "qstat -a", but in addition tries to give a reason why a job is still queued, or says when a job was started.

qdel

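To delete a job from the queue, qdel can be used together with the job's identifier, which qsub prints when the job is submitted and qstat lists in its first column.
qdel jobid
will delete the job with that identifier from the queue.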
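Since the identifier qdel needs is the one qsub prints on submission, it can be saved at that point. A minimal sketch, assuming a POSIX shell and that qsub and qdel from /opt/OpenPBS/ are on the PATH; the file name last_pbs_job and both function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers: remember the job identifier printed by qsub so the
# job can later be removed with qdel. The file name is an assumption.
jobid_file=last_pbs_job

# Submit a job script and save the identifier qsub prints on standard output
submit_job() {
  qsub "$1" > "$jobid_file" || return 1
  cat "$jobid_file"
}

# Delete the most recently submitted job, if one was recorded
delete_last_job() {
  if [ ! -s "$jobid_file" ]; then
    echo "no job recorded in $jobid_file" >&2
    return 1
  fi
  qdel "$(cat "$jobid_file")"
}
```

Usage: "submit_job script" queues the job and prints its identifier; "delete_last_job" removes it again if it turns out to be a mistake.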

Politeness

At the moment, there are no restrictions or limitations on the queue system. This means that it is possible to submit a job asking for all the nodes and let it run forever. However, this will effectively block everybody else from using the cluster. Therefore we ask you to limit both the number of nodes you use and the length of time you use them.

If self-regulation does not work in the long run, we will probably introduce limitations in the queueing system.

But remember: If you need the cluster, use it. That's why it is there!