qsub is used to submit jobs to the queue on the cluster. It can take several arguments.
Running "qsub programname" from the home directory puts the program "programname" in the cluster queue. Once there is time on the cluster, the program will be launched. When time becomes available, qsub logs on to a node in the cluster and starts the program from there. Please note that when it logs on to the node, it starts by default in the user's home directory. If the program to start is not in the home directory, it is necessary to change to the program's directory first. qsub stores the directory from which it was launched in an environment variable named PBS_O_WORKDIR. To start a program that is not in the home directory, one can write a small script. It could look like this:
#begin script
cd /b/jstaff/
If my program exists in /b/jstaff, I would put this simple script in /b/jstaff/script, make the file executable and submit it with qsub. This will put the program in the queue, and run it when there are free nodes.
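As a minimal sketch of such a script, the version below changes to the directory stored in PBS_O_WORKDIR instead of a fixed path, so the same file works from any directory it is submitted from. The helper name job_workdir is made up for this illustration, "programname" is the placeholder used above, and the fallback to the current directory is an assumption added only so the script can be tried outside the queue.

```shell
#!/bin/sh
# Sketch of a job script. "programname" is a placeholder for the real program.

# Print the directory the job should run in: the directory qsub was launched
# from (PBS_O_WORKDIR), or the current directory when run outside the queue.
job_workdir() {
    printf '%s\n' "${PBS_O_WORKDIR:-$PWD}"
}

cd "$(job_workdir)" || exit 1

# Start the program only if it is present and executable in that directory.
if [ -x ./programname ]; then
    ./programname
else
    echo "programname not found in $(pwd)" >&2
fi
```

Saved as "script" and made executable, it would be submitted with "qsub script" from the directory that contains the program.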
The method for running jobs mentioned above will run everything in the background. Any output the program might generate will end up in a file whose name starts with "script.o".
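To make the output file name concrete, here is a small sketch that assumes the usual PBS convention: the job name (by default the script's file name), then ".o", then the numeric part of the job identifier that qsub prints on submission. The function name output_file and the identifier "1234.master" are hypothetical.

```shell
# output_file NAME JOBID
# Print the expected output file name for a job, assuming the PBS default of
# <job name>.o<job number>. JOBID may include a server suffix ("1234.master"),
# which is stripped.
output_file() {
    printf '%s.o%s\n' "$1" "${2%%.*}"
}

output_file script 1234.master   # prints "script.o1234"
```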
Launching the above qsub will simply log on to one node when one is free, and leave a prompt. One can then change to the correct directory and start the program, without thinking about running on a cluster. Thus it is not necessary to generate any scripts.
To run jobs on several CPUs, and eventually on several machines, qsub has to be told to do so. This can be done with one of the following commands:
qsub -l nodes=6
This will ask for 6 CPUs. These CPUs can be on different nodes, or two of them can be on the same node.
The following command:
qsub -l nodes=3:ppn=2
will ask for 3 nodes and two processors per node (ppn), so 6 CPUs altogether.
The first command has a greater chance of success, since there might be six CPUs free, but not necessarily on only three different nodes.
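The two request forms can be compared with a small sketch. The helpers nodes_spec and total_cpus are hypothetical names introduced here; they only build the argument for "qsub -l" and work out how many CPUs a request adds up to.

```shell
# nodes_spec NODES [PPN]
# Build the resource string for "qsub -l": "nodes=N" or "nodes=N:ppn=P".
nodes_spec() {
    if [ -n "$2" ]; then
        printf 'nodes=%s:ppn=%s\n' "$1" "$2"
    else
        printf 'nodes=%s\n' "$1"
    fi
}

# total_cpus NODES [PPN]
# Number of CPUs requested; PPN defaults to 1 when it is left out.
total_cpus() {
    echo $(( $1 * ${2:-1} ))
}

nodes_spec 6      # prints "nodes=6"
nodes_spec 3 2    # prints "nodes=3:ppn=2"
total_cpus 3 2    # prints "6"
```

A submission could then be written as qsub -l "$(nodes_spec 3 2)" script, where "script" is the job script described earlier.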
qstat is a program that shows the queue for the cluster. Simply running "qstat" will show which users have jobs running, and how long they have run. More information can be obtained by typing
"qstat -a"
which will also show if there are any system requirements, and how many nodes a job takes.
"qstat -n"
will in addition show which nodes are being used.
"qstat -s"
acts like "qstat -a", but in addition it tries to give a reason why a job is queued, or says when a job was started.
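As an illustration of reading the plain "qstat" listing, the sketch below prints the users who currently have running jobs. It assumes the default layout of two header lines followed by one line per job with the columns job id, name, user, time used, state and queue; if the local qstat prints something different, the column numbers need adjusting. The function name users_running is hypothetical.

```shell
# users_running
# Read plain "qstat" output on standard input and print each user that has at
# least one job in state R (running), one user per line.
# Assumed columns: job id, name, user, time used, state, queue.
users_running() {
    awk 'NR > 2 && $5 == "R" { print $3 }' | sort -u
}

# Usage on the cluster:
#   qstat | users_running
```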
To delete a job from the queue, qdel can be used. It takes the identifier of the job, as listed by qstat, as its argument.
qdel
will delete the job from the queue.
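When several jobs have to be removed, the identifiers can be collected from the "qstat" listing first. The sketch below assumes the default listing with two header lines, the job id in the first column and the user in the third; the function name job_ids_for_user is hypothetical.

```shell
# job_ids_for_user USER
# Read plain "qstat" output on standard input and print the job identifiers
# that belong to USER, one per line.
# Assumed columns: job id first, user third, after two header lines.
job_ids_for_user() {
    awk -v user="$1" 'NR > 2 && $3 == user { print $1 }'
}

# Usage on the cluster (only when the user has at least one job):
#   qstat | job_ids_for_user "$USER" | xargs qdel
```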
At the moment, there are no restrictions or limitations on the queue system. This means that it is possible to submit a job asking for all the nodes and let it run for ever. However, this will effectively block everybody else from using the cluster. Therefore we ask you to reduce the number of nodes you use, and the time you use them.
If self-regulation doesn't work in the long run, we will probably put limitations into the queueing system.
But remember: If you need the cluster, use it. That's why it is there!