Running Jobs
Work is submitted to Zen by putting it into a queuing system which allocates compute nodes to run your job. The
qstat
command is used to show what is in the queues (to find out more about
qstat
type
man qstat
). To delete a job from the queue use the
qdel
command.
To see how many nodes are free you can use the command
allfree
.
The job file
To submit a job to the queue you will need to set up a job file which describes how the job should be run.
Example job files can be found in
/usr/local/examples/qsub_script
on the login node. To submit the job file to the queue use the command
qsub jobfile
Jobs should be set up so that the output is written to
/data
(see
FileSystems for more information on how to use Zen's disk space). You will need to include a line in the job file which changes into the correct working directory, as the job file will run in your home directory by default. The environment variable PBS_O_WORKDIR is set to the directory from which the job was submitted, so to run a job in the directory from which it was submitted include the command
cd ${PBS_O_WORKDIR}
in the job file.
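As an illustration, a minimal job file might look like the sketch below. The job name, the resource request and the final echo line are placeholders chosen for this example, not settings taken from Zen's example scripts. The fallback to the current directory is an addition so that the script can also be tried outside the queue.

```shell
#!/bin/bash
#PBS -N example_job
#PBS -l nodes=1:ppn=12
#PBS -l walltime=01:00:00

# PBS sets PBS_O_WORKDIR to the directory the job was submitted from.
# Fall back to the current directory when run outside the queue.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# Placeholder for the real work: record and report the working directory.
run_dir=$(pwd)
echo "Running in $run_dir"
```

Submitting this file from a directory under /data would make the job run, and write its output, in that directory.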
If you are running serial jobs see
MultipleSerialJobs for the best way to run several serial jobs on a node.
When running a number of similar jobs it may be useful to set them up as a
Job array.
Wall time
Your job file should specify how long your job needs to run for. For example if you know that your job will complete in under 48 hours include the line
#PBS -l walltime=48:00:00
If a job reaches the specified wall time it will be terminated, so make sure you request enough time when you submit the job. However, jobs which request shorter wall times are more likely to be scheduled to run, so make sure your wall time is realistic (e.g. don't request a month if you need a day). If you do not specify a wall time in the job file then the job will be given a default wall time of one day. The maximum wall time is
168 hours. The wall time is the elapsed time as measured by a clock on the wall, rather than any other measure of time (e.g. CPU time).
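If you have timed a test run in seconds, the value can be converted to the HH:MM:SS form used in the walltime line. The helper below is a sketch for this page only; the name to_walltime is hypothetical and is not a command installed on Zen.

```shell
# Convert a run time in seconds to the HH:MM:SS form used by
# "#PBS -l walltime=". The function name is hypothetical.
to_walltime() {
    secs=$1
    printf '%02d:%02d:%02d\n' \
        $((secs / 3600)) $(((secs % 3600) / 60)) $((secs % 60))
}

to_walltime 172800   # 48 hours, prints 48:00:00
```

Remember to add a safety margin to the measured time before converting it, since the job is terminated when the wall time is reached.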
Queues
The job will be routed to an appropriate queue depending on the wall time specified, as shown in the table below.
The number of nodes available varies according to how long the job needs to run for because some nodes are reserved for running shorter jobs.
Receiving emails
You can set up the job file so that an email is sent when the job starts, aborts or ends. Use the
-m
option to control when emails are sent and use
-M
to set your email address.
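As a sketch, the job file lines below request an email when the job aborts (a), begins (b) and ends (e). The address is a placeholder, and printing the job ID is an optional extra which makes it easier to match the emails to the job output.

```shell
#!/bin/bash
#PBS -m abe
#PBS -M your.name@example.com

# PBS_JOBID is set by the queuing system; use a marker when run outside it.
job_id="${PBS_JOBID:-not-in-queue}"
echo "Job ID: $job_id"
```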
Interactive sessions
An interactive session on a compute node can be set up using
qsub -I
, for example to run a 10 hour interactive session on one node
qsub -I -l nodes=1:ppn=12 -l walltime=10:00:00
MPI
If your job uses MPI you will need to load the appropriate MPI module to ensure that the necessary commands (e.g.
mpirun
,
mpif90
) are available. The most recent versions of Intel MPI and the Intel compilers can be made available by running
module load ics2013sp1
. See
CompilingAndLinking for more information on loading modules. For more information on Intel MPI see the Getting Started Guide (
/opt/intel/impi/4.1.1.036/doc/Getting_Started.pdf
) and the Reference Manual (
/opt/intel/impi/4.1.1.036/doc/Reference_Manual.pdf
).
If you need to specify how many MPI processes should run on a node then you can use the
-perhost
argument with the
mpiexec
command e.g. to run using 4 MPI processes per node
mpiexec -perhost 4 -genv I_MPI_DEVICE rdma:OpenIB-cma -np $NUMPROCS $MYBIN
The queuing system allocates nodes to run your job and the MPI library uses this information so that it knows which nodes to use. A list of nodes allocated to your job can be found in the file listed in the
$PBS_NODEFILE
environment variable (e.g. adding the line
cat $PBS_NODEFILE
to your job script lists the nodes which will be used). The order in which MPI processes are assigned to nodes can be specified using the
-machinefile
option to the
mpiexec command
. For example
mpiexec -machinefile mpd_hosts.txt
...
will start MPI processes on nodes in the order specified in the file
mpd_hosts.txt
.
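When using -perhost, the total number of MPI processes can be worked out from the node file. The sketch below assumes that the node file may list each node more than once (for example once per allocated core), so it counts distinct node names. The count_nodes function and the PERHOST, NUMNODES and NUMPROCS variables are names chosen for this example, and the mpiexec line is left as a comment to be adapted to your own program.

```shell
# Count the distinct nodes listed in a PBS-style node file.
# Assumption: a node may appear on several lines of the file.
count_nodes() {
    sort -u "$1" | wc -l | tr -d ' '
}

PERHOST=4   # MPI processes per node (example value)

# Only do the calculation inside a job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
    NUMNODES=$(count_nodes "$PBS_NODEFILE")
    NUMPROCS=$((NUMNODES * PERHOST))
    echo "Would start $NUMPROCS MPI processes on $NUMNODES nodes"
    # mpiexec -perhost $PERHOST -np $NUMPROCS $MYBIN
fi
```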
Scheduling
Zen uses the
Maui scheduler to determine which jobs to run, when to run them and which nodes to use. Maui uses advance reservations to schedule the highest priority jobs for a future time and can backfill with shorter, lower node count jobs where possible (e.g. while waiting for 8 nodes to become free in order to run a high priority job it may be possible to run a short, single node job). Running the
allfree
command on the login node shows the current number of available nodes and the backfill window.
--
DavidAcreman - 11 Jun 2008