r43 - 11 Nov 2015 - 11:47:54 - DavidAcreman

Running Jobs

Work is submitted to Zen by putting it into a queuing system, which allocates compute nodes to run your job. The qstat command shows what is in the queues (type man qstat to find out more). To delete a job from the queue use the qdel command. To see how many nodes are free use the command allfree.

The job file

To submit a job to the queue you will need to set up a job file which describes how the job should be run. Example job files can be found in /usr/local/examples/qsub_script on the login node. To submit the job file to the queue use the command

qsub jobfile

Jobs should be set up so that the output is written to /data (see FileSystems for more information on how to use Zen's disk space). You will need to include a line in the job file which changes into the correct working directory, as the job file will run in your home directory by default. The environment variable PBS_O_WORKDIR is set to the directory from which the job was submitted, so to run a job in the directory from which it was submitted include the command cd ${PBS_O_WORKDIR} in the job file.
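A minimal job file along these lines might look like the sketch below. The job name, the resource request and the program name (my_program) are placeholders rather than Zen defaults, and the fallback to $PWD is an addition so that the script can also be run by hand outside the queue.

```shell
#!/bin/bash
# Hypothetical job name and resource request; adjust to suit your job.
#PBS -N example_job
#PBS -l nodes=1:ppn=12
#PBS -l walltime=01:00:00

# Jobs start in the home directory by default, so change into the
# directory the job was submitted from. The fallback to $PWD only
# applies when the script is run by hand, outside the queue.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

echo "Running in $(pwd) on $(uname -n)"

# Placeholder for the real work; my_program is a hypothetical executable.
# ./my_program > output.log 2>&1
```

If the job is submitted from a directory under /data, the output written by the program then ends up there too.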

If you are running serial jobs see MultipleSerialJobs for the best way to run several serial jobs on a node.

When running a number of similar jobs it may be useful to set them up as a Job array.
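As a rough sketch, assuming Torque's -t option for job arrays and its PBS_ARRAYID variable (check man qsub on Zen for the syntax supported there), each member of the array can select its own input file from its index. The input_N.dat and output_N.log names are hypothetical.

```shell
#!/bin/bash
# Assumed Torque syntax: run ten copies of this job, indexed 0 to 9.
#PBS -t 0-9
#PBS -l walltime=01:00:00

cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# PBS_ARRAYID holds this member's index; default to 0 outside the queue.
index="${PBS_ARRAYID:-0}"

# Hypothetical naming scheme: input_0.dat, input_1.dat, ...
input_file="input_${index}.dat"
output_file="output_${index}.log"

echo "Array member ${index}: would read ${input_file} and write ${output_file}"
```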

Wall time

Your job file should specify how long your job needs to run for. For example, if you know that your job will complete in under 48 hours, include the line

#PBS -l walltime=48:00:00

If a job reaches the specified wall time it will be terminated, so make sure you request enough time when you submit the job. However, jobs which request shorter wall times are more likely to be scheduled to run, so make sure your wall time is realistic (e.g. don't request a month if you need a day). If you do not specify a wall time in the job file then the job will be given a default wall time of one day. The maximum wall time is 400 hours. Wall time is elapsed real time, as measured by a clock on the wall, rather than any other measure of time (e.g. CPU time).
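The walltime value is written as hours:minutes:seconds. The helper below is only an illustrative sketch: walltime_directive is a hypothetical function name, not a Zen or Torque command. It turns a whole number of hours into the corresponding job file line and refuses anything above the 400 hour maximum.

```shell
#!/bin/bash
# Maximum wall time on Zen, in hours.
MAX_HOURS=400

# Print a "#PBS -l walltime=" line for a whole number of hours ($1).
# Hypothetical helper for illustration only.
walltime_directive() {
    # Reject empty input, leading zeros, non-digits and 4+ digit values.
    case "$1" in
        ''|0*|*[!0-9]*|????*)
            echo "hours must be a whole number between 1 and ${MAX_HOURS}" >&2
            return 1 ;;
    esac
    if [ "$1" -gt "$MAX_HOURS" ]; then
        echo "requested ${1}h exceeds the ${MAX_HOURS}h maximum" >&2
        return 1
    fi
    printf '#PBS -l walltime=%02d:00:00\n' "$1"
}

walltime_directive 48
```

Calling walltime_directive 48 prints the same line as the example above, and a request for 401 hours is rejected with an error message.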

Queues

The job will be routed to an appropriate queue depending on the wall time specified, as shown in the table below.

| Run time | Queue name | Maximum wall time | Maximum number of nodes |
| short | all | 24 hours | 161 |
| medium | mpiexpress | 72 hours | 159 |
| long | std | 400 hours | 127 |
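To see which queue a given request is likely to land in, the limits in the table above can be written as a simple lookup. This is a sketch under assumptions: queue_for_hours is a hypothetical helper, the inclusive boundaries are a guess, and the real routing is done by the queuing system itself.

```shell
#!/bin/bash
# Guess the queue for a wall time given in whole hours ($1), using the
# maximum wall times from the table. Boundaries are assumed inclusive.
queue_for_hours() {
    if [ "$1" -le 24 ]; then
        echo "all"
    elif [ "$1" -le 72 ]; then
        echo "mpiexpress"
    elif [ "$1" -le 400 ]; then
        echo "std"
    else
        echo "no queue: exceeds the 400 hour maximum" >&2
        return 1
    fi
}

queue_for_hours 48
```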

Receiving emails

You can set up the job file so that an email is sent when the job starts, aborts or ends. Use the -m option to control when emails are sent and use -M to set your email address.
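For example, the directives below ask for mail when the job aborts (a), begins (b) or ends (e), which are the standard letters accepted by qsub -m. The address is a placeholder, and the fallback to $PWD is only there so the script can be run by hand.

```shell
#!/bin/bash
# Send mail on abort (a), begin (b) and end (e).
#PBS -m abe
# Placeholder address; replace with your own.
#PBS -M your.name@example.com
#PBS -l walltime=01:00:00

cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

echo "Job body goes here"
```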

Interactive sessions

An interactive session on a compute node can be set up using qsub -I. For example, to run a 10 hour interactive session on one node:

qsub -I -l nodes=1:ppn=12 -l walltime=10:00:00

MPI

If your job uses MPI you will need to load the appropriate MPI module to ensure that the necessary commands are available (e.g. mpirun). See CompilingAndLinking for more information on loading modules.

If you need to specify how many MPI processes should run on each node, use the -perhost argument to the mpiexec command. For example, to run 4 MPI processes per node:

mpiexec -perhost 4 -genv I_MPI_DEVICE rdma:OpenIB-cma -np $NUMPROCS $MYBIN

If you want to specify the order in which MPI processes are assigned to nodes, use the -machinefile option to the mpiexec command, e.g.

mpiexec -machinefile mpd_hosts.txt ...

will start MPI processes on nodes in the order specified in the file mpd_hosts.txt. For more information see the Intel MPI User Guide which can be found at /sw/sdev/impi/4.0.2/doc/Reference_Manual.pdf
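One way to produce a starting point for mpd_hosts.txt is to take the node list that Torque provides to the job in the file named by PBS_NODEFILE and reduce it to one line per node. This is a sketch, not the method prescribed on Zen: make_machinefile is a hypothetical helper, and the resulting file can be reordered by hand before it is passed to mpiexec.

```shell
#!/bin/bash
# Write one line per distinct node to $2, in the order in which the
# nodes first appear in the node list $1. Hypothetical helper.
make_machinefile() {
    awk '!seen[$0]++' "$1" > "$2"
}

# Inside a job, PBS_NODEFILE names a file listing the allocated nodes,
# with one line per allocated processor slot. Outside a job it is
# unset, so this part is skipped.
if [ -n "${PBS_NODEFILE:-}" ]; then
    make_machinefile "$PBS_NODEFILE" mpd_hosts.txt
    echo "Wrote mpd_hosts.txt:"
    cat mpd_hosts.txt
    # mpiexec -machinefile mpd_hosts.txt ...
fi
```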

Scheduling

Zen uses the Maui scheduler to determine which jobs to run, when to run them and which nodes to use. Maui uses advance reservations to schedule the highest priority jobs for a future time and can backfill with shorter, lower node count jobs where possible (e.g. while waiting for 8 nodes to become free in order to run a high priority job it may be possible to run a short, single node job).

Some useful Maui commands can be found in /usr/local/maui/bin, e.g. showbf shows the available backfill window, showres shows current reservations and showq shows the queue as seen by Maui (whereas qstat shows the queue as seen by Torque).

-- DavidAcreman - 11 Jun 2008
