Tips for allocating Job Resources
Cherry-creek currently has two types of compute nodes:
cp | 48 "penguin" cherry-creek 2 nodes (Penguin Computing Relion servers; 2 Intel Xeon E5-2640v3 8-core CPUs, 128 GB RAM) |
ci | 24 "waterfall" cherry-creek 2 nodes (Intel servers; 2 Intel Xeon E5-2697v2 12-core CPUs, 192 GB RAM). Please do not specifically request "ci" nodes unless your jobs need the additional memory. |
The "ci" nodes are typically very busy with long-running jobs. If you request that your jobs run on them, the jobs may wait a significant amount of time before "ci" nodes become available.
Example resource specifications
The following are commonly used resource allocation parameters:
walltime=# | The total wall-clock time that the job will be allowed to run, in HH:MM:SS format (hours, minutes, seconds). Each job queue has its own limit: the normal queue (workq) is limited to 744 hours (31 days), and the test queue (small) is limited to 15 minutes. |
select=# | Tells PBS Pro how many "chunks" to allocate. A chunk is a set of resources allocated on a single node, and the settings that follow select apply to each chunk. |
ncpus=# | The number of CPU cores to be assigned to each chunk. |
mpiprocs=# | The number of MPI processes to be assigned to each chunk (normally the same as the ncpus value). |
mem=# | The amount of memory (in mb, gb, etc.) to be assigned to each chunk. |
cput=# | The amount of CPU time for the chunk, in HH:MM:SS format (hours, minutes, seconds). A reasonable value for CPU time is mpiprocs multiplied by walltime. |
Qlist=string | The type of node to run on. The current choices are cp and ci. |
For example, suppose your MPI job requires two chunks of four CPU cores each, with 30 GB of memory per chunk, and you want it to run on ci nodes. The resource request would look like:
#PBS -q workq
#PBS -l walltime=24:00:00
#PBS -l select=2:ncpus=4:mpiprocs=4:mem=30gb:cput=96:00:00:Qlist=ci
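The cput value in a request like this follows the rule of thumb given in the parameter list (mpiprocs multiplied by walltime). As a minimal sketch, the arithmetic can be scripted in the shell. Note that cput_for is a hypothetical helper name used only for illustration, not a PBS command, and the sketch assumes awk is available on the login node:

```shell
# Hypothetical helper (not part of PBS): multiply an HH:MM:SS walltime
# by the number of MPI processes per chunk to get a cput value.
# Usage: cput_for <mpiprocs> <walltime as HH:MM:SS>
cput_for() {
    echo "$2" | awk -F: -v n="$1" '{
        # Convert the walltime to seconds, then scale by the process count.
        total = ($1 * 3600 + $2 * 60 + $3) * n
        # Convert back to HH:MM:SS; hours may exceed two digits.
        printf "%02d:%02d:%02d\n", int(total / 3600), int((total % 3600) / 60), total % 60
    }'
}

# Four MPI processes with a 24-hour walltime, as in the request above.
cput_for 4 24:00:00
```

For the example request, this reproduces the 96:00:00 used in the select line.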
The system may use as much as 4 GB of RAM for the operating system on each node, so do not request more than 124 GB on the cp nodes or 188 GB on the ci nodes. If you tell PBS that you need more memory than is available, the job will sit in the queue until a node with sufficient memory becomes available (which may never happen).
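A quick sanity check before submitting can catch a memory request that no node can satisfy. The sketch below hard-codes the figures from this page (installed RAM minus roughly 4 GB for the operating system). The function names max_mem_gb and mem_fits are hypothetical, and the check assumes a single chunk per node with the request given in whole gigabytes:

```shell
# Hypothetical helper: largest usable mem request (in GB) for a node type,
# based on the node sizes listed on this page minus ~4 GB for the OS.
max_mem_gb() {
    case "$1" in
        cp) echo $((128 - 4)) ;;
        ci) echo $((192 - 4)) ;;
        *)  echo "unknown node type: $1" >&2; return 1 ;;
    esac
}

# Hypothetical check: succeeds if a request of <gb> fits on node type <type>.
# Usage: mem_fits <type> <gb>
mem_fits() {
    limit=$(max_mem_gb "$1") || return 1
    [ "$2" -le "$limit" ]
}

if mem_fits cp 30; then
    echo "30gb fits on a cp node"
fi
if ! mem_fits cp 150; then
    echo "150gb does not fit on a cp node; consider Qlist=ci"
fi
```

If several chunks could land on the same node, the sum of their mem values is what has to fit, so treat this as a lower bound on the checking you need.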