Batch system (Slurm) overview and important notes
The Slurm scheduling system is used to manage jobs on all of Cyfronet's supercomputers.
Before running any jobs, please familiarize yourself with the grant system (see PLGrid grants).
This page includes a short guide on setting job parameters to achieve good computational efficiency and make optimal use of resources, which should also result in shorter queue times. The guide aims to be as generic as possible and does not cover every case; if your application has specific requirements, feel free to experiment beyond the suggestions given here. If in doubt, contact the PLGrid helpdesk or consult the Slurm documentation for an in-depth explanation of the topics and options discussed on this page.
Motivation
Choosing a good, close to optimal job configuration has several benefits:
- Well-chosen configurations (for example, full nodes or regular fractions of a node) are easier for the scheduler to allocate, so such jobs have shorter queue times.
- More jobs can run at the same time, which shortens the "time to result" when you have a set of jobs.
- Assigned resources are used optimally. Some applications can utilize many cores, while others achieve the best results with fewer cores.
Consult the hardware
To choose a proper job configuration, we need to know the underlying hardware. You can find the hardware configuration of cluster nodes in the manual for the particular cluster, or ask Slurm how the nodes are configured with the scontrol show node <nodename> command. In the case of Ares, a CPU node has 48 cores and 184 GB of memory available to the user. Note that there is a specific amount of memory per CPU (on an Ares CPU node, 184 GB / 48 cores ≈ 3.8 GB per core); similarly, on GPU nodes there is a certain number of CPUs and amount of memory per GPU.
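The per-core memory share can be turned into job options with a little arithmetic. The sketch below assumes the 184 GB figure is expressed in Slurm's binary units (1G = 1024M) and uses a hypothetical job of 12 cores, a quarter of the node; it only prints suggested values and does not submit anything.

```shell
#!/usr/bin/env bash
# Ares CPU node figures quoted above: 48 cores, 184 GB of memory for the user.
# Assumption: the 184 GB figure is in Slurm units, where 1G = 1024M.
NODE_CORES=48
NODE_MEM_MB=$((184 * 1024))

# Memory share per core, rounded down by integer arithmetic (3925M here).
MEM_PER_CORE_MB=$((NODE_MEM_MB / NODE_CORES))

# Hypothetical job using a quarter of the node.
CORES=12
JOB_MEM_MB=$((CORES * MEM_PER_CORE_MB))

echo "per-core share: ${MEM_PER_CORE_MB}M"
echo "memory for ${CORES} cores: ${JOB_MEM_MB}M"
echo "suggested options: --ntasks=${CORES} --mem-per-cpu=${MEM_PER_CORE_MB}M"
```

Requesting memory in proportion to the cores you ask for leaves the rest of the node usable by other jobs, which is what makes node fractions easy to schedule.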
How to determine if a job has good efficiency
There are some general guidelines applicable to most cases, which include the following:
- Consult the output of the hpc-jobs-history command, which includes an "efficiency" column. This column is a rough estimate of how much of the allocated CPU time was actually used. Low values suggest that the application is not using all the cores, or that time is spent on things other than computation, such as I/O or memory allocation.
- Note the duration of jobs; it is a universal indicator of how fast the computations are performed.
- Test a chosen set of job configurations and determine the performance-per-CPU ratio of each. This lets you determine the best number of CPUs for the job.
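The checks above can be sketched as two small helpers. cpu_efficiency uses a common definition (CPU time used divided by allocated cores times wall-clock time); this is an assumption and not necessarily the exact formula behind the hpc-jobs-history column. scaling_table summarizes a scaling test, with speedup and efficiency taken relative to the first run. The function names and the input numbers are hypothetical placeholders, not benchmark results.

```shell
#!/usr/bin/env bash
# CPU efficiency as commonly defined: CPU time actually used divided by the
# CPU time reserved (allocated cores x wall-clock time).
# Arguments: total CPU seconds, allocated cores, elapsed seconds.
cpu_efficiency() {
  awk -v used="$1" -v cores="$2" -v elapsed="$3" \
    'BEGIN { printf "%.0f%%\n", 100 * used / (cores * elapsed) }'
}

# Scaling summary read from stdin, one line per test run:
# "<cores> <runtime_in_seconds>". Speedup and parallel efficiency are
# computed relative to the first line.
scaling_table() {
  awk 'BEGIN { printf "%6s %8s %11s\n", "cores", "speedup", "efficiency" }
       NR == 1 { base_cores = $1; base_time = $2 }
       {
         speedup = base_time / $2
         eff = speedup * base_cores / $1
         printf "%6d %8.2f %10.0f%%\n", $1, speedup, 100 * eff
       }'
}

# Placeholder numbers only; substitute your own measurements.
cpu_efficiency 3600 4 1800
printf '%s\n' "1 3600" "4 1000" "16 400" | scaling_table
```

With these placeholder inputs the 4-core run keeps 90% efficiency while the 16-core run drops to 56%, the kind of pattern that suggests where adding cores stops paying off.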
Of course, the above guidelines are not definitive. For example, if a job achieves the best performance-to-CPU ratio on a single core but then runs for a week, that is not an optimal configuration. If the job scales with reasonable efficiency up to the point where it finishes in 1 day, that is the better option. As a general guideline, it is best to keep job runtimes between 1 hour and 3 days. Very short jobs may incur significant scheduling overhead.
If your application reports a specific performance metric, you're in luck! Examples of such applications include NAMD and GROMACS. In such cases, we have a clear indication of performance right from the start of the calculations, and determining the optimal job configuration boils down to testing a few possibilities and weighing the cost of running a job against the computation steps performed.
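When the application reports throughput directly (GROMACS, for example, reports performance in ns/day), the cost of a configuration can be expressed as core-hours per simulated nanosecond: a job uses cores × 24 core-hours per day, so the cost is cores × 24 / (ns per day). The helper name and the two throughput values below are hypothetical placeholders, not measurements.

```shell
#!/usr/bin/env bash
# Core-hours spent per simulated nanosecond.
# Arguments: number of cores, measured throughput in ns/day.
# A job burns cores * 24 core-hours per day, so cost = cores * 24 / ns_per_day.
cost_per_ns() {
  awk -v cores="$1" -v ns_per_day="$2" \
    'BEGIN { printf "%.2f\n", cores * 24 / ns_per_day }'
}

# Placeholder throughputs for two hypothetical configurations.
echo "48 cores: $(cost_per_ns 48 96) core-hours/ns"
echo "96 cores: $(cost_per_ns 96 150) core-hours/ns"
```

In this made-up case the 96-core run is faster but costs more per nanosecond, so whether it is worth it depends on how much the shorter time to result matters.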
Optimize the queue time
You can ask the scheduler for the estimated start time of your job by issuing the sbatch --test-only script.sh command. This command doesn't submit the job; instead, it returns a pessimistic estimate of when the job would start. Keep in mind that this is only an estimate, and in most cases the actual queue time will be shorter.
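A minimal sketch of the workflow: write a batch script and ask for a start-time estimate without submitting it. The account, partition and program (./my_app) are hypothetical placeholders to replace with your own; the resource values are only an example (3800M per CPU stays under the per-core memory share of an Ares CPU node).

```shell
#!/usr/bin/env bash
# Write a minimal batch script. Account, partition and program are
# placeholders; the resource values are only an example.
cat > script.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --account=<grant-account>
#SBATCH --partition=<partition>
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --mem-per-cpu=3800M
#SBATCH --time=12:00:00

srun ./my_app
EOF

# Validate the request and print an estimated start time without submitting
# the job. Skipped on machines where Slurm is not installed.
if command -v sbatch >/dev/null 2>&1; then
  sbatch --test-only script.sh
fi
```

Once the estimate looks acceptable, submit the same file with sbatch script.sh.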
To shorten queue times, you can apply the following methods:
- Make the job easier to schedule, so that it may start sooner. This includes:
  - Applying the suggestions from the previous sections and requesting either fractions of nodes or full nodes.
  - Setting a realistic estimate of the job runtime. This way, the scheduler can more easily squeeze your job into a free slot.
- Choose an optimal configuration for your job. If your job doesn't benefit from more resources, reduce the job size. Smaller jobs might take longer to execute, but they usually start sooner because they fit more easily into available resources and backfill gaps.
- Plan your work! Sometimes there is no way around a long queue, so one way to mitigate wait times is to plan ahead and submit jobs in advance.
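One way to apply these points is to compare start-time estimates for a few candidate job sizes and time limits before submitting. Options given on the sbatch command line override the matching #SBATCH lines in the script. The script name (script.sh), core counts and time limits below are hypothetical; base the time limits on your own measured runtimes.

```shell
#!/usr/bin/env bash
# Candidate configurations: fewer cores paired with a longer, but still
# realistic, time limit. All values are hypothetical examples.
configs=(
  "--nodes=1 --ntasks-per-node=48 --time=12:00:00"
  "--nodes=1 --ntasks-per-node=24 --time=20:00:00"
  "--nodes=1 --ntasks-per-node=12 --time=1-12:00:00"
)

for cfg in "${configs[@]}"; do
  echo "== $cfg"
  if command -v sbatch >/dev/null 2>&1; then
    # $cfg is deliberately unquoted so it splits into separate options.
    # --test-only reports an estimate without submitting the job;
    # stderr is merged so all of sbatch's messages are shown.
    sbatch --test-only $cfg script.sh 2>&1
  fi
done
```

If a smaller configuration is estimated to start much earlier, its longer runtime may still give a shorter overall time to result.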
Accessing GPGPU nodes
A special partition named plgrid-gpu has been set up for jobs that require access to GPGPU hardware. To schedule such jobs, you first need to obtain a grant dedicated specifically to GPGPU processing. This grant should not be used to run any other type of computational task.
When applying for your grant, please specify that you require access to plgrid-gpu. It is also recommended (although not strictly required) to include gpu in your grant name (e.g. gpucomputations); this helps prevent potential mix-ups.
All applications for access to plgrid-gpu are evaluated on a case-by-case basis by the resource provider.
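Once a GPU grant has been approved, a job for the plgrid-gpu partition might look like the sketch below. The account name, CPU count, time limit and program (./my_gpu_app) are hypothetical placeholders; --gres=gpu:1 is the generic Slurm way to request one GPU, though a given cluster may document a different preferred form. Keep the CPU and memory request in line with the per-GPU share of the node.

```shell
#!/usr/bin/env bash
# Write a minimal GPU batch script. Account, CPU count, time limit and
# program are placeholders to replace before submitting.
cat > gpu_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=gpu-example
#SBATCH --partition=plgrid-gpu
#SBATCH --account=<gpu-grant-account>
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=<cpus-per-gpu>
#SBATCH --time=04:00:00

# Show which GPU the job was given, then run the application.
nvidia-smi
srun ./my_gpu_app
EOF
```

After filling in the placeholders, submit it with sbatch gpu_job.sh.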