Update 2.1 Batch job submission, authored by Odunlami Marc
@@ -7,7 +7,7 @@ The [Slurm](https://slurm.schedmd.com) scheduler manages the resource allocation
* The cluster nodes are grouped into **partitions**, depending on their hardware characteristics and the policy defined for their use.
* Partitions are accessible by **accounts**, which represent groups of users granted specific permissions.

A resource allocation must specify a partition, an account, and other parameters such as the maximum execution time, the number of cores, and the maximum memory used by the job. These Slurm parameters are written in **job scripts** and submitted to Slurm with the `sbatch` command:
```
sbatch script.sh
```
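The parameters listed above are usually set with `#SBATCH` directives at the top of the job script. The sketch below is only an illustration: the partition name `compute`, the account name `myaccount`, and all resource values are placeholders, not settings defined for this cluster.

```shell
#!/bin/bash
# Minimal job script sketch. The partition, account and resource values
# are placeholders (assumptions), not real cluster settings.

# Name shown in the queue
#SBATCH --job-name=example
# Hypothetical partition name
#SBATCH --partition=compute
# Hypothetical account name
#SBATCH --account=myaccount
# Maximum execution time (HH:MM:SS)
#SBATCH --time=01:00:00
# One task using four cores
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
# Maximum memory per node
#SBATCH --mem=8G

# Commands below run on the allocated node.
# Fall back to 1 core when the script is run outside Slurm (variable unset).
CORES="${SLURM_CPUS_PER_TASK:-1}"
echo "Job ${SLURM_JOB_ID:-unset} running with ${CORES} core(s)"
```

Saved as `script.sh`, this file would be submitted with the `sbatch script.sh` command shown above.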
@@ -67,12 +67,12 @@ squeue -u username
```
## Job and cluster monitoring

In the output of the `squeue` command, the "ST" column provides the state of the job. The most common states are:

* **R**: running.
* **PD**: pending. The job is waiting for resources to be allocated.
* **S**: suspended. This typically happens when the job is preempted by another job. In this case, no action is required. Slurm will resume the job when the preemptor job ends.
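To get a quick overview of how many jobs are in each of these states, the state codes can be counted with standard shell tools. This is a minimal sketch; the helper name `count_states` is hypothetical and not part of Slurm.

```shell
#!/bin/bash
# Count jobs per state code read from standard input (one code per line).
# Prints one "STATE=count" line per distinct state code.
count_states() {
    sort | uniq -c | awk '{ print $2 "=" $1 }'
}

# Typical use on a cluster (requires Slurm):
#   -h removes the header line, -o "%t" prints only the compact state code.
#   squeue -h -u username -o "%t" | count_states
```

The function only reads standard input, so it can be tried without Slurm by piping in a few state codes, for example `printf 'R\nPD\nR\n' | count_states`.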
The last column displayed by `squeue` is titled "NODELIST(REASON)":

* For running jobs, it displays the list of allocated nodes.
* For pending jobs, it displays the pending reason:
  * **Resources**: the resources requested by the job are not currently available because they are in use by other jobs.
@@ -95,7 +95,7 @@ The `sinfo` command displays the current state of compute nodes:
## Accounting

Slurm is connected to a database that records job accounting data. The `sacct` and `sreport` commands give access to this accounting information.

Show information on the *job_id* job:
```
# Short format
sacct -j job_id
# Long format
sacct -l -j job_id
```
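`sacct` reports elapsed times in the `[DD-]HH:MM:SS` form, which is awkward to sum or compare. The sketch below converts such a value to seconds; the helper name `elapsed_to_seconds` is hypothetical, and the `sacct` call in the comment assumes the standard `Elapsed` field.

```shell
#!/bin/bash
# Convert a Slurm elapsed time ([DD-]HH:MM:SS) to a number of seconds.
elapsed_to_seconds() {
    local t="$1" days=0 h m s
    # Split off the optional "DD-" day prefix.
    if [[ "$t" == *-* ]]; then
        days="${t%%-*}"
        t="${t#*-}"
    fi
    IFS=: read -r h m s <<< "$t"
    # 10# forces base 10 so that values such as "08" are not read as octal.
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Typical use on a cluster (requires Slurm):
#   -X keeps only the job allocation line, -n removes the header.
#   sacct -X -n -j job_id --format=Elapsed | while read -r e; do
#       elapsed_to_seconds "$e"
#   done
```

For instance, `elapsed_to_seconds 1-02:03:04` prints `93784`.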
Display jobs starting and ending between January 1, 2019 and January 1, 2020 on the bigmem01 node:

```
sacct --nodelist=bigmem01 --starttime=2019-01-01 --endtime=2020-01-01
```
Display the number of hours computed by *username* between June 1, 2019 and January 1, 2020 on each account:

```
sreport -t hours user TopUsage Start=2019-06-01 End=2020-01-01 Users=username
```
Display the global number of hours computed by *username* between June 1, 2019 and January 1, 2020:

```
sreport -t hours user TopUsage Group Start=2019-06-01 End=2020-01-01 Users=username
```