... | ... | @@ -70,16 +70,16 @@ squeue -u username |
|
|
In the output of the *squeue* command, the "ST" column provides the state of the job. The most common states are:
|
|
|
* **R**: running.
|
|
|
* **PD**: pending. The job is waiting for resources.
|
|
|
* **S**: suspended. This typically happens when the job is preempted by another job.
|
|
|
* **S**: suspended. This typically happens when the job is preempted by another job. In this case, wait until Slurm resumes the job.
|
|
|
|
|
|
The title of the last column displayed by *squeue* is "NODELIST(REASON)":
|
|
|
* For running jobs, displays the list of allocated nodes.
|
|
|
* For pending jobs, displays the pending reason:
|
|
|
* **Resources**: the resources requested by the job are not currently available because other jobs are using them.
|
|
|
* **Priority**: the job priority is lower than the priority of other jobs.
|
|
|
* **QOSMaxCpuPerJobLimit**
|
|
|
* **BeginTime**: your job has been requeue by the system and is waiting to start.
|
|
|
* **
|
|
|
* **QOSMaxCpuPerJobLimit**: *username* has reached the maximum number of cores it is allowed to allocate; the job is waiting for some of *username*'s running jobs to end.
|
|
|
* **BeginTime**: the job has been requeued by the system to fix an issue and is waiting to start again.
|
|
|
* **Held state**: the job is held by Slurm. To release it, run `scontrol release job_id`.
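
As a rough illustration of how the "ST" and "NODELIST(REASON)" columns fit together, the sketch below parses text shaped like the default *squeue* output and separates allocated nodes from pending reasons. The sample rows, the job names, and the helper `parse_squeue` are hypothetical and invented for this example; the sketch also assumes that job names contain no spaces and that the default eight-column layout is used.

```python
# Hypothetical sample shaped like `squeue -u username` output.
# All job IDs, names, and node names below are invented for illustration.
SAMPLE = """\
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
101 batch sim_a username R 1:02:03 2 node[01-02]
102 batch sim_b username PD 0:00 1 (Resources)
103 batch sim_c username PD 0:00 1 (Priority)
"""


def parse_squeue(text):
    """Return one dict per job with its id, state, and nodes or pending reason.

    Assumes the default eight-column layout and job names without spaces.
    """
    jobs = []
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:
            continue  # ignore rows that do not match the assumed layout
        job_id, state = fields[0], fields[4]
        last = " ".join(fields[7:])  # the NODELIST(REASON) column
        if last.startswith("(") and last.endswith(")"):
            # Parentheses mark a reason rather than a node list
            jobs.append({"id": job_id, "state": state,
                         "reason": last[1:-1], "nodes": None})
        else:
            # Otherwise the column holds the allocated nodes
            jobs.append({"id": job_id, "state": state,
                         "reason": None, "nodes": last})
    return jobs


if __name__ == "__main__":
    for job in parse_squeue(SAMPLE):
        detail = job["reason"] if job["reason"] else job["nodes"]
        print(job["id"], job["state"], detail)
```

On a real cluster, the same function could be fed the text captured from `squeue -u username` instead of the sample string.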
|
|
|
|
|
|
## Error analysis
|
|
|
* QOSMaxCpuPerJobLimit
|
... | ... | |