### Special partitions: free, bigmem, gpu

* Every user is a member of the free and bigmem accounts. These accounts are needed to access the free and bigmem partitions, respectively.

* The gpu partition is only accessible to members of the gpu account.

* The number of concurrently allocated cores per user on the free, bigmem and gpu partitions is not limited.

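For illustration, a job-script header requesting one of these partitions might look like the following sketch. The job name, core count, and program name are hypothetical; the partition and account names follow the text above.

```shell
#!/bin/bash
#SBATCH --job-name=myjob         # hypothetical job name
#SBATCH --partition=bigmem       # one of: free, bigmem, gpu
#SBATCH --account=bigmem         # account granting access to the requested partition
#SBATCH --ntasks=1

srun ./my_program                # hypothetical executable
```

Requesting the gpu partition works the same way, provided the user is a member of the gpu account.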
## Usual Slurm user commands
Submit a job:
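A minimal sketch (the script name `job.sh` is hypothetical):

```shell
# Submit a batch script; on success, sbatch prints the assigned job ID
sbatch job.sh

# Submit to a specific partition instead of the default one
sbatch --partition=bigmem job.sh
```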

The title of the last column displayed by *squeue* is "NODELIST(REASON)":

* For pending jobs, this column displays the pending reason:

    * **Resources**: the resources requested by the job are not currently available because they are in use by other jobs.

    * **Priority**: the job priority is lower than the priority of other jobs.

    * **QOSMaxCpuPerJobLimit**: the maximal number of authorized allocated cores has been reached by *username*; the job is waiting for some running jobs of *username* to end.

    * **BeginTime**: the job has been requeued by the system to fix an issue; it is waiting for Slurm authorization to start again (this delay is needed for Slurm to analyze the job again, generally no more than three minutes).

* **Held state**: the job is held by Slurm. To release it, run `scontrol release job_id`.
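The reason column and the release command described above can be exercised as follows (a sketch; `123456` stands for an actual job ID reported by *squeue*):

```shell
# List your own jobs; %R prints the reason for pending jobs
# and the node list for running jobs
squeue -u $USER -o "%.10i %.9P %.20j %.2t %.10M %R"

# Release a job that squeue reports as held
scontrol release 123456
```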
## Error analysis