squeue -t PD lists pending jobs.
In its output, the NODELIST(REASON) column shows why each job isn't running yet.
The JOB REASON CODES section towards the end of the squeue manual page (man squeue) explains what each reason means.
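When you have many pending jobs, a per-reason count can be easier to read than the full listing. The sketch below assumes you feed it one reason per line, which squeue prints with -h (no header) and the %r format specifier; summarize_reasons is a hypothetical helper name, not a Triton tool.

```shell
#!/bin/sh
# Hypothetical helper: count how many pending jobs wait for each reason.
# Reads one reason per line on stdin, for example from
#   squeue -h -u "$USER" -t PD -o "%r"
# and prints the most common reason first.
summarize_reasons() {
    sort | uniq -c | sort -rn
}

# Usage on the cluster:
#   squeue -h -u "$USER" -t PD -o "%r" | summarize_reasons
```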
A job that you have put on hold (reason JobHeldUser) can be released with scontrol release JOBID.
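If several of your jobs are held, you could release them in one go. This is a minimal sketch, assuming the input is "JOBID REASON" lines as printed by squeue -o "%i %r"; held_job_ids is a hypothetical helper, and it only picks jobs held by their owner (JobHeldUser), since admin holds cannot be released by regular users.

```shell
#!/bin/sh
# Hypothetical helper: print the IDs of jobs held by their owner.
# Input: "JOBID REASON" lines, for example from
#   squeue -h -u "$USER" -t PD -o "%i %r"
held_job_ids() {
    awk '$2 == "JobHeldUser" { print $1 }'
}

# Usage on the cluster (releases every job you have held yourself):
#   squeue -h -u "$USER" -t PD -o "%i %r" | held_job_ids | xargs -r -n 1 scontrol release
```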
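To prevent a single user from monopolizing all resources, we use GrpTRESRunMins, which limits the total of allocated resources multiplied by remaining time limit over all of your running jobs. See your limits with:
sacctmgr list user $USER withassoc format=Account,MaxCPUs,GrpTRESRunMins%50,GrpTRES%40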
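A little arithmetic shows why long, wide jobs hit this limit first. The sketch below assumes each running job is charged its allocated CPUs times its remaining time limit in minutes, and that a new job starts only if the total stays under the limit; the function names and all numbers are made up for illustration, so check your real limit with the sacctmgr command above.

```shell
#!/bin/sh
# Hypothetical helpers illustrating GrpTRESRunMins bookkeeping for CPUs.
# Assumption: a running job is charged CPUs x remaining time limit (minutes).
cpu_run_mins() {
    # $1 = number of CPUs, $2 = time limit in minutes
    echo $(( $1 * $2 ))
}

# Succeeds if a new job would still fit under the limit.
# $1 = limit, $2 = CPU-minutes already charged to running jobs,
# $3 = CPUs for the new job, $4 = its time limit in minutes.
fits_under_limit() {
    need=$(cpu_run_mins "$3" "$4")
    [ $(( $2 + need )) -le "$1" ]
}

# Made-up example: 16 CPUs for 3 days (4320 minutes)
# costs 16 * 4320 = 69120 CPU-minutes.
```

Because the charge shrinks as running jobs approach their end, a job blocked by this limit will usually start later without any action; asking for fewer CPUs or a shorter time limit makes it fit sooner.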
In Triton, we use a hierarchical fair-share algorithm that assigns a priority value to each pending job.
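For intuition only, here is the classic, non-hierarchical fair-share factor from Slurm's multifactor priority plugin, F = 2^(-U/S), where U is your normalized (decayed) usage and S your normalized share. Triton's hierarchical algorithm computes its factor differently, so treat this as an illustration of the idea rather than the formula actually in use; sprio -l shows the real priority components of your pending jobs. fairshare_factor is a hypothetical helper.

```shell
#!/bin/sh
# Classic Slurm fair-share factor F = 2^(-U/S), for illustration only.
# $1 = U (normalized usage), $2 = S (normalized shares).
# Using exactly your share (U == S) halves the factor.
fairshare_factor() {
    awk -v u="$1" -v s="$2" 'BEGIN { printf "%.4f\n", 2 ^ (-u / s) }'
}

# fairshare_factor 0   0.1   -> 1.0000 (no recent usage)
# fairshare_factor 0.1 0.1   -> 0.5000 (used exactly your share)
# fairshare_factor 0.2 0.1   -> 0.2500 (used twice your share)
```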
scontrol show job JOBID shows the estimated start time of your job in its StartTime field (if the scheduler has ever reached it).
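The scontrol output is long, so it can help to pick out just the start time. A minimal sketch, assuming the usual "   StartTime=... EndTime=..." layout of scontrol show job; start_time is a hypothetical helper, and squeue --start -j JOBID is a built-in alternative.

```shell
#!/bin/sh
# Hypothetical helper: extract the StartTime value from
# `scontrol show job` output read on stdin.
# Prints a timestamp, or e.g. Unknown if no estimate exists yet.
start_time() {
    sed -n 's/.*[[:space:]]StartTime=\([^[:space:]]*\).*/\1/p'
}

# Usage on the cluster:
#   scontrol show job JOBID | start_time
```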
slurm history 3days lists your jobs from the last three days.
sacct -j JOBID shows accounting information for a single job, including its state and exit code.
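ExitCode=0:0 means (Slurm thinks) the job completed successfully. The first number is the exit code of the job script; the number after the colon is the signal that terminated it, if any.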
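To spot the jobs that did not end cleanly, you can filter sacct's pipe-separated output on the State and ExitCode columns. This is a sketch assuming input lines of the form JobID|JobName|State|ExitCode, as produced by the sacct options in the comment (-X for allocations only, -n for no header, -P for pipe-separated output); not_successful is a hypothetical helper, and the test data is made up.

```shell
#!/bin/sh
# Hypothetical helper: print jobs whose State is not COMPLETED
# or whose ExitCode is not 0:0.
# Input lines: JobID|JobName|State|ExitCode, for example from
#   sacct -u "$USER" -S now-3days -X -n -P -o JobID,JobName,State,ExitCode
not_successful() {
    awk -F'|' '$3 != "COMPLETED" || $4 != "0:0"'
}
```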
If a job failed, its output file (slurm-JOBID.out by default) often contains the reason.
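A quick first look at that file is usually its last lines plus anything that looks like an error. The sketch below assumes the default slurm-JOBID.out name in the current directory; check_job_log is a hypothetical helper, and its search pattern is a rough guess rather than a list of official Slurm messages.

```shell
#!/bin/sh
# Hypothetical helper: show the tail of a job's default output file
# and any lines that look like errors (the pattern is a guess).
# $1 = job ID
check_job_log() {
    log="slurm-$1.out"
    if [ ! -f "$log" ]; then
        echo "no $log here (was the output redirected elsewhere?)" >&2
        return 1
    fi
    echo "== last 20 lines of $log =="
    tail -n 20 "$log"
    echo "== lines mentioning error/killed/cancelled =="
    grep -n -i -E 'error|killed|cancelled' "$log" || true
}

# Usage, in the directory the job was submitted from:
#   check_job_log JOBID
```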