Monitoring Job Status
PBS and Moab provide several tools for viewing queue, system, and job status. The most commonly used ones are described below.
qstat
Use qstat -a to check the status of submitted jobs.
nid00004: ORNL/CCS
                                                Time In  Req'd   Req'd     Elap
Job ID  Username  Queue     Jobname     SessID  Queue    Nodes   Time   S  Time
------  --------  --------  ----------  ------  -------  ------  -----  -  -----
107     user1     short     run128      5095    000:14   128     02:00  R  00:13
108     user2     long      job1        6860    000:55   1024    12:00  R  00:54
109     user1     sys       job         --      --       3500    12:00  Q  --
Total compute nodes allocated: 1152
The first column gives each job's ID (truncated in this listing), and the second column gives the job's owner. The S column gives the status of each job. Common job-status values are listed below.
| Status value | Meaning |
|---|---|
| E | Exiting after having run |
| H | Held |
| Q | Queued; eligible to run |
| R | Running |
| S | Suspended |
| T | Being moved to new location |
| W | Waiting for its execution time |
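As an illustration of how the S column can be read programmatically, the sketch below parses the three sample job rows from the qstat -a listing and looks up each status code in the table. It assumes each job row has exactly ten whitespace-separated fields, as in the sample; the names `STATUS_MEANINGS` and `parse_job_rows` are hypothetical, and real qstat output (for example, job names containing spaces) may need more careful handling.

```python
# Status codes and meanings taken from the table above.
STATUS_MEANINGS = {
    "E": "Exiting after having run",
    "H": "Held",
    "Q": "Queued; eligible to run",
    "R": "Running",
    "S": "Suspended",
    "T": "Being moved to new location",
    "W": "Waiting for its execution time",
}

# Sample rows copied from the qstat -a listing in this section.
SAMPLE_ROWS = """\
107 user1 short run128 5095 000:14 128 02:00 R 00:13
108 user2 long job1 6860 000:55 1024 12:00 R 00:54
109 user1 sys job -- -- 3500 12:00 Q --
"""


def parse_job_rows(text):
    """Return one dict per job row (assumes ten whitespace-separated fields)."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, separators, and summary lines:
        # a job row has ten fields and starts with a digit.
        if len(fields) != 10 or not fields[0][0].isdigit():
            continue
        jobs.append({
            "job_id": fields[0],
            "user": fields[1],
            "queue": fields[2],
            "nodes": int(fields[6]),
            "status": fields[8],
            "meaning": STATUS_MEANINGS.get(fields[8], "unknown"),
        })
    return jobs


jobs = parse_job_rows(SAMPLE_ROWS)
# Sum the requested nodes of all running jobs.
running_nodes = sum(j["nodes"] for j in jobs if j["status"] == "R")

for j in jobs:
    print(j["job_id"], j["user"], j["status"], "-", j["meaning"])
print("Nodes in use by running jobs:", running_nodes)
```

For the sample rows, the node count of the two running jobs (128 + 1024) matches the "Total compute nodes allocated" line in the listing.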
showq
The Moab utility showq gives a more detailed view of the queue. It groups jobs into the following states:
- Active: These jobs are currently running.
- Eligible: These jobs are queued and awaiting resources. A user is allowed two jobs in the eligible state.
- Blocked: These jobs are queued but not eligible to run. Common reasons are that the job is on hold or that the owning user already has two jobs in the eligible state.
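The eligible and blocked rules described above can be modeled with a short sketch. This is only a toy model of the stated policy, not Moab's scheduling algorithm: it assumes jobs are considered in submission order, and the job IDs, user names, and the function `classify_queued` are hypothetical.

```python
from collections import defaultdict

ELIGIBLE_LIMIT = 2  # per-user eligible-job limit stated in the text


def classify_queued(jobs, limit=ELIGIBLE_LIMIT):
    """Split queued jobs into (eligible, blocked) lists of job IDs.

    jobs is a list of (job_id, user, held) tuples, assumed to be in
    submission order.
    """
    eligible, blocked = [], []
    eligible_count = defaultdict(int)
    for job_id, user, held in jobs:
        # A job is blocked if it is on hold or its owner is already at the limit.
        if held or eligible_count[user] >= limit:
            blocked.append(job_id)
        else:
            eligible.append(job_id)
            eligible_count[user] += 1
    return eligible, blocked


# Hypothetical queued jobs for illustration.
queued = [
    (733, "user1", False),
    (734, "user1", False),
    (735, "user2", True),   # on hold
    (736, "user1", False),  # third queued job for user1
]
eligible, blocked = classify_queued(queued)
print("Eligible:", eligible)
print("Blocked:", blocked)
```

In this model, job 736 is blocked only because user1 already has two eligible jobs; it would become eligible once one of them starts running.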
checkjob
The Moab utility checkjob can be used to view details of a job in the queue. For example, if job 736 is in the queue in a blocked state, the following command shows why it is blocked:
> checkjob 736
The output may contain a line similar to the following:
BlockMsg: job 736 violates idle HARD MAXJOB limit of 2 for user <userid> (Req: 1 InUse: 2)
This line indicates that the job is in the blocked state because the owning user has reached the limit of two jobs in the eligible state.
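When scanning checkjob output for many jobs, a message of this kind can be picked apart with a regular expression. The sketch below is written against the single sample line shown above, with `user1` standing in for the user ID placeholder; the pattern and the function name `parse_blockmsg` are assumptions, and other BlockMsg variants will not match.

```python
import re

# Pattern written against the sample line above; real messages may differ.
BLOCKMSG_RE = re.compile(
    r"BlockMsg: job (?P<job>\S+) violates idle HARD MAXJOB limit of "
    r"(?P<limit>\d+) for user (?P<user>\S+)\s+"
    r"\(Req: (?P<req>\d+)\s+InUse: (?P<in_use>\d+)\)"
)


def parse_blockmsg(line):
    """Return the fields of a MAXJOB BlockMsg line, or None if it does not match."""
    m = BLOCKMSG_RE.search(line)
    if m is None:
        return None
    return {
        "job": m.group("job"),
        "user": m.group("user"),
        "limit": int(m.group("limit")),
        "requested": int(m.group("req")),
        "in_use": int(m.group("in_use")),
    }


sample = (
    "BlockMsg: job 736 violates idle HARD MAXJOB limit of 2 "
    "for user user1 (Req: 1 InUse: 2)"
)
info = parse_blockmsg(sample)
print(info)
# The job is blocked because the user's eligible slots are all in use.
print("At limit:", info["in_use"] >= info["limit"])
```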
showstart
The Moab utility showstart can be used to view an estimated start time for a given job. For example,
> showstart 736
job 736 requires 2048 procs for 1:00:00:00
Estimated Rsv based start in 3:41:18 on Tue March 1 19:21:18
Estimated Rsv based completion in 1:03:41:18 on Wed March 2 19:21:18
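The durations in this output can be checked against each other: the estimated completion offset should equal the estimated start offset plus the requested duration. The sketch below assumes the durations use a days:hours:minutes:seconds layout with leading fields omitted when zero, which is consistent with the sample (start on Tuesday, completion one day later at the same time of day). The helper `parse_duration` is hypothetical.

```python
from datetime import timedelta


def parse_duration(text):
    """Convert a [[DD:]HH:]MM:SS string into a timedelta (assumed layout)."""
    parts = [int(p) for p in text.split(":")]
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"unexpected duration: {text!r}")
    # Left-pad with zeros so the list is always [days, hours, minutes, seconds].
    days, hours, minutes, seconds = [0] * (4 - len(parts)) + parts
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


# Values copied from the showstart sample above.
walltime = parse_duration("1:00:00:00")       # requested duration
start_in = parse_duration("3:41:18")          # estimated wait until start
completion_in = parse_duration("1:03:41:18")  # estimated wait until completion

print("start in:", start_in)
print("completion in:", completion_in)
print("consistent:", start_in + walltime == completion_in)
```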
psview
psview is a Cray utility that displays job information as seen by psched, the underlying system scheduler. It shows whether a job is waiting to start, migrating, or in a system-queued state.
Queued jobs will be listed in the Posted list. If a job is being migrated by the system or waiting to start, it will be noted in the Notes column.
