Only half of my PBS/Torque jobs are being scheduled

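My supercomputing center recently migrated from SGE to PBS/Torque. Now, when I submit an array job, only half of the jobs in the array get scheduled. Once those finish, the other half get scheduled. This happens even though there is plenty of capacity available for these jobs.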

For example, I just submitted an array of 10 jobs. Here is the qstat output 10 minutes later:

[myuserna@sub ~]$ qstat -t
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
3100[1].systemm2           ...-to-work.sh-1 myuserna        00:07:40 R short          
3100[2].systemm2           ...-to-work.sh-2 myuserna        00:07:32 R short          
3100[3].systemm2           ...-to-work.sh-3 myuserna        00:09:55 R short          
3100[4].systemm2           ...-to-work.sh-4 myuserna        00:09:44 R short          
3100[5].systemm2           ...-to-work.sh-5 myuserna        00:09:07 R short          
3100[6].systemm2           ...-to-work.sh-6 myuserna               0 Q short          
3100[7].systemm2           ...-to-work.sh-7 myuserna               0 Q short          
3100[8].systemm2           ...-to-work.sh-8 myuserna               0 Q short          
3100[9].systemm2           ...-to-work.sh-9 myuserna               0 Q short          
3100[10].systemm2          ...to-work.sh-10 myuserna               0 Q short          
[myuserna@sub ~]$ 
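
A quick way to see the split in output like this is to tally the state column. The following is a minimal sketch, assuming the six-column `qstat -t` layout shown above (state in the fifth field); `count_states` is a hypothetical helper name, and the two sample lines are copied from the listing.

```shell
# Tally job states (R, Q, ...) from `qstat -t` output read on stdin.
# Only lines whose first field looks like an array subjob id, such as
# 3100[1].systemm2, are counted, so the header lines are ignored.
count_states() {
    awk '$1 ~ /\[[0-9]+\]/ { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

# Demo on two lines taken from the listing above.
sample='3100[1].systemm2           ...-to-work.sh-1 myuserna        00:07:40 R short
3100[6].systemm2           ...-to-work.sh-6 myuserna               0 Q short'
printf '%s\n' "$sample" | count_states
```

On a host with the Torque client tools, the same function could be fed live data with `qstat -t | count_states`.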

Any clues on how to fix the scheduler?

Here is the relevant part of the scheduler configuration:

create queue short
set queue short queue_type = Execution
set queue short Priority = 10000
set queue short max_user_queuable = 500
set queue short max_running = 200
set queue short resources_max.walltime = 24:00:00
set queue short resources_default.nodes = 1
set queue short max_user_run = 50
set queue short enabled = True
set queue short started = True
#

#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = systemm2
set server acl_roots = root@*
set server managers = [email protected]
set server operators = [email protected]
set server default_queue = route
set server log_events = 511
set server mail_from = adm
set server resources_default.walltime = 01:00:00
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server mom_job_sync = True
set server keep_completed = 300
set server submit_hosts = submit-1
set server submit_hosts += submit-0
set server auto_node_np = True
set server next_job_number = 6217
set server max_job_array_size = 512
set server max_slot_limit = 5

Answer 1

Ask your administrator. The number of slots each user may use per queue can be limited.
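
If you have read access to the configuration, one way to spot such limits is to filter the qmgr dump for the attributes that cap running or queued jobs. This is only a sketch: `show_limits` is a hypothetical helper name, and the attribute list covers just the limits that appear in this question, not every limit Torque supports.

```shell
# Filter qmgr-style configuration read on stdin down to the attributes
# that cap how many jobs may run or be queued at once.
show_limits() {
    grep -E 'max_(slot_limit|user_run|running|user_queuable)'
}

# Demo on three lines taken from the configuration in the question.
printf '%s\n' \
    'set queue short max_user_run = 50' \
    'set server tcp_timeout = 6' \
    'set server max_slot_limit = 5' | show_limits
```

On a host with the Torque client tools, the input would normally come from `qmgr -c 'print server' | show_limits`.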

Update: OK, now that you have updated the question to show

set server max_slot_limit = 5

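I'm confident this answers the question: max_slot_limit caps how many subjobs of a single array may run concurrently, so with a limit of 5 only five of your ten subjobs run at a time.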
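
The arithmetic behind the two batches can be sketched as follows. `waves` is a hypothetical helper, and it assumes every subjob in a batch finishes before the next batch starts, which is a simplification of how slots are actually refilled.

```shell
# Estimate how many batches an array needs when at most LIMIT subjobs
# may run at once: ceil(N / LIMIT), using integer arithmetic.
waves() {
    n=$1
    limit=$2
    echo $(( (n + limit - 1) / limit ))
}

waves 10 5    # the 10-job array from the question under max_slot_limit = 5

# An administrator could raise the cap with qmgr (10 is only an example value):
#   qmgr -c 'set server max_slot_limit = 10'
```

Torque also accepts a per-array slot limit at submission time, written as a `%` suffix on the range (for example `qsub -t 1-10%5`), but as far as I know a request cannot exceed the server-wide max_slot_limit, so the change has to come from the administrator.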
