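I'm trying to set up QoS limits on a cluster. Specifically, I want to make sure that no one can submit jobs to a particular partition. To do that, I enabled a CPU limit on the partition through a QoS, and it works as expected. However, I found that the setting is not retained after restarting Slurm. Is this normal?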
$ /opt/slurm/bin/scontrol update PartitionName=login-queue QoS=login-node
$ scontrol show partition login-queue
PartitionName=login-queue
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=login-node
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=1 MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED MaxCPUsPerSocket=UNLIMITED
NodeSets=login-queue_nodes
Nodes=login-queue-st-t3medium-1
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=2 TotalNodes=1 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
TRES=cpu=2,mem=3891M,node=1,billing=2
ResumeTimeout=GLOBAL SuspendTimeout=GLOBAL SuspendTime=GLOBAL PowerDownOnIdle=NO
$ systemctl restart slurmctld.service
$ scontrol show partition login-queue
PartitionName=login-queue
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=1 MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED MaxCPUsPerSocket=UNLIMITED
NodeSets=login-queue_nodes
Nodes=login-queue-st-t3medium-1
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=2 TotalNodes=1 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
TRES=cpu=2,mem=3891M,node=1,billing=2
After the restart, the partition shows QoS=N/A.
Is there a way to set these limits so that they persist? I'm happy to share any other logs that are needed.
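
For context on what a persistent setting might look like: as far as I understand, changes made with `scontrol update` only live in the running slurmctld, and by default slurmctld rebuilds partition settings from slurm.conf when it restarts. A minimal sketch, assuming the partition is defined directly in slurm.conf, would be to put the QoS on the partition line itself. The Nodes, MaxNodes and State values below are copied from the scontrol output above; the rest of the real partition line is unknown and should be left as it is.

```
# slurm.conf (sketch, not the actual file from this cluster)
# Assumption: the partition is declared here; only QOS=login-node is new.
# Nodes, MaxNodes and State are taken from the "scontrol show partition" output.
PartitionName=login-queue Nodes=login-queue-st-t3medium-1 MaxNodes=1 State=UP QOS=login-node
```

After editing the file, restarting slurmctld (or running `scontrol reconfigure`) should pick up the change. If slurm.conf is generated by a cluster management tool, the edit would probably have to go through that tool's configuration, otherwise it may be overwritten.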