I am trying to configure our Slurm cluster so that job priority takes users' past usage into account, and in particular their past GPU usage. I understand that I have to configure the weights of the FairShare tree, but I am not sure how to tell Fairshare which variables it should actually consider. I just don't seem to grasp the gist of how this works.
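For reference, this is roughly how I have been inspecting what the scheduler currently uses (sshare, sacctmgr and scontrol are the standard tools here, so I assume their output reflects what fairshare actually sees; the format fields are just the ones I found useful):

# settings the controller is actually running with
scontrol show config | grep -iE 'priority|tres'
# fairshare tree with raw and effective usage per account/user
sshare -l
# shares configured per association in the accounting database
sacctmgr show associations format=Cluster,Account,User,Fairshare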
The current configuration is:
###Job Priority##
#Fair tree, multifactor with its parameters (http://slurm.schedmd.com/fair_tree.html)
#PriorityFlags=FAIR_TREE
#PriorityType=priority/multifactor
#PriorityDecayHalfLife=14-0
#PriorityMaxAge=7-0
#PriorityUsageResetPeriod=NONE
#PriorityWeightAge=100
#PriorityWeightFairshare=10000
#Unused weights
#PriorityWeightJobSize=0
#PriorityWeightPartition=0
#PriorityWeightQOS=0
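If I read the multifactor plugin documentation correctly, once those lines are uncommented each pending job should end up with roughly

Job_priority = PriorityWeightAge * age_factor + PriorityWeightFairshare * fairshare_factor

(the JobSize, Partition and QOS terms drop out because their weights are 0), and with PriorityFlags=FAIR_TREE the fairshare_factor comes from the job's rank in the account tree rather than from a plain usage ratio. What I do not see is where GPU usage, as opposed to CPU usage, enters that fairshare_factor.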
A possible configuration (please correct me if I am wrong):
###Job Priority##
#Fair tree, multifactor with its parameters (http://slurm.schedmd.com/fair_tree.html)
PriorityType=priority/multifactor
PriorityDecayHalfLife=14-0
PriorityMaxAge=7-0
PriorityWeightAge=1000
PriorityWeightFairshare=10000
#Unused weights
PriorityWeightJobSize=100
PriorityWeightPartition=0
PriorityWeightQOS=0
PriorityUsageResetPeriod=MONTHLY
AccountingStorageTRES=gres/gpu,gres/gpu:geforce_rtx_1080,gres/gpu:titan,gres/gpu:quadro,gres/gpu:geforce_rtx_3090,gres/gpu:v100
TRESBillingWeights="CPU=1.0,Mem=0.25G,gres/gpu=1.0"
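If this is on the right track, my plan for checking it would be something like the following (assuming I restart slurmctld after editing slurm.conf, since as far as I understand AccountingStorageTRES is not picked up by a plain scontrol reconfigure; <jobid> stands for any recent GPU job):

# confirm the controller sees the new priority/TRES settings
scontrol show config | grep -iE 'PriorityType|PriorityWeight|AccountingStorageTRES'
# check whether GPU usage shows up in the fairshare tree
sshare -l
# check which TRES were recorded for a finished GPU job
sacct -j <jobid> --format=JobID,Partition,AllocTRES%60,Elapsed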
Would this do the trick? Is changing the TRESBillingWeights the right way to make fairshare account for GPU usage?
Thanks in advance to anyone willing to answer!