The problem I'm facing is that slurmctld and slurmd are out of sync with respect to the slurm.conf file they are using, so we get this:
error: Node node1 appears to have a different slurm.conf than the slurmctld. This could cause issues with communication and functionality. Please review both files and make sure they are the same. If this is expected ignore, and set DebugFlags=NO_CONF_HASH in your slurm.conf.
error: Node node2 appears to have a different slurm.conf than the slurmctld. This could cause issues with communication and functionality. Please review both files and make sure they are the same. If this is expected ignore, and set DebugFlags=NO_CONF_HASH in your slurm.conf.
error: Node node3 appears to have a different slurm.conf than the slurmctld. This could cause issues with communication and functionality. Please review both files and make sure they are the same. If this is expected ignore, and set DebugFlags=NO_CONF_HASH in your slurm.conf.
error: Node node4 appears to have a different slurm.conf than the slurmctld. This could cause issues with communication and functionality. Please review both files and make sure they are the same. If this is expected ignore, and set DebugFlags=NO_CONF_HASH in your slurm.conf.
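Is there a way (other than parsing the log errors) to query slurmctld and slurmd individually about the configuration they are running, so I can tell whether any of them needs to be restarted or reconfigured? I think getting a hash should be enough to compare them against each other.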
slurm.conf
Update: it would also be handy to know when the file was read.
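A minimal sketch of the hash comparison idea, assuming it is enough to compare the slurm.conf files on disk. It hashes the raw file bytes with SHA-256, which is not necessarily what Slurm computes internally for its own check, and it reports the file's modification time, which is not the time a daemon last read the file. The function names `conf_fingerprint` and `nodes_out_of_sync` are hypothetical helpers, not part of Slurm.

```python
import hashlib
import os
from datetime import datetime, timezone


def conf_fingerprint(path):
    """Return (SHA-256 hex digest, last-modified time in UTC) for a file.

    Assumption: a byte-level digest of the file on disk is a good enough
    proxy for "which configuration is this host using".
    """
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    return digest, mtime


def nodes_out_of_sync(controller_digest, node_digests):
    """Return the sorted names of nodes whose digest differs from the controller's.

    node_digests maps a node name to the digest collected on that node.
    """
    return sorted(
        name for name, digest in node_digests.items()
        if digest != controller_digest
    )
```

One way to use it would be to run `conf_fingerprint` on the controller and on each node (over ssh, for example), collect the digests into a dictionary keyed by node name, and pass them to `nodes_out_of_sync`. Because the digest covers raw bytes, two files that differ only in comments or whitespace will be reported as different.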
Answer 1
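I would suggest using configless mode in slurm.conf. You will still get the error messages in the Slurm log when the daemons start, but they can be safely ignored. All slurmd systems will fetch the correct configuration from the Slurm controller.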
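As a rough sketch of what enabling configless mode involves, assuming a reasonably recent Slurm release that supports it: the controller's slurm.conf gets the `enable_configless` parameter, and each slurmd is told where to fetch its configuration. The host name `ctl-host` is a placeholder, not something from the question.

```
# slurm.conf on the controller host
SlurmctldParameters=enable_configless

# On each compute node, start slurmd pointing at the controller
# (ctl-host is a placeholder; 6817 is the default slurmctld port):
#   slurmd --conf-server ctl-host:6817
```

With this in place the compute nodes no longer need their own copy of slurm.conf, so there is only one file to keep current.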