How can I force Pacemaker to keep restarting a SystemD resource when it fails (instead of putting it into the "stopped" state)?

2024-6-2

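My goal is to build a 2-node HTTP load balancer behind a virtual IP (VIP). For this task I chose Pacemaker (for virtual IP failover) and Caddy as the HTTP load balancer. The choice of load balancer is not the point of this question. :)

My requirement is simple - I want the virtual IP assigned to a host that is running a healthy, working Caddy instance.

Here is how I implemented it with Pacemaker: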

# Disable stonith feature
pcs property set stonith-enabled=false

# Ignore quorum policy
pcs property set no-quorum-policy=ignore

# Setup virtual IP
pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=123.123.123.123

# Set up the caddy resource using the systemd provider. A plain resource runs on one node at a time, so clone it; a clone runs on all nodes at the same time by default.
# https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/configuring_the_red_hat_high_availability_add-on_with_pacemaker/ch-advancedresource-haar
pcs resource create caddy systemd:caddy clone

# Add a colocation constraint so the virtual IP is assigned _on the same_ node where the application is running.
pcs constraint colocation add ClusterIP with caddy-clone INFINITY

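However, if I SSH into the node that holds the virtual IP, break the Caddy configuration file, and run systemctl restart caddy, after a while Pacemaker detects that Caddy failed to start and simply puts it into the stopped state.

How can I force Pacemaker to keep restarting my SystemD resource instead of putting it into the stopped state?

On top of that - if I fix the configuration file and run systemctl restart caddy, Caddy starts up, but Pacemaker still keeps the resource in the stopped state.

And on top of that - if I stop the other node, the virtual IP is not assigned anywhere, because of this constraint: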

# Add a colocation constraint so the virtual IP is assigned _on the same_ node where the application is running.
pcs constraint colocation add ClusterIP with caddy-clone INFINITY

Can anyone point me in the right direction as to what I am doing wrong?

Answer 1

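In Pacemaker, certain failures are considered fatal, and once they occur they require manual cleanup (unless you have node-level fencing configured, which cleans them up for you by fencing the failed node).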
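As a rough sketch of what that manual cleanup looks like, assuming the resource names caddy and caddy-clone from the question, the recorded failures can be inspected and then cleared with pcs. The run helper and the APPLY variable are hypothetical conveniences of this example: they only print each command unless APPLY=1 is set, so the script can be read through safely before it touches a cluster.

```shell
#!/bin/sh
# Sketch: inspect and clear recorded failures for the caddy clone.
# caddy / caddy-clone come from the question; run and APPLY are
# hypothetical helpers introduced only for this example.

# Print each command; execute it only when APPLY=1 is set.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

# Show how many failures Pacemaker has recorded for the resource.
run pcs resource failcount show caddy

# Make the cluster forget the failed operations and re-detect the
# resource's current state.
run pcs resource cleanup caddy-clone
```

After the cleanup Pacemaker re-probes the resource, so a Caddy instance that was already restarted by hand with a fixed config should be picked up as running again.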

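You need to tell Pacemaker that start operation failures are not fatal. I usually also set a failure timeout, which automatically clears operation failures after a number of seconds in clusters without fencing.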

pcs property set start-failure-is-fatal=false
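# failure-timeout is a resource meta attribute rather than a cluster property,
# so set it as a resource default (newer pcs: pcs resource defaults update failure-timeout=300)
pcs resource defaults failure-timeout=300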
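If changing behaviour for every resource is not desirable, the failure expiry can probably be scoped to just the Caddy resource instead, since failure-timeout is a resource meta attribute. This is a minimal sketch, assuming the resource name caddy from the question; run and APPLY are hypothetical helpers that only print the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch: scope the failure expiry to the caddy resource only.
# caddy comes from the question; run and APPLY are hypothetical
# helpers introduced only for this example.

# Print each command; execute it only when APPLY=1 is set.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

# Expire recorded failures of this one resource after 300 seconds.
run pcs resource update caddy meta failure-timeout=300

# Confirm the cluster-wide property from the answer is in effect.
run crm_attribute --type crm_config --name start-failure-is-fatal --query
```

With start-failure-is-fatal=false, start failures count against the resource's migration-threshold, which defaults to INFINITY, so Pacemaker keeps retrying the start on the same node instead of banning the resource from it.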
