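I upgraded my servers, then started the corosync service on them one by one. I first started it on 3 servers and waited 5 minutes. Then I started corosync on the next 4 servers, and all 7 servers crashed at the same time. I have been using corosync for 5 years. Before the upgrade I was using: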
Kernel: 4.14.32-1-lts
Corosync 2.4.2-1
Pacemaker 1.1.18-1
I have never seen this before. I think something is badly broken in the new corosync version! After the upgrade I am using:
Kernel: 4.14.70-1-lts
Corosync 2.4.4-3
Pacemaker 2.0.0-1
-
Here is my corosync.conf: https://paste.ubuntu.com/p/7KCq8pHKn3/ Can you tell me how to find the cause of the problem?
Sep 25 08:56:03 SRV-2 corosync[29089]: [TOTEM ] A new membership (10.10.112.10:56) was formed. Members joined: 7
Sep 25 08:56:03 SRV-2 corosync[29089]: [VOTEQ ] Waiting for all cluster members. Current votes: 7 expected_votes: 28
Sep 25 08:56:03 SRV-2 corosync[29089]: [VOTEQ ] Waiting for all cluster members. Current votes: 7 expected_votes: 28
Sep 25 08:56:03 SRV-2 corosync[29089]: [VOTEQ ] Waiting for all cluster members. Current votes: 7 expected_votes: 28
Sep 25 08:56:03 SRV-2 corosync[29089]: [VOTEQ ] Waiting for all cluster members. Current votes: 7 expected_votes: 28
Sep 25 08:56:03 SRV-2 corosync[29089]: [QUORUM] Members[7]: 1 2 3 4 5 6 7
Sep 25 08:56:03 SRV-2 corosync[29089]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 25 08:56:03 SRV-2 corosync[29089]: [VOTEQ ] Waiting for all cluster members. Current votes: 7 expected_votes: 28
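The repeated VOTEQ notices show that only 7 of the 28 expected votes were present when the crash happened. As a quick sanity check, here is a minimal Python sketch that pulls the two numbers out of such a line and compares them with a simple-majority threshold (expected_votes // 2 + 1). The majority rule is an assumption about votequorum's default behaviour; corosync.conf options such as wait_for_all or last_man_standing can change when quorum is actually granted. The function names are illustrative and not part of any corosync tool.

```python
import re

# Matches the votequorum notice seen in the log above.
VOTEQ_RE = re.compile(r"Current votes: (\d+) expected_votes: (\d+)")


def parse_votes(line):
    """Return (current, expected) from a VOTEQ log line, or None if absent."""
    m = VOTEQ_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def majority_threshold(expected_votes):
    """Simple-majority quorum: more than half of the expected votes (assumed rule)."""
    return expected_votes // 2 + 1


line = ("Sep 25 08:56:03 SRV-2 corosync[29089]: [VOTEQ ] Waiting for all cluster "
        "members. Current votes: 7 expected_votes: 28")
current, expected = parse_votes(line)
needed = majority_threshold(expected)
print(f"current={current} expected={expected} needed={needed} quorate={current >= needed}")
# prints: current=7 expected=28 needed=15 quorate=False
```

So under that assumption the partition of 7 nodes was still 8 votes short of quorum; this explains the notices but not, by itself, the segfault.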
Sep 25 08:56:03 SRV-2 systemd[1]: Created slice system-systemd\x2dcoredump.slice.
Sep 25 08:56:03 SRV-2 systemd[1]: Started Process Core Dump (PID 43798/UID 0).
Sep 25 08:56:03 SRV-2 systemd[1]: corosync.service: Main process exited, code=dumped, status=11/SEGV
Sep 25 08:56:03 SRV-2 systemd[1]: corosync.service: Failed with result 'core-dump'.
Sep 25 08:56:03 SRV-2 kernel: watchdog: watchdog0: watchdog did not stop!
Sep 25 08:56:03 SRV-2 systemd-coredump[43799]: Process 29089 (corosync) of user 0 dumped core.
Stack trace of thread 29089:
#0 0x0000000000000000 n/a (n/a)
Write failed: Broken pipe
coredumpctl info
PID: 23658 (corosync)
UID: 0 (root)
GID: 0 (root)
Signal: 11 (SEGV)
Timestamp: Mon 2018-09-24 09:50:58 +03 (1 day 3h ago)
Command Line: corosync
Executable: /usr/bin/corosync
Control Group: /system.slice/corosync.service
Unit: corosync.service
Slice: system.slice
Boot ID: 79d67a83f83c4804be6ded8e6bd5f54d
Machine ID: 9b1ca27d3f4746c6bcfcdb93b83f3d45
Hostname: SRV-1
Storage: /var/lib/systemd/coredump/core.corosync.0.79d67a83f83c4804be6ded8e6bd5f54d.23658.153777185>
Message: Process 23658 (corosync) of user 0 dumped core.
Stack trace of thread 23658:
#0 0x0000000000000000 n/a (n/a)
PID: 5164 (corosync)
UID: 0 (root)
GID: 0 (root)
Signal: 11 (SEGV)
Timestamp: Tue 2018-09-25 08:56:03 +03 (4h 9min ago)
Command Line: corosync
Executable: /usr/bin/corosync
Control Group: /system.slice/corosync.service
Unit: corosync.service
Slice: system.slice
Boot ID: 2f49ec6cdcc144f0a8eb712bbfbd7203
Machine ID: 9b1ca27d3f4746c6bcfcdb93b83f3d45
Hostname: SRV-1
Storage: /var/lib/systemd/coredump/core.corosync.0.2f49ec6cdcc144f0a8eb712bbfbd7203.5164.1537854963>
Message: Process 5164 (corosync) of user 0 dumped core.
Stack trace of thread 5164:
#0 0x0000000000000000 n/a (n/a)
I can't find any more logs, so I can't dig any deeper into the problem.
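The coredumpctl output above does contain one clue: in every dump, frame #0 is at address 0x0000000000000000, so the process jumped to address zero. That usually points to a call through a NULL function pointer rather than an ordinary bad memory read. Below is a hypothetical Python helper (a sketch, not an existing tool) that splits coredumpctl info output into records and lists the PIDs with that pattern, which is handy when comparing dumps from many nodes. For a real backtrace you would still need to open a dump in gdb (for example with coredumpctl gdb <PID>) with corosync debug symbols installed.

```python
import re


def parse_coredumps(text):
    """Split `coredumpctl info` output into one dict per dump.

    A new record starts at each "PID:" line; "Key: value" lines are stored,
    and the first "#0 0x..." frame line is kept as 'frame0' (an int address).
    """
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("PID:"):
            records.append({})
        if not records:
            continue  # ignore anything before the first PID: line
        rec = records[-1]
        frame = re.match(r"#0 (0x[0-9a-fA-F]+)", line)
        if frame and "frame0" not in rec:
            rec["frame0"] = int(frame.group(1), 16)
        elif ": " in line:
            key, _, value = line.partition(": ")
            rec.setdefault(key, value)
    return records


def null_pc_crashes(records):
    """Return PIDs whose innermost frame is address 0 (likely a NULL function-pointer call)."""
    return [r["PID"].split()[0] for r in records if r.get("frame0") == 0]


sample = """PID: 23658 (corosync)
Signal: 11 (SEGV)
Hostname: SRV-1
Stack trace of thread 23658:
#0 0x0000000000000000 n/a (n/a)
PID: 5164 (corosync)
Signal: 11 (SEGV)
Hostname: SRV-1
Stack trace of thread 5164:
#0 0x0000000000000000 n/a (n/a)
"""
records = parse_coredumps(sample)
print(null_pc_crashes(records))  # ['23658', '5164']
```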
Answer 1
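Downgrading to corosync 2.4.2-1 solved the problem. Why are people down-voting this question? It is perfectly clear: as you can see, this is the fault of corosync or of the Arch package builders.
If you run into this problem, just downgrade and save yourself the time.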
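If you follow this advice across a larger cluster, it helps to know which nodes still run a version above the one reported as working. A minimal Python sketch, assuming Arch-style X.Y.Z-REL version strings with purely numeric parts; it is a simplified comparison, not a reimplementation of pacman's vercmp, and the inventory dict (including the SRV-3 entry) is hypothetical.

```python
import re


def parse_pkgver(ver):
    """Split an 'X.Y.Z-REL' string into a comparable (upstream_tuple, release) pair.

    Simplified: only numeric dot-separated components plus a numeric release.
    """
    m = re.fullmatch(r"(\d+(?:\.\d+)*)-(\d+)", ver)
    if m is None:
        raise ValueError(f"unsupported version string: {ver!r}")
    upstream = tuple(int(p) for p in m.group(1).split("."))
    return upstream, int(m.group(2))


KNOWN_GOOD = "2.4.2-1"  # version the answer reports as working


def newer_than_known_good(inventory):
    """Return hostnames whose corosync version sorts above KNOWN_GOOD."""
    good = parse_pkgver(KNOWN_GOOD)
    return sorted(h for h, v in inventory.items() if parse_pkgver(v) > good)


# Hypothetical inventory; in practice this would come from each node's package manager.
nodes = {"SRV-1": "2.4.4-3", "SRV-2": "2.4.4-3", "SRV-3": "2.4.2-1"}
print(newer_than_known_good(nodes))  # ['SRV-1', 'SRV-2']
```

Tuple comparison keeps the sketch short; it treats 2.4 as older than 2.4.0, which a real package manager may not, so use the distribution's own tooling for anything beyond a quick audit.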