Failover cluster unable to fail over due to a mysterious IP conflict?


I have a mysterious problem with my failover cluster:

Cluster name: PrintCluster01.domain.com
Members: PrintServer01.domain.com and PrintServer02.domain.com

In Failover Cluster Management, under Cluster Events, I am getting critical errors 1135 and 1177:

Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 15/06/2011 9:07:49 PM
Event ID: 1177
Task Category: None
Level: Critical
Keywords: 
User: SYSTEM
Computer: PrintServer01.domain.com
Description:
The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk. 
Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.


Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 15/06/2011 9:07:28 PM
Event ID: 1135
Task Category: None
Level: Critical
Keywords: 
User: SYSTEM
Computer: PrintServer01.domain.com
Description:
Cluster node 'PrintServer02' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

On further investigation, I found an interesting error here: the first critical message logged in Event Viewer on PrintServer02:

Log Name: System
Source: Tcpip
Date: 15/06/2011 9:07:29 PM
Event ID: 4199
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: PrintServer02-VM.domain.com
Description:
The system detected an address conflict for IP address 192.168.127.142 with the system having network hardware address 00-50-56-AE-29-23. Network operations on this system may be disrupted as a result.
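Lining the three events up by timestamp makes the sequence easier to read: PrintServer01 drops PrintServer02 from membership, one second later PrintServer02 reports the IP conflict, and about twenty seconds after that PrintServer01 loses quorum. The sketch below hard-codes the values copied from the logs above; the tuple layout is just an illustration, not an Event Viewer export format.

```python
from datetime import datetime

# Events copied by hand from the logs above: (date string, computer, event ID, source).
EVENTS = [
    ("15/06/2011 9:07:49 PM", "PrintServer01", 1177, "FailoverClustering"),
    ("15/06/2011 9:07:28 PM", "PrintServer01", 1135, "FailoverClustering"),
    ("15/06/2011 9:07:29 PM", "PrintServer02", 4199, "Tcpip"),
]

# The logs use a day/month/year date with a 12-hour clock.
FMT = "%d/%m/%Y %I:%M:%S %p"


def timeline(events):
    """Return (timestamp, computer, event_id, source) tuples sorted oldest first."""
    parsed = [(datetime.strptime(d, FMT), c, i, s) for d, c, i, s in events]
    return sorted(parsed)


if __name__ == "__main__":
    rows = timeline(EVENTS)
    start = rows[0][0]
    for ts, computer, event_id, source in rows:
        offset = (ts - start).total_seconds()
        print(f"+{offset:5.1f}s  {computer:<14} {source:<18} {event_id}")
```

Run as-is, it lists 1135, 4199 and 1177 at offsets of 0, 1 and 21 seconds.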

192.168.127.142 is one of the secondary IPs on PrintServer01, so how can it conflict with PrintServer01, one of the cluster's own nodes? Details below:

**From PrintServer01**
Ethernet adapter Local Area Connection* 8:

Connection-specific DNS Suffix . :
 Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
 Physical Address. . . . . . . . . : 02-50-56-AE-29-23
 DHCP Enabled. . . . . . . . . . . : No
 Autoconfiguration Enabled . . . . : Yes
 IPv4 Address. . . . . . . . . . . : 169.254.1.183(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.0.0
 Default Gateway . . . . . . . . . :
 NetBIOS over Tcpip. . . . . . . . : Enabled
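The MAC in event 4199 (00-50-56-AE-29-23) is not the virtual adapter's MAC shown here: the two differ only in bit 0x02 of the first octet, the IEEE 802 universal/local bit. It does match the "Cluster Public Network" adapter of PrintServer01 in the full ipconfig output further down, so the other party in the conflict was PrintServer01's physical public NIC. The small check below only confirms that arithmetic on addresses copied from this thread; it makes no claim about how the cluster driver actually derives its virtual MAC, and the helper names are hypothetical.

```python
def parse_mac(text):
    """Parse a MAC written as 00-50-56-AE-29-23 or 00:50:56:ae:29:23 into 6 bytes."""
    parts = text.replace(":", "-").split("-")
    if len(parts) != 6:
        raise ValueError(f"not a 48-bit MAC: {text!r}")
    return bytes(int(p, 16) for p in parts)


def is_locally_administered(mac):
    # Bit 0x02 of the first octet is the universal/local (U/L) bit:
    # set means a locally administered address, clear means an OUI-based one.
    return bool(mac[0] & 0x02)


def same_except_ul_bit(a, b):
    """True if the two MACs match once the U/L bit is ignored."""
    return (a[0] & 0xFD) == (b[0] & 0xFD) and a[1:] == b[1:]


virtual = parse_mac("02-50-56-AE-29-23")   # Cluster Virtual Adapter, PrintServer01
physical = parse_mac("00-50-56-AE-29-23")  # Cluster Public Network, PrintServer01
conflict = parse_mac("00-50-56-AE-29-23")  # address reported in Tcpip event 4199

print(is_locally_administered(virtual), is_locally_administered(physical))
print(same_except_ul_bit(virtual, physical), conflict == physical)
```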

I have double-checked that all IP addresses are now unique across all cluster members.

However, I am certain the IPs are static and not assigned by DHCP, as the IPCONFIG results below show:

From **PrintServer01** (the Active Node)
Windows IP Configuration

Host Name . . . . . . . . . . . . : PrintServer01
 Primary Dns Suffix . . . . . . . : domain.com
 Node Type . . . . . . . . . . . . : Hybrid
 IP Routing Enabled. . . . . . . . : No
 WINS Proxy Enabled. . . . . . . . : No
 DNS Suffix Search List. . . . . . : domain.com
 domain.com.au

Ethernet adapter Local Area Connection* 8:

Connection-specific DNS Suffix . :
 Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
 Physical Address. . . . . . . . . : 02-50-56-AE-29-23
 DHCP Enabled. . . . . . . . . . . : No
 Autoconfiguration Enabled . . . . : Yes
 IPv4 Address. . . . . . . . . . . : 169.254.1.183(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.0.0
 Default Gateway . . . . . . . . . :
 NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Cluster Public Network:

Connection-specific DNS Suffix . :
 Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection
 Physical Address. . . . . . . . . : 00-50-56-AE-29-23
 DHCP Enabled. . . . . . . . . . . : No
 Autoconfiguration Enabled . . . . : Yes
 IPv4 Address. . . . . . . . . . . : 192.168.127.155(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 IPv4 Address. . . . . . . . . . . : 192.168.127.88(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 IPv4 Address. . . . . . . . . . . : 192.168.127.142(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 IPv4 Address. . . . . . . . . . . : 192.168.127.143(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 IPv4 Address. . . . . . . . . . . : 192.168.127.144(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 Default Gateway . . . . . . . . . : 192.168.127.254
 DNS Servers . . . . . . . . . . . : 192.168.127.10
 192.168.127.11
 Primary WINS Server . . . . . . . : 192.168.127.10
 Secondary WINS Server . . . . . . : 192.168.127.11
 NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Cluster Private Network:

Connection-specific DNS Suffix . :
 Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection #2
 Physical Address. . . . . . . . . : 00-50-56-AE-43-EC
 DHCP Enabled. . . . . . . . . . . : No
 Autoconfiguration Enabled . . . . : Yes
 IPv4 Address. . . . . . . . . . . : 10.184.2.2(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 Default Gateway . . . . . . . . . :
 NetBIOS over Tcpip. . . . . . . . : Disabled


From **PrintServer02**
Windows IP Configuration

Host Name . . . . . . . . . . . . : PrintServer02
 Primary Dns Suffix . . . . . . . : domain.com
 Node Type . . . . . . . . . . . . : Hybrid
 IP Routing Enabled. . . . . . . . : No
 WINS Proxy Enabled. . . . . . . . : No
 DNS Suffix Search List. . . . . . : domain.com
 domain.com.au

Ethernet adapter Local Area Connection* 8:

Connection-specific DNS Suffix . :
 Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
 Physical Address. . . . . . . . . : 02-50-56-AE-5F-E5
 DHCP Enabled. . . . . . . . . . . : No
 Autoconfiguration Enabled . . . . : Yes
 IPv4 Address. . . . . . . . . . . : 169.254.2.86(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.0.0
 Default Gateway . . . . . . . . . :
 NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Cluster Public Network:

Connection-specific DNS Suffix . :
 Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection
 Physical Address. . . . . . . . . : 00-50-56-AE-79-FA
 DHCP Enabled. . . . . . . . . . . : No
 Autoconfiguration Enabled . . . . : Yes
 IPv4 Address. . . . . . . . . . . : 192.168.127.172(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 IPv4 Address. . . . . . . . . . . : 192.168.127.119(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 Default Gateway . . . . . . . . . : 192.168.127.254
 DNS Servers . . . . . . . . . . . : 192.168.127.10
 192.168.127.11
 Primary WINS Server . . . . . . . : 192.168.127.11
 Secondary WINS Server . . . . . . : 192.168.127.10
 NetBIOS over Tcpip. . . . . . . . : Enabled

Ethernet adapter Cluster Private Network:

Connection-specific DNS Suffix . :
 Description . . . . . . . . . . . : Intel® PRO/1000 MT Network Connection #2
 Physical Address. . . . . . . . . : 00-50-56-AE-77-8D
 DHCP Enabled. . . . . . . . . . . : No
 Autoconfiguration Enabled . . . . : Yes
 IPv4 Address. . . . . . . . . . . : 10.184.2.3(Preferred)
 Subnet Mask . . . . . . . . . . . : 255.255.255.0
 Default Gateway . . . . . . . . . :
 NetBIOS over Tcpip. . . . . . . . : Disabled
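To repeat the uniqueness check mechanically, the IPv4 addresses from both ipconfig listings can be put into a dictionary and cross-checked. This is a minimal sketch using only the values transcribed above; `find_duplicates` is a hypothetical helper. A snapshot like this only proves the static configuration is clean; it cannot catch a transient state in which both nodes briefly hold the same clustered IP during a membership change.

```python
from collections import defaultdict

# IPv4 addresses transcribed from the ipconfig /all output above.
NODE_IPS = {
    "PrintServer01": [
        "169.254.1.183",                      # Cluster Virtual Adapter
        "192.168.127.155", "192.168.127.88",  # Cluster Public Network
        "192.168.127.142", "192.168.127.143", "192.168.127.144",
        "10.184.2.2",                         # Cluster Private Network
    ],
    "PrintServer02": [
        "169.254.2.86",                       # Cluster Virtual Adapter
        "192.168.127.172", "192.168.127.119", # Cluster Public Network
        "10.184.2.3",                         # Cluster Private Network
    ],
}


def find_duplicates(node_ips):
    """Map each IP configured on more than one node to the sorted list of those nodes."""
    owners = defaultdict(set)
    for node, ips in node_ips.items():
        for ip in ips:
            owners[ip].add(node)
    return {ip: sorted(nodes) for ip, nodes in owners.items() if len(nodes) > 1}


if __name__ == "__main__":
    dups = find_duplicates(NODE_IPS)
    print(dups or "no address is configured on more than one node")
```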

Any help would be greatly appreciated.

Thanks, AWT

Answer 1

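The IP address conflict error occurs when more than one node in the cluster tries to bring a resource group (and its associated IPs) online at the same time.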

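This can happen if the cluster nodes temporarily lose contact with each other. Each node thinks the other has failed, so the "passive" node brings all the resource groups online while they are in fact still online on the "active" node.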
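Event 1177 in the question is the quorum side of the same story. As a simplified model (assumption: every node and the witness, if any, carry one vote; dynamic quorum and other features added in later Windows Server releases are ignored), a partition may keep running only if it can see a strict majority of all configured votes. In a two-node cluster without a witness, neither side reaches a majority when the nodes lose contact; with a witness, whichever side still reaches it keeps running. The sketch below just does that arithmetic; the function names are hypothetical.

```python
def has_quorum(reachable_votes, total_votes):
    """A partition keeps quorum only with a strict majority of all configured votes."""
    return reachable_votes > total_votes // 2


def after_partition(votes_a, votes_b, witness_side=None):
    """Return (side_a_keeps_quorum, side_b_keeps_quorum) after the nodes split.

    witness_side is "a", "b" or None (no witness configured); the witness
    contributes one extra vote to whichever side can still reach it.
    """
    total = votes_a + votes_b + (1 if witness_side else 0)
    a = votes_a + (1 if witness_side == "a" else 0)
    b = votes_b + (1 if witness_side == "b" else 0)
    return has_quorum(a, total), has_quorum(b, total)


# Two nodes with one vote each, as in PrintCluster01 (its witness setup is not stated).
print("no witness:  ", after_partition(1, 1))
print("witness on A:", after_partition(1, 1, "a"))
```

With no witness both sides lose quorum, which would be consistent with the "shutting down because quorum was lost" text of event 1177 if this cluster has no witness; with a witness only the side holding it survives.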

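I have seen this in our VMware environment when one of the ESX(i) hosts was overloaded. Sometimes even just an HBA bus rescan was enough for the MSCS nodes to briefly lose contact, and this mess followed.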

Answer 2

Use the script on this page to query the virtual machine MAC addresses:

http://www.virtuallyghetto.com/2011/05/how-to-query-for-macs-on-internal.html

Match the results against your offending MAC address and double-check that machine.
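The linked script's output format is not reproduced here, so the sketch below assumes you already have a VM-to-MAC listing from it (or from the vSphere client). The main trap is formatting: Windows prints 00-50-56-AE-29-23, while ESX(i) tools usually show 00:50:56:ae:29:23, so both sides are normalized before comparing. The inventory is hypothetical: it reuses the NIC MACs from the ipconfig output in the question and assumes the VM names match the host names. The 00:50:56 prefix is a VMware OUI, so the offending address belongs to a VMware virtual NIC.

```python
HEX = set("0123456789abcdef")


def normalize_mac(text):
    """Return a MAC as lowercase, colon-separated hex, e.g. 00:50:56:ae:29:23."""
    digits = "".join(ch for ch in text if ch.isalnum()).lower()
    if len(digits) != 12 or not set(digits) <= HEX:
        raise ValueError(f"not a 48-bit MAC: {text!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def find_owner(offending_mac, inventory):
    """Return the sorted names of all VMs in `inventory` (name -> MAC list) using the MAC."""
    target = normalize_mac(offending_mac)
    return sorted(
        vm for vm, macs in inventory.items()
        if target in {normalize_mac(m) for m in macs}
    )


# Hypothetical inventory; in practice it would come from the linked script.
INVENTORY = {
    "PrintServer01": ["00:50:56:ae:29:23", "00:50:56:ae:43:ec"],
    "PrintServer02": ["00:50:56:ae:79:fa", "00:50:56:ae:77:8d"],
}

print(find_owner("00-50-56-AE-29-23", INVENTORY))
```

More than one name in the result would point to a cloned VM or a manually assigned MAC clashing with another guest.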

Answer 3

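I think any logical service IP should have a /32 subnet mask. The network should be served by the physical IP, which should have a subnet mask matching the subnet in use.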
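To make the difference concrete, the sketch below uses Python's `ipaddress` module on addresses from PrintServer01: a /24 interface describes the whole 192.168.127.0 subnet, while a /32 describes a single host and nothing else. This only illustrates the answer's suggestion; the thread does not confirm that a Windows cluster IP Address resource accepts a /32 mask, so treat it as untested.

```python
import ipaddress

# Physical address of PrintServer01's public adapter, with the mask ipconfig shows.
physical = ipaddress.ip_interface("192.168.127.155/24")
# A clustered service IP written with the /32 mask this answer suggests.
service = ipaddress.ip_interface("192.168.127.142/32")

print(physical.network, physical.network.num_addresses)  # 192.168.127.0/24 256
print(service.network, service.network.num_addresses)    # 192.168.127.142/32 1
print(service.ip in physical.network)                    # True: still inside the physical subnet
```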

Answer 4

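I solved this by switching the IP to automatic assignment and then assigning it manually again. Doing so required removing the non-present devices, and that fixed the problem.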
