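I am migrating a kafka/zookeeper cluster from Windows to Debian wheezy.
- Java version: 1.7.0_80
- Debian version: 7.9
- Zookeeper version: 3.3.5+dfsg1-2 0
- Kafka version: 2.10-0.8.2.1
If I configure Zookeeper on the Debian servers using the IP addresses of the other Debian servers, everything works. If I use DNS names instead, leader election on the Debian servers fails.
On the Debian servers I can look up the IP of any other Debian server with the "host" command, so DNS resolution works.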
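One caveat about that check: `host` queries DNS servers directly and does not read /etc/hosts, so it can disagree with what an application sees through the system resolver. A minimal sketch for inspecting the system resolver's answer, assuming Python 3 is available on the box (the helper names `resolved_addresses` and `is_loopback` are made up for illustration):

```python
import ipaddress
import socket


def resolved_addresses(hostname):
    """Addresses the system resolver (getaddrinfo) returns, de-duplicated in order."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:  # includes socket.gaierror when the name does not resolve
        return []
    seen = []
    for info in infos:
        addr = info[4][0]
        if addr not in seen:
            seen.append(addr)
    return seen


def is_loopback(addr):
    """True for 127.0.0.0/8 and ::1 (an IPv6 scope suffix such as %eth0 is ignored)."""
    return ipaddress.ip_address(addr.split("%", 1)[0]).is_loopback


if __name__ == "__main__":
    # Check what this machine's own hostname resolves to.
    name = socket.gethostname()
    for addr in resolved_addresses(name):
        print(name, addr, "LOOPBACK" if is_loopback(addr) else "ok")
```

If the machine's own name comes back as a loopback address here while `host` reports the LAN address, the two lookups are using different sources.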
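Everything is automated: server creation, Debian installation, zookeeper installation, zookeeper configuration. The room for manual misconfiguration is therefore minimal, and the setup is easy to reproduce or change.
Using clientPortAddress=DNSNAME makes no difference; it still fails. Nothing is configured in iptables. There is no firewall between these servers.
Servers 1-3 are Windows 2012R2 servers and servers 4-6 are Debian servers.
This configuration works: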
server.1=testkafka400:2888:3888
server.2=testkafka401:2888:3888
server.3=testkafka402:2888:3888
server.4=10.1.132.152:2888:3888
server.5=10.1.132.153:2888:3888
server.6=10.1.132.154:2888:3888
This configuration does not work:
server.1=testkafka400:2888:3888
server.2=testkafka401:2888:3888
server.3=testkafka402:2888:3888
server.4=testkafka403:2888:3888
server.5=testkafka404:2888:3888
server.6=testkafka405:2888:3888
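For reference, each `server.N` line has the form `host:port:port`, where the first port is used by followers to connect to the leader and the second is used for leader election. A small sketch that parses such lines and reports which entries are host names rather than IP literals; the function names are hypothetical, and the pattern assumes IPv4 literals or plain names only:

```python
import ipaddress
import re

# server.<id>=<host>:<quorum port>:<election port>
SERVER_RE = re.compile(r"^server\.(\d+)=([^:\s]+):(\d+):(\d+)$")


def parse_servers(config_text):
    """Map server id -> (host, quorum_port, election_port)."""
    servers = {}
    for line in config_text.splitlines():
        m = SERVER_RE.match(line.strip())
        if m:
            servers[int(m.group(1))] = (m.group(2), int(m.group(3)), int(m.group(4)))
    return servers


def is_ip_literal(host):
    """True if host is an IP address rather than a name that needs resolving."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


CONFIG = """\
server.3=testkafka402:2888:3888
server.4=10.1.132.152:2888:3888
"""

for sid, (host, _quorum, election) in sorted(parse_servers(CONFIG).items()):
    kind = "ip" if is_ip_literal(host) else "name"
    print(sid, host, election, kind)
```

Entries reported as `name` are the ones whose behaviour depends on how each node resolves them.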
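When I use DNS names I get the following output, with the exceptions repeating. Note that the log below comes from a cluster made up of only Debian servers, using DNS names, which I set up for testing. If I switch to IPs, the cluster works and the election succeeds.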
[2015-11-03 13:55:52,309] INFO Reading configuration from: /etc/zookeeper/config/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
[2015-11-03 13:55:52,322] INFO Defaulting to majority quorums (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
[2015-11-03 13:55:52,344] INFO autopurge.snapRetainCount set to 3 (org.apache.zookeeper.server.DatadirCleanupManager)
[2015-11-03 13:55:52,344] INFO autopurge.purgeInterval set to 24 (org.apache.zookeeper.server.DatadirCleanupManager)
[2015-11-03 13:55:52,345] INFO Purge task started. (org.apache.zookeeper.server.DatadirCleanupManager)
[2015-11-03 13:55:52,454] INFO Purge task completed. (org.apache.zookeeper.server.DatadirCleanupManager)
[2015-11-03 13:55:52,472] INFO Starting quorum peer (org.apache.zookeeper.server.quorum.QuorumPeerMain)
[2015-11-03 13:55:52,581] INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zookeeper.server.NIOServerCnxnFactory)
[2015-11-03 13:55:52,601] INFO tickTime set to 3000 (org.apache.zookeeper.server.quorum.QuorumPeer)
[2015-11-03 13:55:52,601] INFO minSessionTimeout set to -1 (org.apache.zookeeper.server.quorum.QuorumPeer)
[2015-11-03 13:55:52,601] INFO maxSessionTimeout set to -1 (org.apache.zookeeper.server.quorum.QuorumPeer)
[2015-11-03 13:55:52,601] INFO initLimit set to 20 (org.apache.zookeeper.server.quorum.QuorumPeer)
[2015-11-03 13:55:52,626] INFO Reading snapshot /etc/zookeeper/data/version-2/snapshot.0 (org.apache.zookeeper.server.persistence.FileSnap)
[2015-11-03 13:55:52,675] INFO My election bind port: testkafka403.prod.local/127.0.1.1:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
[2015-11-03 13:55:52,713] INFO LOOKING (org.apache.zookeeper.server.quorum.QuorumPeer)
[2015-11-03 13:55:52,715] INFO New election. My id = 4, proposed zxid=0x100000014 (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2015-11-03 13:55:52,717] INFO Notification: 1 (message format version), 4 (n.leader), 0x100000014 (n.zxid), 0x1 (n.round), LOOKING (n.state), 4 (n.sid), 0x1 (n.peerEpoch) LOOKING (my state) (org.apache.zookeeper.server.quorum.FastLeaderElection)
[2015-11-03 13:55:52,732] WARN Cannot open channel to 5 at election address testkafka404.prod.local/10.1.132.153:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
java.net.SocketTimeoutException
at java.net.SocksSocketImpl.remainingMillis(SocksSocketImpl.java:111)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
at java.lang.Thread.run(Thread.java:745)
[2015-11-03 13:55:52,737] WARN Cannot open channel to 6 at election address testkafka405.prod.local/10.1.132.154:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
at java.lang.Thread.run(Thread.java:745)
[2015-11-03 13:55:52,919] WARN Cannot open channel to 6 at election address testkafka405.prod.local/10.1.132.154:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
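We really want to be able to use DNS names, but have no idea where to start looking for a solution. Perhaps we missed installing or activating an important Debian or Java feature?
Answer 1
OK, I know what is going on here. I saw the same problem when I tried to set up a 3-node Spring-XD cluster in Vagrant on Linux VMs.
This configuration works: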
server.1=172.28.128.3:2888:3888
server.2=172.28.128.4:2888:3888
server.3=172.28.128.7:2888:3888
But this one does not:
server.1=spring-xd-1:2888:3888
server.2=spring-xd-2:2888:3888
server.3=spring-xd-3:2888:3888
The "smoking gun" was this line in my zookeeper log:
2015-11-26 20:48:31,439 [myid:1] - INFO [Thread-2:QuorumCnxManager$Listener@504] - My election bind port: spring-xd-1/127.0.0.1:3888
So why is Zookeeper binding the election port on the loopback interface? Hmm...
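The log in the question shows the same symptom (`My election bind port: testkafka403.prod.local/127.0.1.1:3888`). As a rough sketch, a log line of this shape can be checked mechanically. The function name `election_bind` is hypothetical, and the pattern assumes the `host/IPv4:port` layout seen in these logs, which may differ in other Zookeeper versions:

```python
import ipaddress
import re

# Matches e.g. "My election bind port: spring-xd-1/127.0.0.1:3888"
BIND_RE = re.compile(
    r"My election bind port: (?P<host>[^/\s]*)/(?P<ip>[0-9.]+):(?P<port>\d+)"
)


def election_bind(line):
    """Return (host, ip, port, is_loopback), or None if the line does not match."""
    m = BIND_RE.search(line)
    if m is None:
        return None
    ip = ipaddress.ip_address(m.group("ip"))
    return m.group("host"), str(ip), int(m.group("port")), ip.is_loopback


line = (
    "[2015-11-03 13:55:52,675] INFO My election bind port: "
    "testkafka403.prod.local/127.0.1.1:3888 "
    "(org.apache.zookeeper.server.quorum.QuorumCnxManager)"
)
print(election_bind(line))
# prints ('testkafka403.prod.local', '127.0.1.1', 3888, True)
```

A `True` in the last position means the node is listening for election traffic on a loopback address, where the other nodes cannot reach it.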
My /etc/hosts on one of the VMs looked like this:
127.0.0.1 spring-xd-1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
## vagrant-hostmanager-start
172.28.128.3 spring-xd-1
172.28.128.4 spring-xd-2
172.28.128.7 spring-xd-3
## vagrant-hostmanager-end
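Note that spring-xd-1 appears twice in this file: once on the 127.0.0.1 line and once next to its real address. A minimal sketch for spotting that kind of entry, scanning hosts-file text for names on a loopback line that are not localhost aliases (the function name and the alias prefixes are assumptions made for illustration):

```python
import ipaddress

# Name prefixes that are expected on a loopback line (assumption for this sketch).
LOCAL_PREFIXES = ("localhost", "ip6-localhost", "ip6-loopback")


def loopback_hostnames(hosts_text):
    """Return (address, name) pairs for non-localhost names mapped to loopback."""
    found = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IP address
        if not addr.is_loopback:
            continue
        for name in fields[1:]:
            if not name.startswith(LOCAL_PREFIXES):
                found.append((fields[0], name))
    return found


HOSTS = """\
127.0.0.1 spring-xd-1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
## vagrant-hostmanager-start
172.28.128.3 spring-xd-1
172.28.128.4 spring-xd-2
172.28.128.7 spring-xd-3
## vagrant-hostmanager-end
"""
print(loopback_hostnames(HOSTS))
# prints [('127.0.0.1', 'spring-xd-1')]
```

On a real machine the same function could be fed the contents of /etc/hosts; an empty list means no machine name is pinned to loopback.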
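I removed the hostname from the 127.0.0.1 line in /etc/hosts, bounced the Zookeeper service on all 3 nodes, and bam! Everything came up roses. So now the hosts file on each machine looks like this: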
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
## vagrant-hostmanager-start
172.28.128.3 spring-xd-1
172.28.128.4 spring-xd-2
172.28.128.7 spring-xd-3
## vagrant-hostmanager-end
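I'm guessing you did not see this problem on Windows because the hosts file (C:\Windows\System32\drivers\etc\hosts) has no entries by default. You should be able to reproduce the problem on Windows by adding a similar 127.0.0.1 line.
I'd call this a Zookeeper bug. Editing the hosts file was enough to prove the problem and to fix it in Vagrant, but I wouldn't recommend it for any "real" environment.
Edit: According to http://ccl.cse.nd.edu/operations/condor/hostname.shtml, this seems to be a fairly common problem for clustered applications on Linux, and the recommendation there is to edit the hosts file as I described above. However, the Zookeeper documentation on clustered setup does not mention it.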