我有一个正在运行的 HBase 集群,我正在尝试向该集群添加一些新服务器,但是“SocketException:参数无效”和“FailedServerException:该服务器位于失败的服务器列表中”日志中不断产生错误。
2014-07-02 22:28:01,140 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to master. Retrying. Error was:
java.net.SocketException: Invalid argument
at sun.nio.ch.Net.connect(Native Method)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:534)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:193)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:528)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:492)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:392)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:438)
at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1141)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:988)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:87)
at com.sun.proxy.$Proxy10.getProtocolVersion(Unknown Source)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:141)
at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:2040)
at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2086)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:748)
at java.lang.Thread.run(Thread.java:701)
2014-07-02 22:28:31,764 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to master. Retrying. Error was:
org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: <MY_MASTER_SERVER>/<MY_MASTER_NAME>:<MY_MASTER_PORT>
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:427)
at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1141)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:988)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:87)
at com.sun.proxy.$Proxy10.getProtocolVersion(Unknown Source)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:141)
at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:2040)
at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2086)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:748)
at java.lang.Thread.run(Thread.java:701)
到目前为止我找不到新旧服务器之间的任何区别:
- 两者都运行 Ubuntu 12.04 并安装所有最新更新,以及 Cloudera 的 CDH4 for HBase
- /etc/hosts 中也没有主 HBase 的条目(尽管我尝试在新服务器上添加一个,但仍然有同样的问题)
- 防火墙应配置相同,没有任何本地网络限制(注意:在新服务器上,我可以通过 telnet 连接到端口 60000,即我的 HBase Master 端口,没有任何错误)
在调试这个问题时,我看到网上提到 IPv6 配置可能会导致问题,但据我所知,新旧服务器都有 Ubuntu 为此使用的默认配置。
关于如何进一步调试和/或可能存在什么问题,您有什么想法吗?