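We have a 4-node Riak installation. The nodes run on servers with Ubuntu 12.04 LTS (Precise). We installed 1.1.4 on August 1, 2012 and upgraded to 1.2.0 when it became available.
The servers are:
f1 - 10.10.0.12 - This is the first server we installed. We joined the other servers to this one. It also serves Riak Control.
s2 - 10.10.0.22
s3 - 10.10.0.23
s4 - 10.10.0.24 - This server also serves Riak Control.
This morning we saw "insufficient nodes available" errors in our application logs and restarted all nodes. Three of the nodes came back up; "f1" did not.
Update: While I was writing this message, the 3 nodes became unavailable and Riak had to be restarted on them.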
wolfiem@f01:~$ sudo /etc/init.d/riak start
Riak failed to start within 15 seconds,
see the output of 'riak console' for more information.
If you want to wait longer, set the environment variable
WAIT_FOR_ERLANG to the number of seconds to wait.
I tried setting WAIT_FOR_ERLANG to 60 seconds, but it did not work.
Adding this line to vm.args had no effect:
-env WAIT_FOR_ERLANG 60
I also tried setting it from the terminal, but that did not help either:
wolfiem@f01:~$ export WAIT_FOR_ERLANG=60
It still says "Riak failed to start within 15 seconds".
Here is the console.log output:
2012-09-11 10:58:02.532 [info] <0.7.0> Application lager started on node '[email protected]'
2012-09-11 10:58:02.560 [warning] <0.148.0>@riak_core_ring_manager:reload_ring:231 No ring file available.
2012-09-11 10:58:02.585 [error] <0.164.0> CRASH REPORT Process <0.164.0> with 0 neighbours exited with reason: eaddrnotavail in gen_server:init_it/6 line 320
Here is the error.log output:
2012-09-11 10:58:02.585 [error] <0.164.0> CRASH REPORT Process <0.164.0> with 0 neighbours exited with reason: eaddrnotavail in gen_server:init_it/6 line 320
Here is the crash.log output:
2012-09-11 10:58:02 =CRASH REPORT====
crasher:
initial call: mochiweb_socket_server:init/1
pid: <0.164.0>
registered_name: []
exception exit: {eaddrnotavail,[{gen_server,init_it,6,[{file,"gen_server.erl"},{line,320}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
ancestors: [riak_core_sup,<0.135.0>]
messages: []
links: [<0.136.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 377
stack_size: 24
reductions: 403
neighbours:
Here is the output of riak console:
wolfiem@f01:~$ riak console
Attempting to restart script through sudo -H -u riak
Exec: /usr/lib/riak/erts-5.9.1/bin/erlexec -boot /usr/lib/riak/releases/1.2.0/riak -embedded -config /etc/riak/app.config -pa /usr/lib/riak/basho-patches -args_file /etc/riak/vm.args -- console
Root: /usr/lib/riak
Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:8:8] [async-threads:64] [kernel-poll:true]
=INFO REPORT==== 11-Sep-2012::10:44:18 ===
alarm_handler: {set,{system_memory_high_watermark,[]}}
** /usr/lib/riak/lib/observer-1.1/ebin/etop_txt.beam hides /usr/lib/riak/lib/basho-patches/etop_txt.beam
** Found 1 name clashes in code paths
10:44:19.099 [info] Application lager started on node '[email protected]'
10:44:19.130 [warning] No ring file available.
10:44:19.158 [error] CRASH REPORT Process <0.164.0> with 0 neighbours exited with reason: eaddrnotavail in gen_server:init_it/6 line 320
/usr/lib/riak/lib/os_mon-2.2.9/priv/bin/memsup: Erlang has closed.
=INFO REPORT==== 11-Sep-2012::10:44:19 ===
alarm_handler: {clear,system_memory_high_watermark}
Erlang has closed
{"Kernel pid terminated",application_controller,"{application_start_failure,riak_core,{shutdown,{riak_core_app,start,[normal,[]]}}}"}
Crash dump was written to: /var/log/riak/erl_crash.dump
Kernel pid terminated (application_controller) ({application_start_failure,riak_core,{shutdown,{riak_core_app,start,[normal,[]]}}})
Answer 1
According to this post:
http://smartcloud.blogspot.hu/2013/01/setting-riak-cluster-in-amazon-ec2-just.html
the "with 0 neighbours exited with reason" error is caused (at least in part) by an already-running Riak instance that is holding a port or some other resource.
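One way to narrow this down is to try binding the listener address yourself while Riak is stopped. `eaddrnotavail` is the POSIX EADDRNOTAVAIL error, which usually means the IP address is not assigned to any local interface, whereas a port held by another process normally shows up as EADDRINUSE. The sketch below is only an illustration: `check_bind` is a hypothetical helper, and port 8098 (Riak's default HTTP port) is an assumption, because the app.config is not shown in the question.

```python
import errno
import socket


def check_bind(host, port):
    """Try to bind a TCP socket to (host, port).

    Returns None if the bind succeeds, otherwise the symbolic errno
    name (for example 'EADDRNOTAVAIL' or 'EADDRINUSE').
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        # Map the numeric errno to its name; fall back to the message.
        return errno.errorcode.get(exc.errno, str(exc))
    finally:
        sock.close()
    return None


if __name__ == "__main__":
    # Assumed values: f1's address from the question and Riak's
    # default HTTP port. Use whatever your app.config actually lists.
    result = check_bind("10.10.0.12", 8098)
    print("bind ok" if result is None else "bind failed: " + result)
```

If this reports EADDRNOTAVAIL, compare the IP addresses in app.config and vm.args with the ones actually configured on the host. If it reports EADDRINUSE, look for a leftover process holding the port.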
For me it was a running epmd instance, which I found with ps ax | grep riak. After I killed it, the problem went away.
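If you want to script that check, here is a minimal sketch that filters `ps ax` output for leftover Erlang-related processes. The helper name `matching_processes` and the keyword list (`riak`, `epmd`, `beam`) are my own assumptions, not something Riak provides.

```python
import subprocess


def matching_processes(ps_output, keywords=("riak", "epmd", "beam")):
    """Return (pid, command) pairs from `ps ax` output whose command
    contains any of the given keywords."""
    matches = []
    for line in ps_output.splitlines():
        # `ps ax` columns: PID TTY STAT TIME COMMAND
        fields = line.split(None, 4)
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # skips the header row and malformed lines
        pid, command = int(fields[0]), fields[4]
        if any(keyword in command for keyword in keywords):
            matches.append((pid, command))
    return matches


if __name__ == "__main__":
    output = subprocess.run(
        ["ps", "ax"], capture_output=True, text=True, check=True
    ).stdout
    for pid, command in matching_processes(output):
        print(pid, command)
```

Any PID it prints while Riak is supposed to be stopped is a candidate for the kind of stale process described above.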