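Under high I/O, DRBD crashes and brings the server down. Is there a way to tune DRBD to prevent this from happening again? My current configuration, the errors, and the hardware specs are listed below. Let me know if you need more information. Thanks in advance.
Latest DRBD configuration (identical on the secondary):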
[root@23 ~]# cat /etc/drbd.d/drbd0.res
resource drbd0 {
startup {
degr-wfc-timeout 30; # default is 2 minutes.
}
disk {
on-io-error detach;
fencing dont-care;
disk-barrier no;
disk-flushes no;
al-extents 3389;
}
net {
max-buffers 8000;
max-epoch-size 8000;
sndbuf-size 512k;
unplug-watermark 16;
after-sb-1pri discard-secondary;
}
on 23 {
device /dev/drbd0;
disk /dev/sdb1;
address 10.251.30.148:7789;
flexible-meta-disk internal;
}
on 23-t2 {
device /dev/drbd0;
disk /dev/sdb1;
address 10.48.25.66:7789;
flexible-meta-disk internal;
}
}
Errors after the crash:
"echo 0 > proc/sys/kernel/hung_task_timeout_secs" disables this message
INFO: task drbd_w_drbd1:2412 blocked for more than 120 seconds
"echo 0 > proc/sys/kernel/hung_task_timeout_secs" disables this message
INFO: task master:2506 blocked for more than 120 seconds
"echo 0 > proc/sys/kernel/hung_task_timeout_secs" disables this message
INFO: task java:2653 blocked for more than 120 seconds
"echo 0 > proc/sys/kernel/hung_task_timeout_secs" disables this message
INFO: task jbd2/drbd1-8:2234 blocked for more than 120 seconds
"echo 0 > proc/sys/kernel/hung_task_timeout_secs" disables this message
INFO: task cdpserver:2380 blocked for more than 120 seconds
"echo 0 > proc/sys/kernel/hung_task_timeout_secs" disables this message
INFO: task cdpserver:2396 blocked for more than 120 seconds
"echo 0 > proc/sys/kernel/hung_task_timeout_secs" disables this message
INFO: task cdpserver:2409 blocked for more than 120 seconds
"echo 0 > proc/sys/kernel/hung_task_timeout_secs" disables this message
INFO: task cdpserver:2416 blocked for more than 120 seconds
"echo 0 > proc/sys/kernel/hung_task_timeout_secs" disables this message
BUG: soft lockup - CPU#10 stuck for 67s! [scsi_eh_6:616]
BUG: soft lockup - CPU#10 stuck for 67s! [scsi_eh_6:616]
aacraid: aac_fib_send: first asynchronous command timed out.
Usually a result of a PCI interrupt routing problem;
update mother board BIOS or consider utilizing one of
the SAFE mode kernel options (acpi, apic etc)
Current setup:
CentOS release 6.3
2.6.32-279.5.2.el6.x86_64
drbd-8.4.1-1.el6.x86_64
2 x Xeon E5620
12 GB of memory
Adaptec 5805
/dev/drbd0 15T
/dev/drbd1 15T
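Answer 1
You haven't explained what "crash" means in this context. Judging by your "after the crash" messages, DRBD does appear to still be running. What does cat /proc/drbd say after the event? And what does ps -ef|grep -i [d]rbd show?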
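The two checks above can be wrapped in a small script so they are easy to run right after an incident. This is only a sketch: the function names drbd_state and drbd_procs and the optional file argument are made up for illustration, and it assumes the DRBD 8.x /proc/drbd layout, in which each device line starts with the minor number followed by cs:, ro: and ds: fields.

```shell
#!/bin/sh
# Hypothetical helpers for a post-incident snapshot (names are illustrative).

# Print one line per DRBD device with its connection state (cs),
# roles (ro) and disk states (ds).
# $1 is an optional path to a /proc/drbd-style file (defaults to /proc/drbd).
drbd_state() {
    status_file="${1:-/proc/drbd}"
    if [ ! -r "$status_file" ]; then
        echo "cannot read $status_file (is the drbd module loaded?)" >&2
        return 1
    fi
    # Device lines start with the minor number followed by a colon.
    awk '$1 ~ /^[0-9]+:$/ {
        line = $1
        for (i = 2; i <= NF; i++)
            if ($i ~ /^(cs|ro|ds):/) line = line " " $i
        print line
    }' "$status_file"
}

# List DRBD-related processes and kernel threads.
# The [d] in the pattern keeps grep from matching its own command line.
drbd_procs() {
    ps -ef | grep -i '[d]rbd'
}
```

Running drbd_state and then drbd_procs after the event shows whether the resources are still connected and whether the DRBD worker threads are still present.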
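In any case, it looks to me as though your disks and/or storage controller cannot keep up with the high I/O load, so the system (DRBD in particular) waits too long when flushing writes to disk. If that is the case, it is a problem with your hardware setup rather than with DRBD. To be sure, though, you may want to take this question to the DRBD mailing list.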
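One rough way to test the "backing storage is too slow" theory is to look at how long writes take on the backing device. The sketch below assumes the standard /proc/diskstats layout (field 3 is the device name, field 8 the number of completed writes, field 11 the milliseconds spent writing). The function name avg_write_ms and its optional file argument are hypothetical, and the figure is an average since boot, so it is only a coarse indicator. An interval-based tool such as iostat -x gives a better picture under load.

```shell
#!/bin/sh
# Hypothetical helper: average time per completed write, in milliseconds,
# for one block device, computed from /proc/diskstats counters since boot.
# $1 = device name as it appears in diskstats (for example sdb)
# $2 = optional path to a diskstats-style file (defaults to /proc/diskstats)
avg_write_ms() {
    dev="$1"
    stats_file="${2:-/proc/diskstats}"
    awk -v dev="$dev" '
        $3 == dev {
            found = 1
            # $8 = writes completed, $11 = milliseconds spent writing
            if ($8 > 0) printf "%.2f\n", $11 / $8
            else print "0.00"
        }
        END { if (!found) exit 1 }   # non-zero status if the device is absent
    ' "$stats_file"
}
```

For the setup in the question, avg_write_ms sdb would report on the disk behind /dev/sdb1. A value that climbs sharply during the high-I/O periods would be consistent with the controller or disks being the bottleneck.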