High I/O on DRBD disk drbd10 at the stacked site

We have 4 Red Hat boxes on Dell PowerEdge R630 hardware (call them a, b, c, d) with the following OS/packages.

RedHat EL 6.5, MySQL Enterprise 5.6, DRBD 8.4, Corosync 1.4.7

We have set up a 4-way stacked DRBD resource as follows:

Cluster-1: servers a and b, connected to each other over the local LAN. Cluster-2: servers c and d.

Cluster-1 and Cluster-2 are connected through stacked DRBD over virtual IPs and sit in different data centers.

A 1GB drbd0 disk has been created locally on each server and is further attached to drbd10.

The overall setup consists of 4 tiers: Tomcat front-end application -> RabbitMQ -> memcache -> MySQL/DRBD

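We are seeing very high disk I/O even though there is not much activity yet. Traffic will increase in a few weeks, so we are worried this will hurt performance badly. I/O utilization climbs (sometimes to 90% and above) only on the stacked site. The secondary site does not have this problem. Sometimes utilization is high even while the application is idle.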

So please share any advice or tuning guidelines that could help us resolve this.

resource clusterdb {
    protocol C;
    handlers {
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    }
    startup {
        degr-wfc-timeout 120; # 2 minutes.
        outdated-wfc-timeout 2; # 2 seconds.
    }
    disk {
        on-io-error detach;
        no-disk-barrier;
        no-md-flushes;
    }

    net {
        cram-hmac-alg "sha1";
        shared-secret "clusterdb";
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }

    syncer {
        rate 10M;
        al-extents 257;
        on-no-data-accessible io-error;
    }

    on sever-1 {
        device /dev/drbd0;
        disk /dev/sda2;
        address 10.170.26.28:7788;
        meta-disk internal;
    }
    on ever-2 {
        device /dev/drbd0;
        disk /dev/sda2;
        address 10.170.26.27:7788;
        meta-disk internal;
    }
}

Stacked configuration:-

resource clusterdb_stacked {
    protocol A;
    handlers {
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    }
    startup {
        degr-wfc-timeout 120; # 2 minutes.
        outdated-wfc-timeout 2; # 2 seconds.
    }
    disk {
        on-io-error detach;
        no-disk-barrier;
        no-md-flushes;
    }

    net {
        cram-hmac-alg "sha1";
        shared-secret "clusterdb";
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }

    syncer {
        rate 10M;
        al-extents 257;
        on-no-data-accessible io-error;
    }

    stacked-on-top-of clusterdb {
        device /dev/drbd10;
        address 10.170.26.28:7788;
    }
    stacked-on-top-of clusterdb_DR {
        device /dev/drbd10;
        address 10.170.26.60:7788;
    }
}

Data as requested:-

Date || svctm || w_wait || %util
10:32:01 3.07 55.23 94.11
10:33:01 3.29 50.75 96.27
10:34:01 2.82 41.44 96.15
10:35:01 3.01 72.30 96.86
10:36:01 4.52 40.41 94.24
10:37:01 3.80 50.42 83.86
10:38:01 3.03 72.54 97.17
10:39:01 4.96 37.08 89.45
10:41:01 3.55 66.48 70.19
10:45:01 2.91 53.70 89.57
10:46:01 2.98 49.49 94.73
10:55:01 3.01 48.38 93.70
10:56:01 2.98 43.47 97.26
11:05:01 2.80 61.84 86.93
11:06:01 2.67 43.35 96.89
11:07:01 2.68 37.67 95.41

Question updated based on the comments:-

The latency is indeed high compared to the local link.

Between the local servers

[root@pri-site-valsql-a]#ping pri-site-valsql-b
PING pri-site-valsql-b.csn.infra.sm (10.170.24.23) 56(84) bytes of data.
64 bytes from pri-site-valsql-b.csn.infra.sm (10.170.24.23): icmp_seq=1 ttl=64 time=0.143 ms
64 bytes from pri-site-valsql-b.csn.infra.sm (10.170.24.23): icmp_seq=2 ttl=64 time=0.145 ms
64 bytes from pri-site-valsql-b.csn.infra.sm (10.170.24.23): icmp_seq=3 ttl=64 time=0.132 ms
64 bytes from pri-site-valsql-b.csn.infra.sm (10.170.24.23): icmp_seq=4 ttl=64 time=0.145 ms
64 bytes from pri-site-valsql-b.csn.infra.sm (10.170.24.23): icmp_seq=5 ttl=64 time=0.150 ms
64 bytes from pri-site-valsql-b.csn.infra.sm (10.170.24.23): icmp_seq=6 ttl=64 time=0.145 ms
64 bytes from pri-site-valsql-b.csn.infra.sm (10.170.24.23): icmp_seq=7 ttl=64 time=0.132 ms
64 bytes from pri-site-valsql-b.csn.infra.sm (10.170.24.23): icmp_seq=8 ttl=64 time=0.127 ms
64 bytes from pri-site-valsql-b.csn.infra.sm (10.170.24.23): icmp_seq=9 ttl=64 time=0.134 ms
64 bytes from pri-site-valsql-b.csn.infra.sm (10.170.24.23): icmp_seq=10 ttl=64 time=0.149 ms
64 bytes from pri-site-valsql-b.csn.infra.sm (10.170.24.23): icmp_seq=11 ttl=64 time=0.147 ms
^C
--- pri-site-valsql-b.csn.infra.sm ping statistics ---
11 packets transmitted, 11 received, 0% packet loss, time 10323ms
rtt min/avg/max/mdev = 0.127/0.140/0.150/0.016 ms

Between the two stacked servers

[root@pri-site-valsql-a]#ping dr-site-valsql-b
PING dr-site-valsql-b.csn.infra.sm (10.170.24.48) 56(84) bytes of data.
64 bytes from dr-site-valsql-b.csn.infra.sm (10.170.24.48): icmp_seq=1 ttl=64 time=9.68 ms
64 bytes from dr-site-valsql-b.csn.infra.sm (10.170.24.48): icmp_seq=2 ttl=64 time=4.51 ms
64 bytes from dr-site-valsql-b.csn.infra.sm (10.170.24.48): icmp_seq=3 ttl=64 time=4.53 ms
64 bytes from dr-site-valsql-b.csn.infra.sm (10.170.24.48): icmp_seq=4 ttl=64 time=4.51 ms
64 bytes from dr-site-valsql-b.csn.infra.sm (10.170.24.48): icmp_seq=5 ttl=64 time=4.51 ms
64 bytes from dr-site-valsql-b.csn.infra.sm (10.170.24.48): icmp_seq=6 ttl=64 time=4.52 ms
64 bytes from dr-site-valsql-b.csn.infra.sm (10.170.24.48): icmp_seq=7 ttl=64 time=4.52 ms
^C
--- dr-site-valsql-b.csn.infra.sm ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 6654ms
rtt min/avg/max/mdev = 4.510/5.258/9.686/1.808 ms
[root@pri-site-valsql-a]#

Output showing the high I/O:-

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
drbd0             0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.06    0.00    0.00   99.94

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
drbd0             0.00     0.00    0.00    2.00     0.00    16.00     8.00     0.90    1.50 452.25  90.45

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.25    0.00    0.13    0.50    0.00   99.12

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
drbd0             0.00     0.00    1.00   44.00     8.00   352.00     8.00     1.07    2.90  18.48  83.15

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.13    0.00    0.06    0.25    0.00   99.56

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
drbd0             0.00     0.00    0.00   31.00     0.00   248.00     8.00     1.01    2.42  27.00  83.70

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.19    0.00    0.06    0.00    0.00   99.75

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
drbd0             0.00     0.00    0.00    2.00     0.00    16.00     8.00     0.32    1.50 162.25  32.45

I edited the configuration file as follows, but still no luck.

disk {
    on-io-error detach;
    no-disk-barrier;
    no-disk-flushes;
    no-md-flushes;
    c-plan-ahead 0;
    c-fill-target 24M;
    c-min-rate 80M;
    c-max-rate 300M;
    al-extents 3833;
}

net {
    cram-hmac-alg "sha1";
    shared-secret "clusterdb";
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
    max-epoch-size 20000;
    max-buffers 20000;
    unplug-watermark 16;
}

syncer {
    rate 100M;
    on-no-data-accessible io-error;
}

Answer 1

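I don't see the stacked resource in your configuration. You also haven't mentioned any version numbers, but seeing al-extents set so low makes me think you are either running something ancient (8.3.x) or following some very old instructions.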

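Anyway, assuming you are using protocol A (asynchronous) to replicate the stacked device, you will still fill up your TCP send buffer quickly when I/O spikes, and then hit I/O wait while the buffer flushes. DRBD has to put its replicated writes somewhere, and it can only have so many unacknowledged replicated writes in flight.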
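To make the buffer argument concrete, here is a minimal back-of-the-envelope sketch in Python. Every number in it is hypothetical (the buffer size, burst rate, and inter-site throughput are not taken from the question), and `seconds_until_full` is just an illustrative helper, not anything DRBD provides.

```python
def seconds_until_full(buffer_bytes, write_rate, drain_rate):
    """Seconds until a buffer of buffer_bytes fills when data arrives at
    write_rate and drains at drain_rate (both in bytes per second).

    Returns None when the drain keeps up and the buffer never fills.
    """
    if write_rate <= drain_rate:
        return None
    return buffer_bytes / (write_rate - drain_rate)


MiB = 1024 * 1024

# Hypothetical figures: a 4 MiB send buffer, a 50 MiB/s write burst,
# and an inter-site link that sustains 10 MiB/s.
t = seconds_until_full(4 * MiB, 50 * MiB, 10 * MiB)
print(f"buffer full after about {t:.2f} s")
```

With these made-up figures the buffer lasts roughly a tenth of a second, after which further writes have to wait for the link to drain. Plug in your own measured values to see how much headroom you really have.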

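I/O wait contributes to system load. If you temporarily disconnect the stacked resource, does the system load settle down? That would be one way to verify that this is the problem. You could also look at your TCP buffers with a tool such as netstat or ss to see how full they are when the load is high.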
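One possible way to watch the buffers is sketched below in Python. It assumes `ss -tn` prints the usual `State Recv-Q Send-Q Local-Address:Port Peer-Address:Port` columns and that replication runs on port 7788, as in the resource definitions in the question. The function names are my own, so treat it as a starting point rather than a finished tool.

```python
import subprocess

DRBD_PORT = "7788"  # port used by the resources in the question


def parse_queues(ss_output, port=DRBD_PORT):
    """Return (local, peer, recv_q, send_q) for TCP sockets using `port`.

    Expects the text printed by `ss -tn`; the header line and any
    malformed lines are skipped.
    """
    rows = []
    for line in ss_output.splitlines():
        fields = line.split()
        # A socket line has at least: State Recv-Q Send-Q Local Peer
        if len(fields) < 5:
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        local, peer = fields[3], fields[4]
        if port in (local.rsplit(":", 1)[-1], peer.rsplit(":", 1)[-1]):
            rows.append((local, peer, int(fields[1]), int(fields[2])))
    return rows


def current_queues():
    """Run `ss -tn` on this host and parse the result."""
    result = subprocess.run(
        ["ss", "-tn"], capture_output=True, text=True, check=True
    )
    return parse_queues(result.stdout)
```

Calling `current_queues()` every second or so while %util is high shows whether Send-Q on the connection to the remote site stays large. A persistently large Send-Q there would be consistent with the send buffer being the bottleneck.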

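Unless the connection between your sites has amazing latency and throughput (dark fiber or similar), you probably need or want to look into LINBIT's DRBD Proxy; it lets you use system memory to buffer writes.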
