Why are random writes to an MD RAID 10 so slow (12.5 MB/s)?

I just ran fio against a 6-disk MD RAID 10 built from SATA 2 drives (WDC_WD5003AZEX; 500 GB each) on an OpenSuSE 15.4 machine with an Intel(R) Xeon(R) CPU E3-1231 v3 @ 3.40GHz and 8 GB RAM. The random read results look fine, but random writes are astonishingly slow at only 12.5 MB/s, while random reads reach 231 MB/s. Here are the full results:

jacek@valen:~/bin> fio --rw=randwrite --name=IOPS-write --bs=4k --direct=1 --filename=TESTDATEI --numjobs=4 --ioengine=libaio --iodepth=32 --refill_buffers --group_reporting --runtime=60 --time_based --size=50M
IOPS-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
...
fio-3.23
Starting 4 processes
IOPS-write: Laying out IO file (1 file / 50MiB)
Jobs: 4 (f=4): [w(4)][100.0%][w=9.89MiB/s][w=2531 IOPS][eta 00m:00s]
IOPS-write: (groupid=0, jobs=4): err= 0: pid=12887: Sun Jan 21 18:23:08 2024
  write: IOPS=3048, BW=11.9MiB/s (12.5MB/s)(715MiB/60052msec); 0 zone resets
    slat (usec): min=2, max=305918, avg=1187.21, stdev=7230.65
    clat (usec): min=64, max=536393, avg=40664.78, stdev=42592.39
     lat (usec): min=71, max=579561, avg=41852.09, stdev=43656.54
    clat percentiles (usec):
     |  1.00th=[   118],  5.00th=[   424], 10.00th=[  1156], 20.00th=[  8455],
     | 30.00th=[ 16712], 40.00th=[ 23725], 50.00th=[ 31065], 60.00th=[ 39060],
     | 70.00th=[ 49021], 80.00th=[ 61604], 90.00th=[ 84411], 95.00th=[113771],
     | 99.00th=[212861], 99.50th=[270533], 99.90th=[337642], 99.95th=[358613],
     | 99.99th=[442500]
   bw (  KiB/s): min= 2432, max=27336, per=100.00%, avg=12253.51, stdev=1008.81, samples=476
   iops        : min=  608, max= 6834, avg=3063.38, stdev=252.20, samples=476
  lat (usec)   : 100=0.68%, 250=2.33%, 500=2.66%, 750=1.94%, 1000=1.58%
  lat (msec)   : 2=3.26%, 4=2.61%, 10=6.78%, 20=12.81%, 50=36.33%
  lat (msec)   : 100=22.13%, 250=6.21%, 500=0.67%, 750=0.01%
  cpu          : usr=0.18%, sys=0.61%, ctx=138620, majf=0, minf=54
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=99.9%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,183050,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=11.9MiB/s (12.5MB/s), 11.9MiB/s-11.9MiB/s (12.5MB/s-12.5MB/s), io=715MiB (750MB), run=60052-60052msec

Disk stats (read/write):
    md127: ios=8/182940, merge=0/0, ticks=96/3361540, in_queue=3361636, util=98.53%, aggrios=1/60902, aggrmerge=0/157, aggrticks=17/653169, aggrin_queue=655853, aggrutil=79.82%
  sde: ios=0/60919, merge=0/262, ticks=0/821612, in_queue=823072, util=75.04%
  sdd: ios=5/60702, merge=0/85, ticks=67/388820, in_queue=392116, util=60.59%
  sdc: ios=2/60685, merge=0/102, ticks=21/415700, in_queue=419322, util=60.80%
  sdb: ios=0/60949, merge=0/263, ticks=0/1298230, in_queue=1301479, util=79.82%
  sda: ios=0/61158, merge=0/54, ticks=0/264924, in_queue=267843, util=59.16%
  sdf: ios=1/61004, merge=0/177, ticks=14/729732, in_queue=731287, util=73.09%
jacek@valen:~/bin> fio --rw=randread --name=IOPS-read --bs=4k --direct=1 --filename=TESTDATEI --numjobs=4 --ioengine=libaio --iodepth=32 --refill_buffers --group_reporting --runtime=60 --time_based --size=50M
IOPS-read: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
...
fio-3.23
Starting 4 processes
Jobs: 4 (f=4): [r(4)][100.0%][r=368MiB/s][r=94.3k IOPS][eta 00m:00s]
IOPS-read: (groupid=0, jobs=4): err= 0: pid=13206: Sun Jan 21 18:25:18 2024
  read: IOPS=56.4k, BW=220MiB/s (231MB/s)(12.9GiB/60025msec)
    slat (nsec): min=1859, max=950573k, avg=44532.47, stdev=1617249.31
    clat (usec): min=14, max=1483.2k, avg=2222.19, stdev=12209.79
     lat (usec): min=42, max=1529.8k, avg=2266.83, stdev=12392.98
    clat percentiles (usec):
     |  1.00th=[    43],  5.00th=[    52], 10.00th=[    70], 20.00th=[   105],
     | 30.00th=[   169], 40.00th=[   269], 50.00th=[   396], 60.00th=[   578],
     | 70.00th=[   865], 80.00th=[  1369], 90.00th=[  3458], 95.00th=[  7635],
     | 99.00th=[ 39060], 99.50th=[ 61604], 99.90th=[126354], 99.95th=[170918],
     | 99.99th=[434111]
   bw (  KiB/s): min= 5936, max=476944, per=100.00%, avg=226563.49, stdev=28812.08, samples=473
   iops        : min= 1484, max=119236, avg=56640.85, stdev=7203.01, samples=473
  lat (usec)   : 20=0.01%, 50=4.49%, 100=14.51%, 250=19.37%, 500=17.91%
  lat (usec)   : 750=10.36%, 1000=6.48%
  lat (msec)   : 2=12.68%, 4=5.15%, 10=5.32%, 20=1.77%, 50=1.24%
  lat (msec)   : 100=0.54%, 250=0.16%, 500=0.02%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2000=0.01%
  cpu          : usr=2.41%, sys=8.98%, ctx=2545513, majf=0, minf=183
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=3386197,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=220MiB/s (231MB/s), 220MiB/s-220MiB/s (231MB/s-231MB/s), io=12.9GiB (13.9GB), run=60025-60025msec

Disk stats (read/write):
    md127: ios=3375877/48, merge=0/0, ticks=5487920/3772, in_queue=5491692, util=99.90%, aggrios=562564/53, aggrmerge=1801/1, aggrticks=884536/2399, aggrin_queue=888959, aggrutil=94.03%
  sde: ios=533817/50, merge=2499/0, ticks=888034/1377, in_queue=890671, util=85.05%
  sdd: ios=563242/54, merge=1249/4, ticks=845451/3034, in_queue=850599, util=91.99%
  sdc: ios=559723/55, merge=1707/3, ticks=1064233/3354, in_queue=1070553, util=94.03%
  sdb: ios=571615/57, merge=1539/0, ticks=809376/2134, in_queue=813386, util=92.03%
  sda: ios=554047/57, merge=2899/0, ticks=1230739/3124, in_queue=1236615, util=93.41%
  sdf: ios=592943/50, merge=917/0, ticks=469384/1375, in_queue=471930, util=86.55%
jacek@valen:~/bin> ls -al TEST*
-rw-r--r-- 1 jacek users 52428800 21. Jan 18:23 TESTDATEI
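
As a rough sanity check of my own numbers (this arithmetic is mine, not part of the fio output): with the near=2 layout every 4 KiB write is duplicated to two member disks, so the totals above work out as follows and line up with the per-disk ios counts in the disk stats:

183050 writes x 2 copies (near=2) ≈ 366100 member-disk writes
366100 writes / 6 disks           ≈  61000 writes per disk
 61000 writes / 60 s              ≈   1000 writes/s per spindle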

The system load was quite high, especially during the write test:

[Netdata screenshot: CPU usage]
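
(As a side note, the monitoring command below is my own suggestion and was not part of the test above.) To tell CPU time from I/O wait and to watch per-device latency while the benchmark runs, extended iostat output from the sysstat package could be viewed in a second terminal:

# print extended per-device statistics once per second during the fio run
iostat -x 1 md127 sda sdb sdc sdd sde sdf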

The RAID looks like this:

╭─root@valen ~  
╰─➤  cat /proc/mdstat
Personalities : [raid10] 
md127 : active raid10 sda2[1] sdc2[2] sdd2[3] sdb2[0] sdf2[4] sde2[5]
      1463592384 blocks super 1.0 64K chunks 2 near-copies [6/6] [UUUUUU]
      bitmap: 2/11 pages [8KB], 65536KB chunk

unused devices: <none>
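
For completeness, the array layout and bitmap settings could also be inspected with mdadm (these commands are my addition and were not run as part of the results above):

# print full array details: level, layout (near=2), chunk size, member state
mdadm --detail /dev/md127
# examine the write-intent bitmap stored on one member device
mdadm --examine-bitmap /dev/sdb2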

╭─root@valen ~  
╰─➤  df -h
Dateisystem    Größe Benutzt Verf. Verw% Eingehängt auf
devtmpfs        4,0M       0  4,0M    0% /dev
tmpfs           3,9G    248K  3,9G    1% /dev/shm
tmpfs           1,6G     25M  1,6G    2% /run
tmpfs           4,0M       0  4,0M    0% /sys/fs/cgroup
/dev/md127p1     64G    3,5G   57G    6% /
/dev/md127p1     64G    3,5G   57G    6% /bak
/dev/md127p1     64G    3,5G   57G    6% /opt
/dev/md127p1     64G    3,5G   57G    6% /root
/dev/md127p1     64G    3,5G   57G    6% /tmp
/dev/md127p1     64G    3,5G   57G    6% /var
/dev/md127p1     64G    3,5G   57G    6% /usr/local
/dev/sda1       475M    115M  336M   26% /boot
/dev/md127p2    128G     24G  105G   19% /srv
/dev/md127p3    1,2T    922G  283G   77% /home
tmpfs           791M       0  791M    0% /run/user/0

What could be causing such low write speeds? Is it a flaw in the test itself? I don't think so, because I have already run into very slow writes in normal use, especially when writing large files (> 10 MB).
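
One thing I could still rule out (purely an assumption at this point, I have not checked it yet): whether the volatile write cache of the member drives is enabled, since direct 4 KiB random writes to 7200 rpm disks behave very differently with the drive cache switched off:

# report (not change) the write-caching setting of every member drive
for d in a b c d e f; do hdparm -W /dev/sd$d; done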
