NVMe drive write performance in software RAID 1

We just received two brand-new Supermicro 1028U-TN10RT+ servers, each with 10 NVMe slots, and two Intel DC P3600 800 GB drives.

We were eager to test the performance of these drives, since the spec sheet promises very good read (up to 2.6 GB/s) and write (up to 1 GB/s) throughput. We put the two drives in a software RAID 1 configuration, because that is the setup we want to run in production. We benchmarked with fio, and the results were somewhat puzzling.

The full results are below, but in summary: the two drives in the RAID 1 array manage roughly 550 MB/s of random writes (and that was one of the better runs), while a single drive without RAID reaches about 920 MB/s.

Is the overhead of software RAID really that high? Is there anything else we can tune?

The system has 128 GB of RAM and runs CentOS 7.1 with the kernel upgraded to 4.2.4.

# 8 jobs, each doing 64 KiB random writes with O_DIRECT over a 32 GiB file,
# queue depth 64 per job, for 240 seconds
fio --name=randwrite --ioengine=libaio --iodepth=64 --rw=randwrite \
    --bs=64k --direct=1 --size=32G --numjobs=8 --runtime=240 \
    --group_reporting
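
For reference, the two test targets were presumably prepared roughly as follows (a sketch: the mount points are made up, and the mdadm options are inferred from the mdadm --detail output further down; fio writes its test files into the current directory):

# Single-drive baseline: xfs directly on the raw NVMe device
mkfs.xfs /dev/nvme2n1
mount /dev/nvme2n1 /mnt/single
cd /mnt/single            # run the fio job from here

# RAID 1 run: md mirror over one partition per drive, xfs on top
mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=1.2 \
      --bitmap=internal /dev/nvme0n1p1 /dev/nvme1n1p1
mkfs.xfs /dev/md0
mount /dev/md0 /mnt/raid
cd /mnt/raid              # run the same fio job from here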

Results for a single drive, directly mounted with an xfs filesystem:

randwrite: (groupid=0, jobs=8): err= 0: pid=9307: Tue Oct 27 14:36:35 2015
  write: io=217971MB, bw=929843KB/s, iops=14528, runt=240043msec
    slat (usec): min=5, max=933, avg=24.10, stdev= 9.29
    clat (usec): min=32, max=135283, avg=35212.65, stdev=27746.71
     lat (usec): min=49, max=135300, avg=35237.02, stdev=27746.76
    clat percentiles (usec):
     |  1.00th=[  215],  5.00th=[ 2224], 10.00th=[ 5600], 20.00th=[12992],
     | 30.00th=[16768], 40.00th=[19328], 50.00th=[23168], 60.00th=[33536],
     | 70.00th=[47872], 80.00th=[63232], 90.00th=[79360], 95.00th=[88576],
     | 99.00th=[102912], 99.50th=[107008], 99.90th=[116224], 99.95th=[119296],
     | 99.99th=[125440]
    bw (KB  /s): min=42411, max=298624, per=12.51%, avg=116326.24, stdev=24050.53
    lat (usec) : 50=0.01%, 100=0.27%, 250=0.87%, 500=0.77%, 750=0.55%
    lat (usec) : 1000=0.47%
    lat (msec) : 2=1.67%, 4=3.43%, 10=7.17%, 20=27.37%, 50=28.86%
    lat (msec) : 100=26.99%, 250=1.55%
  cpu          : usr=1.75%, sys=4.98%, ctx=3056950, majf=0, minf=56673
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=0/w=3487535/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: io=217971MB, aggrb=929842KB/s, minb=929842KB/s, maxb=929842KB/s, mint=240043msec, maxt=240043msec

Disk stats (read/write):
  nvme2n1: ios=0/4691372, merge=0/0, ticks=0/154695600, in_queue=155446639, util=100.00%

Results with md, RAID 1:

randwrite: (groupid=0, jobs=8): err= 0: pid=8553: Tue Oct 27 14:32:03 2015
  write: io=130141MB, bw=555110KB/s, iops=8673, runt=240069msec
    slat (usec): min=20, max=349051, avg=130.51, stdev=2000.03
    clat (usec): min=59, max=912669, avg=58782.87, stdev=50750.42
     lat (usec): min=95, max=927440, avg=58913.81, stdev=51010.14
    clat percentiles (usec):
     |  1.00th=[  668],  5.00th=[ 3472], 10.00th=[ 8512], 20.00th=[21888],
     | 30.00th=[32640], 40.00th=[41728], 50.00th=[48896], 60.00th=[58112],
     | 70.00th=[71168], 80.00th=[86528], 90.00th=[114176], 95.00th=[142336],
     | 99.00th=[216064], 99.50th=[250880], 99.90th=[577536], 99.95th=[716800],
     | 99.99th=[872448]
    bw (KB  /s): min=   70, max=175104, per=12.56%, avg=69708.68, stdev=20589.85
    lat (usec) : 100=0.02%, 250=0.29%, 500=0.43%, 750=0.38%, 1000=0.36%
    lat (msec) : 2=1.22%, 4=2.98%, 10=5.56%, 20=7.47%, 50=32.45%
    lat (msec) : 100=34.50%, 250=13.81%, 500=0.39%, 750=0.08%, 1000=0.05%
  cpu          : usr=1.28%, sys=6.46%, ctx=1727469, majf=0, minf=69488
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=0/w=2082262/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: io=130141MB, aggrb=555110KB/s, minb=555110KB/s, maxb=555110KB/s, mint=240069msec, maxt=240069msec

Disk stats (read/write):
    md0: ios=0/2615652, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=11136/2630386, aggrmerge=0/0, aggrticks=10763/72152582, aggrin_queue=72527830, aggrutil=99.40%
  nvme0n1: ios=22273/2619265, merge=0/0, ticks=21526/14920779, in_queue=14979917, util=49.15%
  nvme1n1: ios=0/2641508, merge=0/0, ticks=0/129384385, in_queue=130075743, util=99.40%

mdadm --detail /dev/md0

/dev/md0:
        Version : 1.2
  Creation Time : Tue Oct 27 13:12:34 2015
     Raid Level : raid1
     Array Size : 781278208 (745.08 GiB 800.03 GB)
  Used Dev Size : 781278208 (745.08 GiB 800.03 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Tue Oct 27 14:54:24 2015
          State : clean 
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : localhost.localdomain:0  (local to host localhost.localdomain)
           UUID : cf2ce291:0c52f361:bc40dffa:918595d9
         Events : 706

    Number   Major   Minor   RaidDevice State
       0     259        3        0      active sync   /dev/nvme0n1p1
       1     259        1        1      active sync   /dev/nvme1n1p1

Answer 1

This could be a side effect of the internal write-intent bitmap in use. Remove it with mdadm --grow --bitmap=none <dev> and retry the fio run.
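
For the array shown above, that would look like this (a sketch; re-adding the bitmap afterwards restores the original configuration):

mdadm --grow --bitmap=none /dev/md0        # drop the internal write-intent bitmap
# ... re-run the fio job against the array ...
mdadm --grow --bitmap=internal /dev/md0    # restore the bitmap when done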

Regardless, I would strongly advise against going into production with an array that has no bitmap enabled: a crash or power loss would force a full byte-by-byte scan/compare of the array, whereas a write-intent bitmap guarantees much faster recovery.
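
If the bitmap does turn out to be the bottleneck, a possible middle ground is to keep it but use a larger bitmap chunk, so it needs updating less often (the 128M value here is just an illustrative starting point, not a tested recommendation):

mdadm --grow --bitmap=none /dev/md0
mdadm --grow --bitmap=internal --bitmap-chunk=128M /dev/md0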
