Slow SSD performance in VMs launched with KVM


On my host hardware I get about 1 GB/s.


On the VMs I created with KVM, it drops to about 20 MB/s.

My host is running Ubuntu 22.04 LTS.



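I'm using file-based VMs. I created disks of both raw and qcow2 type; the only difference I noticed is whether the disk file is allocated up front when that option is specified at creation time.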

I tried setting nocache on the disk through virt-manager.

Here is the device information:

[image: device information]



Single 4 KiB random-write process (the worst-case test), run on the host:


$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
Starting 1 process
random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [w(1)][100.0%][eta 00m:00s]                          
random-write: (groupid=0, jobs=1): err= 0: pid=114493: Tue Jan 24 12:42:44 2023
  write: IOPS=10.1k, BW=39.5MiB/s (41.5MB/s)(4096MiB/103604msec); 0 zone resets
    slat (nsec): min=1920, max=587633, avg=3761.73, stdev=3026.96
    clat (usec): min=11, max=2551.6k, avg=26.49, stdev=2593.73
     lat (usec): min=13, max=2551.7k, avg=30.25, stdev=2593.74
    clat percentiles (usec):
     |  1.00th=[   20],  5.00th=[   22], 10.00th=[   22], 20.00th=[   22],
     | 30.00th=[   22], 40.00th=[   23], 50.00th=[   23], 60.00th=[   23],
     | 70.00th=[   23], 80.00th=[   24], 90.00th=[   25], 95.00th=[   26],
     | 99.00th=[   32], 99.50th=[   34], 99.90th=[   44], 99.95th=[  165],
     | 99.99th=[  545]
   bw (  KiB/s): min=24864, max=152592, per=100.00%, avg=135295.44, stdev=25421.57, samples=62
   iops        : min= 6216, max=38148, avg=33823.85, stdev=6355.39, samples=62
  lat (usec)   : 20=1.13%, 50=98.80%, 100=0.01%, 250=0.05%, 500=0.01%
  lat (usec)   : 750=0.02%
  lat (msec)   : 2=0.01%, 500=0.01%, 750=0.01%, >=2000=0.01%
  cpu          : usr=5.71%, sys=7.64%, ctx=1063940, majf=0, minf=366
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1048577,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=39.5MiB/s (41.5MB/s), 39.5MiB/s-39.5MiB/s (41.5MB/s-41.5MB/s), io=4096MiB (4295MB), run=103604-103604msec

Disk stats (read/write):
    dm-0: ios=0/240696, merge=0/0, ticks=0/16578288, in_queue=16578288, util=85.10%, aggrios=0/242596, aggrmerge=0/3006, aggrticks=0/20300771, aggrin_queue=20300770, aggrutil=89.20%
  sda: ios=0/242596, merge=0/3006, ticks=0/20300771, in_queue=20300770, util=89.20%

$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][eta 00m:00s]                          
random-write: (groupid=0, jobs=1): err= 0: pid=114600: Tue Jan 24 12:45:29 2023
  write: IOPS=11.2k, BW=43.7MiB/s (45.8MB/s)(4096MiB/93810msec); 0 zone resets
    slat (nsec): min=1800, max=637861, avg=3705.65, stdev=2443.65
    clat (usec): min=10, max=582234, avg=22.74, stdev=706.46
     lat (usec): min=12, max=582238, avg=26.45, stdev=706.47
    clat percentiles (usec):
     |  1.00th=[   17],  5.00th=[   20], 10.00th=[   21], 20.00th=[   21],
     | 30.00th=[   21], 40.00th=[   21], 50.00th=[   22], 60.00th=[   22],
     | 70.00th=[   22], 80.00th=[   22], 90.00th=[   24], 95.00th=[   25],
     | 99.00th=[   31], 99.50th=[   33], 99.90th=[   44], 99.95th=[  151],
     | 99.99th=[  537]
   bw (  KiB/s): min=44784, max=185360, per=100.00%, avg=147168.42, stdev=18660.88, samples=57
   iops        : min=11196, max=46340, avg=36792.07, stdev=4665.22, samples=57
  lat (usec)   : 20=6.13%, 50=93.79%, 100=0.01%, 250=0.05%, 500=0.01%
  lat (usec)   : 750=0.02%
  lat (msec)   : 2=0.01%, 250=0.01%, 500=0.01%, 750=0.01%
  cpu          : usr=6.33%, sys=7.47%, ctx=1079749, majf=0, minf=327
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1048577,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=43.7MiB/s (45.8MB/s), 43.7MiB/s-43.7MiB/s (45.8MB/s-45.8MB/s), io=4096MiB (4295MB), run=93810-93810msec

Disk stats (read/write):
    dm-0: ios=0/257987, merge=0/0, ticks=0/14471372, in_queue=14471372, util=80.94%, aggrios=0/259380, aggrmerge=0/3269, aggrticks=0/20576252, aggrin_queue=20576252, aggrutil=88.06%
  sda: ios=0/259380, merge=0/3269, ticks=0/20576252, in_queue=20576252, util=88.06%

$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][eta 00m:00s]                          
random-write: (groupid=0, jobs=1): err= 0: pid=114700: Tue Jan 24 12:48:03 2023
  write: IOPS=10.5k, BW=41.0MiB/s (43.0MB/s)(4096MiB/99783msec); 0 zone resets
    slat (nsec): min=1931, max=543062, avg=3706.35, stdev=3369.72
    clat (usec): min=11, max=659263, avg=22.63, stdev=643.97
     lat (usec): min=14, max=659267, avg=26.33, stdev=643.98
    clat percentiles (usec):
     |  1.00th=[   19],  5.00th=[   21], 10.00th=[   21], 20.00th=[   21],
     | 30.00th=[   22], 40.00th=[   22], 50.00th=[   22], 60.00th=[   22],
     | 70.00th=[   22], 80.00th=[   23], 90.00th=[   24], 95.00th=[   25],
     | 99.00th=[   29], 99.50th=[   33], 99.90th=[   43], 99.95th=[  139],
     | 99.99th=[  537]
   bw (  KiB/s): min= 5648, max=166179, per=100.00%, avg=144625.43, stdev=22760.25, samples=58
   iops        : min= 1412, max=41544, avg=36156.28, stdev=5690.11, samples=58
  lat (usec)   : 20=3.87%, 50=96.05%, 100=0.01%, 250=0.05%, 500=0.01%
  lat (usec)   : 750=0.02%, 1000=0.01%
  lat (msec)   : 20=0.01%, 750=0.01%
  cpu          : usr=5.86%, sys=7.61%, ctx=1080511, majf=0, minf=359
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1048577,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=41.0MiB/s (43.0MB/s), 41.0MiB/s-41.0MiB/s (43.0MB/s-43.0MB/s), io=4096MiB (4295MB), run=99783-99783msec

Disk stats (read/write):
    dm-0: ios=0/245070, merge=0/0, ticks=0/17235960, in_queue=17235960, util=83.79%, aggrios=0/246419, aggrmerge=0/3660, aggrticks=0/22057670, aggrin_queue=22057670, aggrutil=88.55%
  sda: ios=0/246419, merge=0/3660, ticks=0/22057670, in_queue=22057670, util=88.55%

This test was run inside a VM running OpenStack (controller 2). The OpenStack deployment contains three standalone bare VMs, and no applications are running on KVM:

$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
Starting 1 process
random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [F(1)][100.0%][eta 00m:00s]                         
random-write: (groupid=0, jobs=1): err= 0: pid=451129: Tue Jan 24 13:04:09 2023
  write: IOPS=250, BW=1001KiB/s (1026kB/s)(826MiB/844616msec); 0 zone resets
    slat (nsec): min=604, max=487941, avg=3069.50, stdev=3227.61
    clat (usec): min=2, max=116745k, avg=576.78, stdev=253872.83
     lat (usec): min=9, max=116745k, avg=579.85, stdev=253872.85
    clat percentiles (usec):
     |  1.00th=[   11],  5.00th=[   13], 10.00th=[   14], 20.00th=[   15],
     | 30.00th=[   15], 40.00th=[   19], 50.00th=[   22], 60.00th=[   24],
     | 70.00th=[   26], 80.00th=[   31], 90.00th=[   40], 95.00th=[   49],
     | 99.00th=[   76], 99.50th=[   91], 99.90th=[  359], 99.95th=[  685],
     | 99.99th=[  873]
   bw (  KiB/s): min=13680, max=195824, per=100.00%, avg=130092.46, stdev=52846.56, samples=13
   iops        : min= 3420, max=48956, avg=32523.08, stdev=13211.60, samples=13
  lat (usec)   : 4=0.01%, 10=0.96%, 20=46.60%, 50=48.11%, 100=3.99%
  lat (usec)   : 250=0.23%, 500=0.03%, 750=0.06%, 1000=0.02%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, >=2000=0.01%
  cpu          : usr=0.10%, sys=0.13%, ctx=264372, majf=0, minf=29
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,211466,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=1001KiB/s (1026kB/s), 1001KiB/s-1001KiB/s (1026kB/s-1026kB/s), io=826MiB (866MB), run=844616-844616msec

Disk stats (read/write):
    dm-0: ios=232/163901, merge=0/0, ticks=144/7660152, in_queue=7660296, util=17.91%, aggrios=221/160213, aggrmerge=11/3722, aggrticks=159/1113901, aggrin_queue=1983749, aggrutil=43.00%
  vda: ios=221/160213, merge=11/3722, ticks=159/1113901, in_queue=1983749, util=43.00%

$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [F(1)][100.0%][eta 00m:00s]                          
random-write: (groupid=0, jobs=1): err= 0: pid=452551: Tue Jan 24 13:25:06 2023
  write: IOPS=286, BW=1145KiB/s (1172kB/s)(973MiB/869962msec); 0 zone resets
    slat (nsec): min=1014, max=520262, avg=3532.80, stdev=4003.56
    clat (nsec): min=910, max=57218M, avg=259432.63, stdev=114674189.43
     lat (usec): min=13, max=57218k, avg=262.97, stdev=114674.22
    clat percentiles (usec):
     |  1.00th=[   14],  5.00th=[   16], 10.00th=[   18], 20.00th=[   19],
     | 30.00th=[   21], 40.00th=[   22], 50.00th=[   23], 60.00th=[   24],
     | 70.00th=[   27], 80.00th=[   29], 90.00th=[   34], 95.00th=[   42],
     | 99.00th=[   70], 99.50th=[   77], 99.90th=[  172], 99.95th=[  502],
     | 99.99th=[22676]
   bw (  KiB/s): min= 5336, max=161784, per=100.00%, avg=110630.83, stdev=54549.81, samples=18
   iops        : min= 1334, max=40446, avg=27657.67, stdev=13637.43, samples=18
  lat (nsec)   : 1000=0.01%
  lat (usec)   : 2=0.01%, 4=0.01%, 20=28.78%, 50=68.68%, 100=2.30%
  lat (usec)   : 250=0.17%, 500=0.02%, 750=0.02%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 20=0.01%, 50=0.01%, >=2000=0.01%
  cpu          : usr=0.13%, sys=0.17%, ctx=260439, majf=0, minf=30
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,248968,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=1145KiB/s (1172kB/s), 1145KiB/s-1145KiB/s (1172kB/s-1172kB/s), io=973MiB (1020MB), run=869962-869962msec

Disk stats (read/write):
    dm-0: ios=124/189939, merge=0/0, ticks=64/6847936, in_queue=6848000, util=72.81%, aggrios=79/179513, aggrmerge=45/10455, aggrticks=26/1126630, aggrin_queue=2028077, aggrutil=90.71%
  vda: ios=79/179513, merge=45/10455, ticks=26/1126630, in_queue=2028077, util=90.71%

As you can see, throughput drops from about 43 MiB/s on the host to about 1 MiB/s in the KVM guest. This is a big problem.
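fio's bandwidth figure is simply IOPS multiplied by the block size, so the size of the drop can be checked by hand. A minimal sketch using the rounded IOPS values from the second host run (`IOPS=11.2k`) and the first KVM guest run (`IOPS=250`), both with `bs=4k`; the variable names are illustrative only:

```shell
#!/bin/sh
# Bandwidth = IOPS x block size. Values are copied from the fio summaries
# above; "11.2k" is fio's rounded display, so results are approximate.
bs_kib=4
host_iops=11200   # host, second run: IOPS=11.2k
vm_iops=250       # KVM guest, first run: IOPS=250

host_kib=$((host_iops * bs_kib))   # 44800 KiB/s, about 43.75 MiB/s
vm_kib=$((vm_iops * bs_kib))       # 1000 KiB/s, just under 1 MiB/s

echo "host:     ${host_kib} KiB/s"
echo "guest:    ${vm_kib} KiB/s"
echo "slowdown: about $((host_kib / vm_kib))x"
```

The results line up with fio's own `BW=43.7MiB/s` and `BW=1001KiB/s`, which confirms the guest really completes roughly 44 times fewer 4 KiB writes per second.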

This test was run on the same OpenStack controller 2 VM, but with ESXi as the hypervisor:

$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
Starting 1 process
random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [F(1)][100.0%][eta 00m:00s]                          
random-write: (groupid=0, jobs=1): err= 0: pid=530128: Tue Jan 24 13:18:47 2023
  write: IOPS=3149, BW=12.3MiB/s (12.9MB/s)(1722MiB/139918msec); 0 zone resets
    slat (nsec): min=1385, max=749909, avg=11219.59, stdev=9674.52
    clat (nsec): min=610, max=149012k, avg=122940.18, stdev=866525.51
     lat (usec): min=35, max=149020, avg=134.16, stdev=866.28
    clat percentiles (usec):
     |  1.00th=[   35],  5.00th=[   35], 10.00th=[   46], 20.00th=[   51],
     | 30.00th=[   60], 40.00th=[   63], 50.00th=[   64], 60.00th=[   68],
     | 70.00th=[   70], 80.00th=[   72], 90.00th=[   79], 95.00th=[   89],
     | 99.00th=[  221], 99.50th=[ 1467], 99.90th=[13829], 99.95th=[16188],
     | 99.99th=[19530]
   bw (  KiB/s): min= 9672, max=99544, per=100.00%, avg=29553.08, stdev=21110.49, samples=119
   iops        : min= 2418, max=24886, avg=7388.23, stdev=5277.64, samples=119
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.01%, 50=17.22%, 100=79.51%
  lat (usec)   : 250=2.37%, 500=0.14%, 750=0.08%, 1000=0.06%
  lat (msec)   : 2=0.12%, 4=0.01%, 10=0.30%, 20=0.18%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=3.14%, sys=6.60%, ctx=564104, majf=0, minf=30
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,440722,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=12.3MiB/s (12.9MB/s), 12.3MiB/s-12.3MiB/s (12.9MB/s-12.9MB/s), io=1722MiB (1805MB), run=139918-139918msec

Disk stats (read/write):
    dm-0: ios=0/240336, merge=0/0, ticks=0/3124100, in_queue=3124100, util=91.31%, aggrios=0/235436, aggrmerge=0/5071, aggrticks=0/2887407, aggrin_queue=2887407, aggrutil=92.02%
  sda: ios=0/235436, merge=0/5071, ticks=0/2887407, in_queue=2887407, util=92.02%

$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [F(1)][100.0%][eta 00m:00s]                          
random-write: (groupid=0, jobs=1): err= 0: pid=530294: Tue Jan 24 13:21:08 2023
  write: IOPS=6080, BW=23.8MiB/s (24.9MB/s)(2393MiB/100740msec); 0 zone resets
    slat (nsec): min=1367, max=1029.8k, avg=11761.38, stdev=10525.79
    clat (nsec): min=915, max=62359k, avg=82333.89, stdev=390799.49
     lat (usec): min=35, max=62382, avg=94.10, stdev=391.00
    clat percentiles (usec):
     |  1.00th=[   36],  5.00th=[   37], 10.00th=[   47], 20.00th=[   59],
     | 30.00th=[   65], 40.00th=[   67], 50.00th=[   69], 60.00th=[   71],
     | 70.00th=[   72], 80.00th=[   74], 90.00th=[   82], 95.00th=[   98],
     | 99.00th=[  192], 99.50th=[  253], 99.90th=[ 8356], 99.95th=[ 9372],
     | 99.99th=[16057]
   bw (  KiB/s): min=23136, max=95208, per=100.00%, avg=41702.67, stdev=13481.11, samples=117
   iops        : min= 5784, max=23802, avg=10425.62, stdev=3370.29, samples=117
  lat (nsec)   : 1000=0.01%
  lat (usec)   : 2=0.01%, 4=0.01%, 50=13.24%, 100=82.03%, 250=4.21%
  lat (usec)   : 500=0.22%, 750=0.10%, 1000=0.02%
  lat (msec)   : 2=0.06%, 4=0.01%, 10=0.06%, 20=0.05%, 50=0.01%
  lat (msec)   : 100=0.01%
  cpu          : usr=6.24%, sys=13.79%, ctx=755651, majf=0, minf=29
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,612557,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=23.8MiB/s (24.9MB/s), 23.8MiB/s-23.8MiB/s (24.9MB/s-24.9MB/s), io=2393MiB (2509MB), run=100740-100740msec

Disk stats (read/write):
    dm-0: ios=0/353311, merge=0/0, ticks=0/2510080, in_queue=2510080, util=93.10%, aggrios=0/325545, aggrmerge=0/28769, aggrticks=0/2168746, aggrin_queue=2168746, aggrutil=93.35%
  sda: ios=0/325545, merge=0/28769, ticks=0/2168746, in_queue=2168746, util=93.35%

$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [F(1)][100.0%][eta 00m:00s]                          
random-write: (groupid=0, jobs=1): err= 0: pid=530405: Tue Jan 24 13:23:08 2023
  write: IOPS=5930, BW=23.2MiB/s (24.3MB/s)(2308MiB/99631msec); 0 zone resets
    slat (nsec): min=1378, max=1395.4k, avg=12724.69, stdev=10859.25
    clat (nsec): min=797, max=22413k, avg=83620.52, stdev=356081.74
     lat (usec): min=35, max=22415, avg=96.35, stdev=356.19
    clat percentiles (usec):
     |  1.00th=[   36],  5.00th=[   48], 10.00th=[   57], 20.00th=[   65],
     | 30.00th=[   69], 40.00th=[   71], 50.00th=[   71], 60.00th=[   72],
     | 70.00th=[   73], 80.00th=[   76], 90.00th=[   81], 95.00th=[   93],
     | 99.00th=[  184], 99.50th=[  219], 99.90th=[ 8291], 99.95th=[10290],
     | 99.99th=[14091]
   bw (  KiB/s): min=26568, max=100256, per=100.00%, avg=40559.51, stdev=9507.31, samples=116
   iops        : min= 6642, max=25064, avg=10139.87, stdev=2376.85, samples=116
  lat (nsec)   : 1000=0.01%
  lat (usec)   : 2=0.01%, 4=0.01%, 50=6.37%, 100=89.89%, 250=3.36%
  lat (usec)   : 500=0.15%, 750=0.09%, 1000=0.02%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.04%, 20=0.06%, 50=0.01%
  cpu          : usr=6.64%, sys=14.57%, ctx=711625, majf=0, minf=28
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,590890,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=23.2MiB/s (24.3MB/s), 23.2MiB/s-23.2MiB/s (24.3MB/s-24.3MB/s), io=2308MiB (2420MB), run=99631-99631msec

Disk stats (read/write):
    dm-0: ios=0/302542, merge=0/0, ticks=0/2060836, in_queue=2060836, util=83.71%, aggrios=0/302903, aggrmerge=0/388, aggrticks=0/1961686, aggrin_queue=1961686, aggrutil=83.91%
  sda: ios=0/302903, merge=0/388, ticks=0/1961686, in_queue=1961686, util=83.91%

I have Samsung SSD 870 QVO 2TB drives, 4TB in total, running in RAID 0.

Here is my KVM XML:



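I believe the Samsung 870 is a consumer-grade drive: its performance will degrade, and there is a good chance of correlated failures, especially in a multi-node cluster, which is most likely running Ceph.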

The 2 TB version of the following model (the part number below is the 7.6 TB version) would be a better choice: SAMSUNG MZ7LH7T6HMLA-00005

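Pay particular attention to sector alignment. Most operating systems create partitions on 1 MiB boundaries and start the first partition at sector 2048 (assuming the emulated 512-byte sector size).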

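In the example below, I switch the display units to sectors. The output also shows that the emulated (logical) sector size is 512 bytes per sector, while the drive lays out data in 4 KiB pages (the physical sector size). Parted also has a built-in command to check a partition's sector alignment: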

[root@kvm1a ~]# parted /dev/sda
GNU Parted 3.4
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) unit s
(parted) p
Model: ATA SAMSUNG MZ7LH7T6 (scsi)
Disk /dev/sda: 15002931888s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start      End           Size          File system  Name        Flags
 1      2048s      4095s         2048s                      bbp         bios_grub
 2      4096s      62918655s     62914560s                  non-fs      raid
 3      62918656s  65015807s     2097152s                   non-fs      raid
 4      65015808s  65220607s     204800s       xfs          ceph data
 5      65220608s  15002929151s  14937708544s               ceph block

(parted) align-check optimal 1
1 aligned
(parted) align-check optimal 2
2 aligned
(parted) align-check optimal 3
3 aligned
(parted) align-check optimal 4
4 aligned
(parted) align-check optimal 5
5 aligned
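As an alternative to running `align-check` partition by partition, the kernel exposes each partition's start sector in sysfs, which is easy to check from a script. A minimal sketch, assuming the standard Linux layout `/sys/block/<disk>/<partition>/start` (reported in 512-byte units) and using a hypothetical `check_alignment` helper that tests for a 1 MiB boundary rather than parted's "optimal" I/O size:

```shell
#!/bin/sh
# Sketch: check whether each partition of a disk starts on a 1 MiB boundary.
# Assumes the Linux sysfs layout /sys/block/<disk>/<partition>/start, which
# reports the start in 512-byte units regardless of the drive's sector size.
# check_alignment is a hypothetical helper; the optional second argument
# points it at a different sysfs root (useful for testing).
check_alignment() {
  disk=$1
  sysroot=${2:-/sys}
  for part in "$sysroot/block/$disk/$disk"*; do
    [ -r "$part/start" ] || continue          # skip non-partition matches
    start=$(cat "$part/start")
    if [ $((start % 2048)) -eq 0 ]; then      # 2048 x 512 B = 1 MiB
      echo "${part##*/} start=${start}s aligned"
    else
      echo "${part##*/} start=${start}s NOT aligned"
    fi
  done
}

# Example on a real host:
#   check_alignment sda
```

Reading sysfs avoids parsing parted's human-oriented output; on the disk shown above every partition should report `aligned`.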

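Partitions should start at sector 2048, which gives a clean 1 MiB starting boundary: 2048 × 512 (sector size) = 1048576 bytes (1 MiB). Many people think this wastes space and try to start the partition at sector 1. However, this causes problems: the first addressable sector is actually 0, not 1, and sector 0 is reserved for the MBR/GPT partition table and boot jump code. A partition starting at sector 1 also does not sit on a 4 KiB physical-sector boundary, since 1 × 512 bytes is not a multiple of 4096.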
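The arithmetic behind this rule can be checked directly: a partition's byte offset is its start sector times the 512-byte logical sector size, and it lines up with a boundary only if that offset divides evenly. A minimal sketch; `is_aligned` is a hypothetical helper, and the start sectors are the ones mentioned in this answer and in the parted table above:

```shell
#!/bin/sh
# Worked check of the 1 MiB rule. A partition's byte offset is its start
# sector times the 512-byte logical sector size; it is aligned to a boundary
# only if that offset is an exact multiple of the boundary.
# is_aligned is a hypothetical helper written for this example.
is_aligned() {   # usage: is_aligned START_SECTOR BOUNDARY_BYTES
  [ $(($1 * 512 % $2)) -eq 0 ]
}

echo "2048 sectors x 512 B = $((2048 * 512)) B"   # 1048576 B = 1 MiB

# Start sectors mentioned in this answer and in the parted table above
for start in 1 2048 4096 65015808; do
  a4k=no; is_aligned "$start" 4096 && a4k=yes
  a1m=no; is_aligned "$start" 1048576 && a1m=yes
  echo "start ${start}s: offset $((start * 512)) B, 4KiB-aligned=$a4k, 1MiB-aligned=$a1m"
done
```

Sector 1 lands at byte 512, which is neither 4 KiB- nor 1 MiB-aligned; the other three start sectors are aligned to both.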

If anyone finds it useful, here is a script that checks the start sector of the partitions on every Ceph RBD image mapped on a compute node:

  rbd showmapped | grep /dev/rbd | awk '{print $3" "$5}' | while read -r disk dev; do
    # Partition rows in parted's "unit s" listing start with whitespace and the partition number
    parted --script "$dev" 'unit s p' | grep -P '^\s+\d' | while read -r partition start info; do
      start=${start%s}   # parted prints e.g. "2048s"; strip the unit suffix
      # 2048 comes from 1024*1024/512 = 2048 (1 MiB in 512-byte sectors)
      if [ "$start" -ne $((start / 2048 * 2048)) ]; then
        # Exclude spacer partitions created by Windows
        echo "$info" | grep -q 'Microsoft reserved partition' && continue
        # Exclude MikroTik CHR disks of 128 MiB (131072 1-KiB blocks in /proc/partitions)
        grep -Pq "\s131072\s+${dev#/dev/}$" /proc/partitions && continue
        echo "$disk mounted as $dev has problem with partition $partition"
      fi
    done
  done


With KVM, we've found that hosts running many VMs perform best when QEMU is configured to use virtio-scsi (vioscsi) with writeback caching.
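To make the recommended setup concrete, here is a minimal sketch of the equivalent QEMU options: a virtio-scsi controller (the device Windows guests drive with `vioscsi`) and a raw image attached with `cache=writeback`. The image path, memory size, and the choice to print the command rather than run it are illustrative assumptions:

```shell
#!/bin/sh
# Sketch of a QEMU invocation with a virtio-scsi controller and writeback cache.
# DISK is a hypothetical image path; the script only prints the command line.
DISK=${DISK:-/var/lib/libvirt/images/vm1.raw}

set -- qemu-system-x86_64 \
  -enable-kvm -m 4096 \
  -device virtio-scsi-pci,id=scsi0 \
  -drive "file=$DISK,format=raw,if=none,id=drive0,cache=writeback" \
  -device scsi-hd,drive=drive0,bus=scsi0.0

printf '%s ' "$@"; echo
```

With libvirt or virt-manager, the same choice is expressed in the domain XML as `cache='writeback'` on the disk's `<driver>` element, `bus='scsi'` on its `<target>`, and a `<controller type='scsi' model='virtio-scsi'/>`.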

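Some benchmarks show higher read performance with caching disabled, because the system does not copy reads into the cache; in practice, however, caching greatly benefits reads and reduces the load on Ceph/iSCSI storage.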

PS: Writeback mode is flush-aware, so it behaves like any other well-behaved hardware RAID controller and is transactionally safe.
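One way to see what flush awareness costs in practice is to rerun the question's benchmark so that the guest forces a flush after every write. A minimal sketch using standard fio options (`--direct=1` bypasses the guest page cache, `--fsync=1` issues an fsync after each write); the job name and test file name are hypothetical, and the script only prints the command unless `RUN=1` is set and fio is installed:

```shell
#!/bin/sh
# Sketch: the fio run from the question, changed so each 4 KiB write bypasses
# the guest page cache (--direct=1) and is followed by an fsync (--fsync=1).
# With a flush-aware cache mode, every fsync must reach stable storage, so this
# measures the cost of honoring flushes. The file name is a hypothetical example.
set -- fio --name=sync-random-write --filename=fio-sync-test.dat \
  --ioengine=psync --rw=randwrite --bs=4k --size=1g \
  --numjobs=1 --iodepth=1 --direct=1 --fsync=1 \
  --runtime=60 --time_based

if command -v fio >/dev/null 2>&1 && [ "${RUN:-0}" = 1 ]; then
  "$@"
else
  printf '%s ' "$@"; echo
fi
```

Running it in the guest under different cache modes (for example writeback versus none) shows how much of the gap comes from flush handling rather than raw device speed.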
