I have long been a fan of ZFS and use it on my home NAS, but while testing its viability for a production workload I found its performance to be incredibly poor compared to XFS on the same disk. Tested with fio 3.21 on an Intel P4510 8TB disk using the following settings:
fio \
--name=xfs-fio \
--size=10G \
--group_reporting \
--time_based \
--runtime=300 \
--bs=4k \
--numjobs=64 \
--rw=randwrite \
--ioengine=sync \
--directory=/mnt/fio/
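The XFS target was mounted at /mnt/fio (matching --directory above). The exact filesystem preparation is not shown in the report, so the following is only an assumed sketch using mkfs.xfs defaults, with /dev/nvme7n1 inferred from the disk stats further down:

# assumption: XFS created with default mkfs.xfs options on the device seen in the disk stats
mkfs.xfs /dev/nvme7n1
mkdir -p /mnt/fio
mount /dev/nvme7n1 /mnt/fio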
The results:
xfs-fio: (groupid=0, jobs=64): err= 0: pid=63: Mon Feb 1 21:46:44 2021
write: IOPS=189k, BW=738MiB/s (774MB/s)(216GiB/300056msec); 0 zone resets
clat (usec): min=2, max=2430.4k, avg=336.28, stdev=4745.39
lat (usec): min=2, max=2430.4k, avg=336.38, stdev=4745.40
clat percentiles (usec):
| 1.00th=[ 7], 5.00th=[ 10], 10.00th=[ 10], 20.00th=[ 11],
| 30.00th=[ 12], 40.00th=[ 14], 50.00th=[ 23], 60.00th=[ 35],
| 70.00th=[ 36], 80.00th=[ 37], 90.00th=[ 39], 95.00th=[ 40],
| 99.00th=[ 44], 99.50th=[ 8455], 99.90th=[ 66323], 99.95th=[ 70779],
| 99.99th=[179307]
bw ( KiB/s): min=95565, max=7139939, per=100.00%, avg=757400.32, stdev=21559.21, samples=38262
iops : min=23890, max=1784976, avg=189327.65, stdev=5389.87, samples=38262
lat (usec) : 4=0.03%, 10=13.41%, 20=36.22%, 50=49.56%, 100=0.12%
lat (usec) : 250=0.13%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
lat (msec) : 100=0.46%, 250=0.02%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2000=0.01%, >=2000=0.01%
cpu : usr=0.27%, sys=7.34%, ctx=793590, majf=0, minf=116620
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,56715776,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=738MiB/s (774MB/s), 738MiB/s-738MiB/s (774MB/s-774MB/s), io=216GiB (232GB), run=300056-300056msec
Disk stats (read/write):
nvme7n1: ios=25/21951553, merge=0/173138, ticks=4/660308, in_queue=265520, util=21.39%
On ZFS, with the pool created like so:
# zpool create -o ashift=13 -o autoreplace=on nvme6 /dev/nvme6n1
and the dataset created with:
zfs create \
-o mountpoint=/mnt/nvme6 \
-o atime=off \
-o compression=lz4 \
-o dnodesize=auto \
-o primarycache=metadata \
-o recordsize=128k \
-o xattr=sa \
-o acltype=posixacl \
nvme6/test0
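The ZFS run presumably reuses the same fio command with --directory pointed at the dataset mountpoint (/mnt/nvme6). As a sanity check before the run, the pool ashift and the dataset properties can be confirmed; a minimal verification sketch, not part of the original report:

# confirm the pool-level ashift (13 means 8 KiB allocation size)
zdb -C nvme6 | grep ashift
# confirm the dataset properties set above
zfs get recordsize,compression,primarycache,atime,xattr,dnodesize,acltype nvme6/test0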
The results:
zfs-fio: (groupid=0, jobs=64): err= 0: pid=64: Mon Feb 1 23:00:41 2021
write: IOPS=28.3k, BW=110MiB/s (116MB/s)(32.3GiB/300004msec); 0 zone resets
clat (usec): min=7, max=314789, avg=2258.78, stdev=2509.17
lat (usec): min=7, max=314790, avg=2259.28, stdev=2509.22
clat percentiles (usec):
| 1.00th=[ 52], 5.00th=[ 70], 10.00th=[ 81], 20.00th=[ 106],
| 30.00th=[ 225], 40.00th=[ 1057], 50.00th=[ 1713], 60.00th=[ 2606],
| 70.00th=[ 3458], 80.00th=[ 4146], 90.00th=[ 4948], 95.00th=[ 5669],
| 99.00th=[ 8455], 99.50th=[12256], 99.90th=[25560], 99.95th=[30540],
| 99.99th=[39060]
bw ( KiB/s): min=51047, max=455592, per=100.00%, avg=113196.01, stdev=702.99, samples=38272
iops : min=12761, max=113897, avg=28297.59, stdev=175.73, samples=38272
lat (usec) : 10=0.01%, 20=0.01%, 50=0.80%, 100=16.73%, 250=12.93%
lat (usec) : 500=2.45%, 750=2.97%, 1000=3.37%
lat (msec) : 2=14.91%, 4=23.92%, 10=21.20%, 20=0.50%, 50=0.19%
lat (msec) : 100=0.01%, 250=0.01%, 500=0.01%
cpu : usr=0.31%, sys=7.39%, ctx=11163058, majf=0, minf=32449
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,8476060,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=110MiB/s (116MB/s), 110MiB/s-110MiB/s (116MB/s-116MB/s), io=32.3GiB (34.7GB), run=300004-300004msec
XFS did 189k IOPS while ZFS did 28.3k IOPS, an 85% drop, and throughput fell correspondingly. The CPUs are dual Xeon 6132s and the kernel on this machine is 4.15.0-62-generic, though I have seen the same effect on 5.x kernels.
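For anyone trying to reproduce this, watching the pool while fio runs makes the gap easier to see at the vdev level; a minimal sketch, assuming the pool name used above:

# run in a second shell during the ZFS fio run: per-vdev IOPS and bandwidth at a 1-second interval
zpool iostat -v nvme6 1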