I have a server running Debian on a ZFS three-way mirror of Exos X18 18TB drives (ST18000NM001J).
While benchmarking it I found some surprising read rates under certain conditions.
But first, the setup: for benchmarking I created a dedicated dataset (rpool/benchmarking) with primarycache and secondarycache set to none, so that reads would not simply benchmark the cache, and compression set to off, so that writing streams of zeros would not give inflated rates. I then created three child datasets named "8k", "128k" and "1M", each with the corresponding recordsize.
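For reference, the dataset layout was roughly this (a sketch; the property sources match the zfs get all output further down, but the exact commands may have differed):

zfs create -o mountpoint=/benchmarking -o primarycache=none \
    -o secondarycache=none -o compression=off rpool/benchmarking
zfs create -o recordsize=8k rpool/benchmarking/8k
zfs create rpool/benchmarking/128k    # recordsize left at the 128K default
zfs create -o recordsize=1M rpool/benchmarking/1M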
Then I used the following dd script:
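# For each recordsize dataset (1M, 128k, 8k): write an ~8 GB zero-filled file with
# conv=fdatasync, read it back, then delete it; once with bs=4M and once with bs=4k.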
echo -e "bs=4M recordsize=1M\n"
dd if=/dev/zero of=/benchmarking/1M/ddfile bs=4M count=2000 conv=fdatasync
dd if=/benchmarking/1M/ddfile of=/dev/null bs=4M count=2000 #conv=fdatasync
rm /benchmarking/1M/ddfile
echo -e "------------------\n\n"
echo -e "bs=4k recordsize=1M\n"
dd if=/dev/zero of=/benchmarking/1M/ddfile bs=4k count=2000000 conv=fdatasync
dd if=/benchmarking/1M/ddfile of=/dev/null bs=4k count=2000000 #conv=fdatasync
rm /benchmarking/1M/ddfile
echo -e "------------------\n\n"
echo -e "bs=4M recordsize=128k\n"
dd if=/dev/zero of=/benchmarking/128k/ddfile bs=4M count=2000 conv=fdatasync
dd if=/benchmarking/128k/ddfile of=/dev/null bs=4M count=2000 #conv=fdatasync
rm /benchmarking/128k/ddfile
echo -e "------------------\n\n"
echo -e "bs=4k recordsize=128k\n"
dd if=/dev/zero of=/benchmarking/128k/ddfile bs=4k count=2000000 conv=fdatasync
dd if=/benchmarking/128k/ddfile of=/dev/null bs=4k count=2000000 #conv=fdatasync
rm /benchmarking/128k/ddfile
echo -e "------------------\n\n"
echo -e "bs=4M recordsize=8k\n"
dd if=/dev/zero of=/benchmarking/8k/ddfile bs=4M count=2000 conv=fdatasync
dd if=/benchmarking/8k/ddfile of=/dev/null bs=4M count=2000 #conv=fdatasync
rm /benchmarking/8k/ddfile
echo -e "------------------\n\n"
echo -e "bs=4k recordsize=8k\n"
dd if=/dev/zero of=/benchmarking/8k/ddfile bs=4k count=2000000 conv=fdatasync
dd if=/benchmarking/8k/ddfile of=/dev/null bs=4k count=2000000 #conv=fdatasync
rm /benchmarking/8k/ddfile
echo -e "------------------\n\n"
I got the following:
root@pbs:/benchmarking# ./dd_bench.sh
bs=4M recordsize=1M
2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 43.3219 s, 194 MB/s
2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 43.7647 s, 192 MB/s
------------------
bs=4k recordsize=1M
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 38.7432 s, 211 MB/s
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 5100.27 s, 1.6 MB/s
------------------
bs=4M recordsize=128k
2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 60.1265 s, 140 MB/s
2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 56.4249 s, 149 MB/s
------------------
bs=4k recordsize=128k
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 52.044 s, 157 MB/s
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 1242.29 s, 6.6 MB/s
------------------
bs=4M recordsize=8k
2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 111.594 s, 75.2 MB/s
2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 60.547 s, 139 MB/s
------------------
bs=4k recordsize=8k
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 96.3637 s, 85.0 MB/s
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 771.967 s, 10.6 MB/s
With a small block size (4k), the read rates are severely limited (between 1 and 10 MB/s). The write rates do not suffer in the same way.
Then I ran bonnie++ against all three datasets:
root@pbs:~# bonnie++ -d /benchmarking/1M/ -u root -n 160
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 2.00 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
pbs 63624M 527k 93 136m 6 56.1m 4 0k 3 2902k 3 168.4 21
Latency 12952us 27977us 3500ms 21656ms 599ms 990ms
Version 2.00 ------Sequential Create------ --------Random Create--------
pbs -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
160 163840 13 +++++ +++ 163840 6 163840 15 +++++ +++ 163840 6
Latency 277ms 2447us 353ms 287ms 27us 377ms
1.98,2.00,pbs,1,1665552058,63624M,,8192,5,527,93,139633,6,57398,4,0,3,2902,3,168.4,21,160,,,,,9606,13,+++++,+++,1264,6,9808,15,+++++,+++,1147,6,12952us,27977us,3500ms,21656ms,599ms,990ms,277ms,2447us,353ms,287ms,27us,377ms
root@pbs:~# bonnie++ -d /benchmarking/128k/ -u root -n 160
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 2.00 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
pbs 63624M 525k 93 126m 6 44.1m 6 1k 7 10.3m 7 311.3 41
Latency 13067us 17678us 2688ms 6693ms 206ms 390ms
Version 2.00 ------Sequential Create------ --------Random Create--------
pbs -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
160 163840 13 +++++ +++ 163840 6 163840 14 +++++ +++ 163840 6
Latency 284ms 2643us 328ms 266ms 21us 356ms
1.98,2.00,pbs,1,1665335428,63624M,,8192,5,525,93,128601,6,45110,6,1,7,10548,7,311.3,41,160,,,,,8118,13,+++++,+++,1248,6,9634,14,+++++,+++,1173,6,13067us,17678us,2688ms,6693ms,206ms,390ms,284ms,2643us,328ms,266ms,21us,356ms
root@pbs:~# bonnie++ -d /benchmarking/8k/ -u root -n 160
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 2.00 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
pbs 63624M 528k 97 80.2m 6 54.4m 8 1k 4 15.1m 5 264.7 37
Latency 14231us 982us 1535ms 5087ms 342ms 284ms
Version 2.00 ------Sequential Create------ --------Random Create--------
pbs -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
160 163840 13 +++++ +++ 163840 6 163840 14 +++++ +++ 163840 6
Latency 334ms 100us 325ms 311ms 27us 353ms
1.98,2.00,pbs,1,1668749456,63624M,,8192,5,528,97,82088,6,55756,8,1,4,15510,5,264.7,37,160,,,,,9254,13,+++++,+++,1276,6,9582,14,+++++,+++,1066,6,14231us,982us,1535ms,5087ms,342ms,284ms,334ms,100us,325ms,311ms,27us,353ms
Here, too, the read rates are very low, just as with dd: the sequential block input figures work out to roughly 3, 10 and 15 MB/s for the 1M, 128k and 8k datasets respectively.
As a final step, I ran another dd benchmark (dd_bench_2.sh), this time aligning the dd block size with the ZFS recordsize.
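The second script follows the same pattern as the first one; roughly like this (block counts reconstructed from the output that follows):

echo -e "bs=1M recordsize=1M\n"
dd if=/dev/zero of=/benchmarking/1M/ddfile bs=1M count=8000 conv=fdatasync
dd if=/benchmarking/1M/ddfile of=/dev/null bs=1M count=8000
rm /benchmarking/1M/ddfile
echo -e "------------------\n\n"
echo -e "bs=128k recordsize=128k\n"
dd if=/dev/zero of=/benchmarking/128k/ddfile bs=128k count=64000 conv=fdatasync
dd if=/benchmarking/128k/ddfile of=/dev/null bs=128k count=64000
rm /benchmarking/128k/ddfile
echo -e "------------------\n\n"
echo -e "bs=8k recordsize=8k\n"
dd if=/dev/zero of=/benchmarking/8k/ddfile bs=8k count=1000000 conv=fdatasync
dd if=/benchmarking/8k/ddfile of=/dev/null bs=8k count=1000000
rm /benchmarking/8k/ddfile
echo -e "------------------\n\n"

Its output: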
root@pbs:~# /benchmarking/dd_bench_2.sh
bs=1M recordsize=1M
8000+0 records in
8000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 62.6119 s, 134 MB/s
8000+0 records in
8000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 65.2772 s, 129 MB/s
------------------
bs=128k recordsize=128k
64000+0 records in
64000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 64.6437 s, 130 MB/s
64000+0 records in
64000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 49.128 s, 171 MB/s
------------------
bs=8k recordsize=8k
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 108.331 s, 75.6 MB/s
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 344.981 s, 23.7 MB/s
------------------
That is a substantial improvement, but I would still expect the 8k reads to be faster.
I then set atime to off and repeated the last test, but nothing changed much (the 1M dataset had already been at atime=off the whole time, sorry about that).
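The property change itself was roughly:

zfs set atime=off rpool/benchmarking/8k
zfs set atime=off rpool/benchmarking/128k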
root@pbs:~# /benchmarking/dd_bench_2.sh
bs=1M recordsize=1M
8000+0 records in
8000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 44.505 s, 188 MB/s
8000+0 records in
8000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 40.3689 s, 208 MB/s
------------------
bs=128k recordsize=128k
64000+0 records in
64000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 67.7169 s, 124 MB/s
64000+0 records in
64000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 56.0657 s, 150 MB/s
------------------
bs=8k recordsize=8k
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 103.724 s, 79.0 MB/s
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 343.753 s, 23.8 MB/s
So, trying to summarize:
- Why are my read rates so low with bonnie++ and with small-bs dd?
- The read rates are almost always equal to or lower than the write rates. How can that be on a three-way mirror? The system can read from the three devices in parallel, but it has to write three times the data.
As additional information: the server uses enterprise disks but a consumer-grade (not low-end, but consumer) motherboard, with the disks attached to the motherboard's SATA controller. I know it's a cheap SATA controller, but it is still odd that reads are sometimes so slow while writes are consistently fine.
Also, I have already checked that the drives are not SMR, and I repeated these tests on a similar server with comparable hardware and setup, getting similar results.
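For what it's worth, the SMR check was just a matter of reading the drive model and looking it up in Seagate's documentation (the Exos X18 family is specified as CMR); /dev/sda here is an example device name:

smartctl -i /dev/sda | grep -i model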
Finally, here is zfs get all from one of the benchmarking datasets:
root@pbs:~# zfs get all rpool/benchmarking/128k
NAME PROPERTY VALUE SOURCE
rpool/benchmarking/128k type filesystem -
rpool/benchmarking/128k creation Wed Oct 26 8:45 2022 -
rpool/benchmarking/128k used 96K -
rpool/benchmarking/128k available 12.5T -
rpool/benchmarking/128k referenced 96K -
rpool/benchmarking/128k compressratio 1.00x -
rpool/benchmarking/128k mounted yes -
rpool/benchmarking/128k quota none default
rpool/benchmarking/128k reservation none default
rpool/benchmarking/128k recordsize 128K default
rpool/benchmarking/128k mountpoint /benchmarking/128k inherited from rpool/benchmarking
rpool/benchmarking/128k sharenfs off default
rpool/benchmarking/128k checksum on default
rpool/benchmarking/128k compression off inherited from rpool/benchmarking
rpool/benchmarking/128k atime off local
rpool/benchmarking/128k devices on default
rpool/benchmarking/128k exec on default
rpool/benchmarking/128k setuid on default
rpool/benchmarking/128k readonly off default
rpool/benchmarking/128k zoned off default
rpool/benchmarking/128k snapdir hidden default
rpool/benchmarking/128k aclmode discard default
rpool/benchmarking/128k aclinherit restricted default
rpool/benchmarking/128k createtxg 255400 -
rpool/benchmarking/128k canmount on default
rpool/benchmarking/128k xattr on default
rpool/benchmarking/128k copies 1 default
rpool/benchmarking/128k version 5 -
rpool/benchmarking/128k utf8only off -
rpool/benchmarking/128k normalization none -
rpool/benchmarking/128k casesensitivity sensitive -
rpool/benchmarking/128k vscan off default
rpool/benchmarking/128k nbmand off default
rpool/benchmarking/128k sharesmb off default
rpool/benchmarking/128k refquota none default
rpool/benchmarking/128k refreservation none default
rpool/benchmarking/128k guid 13557460337392366562 -
rpool/benchmarking/128k primarycache none inherited from rpool/benchmarking
rpool/benchmarking/128k secondarycache none inherited from rpool/benchmarking
rpool/benchmarking/128k usedbysnapshots 0B -
rpool/benchmarking/128k usedbydataset 96K -
rpool/benchmarking/128k usedbychildren 0B -
rpool/benchmarking/128k usedbyrefreservation 0B -
rpool/benchmarking/128k logbias latency default
rpool/benchmarking/128k objsetid 60174 -
rpool/benchmarking/128k dedup off default
rpool/benchmarking/128k mlslabel none default
rpool/benchmarking/128k sync standard inherited from rpool
rpool/benchmarking/128k dnodesize legacy default
rpool/benchmarking/128k refcompressratio 1.00x -
rpool/benchmarking/128k written 96K -
rpool/benchmarking/128k logicalused 42K -
rpool/benchmarking/128k logicalreferenced 42K -
rpool/benchmarking/128k volmode default default
rpool/benchmarking/128k filesystem_limit none default
rpool/benchmarking/128k snapshot_limit none default
rpool/benchmarking/128k filesystem_count none default
rpool/benchmarking/128k snapshot_count none default
rpool/benchmarking/128k snapdev hidden default
rpool/benchmarking/128k acltype off default
rpool/benchmarking/128k context none default
rpool/benchmarking/128k fscontext none default
rpool/benchmarking/128k defcontext none default
rpool/benchmarking/128k rootcontext none default
rpool/benchmarking/128k relatime on inherited from rpool
rpool/benchmarking/128k redundant_metadata all default
rpool/benchmarking/128k overlay on default
rpool/benchmarking/128k encryption off default
rpool/benchmarking/128k keylocation none default
rpool/benchmarking/128k keyformat none default
rpool/benchmarking/128k pbkdf2iters 0 default
rpool/benchmarking/128k special_small_blocks 0 default
Thanks for your time!
Edit: ashift is correctly set to 12, deduplication is off, and fragmentation is 0%.
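For reference, these were verified roughly as follows (pool name rpool):

zdb -C rpool | grep ashift       # per-vdev ashift from the pool config, 12 here
zfs get dedup rpool              # off
zpool get fragmentation rpool    # 0%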