Slow reads on a ZFS mirror (slower than writes, and extremely slow at small block sizes)

I have a server running Debian on a ZFS three-way mirror of Exos X18 18TB drives (ST18000NM001J).

I have been benchmarking it and found some surprising read rates under certain conditions.

First, the benchmarking setup: I created a benchmarking dataset (rpool/benchmarking) with primarycache and secondarycache set to none, to avoid benchmarking the cache on reads, and compression set to off, to avoid inflated rates when writing streams of zeros. I then created three child datasets named "8k", "128k" and "1M", each with the corresponding recordsize.
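For reference, the setup was along these lines (a sketch, not the exact commands I ran; note that 128K is already the default recordsize):

zfs create -o mountpoint=/benchmarking -o primarycache=none -o secondarycache=none -o compression=off rpool/benchmarking
zfs create -o recordsize=8k   rpool/benchmarking/8k
zfs create -o recordsize=128k rpool/benchmarking/128k
zfs create -o recordsize=1M   rpool/benchmarking/1M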

I then used the following dd script:

echo -e "bs=4M recordsize=1M\n"
dd if=/dev/zero of=/benchmarking/1M/ddfile bs=4M count=2000 conv=fdatasync
dd if=/benchmarking/1M/ddfile of=/dev/null bs=4M count=2000 #conv=fdatasync
rm /benchmarking/1M/ddfile
echo -e "------------------\n\n"

echo -e "bs=4k recordsize=1M\n"
dd if=/dev/zero of=/benchmarking/1M/ddfile bs=4k count=2000000 conv=fdatasync
dd if=/benchmarking/1M/ddfile of=/dev/null bs=4k count=2000000 #conv=fdatasync
rm /benchmarking/1M/ddfile
echo -e "------------------\n\n"

echo -e "bs=4M recordsize=128k\n"
dd if=/dev/zero of=/benchmarking/128k/ddfile bs=4M count=2000 conv=fdatasync
dd if=/benchmarking/128k/ddfile of=/dev/null bs=4M count=2000 #conv=fdatasync
rm /benchmarking/128k/ddfile
echo -e "------------------\n\n"

echo -e "bs=4k recordsize=128k\n"
dd if=/dev/zero of=/benchmarking/128k/ddfile bs=4k count=2000000 conv=fdatasync
dd if=/benchmarking/128k/ddfile of=/dev/null bs=4k count=2000000 #conv=fdatasync
rm /benchmarking/128k/ddfile
echo -e "------------------\n\n"

echo -e "bs=4M recordsize=8k\n"
dd if=/dev/zero of=/benchmarking/8k/ddfile bs=4M count=2000 conv=fdatasync
dd if=/benchmarking/8k/ddfile of=/dev/null bs=4M count=2000 #conv=fdatasync
rm /benchmarking/8k/ddfile
echo -e "------------------\n\n"

echo -e "bs=4k recordsize=8k\n"
dd if=/dev/zero of=/benchmarking/8k/ddfile bs=4k count=2000000 conv=fdatasync
dd if=/benchmarking/8k/ddfile of=/dev/null bs=4k count=2000000 #conv=fdatasync
rm /benchmarking/8k/ddfile
echo -e "------------------\n\n"

and got the following results:

root@pbs:/benchmarking# ./dd_bench.sh
bs=4M recordsize=1M

2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 43.3219 s, 194 MB/s
2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 43.7647 s, 192 MB/s
------------------

bs=4k recordsize=1M

2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 38.7432 s, 211 MB/s
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 5100.27 s, 1.6 MB/s
------------------

bs=4M recordsize=128k

2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 60.1265 s, 140 MB/s
2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 56.4249 s, 149 MB/s
------------------

bs=4k recordsize=128k

2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 52.044 s, 157 MB/s
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 1242.29 s, 6.6 MB/s
------------------

bs=4M recordsize=8k

2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 111.594 s, 75.2 MB/s
2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 60.547 s, 139 MB/s
------------------

bs=4k recordsize=8k

2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 96.3637 s, 85.0 MB/s
2000000+0 records in
2000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 771.967 s, 10.6 MB/s

With a small block size (4 KiB), read speeds are extremely limited (between 1 and 10 MB/s). Write speeds are not affected in the same way.

I then ran bonnie++ against all three datasets:

root@pbs:~# bonnie++ -d /benchmarking/1M/ -u root -n 160
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
pbs          63624M  527k  93  136m   6 56.1m   4    0k   3 2902k   3 168.4  21
Latency             12952us   27977us    3500ms   21656ms     599ms     990ms
Version  2.00       ------Sequential Create------ --------Random Create--------
pbs                 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                160 163840  13 +++++ +++ 163840   6 163840  15 +++++ +++ 163840   6
Latency               277ms    2447us     353ms     287ms      27us     377ms
1.98,2.00,pbs,1,1665552058,63624M,,8192,5,527,93,139633,6,57398,4,0,3,2902,3,168.4,21,160,,,,,9606,13,+++++,+++,1264,6,9808,15,+++++,+++,1147,6,12952us,27977us,3500ms,21656ms,599ms,990ms,277ms,2447us,353ms,287ms,27us,377ms




root@pbs:~# bonnie++ -d /benchmarking/128k/ -u root -n 160
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
pbs          63624M  525k  93  126m   6 44.1m   6    1k   7 10.3m   7 311.3  41
Latency             13067us   17678us    2688ms    6693ms     206ms     390ms
Version  2.00       ------Sequential Create------ --------Random Create--------
pbs                 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                160 163840  13 +++++ +++ 163840   6 163840  14 +++++ +++ 163840   6
Latency               284ms    2643us     328ms     266ms      21us     356ms
1.98,2.00,pbs,1,1665335428,63624M,,8192,5,525,93,128601,6,45110,6,1,7,10548,7,311.3,41,160,,,,,8118,13,+++++,+++,1248,6,9634,14,+++++,+++,1173,6,13067us,17678us,2688ms,6693ms,206ms,390ms,284ms,2643us,328ms,266ms,21us,356ms




root@pbs:~# bonnie++ -d /benchmarking/8k/ -u root -n 160
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
pbs          63624M  528k  97 80.2m   6 54.4m   8    1k   4 15.1m   5 264.7  37
Latency             14231us     982us    1535ms    5087ms     342ms     284ms
Version  2.00       ------Sequential Create------ --------Random Create--------
pbs                 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                160 163840  13 +++++ +++ 163840   6 163840  14 +++++ +++ 163840   6
Latency               334ms     100us     325ms     311ms      27us     353ms
1.98,2.00,pbs,1,1668749456,63624M,,8192,5,528,97,82088,6,55756,8,1,4,15510,5,264.7,37,160,,,,,9254,13,+++++,+++,1276,6,9582,14,+++++,+++,1066,6,14231us,982us,1535ms,5087ms,342ms,284ms,334ms,100us,325ms,311ms,27us,353ms

Here too the read rates are very low, just as with dd (3, 10 and 15 MB/s).

As a final step, I ran another dd benchmark, this time aligning the dd block size with the ZFS recordsize:
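(dd_bench_2.sh is not reproduced verbatim here; it is essentially the first script with bs matched to each dataset's recordsize, roughly as follows, with the counts taken from the output below:)

echo -e "bs=1M recordsize=1M\n"
dd if=/dev/zero of=/benchmarking/1M/ddfile bs=1M count=8000 conv=fdatasync
dd if=/benchmarking/1M/ddfile of=/dev/null bs=1M count=8000
rm /benchmarking/1M/ddfile
echo -e "------------------\n\n"

echo -e "bs=128k recordsize=128k\n"
dd if=/dev/zero of=/benchmarking/128k/ddfile bs=128k count=64000 conv=fdatasync
dd if=/benchmarking/128k/ddfile of=/dev/null bs=128k count=64000
rm /benchmarking/128k/ddfile
echo -e "------------------\n\n"

echo -e "bs=8k recordsize=8k\n"
dd if=/dev/zero of=/benchmarking/8k/ddfile bs=8k count=1000000 conv=fdatasync
dd if=/benchmarking/8k/ddfile of=/dev/null bs=8k count=1000000
rm /benchmarking/8k/ddfile
echo -e "------------------\n\n"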

root@pbs:~# /benchmarking/dd_bench_2.sh
bs=1M recordsize=1M

8000+0 records in
8000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 62.6119 s, 134 MB/s
8000+0 records in
8000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 65.2772 s, 129 MB/s
------------------

bs=128k recordsize=128k

64000+0 records in
64000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 64.6437 s, 130 MB/s
64000+0 records in
64000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 49.128 s, 171 MB/s
------------------

bs=8k recordsize=8k

1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 108.331 s, 75.6 MB/s
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 344.981 s, 23.7 MB/s
------------------

This is a significant improvement, but I would still expect faster reads on the 8k dataset.

I then set atime to off and repeated the last test, but nothing changed much. (The 1M dataset had already had atime=off the whole time, sorry about that.)
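(Concretely, the change was just something like the following; since the 1M dataset was already atime=off, only the other two datasets were touched:)

zfs set atime=off rpool/benchmarking/128k
zfs set atime=off rpool/benchmarking/8k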

root@pbs:~# /benchmarking/dd_bench_2.sh
bs=1M recordsize=1M

8000+0 records in
8000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 44.505 s, 188 MB/s
8000+0 records in
8000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 40.3689 s, 208 MB/s
------------------

bs=128k recordsize=128k

64000+0 records in
64000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 67.7169 s, 124 MB/s
64000+0 records in
64000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 56.0657 s, 150 MB/s
------------------

bs=8k recordsize=8k

1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 103.724 s, 79.0 MB/s
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 343.753 s, 23.8 MB/s

So, to try to summarise:

  • Why am I getting such slow read speeds with bonnie++ and with small-bs dd?
  • Read speeds are almost always equal to or lower than write speeds. How can that be on a three-way mirror? The system can read from all three devices in parallel, but it has to write 3x the data.


As additional information: the server runs enterprise-grade disks on a consumer-grade (not low-end, but consumer) motherboard, with the disks attached to the onboard SATA controller. I know those are low-end SATA controllers, but it is still strange that reads are sometimes so slow while writes are consistently fine.

Also, I have already checked that the drives are not SMR, and I have repeated the tests described here on a similar server with similar hardware/setup, getting similar results.
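(For what it's worth, besides looking up the model number in the vendor documentation, a quick OS-level sanity check is something like the following; note that the zoned attribute only reveals host-aware/host-managed SMR, not drive-managed SMR:)

lsblk -d -o NAME,MODEL,ROTA
grep . /sys/block/sd*/queue/zoned   # "none" means a conventional (non-zoned) drive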

Finally, here is zfs get all for one of the benchmarking datasets:

root@pbs:~# zfs get all rpool/benchmarking/128k
NAME                     PROPERTY              VALUE                  SOURCE
rpool/benchmarking/128k  type                  filesystem             -
rpool/benchmarking/128k  creation              Wed Oct 26  8:45 2022  -
rpool/benchmarking/128k  used                  96K                    -
rpool/benchmarking/128k  available             12.5T                  -
rpool/benchmarking/128k  referenced            96K                    -
rpool/benchmarking/128k  compressratio         1.00x                  -
rpool/benchmarking/128k  mounted               yes                    -
rpool/benchmarking/128k  quota                 none                   default
rpool/benchmarking/128k  reservation           none                   default
rpool/benchmarking/128k  recordsize            128K                   default
rpool/benchmarking/128k  mountpoint            /benchmarking/128k     inherited from rpool/benchmarking
rpool/benchmarking/128k  sharenfs              off                    default
rpool/benchmarking/128k  checksum              on                     default
rpool/benchmarking/128k  compression           off                    inherited from rpool/benchmarking
rpool/benchmarking/128k  atime                 off                    local
rpool/benchmarking/128k  devices               on                     default
rpool/benchmarking/128k  exec                  on                     default
rpool/benchmarking/128k  setuid                on                     default
rpool/benchmarking/128k  readonly              off                    default
rpool/benchmarking/128k  zoned                 off                    default
rpool/benchmarking/128k  snapdir               hidden                 default
rpool/benchmarking/128k  aclmode               discard                default
rpool/benchmarking/128k  aclinherit            restricted             default
rpool/benchmarking/128k  createtxg             255400                 -
rpool/benchmarking/128k  canmount              on                     default
rpool/benchmarking/128k  xattr                 on                     default
rpool/benchmarking/128k  copies                1                      default
rpool/benchmarking/128k  version               5                      -
rpool/benchmarking/128k  utf8only              off                    -
rpool/benchmarking/128k  normalization         none                   -
rpool/benchmarking/128k  casesensitivity       sensitive              -
rpool/benchmarking/128k  vscan                 off                    default
rpool/benchmarking/128k  nbmand                off                    default
rpool/benchmarking/128k  sharesmb              off                    default
rpool/benchmarking/128k  refquota              none                   default
rpool/benchmarking/128k  refreservation        none                   default
rpool/benchmarking/128k  guid                  13557460337392366562   -
rpool/benchmarking/128k  primarycache          none                   inherited from rpool/benchmarking
rpool/benchmarking/128k  secondarycache        none                   inherited from rpool/benchmarking
rpool/benchmarking/128k  usedbysnapshots       0B                     -
rpool/benchmarking/128k  usedbydataset         96K                    -
rpool/benchmarking/128k  usedbychildren        0B                     -
rpool/benchmarking/128k  usedbyrefreservation  0B                     -
rpool/benchmarking/128k  logbias               latency                default
rpool/benchmarking/128k  objsetid              60174                  -
rpool/benchmarking/128k  dedup                 off                    default
rpool/benchmarking/128k  mlslabel              none                   default
rpool/benchmarking/128k  sync                  standard               inherited from rpool
rpool/benchmarking/128k  dnodesize             legacy                 default
rpool/benchmarking/128k  refcompressratio      1.00x                  -
rpool/benchmarking/128k  written               96K                    -
rpool/benchmarking/128k  logicalused           42K                    -
rpool/benchmarking/128k  logicalreferenced     42K                    -
rpool/benchmarking/128k  volmode               default                default
rpool/benchmarking/128k  filesystem_limit      none                   default
rpool/benchmarking/128k  snapshot_limit        none                   default
rpool/benchmarking/128k  filesystem_count      none                   default
rpool/benchmarking/128k  snapshot_count        none                   default
rpool/benchmarking/128k  snapdev               hidden                 default
rpool/benchmarking/128k  acltype               off                    default
rpool/benchmarking/128k  context               none                   default
rpool/benchmarking/128k  fscontext             none                   default
rpool/benchmarking/128k  defcontext            none                   default
rpool/benchmarking/128k  rootcontext           none                   default
rpool/benchmarking/128k  relatime              on                     inherited from rpool
rpool/benchmarking/128k  redundant_metadata    all                    default
rpool/benchmarking/128k  overlay               on                     default
rpool/benchmarking/128k  encryption            off                    default
rpool/benchmarking/128k  keylocation           none                   default
rpool/benchmarking/128k  keyformat             none                   default
rpool/benchmarking/128k  pbkdf2iters           0                      default
rpool/benchmarking/128k  special_small_blocks  0                      default

Thanks for your time!

Edit: ashift is correctly set to 12, deduplication is off, and fragmentation is 0%.
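(These can be confirmed with the standard property queries, e.g.:)

zpool get ashift,fragmentation,dedupratio rpool
zfs get dedup rpool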
