Sequential reads from a large zfs pool are very slow

We are running a zfs pool as temporary storage for scientific data. It holds 24 10TB disks in 4 vdevs, each vdev made up of 6 disks in a raidz2 configuration (recordsize is 128K).

~ # zpool status
  pool: tank
 state: ONLINE
  scan: scrub canceled on Mon Jun  3 11:14:39 2019
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            wwn-0x5000cca25160d3c8  ONLINE       0     0     0
            wwn-0x5000cca25165cf30  ONLINE       0     0     0
            wwn-0x5000cca2516711a4  ONLINE       0     0     0
            wwn-0x5000cca251673b88  ONLINE       0     0     0
            wwn-0x5000cca251673b94  ONLINE       0     0     0
            wwn-0x5000cca251674214  ONLINE       0     0     0
          raidz2-1                  ONLINE       0     0     0
            wwn-0x5000cca251683628  ONLINE       0     0     0
            wwn-0x5000cca25168771c  ONLINE       0     0     0
            wwn-0x5000cca25168f234  ONLINE       0     0     0
            wwn-0x5000cca251692890  ONLINE       0     0     0
            wwn-0x5000cca251695484  ONLINE       0     0     0
            wwn-0x5000cca2516969b0  ONLINE       0     0     0
          raidz2-2                  ONLINE       0     0     0
            wwn-0x5000c500a774ba03  ONLINE       0     0     0
            wwn-0x5000c500a7800c3b  ONLINE       0     0     0
            wwn-0x5000c500a7800feb  ONLINE       0     0     0
            wwn-0x5000c500a7802abf  ONLINE       0     0     0
            wwn-0x5000c500a78033cb  ONLINE       0     0     0
            wwn-0x5000c500a78039c7  ONLINE       0     0     0
          raidz2-3                  ONLINE       0     0     0
            wwn-0x5000c500a780416b  ONLINE       0     0     0
            wwn-0x5000c500a7804733  ONLINE       0     0     0
            wwn-0x5000c500a7804797  ONLINE       0     0     0
            wwn-0x5000c500a7805df3  ONLINE       0     0     0
            wwn-0x5000c500a7806a0b  ONLINE       0     0     0
            wwn-0x5000c500a7807ccf  ONLINE       0     0     0

errors: No known data errors

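When we set this up a few months ago, performance looked fine, with rates between 500 MB/s and 1 GB/s. Since then we have noticed some performance problems, but we attributed them to other possible bottlenecks. Now we want to move the data to its final storage and find that we only get around 60 MB/s sequential (file size > 100GB) from the pool.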

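  • The data was written with this configuration in place and should have been distributed across the vdevs automatically
  • ashift is set to 12
  • We have ruled out a slow target as the cause. We can write random data to it faster, and we get the same slow speed when writing to a tmp directory on a local SSD
  • We have checked the speed of the individual drives and they look fine (see below).
  • A scrub runs every two weeks (a single run takes more than a week) and was interrupted for these tests.
  • The pool is about 80% full (USED 113T, AVAIL 23.7T)
  • top shows nothing suspicious; none of the 24 cores is maxed out and the system is idle most of the time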
  • zpool iostat 10 agrees with the rates measured by rsync --progress and time cp
  • RAM is not exhausted (see below)
  • The data is backed up to tape, but we would like to avoid downtime caused by "experiments"
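
The "about 80% full" figure in the list above follows directly from the USED and AVAIL numbers. A minimal sketch of that arithmetic (`pool_fill_pct` is a hypothetical helper name, and both arguments are assumed to be in the same unit):

```shell
# Fill level of a pool in percent, from USED and AVAIL given in the same unit.
# pool_fill_pct is an illustrative helper, not part of the zfs tooling.
pool_fill_pct() {
    awk -v used="$1" -v avail="$2" \
        'BEGIN { printf "%.1f\n", 100 * used / (used + avail) }'
}

# Figures quoted in the list above: USED 113T, AVAIL 23.7T
pool_fill_pct 113 23.7
```

With the figures from the list this comes out slightly above 80%.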

Testing disk performance

~ # echo 1 > /proc/sys/vm/drop_caches
    for i in $(zpool status | grep wwn- | awk '{print $1}'); do                                                                   
      echo $i;dd if=/dev/disk/by-id/$i of=/dev/null status=progress bs=1G count=1 seek=1G; echo; echo; sleep 1      
    done
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.81339 s, 223 MB/s
... (similar rates for all 24 disks)
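
A note on the loop above: in dd, `seek=` offsets the output and `skip=` offsets the input, so with `of=/dev/null` the read still starts at the beginning of each disk. If the intent was to read 1 GiB starting 1 GiB into the device (my assumption), a variant could look like the sketch below. `dd_read_cmd` is a hypothetical helper that only prints the command, and `iflag=direct` is added so the page cache is bypassed.

```shell
# Print (do not run) a dd command that reads 1 GiB from a device, starting 1 GiB in.
# skip= counts input blocks of size bs; seek= would only affect the output side.
# iflag=direct asks for O_DIRECT reads so the page cache does not inflate the rate.
dd_read_cmd() {
    printf 'dd if=%s of=/dev/null bs=1M count=1024 skip=1024 iflag=direct status=progress\n' "$1"
}

# Dry run over the pool members reported by `zpool status`, if zpool is available.
# Pipe a printed line to `sh` to actually execute it.
if command -v zpool >/dev/null 2>&1; then
    zpool status | awk '/wwn-/ {print $1}' | while read -r id; do
        dd_read_cmd "/dev/disk/by-id/$id"
    done
fi
```

Printing the commands first keeps this safe to try on a production box; nothing touches the disks until a line is run explicitly.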

Memory:

~ # free -h
              total        used        free      shared  buff/cache   available
Mem:            62G        2.1G         35G         18M         25G         59G

All properties of the filesystem we are copying from (there is another, larger filesystem)

zfs get all tank/storage/bulk
NAME               PROPERTY               VALUE                              SOURCE
tank/storage/bulk  type                   filesystem                         -
tank/storage/bulk  creation               Fr Mär  1  9:48 2019              -
tank/storage/bulk  used                   3.96T                              -
tank/storage/bulk  available              23.7T                              -
tank/storage/bulk  referenced             3.96T                              -
tank/storage/bulk  compressratio          1.17x                              -
tank/storage/bulk  mounted                yes                                -
tank/storage/bulk  quota                  none                               default
tank/storage/bulk  reservation            none                               default
tank/storage/bulk  recordsize             128K                               default
tank/storage/bulk  mountpoint             /storage/bulk                      inherited from tank/storage
tank/storage/bulk  sharenfs               [email protected],[email protected]  local
tank/storage/bulk  checksum               on                                 default
tank/storage/bulk  compression            on                                 inherited from tank
tank/storage/bulk  atime                  off                                inherited from tank
tank/storage/bulk  devices                on                                 default
tank/storage/bulk  exec                   off                                inherited from tank
tank/storage/bulk  setuid                 off                                inherited from tank
tank/storage/bulk  readonly               off                                default
tank/storage/bulk  zoned                  off                                default
tank/storage/bulk  snapdir                hidden                             default
tank/storage/bulk  aclinherit             restricted                         default
tank/storage/bulk  canmount               on                                 default
tank/storage/bulk  xattr                  sa                                 inherited from tank
tank/storage/bulk  copies                 1                                  default
tank/storage/bulk  version                5                                  -
tank/storage/bulk  utf8only               off                                -
tank/storage/bulk  normalization          none                               -
tank/storage/bulk  casesensitivity        sensitive                          -
tank/storage/bulk  vscan                  off                                default
tank/storage/bulk  nbmand                 off                                default
tank/storage/bulk  sharesmb               off                                inherited from tank
tank/storage/bulk  refquota               none                               default
tank/storage/bulk  refreservation         none                               default
tank/storage/bulk  primarycache           all                                default
tank/storage/bulk  secondarycache         all                                default
tank/storage/bulk  usedbysnapshots        2.40M                              -
tank/storage/bulk  usedbydataset          3.96T                              -
tank/storage/bulk  usedbychildren         0                                  -
tank/storage/bulk  usedbyrefreservation   0                                  -
tank/storage/bulk  logbias                latency                            default
tank/storage/bulk  dedup                  off                                default
tank/storage/bulk  mlslabel               none                               default
tank/storage/bulk  sync                   standard                           default
tank/storage/bulk  refcompressratio       1.17x                              -
tank/storage/bulk  written                0                                  -
tank/storage/bulk  logicalused            4.55T                              -
tank/storage/bulk  logicalreferenced      4.55T                              -
tank/storage/bulk  filesystem_limit       none                               default
tank/storage/bulk  snapshot_limit         none                               default
tank/storage/bulk  filesystem_count       none                               default
tank/storage/bulk  snapshot_count         none                               default
tank/storage/bulk  snapdev                hidden                             default
tank/storage/bulk  acltype                posixacl                           inherited from tank
tank/storage/bulk  context                none                               default
tank/storage/bulk  fscontext              none                               default
tank/storage/bulk  defcontext             none                               default
tank/storage/bulk  rootcontext            none                               default
tank/storage/bulk  relatime               on                                 inherited from tank
tank/storage/bulk  redundant_metadata     all                                default
tank/storage/bulk  overlay                off                                default
tank/storage/bulk  com.sun:auto-snapshot  true                               local

Versions

srv01 ~ # apt policy zfsutils-linux               
zfsutils-linux:
  Installed: 0.6.5.6-0ubuntu27
  Candidate: 0.6.5.6-0ubuntu27
  Version table:
 *** 0.6.5.6-0ubuntu27 500
        500 http://de.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     0.6.5.6-0ubuntu8 500
        500 http://de.archive.ubuntu.com/ubuntu xenial/universe amd64 Packages
srv01 ~ # uname -a
Linux srv01 4.15.0-50-generic #54~16.04.1-Ubuntu SMP Wed May 8 15:55:19 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Disk utilization

~ # iostat -x 3  /dev/disk/by-id/wwn-0x????????????????
Linux 4.15.0-50-generic (bbo3102)       04.06.2019      _x86_64_        (48 CPU)

[...]

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.12    0.00    2.09    0.29    0.00   94.50

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdn               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdr               0.00     0.00  487.00    0.00 10744.00     0.00    44.12     0.56    1.16    1.16    0.00   0.51  24.93
sdt               1.67     0.00  484.33    0.00 12640.00     0.00    52.20     0.52    1.09    1.09    0.00   0.44  21.47
sdu               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdv               0.00     0.00    0.67    0.00     8.00     0.00    24.00     0.00    6.00    6.00    0.00   6.00   0.40
sdw               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdx               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdy               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdo               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdq               0.00     0.00  472.33    0.00 10812.00     0.00    45.78     0.64    1.35    1.35    0.00   0.59  27.73
sdb               0.00     0.00  469.33    0.00 10908.00     0.00    46.48     0.10    0.22    0.22    0.00   0.14   6.53
sdc               0.00     0.00  192.33    0.00  4696.00     0.00    48.83     0.05    0.25    0.25    0.00   0.12   2.27
sdg               0.33     0.00  281.33    0.00  6978.67     0.00    49.61     0.07    0.27    0.27    0.00   0.15   4.13
sdh               0.67     0.00  449.33    0.00 10524.00     0.00    46.84     0.16    0.36    0.36    0.00   0.17   7.73
sdj               0.00     0.00  271.33    0.00  6580.00     0.00    48.50     0.04    0.13    0.13    0.00   0.09   2.53
sdi               0.00     0.00  183.67    0.00  3928.00     0.00    42.77     0.07    0.36    0.36    0.00   0.23   4.27
sde               0.00     0.00  280.00    0.00  5860.00     0.00    41.86     0.10    0.36    0.36    0.00   0.22   6.27
sdf               0.00     0.00  177.33    0.00  4662.67     0.00    52.59     0.07    0.38    0.38    0.00   0.18   3.20
sdk               0.33     0.00  464.33    0.00 10498.67     0.00    45.22     0.05    0.10    0.10    0.00   0.07   3.47
sdp               0.00     0.00    0.67    0.00     8.00     0.00    24.00     0.00    4.00    4.00    0.00   4.00   0.27
sds               1.00     0.00  489.67    0.00 12650.67     0.00    51.67     0.16    0.34    0.34    0.00   0.16   7.87
sdl               0.00     0.00  464.67    0.00 10200.00     0.00    43.90     0.05    0.11    0.11    0.00   0.08   3.73
sdd               0.00     0.00  268.00    0.00  5509.33     0.00    41.11     0.07    0.26    0.26    0.00   0.18   4.93
sda               0.00     0.00  192.00    0.00  3928.00     0.00    40.92     0.03    0.17    0.17    0.00   0.09   1.73
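
To make the table above easier to read, the average size of a read request per disk (rkB/s divided by r/s) and the aggregate read rate can be computed from the device lines. This is a rough sketch: `iostat_read_summary` is a hypothetical helper, and it assumes the column order shown above (device, rrqm/s, wrqm/s, r/s, w/s, rkB/s, ...), which differs in newer sysstat versions.

```shell
# Summarise `iostat -x` device lines read from stdin.
# Assumed column order (as in the output above): $1 device, $4 r/s, $6 rkB/s.
iostat_read_summary() {
    awk '
        $1 ~ /^sd[a-z]+$/ && $4 > 0 {
            # average kB per read request on this device
            printf "%s %.1f\n", $1, $6 / $4
            total += $6
        }
        END {
            # sum of rkB/s over all busy devices, converted to MB/s
            printf "total %.1f\n", total / 1024
        }
    '
}
```

Feed it the device lines of a single report, for example by pasting them into `iostat_read_summary` on stdin. For the sdr line above this gives 10744 / 487, i.e. roughly 22 kB per request, which matches the avgrq-sz column (44.12 sectors of 512 bytes).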

Any ideas or suggestions for further testing?
