We are running a ZFS pool as intermediate storage for scientific data: 24 × 10 TB disks in 4 vdevs, each vdev a 6-disk raidz2 (recordsize 128K).
~ # zpool status
  pool: tank
 state: ONLINE
  scan: scrub canceled on Mon Jun 3 11:14:39 2019
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            wwn-0x5000cca25160d3c8  ONLINE       0     0     0
            wwn-0x5000cca25165cf30  ONLINE       0     0     0
            wwn-0x5000cca2516711a4  ONLINE       0     0     0
            wwn-0x5000cca251673b88  ONLINE       0     0     0
            wwn-0x5000cca251673b94  ONLINE       0     0     0
            wwn-0x5000cca251674214  ONLINE       0     0     0
          raidz2-1                  ONLINE       0     0     0
            wwn-0x5000cca251683628  ONLINE       0     0     0
            wwn-0x5000cca25168771c  ONLINE       0     0     0
            wwn-0x5000cca25168f234  ONLINE       0     0     0
            wwn-0x5000cca251692890  ONLINE       0     0     0
            wwn-0x5000cca251695484  ONLINE       0     0     0
            wwn-0x5000cca2516969b0  ONLINE       0     0     0
          raidz2-2                  ONLINE       0     0     0
            wwn-0x5000c500a774ba03  ONLINE       0     0     0
            wwn-0x5000c500a7800c3b  ONLINE       0     0     0
            wwn-0x5000c500a7800feb  ONLINE       0     0     0
            wwn-0x5000c500a7802abf  ONLINE       0     0     0
            wwn-0x5000c500a78033cb  ONLINE       0     0     0
            wwn-0x5000c500a78039c7  ONLINE       0     0     0
          raidz2-3                  ONLINE       0     0     0
            wwn-0x5000c500a780416b  ONLINE       0     0     0
            wwn-0x5000c500a7804733  ONLINE       0     0     0
            wwn-0x5000c500a7804797  ONLINE       0     0     0
            wwn-0x5000c500a7805df3  ONLINE       0     0     0
            wwn-0x5000c500a7806a0b  ONLINE       0     0     0
            wwn-0x5000c500a7807ccf  ONLINE       0     0     0

errors: No known data errors
When we set this up a few months ago, performance looked fine, with rates between 500 MB/s and 1 GB/s. We did notice some performance problems in the meantime, but attributed them to other possible bottlenecks. Now that we want to move the data to its final storage, we find we only get around 60 MB/s for sequential reads (file sizes > 100 GB) from the pool.
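For scale: assuming roughly 200 MB/s sequential per disk (which matches the single-disk dd results below), a naive upper bound for streaming reads from this layout would be

    4 vdevs × 4 data disks each (6 − 2 parity) × ~200 MB/s ≈ 3.2 GB/s

so 60 MB/s is well below even a single disk's sequential rate.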
- The data was written while the pool was already in this configuration, so it should be spread across all vdevs automatically
- ashift is set to 12
- We have ruled out a slow destination: we can write random data to it considerably faster, and we see the same slow rates when copying to a local SSD tmp directory
- We have checked the read speed of the individual drives, and they look fine (see below)
- A scrub runs every two weeks (a single run takes more than a week by itself); it was interrupted for these tests
- The pool is filled to about 80% (USED 113T, AVAIL 23.7T)
- top shows nothing suspicious: none of the 24 cores is maxed out, and the system is mostly idle
- The rates reported by zpool iostat 10 match those measured via rsync --progress and time cp (a minimal cross-check sketch follows this list)
- RAM is not exhausted (see below)
- The data is backed up to tape, but we would like to avoid downtime caused by "experiments"
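A minimal sketch of the cross-check mentioned above (the file name and SSD mountpoint are placeholders):

~ # echo 1 > /proc/sys/vm/drop_caches                  # drop the page cache first
~ # zpool iostat tank 10 &                             # pool-wide read rate every 10 s
~ # time cp /storage/bulk/big_file.dat /mnt/ssd-tmp/   # compare against cp's effective rate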
Testing disk performance
~ # echo 1 > /proc/sys/vm/drop_caches
~ # for i in $(zpool status | grep wwn- | awk '{print $1}'); do
      echo "$i"
      # read 1 GiB per disk; skip (not seek) applies to the input, so this starts 1 GiB into the disk
      dd if=/dev/disk/by-id/"$i" of=/dev/null bs=1G count=1 skip=1 status=progress
      echo; echo; sleep 1
    done
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.81339 s, 223 MB/s
... (similar rates for all 24 disks)
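Since the single-disk rates look fine in isolation, an aggregate test would additionally rule out an HBA/expander bottleneck: read from all 24 disks in parallel and compare the total against 24 × the single-disk rate. A sketch, using the same device paths as above:

~ # echo 1 > /proc/sys/vm/drop_caches
~ # for i in $(zpool status | grep wwn- | awk '{print $1}'); do
      # 4 GiB per disk at a 1 GiB offset, all reads in the background
      dd if=/dev/disk/by-id/"$i" of=/dev/null bs=1M count=4096 skip=1024 &
    done; wait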
Memory:
~ # free -h
total used free shared buff/cache available
Mem: 62G 2.1G 35G 18M 25G 59G
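Note that on ZFS on Linux the ARC is not part of buff/cache; its current size shows up in the kstats instead:

~ # grep ^size /proc/spl/kstat/zfs/arcstats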
All properties of the filesystem we are copying from (there is a second, larger filesystem on the same pool):
~ # zfs get all tank/storage/bulk
NAME PROPERTY VALUE SOURCE
tank/storage/bulk type filesystem -
tank/storage/bulk creation Fr Mär 1 9:48 2019 -
tank/storage/bulk used 3.96T -
tank/storage/bulk available 23.7T -
tank/storage/bulk referenced 3.96T -
tank/storage/bulk compressratio 1.17x -
tank/storage/bulk mounted yes -
tank/storage/bulk quota none default
tank/storage/bulk reservation none default
tank/storage/bulk recordsize 128K default
tank/storage/bulk mountpoint /storage/bulk inherited from tank/storage
tank/storage/bulk sharenfs [email protected],[email protected] local
tank/storage/bulk checksum on default
tank/storage/bulk compression on inherited from tank
tank/storage/bulk atime off inherited from tank
tank/storage/bulk devices on default
tank/storage/bulk exec off inherited from tank
tank/storage/bulk setuid off inherited from tank
tank/storage/bulk readonly off default
tank/storage/bulk zoned off default
tank/storage/bulk snapdir hidden default
tank/storage/bulk aclinherit restricted default
tank/storage/bulk canmount on default
tank/storage/bulk xattr sa inherited from tank
tank/storage/bulk copies 1 default
tank/storage/bulk version 5 -
tank/storage/bulk utf8only off -
tank/storage/bulk normalization none -
tank/storage/bulk casesensitivity sensitive -
tank/storage/bulk vscan off default
tank/storage/bulk nbmand off default
tank/storage/bulk sharesmb off inherited from tank
tank/storage/bulk refquota none default
tank/storage/bulk refreservation none default
tank/storage/bulk primarycache all default
tank/storage/bulk secondarycache all default
tank/storage/bulk usedbysnapshots 2.40M -
tank/storage/bulk usedbydataset 3.96T -
tank/storage/bulk usedbychildren 0 -
tank/storage/bulk usedbyrefreservation 0 -
tank/storage/bulk logbias latency default
tank/storage/bulk dedup off default
tank/storage/bulk mlslabel none default
tank/storage/bulk sync standard default
tank/storage/bulk refcompressratio 1.17x -
tank/storage/bulk written 0 -
tank/storage/bulk logicalused 4.55T -
tank/storage/bulk logicalreferenced 4.55T -
tank/storage/bulk filesystem_limit none default
tank/storage/bulk snapshot_limit none default
tank/storage/bulk filesystem_count none default
tank/storage/bulk snapshot_count none default
tank/storage/bulk snapdev hidden default
tank/storage/bulk acltype posixacl inherited from tank
tank/storage/bulk context none default
tank/storage/bulk fscontext none default
tank/storage/bulk defcontext none default
tank/storage/bulk rootcontext none default
tank/storage/bulk relatime on inherited from tank
tank/storage/bulk redundant_metadata all default
tank/storage/bulk overlay off default
tank/storage/bulk com.sun:auto-snapshot true local
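Given that the workload is purely sequential, one suspect worth checking on this ZFS version is whether file-level prefetch is enabled and actually hitting; both can be inspected via the module parameter and the prefetch kstats:

~ # cat /sys/module/zfs/parameters/zfs_prefetch_disable   # 0 = prefetch enabled
~ # cat /proc/spl/kstat/zfs/zfetchstats                   # predictive prefetcher hit/miss counters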
Versions
srv01 ~ # apt policy zfsutils-linux
zfsutils-linux:
Installed: 0.6.5.6-0ubuntu27
Candidate: 0.6.5.6-0ubuntu27
Version table:
*** 0.6.5.6-0ubuntu27 500
500 http://de.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
100 /var/lib/dpkg/status
0.6.5.6-0ubuntu8 500
500 http://de.archive.ubuntu.com/ubuntu xenial/universe amd64 Packages
srv01 ~ # uname -a
Linux srv01 4.15.0-50-generic #54~16.04.1-Ubuntu SMP Wed May 8 15:55:19 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Disk utilization
~ # iostat -x 3 /dev/disk/by-id/wwn-0x????????????????
Linux 4.15.0-50-generic (bbo3102) 04.06.2019 _x86_64_ (48 CPU)
[...]
avg-cpu: %user %nice %system %iowait %steal %idle
3.12 0.00 2.09 0.29 0.00 94.50
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdn 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdr 0.00 0.00 487.00 0.00 10744.00 0.00 44.12 0.56 1.16 1.16 0.00 0.51 24.93
sdt 1.67 0.00 484.33 0.00 12640.00 0.00 52.20 0.52 1.09 1.09 0.00 0.44 21.47
sdu 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdv 0.00 0.00 0.67 0.00 8.00 0.00 24.00 0.00 6.00 6.00 0.00 6.00 0.40
sdw 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdx 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdy 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdq 0.00 0.00 472.33 0.00 10812.00 0.00 45.78 0.64 1.35 1.35 0.00 0.59 27.73
sdb 0.00 0.00 469.33 0.00 10908.00 0.00 46.48 0.10 0.22 0.22 0.00 0.14 6.53
sdc 0.00 0.00 192.33 0.00 4696.00 0.00 48.83 0.05 0.25 0.25 0.00 0.12 2.27
sdg 0.33 0.00 281.33 0.00 6978.67 0.00 49.61 0.07 0.27 0.27 0.00 0.15 4.13
sdh 0.67 0.00 449.33 0.00 10524.00 0.00 46.84 0.16 0.36 0.36 0.00 0.17 7.73
sdj 0.00 0.00 271.33 0.00 6580.00 0.00 48.50 0.04 0.13 0.13 0.00 0.09 2.53
sdi 0.00 0.00 183.67 0.00 3928.00 0.00 42.77 0.07 0.36 0.36 0.00 0.23 4.27
sde 0.00 0.00 280.00 0.00 5860.00 0.00 41.86 0.10 0.36 0.36 0.00 0.22 6.27
sdf 0.00 0.00 177.33 0.00 4662.67 0.00 52.59 0.07 0.38 0.38 0.00 0.18 3.20
sdk 0.33 0.00 464.33 0.00 10498.67 0.00 45.22 0.05 0.10 0.10 0.00 0.07 3.47
sdp 0.00 0.00 0.67 0.00 8.00 0.00 24.00 0.00 4.00 4.00 0.00 4.00 0.27
sds 1.00 0.00 489.67 0.00 12650.67 0.00 51.67 0.16 0.34 0.34 0.00 0.16 7.87
sdl 0.00 0.00 464.67 0.00 10200.00 0.00 43.90 0.05 0.11 0.11 0.00 0.08 3.73
sdd 0.00 0.00 268.00 0.00 5509.33 0.00 41.11 0.07 0.26 0.26 0.00 0.18 4.93
sda 0.00 0.00 192.00 0.00 3928.00 0.00 40.92 0.03 0.17 0.17 0.00 0.09 1.73
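For reference, the per-request read size can be derived from the columns above: avgrq-sz is given in 512-byte sectors, so for sdr it is 44.12 × 512 B ≈ 22 KiB, consistent with rkB/s ÷ r/s = 10744 ÷ 487 ≈ 22 KiB. In other words, each active disk is serving many small reads rather than large sequential ones, and several disks show essentially no reads at all in this sample.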
Any ideas or suggestions for further tests?