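While testing PostgreSQL performance on a new machine running 20.04 Server, I noticed that the results were very poor when PostgreSQL ran inside an LXD container. LXD runs on a ZFS pool.
I eventually went back to testing the disks with bonnie++, and here is what I found: the virtual filesystem introduces a huge latency.
Since the server has 128 GB of RAM, I ran bonnie++ with the following options: bonnie++ -d t3 -c 8 -m test-name -s 270000 -r 130688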
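The sizes on that command line follow the usual bonnie++ guidance that the test file should be at least twice the RAM size, so that the page cache cannot absorb the workload. A minimal sketch of that check, using the -s and -r values from the command above (both in MiB); the variable names are only illustrative:

```shell
#!/bin/sh
# Values taken from the bonnie++ command line in the post (both in MiB).
ram_mib=130688     # -r: RAM size reported to bonnie++
size_mib=270000    # -s: total size of the test files

# Smallest test size that is still twice the RAM size.
min_size_mib=$((2 * ram_mib))

if [ "$size_mib" -ge "$min_size_mib" ]; then
    verdict="ok"
else
    verdict="too small"
fi
echo "minimum -s: ${min_size_mib} MiB, chosen -s: ${size_mib} MiB (${verdict})"
```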
On the same SSD (Samsung 840 Pro, 512 GB), the latency for sequential output/rewrite is:
- under ext4: 82025us
- under zfs: 6601ms (!)
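The two figures use different units, which makes the gap easy to misread. A small sketch that converts both to microseconds and prints the ratio; the helper name `to_us` is hypothetical, and it only handles the `us` and `ms` suffixes that appear in this post:

```shell
#!/bin/sh
# Convert a latency string such as "82025us" or "6601ms" to microseconds.
to_us() {
    case "$1" in
        *us) echo "${1%us}" ;;
        *ms) echo $(( ${1%ms} * 1000 )) ;;
        *)   echo "unknown unit: $1" >&2; return 1 ;;
    esac
}

ext4_us=$(to_us 82025us)
zfs_us=$(to_us 6601ms)

# Integer division is enough to show the order of magnitude.
ratio=$(( zfs_us / ext4_us ))
echo "ext4: ${ext4_us} us, zfs: ${zfs_us} us, zfs/ext4: about ${ratio}x"
```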
I checked ashift, and it is fine (13). ZFS compression is not active.
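For reference, ashift is the base-2 logarithm of the sector size ZFS assumes for the device, so 13 corresponds to 8 KiB sectors. The arithmetic, as a quick sketch:

```shell
#!/bin/sh
# ashift is log2 of the sector size ZFS uses for the vdev.
ashift=13
sector_bytes=$((1 << ashift))
echo "ashift=${ashift} -> ${sector_bytes}-byte sectors"
```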
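I ran several tests; for example, a RAIDZ-2 zpool with 3 SSDs shows a sequential block output latency of 79328ms (!!!).
One thing that may be important: the CPU is an 8-core POWER9.
I am not a ZFS expert, and I am only considering it because of LXD. So my question is this: where does this latency come from?
EDIT 2: After testing each disk under ext4 and zfs, and then LVM striping versus ZFS striping across the 3 disks, I used pgbench (the PostgreSQL benchmark). The results are similar: the same pgbench run gives me 1363 TPS under zfs (3 SSDs), whereas it gives me 2023 TPS under ext4 (1 SSD).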
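To put the two pgbench figures side by side, here is a short sketch that computes the relative drop. It only uses the TPS numbers quoted above; the variable names are illustrative:

```shell
#!/bin/sh
zfs_tps=1363     # zfs, 3 SSDs
ext4_tps=2023    # ext4, 1 SSD

# awk handles the floating-point division that sh arithmetic lacks.
pct=$(LC_ALL=C awk -v z="$zfs_tps" -v e="$ext4_tps" \
    'BEGIN { printf "%.1f", (e - z) / e * 100 }')
echo "zfs delivers ${pct}% fewer TPS than ext4"
```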
EDIT: Here is the whole test sequence redone, including the blkdiscard suggested in the comments. The ZFS sequential delete latency is still 5.575 s...
franck@blackbird:~$ sudo blkdiscard /dev/sdb
franck@blackbird:~$ sudo zpool create -f piscine -o ashift=13 /dev/sdb
franck@blackbird:~$ sudo zfs create piscine/data
franck@blackbird:~$ sudo zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
piscine 476G 1.31M 476G - - 0% 0% 1.00x ONLINE -
franck@blackbird:~$ sudo zpool get all piscine
NAME PROPERTY VALUE SOURCE
piscine size 476G -
piscine capacity 0% -
piscine altroot - default
piscine health ONLINE -
piscine guid 8049629417386861725 -
piscine version - default
piscine bootfs - default
piscine delegation on default
piscine autoreplace off default
piscine cachefile - default
piscine failmode wait default
piscine listsnapshots off default
piscine autoexpand off default
piscine dedupditto 0 default
piscine dedupratio 1.00x -
piscine free 476G -
piscine allocated 1.31M -
piscine readonly off -
piscine ashift 13 local
piscine comment - default
piscine expandsize - -
piscine freeing 0 -
piscine fragmentation 0% -
piscine leaked 0 -
piscine multihost off default
piscine checkpoint - -
piscine load_guid 14905955500159639897 -
piscine autotrim off default
piscine feature@async_destroy enabled local
piscine feature@empty_bpobj active local
piscine feature@lz4_compress active local
piscine feature@multi_vdev_crash_dump enabled local
piscine feature@spacemap_histogram active local
piscine feature@enabled_txg active local
piscine feature@hole_birth active local
piscine feature@extensible_dataset active local
piscine feature@embedded_data active local
piscine feature@bookmarks enabled local
piscine feature@filesystem_limits enabled local
piscine feature@large_blocks enabled local
piscine feature@large_dnode enabled local
piscine feature@sha512 enabled local
piscine feature@skein enabled local
piscine feature@edonr enabled local
piscine feature@userobj_accounting active local
piscine feature@encryption enabled local
piscine feature@project_quota active local
piscine feature@device_removal enabled local
piscine feature@obsolete_counts enabled local
piscine feature@zpool_checkpoint enabled local
piscine feature@spacemap_v2 active local
piscine feature@allocation_classes enabled local
piscine feature@resilver_defer enabled local
piscine feature@bookmark_v2 enabled local
franck@blackbird:~$ sudo zfs get all piscine/data
NAME PROPERTY VALUE SOURCE
piscine/data type filesystem -
piscine/data creation Fri May 15 15:29 2020 -
piscine/data used 192K -
piscine/data available 461G -
piscine/data referenced 192K -
piscine/data compressratio 1.00x -
piscine/data mounted yes -
piscine/data quota none default
piscine/data reservation none default
piscine/data recordsize 128K default
piscine/data mountpoint /home/franck/data local
piscine/data sharenfs off default
piscine/data checksum on default
piscine/data compression off default
piscine/data atime on default
piscine/data devices on default
piscine/data exec on default
piscine/data setuid on default
piscine/data readonly off default
piscine/data zoned off default
piscine/data snapdir hidden default
piscine/data aclinherit restricted default
piscine/data createtxg 9 -
piscine/data canmount on default
piscine/data xattr on default
piscine/data copies 1 default
piscine/data version 5 -
piscine/data utf8only off -
piscine/data normalization none -
piscine/data casesensitivity sensitive -
piscine/data vscan off default
piscine/data nbmand off default
piscine/data sharesmb off default
piscine/data refquota none default
piscine/data refreservation none default
piscine/data guid 12809002562676768507 -
piscine/data primarycache all default
piscine/data secondarycache all default
piscine/data usedbysnapshots 0B -
piscine/data usedbydataset 192K -
piscine/data usedbychildren 0B -
piscine/data usedbyrefreservation 0B -
piscine/data logbias latency default
piscine/data objsetid 269 -
piscine/data dedup off default
piscine/data mlslabel none default
piscine/data sync standard default
piscine/data dnodesize legacy default
piscine/data refcompressratio 1.00x -
piscine/data written 192K -
piscine/data logicalused 78K -
piscine/data logicalreferenced 78K -
piscine/data volmode default default
piscine/data filesystem_limit none default
piscine/data snapshot_limit none default
piscine/data filesystem_count none default
piscine/data snapshot_count none default
piscine/data snapdev hidden default
piscine/data acltype off default
piscine/data context none default
piscine/data fscontext none default
piscine/data defcontext none default
piscine/data rootcontext none default
piscine/data relatime off default
piscine/data redundant_metadata all default
piscine/data overlay off default
piscine/data encryption off default
piscine/data keylocation none default
piscine/data keyformat none default
piscine/data pbkdf2iters 0 default
piscine/data special_small_blocks 0 default
franck@blackbird:~$ sudo zdb
piscine:
version: 5000
name: 'piscine'
state: 0
txg: 4
pool_guid: 8049629417386861725
errata: 0
hostid: 2946950021
hostname: 'blackbird'
com.delphix:has_per_vdev_zaps
vdev_children: 1
vdev_tree:
type: 'root'
id: 0
guid: 8049629417386861725
create_txg: 4
children[0]:
type: 'disk'
id: 0
guid: 2148537938584442770
path: '/dev/sdb1'
devid: 'ata-Samsung_SSD_840_PRO_Series_S1AXNSADA16578B-part1'
phys_path: 'pci-0002:01:00.0-ata-2'
whole_disk: 1
metaslab_array: 256
metaslab_shift: 32
ashift: 13
asize: 512095682560
is_log: 0
create_txg: 4
com.delphix:vdev_zap_leaf: 129
com.delphix:vdev_zap_top: 130
features_for_read:
com.delphix:hole_birth
com.delphix:embedded_data
franck@blackbird:~$ sudo chown franck:franck data
franck@blackbird:~$ bonnie++ -d data -c 8 -m zfs-sdb -s 270000 -r 130688
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.98 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
zfs-sdb 270000M::8 85k 99 475m 89 152m 47 185k 99 499m 53 4070 115
Latency 98772us 3795us 18291us 50503us 16191us 7025us
Version 1.98 ------Sequential Create------ --------Random Create--------
zfs-sdc -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 685584420 99 +++++ +++ 1143148205 14 -1321910944 98 +++++ +++ -382954332 99
Latency 4631us 2108us 5575ms 4596us 140us 320us
1.98,1.98,zfs-sdb,8,1588801810,270000M,,8192,5,85,99,486661,89,155677,47,185,99,510981,53,4070,115,16,,,,,19138,99,+++++,+++,2146,14,19113,98,+++++,+++,14365,99,98772us,3795us,18291us,50503us,16191us,7025us,4631us,2108us,5575ms,4596us,140us,320us
franck@blackbird:~$ sudo zpool destroy piscine
franck@blackbird:~$ sudo blkdiscard /dev/sdb
franck@blackbird:~$ sudo fdisk /dev/sdb
Welcome to fdisk (util-linux 2.34).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0x3a5a314c.
Command (m for help): n
Partition type
p primary (0 primary, 0 extended, 4 free)
e extended (container for logical partitions)
Select (default p):
Using default response p.
Partition number (1-4, default 1):
First sector (2048-1000215215, default 2048):
Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-1000215215, default 1000215215):
Created a new partition 1 of type 'Linux' and of size 477 GiB.
Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
franck@blackbird:~$ sudo mkfs.ext4 /dev/sdb1
mke2fs 1.45.5 (07-Jan-2020)
Discarding device blocks: done
Creating filesystem with 125026640 4k blocks and 31260672 inodes
Filesystem UUID: c833b984-e104-4f3b-a68b-801678e33281
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000
Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done
franck@blackbird:~$ sudo mount /dev/sdb1 data
franck@blackbird:~$ sudo chown franck:franck data
franck@blackbird:~$ bonnie++ -d data -c 8 -m ext4-sdb -s 270000 -r 130688
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.98 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
ext4-sdb 270000M::8 331k 99 481m 31 169m 15 1585k 99 465m 12 12131 87
Latency 27081us 71048us 54910us 9400us 23853us 1330us
Version 1.98 ------Sequential Create------ --------Random Create--------
ext4-sdb -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency 927us 566us 720us 122us 4us 44us
1.98,1.98,ext4-sdb,8,1588605414,270000M,,8192,5,331,99,492198,31,173283,15,1585,99,476038,12,12131,87,16,,,,,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,27081us,71048us,54910us,9400us,23853us,1330us,927us,566us,720us,122us,4us,44us
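The CSV line that bonnie++ prints at the end of each run is easier to compare than the column report. A minimal sketch that pulls the sequential delete latency out of the two CSV lines above. It assumes that the last 12 fields of a bonnie++ 1.98 CSV line are the latencies, in the same order as in the text report, which matches the output shown here; the function name `seq_delete_latency` is hypothetical:

```shell
#!/bin/sh
# Print the machine name (field 3) and the sequential-delete latency.
# Assumption: the last 12 CSV fields are latencies in report order,
# so sequential delete is the 4th field from the end.
seq_delete_latency() {
    awk -F, '{ print $3, $(NF - 3) }'
}

zfs_csv='1.98,1.98,zfs-sdb,8,1588801810,270000M,,8192,5,85,99,486661,89,155677,47,185,99,510981,53,4070,115,16,,,,,19138,99,+++++,+++,2146,14,19113,98,+++++,+++,14365,99,98772us,3795us,18291us,50503us,16191us,7025us,4631us,2108us,5575ms,4596us,140us,320us'
ext4_csv='1.98,1.98,ext4-sdb,8,1588605414,270000M,,8192,5,331,99,492198,31,173283,15,1585,99,476038,12,12131,87,16,,,,,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,27081us,71048us,54910us,9400us,23853us,1330us,927us,566us,720us,122us,4us,44us'

printf '%s\n%s\n' "$zfs_csv" "$ext4_csv" | seq_delete_latency
```

With the two lines from this post, it prints `zfs-sdb 5575ms` and `ext4-sdb 720us`, which is the delete-latency gap mentioned in the edit above.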