ZFS has huge latency compared to ext4

While testing PostgreSQL performance on a new machine running 20.04 Server, I noticed very poor results when running PostgreSQL inside an LXD container. LXD runs on a ZFS pool.

I eventually went back to testing the disk itself with bonnie++, and here is what I found: the virtual filesystem introduces huge latency.

Since the server has 128 GB of RAM, I ran bonnie++ with the following options: bonnie++ -d t3 -c 8 -m test-name -s 270000 -r 130688
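bonnie++ expects the test file size (-s, in megabytes) to be at least twice the RAM size given with -r, so that the page cache cannot absorb the workload. A quick check of the values used above (a sketch; the function name is illustrative):

```python
def bonnie_size_ok(file_size_mb: int, ram_mb: int) -> bool:
    """Return True if the bonnie++ file size (-s) is at least twice the RAM size (-r)."""
    return file_size_mb >= 2 * ram_mb


RAM_MB = 130688        # value passed with -r
FILE_SIZE_MB = 270000  # value passed with -s

print(2 * RAM_MB)                             # minimum file size: 261376
print(bonnie_size_ok(FILE_SIZE_MB, RAM_MB))   # True
```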

On the same SSD (Samsung 840 Pro, 512 GB), the Sequential Output/Rewrite latency is:

  • under ext4: 82025us
  • under zfs: 6601ms (!)
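Put on the same scale, 6601ms is 6,601,000 µs, roughly 80 times the ext4 figure. A small helper sketch for converting bonnie++ latency strings, assuming the suffix is one of us, ms or s (the function name is illustrative):

```python
UNIT_TO_US = {"us": 1, "ms": 1_000, "s": 1_000_000}


def latency_to_us(value: str) -> int:
    """Convert a bonnie++ latency string such as '82025us' or '6601ms' to microseconds."""
    for suffix in ("us", "ms", "s"):  # check two-letter suffixes before plain 's'
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * UNIT_TO_US[suffix]
    raise ValueError(f"unrecognised latency: {value!r}")


ext4 = latency_to_us("82025us")
zfs = latency_to_us("6601ms")
print(zfs, round(zfs / ext4, 1))  # 6601000 80.5
```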

I checked ashift and everything looks fine (13). ZFS compression is not enabled.
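For reference, ashift is the base-2 logarithm of the smallest block ZFS writes to a vdev, so ashift=13 corresponds to 8 KiB. A one-line sketch of the mapping (function name illustrative):

```python
def ashift_to_bytes(ashift: int) -> int:
    """ZFS ashift is log2 of the minimum block size written to a vdev."""
    return 1 << ashift


print(ashift_to_bytes(9), ashift_to_bytes(12), ashift_to_bytes(13))  # 512 4096 8192
```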

I ran several more tests; for example, a RAIDZ-2 zpool built from 3 SSDs shows a Sequential Output Block latency of 79328ms (!!!).

One thing that may matter: the CPU is an 8-core Power9.

I'm not a ZFS expert; I'm only considering it because of LXD. So my question is: where does this latency come from?

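EDIT 2: After testing each disk under ext4 and zfs, and then LVM striping versus ZFS striping across 3 disks, I ran pgbench (the PostgreSQL benchmark). The results are consistent: the same pgbench run gives me 1363 TPS on zfs (3 SSDs), versus 2023 TPS on ext4 (1 SSD).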
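In relative terms, the 3-SSD zfs stripe delivers about two thirds of the single-SSD ext4 throughput. The arithmetic, as a short sketch:

```python
zfs_tps = 1363   # pgbench on zfs, 3-SSD stripe
ext4_tps = 2023  # pgbench on ext4, single SSD

ratio = zfs_tps / ext4_tps
print(f"zfs reaches {ratio:.1%} of ext4 throughput ({1 - ratio:.1%} lower)")
# zfs reaches 67.4% of ext4 throughput (32.6% lower)
```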

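EDIT: Here is the whole test sequence redone, including the blkdiscard suggestion from the comments. The ZFS Sequential Delete latency is still 5.575 seconds...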

franck@blackbird:~$ sudo blkdiscard /dev/sdb
franck@blackbird:~$ sudo zpool create -f piscine -o ashift=13 /dev/sdb
franck@blackbird:~$ sudo zfs create piscine/data

franck@blackbird:~$ sudo zpool list
NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
piscine   476G  1.31M   476G        -         -     0%     0%  1.00x    ONLINE  -

franck@blackbird:~$ sudo zpool get all piscine
NAME     PROPERTY                       VALUE                          SOURCE
piscine  size                           476G                           -
piscine  capacity                       0%                             -
piscine  altroot                        -                              default
piscine  health                         ONLINE                         -
piscine  guid                           8049629417386861725            -
piscine  version                        -                              default
piscine  bootfs                         -                              default
piscine  delegation                     on                             default
piscine  autoreplace                    off                            default
piscine  cachefile                      -                              default
piscine  failmode                       wait                           default
piscine  listsnapshots                  off                            default
piscine  autoexpand                     off                            default
piscine  dedupditto                     0                              default
piscine  dedupratio                     1.00x                          -
piscine  free                           476G                           -
piscine  allocated                      1.31M                          -
piscine  readonly                       off                            -
piscine  ashift                         13                             local
piscine  comment                        -                              default
piscine  expandsize                     -                              -
piscine  freeing                        0                              -
piscine  fragmentation                  0%                             -
piscine  leaked                         0                              -
piscine  multihost                      off                            default
piscine  checkpoint                     -                              -
piscine  load_guid                      14905955500159639897           -
piscine  autotrim                       off                            default
piscine  feature@async_destroy          enabled                        local
piscine  feature@empty_bpobj            active                         local
piscine  feature@lz4_compress           active                         local
piscine  feature@multi_vdev_crash_dump  enabled                        local
piscine  feature@spacemap_histogram     active                         local
piscine  feature@enabled_txg            active                         local
piscine  feature@hole_birth             active                         local
piscine  feature@extensible_dataset     active                         local
piscine  feature@embedded_data          active                         local
piscine  feature@bookmarks              enabled                        local
piscine  feature@filesystem_limits      enabled                        local
piscine  feature@large_blocks           enabled                        local
piscine  feature@large_dnode            enabled                        local
piscine  feature@sha512                 enabled                        local
piscine  feature@skein                  enabled                        local
piscine  feature@edonr                  enabled                        local
piscine  feature@userobj_accounting     active                         local
piscine  feature@encryption             enabled                        local
piscine  feature@project_quota          active                         local
piscine  feature@device_removal         enabled                        local
piscine  feature@obsolete_counts        enabled                        local
piscine  feature@zpool_checkpoint       enabled                        local
piscine  feature@spacemap_v2            active                         local
piscine  feature@allocation_classes     enabled                        local
piscine  feature@resilver_defer         enabled                        local
piscine  feature@bookmark_v2            enabled                        local

franck@blackbird:~$ sudo zfs get all piscine/data
NAME          PROPERTY              VALUE                  SOURCE
piscine/data  type                  filesystem             -
piscine/data  creation              Fri May 15 15:29 2020  -
piscine/data  used                  192K                   -
piscine/data  available             461G                   -
piscine/data  referenced            192K                   -
piscine/data  compressratio         1.00x                  -
piscine/data  mounted               yes                    -
piscine/data  quota                 none                   default
piscine/data  reservation           none                   default
piscine/data  recordsize            128K                   default
piscine/data  mountpoint            /home/franck/data      local
piscine/data  sharenfs              off                    default
piscine/data  checksum              on                     default
piscine/data  compression           off                    default
piscine/data  atime                 on                     default
piscine/data  devices               on                     default
piscine/data  exec                  on                     default
piscine/data  setuid                on                     default
piscine/data  readonly              off                    default
piscine/data  zoned                 off                    default
piscine/data  snapdir               hidden                 default
piscine/data  aclinherit            restricted             default
piscine/data  createtxg             9                      -
piscine/data  canmount              on                     default
piscine/data  xattr                 on                     default
piscine/data  copies                1                      default
piscine/data  version               5                      -
piscine/data  utf8only              off                    -
piscine/data  normalization         none                   -
piscine/data  casesensitivity       sensitive              -
piscine/data  vscan                 off                    default
piscine/data  nbmand                off                    default
piscine/data  sharesmb              off                    default
piscine/data  refquota              none                   default
piscine/data  refreservation        none                   default
piscine/data  guid                  12809002562676768507   -
piscine/data  primarycache          all                    default
piscine/data  secondarycache        all                    default
piscine/data  usedbysnapshots       0B                     -
piscine/data  usedbydataset         192K                   -
piscine/data  usedbychildren        0B                     -
piscine/data  usedbyrefreservation  0B                     -
piscine/data  logbias               latency                default
piscine/data  objsetid              269                    -
piscine/data  dedup                 off                    default
piscine/data  mlslabel              none                   default
piscine/data  sync                  standard               default
piscine/data  dnodesize             legacy                 default
piscine/data  refcompressratio      1.00x                  -
piscine/data  written               192K                   -
piscine/data  logicalused           78K                    -
piscine/data  logicalreferenced     78K                    -
piscine/data  volmode               default                default
piscine/data  filesystem_limit      none                   default
piscine/data  snapshot_limit        none                   default
piscine/data  filesystem_count      none                   default
piscine/data  snapshot_count        none                   default
piscine/data  snapdev               hidden                 default
piscine/data  acltype               off                    default
piscine/data  context               none                   default
piscine/data  fscontext             none                   default
piscine/data  defcontext            none                   default
piscine/data  rootcontext           none                   default
piscine/data  relatime              off                    default
piscine/data  redundant_metadata    all                    default
piscine/data  overlay               off                    default
piscine/data  encryption            off                    default
piscine/data  keylocation           none                   default
piscine/data  keyformat             none                   default
piscine/data  pbkdf2iters           0                      default
piscine/data  special_small_blocks  0                      default

franck@blackbird:~$ sudo zdb
piscine:
    version: 5000
    name: 'piscine'
    state: 0
    txg: 4
    pool_guid: 8049629417386861725
    errata: 0
    hostid: 2946950021
    hostname: 'blackbird'
    com.delphix:has_per_vdev_zaps
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 8049629417386861725
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 2148537938584442770
            path: '/dev/sdb1'
            devid: 'ata-Samsung_SSD_840_PRO_Series_S1AXNSADA16578B-part1'
            phys_path: 'pci-0002:01:00.0-ata-2'
            whole_disk: 1
            metaslab_array: 256
            metaslab_shift: 32
            ashift: 13
            asize: 512095682560
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_leaf: 129
            com.delphix:vdev_zap_top: 130
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data

franck@blackbird:~$ sudo chown franck:franck data

franck@blackbird:~$ bonnie++ -d data -c 8 -m zfs-sdb -s 270000 -r 130688
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.98       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
zfs-sdb  270000M::8   85k  99  475m  89  152m  47  185k  99  499m  53  4070 115
Latency             98772us    3795us   18291us   50503us   16191us    7025us
Version  1.98       ------Sequential Create------ --------Random Create--------
zfs-sdc             -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 685584420  99 +++++ +++ 1143148205  14 -1321910944  98 +++++ +++ -382954332  99
Latency              4631us    2108us    5575ms    4596us     140us     320us
1.98,1.98,zfs-sdb,8,1588801810,270000M,,8192,5,85,99,486661,89,155677,47,185,99,510981,53,4070,115,16,,,,,19138,99,+++++,+++,2146,14,19113,98,+++++,+++,14365,99,98772us,3795us,18291us,50503us,16191us,7025us,4631us,2108us,5575ms,4596us,140us,320us

franck@blackbird:~$ sudo zpool destroy piscine
franck@blackbird:~$ sudo blkdiscard /dev/sdb
franck@blackbird:~$ sudo fdisk /dev/sdb

Welcome to fdisk (util-linux 2.34).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0x3a5a314c.

Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): 

Using default response p.
Partition number (1-4, default 1): 
First sector (2048-1000215215, default 2048): 
Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-1000215215, default 1000215215): 

Created a new partition 1 of type 'Linux' and of size 477 GiB.

Command (m for help): w

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

franck@blackbird:~$ sudo mkfs.ext4 /dev/sdb1
mke2fs 1.45.5 (07-Jan-2020)
Discarding device blocks: done                            
Creating filesystem with 125026640 4k blocks and 31260672 inodes
Filesystem UUID: c833b984-e104-4f3b-a68b-801678e33281
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
    4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
    102400000

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done     

franck@blackbird:~$ sudo mount /dev/sdb1 data
franck@blackbird:~$ sudo chown franck:franck data
franck@blackbird:~$ bonnie++ -d data -c 8 -m ext4-sdb -s 270000 -r 130688
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.98       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
ext4-sdb 270000M::8  331k  99  481m  31  169m  15 1585k  99  465m  12 12131  87
Latency             27081us   71048us   54910us    9400us   23853us    1330us
Version  1.98       ------Sequential Create------ --------Random Create--------
ext4-sdb            -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency               927us     566us     720us     122us       4us      44us
1.98,1.98,ext4-sdb,8,1588605414,270000M,,8192,5,331,99,492198,31,173283,15,1585,99,476038,12,12131,87,16,,,,,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,27081us,71048us,54910us,9400us,23853us,1330us,927us,566us,720us,122us,4us,44us
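The CSV line that bonnie++ prints at the end of each report is easier to compare than the text table (in the ZFS run above, the text table even shows implausible file-creation rates such as -1321910944/sec where the CSV line records 19113). A minimal sketch that parses the two CSV lines and prints the per-test latency ratio, assuming the last 12 fields are the latencies in the same order as the two "Latency" rows of the text report; the labels and function names are illustrative:

```python
# CSV summary lines copied verbatim from the two bonnie++ 1.98 runs above.
ZFS_CSV = "1.98,1.98,zfs-sdb,8,1588801810,270000M,,8192,5,85,99,486661,89,155677,47,185,99,510981,53,4070,115,16,,,,,19138,99,+++++,+++,2146,14,19113,98,+++++,+++,14365,99,98772us,3795us,18291us,50503us,16191us,7025us,4631us,2108us,5575ms,4596us,140us,320us"
EXT4_CSV = "1.98,1.98,ext4-sdb,8,1588605414,270000M,,8192,5,331,99,492198,31,173283,15,1585,99,476038,12,12131,87,16,,,,,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,27081us,71048us,54910us,9400us,23853us,1330us,927us,566us,720us,122us,4us,44us"

# Assumed meaning of the last 12 CSV fields, in the order of the text
# report's two "Latency" rows.
LATENCY_LABELS = [
    "seq out per-chr", "seq out block", "seq out rewrite",
    "seq in per-chr", "seq in block", "random seeks",
    "seq create", "seq read", "seq delete",
    "rand create", "rand read", "rand delete",
]

UNIT_TO_US = {"us": 1, "ms": 1_000, "s": 1_000_000}


def latency_to_us(value):
    """Convert a latency such as '98772us' or '5575ms' to microseconds."""
    for suffix in ("us", "ms", "s"):  # two-letter suffixes first
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * UNIT_TO_US[suffix]
    raise ValueError(f"unrecognised latency: {value!r}")


def latencies(csv_line):
    """Map each latency label to its value in microseconds."""
    fields = csv_line.strip().split(",")
    return dict(zip(LATENCY_LABELS, (latency_to_us(f) for f in fields[-12:])))


zfs = latencies(ZFS_CSV)
ext4 = latencies(EXT4_CSV)
for label in LATENCY_LABELS:
    ratio = zfs[label] / ext4[label]
    print(f"{label:16} zfs {zfs[label]:>9} us   ext4 {ext4[label]:>7} us   x{ratio:.1f}")
```

With these two runs, the Sequential Delete column stands out: 5575ms on zfs against 720us on ext4.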
