nginx with ZFS on Linux at recordsize=16M: ARC fills excessively, disk I/O is high but throughput is low

An nginx setup on CentOS 7.7 serves large static files (100 MB to 16 GB) over a bonded 2x10 Gbps network. Storage is ZFS on Linux.

  • Pool size is 50 TB, spread across 8x8 TB disks
  • Maximum ARC size: 65 GB
  • L2ARC: 1 TB NVMe
  • recordsize=16M
  • ashift=12
  • nginx: sendfile off
  • nginx: aio on
  • nginx: output_buffers 1 128k

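The system has been running for a few days. Filling the ARC consumes too much CPU. The disks are busy at 600 MB/s, yet nginx throughput stays below 2 Gbps and the L2ARC hit ratio is very low. Any ideas?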
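To compare the two figures above, it helps to put them in the same unit. The following is a minimal sketch, assuming decimal units (1 Gbps = 1000 Mbit/s) and treating the quoted 600 MB/s and 2 Gbps as steady-state values; the function names are illustrative only.

```python
def gbps_to_mbytes_per_s(gbps: float) -> float:
    """Convert gigabits per second to megabytes per second (decimal units)."""
    return gbps * 1000 / 8


def disk_to_network_ratio(disk_mb_s: float, net_gbps: float) -> float:
    """Bytes read from disk per byte sent on the network."""
    return disk_mb_s / gbps_to_mbytes_per_s(net_gbps)


if __name__ == "__main__":
    net_mb_s = gbps_to_mbytes_per_s(2.0)        # upper bound quoted for nginx
    ratio = disk_to_network_ratio(600.0, 2.0)   # disk figure from the question
    print(f"network: {net_mb_s:.0f} MB/s, disk/network ratio: {ratio:.1f}x")
```

Since nginx is below 2 Gbps (250 MB/s), the disks are reading at least about 2.4 times more data than is being delivered.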

Here is the zfs_arc_summary output and the perf report.

ZFS Subsystem Report                            Wed May 20 12:27:46 2020
ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                1.84m
        Mutex Misses:                           157.78k
        Evict Skips:                            157.78k

ARC Size:                               102.54% 66.97   GiB
        Target Size: (Adaptive)         100.00% 65.32   GiB
        Min Size (Hard Limit):          92.87%  60.66   GiB
        Max Size (High Water):          1:1     65.32   GiB

ARC Size Breakdown:
        Recently Used Cache Size:       46.89%  31.40   GiB
        Frequently Used Cache Size:     53.11%  35.57   GiB

ARC Hash Breakdown:
        Elements Max:                           159.31k
        Elements Current:               97.44%  155.23k
        Collisions:                             11.76k
        Chain Max:                              2
        Chains:                                 779

ARC Total accesses:                                     446.46m
        Cache Hit Ratio:                99.29%  443.29m
        Cache Miss Ratio:               0.71%   3.17m
        Actual Hit Ratio:               99.29%  443.29m

        Data Demand Efficiency:         99.28%  402.73m

        CACHE HITS BY CACHE LIST:
          Most Recently Used:           5.99%   26.57m
          Most Frequently Used:         94.01%  416.71m
          Most Recently Used Ghost:     0.00%   9.65k
          Most Frequently Used Ghost:   0.28%   1.26m

        CACHE HITS BY DATA TYPE:
          Demand Data:                  90.19%  399.81m
          Prefetch Data:                0.00%   0
          Demand Metadata:              9.81%   43.47m
          Prefetch Metadata:            0.00%   1.82k

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  91.77%  2.91m
          Prefetch Data:                0.00%   0
          Demand Metadata:              7.85%   249.26k
          Prefetch Metadata:            0.38%   12.12k

L2 ARC Summary: (HEALTHY)
        Low Memory Aborts:                      0
        Free on Write:                          3
        R/W Clashes:                            0
        Bad Checksums:                          0
        IO Errors:                              0

L2 ARC Size: (Adaptive)                         458.07  GiB
        Compressed:                     99.60%  456.23  GiB
        Header Size:                    0.00%   5.34    MiB

L2 ARC Breakdown:                               3.17m
        Hit Ratio:                      15.02%  476.70k
        Miss Ratio:                     84.98%  2.70m
        Feeds:                                  55.31k

L2 ARC Writes:
        Writes Sent:                    100.00% 55.27k



ZFS Tunable:
        metaslab_debug_load                               0
        zfs_multihost_interval                            1000
        zfs_vdev_default_ms_count                         200
        zfetch_max_streams                                8
        zfs_nopwrite_enabled                              1
        zfetch_min_sec_reap                               2
        zfs_dbgmsg_enable                                 1
        zfs_dirty_data_max_max_percent                    25
        zfs_abd_scatter_enabled                           1
        zfs_remove_max_segment                            16777216
        zfs_deadman_ziotime_ms                            300000
        spa_load_verify_data                              1
        zfs_zevent_cols                                   80
        zfs_obsolete_min_time_ms                          500
        zfs_dirty_data_max_percent                        40
        zfs_vdev_mirror_non_rotating_inc                  0
        zfs_resilver_disable_defer                        0
        zfs_sync_pass_dont_compress                       8
        zvol_volmode                                      1
        l2arc_write_max                                   8388608
        zfs_disable_ivset_guid_check                      0
        zfs_vdev_scrub_max_active                         128
        zfs_vdev_sync_write_min_active                    64
        zvol_prefetch_bytes                               131072
        zfs_send_unmodified_spill_blocks                  1
        metaslab_aliquot                                  524288
        zfs_no_scrub_prefetch                             0
        zfs_abd_scatter_max_order                         10
        zfs_arc_shrink_shift                              0
        zfs_vdev_queue_depth_pct                          1000
        zfs_txg_history                                   100
        zfs_vdev_removal_max_active                       2
        zil_maxblocksize                                  131072
        metaslab_force_ganging                            16777217
        zfs_delay_scale                                   500000
        zfs_free_bpobj_enabled                            1
        zfs_vdev_async_write_active_min_dirty_percent     30
        metaslab_debug_unload                             1
        zfs_read_history                                  0
        zfs_vdev_initializing_max_active                  1
        zvol_max_discard_blocks                           16384
        zfs_recover                                       0
        zfs_scan_fill_weight                              3
        spa_load_print_vdev_tree                          0
        zfs_key_max_salt_uses                             400000000
        zfs_metaslab_segment_weight_enabled               1
        zfs_dmu_offset_next_sync                          0
        l2arc_headroom                                    2
        zfs_deadman_synctime_ms                           600000
        zfs_dirty_data_sync_percent                       20
        zfs_free_min_time_ms                              1000
        zfs_dirty_data_max                                4294967296
        zfs_vdev_async_read_min_active                    64
        dbuf_metadata_cache_max_bytes                     314572800
        zfs_mg_noalloc_threshold                          0
        zfs_dedup_prefetch                                0
        dbuf_cache_lowater_pct                            10
        zfs_slow_io_events_per_second                     20
        zfs_vdev_max_active                               1000
        l2arc_write_boost                                 8388608
        zfs_resilver_min_time_ms                          3000
        zfs_max_missing_tvds                              0
        zfs_vdev_async_write_max_active                   10
        zvol_request_sync                                 0
        zfs_async_block_max_blocks                        100000
        metaslab_df_max_search                            16777216
        zfs_prefetch_disable                              1
        metaslab_lba_weighting_enabled                    1
        zio_dva_throttle_enabled                          1
        metaslab_df_use_largest_segment                   0
        zfs_vdev_trim_max_active                          2
        zfs_unlink_suspend_progress                       0
        zfs_sync_taskq_batch_pct                          75
        zfs_arc_min_prescient_prefetch_ms                 0
        zfs_scan_max_ext_gap                              2097152
        zfs_initialize_value                              16045690984833335022
        zfs_mg_fragmentation_threshold                    95
        zil_nocacheflush                                  0
        l2arc_feed_again                                  1
        zfs_trim_metaslab_skip                            0
        zfs_zevent_console                                0
        zfs_immediate_write_sz                            32768
        zfs_condense_indirect_commit_entry_delay_ms       0
        zfs_dbgmsg_maxsize                                4194304
        zfs_trim_extent_bytes_max                         134217728
        zfs_trim_extent_bytes_min                         32768
        zfs_user_indirect_is_special                      1
        zfs_lua_max_instrlimit                            100000000
        zfs_free_leak_on_eio                              0
        zfs_special_class_metadata_reserve_pct            25
        zfs_deadman_enabled                               1
        dmu_object_alloc_chunk_shift                      7
        vdev_validate_skip                                0
        zfs_commit_timeout_pct                            5
        zfs_arc_meta_limit_percent                        75
        metaslab_bias_enabled                             1
        zfs_send_queue_length                             16777216
        zfs_arc_p_dampener_disable                        1
        zfs_object_mutex_size                             64
        zfs_metaslab_fragmentation_threshold              70
        zfs_delete_blocks                                 20480
        zfs_arc_dnode_limit_percent                       10
        zfs_no_scrub_io                                   0
        zfs_dbuf_state_index                              0
        zio_deadman_log_all                               0
        zfs_vdev_sync_read_min_active                     64
        zfs_deadman_checktime_ms                          60000
        metaslab_fragmentation_factor_enabled             1
        zfs_override_estimate_recordsize                  0
        zfs_multilist_num_sublists                        0
        zvol_inhibit_dev                                  0
        zfs_scan_legacy                                   0
        zfetch_max_distance                               16777216
        zap_iterate_prefetch                              1
        zfs_scan_strict_mem_lim                           0
        zfs_vdev_async_write_active_max_dirty_percent     60
        zfs_scan_checkpoint_intval                        7200
        dmu_prefetch_max                                  134217728
        zfs_recv_queue_length                             16777216
        zfs_vdev_mirror_rotating_seek_inc                 5
        dbuf_cache_shift                                  5
        dbuf_metadata_cache_shift                         6
        zfs_condense_min_mapping_bytes                    131072
        zfs_vdev_cache_size                               0
        spa_config_path                                   /etc/zfs/zpool.cache
        zfs_dirty_data_max_max                            4294967296
        zfs_arc_lotsfree_percent                          10
        zfs_vdev_ms_count_limit                           131072
        zfs_zevent_len_max                                1024
        zfs_checksum_events_per_second                    20
        zfs_arc_sys_free                                  0
        zfs_scan_issue_strategy                           0
        zfs_arc_meta_strategy                             1
        zfs_condense_max_obsolete_bytes                   1073741824
        zfs_vdev_cache_bshift                             16
        zfs_compressed_arc_enabled                        1
        zfs_arc_meta_adjust_restarts                      4096
        zfs_max_recordsize                                16777216
        zfs_vdev_scrub_min_active                         48
        zfs_zil_clean_taskq_maxalloc                      1048576
        zfs_lua_max_memlimit                              104857600
        zfs_vdev_raidz_impl                               cycle [fastest] original scalar sse2 ssse3
        zfs_per_txg_dirty_frees_percent                   5
        zfs_vdev_read_gap_limit                           32768
        zfs_scan_vdev_limit                               4194304
        zfs_zil_clean_taskq_minalloc                      1024
        zfs_multihost_history                             0
        zfs_scan_mem_lim_fact                             20
        zfs_arc_meta_limit                                0
        spa_load_verify_shift                             4
        zfs_vdev_sync_write_max_active                    128
        l2arc_norw                                        0
        zfs_arc_meta_prune                                10000
        zfs_vdev_removal_min_active                       1
        metaslab_preload_enabled                          1
        dbuf_cache_max_bytes                              629145600
        zfs_vdev_mirror_non_rotating_seek_inc             1
        zfs_spa_discard_memory_limit                      16777216
        zfs_vdev_initializing_min_active                  1
        zvol_major                                        230
        zfs_vdev_aggregation_limit                        1048576
        zfs_flags                                         0
        zfs_vdev_mirror_rotating_seek_offset              1048576
        spa_asize_inflation                               24
        zfs_admin_snapshot                                0
        l2arc_feed_secs                                   1
        vdev_removal_max_span                             32768
        zfs_trim_txg_batch                                32
        zfs_multihost_fail_intervals                      10
        zfs_abd_scatter_min_size                          1536
        zio_taskq_batch_pct                               75
        zfs_sync_pass_deferred_free                       2
        zfs_arc_min_prefetch_ms                           0
        zvol_threads                                      32
        zfs_condense_indirect_vdevs_enable                1
        zfs_arc_grow_retry                                0
        zfs_multihost_import_intervals                    20
        zfs_read_history_hits                             0
        zfs_vdev_min_ms_count                             16
        zfs_zil_clean_taskq_nthr_pct                      100
        zfs_vdev_async_write_min_active                   2
        zfs_vdev_async_read_max_active                    128
        zfs_vdev_aggregate_trim                           0
        zfs_delay_min_dirty_percent                       60
        zfs_vdev_cache_max                                16384
        zfs_removal_suspend_progress                      0
        zfs_vdev_trim_min_active                          1
        zfs_scan_mem_lim_soft_fact                        20
        ignore_hole_birth                                 1
        spa_slop_shift                                    5
        zfs_vdev_write_gap_limit                          4096
        dbuf_cache_hiwater_pct                            10
        spa_load_verify_metadata                          1
        l2arc_noprefetch                                  1
        send_holes_without_birth_time                     1
        zfs_vdev_mirror_rotating_inc                      0
        zfs_arc_dnode_reduce_percent                      10
        zfs_arc_pc_percent                                0
        zfs_metaslab_switch_threshold                     2
        zfs_vdev_scheduler                                deadline
        zil_slog_bulk                                     786432
        zfs_expire_snapshot                               300
        zfs_sync_pass_rewrite                             2
        zil_replay_disable                                0
        zfs_nocacheflush                                  0
        zfs_vdev_aggregation_limit_non_rotating           131072
        zfs_arc_max                                       70132659200
        zfs_arc_min                                       65132659200
        zfs_read_chunk_size                               1048576
        zfs_txg_timeout                                   5
        zfs_trim_queue_limit                              10
        zfs_arc_dnode_limit                               0
        zfs_scan_ignore_errors                            0
        zfs_pd_bytes_max                                  52428800
        zfs_scrub_min_time_ms                             1000
        l2arc_headroom_boost                              200
        zfs_send_corrupt_data                             0
        l2arc_feed_min_ms                                 200
        zfs_arc_meta_min                                  0
        zfs_arc_average_blocksize                         8192
        zfetch_array_rd_sz                                1048576
        zfs_autoimport_disable                            1
        zio_slow_io_ms                                    30000
        zfs_arc_p_min_shift                               0
        zio_requeue_io_start_cut_in_line                  1
        zfs_removal_ignore_errors                         0
        zfs_scan_suspend_progress                         0
        zfs_vdev_sync_read_max_active                     128
        zfs_deadman_failmode                              wait
        zfs_reconstruct_indirect_combinations_max         4096
        zfs_ddt_data_is_special                           1

perf report for the nginx process

Answer 1

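What I/O size are the clients requesting? My guess is that your recordsize is too large and is causing read amplification. If clients fetch smaller chunks, ZFS still has to read the full 16 MB block to verify its checksum. ARC and L2ARC are also designed to resist caching sequential I/O, because caching sequential I/O brings less benefit. Blocks must be cached whole, so you can only hold about 4,000 blocks in the ARC.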
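The figures in this answer can be checked with a little arithmetic. The sketch below assumes the 65.32 GiB ARC size from the summary above and, as a guess only, that nginx issues 128 KiB reads (taken from the `output_buffers 1 128k` setting); the real client I/O size is not known from the question.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
KIB = 1024


def records_in_arc(arc_bytes: int, recordsize: int) -> int:
    """Number of whole records that fit in the ARC."""
    return arc_bytes // recordsize


def worst_case_amplification(recordsize: int, read_size: int) -> float:
    """Bytes fetched from disk per byte requested when a read misses the
    cache and the whole record must be read to verify its checksum."""
    return recordsize / read_size


if __name__ == "__main__":
    arc = int(65.32 * GIB)      # ARC size reported by the summary
    read_size = 128 * KIB       # assumed nginx read size, not measured
    for recordsize in (16 * MIB, 1 * MIB):
        print(
            f"recordsize {recordsize // MIB} MiB:",
            records_in_arc(arc, recordsize), "records in ARC,",
            f"{worst_case_amplification(recordsize, read_size):.0f}x on a miss",
        )
```

With these assumptions a 16 MiB record gives roughly 4,180 cacheable records and up to 128x amplification on a cache miss, while a 1 MiB record gives about 16 times as many records and 8x on a miss.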

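Reduce the recordsize to 1 MB, then cp + mv some frequently used files back into place so that they are rewritten with the new record size, and see whether disk I/O and network I/O become more similar.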
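A minimal sketch of the cp + mv step, assuming the dataset's recordsize has already been lowered (for example with `zfs set recordsize=1M <dataset>`) and that nothing is writing to the files while they are rewritten. The name `rewrite_file` and the `.rewrite-tmp` suffix are hypothetical, chosen only for this example.

```python
import os
import shutil


def rewrite_file(path: str, tmp_suffix: str = ".rewrite-tmp") -> None:
    """Copy a file next to itself, then rename the copy over the original.

    The copy is written as new blocks, so it picks up the dataset's
    current recordsize. Note that shutil.copy2 preserves timestamps and
    mode bits but not file ownership.
    """
    tmp_path = path + tmp_suffix
    try:
        shutil.copy2(path, tmp_path)   # cp: write the data out again
        os.replace(tmp_path, path)     # mv: atomic within one filesystem
    except BaseException:
        # Do not leave a half-written copy behind on failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Calling `rewrite_file` on a handful of the most requested files should be enough for a first comparison. If the dataset has snapshots, the old 16M blocks stay referenced until those snapshots are destroyed, so space usage will rise temporarily.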
