Nginx set up on CentOS 7.7 to serve large static files (100MB-16GB), with a bonded 2x10Gbps network. Using ZFS on Linux.
- Pool size is 50TB across 8x8TB disks
- Max ARC size 65GB
- L2ARC 1TB NVMe
- recordsize=16M
- ashift=12
- nginx: sendfile off
- nginx: aio on
- nginx: output_buffers 1 128k (see the config sketch after this list)
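For reference, the nginx side of these settings would look roughly like the following (a sketch reconstructing the directives stated above; the location path is hypothetical):

location /files/ {
    sendfile       off;      # read file data in user space instead of via kernel sendfile
    aio            on;       # asynchronous file reads (on Linux this only takes effect together with directio)
    output_buffers 1 128k;   # one 128 KiB buffer per request, so file reads happen in 128 KiB chunks
}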
The system has been running for a few days. Filling the ARC consumed an excessive amount of CPU. The disks are busy at 600MB/s, yet nginx throughput stays below 2Gbps and the L2ARC hit rate is very low. Any ideas?
Here are the arc_summary output and the perf report.
ZFS Subsystem Report Wed May 20 12:27:46 2020
ARC Summary: (HEALTHY)
Memory Throttle Count: 0
ARC Misc:
Deleted: 1.84m
Mutex Misses: 157.78k
Evict Skips: 157.78k
ARC Size: 102.54% 66.97 GiB
Target Size: (Adaptive) 100.00% 65.32 GiB
Min Size (Hard Limit): 92.87% 60.66 GiB
Max Size (High Water): 1:1 65.32 GiB
ARC Size Breakdown:
Recently Used Cache Size: 46.89% 31.40 GiB
Frequently Used Cache Size: 53.11% 35.57 GiB
ARC Hash Breakdown:
Elements Max: 159.31k
Elements Current: 97.44% 155.23k
Collisions: 11.76k
Chain Max: 2
Chains: 779
ARC Total accesses: 446.46m
Cache Hit Ratio: 99.29% 443.29m
Cache Miss Ratio: 0.71% 3.17m
Actual Hit Ratio: 99.29% 443.29m
Data Demand Efficiency: 99.28% 402.73m
CACHE HITS BY CACHE LIST:
Most Recently Used: 5.99% 26.57m
Most Frequently Used: 94.01% 416.71m
Most Recently Used Ghost: 0.00% 9.65k
Most Frequently Used Ghost: 0.28% 1.26m
CACHE HITS BY DATA TYPE:
Demand Data: 90.19% 399.81m
Prefetch Data: 0.00% 0
Demand Metadata: 9.81% 43.47m
Prefetch Metadata: 0.00% 1.82k
CACHE MISSES BY DATA TYPE:
Demand Data: 91.77% 2.91m
Prefetch Data: 0.00% 0
Demand Metadata: 7.85% 249.26k
Prefetch Metadata: 0.38% 12.12k
L2 ARC Summary: (HEALTHY)
Low Memory Aborts: 0
Free on Write: 3
R/W Clashes: 0
Bad Checksums: 0
IO Errors: 0
L2 ARC Size: (Adaptive) 458.07 GiB
Compressed: 99.60% 456.23 GiB
Header Size: 0.00% 5.34 MiB
L2 ARC Breakdown: 3.17m
Hit Ratio: 15.02% 476.70k
Miss Ratio: 84.98% 2.70m
Feeds: 55.31k
L2 ARC Writes:
Writes Sent: 100.00% 55.27k
ZFS Tunable:
metaslab_debug_load 0
zfs_multihost_interval 1000
zfs_vdev_default_ms_count 200
zfetch_max_streams 8
zfs_nopwrite_enabled 1
zfetch_min_sec_reap 2
zfs_dbgmsg_enable 1
zfs_dirty_data_max_max_percent 25
zfs_abd_scatter_enabled 1
zfs_remove_max_segment 16777216
zfs_deadman_ziotime_ms 300000
spa_load_verify_data 1
zfs_zevent_cols 80
zfs_obsolete_min_time_ms 500
zfs_dirty_data_max_percent 40
zfs_vdev_mirror_non_rotating_inc 0
zfs_resilver_disable_defer 0
zfs_sync_pass_dont_compress 8
zvol_volmode 1
l2arc_write_max 8388608
zfs_disable_ivset_guid_check 0
zfs_vdev_scrub_max_active 128
zfs_vdev_sync_write_min_active 64
zvol_prefetch_bytes 131072
zfs_send_unmodified_spill_blocks 1
metaslab_aliquot 524288
zfs_no_scrub_prefetch 0
zfs_abd_scatter_max_order 10
zfs_arc_shrink_shift 0
zfs_vdev_queue_depth_pct 1000
zfs_txg_history 100
zfs_vdev_removal_max_active 2
zil_maxblocksize 131072
metaslab_force_ganging 16777217
zfs_delay_scale 500000
zfs_free_bpobj_enabled 1
zfs_vdev_async_write_active_min_dirty_percent 30
metaslab_debug_unload 1
zfs_read_history 0
zfs_vdev_initializing_max_active 1
zvol_max_discard_blocks 16384
zfs_recover 0
zfs_scan_fill_weight 3
spa_load_print_vdev_tree 0
zfs_key_max_salt_uses 400000000
zfs_metaslab_segment_weight_enabled 1
zfs_dmu_offset_next_sync 0
l2arc_headroom 2
zfs_deadman_synctime_ms 600000
zfs_dirty_data_sync_percent 20
zfs_free_min_time_ms 1000
zfs_dirty_data_max 4294967296
zfs_vdev_async_read_min_active 64
dbuf_metadata_cache_max_bytes 314572800
zfs_mg_noalloc_threshold 0
zfs_dedup_prefetch 0
dbuf_cache_lowater_pct 10
zfs_slow_io_events_per_second 20
zfs_vdev_max_active 1000
l2arc_write_boost 8388608
zfs_resilver_min_time_ms 3000
zfs_max_missing_tvds 0
zfs_vdev_async_write_max_active 10
zvol_request_sync 0
zfs_async_block_max_blocks 100000
metaslab_df_max_search 16777216
zfs_prefetch_disable 1
metaslab_lba_weighting_enabled 1
zio_dva_throttle_enabled 1
metaslab_df_use_largest_segment 0
zfs_vdev_trim_max_active 2
zfs_unlink_suspend_progress 0
zfs_sync_taskq_batch_pct 75
zfs_arc_min_prescient_prefetch_ms 0
zfs_scan_max_ext_gap 2097152
zfs_initialize_value 16045690984833335022
zfs_mg_fragmentation_threshold 95
zil_nocacheflush 0
l2arc_feed_again 1
zfs_trim_metaslab_skip 0
zfs_zevent_console 0
zfs_immediate_write_sz 32768
zfs_condense_indirect_commit_entry_delay_ms 0
zfs_dbgmsg_maxsize 4194304
zfs_trim_extent_bytes_max 134217728
zfs_trim_extent_bytes_min 32768
zfs_user_indirect_is_special 1
zfs_lua_max_instrlimit 100000000
zfs_free_leak_on_eio 0
zfs_special_class_metadata_reserve_pct 25
zfs_deadman_enabled 1
dmu_object_alloc_chunk_shift 7
vdev_validate_skip 0
zfs_commit_timeout_pct 5
zfs_arc_meta_limit_percent 75
metaslab_bias_enabled 1
zfs_send_queue_length 16777216
zfs_arc_p_dampener_disable 1
zfs_object_mutex_size 64
zfs_metaslab_fragmentation_threshold 70
zfs_delete_blocks 20480
zfs_arc_dnode_limit_percent 10
zfs_no_scrub_io 0
zfs_dbuf_state_index 0
zio_deadman_log_all 0
zfs_vdev_sync_read_min_active 64
zfs_deadman_checktime_ms 60000
metaslab_fragmentation_factor_enabled 1
zfs_override_estimate_recordsize 0
zfs_multilist_num_sublists 0
zvol_inhibit_dev 0
zfs_scan_legacy 0
zfetch_max_distance 16777216
zap_iterate_prefetch 1
zfs_scan_strict_mem_lim 0
zfs_vdev_async_write_active_max_dirty_percent 60
zfs_scan_checkpoint_intval 7200
dmu_prefetch_max 134217728
zfs_recv_queue_length 16777216
zfs_vdev_mirror_rotating_seek_inc 5
dbuf_cache_shift 5
dbuf_metadata_cache_shift 6
zfs_condense_min_mapping_bytes 131072
zfs_vdev_cache_size 0
spa_config_path /etc/zfs/zpool.cache
zfs_dirty_data_max_max 4294967296
zfs_arc_lotsfree_percent 10
zfs_vdev_ms_count_limit 131072
zfs_zevent_len_max 1024
zfs_checksum_events_per_second 20
zfs_arc_sys_free 0
zfs_scan_issue_strategy 0
zfs_arc_meta_strategy 1
zfs_condense_max_obsolete_bytes 1073741824
zfs_vdev_cache_bshift 16
zfs_compressed_arc_enabled 1
zfs_arc_meta_adjust_restarts 4096
zfs_max_recordsize 16777216
zfs_vdev_scrub_min_active 48
zfs_zil_clean_taskq_maxalloc 1048576
zfs_lua_max_memlimit 104857600
zfs_vdev_raidz_impl cycle [fastest] original scalar sse2 ssse3
zfs_per_txg_dirty_frees_percent 5
zfs_vdev_read_gap_limit 32768
zfs_scan_vdev_limit 4194304
zfs_zil_clean_taskq_minalloc 1024
zfs_multihost_history 0
zfs_scan_mem_lim_fact 20
zfs_arc_meta_limit 0
spa_load_verify_shift 4
zfs_vdev_sync_write_max_active 128
l2arc_norw 0
zfs_arc_meta_prune 10000
zfs_vdev_removal_min_active 1
metaslab_preload_enabled 1
dbuf_cache_max_bytes 629145600
zfs_vdev_mirror_non_rotating_seek_inc 1
zfs_spa_discard_memory_limit 16777216
zfs_vdev_initializing_min_active 1
zvol_major 230
zfs_vdev_aggregation_limit 1048576
zfs_flags 0
zfs_vdev_mirror_rotating_seek_offset 1048576
spa_asize_inflation 24
zfs_admin_snapshot 0
l2arc_feed_secs 1
vdev_removal_max_span 32768
zfs_trim_txg_batch 32
zfs_multihost_fail_intervals 10
zfs_abd_scatter_min_size 1536
zio_taskq_batch_pct 75
zfs_sync_pass_deferred_free 2
zfs_arc_min_prefetch_ms 0
zvol_threads 32
zfs_condense_indirect_vdevs_enable 1
zfs_arc_grow_retry 0
zfs_multihost_import_intervals 20
zfs_read_history_hits 0
zfs_vdev_min_ms_count 16
zfs_zil_clean_taskq_nthr_pct 100
zfs_vdev_async_write_min_active 2
zfs_vdev_async_read_max_active 128
zfs_vdev_aggregate_trim 0
zfs_delay_min_dirty_percent 60
zfs_vdev_cache_max 16384
zfs_removal_suspend_progress 0
zfs_vdev_trim_min_active 1
zfs_scan_mem_lim_soft_fact 20
ignore_hole_birth 1
spa_slop_shift 5
zfs_vdev_write_gap_limit 4096
dbuf_cache_hiwater_pct 10
spa_load_verify_metadata 1
l2arc_noprefetch 1
send_holes_without_birth_time 1
zfs_vdev_mirror_rotating_inc 0
zfs_arc_dnode_reduce_percent 10
zfs_arc_pc_percent 0
zfs_metaslab_switch_threshold 2
zfs_vdev_scheduler deadline
zil_slog_bulk 786432
zfs_expire_snapshot 300
zfs_sync_pass_rewrite 2
zil_replay_disable 0
zfs_nocacheflush 0
zfs_vdev_aggregation_limit_non_rotating 131072
zfs_arc_max 70132659200
zfs_arc_min 65132659200
zfs_read_chunk_size 1048576
zfs_txg_timeout 5
zfs_trim_queue_limit 10
zfs_arc_dnode_limit 0
zfs_scan_ignore_errors 0
zfs_pd_bytes_max 52428800
zfs_scrub_min_time_ms 1000
l2arc_headroom_boost 200
zfs_send_corrupt_data 0
l2arc_feed_min_ms 200
zfs_arc_meta_min 0
zfs_arc_average_blocksize 8192
zfetch_array_rd_sz 1048576
zfs_autoimport_disable 1
zio_slow_io_ms 30000
zfs_arc_p_min_shift 0
zio_requeue_io_start_cut_in_line 1
zfs_removal_ignore_errors 0
zfs_scan_suspend_progress 0
zfs_vdev_sync_read_max_active 128
zfs_deadman_failmode wait
zfs_reconstruct_indirect_combinations_max 4096
zfs_ddt_data_is_special 1
Answer 1
What I/O size are the clients requesting? My guess is that your recordsize is too large and you are seeing read amplification. If clients fetch smaller chunks, ZFS still has to read the full 16MB record to verify its checksum, and the ARC and L2ARC are designed to resist caching sequential I/O, since caching sequential I/O pays off less. Records must be cached whole, so you can only fit roughly 4,000 records in the ARC.
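To put numbers on that (using the figures above; the 128 KiB client read size is an assumption based on the output_buffers setting):

65.32 GiB ARC  / 16 MiB per record   = ~4,180 cacheable records
16 MiB record  / 128 KiB client read = up to 128x read amplification on a cache miss

The 128x is the worst case for random reads; a sequential reader will find the following chunks of the same record already in ARC.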
Reduce the recordsize to 1MB, cp + mv some frequently used files back into place (recordsize only applies to newly written blocks, so existing files must be rewritten to pick it up), and see whether disk I/O and network I/O start to look more alike.
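A minimal sketch of that rewrite, assuming a hypothetical dataset name tank/static:

zfs set recordsize=1M tank/static                   # affects newly written blocks only
cp /tank/static/hot.bin /tank/static/hot.bin.tmp    # fresh copy, written at the new recordsize
mv /tank/static/hot.bin.tmp /tank/static/hot.bin    # swap it into place

Repeat for the hot files (or rsync the whole dataset to a new location), then watch arcstat and the network counters to see whether the two converge.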