My VM has 4 disks available for testing: sdb, sdc, sdd and sde.
The first 3 disks are used for the RAID5 configuration, and the last disk is used as the lvm cache drive.
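For reference, a layout like this can be built roughly as follows (a sketch only: the VG/LV names match the outputs below, but the exact sizes, device names and the writethrough cache mode are assumptions, not the commands actually used):

pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde
vgcreate data /dev/sdb /dev/sdc /dev/sdd /dev/sde
# RAID5 origin LV across the first 3 disks (2 data stripes + parity, 64k stripe size)
lvcreate --type raid5 --stripes 2 --stripesize 64k -L 2T -n data data /dev/sdb /dev/sdc /dev/sdd
# 50G cache pool on the 4th disk, then attach it to the origin LV
lvcreate --type cache-pool -L 50G -n cache_data data /dev/sde
lvconvert --type cache --cachemode writethrough --cachepool data/cache_data data/data
mkfs.xfs /dev/data/data
mount /dev/data/data /data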
What I don't understand is the following:
When I create a 50GB cache disk (which gets a 64KiB chunk size), xfs_info reports this:
[vagrant@node-02 ~]$ xfs_info /data
meta-data=/dev/mapper/data-data isize=512 agcount=32, agsize=16777072 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=0 spinodes=0
data = bsize=4096 blocks=536866304, imaxpct=5
= sunit=16 swidth=32 blks
naming =version 2 bsize=8192 ascii-ci=0 ftype=1
log =internal bsize=4096 blocks=262144, version=2
= sectsz=512 sunit=16 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
The sunit=16 and swidth=32 we see here look correct and match the RAID5 layout: with bsize=4096, sunit = 16 blocks = 64 KiB (the RAID5 stripe unit) and swidth = 32 blocks = 128 KiB (2 data disks × 64 KiB).
The result of lsblk -t:
[vagrant@node-02 ~]$ lsblk -t
NAME ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA WSAME
sda 0 512 0 512 512 1 deadline 128 4096 0B
├─sda1 0 512 0 512 512 1 deadline 128 4096 0B
└─sda2 0 512 0 512 512 1 deadline 128 4096 0B
├─centos-root 0 512 0 512 512 1 128 4096 0B
├─centos-swap 0 512 0 512 512 1 128 4096 0B
└─centos-home 0 512 0 512 512 1 128 4096 0B
sdb 0 512 0 512 512 1 deadline 128 4096 32M
├─data5-data5_corig_rmeta_0 0 512 0 512 512 1 128 4096 32M
│ └─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
│ └─data5-data5 0 65536 131072 512 512 1 128 4096 0B
└─data5-data5_corig_rimage_0 0 512 0 512 512 1 128 4096 32M
└─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
└─data5-data5 0 65536 131072 512 512 1 128 4096 0B
sdc 0 512 0 512 512 1 deadline 128 4096 32M
├─data5-data5_corig_rmeta_1 0 512 0 512 512 1 128 4096 32M
│ └─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
│ └─data5-data5 0 65536 131072 512 512 1 128 4096 0B
└─data5-data5_corig_rimage_1 0 512 0 512 512 1 128 4096 32M
└─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
└─data5-data5 0 65536 131072 512 512 1 128 4096 0B
sdd 0 512 0 512 512 1 deadline 128 4096 32M
├─data5-data5_corig_rmeta_2 0 512 0 512 512 1 128 4096 32M
│ └─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
│ └─data5-data5 0 65536 131072 512 512 1 128 4096 0B
└─data5-data5_corig_rimage_2 0 512 0 512 512 1 128 4096 32M
└─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
└─data5-data5 0 65536 131072 512 512 1 128 4096 0B
sde 0 512 0 512 512 1 deadline 128 4096 32M
sdf 0 512 0 512 512 1 deadline 128 4096 32M
├─data5-cache_data5_cdata 0 512 0 512 512 1 128 4096 32M
│ └─data5-data5 0 65536 131072 512 512 1 128 4096 0B
└─data5-cache_data5_cmeta 0 512 0 512 512 1 128 4096 32M
└─data5-data5 0 65536 131072 512 512 1 128 4096 0B
sdg 0 512 0 512 512 1 deadline 128 4096 32M
sdh 0 512 0 512 512 1 deadline 128 4096 32M
And lvdisplay -a -m data gives me the following:
[vagrant@node-02 ~]$ sudo lvdisplay -m -a data
--- Logical volume ---
LV Path /dev/data/data
LV Name data
VG Name data
LV UUID MBG1p8-beQj-TNDd-Cyx4-QkyN-vdVk-dG6n6I
LV Write Access read/write
LV Creation host, time node-02, 2019-09-03 13:22:08 +0000
LV Cache pool name cache_data
LV Cache origin name data_corig
LV Status available
# open 1
LV Size <2.00 TiB
Cache used blocks 0.06%
Cache metadata blocks 0.64%
Cache dirty blocks 0.00%
Cache read hits/misses 293 / 66
Cache wrt hits/misses 59 / 41173
Cache demotions 0
Cache promotions 486
Current LE 524284
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:9
--- Segments ---
Logical extents 0 to 524283:
Type cache
Chunk size 64.00 KiB
Metadata format 2
Mode writethrough
Policy smq
--- Logical volume ---
Internal LV Name cache_data
VG Name data
LV UUID apACl6-DtfZ-TURM-vxjD-UhxF-tthY-uSYRGq
LV Write Access read/write
LV Creation host, time node-02, 2019-09-03 13:22:16 +0000
LV Pool metadata cache_data_cmeta
LV Pool data cache_data_cdata
LV Status NOT available
LV Size 50.00 GiB
Current LE 12800
Segments 1
Allocation inherit
Read ahead sectors auto
--- Segments ---
Logical extents 0 to 12799:
Type cache-pool
Chunk size 64.00 KiB
Metadata format 2
Mode writethrough
Policy smq
--- Logical volume ---
Internal LV Name cache_data_cmeta
VG Name data
LV UUID hmkW6M-CKGO-CTUP-rR4v-KnWn-DbBZ-pJeEA2
LV Write Access read/write
LV Creation host, time node-02, 2019-09-03 13:22:15 +0000
LV Status available
# open 1
LV Size 1.00 GiB
Current LE 256
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:11
--- Segments ---
Logical extents 0 to 255:
Type linear
Physical volume /dev/sdf
Physical extents 0 to 255
--- Logical volume ---
Internal LV Name cache_data_cdata
VG Name data
LV UUID 9mHe8J-SRiY-l1gl-TO1h-2uCC-Hi10-UpeEVP
LV Write Access read/write
LV Creation host, time node-02, 2019-09-03 13:22:16 +0000
LV Status available
# open 1
LV Size 50.00 GiB
Current LE 12800
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:10
--- Segments ---
Logical extents 0 to 12799:
Type linear
Physical volume /dev/sdf
Physical extents 256 to 13055
--- Logical volume ---
Internal LV Name data_corig
VG Name data
LV UUID QP8ppy-nv1v-0sii-tANA-6ZzK-EJkP-sLfrh4
LV Write Access read/write
LV Creation host, time node-02, 2019-09-03 13:22:17 +0000
LV origin of Cache LV data
LV Status available
# open 1
LV Size <2.00 TiB
Current LE 524284
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 768
Block device 253:12
--- Segments ---
Logical extents 0 to 524283:
Type raid5
Monitoring monitored
Raid Data LV 0
Logical volume data_corig_rimage_0
Logical extents 0 to 262141
Raid Data LV 1
Logical volume data_corig_rimage_1
Logical extents 0 to 262141
Raid Data LV 2
Logical volume data_corig_rimage_2
Logical extents 0 to 262141
Raid Metadata LV 0 data_corig_rmeta_0
Raid Metadata LV 1 data_corig_rmeta_1
Raid Metadata LV 2 data_corig_rmeta_2
--- Logical volume ---
Internal LV Name data_corig_rimage_2
VG Name data
LV UUID Df7SLj
LV Write Access read/write
LV Creation host, time node-02, 2019-09-03 13:22:08 +0000
LV Status available
# open 1
LV Size 1023.99 GiB
Current LE 262142
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:8
--- Segments ---
Logical extents 0 to 262141:
Type linear
Physical volume /dev/sdd
Physical extents 1 to 262142
--- Logical volume ---
Internal LV Name data_corig_rmeta_2
VG Name data
LV UUID xi9Ot3-aTnp-bA3z-YL0x-eVaB-87EP-JSM3eN
LV Write Access read/write
LV Creation host, time node-02, 2019-09-03 13:22:08 +0000
LV Status available
# open 1
LV Size 4.00 MiB
Current LE 1
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:7
--- Segments ---
Logical extents 0 to 0:
Type linear
Physical volume /dev/sdd
Physical extents 0 to 0
Here we can clearly see the 64KiB chunk size in the segments.
But when I create a 250GB cache disk, lvm needs a chunk size of at least 288KiB to accommodate a cache of that size. And when I then run xfs_info, the sunit/swidth values suddenly match the cache drive's chunk size instead of the RAID5 layout.
Output of xfs_info:
[vagrant@node-02 ~]$ xfs_info /data
meta-data=/dev/mapper/data-data isize=512 agcount=32, agsize=16777152 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=0 spinodes=0
data = bsize=4096 blocks=536866816, imaxpct=5
= sunit=72 swidth=72 blks
naming =version 2 bsize=8192 ascii-ci=0 ftype=1
log =internal bsize=4096 blocks=262144, version=2
= sectsz=512 sunit=8 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Suddenly we get a sunit and swidth of 72, which matches the 288KiB chunk size of the cache drive (72 blocks × 4 KiB = 288 KiB), as we can see with lvdisplay -m -a:
[vagrant@node-02 ~]$ sudo lvdisplay -m -a data
--- Logical volume ---
LV Path /dev/data/data
LV Name data
VG Name data
LV UUID XLHw3w-RkG9-UNh6-WZBM-HtjM-KcV6-6dOdnG
LV Write Access read/write
LV Creation host, time node-2, 2019-09-03 13:36:32 +0000
LV Cache pool name cache_data
LV Cache origin name data_corig
LV Status available
# open 1
LV Size <2.00 TiB
Cache used blocks 0.17%
Cache metadata blocks 0.71%
Cache dirty blocks 0.00%
Cache read hits/misses 202 / 59
Cache wrt hits/misses 8939 / 34110
Cache demotions 0
Cache promotions 1526
Current LE 524284
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:9
--- Segments ---
Logical extents 0 to 524283:
Type cache
Chunk size 288.00 KiB
Metadata format 2
Mode writethrough
Policy smq
--- Logical volume ---
Internal LV Name cache_data
VG Name data
LV UUID Ps7Z1P-y5Ae-ju80-SZjc-yB6S-YBtx-SWL9vO
LV Write Access read/write
LV Creation host, time node-2, 2019-09-03 13:36:40 +0000
LV Pool metadata cache_data_cmeta
LV Pool data cache_data_cdata
LV Status NOT available
LV Size 250.00 GiB
Current LE 64000
Segments 1
Allocation inherit
Read ahead sectors auto
--- Segments ---
Logical extents 0 to 63999:
Type cache-pool
Chunk size 288.00 KiB
Metadata format 2
Mode writethrough
Policy smq
--- Logical volume ---
Internal LV Name cache_data_cmeta
VG Name data
LV UUID k4rVn9-lPJm-2Vvt-77jw-NP1K-PTOs-zFy2ph
LV Write Access read/write
LV Creation host, time node-2, 2019-09-03 13:36:39 +0000
LV Status available
# open 1
LV Size 1.00 GiB
Current LE 256
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:11
--- Segments ---
Logical extents 0 to 255:
Type linear
Physical volume /dev/sdf
Physical extents 0 to 255
--- Logical volume ---
Internal LV Name cache_data_cdata
VG Name data
LV UUID dm571W-f9eX-aFMA-SrPC-PYdd-zs45-ypLksd
LV Write Access read/write
LV Creation host, time node-2, 2019-09-03 13:36:39 +0000
LV Status available
# open 1
LV Size 250.00 GiB
Current LE 64000
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 8192
Block device 253:10
--- Logical volume ---
Internal LV Name data_corig
VG Name data
LV UUID hbYiRO-YnV8-gd1B-shQD-N3SR-xpTl-rOjX8V
LV Write Access read/write
LV Creation host, time node-2, 2019-09-03 13:36:41 +0000
LV origin of Cache LV data
LV Status available
# open 1
LV Size <2.00 TiB
Current LE 524284
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 768
Block device 253:12
--- Segments ---
Logical extents 0 to 524283:
Type raid5
Monitoring monitored
Raid Data LV 0
Logical volume data_corig_rimage_0
Logical extents 0 to 262141
Raid Data LV 1
Logical volume data_corig_rimage_1
Logical extents 0 to 262141
Raid Data LV 2
Logical volume data_corig_rimage_2
Logical extents 0 to 262141
Raid Metadata LV 0 data_corig_rmeta_0
Raid Metadata LV 1 data_corig_rmeta_1
Raid Metadata LV 2 data_corig_rmeta_2
Output of lsblk -t:
[vagrant@node-02 ~]$ lsblk -t
NAME ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA WSAME
sda 0 512 0 512 512 1 deadline 128 4096 0B
├─sda1 0 512 0 512 512 1 deadline 128 4096 0B
└─sda2 0 512 0 512 512 1 deadline 128 4096 0B
├─centos-root 0 512 0 512 512 1 128 4096 0B
├─centos-swap 0 512 0 512 512 1 128 4096 0B
└─centos-home 0 512 0 512 512 1 128 4096 0B
sdb 0 512 0 512 512 1 deadline 128 4096 32M
├─data5-data5_corig_rmeta_0 0 512 0 512 512 1 128 4096 32M
│ └─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
│ └─data5-data5 0 294912 294912 512 512 1 128 4096 0B
└─data5-data5_corig_rimage_0 0 512 0 512 512 1 128 4096 32M
└─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
└─data5-data5 0 294912 294912 512 512 1 128 4096 0B
sdc 0 512 0 512 512 1 deadline 128 4096 32M
├─data5-data5_corig_rmeta_1 0 512 0 512 512 1 128 4096 32M
│ └─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
│ └─data5-data5 0 294912 294912 512 512 1 128 4096 0B
└─data5-data5_corig_rimage_1 0 512 0 512 512 1 128 4096 32M
└─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
└─data5-data5 0 294912 294912 512 512 1 128 4096 0B
sdd 0 512 0 512 512 1 deadline 128 4096 32M
├─data5-data5_corig_rmeta_2 0 512 0 512 512 1 128 4096 32M
│ └─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
│ └─data5-data5 0 294912 294912 512 512 1 128 4096 0B
└─data5-data5_corig_rimage_2 0 512 0 512 512 1 128 4096 32M
└─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
└─data5-data5 0 294912 294912 512 512 1 128 4096 0B
sde 0 512 0 512 512 1 deadline 128 4096 32M
sdf 0 512 0 512 512 1 deadline 128 4096 32M
├─data5-cache_data5_cdata 0 512 0 512 512 1 128 4096 32M
│ └─data5-data5 0 294912 294912 512 512 1 128 4096 0B
└─data5-cache_data5_cmeta 0 512 0 512 512 1 128 4096 32M
└─data5-data5 0 294912 294912 512 512 1 128 4096 0B
sdg 0 512 0 512 512 1 deadline 128 4096 32M
sdh 0 512 0 512 512 1 deadline 128 4096 32M
This raises a few questions.
XFS is apparently able to auto-detect these settings, but why does XFS choose to use the cache drive's chunk size? As we saw in the first example, it is perfectly capable of auto-detecting the RAID5 layout.
I know I can pass the su/sw options to mkfs.xfs to get the correct sunit/swidth values, but should I do that in this case?
http://xfs.org/index.php/XFS_FAQ#Q:_How_to_calculate_the_correct_sunit.2Cswidth_values_for_optimal_performance
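For example, I assume forcing the RAID5 geometry would look something like this (the su/sw values are derived from the first xfs_info output above, and the device path is the one shown there):

# hypothetical invocation: align to the RAID5 layout instead of the cache chunk
mkfs.xfs -d su=64k,sw=2 /dev/mapper/data-data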
I have been googling for days and have looked at the XFS source code, but I haven't found any clue as to why XFS does this.
So this leaves the following questions:
- Why does XFS behave this way?
- Should I manually define su/sw when running mkfs.xfs?
- Does the cache drive's chunk size have an impact on the RAID5 setup? Should it be aligned in some way?
Answer 1
The optimal allocation strategy is a complex matter, because it depends on how the various block layers interact with one another.
When determining the optimal allocation policy, mkfs.xfs relies on libblkid. You can access the same information with lsblk -t. It is very likely that mkfs.xfs uses a 288K allocation alignment because lvs (well, device-mapper really) simply passes that value up the stack.
I have seen very similar behavior with thin provisioning, where mkfs.xfs aligns the filesystem to the thin chunk size.
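If you want to double-check what libblkid sees for the cached LV, something like this should show it (a sketch; dm-9 is a guess based on the 253:9 minor number in your lvdisplay output, and blockdev prints the sizes in bytes):

blockdev --getiomin --getioopt /dev/mapper/data-data
cat /sys/block/dm-9/queue/minimum_io_size /sys/block/dm-9/queue/optimal_io_size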
EDIT: So, here is the output of lsblk -t:
[vagrant@node-02 ~]$ lsblk -t
NAME ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA WSAME
sda 0 512 0 512 512 1 deadline 128 4096 0B
├─sda1 0 512 0 512 512 1 deadline 128 4096 0B
└─sda2 0 512 0 512 512 1 deadline 128 4096 0B
├─centos-root 0 512 0 512 512 1 128 4096 0B
├─centos-swap 0 512 0 512 512 1 128 4096 0B
└─centos-home 0 512 0 512 512 1 128 4096 0B
sdb 0 512 0 512 512 1 deadline 128 4096 32M
├─data5-data5_corig_rmeta_0 0 512 0 512 512 1 128 4096 32M
│ └─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
│ └─data5-data5 0 294912 294912 512 512 1 128 4096 0B
└─data5-data5_corig_rimage_0 0 512 0 512 512 1 128 4096 32M
└─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
└─data5-data5 0 294912 294912 512 512 1 128 4096 0B
sdc 0 512 0 512 512 1 deadline 128 4096 32M
├─data5-data5_corig_rmeta_1 0 512 0 512 512 1 128 4096 32M
│ └─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
│ └─data5-data5 0 294912 294912 512 512 1 128 4096 0B
└─data5-data5_corig_rimage_1 0 512 0 512 512 1 128 4096 32M
└─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
└─data5-data5 0 294912 294912 512 512 1 128 4096 0B
sdd 0 512 0 512 512 1 deadline 128 4096 32M
├─data5-data5_corig_rmeta_2 0 512 0 512 512 1 128 4096 32M
│ └─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
│ └─data5-data5 0 294912 294912 512 512 1 128 4096 0B
└─data5-data5_corig_rimage_2 0 512 0 512 512 1 128 4096 32M
└─data5-data5_corig 0 65536 131072 512 512 1 128 384 0B
└─data5-data5 0 294912 294912 512 512 1 128 4096 0B
sde 0 512 0 512 512 1 deadline 128 4096 32M
sdf 0 512 0 512 512 1 deadline 128 4096 32M
├─data5-cache_data5_cdata 0 512 0 512 512 1 128 4096 32M
│ └─data5-data5 0 294912 294912 512 512 1 128 4096 0B
└─data5-cache_data5_cmeta 0 512 0 512 512 1 128 4096 32M
└─data5-data5 0 294912 294912 512 512 1 128 4096 0B
sdg 0 512 0 512 512 1 deadline 128 4096 32M
sdh 0 512 0 512 512 1 deadline 128 4096 32M
As you can see, the data5-data5 device (the one you created the xfs filesystem on) reports a MIN-IO and OPT-IO of 294912 bytes (288K, your cache chunk size), while the underlying devices report the RAID array chunk size (64K). This means device-mapper has overridden the underlying IO information with the current cache chunk size.
mkfs.xfs simply uses what libblkid reports, which in turn depends on the specific cache device-mapper target in use.
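You can see that override directly in sysfs as well; roughly like this (dm-12 and dm-9 assume the 253:12 origin and 253:9 cached-LV minor numbers from your lvdisplay output, and the expected values are the ones from your lsblk output above):

cat /sys/block/dm-12/queue/optimal_io_size   # data_corig: 131072 (2 x 64K RAID5 stripe)
cat /sys/block/dm-9/queue/optimal_io_size    # data (cached LV): 294912 (288K cache chunk)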