Ceph storage: MDS reports slow metadata IOs

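I am experimenting with Ceph storage in a lab. I have only one server, so I want to run all the services (MON, OSD, MDS, etc.) on that single machine.

I created two disks using loop devices (the server has SSDs, so the speed is very good).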

root@ceph2# losetup -a
/dev/loop1: [64769]:26869770 (/root/100G-2.img)
/dev/loop0: [64769]:26869769 (/root/100G-1.img)

This is what my ceph -s output looks like:

root@ceph2# ceph -s
  cluster:
    id:     1106ae5c-e5bf-4316-8185-3e559d246ac5
    health: HEALTH_WARN
            1 MDSs report slow metadata IOs
            Reduced data availability: 65 pgs inactive
            Degraded data redundancy: 65 pgs undersized

  services:
    mon: 1 daemons, quorum ceph2 (age 8m)
    mgr: ceph2(active, since 9m)
    mds: 1/1 daemons up
    osd: 2 osds: 2 up (since 20m), 2 in (since 38m)

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 65 pgs
    objects: 0 objects, 0 B
    usage:   11 MiB used, 198 GiB / 198 GiB avail
    pgs:     100.000% pgs not active
             65 undersized+peered

I don't know where the MDS slow IO warning comes from, and ceph mds stat is stuck in the creating state:

root@ceph2# ceph mds stat
cephfs:1 {0=ceph2=up:creating}

This is what the health detail looks like:

root@ceph2# ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; Reduced data availability: 65 pgs inactive; Degraded data redundancy: 65 pgs undersized
[WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
    mds.ceph2(mds.0): 31 slow metadata IOs are blocked > 30 secs, oldest blocked for 864 secs
[WRN] PG_AVAILABILITY: Reduced data availability: 65 pgs inactive
    pg 1.0 is stuck inactive for 22m, current state undersized+peered, last acting [1]
    pg 2.0 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.1 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.2 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.3 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.4 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.5 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.6 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.7 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.8 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.c is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.d is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.e is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.f is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.10 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.11 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.12 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.13 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.14 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.15 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.16 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.17 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 2.18 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.19 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.1a is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 2.1b is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.0 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.1 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.2 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.3 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.4 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.5 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.6 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.7 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.9 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.c is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.d is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.e is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.f is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.10 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.11 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.12 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.13 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.14 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.15 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.16 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.17 is stuck inactive for 14m, current state undersized+peered, last acting [0]
    pg 3.18 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.19 is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.1a is stuck inactive for 14m, current state undersized+peered, last acting [1]
    pg 3.1b is stuck inactive for 14m, current state undersized+peered, last acting [0]
[WRN] PG_DEGRADED: Degraded data redundancy: 65 pgs undersized
    pg 1.0 is stuck undersized for 22m, current state undersized+peered, last acting [1]
    pg 2.0 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.1 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.2 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.3 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.4 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.5 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.6 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.7 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.8 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.c is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.d is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.e is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.f is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.10 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.11 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.12 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.13 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.14 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.15 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.16 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.17 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 2.18 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.19 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.1a is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 2.1b is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.0 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.1 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.2 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.3 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.4 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.5 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.6 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.7 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.9 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.c is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.d is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.e is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.f is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.10 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.11 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.12 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.13 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.14 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.15 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.16 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.17 is stuck undersized for 14m, current state undersized+peered, last acting [0]
    pg 3.18 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.19 is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.1a is stuck undersized for 14m, current state undersized+peered, last acting [1]
    pg 3.1b is stuck undersized for 14m, current state undersized+peered, last acting [0]

What could be wrong here? Do you think it is because I have only one server and 2 OSDs?

Answer 1

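The MDS reports slow metadata IOs because it cannot reach any PGs: all of your PGs are "inactive". Once you bring the PGs up, the warning will eventually go away. With the default CRUSH rule, every pool has a size of 3, which can never be satisfied with only two OSDs. You will also have to change osd_crush_chooseleaf_type to 0 so that the OSD, rather than the host, is your CRUSH failure domain. Then you should also change the size of your pools to 2 so that all PGs fit onto the two OSDs. Be aware, though, that a pool size of 2 is only for testing purposes, or for data you do not value; it is not recommended for any production use.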
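As a rough illustration of the steps above, here is a minimal dry-run sketch that only prints the commands it would run. It rests on a few assumptions that are not in the question: the helper plan_single_host_fix, the rule name replicated_osd, and the pool names cephfs_data and cephfs_metadata are all hypothetical, and the CRUSH root is assumed to be called "default". As far as I know, osd_crush_chooseleaf_type mainly influences the CRUSH map generated when the cluster is first created, so on a cluster that already exists one possible route is to create a new replicated rule with "osd" as the failure domain and point the pools at it.

```shell
#!/bin/sh
# Dry-run sketch: prints the ceph commands instead of executing them.
# plan_single_host_fix is a hypothetical helper, not part of Ceph.
plan_single_host_fix() {
  rule="$1"   # name for the new CRUSH rule (hypothetical)
  shift       # remaining arguments are pool names
  # Replicated rule under the root "default" (assumed) with failure domain "osd".
  echo "ceph osd crush rule create-replicated $rule default osd"
  for pool in "$@"; do
    # Point the pool at the new rule, then shrink it to two replicas.
    echo "ceph osd pool set $pool crush_rule $rule"
    echo "ceph osd pool set $pool size 2"
  done
}

# Pool names are placeholders; list the real ones with `ceph osd pool ls`.
plan_single_host_fix replicated_osd cephfs_data cephfs_metadata
```

Review the printed commands against your own pool list before running any of them. Afterwards, ceph -s should show the PGs moving to active+clean, and the MDS should leave up:creating once its metadata pool is writable.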
