Ceph - erasure-coded pool - always getting inactive PGs

2024-6-1

I am trying to set up something similar to RAID 6 on Ceph.

But whenever I create an erasure-coded pool (k=3, m=2, or k=4, m=2), I always end up with inactive PGs.

The output of ceph health detail is as follows:

HEALTH_WARN Reduced data availability: 128 pgs inactive
PG_AVAILABILITY Reduced data availability: 128 pgs inactive
    pg 11.a is stuck inactive for 107.974251, current state unknown, last acting []
    pg 11.b is stuck inactive for 107.974251, current state unknown, last acting []
    pg 11.c is stuck inactive for 107.974251, current state unknown, last acting []
    pg 11.d is stuck inactive for 107.974251, current state unknown, last acting []
    pg 11.e is stuck inactive for 107.974251, current state unknown, last acting []
    pg 11.f is stuck inactive for 107.974251, current state unknown, last acting []
    pg 11.20 is stuck inactive for 107.974251, current state unknown, last acting []

The EC profile is as follows:

crush-device-class=
crush-failure-domain=rack
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=2
plugin=jerasure
technique=reed_sol_van
w=8

I also tried k=3, and the same thing happened.

The CRUSH rule (for k=4):

{
    "rule_id": 1,
    "rule_name": "r6pool",
    "ruleset": 1,
    "type": 3,
    "min_size": 3,
    "max_size": 6,
    "steps": [
        {
            "op": "set_chooseleaf_tries",
            "num": 5
        },
        {
            "op": "set_choose_tries",
            "num": 100
        },
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "chooseleaf_indep",
            "num": 0,
            "type": "rack"
        },
        {
            "op": "emit"
        }
    ]
}
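For reference, the chooseleaf_indep step with num 0 and type rack asks CRUSH for one OSD under a distinct rack for each chunk, so a pool with k + m chunks needs at least k + m racks to be mapped. The snippet below is only a minimal sketch of that arithmetic. The helper name ec_can_place is hypothetical, and because the number of racks in this cluster is not shown, the rack count used is an assumption.

```shell
# ec_can_place K M DOMAINS
# Exits 0 when K+M chunks can each be placed in a distinct failure domain,
# i.e. when K+M is not larger than the number of domains available.
ec_can_place() {
    k="$1"; m="$2"; domains="$3"
    [ "$((k + m))" -le "$domains" ]
}

# Illustrative values only: 8 OSDs comes from the question, 1 rack is an assumption.
if ec_can_place 4 2 8; then
    echo "k=4 m=2 fits when the failure domain is one of 8 OSDs"
fi
if ! ec_can_place 4 2 1; then
    echo "k=4 m=2 does not fit when the failure domain is 1 rack"
fi
```

If the cluster really has fewer racks than k + m, CRUSH cannot produce a full mapping, which would be consistent with the empty acting sets in the health output above.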

Since everything works fine when I don't use EC pools, I'm fairly sure I'm misunderstanding something, but I can't pinpoint my mistake.

To reproduce this on an otherwise healthy 8-OSD cluster, all I have to do is run:

ceph osd erasure-code-profile set raid6 k=3 m=2
ceph osd pool create r6pool 128 128 erasure raid6

ceph -s then shows the following:

ceph -s
  cluster:
    id:     7635eaf1-df47-4bed-9cef-a3152cb4fa5f
    health: HEALTH_WARN
            Reduced data availability: 128 pgs inactive

  services:
    mon: 3 daemons, quorum CephMon1,CephMon3,CephMon2
    mgr: CephClient(active)
    osd: 9 osds: 8 up, 8 in

  data:
    pools:   2 pools, 256 pgs
    objects: 0 objects, 0B
    usage:   8.13GiB used, 31.8GiB / 40.0GiB avail
    pgs:     50.000% pgs unknown
             128 active+clean
             128 unknown

I would greatly appreciate any help. Thanks.

Answer 1

Well, setting:

crush-failure-domain=osd

in the EC profile solved the problem.
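As a rough sketch of how that setting might be applied, the commands from the question can be repeated with crush-failure-domain=osd added to the profile. The profile name raid6osd, the pool name r6pool_osd, the wrapper function create_ec_pool_per_osd and the CEPH variable are all hypothetical names chosen for illustration. A fresh profile name is used here on the assumption that the original raid6 profile is still referenced by the existing pool.

```shell
# CEPH defaults to the real CLI; set CEPH=echo to print the commands as a dry run.
CEPH="${CEPH:-ceph}"

# create_ec_pool_per_osd K M
# Creates an EC profile whose failure domain is a single OSD, then a pool using it.
# raid6osd and r6pool_osd are illustrative names, not taken from the question.
create_ec_pool_per_osd() {
    k="$1"; m="$2"
    # Each of the K+M chunks only has to land on a distinct OSD, not a distinct rack.
    $CEPH osd erasure-code-profile set raid6osd k="$k" m="$m" crush-failure-domain=osd &&
    # Same PG counts (128 128) as in the question.
    $CEPH osd pool create r6pool_osd 128 128 erasure raid6osd
}
```

Calling create_ec_pool_per_osd 3 2 would mirror the k=3, m=2 case. One trade-off to keep in mind: with osd as the failure domain, several chunks of the same PG may end up on OSDs that share a host, so losing that host can take out more than m chunks at once.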
