Ceph PGs stuck creating / pool creation slow

I have set up a small Ceph cluster: 3 servers, each with 5 OSD disks and one monitor per server.

The setup itself seems to have completed smoothly: the mons are in quorum and all 15 OSDs are up and in. However, when I create a pool, the PGs remain inactive and never actually finish creating.

I have read as many posts and tutorials as I could find, but I still can't figure out why the PGs are stuck in the creating state and never complete.

I could really use some advice on finding the error or problem, or confirmation that pool creation really is this slow. The system has been set up and running like this for 2 weeks; the pgmap in the ceph -w output shows the MB Used value increasing very, very slowly, roughly 1 MB every 2 minutes.

Output of ceph -w

cephadmin@cnc:~$ ceph -w
    cluster 7908651c-252e-4761-8a83-4b1cfcf90522
     health HEALTH_ERR
            700 pgs are stuck inactive for more than 300 seconds
            700 pgs stuck inactive
     monmap e1: 3 mons at {ceph1=10.0.80.10:6789/0,ceph2=10.0.80.11:6789/0,ceph3=10.0.80.12:6789/0}
            election epoch 18, quorum 0,1,2 ceph1,ceph2,ceph3
     osdmap e304359: 15 osds: 15 up, 15 in
            flags sortbitwise,require_jewel_osds
      pgmap v1097264: 700 pgs, 1 pools, 0 bytes data, 0 objects
            90932 MB used, 55699 GB / 55788 GB avail
                 700 creating

2017-02-02 11:20:10.774943 mon.0 [INF] pgmap v1097264: 700 pgs: 700 creating; 0 bytes data, 90932 MB used, 55699 GB / 55788 GB avail
2017-02-02 11:20:11.152412 mon.0 [INF] mds.? 10.0.80.10:6800/1746 up:boot
2017-02-02 11:20:11.152632 mon.0 [INF] fsmap e304293:, 1 up:standby
2017-02-02 11:20:11.853221 mon.0 [INF] pgmap v1097265: 700 pgs: 700 creating; 0 bytes data, 90932 MB used, 55699 GB / 55788 GB avail
2017-02-02 11:20:12.931001 mon.0 [INF] pgmap v1097266: 700 pgs: 700 creating; 0 bytes data, 90932 MB used, 55699 GB / 55788 GB avail
2017-02-02 11:20:14.097210 mon.0 [INF] pgmap v1097267: 700 pgs: 700 creating; 0 bytes data, 90932 MB used, 55699 GB / 55788 GB avail
2017-02-02 11:20:14.707583 mon.0 [INF] osdmap e304360: 15 osds: 15 up, 15 in
2017-02-02 11:20:14.774994 mon.0 [INF] pgmap v1097268: 700 pgs: 700 creating; 0 bytes data, 90932 MB used, 55699 GB / 55788 GB avail
2017-02-02 11:20:15.197354 mon.0 [INF] mds.? 10.0.80.10:6801/2222 up:boot
2017-02-02 11:20:15.197528 mon.0 [INF] fsmap e304294:, 1 up:standby
2017-02-02 11:20:15.875919 mon.0 [INF] pgmap v1097269: 700 pgs: 700 creating; 0 bytes data, 90932 MB used, 55699 GB / 55788 GB avail
2017-02-02 11:20:16.975746 mon.0 [INF] pgmap v1097270: 700 pgs: 700 creating; 0 bytes data, 90932 MB used, 55699 GB / 55788 GB avail
2017-02-02 11:20:18.075955 mon.0 [INF] pgmap v1097271: 700 pgs: 700 creating; 0 bytes data, 90932 MB used, 55699 GB / 55788 GB avail
2017-02-02 11:20:18.708059 mon.0 [INF] osdmap e304361: 15 osds: 15 up, 15 in
2017-02-02 11:20:18.775552 mon.0 [INF] pgmap v1097272: 700 pgs: 700 creating; 0 bytes data, 90932 MB used, 55699 GB / 55788 GB avail
2017-02-02 11:20:19.253143 mon.0 [INF] mds.? 10.0.80.10:6800/1746 up:boot
2017-02-02 11:20:19.253314 mon.0 [INF] fsmap e304295:, 1 up:standby
2017-02-02 11:20:19.853348 mon.0 [INF] pgmap v1097273: 700 pgs: 700 creating; 0 bytes data, 90932 MB used, 55699 GB / 55788 GB avail
2017-02-02 11:20:20.988606 mon.0 [INF] pgmap v1097274: 700 pgs: 700 creating; 0 bytes data, 90932 MB used, 55699 GB / 55788 GB avail
2017-02-02 11:20:22.188444 mon.0 [INF] pgmap v1097275: 700 pgs: 700 creating; 0 bytes data, 90932 MB used, 55699 GB / 55788 GB avail
2017-02-02 11:20:22.709647 mon.0 [INF] osdmap e304362: 15 osds: 15 up, 15 in
2017-02-02 11:20:22.777063 mon.0 [INF] pgmap v1097276: 700 pgs: 700 creating; 0 bytes data, 90932 MB used, 55699 GB / 55788 GB avail
2017-02-02 11:20:23.288351 mon.0 [INF] mds.? 10.0.80.10:6801/2222 up:boot
2017-02-02 11:20:23.288498 mon.0 [INF] fsmap e304296:, 1 up:standby
2017-02-02 11:20:23.855536 mon.0 [INF] pgmap v1097277: 700 pgs: 700 creating; 0 bytes data, 90932 MB used, 55699 GB / 55788 GB avail
2017-02-02 11:20:25.533595 mon.0 [INF] pgmap v1097278: 700 pgs: 700 creating; 0 bytes data, 90932 MB used, 55699 GB / 55788 GB avail
2017-02-02 11:20:26.610728 mon.0 [INF] pgmap v1097279: 700 pgs: 700 creating; 0 bytes data, 90932 MB used, 55699 GB / 55788 GB avail
2017-02-02 11:20:26.743563 mon.0 [INF] osdmap e304363: 15 osds: 15 up, 15 in
2017-02-02 11:20:26.743636 mon.0 [INF] mds.? 10.0.80.10:6800/1746 up:boot
2017-02-02 11:20:26.743722 mon.0 [INF] fsmap e304297:, 1 up:standby
2017-02-02 11:20:26.822333 mon.0 [INF] pgmap v1097280: 700 pgs: 700 creating; 0 bytes data, 90932 MB used, 55699 GB / 55788 GB avail
2017-02-02 11:20:27.900114 mon.0 [INF] pgmap v1097281: 700 pgs: 700 creating; 0 bytes data, 90932 MB used, 55699 GB / 55788 GB avail
2017-02-02 11:20:29.111348 mon.0 [INF] pgmap v1097282: 700 pgs: 700 creating; 0 bytes data, 90933 MB used, 55699 GB / 55788 GB avail
2017-02-02 11:20:30.188991 mon.0 [INF] pgmap v1097283: 700 pgs: 700 creating; 0 bytes data, 90933 MB used, 55699 GB / 55788 GB avail
2017-02-02 11:20:30.721728 mon.0 [INF] osdmap e304364: 15 osds: 15 up, 15 in
2017-02-02 11:20:30.778195 mon.0 [INF] pgmap v1097284: 700 pgs: 700 creating; 0 bytes data, 90933 MB used, 55699 GB / 55788 GB avail

ceph.conf

[global]
public network = 10.0.80.0/23
cluster network = 10.0.80.0/23

fsid = 7908651c-252e-4761-8a83-4b1cfcf90522
mon_initial_members = ceph1, ceph2, ceph3
mon_host = 10.0.80.10,10.0.80.11,10.0.80.12
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

osd pool default size = 2
osd pool default min size = 1
osd pool default pg num = 750
osd pool default pgp num = 750
osd crush chooseleaf type = 2

[mon.ceph1]
mon addr = 10.0.80.10:6789
host = ceph1
[mon.ceph2]
mon addr = 10.0.80.11:6789
host = ceph2
[mon.ceph3]
mon addr = 10.0.80.12:6789
host = ceph3

[mds]
keyring = /var/lib/ceph/mds/ceph-ceph1/keyring
[mds.ceph1]
host = ceph1

[osd.0]
cluster addr = 10.0.80.13
host = ceph1
[osd.1]
cluster addr = 10.0.80.13
host = ceph1
[osd.2]
cluster addr = 10.0.80.13
host = ceph1
[osd.3]
cluster addr = 10.0.80.13
host = ceph1
[osd.4]
cluster addr = 10.0.80.13
host = ceph1

[osd.5]
cluster addr = 10.0.80.14
host = ceph2
[osd.6]
cluster addr = 10.0.80.14
host = ceph2
[osd.7]
cluster addr = 10.0.80.14
host = ceph2
[osd.8]
cluster addr = 10.0.80.14
host = ceph2
[osd.9]
cluster addr = 10.0.80.14
host = ceph2

[osd.10]
cluster addr = 10.0.80.15
host = ceph3
[osd.11]
cluster addr = 10.0.80.15
host = ceph3
[osd.12]
cluster addr = 10.0.80.15
host = ceph3
[osd.13]
cluster addr = 10.0.80.15
host = ceph3
[osd.14]
cluster addr = 10.0.80.15
host = ceph3
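As a side note on the ceph.conf above: it sets osd pool default pg num = 750 for 15 OSDs at size 2. The commonly cited sizing heuristic (general guidance, not something stated in this post) is roughly (OSDs × 100) / replica size, rounded to a power of two. A sketch:

```shell
# Common pg_num sizing heuristic (general guidance, not from the post):
# target = (num_osds * 100) / pool_size, rounded to the nearest power of two.
suggested_pg_num() {
    local num_osds=$1 pool_size=$2
    local target=$(( num_osds * 100 / pool_size ))
    local pg=1
    # Find the largest power of two not exceeding the target...
    while [ $(( pg * 2 )) -le "$target" ]; do
        pg=$(( pg * 2 ))
    done
    # ...then pick the closer of the two neighbouring powers of two.
    if [ $(( target - pg )) -gt $(( pg * 2 - target )) ]; then
        pg=$(( pg * 2 ))
    fi
    echo "$pg"
}

suggested_pg_num 15 2   # (15 * 100) / 2 = 750 -> prints 512
```

With 15 OSDs and size 2 the heuristic lands on 512, so 750/700 PGs is in a plausible range; the PG count alone does not explain PGs that never leave the creating state.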

Output of ceph df

cephadmin@cnc:~$ ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    55788G     55699G       90973M          0.16
POOLS:
    NAME              ID     USED     %USED     MAX AVAIL     OBJECTS
    rbd_vmstorage     4         0         0        27849G           0
cephadmin@cnc:~$

Output of ceph osd tree

cephadmin@cnc:~$ ceph osd tree
ID WEIGHT   TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 54.47983 root default
-2 18.15994     host ceph1
 0  3.63199         osd.0       up  1.00000          1.00000
 1  3.63199         osd.1       up  1.00000          1.00000
 2  3.63199         osd.2       up  1.00000          1.00000
 3  3.63199         osd.3       up  1.00000          1.00000
 4  3.63199         osd.4       up  1.00000          1.00000
-3 18.15994     host ceph2
 5  3.63199         osd.5       up  1.00000          1.00000
 6  3.63199         osd.6       up  1.00000          1.00000
 7  3.63199         osd.7       up  1.00000          1.00000
 8  3.63199         osd.8       up  1.00000          1.00000
 9  3.63199         osd.9       up  1.00000          1.00000
-4 18.15994     host ceph3
10  3.63199         osd.10      up  1.00000          1.00000
11  3.63199         osd.11      up  1.00000          1.00000
12  3.63199         osd.12      up  1.00000          1.00000
13  3.63199         osd.13      up  1.00000          1.00000
14  3.63199         osd.14      up  1.00000          1.00000

Decompiled crushmap:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host ceph1 {
        id -2           # do not change unnecessarily
        # weight 18.160
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 3.632
        item osd.1 weight 3.632
        item osd.2 weight 3.632
        item osd.3 weight 3.632
        item osd.4 weight 3.632
}
host ceph2 {
        id -3           # do not change unnecessarily
        # weight 18.160
        alg straw
        hash 0  # rjenkins1
        item osd.5 weight 3.632
        item osd.6 weight 3.632
        item osd.7 weight 3.632
        item osd.8 weight 3.632
        item osd.9 weight 3.632
}
host ceph3 {
        id -4           # do not change unnecessarily
        # weight 18.160
        alg straw
        hash 0  # rjenkins1
        item osd.10 weight 3.632
        item osd.11 weight 3.632
        item osd.12 weight 3.632
        item osd.13 weight 3.632
        item osd.14 weight 3.632
}
root default {
        id -1           # do not change unnecessarily
        # weight 54.480
        alg straw
        hash 0  # rjenkins1
        item ceph1 weight 18.160
        item ceph2 weight 18.160
        item ceph3 weight 18.160
}

# rules
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type chassis
        step emit
}

# end crush map
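One thing worth checking in the map above: the rule does "step chooseleaf firstn 0 type chassis" (matching the osd crush chooseleaf type = 2 setting in ceph.conf), yet the buckets section defines only host and root buckets and no chassis at all. crushtool's test mode can show whether the rule is able to map PGs to OSDs at all. A sketch, where the file names are placeholders:

```shell
# Compile the decompiled map back to binary form (file names are placeholders).
crushtool -c crushmap.txt -o crushmap.bin

# Dry-run rule 0 at the pool's replica count (size 2 per ceph.conf) and
# report any inputs that could not be mapped to enough OSDs.
crushtool -i crushmap.bin --test --rule 0 --num-rep 2 --show-bad-mappings
```

If every input comes back as a bad mapping, the rule cannot place data with the current bucket hierarchy, which would leave all PGs stuck creating.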

Does creating a pool really take more than a week? Have I done something wrong in one of the configs that prevents them from communicating with each other, or is something else going on? If you need more information, I will run any command you want; just post it and I'll run it. I just need some ideas, because I really do want to try/use Ceph, but my knowledge is limited at the moment and I'm struggling to find more knowledge or similar problems on Google.

Answer 1

You can refer to this:

https://github.com/ceph/ceph/commit/b73d0d325d382e32662ba5fab3c3f4d3a1b1681b

We used to have a complicated pg-creation process in which we would query any prior mapping of a pg before creating the new "empty" pg locally. The tracking of prior mappings was simplistic (and buggy), but that didn't matter much because the mon would periodically resend the pg-create messages. Now it doesn't, so that is a problem.

Answer 2

I would start by looking into the OSDs:

  1. ceph tell osd.0 injectargs --debug-osd 0/5
  2. Review the pool commands here: http://docs.ceph.com/docs/jewel/rados/operations/pools/
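For the pool side, a couple of inspection commands from that documentation page can confirm how the stuck pool is configured (a sketch; the pool name is taken from the ceph df output above):

```shell
# List all pools together with size, min_size, pg_num and crush ruleset.
ceph osd pool ls detail

# Dump every setting of the stuck pool.
ceph osd pool get rbd_vmstorage all
```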

If that doesn't work, raise everything to the highest debug level (http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/) and then check the log files as described in the documentation.
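The debug steps above can be sketched as follows (paths and daemon IDs follow the Ceph defaults):

```shell
# Raise the debug level on a single OSD without restarting it.
ceph tell osd.0 injectargs --debug-osd 0/5

# Or raise logging cluster-wide while reproducing the problem.
ceph tell osd.* injectargs --debug-osd 20 --debug-ms 1

# Then follow the daemon log on that OSD's host (default log location).
tail -f /var/log/ceph/ceph-osd.0.log
```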

With my limited knowledge of Ceph, I think it's best to just go through the online documentation (since Ceph versions evolve quickly), try to understand the features, add debugging where you can, and look at the logs.

Let me know what errors you find.
