I am running Ceph on Kubernetes v1.13 (created by the rook-ceph operator v0.9.3). After an unclean shutdown of our cluster, some processes randomly entered uninterruptible sleep. After a while, the Kubernetes cluster was no longer able to schedule new pods. Looking at dmesg, I found the following:
[ 3021.890423] INFO: task tp_fstore_op:22689 blocked for more than 120 seconds.
[ 3021.890456] Tainted: G O 4.9.0-8-amd64 #1 Debian 4.9.144-3.1
[ 3021.890480] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3021.890504] tp_fstore_op D 0 22689 20967 0x00000000
[ 3021.890508] ffff93c0a5dc0080 0000000000000000 ffff93d137954540 ffff93c1fe8d8980
[ 3021.890510] ffff93bf42e823c0 ffffb9ae3834b7b0 ffffffff9e0144b9 0000000000008000
[ 3021.890512] 0000000000000040 ffff93c1fe8d8980 ffff93c0a9156300 ffff93d137954540
[ 3021.890515] Call Trace:
[ 3021.890524] [<ffffffff9e0144b9>] ? __schedule+0x239/0x6f0
[ 3021.890571] [<ffffffffc0b69321>] ? xfs_reclaim_inode+0x131/0x340 [xfs]
[ 3021.890574] [<ffffffff9e0149a2>] ? schedule+0x32/0x80
[ 3021.890576] [<ffffffff9e017d4d>] ? schedule_timeout+0x1dd/0x380
[ 3021.890602] [<ffffffffc0b8556d>] ? _xfs_log_force_lsn+0x22d/0x320 [xfs]
[ 3021.890613] [<ffffffff9daf107e>] ? ktime_get+0x3e/0xb0
[ 3021.890635] [<ffffffffc0b69321>] ? xfs_reclaim_inode+0x131/0x340 [xfs]
[ 3021.890638] [<ffffffff9e01421d>] ? io_schedule_timeout+0x9d/0x100
[ 3021.890659] [<ffffffffc0b71e24>] ? __xfs_iunpin_wait+0xd4/0x160 [xfs]
[ 3021.890662] [<ffffffff9dabd3f0>] ? wake_atomic_t_function+0x60/0x60
[ 3021.890681] [<ffffffffc0b69321>] ? xfs_reclaim_inode+0x131/0x340 [xfs]
[ 3021.890699] [<ffffffffc0b6970e>] ? xfs_reclaim_inodes_ag+0x1de/0x300 [xfs]
[ 3021.890702] [<ffffffff9db91885>] ? node_dirty_ok+0x125/0x170
[ 3021.890704] [<ffffffff9dd53419>] ? list_del+0x9/0x30
[ 3021.890707] [<ffffffff9dbe599a>] ? page_is_poisoned+0xa/0x20
[ 3021.890709] [<ffffffff9db8ba0e>] ? get_page_from_freelist+0x88e/0xb20
[ 3021.890712] [<ffffffff9daae1ff>] ? select_task_rq_fair+0x51f/0x7e0
[ 3021.890714] [<ffffffff9daad9d5>] ? select_idle_sibling+0x25/0x330
[ 3021.890716] [<ffffffff9daa5674>] ? try_to_wake_up+0x54/0x3c0
[ 3021.890734] [<ffffffffc0b6a771>] ? xfs_reclaim_inodes_nr+0x31/0x40 [xfs]
[ 3021.890736] [<ffffffff9dc0eed8>] ? super_cache_scan+0x188/0x190
[ 3021.890738] [<ffffffff9db97a0a>] ? shrink_slab.part.38+0x21a/0x440
[ 3021.890740] [<ffffffff9db9c3ca>] ? shrink_node+0x10a/0x340
[ 3021.890742] [<ffffffff9db9c6f1>] ? do_try_to_free_pages+0xf1/0x310
[ 3021.890744] [<ffffffff9dd38b6a>] ? __next_node_in+0x3a/0x50
[ 3021.890745] [<ffffffff9db9cb73>] ? try_to_free_mem_cgroup_pages+0xc3/0x1a0
[ 3021.890748] [<ffffffff9dbfd147>] ? try_charge+0x147/0x6f0
[ 3021.890750] [<ffffffff9dc01237>] ? mem_cgroup_try_charge+0x67/0x1b0
[ 3021.890752] [<ffffffff9dbbb1d2>] ? handle_mm_fault+0x10e2/0x1310
[ 3021.890755] [<ffffffff9dc0ac30>] ? new_sync_write+0xe0/0x130
[ 3021.890758] [<ffffffff9da622f5>] ? __do_page_fault+0x255/0x4f0
[ 3021.890760] [<ffffffff9e01a618>] ? page_fault+0x28/0x30
After that, any access to the RBDs immediately produced similar errors:
[ 3021.890820] INFO: task xfsaild/rbd2:23307 blocked for more than 120 seconds.
[ 3021.890845] Tainted: G O 4.9.0-8-amd64 #1 Debian 4.9.144-3.1
[ 3021.890867] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3021.890896] xfsaild/rbd2 D 0 23307 2 0x00000000
[ 3021.890898] ffff93c182e46480 0000000000000000 ffff93d0d3a4ca00 ffff93d1fdb58980
[ 3021.890900] ffff93d1f6a4a180 ffffb9ae24e07d80 ffffffff9e0144b9 0000000000000246
[ 3021.890903] 00ffffff9dae787d ffff93d1fdb58980 e182622c538e97d5 ffff93d0d3a4ca00
[ 3021.890905] Call Trace:
[ 3021.890909] [<ffffffff9e0144b9>] ? __schedule+0x239/0x6f0
[ 3021.890911] [<ffffffff9e0149a2>] ? schedule+0x32/0x80
[ 3021.890948] [<ffffffffc0b8508c>] ? _xfs_log_force+0x15c/0x2b0 [xfs]
[ 3021.890949] [<ffffffff9daa5a70>] ? wake_up_q+0x70/0x70
[ 3021.890973] [<ffffffffc0b92895>] ? xfsaild+0x1a5/0x7a0 [xfs]
[ 3021.890994] [<ffffffffc0b926f0>] ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs]
[ 3021.890996] [<ffffffff9da9a5d9>] ? kthread+0xd9/0xf0
[ 3021.890998] [<ffffffff9e019364>] ? __switch_to_asm+0x34/0x70
[ 3021.891000] [<ffffffff9da9a500>] ? kthread_park+0x60/0x60
[ 3021.891002] [<ffffffff9e0193f7>] ? ret_from_fork+0x57/0x70
[ 3021.891004] INFO: task xfsaild/rbd3:23438 blocked for more than 120 seconds.
[ 3021.891027] Tainted: G O 4.9.0-8-amd64 #1 Debian 4.9.144-3.1
[ 3021.891050] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3021.891074] xfsaild/rbd3 D 0 23438 2 0x00000000
[ 3021.891075] ffff93c0fb0464c0 0000000000000000 ffff93d0a88f61c0 ffff93d1fdd18980
[ 3021.891077] ffff93d1f6a80340 ffffb9ae24e37d80 ffffffff9e0144b9 0000000000000246
[ 3021.891080] 00ffffff9dae787d ffff93d1fdd18980 10168cfc448e06f4 ffff93d0a88f61c0
[ 3021.891081] Call Trace:
[ 3021.891084] [<ffffffff9e0144b9>] ? __schedule+0x239/0x6f0
[ 3021.891086] [<ffffffff9e0149a2>] ? schedule+0x32/0x80
[ 3021.891108] [<ffffffffc0b8508c>] ? _xfs_log_force+0x15c/0x2b0 [xfs]
[ 3021.891109] [<ffffffff9daa5a70>] ? wake_up_q+0x70/0x70
[ 3021.891130] [<ffffffffc0b92895>] ? xfsaild+0x1a5/0x7a0 [xfs]
[ 3021.891151] [<ffffffffc0b926f0>] ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs]
[ 3021.891153] [<ffffffff9da9a5d9>] ? kthread+0xd9/0xf0
[ 3021.891154] [<ffffffff9e019364>] ? __switch_to_asm+0x34/0x70
[ 3021.891156] [<ffffffff9da9a500>] ? kthread_park+0x60/0x60
[ 3021.891158] [<ffffffff9e0193f7>] ? ret_from_fork+0x57/0x70
There are more errors in dmesg, but they all follow the same pattern: some process attempts an operation on XFS, the kernel task gets stuck, and the process remains in uninterruptible sleep.
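To see how many tasks are affected without grepping dmesg, the stuck processes can be enumerated directly: they show up with state `D` in ps. A generic sketch (not Ceph-specific; the `wchan` column shows which kernel function each task is blocked in):

```shell
# List all tasks currently in uninterruptible sleep (state D),
# together with the kernel wait channel they are blocked in.
ps -eo pid,state,wchan:32,comm --no-headers | awk '$2 == "D"'
```

On the affected nodes, the `wchan` column should point at the XFS functions seen in the traces above (e.g. `xfs_reclaim_inode`).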
Shortly afterwards, libceph reported that an OSD had gone down:
[ 4218.521314] libceph: osd0 down
journalctl does not report any other errors.
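The hung tasks are named after their mounts (`xfsaild/rbd2`, `xfsaild/rbd3`), so it helps to map the affected XFS mounts back to their backing RBD block devices. A generic sketch:

```shell
# Map XFS mounts to their backing block devices; on this cluster the
# rook-provisioned volumes show up as /dev/rbdN.
findmnt -t xfs -o TARGET,SOURCE,FSTYPE
```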
The unclean shutdown itself was forced by a similar problem: a Kubernetes pod tried to write a file that was too large for its attached volume. That volume was provided by rook-ceph. This is the configuration I am using:
Cluster configuration:
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: "ceph/ceph:v13.2.5-20190319"
  dataDirHostPath: "/var/rook/data"
  dashboard:
    enabled: True
    port: 80
    ssl: False
  network:
    hostNetwork: False # use SDN (Canal) as network
  mon:
    count: 3
    allowMultiplePerNode: True
  resources: # http://docs.ceph.com/docs/mimic/start/hardware-recommendations/
    mgr:
      requests:
        cpu: 4
        memory: "2Gi"
      limits:
        cpu: 4
        memory: "2Gi"
    mon:
      requests:
        cpu: 0.5
        memory: "2Gi"
      limits:
        cpu: 0.5
        memory: "2Gi"
    osd:
      requests:
        cpu: 2
        memory: "5Gi"
      limits:
        cpu: 2
        memory: "5Gi"
  storage:
    useAllNodes: False
    nodes:
    - name: "kubernetes-master" # matches node label: kubernetes.io/hostname
      useAllDevices: False
      directories:
      - path: "/var/rook/filestore"
BlockPool configuration:
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: volatile-replicapool
  namespace: rook-ceph
spec:
  failureDomain: osd
  replicated:
    size: 1
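For context on that pool definition: `size: 1` keeps a single copy of each object, so when an OSD goes down, data in this pool becomes unavailable and in-flight I/O on the RBDs on top of it blocks. A redundant variant (only a sketch; with `failureDomain: osd` it assumes at least three OSDs, which my single-directory setup above does not provide) would look like:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: volatile-replicapool
  namespace: rook-ceph
spec:
  failureDomain: osd
  replicated:
    size: 3   # requires at least three OSDs; my setup has only one
```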
And the StorageClasses:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-block-development
provisioner: ceph.rook.io/block
parameters:
  blockPool: volatile-replicapool
  clusterNamespace: rook-ceph
  fstype: xfs
reclaimPolicy: Delete
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-block-production
provisioner: ceph.rook.io/block
parameters:
  blockPool: volatile-replicapool
  clusterNamespace: rook-ceph
  fstype: xfs
reclaimPolicy: Retain
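For completeness, pods consume these classes through ordinary PVCs; a minimal claim against the development class (the name and size here are hypothetical, just to show the shape) looks like:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-claim            # hypothetical name
spec:
  storageClassName: ceph-block-development
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi               # hypothetical size
```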
I am running Linux 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64.
Any pointers on how to debug this issue would be greatly appreciated.
Thanks in advance.