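Our setup is a 3-node bare-metal Kubernetes cluster on RHEL 7.3, running on Docker.
A multipath FC SAN block device is discovered on all three nodes. It is used as a Kubernetes persistent volume with an ext4 filesystem. The object is defined as follows: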
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
  creationTimestamp: 2019-01-04T13:49:42Z
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    ...
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 15Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: ...
    namespace: ...
    resourceVersion: ...
    uid: ...
  fc:
    fsType: ext4
    lun: 1
    targetWWNs:
    - ...04
    - ...15
  persistentVolumeReclaimPolicy: Retain
status:
  phase: Bound
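The pod using this volume crashed. After it was restarted, the kubelet began reporting filesystem inconsistencies and asking for fsck to be run manually: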
Warning FailedMount 1m (x13 over 26m) kubelet, node2 MountVolume.WaitForAttach failed for volume "rtbm-prod-influxdb-pv" : fc: failed to mount fc volume /dev/dm-9 [ext4] to /var/lib/kubelet/plugins/kubernetes.io/fc/50060e801232d404-lun-1, error 'fsck' found errors on device /dev/dm-9 but could not correct them: fsck from util-linux 2.23.2
k8s-san-0 contains a file system with errors, check forced.
k8s-san-0: Entry '675' in /data/_internal/monitor (262147) has an incorrect filetype (was 2, should be 1).
k8s-san-0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
However, we cannot run fsck. We have already undeployed the pod, yet fsck still reports the device as in use:
# fsck.ext4 /dev/mapper/mpathb
e2fsck 1.42.9 (28-Dec-2013)
/dev/mapper/mpathb is in use.
e2fsck: Cannot continue, aborting.
I tried to find out what exactly is using the device:
# mount -l | grep -i mpathb
# lsof /dev/mapper/mpathb
# grep mpathb /proc/mounts
# fuser -m /dev/mapper/mpathb
None of these commands shows anything using the device. What else can I check to find out what is holding my block device?
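One thing the checks above have in common is that they search by the name mpathb and only look at the mount table of the current shell. As a sketch of a further check (not verified on this cluster), the same question can be asked by device number instead, across the mount tables of every process, which would also cover a mount recorded under another alias such as /dev/dm-9 or one that lives in a container's mount namespace. The function names find_mounts_by_devno and list_holders are hypothetical helpers, and the optional root-directory arguments exist only so the functions can be tried against a fake tree.

```shell
#!/bin/sh
# Sketch: look for users of a block device by major:minor rather than by name.

# find_mounts_by_devno MAJ:MIN [PROC_ROOT]
# Prints "PID MOUNTPOINT" for every per-process mountinfo entry that
# references the given device number. PROC_ROOT defaults to /proc.
find_mounts_by_devno() {
    devno=$1
    proc_root=${2:-/proc}
    for f in "$proc_root"/[0-9]*/mountinfo; do
        [ -r "$f" ] || continue
        pid=${f#"$proc_root"/}
        pid=${pid%/mountinfo}
        # In mountinfo, field 3 is major:minor and field 5 is the mount point.
        awk -v d="$devno" -v p="$pid" '$3 == d { print p, $5 }' "$f"
    done
}

# list_holders DM_NAME [SYS_ROOT]
# Prints kernel-level holders of a block device, for example another
# device-mapper target stacked on top of it. SYS_ROOT defaults to /sys.
list_holders() {
    holders_dir="${2:-/sys}/block/$1/holders"
    if [ -d "$holders_dir" ]; then
        ls "$holders_dir"
    fi
}
```

Run as root on the affected node, the device number could be obtained with something like `lsblk -dno MAJ:MIN /dev/mapper/mpathb | tr -d ' '` and passed to find_mounts_by_devno, while list_holders would take the kernel name that `readlink -f /dev/mapper/mpathb` resolves to (dm-9 according to the kubelet message). `dmsetup info mpathb` also prints an "Open count" line, which shows whether the kernel still considers the mapping open even when no mount entry is found.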