We are having a problem with volumes on GKE.
Since tonight our deployments can no longer access our main document storage disk. The logs look like this:
...
/go/src/github.com/def/abc/backend/formulare/formulare_generate_http.go:62 +0x55
github.com/def/abc/backend/formulare.CreateDirsIfNeeded(0xc000b9b1d0, 0x2e, 0x0, 0x0)
/usr/local/go/src/os/path.go:20 +0x39
os.MkdirAll(0xc000b9b1d0, 0x25, 0xc0000001ff, 0x25, 0xc000e75b18)
/usr/local/go/src/os/stat.go:13 +0x4d
os.Stat(0xc000b9b1d0, 0x25, 0xc000b9b1d0, 0x0, 0xc000b9b1d0, 0x25)
/usr/local/go/src/os/stat_unix.go:31 +0x77
os.statNolog(0xc000b9b1d0, 0x25, 0xc000171ac8, 0x2, 0x2, 0xc000b9b1d0)
/usr/local/go/src/os/file_posix.go:245
os.ignoringEINTR(...)
/usr/local/go/src/os/stat_unix.go:32
os.statNolog.func1(...)
/usr/local/go/src/syscall/syscall_linux_amd64.go:66
syscall.Stat(...)
/usr/local/go/src/syscall/zsyscall_linux_amd64.go:1440 +0xd2
syscall.fstatat(0xffffffffffffff9c, 0xc000b9b1d0, 0x25, 0xc001a90378, 0x0, 0xc000171ac0, 0x4f064b)
/usr/local/go/src/syscall/asm_linux_amd64.s:43 +0x5
syscall.Syscall6(0x106, 0xffffffffffffff9c, 0xc000b9b200, 0xc001a90378, 0x0, 0x0, 0x0, 0xc000ba4400, 0x0, 0xc000171a08)
goroutine 808214 [syscall, 534 minutes]:
After recreating the PV/PVC and the NFS server on GKE, the PV and PVC bound successfully, but the NFS server does not even start because it cannot mount the disk:
Warning FailedMount 95s (x7 over 15m) kubelet
Unable to attach or mount volumes: unmounted volumes=[document-storage-claim default-token-sbxxl], unattached volumes=[document-storage-claim default-token-sbxxl]: timed out waiting for the condition
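Strangely, the default Google service account token volume cannot be mounted either.
Could this be a problem on Google's side? Do I need to change the nfs-server configuration?
Here are my k8s definitions: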
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: document-storage-claim
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  volumeName: document-storage
  resources:
    requests:
      storage: 250Gi
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: document-storage
  namespace: default
spec:
  storageClassName: standard
  capacity:
    storage: 250Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  gcePersistentDisk:
    pdName: document-storage-clone
    fsType: ext4
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: document-storage-nfs-server
spec:
  replicas: 1
  selector:
    role: nfs-server
  template:
    metadata:
      labels:
        role: nfs-server
    spec:
      containers:
        - name: nfs-server
          image: k8s.gcr.io/volume-nfs:0.8
          ports:
            - name: nfs
              containerPort: 2049
            - name: mountd
              containerPort: 20048
            - name: rpcbind
              containerPort: 111
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /exports
              name: document-storage-claim
      volumes:
        - name: document-storage-claim
          persistentVolumeClaim:
            claimName: document-storage-claim
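Answer 1
It appears that Google rolled out a GKE update during the night of 2020-04-20. This update somehow also affected some earlier versions (in our case 1.18.16-gke.502).
We resolved the issue by upgrading to 1.19.8-gke.1600.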