Last week, after upgrading our GKE cluster to Kubernetes 1.13.6-gke.13, all of the nodes in the cluster started failing with high CPU usage. It's the Kubernetes software on the nodes themselves that is using up all the CPU, not the pods.
Here is what top showed when we SSH'd into one of the nodes:
top - 10:11:27 up 5 days, 22 min, 1 user, load average: 23.71, 21.90, 20.32
Tasks: 858 total, 3 running, 854 sleeping, 0 stopped, 1 zombie
%Cpu(s): 33.5 us, 30.9 sy, 0.0 ni, 35.5 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
MiB Mem : 30157.0 total, 14125.6 free, 4771.2 used, 11260.1 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 24762.7 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 114636 22564 4928 R 65.2 0.1 5254:53 systemd
356 message+ 20 0 110104 98784 2916 S 33.8 0.3 2705:49 dbus-daemon
1236 root 20 0 3174876 147328 57464 S 22.0 0.5 1541:29 kubelet
366 root 20 0 12020 3864 3396 S 21.6 0.0 1705:21 systemd-logind
1161 root 20 0 5371428 586832 26756 S 18.7 1.9 1848:18 dockerd
432 root 20 0 5585144 49040 13420 S 11.1 0.2 848:54.06 containerd
23797 root 20 0 994620 8784 6088 S 3.0 0.0 96:58.79 containerd-shim
45359 root 20 0 994620 8968 5600 S 3.0 0.0 100:28.46 containerd-shim
35913 root 20 0 1068352 8192 5600 S 2.3 0.0 104:54.12 containerd-shim
10806 root 20 0 994620 8908 5596 S 2.0 0.0 102:57.45 containerd-shim
15378 root 20 0 994620 9084 5600 S 2.0 0.0 102:24.08 containerd-shim
33141 root 20 0 994620 8856 5848 S 2.0 0.0 95:26.89 containerd-shim
34299 root 20 0 994620 8824 5848 S 2.0 0.0 90:55.28 containerd-shim
48411 root 20 0 994620 9488 6216 S 2.0 0.0 95:38.56 containerd-shim
1824641 root 20 0 1068352 6836 5716 S 2.0 0.0 65:45.81 containerd-shim
10257 root 20 0 994620 9404 5596 S 1.6 0.0 101:10.45 containerd-shim
15400 root 20 0 1068352 8916 6216 S 1.6 0.0 97:47.99 containerd-shim
22895 root 20 0 1068352 8408 5472 S 1.6 0.0 102:55.97 containerd-shim
29969 root 20 0 994620 9124 5600 S 1.6 0.0 98:32.32 containerd-shim
34720 root 20 0 994620 8444 5472 S 1.6 0.0 97:23.98 containerd-shim
10073 root 20 0 1068352 9796 6152 S 1.3 0.0 100:54.30 containerd-shim
After this, the pods on these nodes become starved for CPU, perform poorly, and get killed because their liveness checks fail.
To try to fix the issue, we recreated all of the nodes. We created a new pool with equivalent resources and migrated all of the pods over by scaling the old pool down to 0 nodes. (This was difficult: at least 2 of the old nodes still refused to shut down after a very long time, and in the end we had to kill them by shutting down the underlying VMs.) It didn't help.
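For reference, draining the old pool was just a resize to zero, roughly like this (the old pool name here is a placeholder; the exact flags may vary by gcloud version):
# Move the pods off the old pool by shrinking it to zero nodes
gcloud container clusters resize cluster0 \
  --node-pool=old-pool \
  --num-nodes=0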
We also tried switching the nodes from Container-Optimized OS to regular Ubuntu. That didn't help either.
Next, we created yet another pool, this time with twice as many CPUs. That helped, but it didn't fix the underlying problem: some nodes still showed very high CPU usage from systemd, dbus-daemon, kubelet, and so on.
So now we are masking the problem by running a lot of extra CPU on every node. Beyond what the misbehaving system services consume, there is enough CPU left over to run the actual pods.
But the whole system is still wasting a lot of resources (and money), so we really need to figure out what is actually going on here. Why is systemd burning so much CPU? What is dbus-daemon doing? We have checked various logs (see below), but there are no obvious errors.
How can we find out what is actually going wrong here?
Excerpt from journalctl -u kubelet:
Jul 04 05:49:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 05:49:34.849808 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 05:54:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 05:54:34.850598 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 05:59:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 05:59:34.851797 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 06:04:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 06:04:34.858344 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 06:09:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 06:09:34.859626 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 06:14:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 06:14:34.861142 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 06:19:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 06:19:34.862185 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 06:24:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 06:24:34.863160 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 06:29:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 06:29:34.864156 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 06:34:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 06:34:34.865041 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 06:39:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 06:39:34.866044 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 06:44:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 06:44:34.866969 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 06:49:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 06:49:34.867979 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 06:54:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 06:54:34.869429 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 06:59:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 06:59:34.870359 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 07:04:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 07:04:34.871190 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 07:09:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 07:09:34.872063 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 07:14:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 07:14:34.873240 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 07:19:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 07:19:34.874123 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 07:24:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 07:24:34.875010 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 07:29:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 07:29:34.876612 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 07:34:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 07:34:34.877420 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 07:39:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 07:39:34.878368 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 07:44:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 07:44:34.879221 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 07:49:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 07:49:34.880239 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 07:54:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 07:54:34.881172 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 07:59:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 07:59:34.881868 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 08:04:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 08:04:34.882653 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 08:09:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 08:09:34.883432 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 08:14:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 08:14:34.884175 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 08:19:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 08:19:34.885043 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 08:24:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 08:24:34.885845 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Jul 04 08:29:34 gke-cluster0-pool-1-9c29a343-vqtq kubelet[1236]: I0704 08:29:34.886690 1236 container_manager_linux.go:434] [ContainerManager]: Discovered runtime cgroups name: /system.slice/docker.service
Output from strace -c -p `pidof systemd`:
strace: Process 1 attached
strace: Process 1 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
24.20 0.109101 5 22827 lstat
17.45 0.078654 9 8610 sendmsg
15.51 0.069914 7 10549 read
10.96 0.049406 3 17310 getpid
6.45 0.029075 8 3806 openat
6.08 0.027385 7 3783 readlinkat
5.76 0.025945 3 7579 fstat
4.47 0.020167 4 5700 getrandom
3.62 0.016301 4 3806 close
3.14 0.014133 7 1892 access
2.28 0.010278 5 1924 11 stat
0.03 0.000145 4 33 epoll_wait
0.02 0.000089 4 22 readlink
0.01 0.000029 3 11 prctl
0.01 0.000029 3 11 clock_gettime
0.01 0.000027 2 11 getgid
0.01 0.000026 2 11 geteuid
0.00 0.000020 2 11 getuid
0.00 0.000020 2 11 getegid
------ ----------- ----------- --------- --------- ----------------
100.00 0.450744 87907 11 total
What is all that lstat activity about? Below are some seemingly related snippets picked at random from the full strace output:
...
stat("/dev/sds", {st_mode=S_IFBLK|0660, st_rdev=makedev(65, 32), ...}) = 0
readlinkat(AT_FDCWD, "/sys/dev/block/65:32", "../../devices/pci0000:00/0000:00"..., 99) = 83
lstat("/sys", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
lstat("/sys/dev", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
lstat("/sys/dev/block", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
lstat("/sys/devices", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
lstat("/sys/devices/pci0000:00", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
lstat("/sys/devices/pci0000:00/0000:00:03.0", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
lstat("/sys/devices/pci0000:00/0000:00:03.0/virtio0", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
lstat("/sys/devices/pci0000:00/0000:00:03.0/virtio0/host0", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
lstat("/sys/devices/pci0000:00/0000:00:03.0/virtio0/host0/target0:0:19", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
lstat("/sys/devices/pci0000:00/0000:00:03.0/virtio0/host0/target0:0:19/0:0:19:0", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
lstat("/sys/devices/pci0000:00/0000:00:03.0/virtio0/host0/target0:0:19/0:0:19:0/block", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
lstat("/sys/devices/pci0000:00/0000:00:03.0/virtio0/host0/target0:0:19/0:0:19:0/block/sds", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
access("/sys/devices/pci0000:00/0000:00:03.0/virtio0/host0/target0:0:19/0:0:19:0/block/sds/uevent", F_OK) = 0
openat(AT_FDCWD, "/sys/devices/pci0000:00/0000:00:03.0/virtio0/host0/target0:0:19/0:0:19:0/block/sds/uevent", O_RDONLY|O_CLOEXEC) = 16
...
lstat("/sys", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
lstat("/sys/dev", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
lstat("/sys/dev/block", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
lstat("/sys/devices", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
lstat("/sys/devices/pci0000:00", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
lstat("/sys/devices/pci0000:00/0000:00:03.0", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
lstat("/sys/devices/pci0000:00/0000:00:03.0/virtio0", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
lstat("/sys/devices/pci0000:00/0000:00:03.0/virtio0/host0", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
lstat("/sys/devices/pci0000:00/0000:00:03.0/virtio0/host0/target0:0:18", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
lstat("/sys/devices/pci0000:00/0000:00:03.0/virtio0/host0/target0:0:18/0:0:18:0", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
lstat("/sys/devices/pci0000:00/0000:00:03.0/virtio0/host0/target0:0:18/0:0:18:0/block", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
lstat("/sys/devices/pci0000:00/0000:00:03.0/virtio0/host0/target0:0:18/0:0:18:0/block/sdr", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
access("/sys/devices/pci0000:00/0000:00:03.0/virtio0/host0/target0:0:18/0:0:18:0/block/sdr/uevent", F_OK) = 0
Output from dbus-monitor --system --profile | head -n 20:
dbus-monitor: unable to enable new-style monitoring: org.freedesktop.DBus.Error.AccessDenied: "Rejected send message, 1 matched rules; type="method_call", sender=":1.165" (uid=5004 pid=769854 comm="dbus-monitor --system --profile ") interface="org.freedesktop.DBus.Monitoring" member="BecomeMonitor" error name="(unset)" requested_reply="0" destination="org.freedesktop.DBus" (bus)". Falling back to eavesdropping.
#type timestamp serial sender destination path interface member
# in_reply_to
sig 1562263380.765023 2 org.freedesktop.DBus :1.165 /org/freedesktop/DBus org.freedesktop.DBus NameAcquired
sig 1562263380.953812 132870362 :1.157 <none> /org/freedesktop/systemd1/unit/var_2dlib_2ddocker_2doverlay2_2d5d1be6e08bfe7552d6f9ee50a943eca88dd0dd749ec248594aa0be91879a2cdb_2dmerged_2emount org.freedesktop.DBus.Properties PropertiesChanged
sig 1562263380.957890 132870363 :1.157 <none> /org/freedesktop/systemd1/unit/var_2dlib_2ddocker_2doverlay2_2d5d1be6e08bfe7552d6f9ee50a943eca88dd0dd749ec248594aa0be91879a2cdb_2dmerged_2emount org.freedesktop.DBus.Properties PropertiesChanged
sig 1562263380.957912 132870364 :1.157 <none> /org/freedesktop/systemd1/unit/var_2dlib_2ddocker_2doverlay2_2d539cd8b8367913182e438aa1b3b05714c8f3f131e20bcadabdeb850c375a8125_2dmerged_2emount org.freedesktop.DBus.Properties PropertiesChanged
sig 1562263380.957918 132870365 :1.157 <none> /org/freedesktop/systemd1/unit/var_2dlib_2ddocker_2doverlay2_2d539cd8b8367913182e438aa1b3b05714c8f3f131e20bcadabdeb850c375a8125_2dmerged_2emount org.freedesktop.DBus.Properties PropertiesChanged
sig 1562263380.957923 132870366 :1.157 <none> /org/freedesktop/systemd1/unit/var_2dlib_2ddocker_2dcontainers_2d4c04ee8d1bf693ff2c9300b198b2b47bbf2c240265af5b9edc1f07b6cbd0c1ce_2dmounts_2dshm_2emount org.freedesktop.DBus.Properties PropertiesChanged
sig 1562263380.958014 132870367 :1.157 <none> /org/freedesktop/systemd1/unit/var_2dlib_2ddocker_2dcontainers_2d4c04ee8d1bf693ff2c9300b198b2b47bbf2c240265af5b9edc1f07b6cbd0c1ce_2dmounts_2dshm_2emount org.freedesktop.DBus.Properties PropertiesChanged
sig 1562263380.958020 132870368 :1.157 <none> /org/freedesktop/systemd1/unit/var_2dlib_2ddocker_2doverlay2_2d872dd2c0a63a9f3b5c9c5e4972e06fcf89d28b4c7f59aea7ea4d38f5a6bf0db6_2dmerged_2emount org.freedesktop.DBus.Properties PropertiesChanged
sig 1562263380.958025 132870369 :1.157 <none> /org/freedesktop/systemd1/unit/var_2dlib_2ddocker_2doverlay2_2d872dd2c0a63a9f3b5c9c5e4972e06fcf89d28b4c7f59aea7ea4d38f5a6bf0db6_2dmerged_2emount org.freedesktop.DBus.Properties PropertiesChanged
sig 1562263380.958029 132870370 :1.157 <none> /org/freedesktop/systemd1/unit/home_2dkubernetes_2dcontainerized_5fmounter_2drootfs_2dvar_2dlib_2dkubelet_2dpods_2d2f4e6eae_5cx2d9e51_5cx2d11e9_5cx2db4ee_5cx2d42010a80000f_2dvolumes_2dkubernetes_2eio_5cx7esecret_2dfluentd_5cx2dgcp_5cx2dtoken_5cx2dzfrkb_2emount org.freedesktop.DBus.Properties PropertiesChanged
sig 1562263380.958111 132870371 :1.157 <none> /org/freedesktop/systemd1/unit/home_2dkubernetes_2dcontainerized_5fmounter_2drootfs_2dvar_2dlib_2dkubelet_2dpods_2d2f4e6eae_5cx2d9e51_5cx2d11e9_5cx2db4ee_5cx2d42010a80000f_2dvolumes_2dkubernetes_2eio_5cx7esecret_2dfluentd_5cx2dgcp_5cx2dtoken_5cx2dzfrkb_2emount org.freedesktop.DBus.Properties PropertiesChanged
sig 1562263380.958117 132870372 :1.157 <none> /org/freedesktop/systemd1/unit/var_2dlib_2dkubelet_2dpods_2d2f4e6eae_5cx2d9e51_5cx2d11e9_5cx2db4ee_5cx2d42010a80000f_2dvolumes_2dkubernetes_2eio_5cx7esecret_2dfluentd_5cx2dgcp_5cx2dtoken_5cx2dzfrkb_2emount org.freedesktop.DBus.Properties PropertiesChanged
sig 1562263380.958121 132870373 :1.157 <none> /org/freedesktop/systemd1/unit/var_2dlib_2dkubelet_2dpods_2d2f4e6eae_5cx2d9e51_5cx2d11e9_5cx2db4ee_5cx2d42010a80000f_2dvolumes_2dkubernetes_2eio_5cx7esecret_2dfluentd_5cx2dgcp_5cx2dtoken_5cx2dzfrkb_2emount org.freedesktop.DBus.Properties PropertiesChanged
sig 1562263380.958126 132870374 :1.157 <none> /org/freedesktop/systemd1/unit/var_2dlib_2ddocker_2doverlay2_2d667d713fe45ea0af609c85fbfd3aafbca9494574fe10509bda8cd3d13d1e6b66_2dmerged_2emount org.freedesktop.DBus.Properties PropertiesChanged
sig 1562263380.958223 132870375 :1.157 <none> /org/freedesktop/systemd1/unit/var_2dlib_2ddocker_2doverlay2_2d667d713fe45ea0af609c85fbfd3aafbca9494574fe10509bda8cd3d13d1e6b66_2dmerged_2emount org.freedesktop.DBus.Properties PropertiesChanged
sig 1562263380.958229 132870376 :1.157 <none> /org/freedesktop/systemd1/unit/var_2dlib_2ddocker_2doverlay2_2d0a9f899f23c0965b43cae81dd04f46a78e4b1b063fa5679323146b06932474a9_2dmerged_2emount org.freedesktop.DBus.Properties PropertiesChanged
sig 1562263380.958233 132870377 :1.157 <none> /org/freedesktop/systemd1/unit/var_2dlib_2ddocker_2doverlay2_2d0a9f899f23c0965b43cae81dd04f46a78e4b1b063fa5679323146b06932474a9_2dmerged_2emount org.freedesktop.DBus.Properties PropertiesChanged
sig 1562263380.958238 132870378 :1.157 <none> /org/freedesktop/systemd1/unit/var_2dlib_2ddocker_2doverlay2_2d875734c124694cd54a7c26516f740cedcb853a18ff5805a89669747c1188b65d_2dmerged_2emount org.freedesktop.DBus.Properties PropertiesChanged
Answer 1
After working with Google support, we finally resolved the issue. The short version: we had to create a new node pool running GKE 1.12.8-gke.10 and move all of the pods with persistent volumes onto that pool.
The long version
I wish I could say this is something we thought of right away while debugging, but credit goes to Francisco and Lynn from the Google support team.
A key moment in the troubleshooting came when we segmented the workload. At one point we dedicated a specific node pool to each type of pod. That let us answer the question: is the problem specific to one type of pod? We suspected that mount activity was related to the problem, so we were especially interested in whether pods with persistent volumes correlated with the node degradation.
It turned out they did. Only the nodes running pods with volumes ran hot.
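A quick way to see that correlation is to compare where the volume-backed pods are scheduled with which nodes are hot; for example (the label selector is just an assumption about how the pods are labelled):
# Nodes hosting the volume-backed pods vs. per-node CPU usage
kubectl get pods -o wide -l app=redis
kubectl top nodes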
Root cause
The hypothesis from Google's GKE team is that a change in Docker between versions 17.03 and 18.09 causes a burst of systemd activity whenever anything is exec'd in a pod. Specifically, one of their engineers identified a change in runc that causes all mount units to be reloaded as part of an exec liveness probe.
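A rough way to see this on an affected node is to watch cgroup CPU usage while running the same kind of command an exec probe would run (the pod name below is a placeholder):
# On the node: sample cgroup CPU usage a few times in batch mode
sudo systemd-cgtop -b -n 3
# From a workstation: run what the exec liveness probe would run
kubectl exec some-redis-pod -- redis-cli ping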
So the recipe for this problem has three ingredients:
- Docker 18 (which GKE 1.13 uses)
- many mounted volumes per node
- exec used for liveness probes (or anything else, I suppose)
We run roughly 40 Redis pods per node. By convention, our Redis pods use an exec-based liveness probe. On top of that, every pod has a mounted volume for persistent storage. The liveness probe runs once a minute. So we had 40 volumes being reloaded 40 times per minute.
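For illustration, here is a minimal sketch of a pod combining those ingredients (the names, image, and claim are placeholders, not our actual manifest):
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: redis-example
spec:
  containers:
  - name: redis
    image: redis:4
    volumeMounts:
    - name: data
      mountPath: /data             # mounted persistent volume (ingredient 2)
    livenessProbe:
      exec:                        # exec-based probe (ingredient 3)
        command: ["redis-cli", "ping"]
      periodSeconds: 60            # fires once a minute
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: redis-example-data
EOF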
Fixing the problem
To fix it, we isolated all of the Redis workloads using node taints. We created a new node pool dedicated to Redis, running GKE 1.12.8-gke.10. GKE won't let you downgrade a cluster, but it will let you create new node pools running an older version. Naturally, we disabled auto-upgrade on this new pool.
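Creating that pool looks roughly like this (the pool name, taint key, and node count are placeholders):
gcloud container node-pools create redis-pool \
  --cluster=cluster0 \
  --node-version=1.12.8-gke.10 \
  --num-nodes=3 \
  --node-taints=dedicated=redis:NoSchedule \
  --no-enable-autoupgrade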
Then we forced all of the Redis pods over to the new pool.
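In practice that just means giving the Redis workloads a toleration for the taint plus a node selector; a sketch using the placeholder names from above:
kubectl patch deployment redis --patch '
spec:
  template:
    spec:
      tolerations:
      - key: dedicated
        operator: Equal
        value: redis
        effect: NoSchedule
      nodeSelector:
        cloud.google.com/gke-nodepool: redis-pool
'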
The results were immediate: we can now run the same load on 6 CPU cores that previously needed 24, and the situation looks stable.
Answer 2
There are two things to consider here:
- To see whether anything is going wrong in the cluster, you need to check the logs.
System components come in two types: those that run in a container and those that do not. For example:
The Kubernetes scheduler and kube-proxy run in containers; the kubelet and the container runtime (e.g. Docker) do not. On machines with systemd, the kubelet and the container runtime write to journald. Without systemd, they write to .log files in the /var/log directory. System components inside containers always write to the /var/log directory, bypassing the default logging mechanism.
For example, use journalctl -u kubelet to analyze the kubelet logs. You can also check the pod logs.
- If there are no errors but you still have resource problems, you may need to use autoscaling:
- The Horizontal Pod Autoscaler (HPA) scales the number of pod replicas based on CPU/memory usage.
- The Vertical Pod Autoscaler (VPA) assigns more (or less) CPU or memory to existing pods.
- The Cluster Autoscaler (CA) scales the cluster's nodes based on pending pods.
Use one of these, or a combination of them, to scale your cluster dynamically instead of sizing it by hand; a minimal example follows below.
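For instance, a minimal HPA can be created directly from kubectl (the deployment name and thresholds are placeholders):
# Keep between 3 and 10 replicas, targeting 80% average CPU utilization
kubectl autoscale deployment redis --cpu-percent=80 --min=3 --max=10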
Let me know if this helps.