Kubernetes node frequently hitting DiskPressure

One issue I am currently facing is that one of my Kubernetes nodes keeps hitting DiskPressure, which leads to pod evictions and service disruptions. Despite our best efforts, we have not been able to identify the root cause. I am looking for guidance on how to analyze and debug this effectively.
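For context, this is roughly how we observe the condition when it fires (the node name below is a placeholder):

# Check the node's DiskPressure condition
kubectl describe node <node-name> | grep DiskPressure

# List recent eviction events across the cluster
kubectl get events --all-namespaces --field-selector reason=Evicted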

Here is the background and what we have tried so far:

  • Kubernetes version: 1.24.1
  • Node specs:
    • OS: Ubuntu 20.04.4 LTS (amd64)
    • Kernel: 5.13.0-51-generic
    • Container runtime: containerd://1.6.6
  • Pod and resource utilization (output of kubectl describe node):
Capacity:
  cpu:                16
  ephemeral-storage:  256Gi
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             65776132Ki
  pods:               110
Allocatable:
  cpu:                16
  ephemeral-storage:  241591910Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             65673732Ki
  pods:               110
System Info:
  Kernel Version:             5.13.0-51-generic
  OS Image:                   Ubuntu 20.04.4 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.6.6
  Kubelet Version:            v1.24.1
  Kube-Proxy Version:         v1.24.1
Non-terminated Pods:          (41 in total)
  Namespace                   Name                                               CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                               ------------  ----------  ---------------  -------------  ---
  cert-manager                cert-manager-7686fcb9bc-jptct                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         46m
  cert-manager                cert-manager-cainjector-69d77789d-kmzb9            0 (0%)        0 (0%)      0 (0%)           0 (0%)         46m
  cert-manager                cert-manager-webhook-84c6f5779-gs8h7               0 (0%)        0 (0%)      0 (0%)           0 (0%)         46m
  devops                      external-dns-7bdcbb7658-rvwqs                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         46m
  devops                      filebeat-7l62m                                     100m (0%)     0 (0%)      100Mi (0%)       200Mi (0%)     20m
  devops                      jenkins-597c5d498c-prs5x                           0 (0%)        0 (0%)      0 (0%)           0 (0%)         14m
  devops                      kibana-6b577f877c-28ck4                            100m (0%)     1 (6%)      0 (0%)           0 (0%)         46m
  devops                      logstash-788d5f89b-pr79c                           0 (0%)        0 (0%)      0 (0%)           0 (0%)         14m
  devops                      nexus-6db65f8744-cxlhs                             0 (0%)        0 (0%)      0 (0%)           0 (0%)         46m
  devops                      powerdns-authoritative-85dcd685c4-4mts8            0 (0%)        0 (0%)      0 (0%)           0 (0%)         46m
  devops                      powerdns-recursor-757854d6f8-5z25p                 0 (0%)        0 (0%)      0 (0%)           0 (0%)         46m
  devops                      powerdns-recursor-nok8s-5db55c87f9-77ww6           0 (0%)        0 (0%)      0 (0%)           0 (0%)         46m
  devops                      sonarqube-5767c467c9-2crz2                         0 (0%)        0 (0%)      200Mi (0%)       0 (0%)         46m
  devops                      sonarqube-postgres-0                               0 (0%)        0 (0%)      0 (0%)           0 (0%)         46m
  ingress-nginx               ingress-nginx-controller-75f6588c7b-gw77s          100m (0%)     0 (0%)      90Mi (0%)        0 (0%)         13m
  jenkins-agents              my-cluster-dev-tenant-develop-328-76mr4-ns67p-3xczd 0 (0%)        0 (0%)      350Mi (0%)       0 (0%)         72s
  kube-system                 calico-kube-controllers-56cdb7c587-zmz4t           0 (0%)        0 (0%)      0 (0%)           0 (0%)         46m
  kube-system                 calico-node-pshn4                                  250m (1%)     0 (0%)      0 (0%)           0 (0%)         354d
  kube-system                 coredns-6d4b75cb6d-nrbmq                           100m (0%)     0 (0%)      70Mi (0%)        170Mi (0%)     46m
  kube-system                 coredns-6d4b75cb6d-q9hvs                           100m (0%)     0 (0%)      70Mi (0%)        170Mi (0%)     46m
  kube-system                 etcd-my-cluster                                    100m (0%)     0 (0%)      100Mi (0%)       0 (0%)         354d
  kube-system                 kube-apiserver-my-cluster                          250m (1%)     0 (0%)      0 (0%)           0 (0%)         354d
  kube-system                 kube-controller-manager-my-cluster                 200m (1%)     0 (0%)      0 (0%)           0 (0%)         354d
  kube-system                 kube-proxy-qwmrd                                   0 (0%)        0 (0%)      0 (0%)           0 (0%)         354d
  kube-system                 kube-scheduler-my-cluster                          100m (0%)     0 (0%)      0 (0%)           0 (0%)         354d
  kube-system                 metrics-server-5744cd7dbb-h758l                    100m (0%)     0 (0%)      200Mi (0%)       0 (0%)         34m
  kube-system                 metrics-server-6bf466fbf5-nt5k6                    100m (0%)     0 (0%)      200Mi (0%)       0 (0%)         47m
  kube-system                 node-shell-0c3bde15-32fa-4831-9f05-ebfe5d14a909    0 (0%)        0 (0%)      0 (0%)           0 (0%)         43m
  kube-system                 node-shell-692c6032-8301-44ac-b12e-e5a222a6f80a    0 (0%)        0 (0%)      0 (0%)           0 (0%)         8m6s
  lens-metrics                prometheus-0                                       100m (0%)     0 (0%)      512Mi (0%)       0 (0%)         14m
  imaginary-dev               mailhog-7f666fdfbf-xgcwf                           0 (0%)        0 (0%)      0 (0%)           0 (0%)         46m
  imaginary-dev               ms-nginx-766bf76f87-ss8h6                          0 (0%)        0 (0%)      0 (0%)           0 (0%)         46m
  imaginary-dev               ms-tenant-f847987cc-rf9db                          400m (2%)     500m (3%)   500M (0%)        700M (1%)      46m
  imaginary-dev               ms-webapp-5d6bcdcc4f-x68s4                         100m (0%)     200m (1%)   200M (0%)        400M (0%)      46m
  imaginary-dev               mysql-0                                            0 (0%)        0 (0%)      0 (0%)           0 (0%)         46m
  imaginary-dev               redis-0                                            0 (0%)        0 (0%)      0 (0%)           0 (0%)         46m
  imaginary-uat               mailhog-685b7c6844-cpmfp                           0 (0%)        0 (0%)      0 (0%)           0 (0%)         46m
  imaginary-uat               ms-tenant-6965d68df8-nlm7p                         500m (3%)     600m (3%)   512M (0%)        704M (1%)      46m
  imaginary-uat               ms-webapp-6cb7fb6c65-pfhsh                         100m (0%)     200m (1%)   200M (0%)        400M (0%)      46m
  imaginary-uat               mysql-0                                            0 (0%)        0 (0%)      0 (0%)           0 (0%)         46m
  imaginary-uat               redis-0                                            0 (0%)        0 (0%)      0 (0%)           0 (0%)         46m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests         Limits
  --------           --------         ------
  cpu                2800m (17%)      2500m (15%)
  memory             3395905792 (5%)  2770231040 (4%)
  ephemeral-storage  2Gi (0%)         0 (0%)
  hugepages-1Gi      0 (0%)           0 (0%)
  hugepages-2Mi      0 (0%)           0 (0%)
Events:              <none>
  • Disk usage analysis: we inspected on-node disk usage with the du and df commands; the exact invocations are sketched below.
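Concretely, these are the kinds of commands we ran. The paths assume a default kubelet/containerd layout and may differ on other setups:

# Overall usage of the root filesystem
df -h /

# Size of the directories the kubelet typically watches for disk pressure
du -sh /var/lib/kubelet /var/lib/containerd /var/log/pods 2>/dev/null

# Largest subdirectories of the container runtime's storage
du -h -d 1 /var/lib/containerd 2>/dev/null | sort -rh | head -20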

Despite all of the above, we still cannot pin down the exact cause of the DiskPressure condition. We suspect it may be related to excessive logging, large container images, or inefficient resource allocation, but we are unsure how to confirm or rule out these suspicions.
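For example, we assume checks along these lines would surface oversized images and log directories, but we are not sure how to read the results against the kubelet's eviction and image garbage-collection thresholds:

# Image sizes as reported by the container runtime
crictl images

# The runtime's view of the image filesystem (capacity vs. bytes used)
crictl imagefsinfo

# Per-pod container log usage under the default log path
du -sh /var/log/pods/* 2>/dev/null | sort -rh | head -10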

I would therefore appreciate help with the following:

  1. Best practices for analyzing and debugging DiskPressure issues on a Kubernetes node.
  2. Tools or techniques for identifying the specific processes or pods consuming the most disk space.
  3. Strategies for optimizing resource allocation and disk usage in Kubernetes to mitigate DiskPressure.
  4. Any other insights or recommendations for resolving this effectively.

Any advice, recommendations, or experience-based insights would be much appreciated. Thanks in advance!
