kubectl top node does not work. It looks like a heapster problem

I have a new k8s cluster on GKE.

Whenever I run kubectl top node gke-data-custom-vm-6-25-0cbae9b9-hrkc I get:

Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)

At the same time, the heapster service does exist:

> kubectl -n kube-system get services
NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
default-http-backend   NodePort    10.11.241.20    <none>        80:32688/TCP    59d
heapster               ClusterIP   10.11.245.182   <none>        80/TCP          59d
kube-dns               ClusterIP   10.11.240.10    <none>        53/UDP,53/TCP   59d
metrics-server         ClusterIP   10.11.249.26    <none>        443/TCP         59d

The heapster pod is running (although I can see it has restarted many times):

> kubectl -n kube-system get pods
NAME                                               READY     STATUS    RESTARTS   AGE
event-exporter-v0.2.3-85644fcdf-kwd6g              2/2       Running   0          16d
fluentd-gcp-scaler-8b674f786-dbrcr                 1/1       Running   0          16d
fluentd-gcp-v3.2.0-2fqgl                           2/2       Running   0          17d
fluentd-gcp-v3.2.0-47586                           2/2       Running   0          17d
fluentd-gcp-v3.2.0-552xm                           2/2       Running   0          16d
heapster-v1.6.0-beta.1-fdc7fd478-8s998             3/3       Running   73         16d

But I see some errors in the logs of the heapster-nanny container:

> kubectl logs -n kube-system --tail 10 -f po/heapster-v1.6.0-beta.1-fdc7fd478-8s998 -c heapster-nanny
ERROR: logging before flag.Parse: E0418 23:30:10.075539       1 nanny_lib.go:95] Error while querying apiserver for resources: Get https://10.11.240.1:443/api/v1/namespaces/kube-system/pods/heapster-v1.6.0-beta.1-fdc7fd478-8s998: dial tcp 10.11.240.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0418 23:30:10.971230       1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.11.240.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0418 23:30:11.972337       1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.11.240.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0418 23:30:12.973637       1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.11.240.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0418 23:30:13.975024       1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.11.240.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0418 23:30:14.976582       1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.11.240.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0418 23:30:16.063760       1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.11.240.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0418 23:30:27.065693       1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: net/http: TLS handshake timeout
ERROR: logging before flag.Parse: E0418 23:30:30.077159       1 nanny_lib.go:95] Error while querying apiserver for resources: Get https://10.11.240.1:443/api/v1/namespaces/kube-system/pods/heapster-v1.6.0-beta.1-fdc7fd478-8s998: net/http: TLS handshake timeout
ERROR: logging before flag.Parse: E0418 23:30:59.778560       1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.11.240.1:443: i/o timeout

and in the heapster container:

I0423 07:02:10.765134       1 heapster.go:113] Starting heapster on port 8082
W0423 07:16:27.975467       1 manager.go:152] Failed to get all responses in time (got 2/3)
W0423 07:16:43.064110       1 manager.go:107] Failed to get kubelet_summary:10.128.0.49:10255 response in time
W0423 07:20:36.875359       1 manager.go:152] Failed to get all responses in time (got 2/3)
W0423 07:20:44.383790       1 manager.go:107] Failed to get kubelet_summary:10.128.0.49:10255 response in time
W0423 07:22:29.683060       1 manager.go:152] Failed to get all responses in time (got 2/3)
W0423 07:22:40.278962       1 manager.go:107] Failed to get kubelet_summary:10.128.0.49:10255 response in time
W0423 07:31:27.072711       1 manager.go:152] Failed to get all responses in time (got 2/3)
W0423 07:31:54.580031       1 manager.go:107] Failed to get kubelet_summary:10.128.0.49:10255 response in time

How can I fix this?

Is there any other information I should provide?

Answer 1

Heapster deprecation

Heapster is a deprecated project, and it may have problems running on recent Kubernetes versions.

Heapster deprecation timeline

| Kubernetes Release  | Action              | Policy/Support                                                                   |
|---------------------|---------------------|----------------------------------------------------------------------------------|
| Kubernetes 1.11     | Initial Deprecation | No new features or sinks are added.  Bugfixes may be made.                       |
| Kubernetes 1.12     | Setup Removal       | The option to install Heapster via the Kubernetes setup script is removed.       |
| Kubernetes 1.13     | Removal             | No new bugfixes will be made.  Move to kubernetes-retired organization.          |

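Starting with Kubernetes v1.10, kubectl top relies on metrics-server by default.

CHANGELOG-1.10.md

  • Support the metrics API in the kubectl top command. (#56206, @brancz)

This PR implements support for the kubectl top commands to use metrics-server as an aggregated API, instead of requesting the metrics from heapster directly. If the metrics.k8s.io API is not served by the apiserver, kubectl still falls back to the previous behavior.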
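
Which of the two paths kubectl top takes therefore depends on whether the apiserver serves the metrics.k8s.io group. As a rough sketch (the helper name has_metrics_api is made up for this illustration and is not part of kubectl), the output of kubectl api-versions can be checked like this:

```shell
#!/bin/sh
# Hypothetical helper: reads API group/version lines on stdin (the format
# printed by `kubectl api-versions`) and reports whether the metrics.k8s.io
# group is among them. Returns 0 if it is, 1 otherwise.
has_metrics_api() {
    if grep -q '^metrics\.k8s\.io/'; then
        echo "metrics.k8s.io: served"
    else
        echo "metrics.k8s.io: not served"
        return 1
    fi
}

# Against a live cluster (needs cluster access):
#   kubectl api-versions | has_metrics_api
```

If the group is reported as not served, a kubectl client at v1.10 or later would be expected to fall back to heapster, which matches the error shown in the question.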


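What you should do:

Since Heapster is deprecated and you already have metrics-server deployed, the best option is to use kubectl v1.10 or later, because it fetches metrics from metrics-server.

However, be mindful of the kubectl version skew policy:

kubectl is supported within one minor version (older or newer) of kube-apiserver.

Check your kube-apiserver version before choosing a kubectl version.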
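
The skew rule can be checked mechanically once both versions are known (kubectl version prints the client and server versions). The following is only a sketch: minor_of and skew_ok are hypothetical helper names, and the check assumes both versions share the same major number.

```shell
#!/bin/sh
# Hypothetical helper: extracts the minor number from a version string,
# e.g. "v1.11.8-gke.6" -> 11.
minor_of() {
    echo "$1" | sed -E 's/^v?[0-9]+\.([0-9]+).*/\1/'
}

# Hypothetical helper: succeeds when the client (first argument) is within
# one minor version, older or newer, of the server (second argument).
# Assumes both versions have the same major number.
skew_ok() {
    client_minor=$(minor_of "$1")
    server_minor=$(minor_of "$2")
    diff=$((client_minor - server_minor))
    [ "$diff" -ge -1 ] && [ "$diff" -le 1 ]
}
```

For example, with illustrative version strings, skew_ok v1.10.0 v1.11.8-gke.6 succeeds, while skew_ok v1.9.0 v1.11.8-gke.6 fails because the client is two minor versions behind.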

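Answer 2

I guess your issue may be related to the automatic upgrade of the GKE master.

I recently upgraded to v1.11.8-gke.6, and during the upgrade I saw the same intermittent errors in the heapster-nanny container: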

(error code: E0418)

For me the problem no longer occurs, and I can safely get node metrics with kubectl.
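
To judge whether such errors are intermittent (as they would be around a master upgrade) or constant, one option is to tally them by type from the nanny logs. This is just a sketch; tally_errors is a hypothetical helper, and the three patterns are the failure messages that appear in the log excerpt in the question.

```shell
#!/bin/sh
# Hypothetical helper: reads log lines on stdin and counts how many contain
# each of the three connection failure messages seen in the question.
tally_errors() {
    awk '
        /connection refused/    { refused++ }
        /TLS handshake timeout/ { tls++ }
        /i\/o timeout/          { io++ }
        END {
            printf "connection refused: %d\n", refused
            printf "TLS handshake timeout: %d\n", tls
            printf "i/o timeout: %d\n", io
        }'
}

# Against a live cluster (needs cluster access; use your own pod name):
#   kubectl logs -n kube-system po/heapster-v1.6.0-beta.1-fdc7fd478-8s998 \
#       -c heapster-nanny | tally_errors
```

Counts that stop growing after the upgrade finishes would be consistent with a temporary apiserver outage rather than a persistent misconfiguration.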
