I am new to k8s and have been trying to learn it. I have run into a problem with coredns on my freshly installed k8s cluster, which I set up on ec2 by following the official AWS blog. The problem is detailed below.
Cluster and network architecture
Currently, the cluster consists of just 1 master node (an ec2 instance). The network consists of 1 VPC with the CIDR range 10.0.0.0/16 and 4 subnets: (public_1, 10.0.0.0/20), (public_2, 10.0.16.0/20), (private_1, 10.0.32.0/20), (private_2, 10.0.48.0/20). The public subnets route outbound traffic to the internet through an internet gateway, while the private subnets route outbound traffic to the internet through a NAT proxy (an ec2 instance). The private subnets are also connected to each other through a VPC Endpoint. I access the machines on my private subnets through another ec2 instance that acts as a bastion. So there are 3 machines in total:
- K8S master instance
- Bastion instance (SSH only)
- NAT proxy instance (private subnets --> internet [outbound only])
AWS setup for the master instance
I have attached a security group that allows the ports listed in kubernetes.io/docs/reference/networking/ports-and-protocols. I have also created an IAM role (ec2 instance profile) and attached the policies listed in cloud-provider-aws.sigs.k8s.io/prerequisites.
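For reference, the security group rules look roughly like this when expressed with the aws CLI (a hypothetical sketch only: sg-0123456789abcdef0 is a placeholder group ID, and just two of the ports from the linked page are shown):

# Allow intra-SG traffic on two of the listed control-plane ports.
# sg-0123456789abcdef0 is a placeholder, not my real group ID.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 --protocol tcp --port 6443 \
  --source-group sg-0123456789abcdef0   # kube-apiserver
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 --protocol tcp --port 10250 \
  --source-group sg-0123456789abcdef0   # kubelet API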
On the master instance
- I set the instance's hostname to its private ipv4 dns name (curl http://169.254.169.254/latest/meta-data/hostname).
- I set up iptables with iptables -P FORWARD ACCEPT.
- I installed containerd, kubelet, kubectl, kubeadm.
- I loaded the br_netfilter & overlay kernel modules.
- I set the kernel flags net.bridge.bridge-nf-call-iptables = 1, net.bridge.bridge-nf-call-ip6tables = 1, net.ipv4.ip_forward = 1 (a consolidated sketch of these node-prep steps follows this list).
- I added KUBELET_EXTRA_ARGS=--node-ip=<ipv4_address> --cloud-provider=external to /etc/systemd/system/kubelet.service.d/20-aws.conf.
- I added --cloud-provider=external to /etc/kubernetes/manifests/kube-controller-manager.yaml & /etc/kubernetes/manifests/kube-apiserver.yaml.
- I initialized a new k8s cluster with kubeadm init --token-ttl 0, and it initialized successfully.
- I deployed aws-vpc-cni-k8s via kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/v1.16.4/config/master/aws-k8s-cni.yaml.
- I deployed aws-cloud-controller-manager via kubectl apply -k 'github.com/kubernetes/cloud-provider-aws/examples/existing-cluster/base/?ref=master'.
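For completeness, here is a consolidated sketch of the module/sysctl/kubelet steps above (the [Service]/Environment= wrapper is the standard systemd drop-in form; <ipv4_address> stands for the instance's private IPv4, as above):

# Kernel modules: persist across reboots, then load immediately.
cat <<'EOF' | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter

# Kernel flags.
cat <<'EOF' | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sudo sysctl --system

# kubelet drop-in with the extra args.
cat <<'EOF' | sudo tee /etc/systemd/system/kubelet.service.d/20-aws.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--node-ip=<ipv4_address> --cloud-provider=external"
EOF
sudo systemctl daemon-reload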
The problem
aws-cloud-controller-manager and aws-vpc-cni have both deployed successfully and show no errors. Although the node shows as "Ready", I cannot get coredns to work; it fails to start. Neither journalctl -u kubelet.service nor journalctl -u containerd.service reveals anything about the source of the problem. I believe my port forwarding is correct, and I am not sure what is causing coredns to fail to run.
My hypothesis is that, because of the forward . /etc/resolv.conf line in the coredns configmap, the coredns pods forward DNS lookups to the resolver from my master node's /etc/resolv.conf. The master node knows nothing about the cluster, so the lookups fail. I may be wrong about this, and I am not sure how to fix it.
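One check I can think of (a sketch; it assumes the dnsutils debug pod shown further down) is to query the node's upstream resolver directly from the pod network, bypassing the kube-dns service IP entirely:

# Query the VPC resolver (10.0.0.2, taken from the node's /etc/resolv.conf)
# straight from the pod network.
kubectl -n kube-system exec dnsutils -- dig @10.0.0.2 amazon.com +time=2 +tries=1

If this also times out, the failure would be pod-to-VPC connectivity rather than the forward target itself.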
The relevant logs are below.
kubectl logs coredns-76f75df574-465x5 -n kube-system
[INFO] plugin/reload: Running configuration SHA512 = 591cf328cccc12bc490481273e738df59329c62c0b729d94e8b61db9961c2fa5f046dd37f1cf888b953814040d180f52594972691cd6ff41be96639138a43908
CoreDNS-1.11.1
linux/amd64, go1.20.7, ae2bbc2
[ERROR] plugin/errors: 2 908260726288404915.5102928999241227370. HINFO: read udp 10.0.43.148:39074->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 908260726288404915.5102928999241227370. HINFO: read udp 10.0.43.148:47632->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 908260726288404915.5102928999241227370. HINFO: read udp 10.0.43.148:34254->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 908260726288404915.5102928999241227370. HINFO: read udp 10.0.43.148:39274->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 908260726288404915.5102928999241227370. HINFO: read udp 10.0.43.148:49990->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 908260726288404915.5102928999241227370. HINFO: read udp 10.0.43.148:54792->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 908260726288404915.5102928999241227370. HINFO: read udp 10.0.43.148:46469->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 908260726288404915.5102928999241227370. HINFO: read udp 10.0.43.148:33147->10.0.0.2:53: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[597383834]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231 (16-Mar-2024 09:09:06.572) (total time: 30002ms):
Trace[597383834]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (09:09:36.574)
Trace[597383834]: [30.002080972s] [30.002080972s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1358557007]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231 (16-Mar-2024 09:09:06.573) (total time: 30001ms):
Trace[1358557007]: ---"Objects listed" error:Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30001ms (09:09:36.574)
Trace[1358557007]: [30.001364092s] [30.001364092s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[784641658]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231 (16-Mar-2024 09:09:06.574) (total time: 30001ms):
Trace[784641658]: ---"Objects listed" error:Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout 30000ms (09:09:36.575)
Trace[784641658]: [30.001209965s] [30.001209965s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[ERROR] plugin/errors: 2 908260726288404915.5102928999241227370. HINFO: read udp 10.0.43.148:33770->10.0.0.2:53: i/o timeout
[ERROR] plugin/errors: 2 908260726288404915.5102928999241227370. HINFO: read udp 10.0.43.148:54282->10.0.0.2:53: i/o timeout
kubectl describe pod coredns-76f75df574-465x5 -n kube-system
admin@ip-10-0-32-163:~$ kubectl describe pod coredns-76f75df574-465x5 -n kube-system
Name: coredns-76f75df574-465x5
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Service Account: coredns
Node: ip-10-0-32-163.ap-south-1.compute.internal/10.0.32.163
Start Time: Sat, 16 Mar 2024 09:09:05 +0000
Labels: k8s-app=kube-dns
pod-template-hash=76f75df574
Annotations: <none>
Status: Running
IP: 10.0.43.148
IPs:
IP: 10.0.43.148
Controlled By: ReplicaSet/coredns-76f75df574
Containers:
coredns:
Container ID: containerd://cb07b34ca26b0b5d51dfcf658ba03fed0979346da243abd98ab8190a6df46800
Image: registry.k8s.io/coredns/coredns:v1.11.1
Image ID: registry.k8s.io/coredns/coredns@sha256:1eeb4c7316bacb1d4c8ead65571cd92dd21e27359f0d4917f1a5822a73b75db1
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Running
Started: Sat, 16 Mar 2024 09:31:42 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sat, 16 Mar 2024 09:24:46 +0000
Finished: Sat, 16 Mar 2024 09:26:40 +0000
Ready: False
Restart Count: 8
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5l9f6 (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
kube-api-access-5l9f6:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 27m default-scheduler 0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/unreachable: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
Normal Scheduled 23m default-scheduler Successfully assigned kube-system/coredns-76f75df574-465x5 to ip-10-0-32-163.ap-south-1.compute.internal
Normal Created 23m kubelet Created container coredns
Normal Started 23m kubelet Started container coredns
Warning Unhealthy 23m kubelet Readiness probe failed: Get "http://10.0.43.148:8181/ready": dial tcp 10.0.43.148:8181: i/o timeout (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 21m (x5 over 22m) kubelet Liveness probe failed: Get "http://10.0.43.148:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Normal Killing 21m kubelet Container coredns failed liveness probe, will be restarted
Normal Pulled 21m (x2 over 23m) kubelet Container image "registry.k8s.io/coredns/coredns:v1.11.1" already present on machine
Warning Unhealthy 13m (x80 over 23m) kubelet Readiness probe failed: Get "http://10.0.43.148:8181/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning BackOff 3m36s (x28 over 10m) kubelet Back-off restarting failed container coredns in pod coredns-76f75df574-465x5_kube-system(73e7c967-cf8f-4ee5-b525-1839d64e5058)
kubectl get pods -A -o wide
admin@ip-10-0-32-163:~$ kubectl get pods -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default dnsutils 1/1 Running 1 (65m ago) 10h 10.0.35.236 ip-10-0-32-163.ap-south-1.compute.internal <none> <none>
kube-system aws-cloud-controller-manager-czwrm 1/1 Running 4 (65m ago) 19h 10.0.32.163 ip-10-0-32-163.ap-south-1.compute.internal <none> <none>
kube-system aws-node-4rtrh 2/2 Running 8 (65m ago) 19h 10.0.32.163 ip-10-0-32-163.ap-south-1.compute.internal <none> <none>
kube-system coredns-76f75df574-465x5 0/1 CrashLoopBackOff 11 (4m24s ago) 43m 10.0.43.148 ip-10-0-32-163.ap-south-1.compute.internal <none> <none>
kube-system coredns-76f75df574-b9r5t 0/1 CrashLoopBackOff 11 (2m53s ago) 43m 10.0.44.228 ip-10-0-32-163.ap-south-1.compute.internal <none> <none>
kube-system etcd-ip-10-0-32-163.ap-south-1.compute.internal 1/1 Running 4 (65m ago) 19h 10.0.32.163 ip-10-0-32-163.ap-south-1.compute.internal <none> <none>
kube-system kube-apiserver-ip-10-0-32-163.ap-south-1.compute.internal 1/1 Running 4 (65m ago) 19h 10.0.32.163 ip-10-0-32-163.ap-south-1.compute.internal <none> <none>
kube-system kube-controller-manager-ip-10-0-32-163.ap-south-1.compute.internal 1/1 Running 4 (65m ago) 19h 10.0.32.163 ip-10-0-32-163.ap-south-1.compute.internal <none> <none>
kube-system kube-proxy-6cg92 1/1 Running 4 (65m ago) 19h 10.0.32.163 ip-10-0-32-163.ap-south-1.compute.internal <none> <none>
kube-system kube-scheduler-ip-10-0-32-163.ap-south-1.compute.internal 1/1 Running 4 (65m ago) 19h 10.0.32.163 ip-10-0-32-163.ap-south-1.compute.internal <none> <none>
kubectl get all -A
admin@ip-10-0-32-163:~$ kubectl get all -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/aws-cloud-controller-manager-8d4m9 1/1 Running 3 (2m17s ago) 17h
kube-system pod/aws-node-lvl5p 2/2 Running 4 (3m7s ago) 17h
kube-system pod/coredns-76f75df574-jqjk4 0/1 Running 13 (38s ago) 10h
kube-system pod/coredns-76f75df574-nbd9c 0/1 Running 13 (50s ago) 10h
kube-system pod/dnsutils 1/1 Running 1 (3m7s ago) 10h
kube-system pod/etcd-ip-10-0-32-163.ap-south-1.compute.internal 1/1 Running 14 (3m7s ago) 17h
kube-system pod/kube-apiserver-ip-10-0-32-163.ap-south-1.compute.internal 1/1 Running 2 (3m7s ago) 17h
kube-system pod/kube-controller-manager-ip-10-0-32-163.ap-south-1.compute.internal 1/1 Running 2 (3m7s ago) 17h
kube-system pod/kube-proxy-r2sh2 1/1 Running 2 (3m7s ago) 17h
kube-system pod/kube-scheduler-ip-10-0-32-163.ap-south-1.compute.internal 1/1 Running 5 (3m7s ago) 17h
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.97.0.1 <none> 443/TCP 17h
kube-system service/kube-dns ClusterIP 10.97.0.10 <none> 53/UDP,53/TCP,9153/TCP 17h
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/aws-cloud-controller-manager 1 1 1 1 1 node-role.kubernetes.io/control-plane= 17h
kube-system daemonset.apps/aws-node 1 1 1 1 1 <none> 17h
kube-system daemonset.apps/kube-proxy 1 1 1 1 1 kubernetes.io/os=linux 17h
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/coredns 0/2 2 0 17h
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/coredns-76f75df574 2 2 0 10h
sudo iptables -vnL
admin@ip-10-0-32-163:~$ sudo iptables -vnL
Chain INPUT (policy ACCEPT 576K packets, 102M bytes)
pkts bytes target prot opt in out source destination
5175 311K KUBE-PROXY-FIREWALL 0 -- * * 0.0.0.0/0 0.0.0.0/0 ctstate NEW /* kubernetes load balancer firewall */
571K 100M KUBE-NODEPORTS 0 -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes health check service ports */
5175 311K KUBE-EXTERNAL-SERVICES 0 -- * * 0.0.0.0/0 0.0.0.0/0 ctstate NEW /* kubernetes externally-visible service portals */
575K 102M KUBE-FIREWALL 0 -- * * 0.0.0.0/0 0.0.0.0/0
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 KUBE-PROXY-FIREWALL 0 -- * * 0.0.0.0/0 0.0.0.0/0 ctstate NEW /* kubernetes load balancer firewall */
0 0 KUBE-FORWARD 0 -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding rules */
0 0 KUBE-SERVICES 0 -- * * 0.0.0.0/0 0.0.0.0/0 ctstate NEW /* kubernetes service portals */
0 0 KUBE-EXTERNAL-SERVICES 0 -- * * 0.0.0.0/0 0.0.0.0/0 ctstate NEW /* kubernetes externally-visible service portals */
0 0 ACCEPT 0 -- * * 0.0.0.0/0 0.0.0.0/0
Chain OUTPUT (policy ACCEPT 580K packets, 110M bytes)
pkts bytes target prot opt in out source destination
7345 442K KUBE-PROXY-FIREWALL 0 -- * * 0.0.0.0/0 0.0.0.0/0 ctstate NEW /* kubernetes load balancer firewall */
7345 442K KUBE-SERVICES 0 -- * * 0.0.0.0/0 0.0.0.0/0 ctstate NEW /* kubernetes service portals */
579K 110M KUBE-FIREWALL 0 -- * * 0.0.0.0/0 0.0.0.0/0
Chain KUBE-EXTERNAL-SERVICES (2 references)
pkts bytes target prot opt in out source destination
Chain KUBE-FIREWALL (2 references)
pkts bytes target prot opt in out source destination
0 0 DROP 0 -- * * !127.0.0.0/8 127.0.0.0/8 /* block incoming localnet connections */ ! ctstate RELATED,ESTABLISHED,DNAT
Chain KUBE-FORWARD (1 references)
pkts bytes target prot opt in out source destination
0 0 DROP 0 -- * * 0.0.0.0/0 0.0.0.0/0 ctstate INVALID
0 0 ACCEPT 0 -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding rules */ mark match 0x4000/0x4000
0 0 ACCEPT 0 -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes forwarding conntrack rule */ ctstate RELATED,ESTABLISHED
Chain KUBE-KUBELET-CANARY (0 references)
pkts bytes target prot opt in out source destination
Chain KUBE-NODEPORTS (1 references)
pkts bytes target prot opt in out source destination
Chain KUBE-PROXY-CANARY (0 references)
pkts bytes target prot opt in out source destination
Chain KUBE-PROXY-FIREWALL (3 references)
pkts bytes target prot opt in out source destination
Chain KUBE-SERVICES (2 references)
pkts bytes target prot opt in out source destination
0 0 REJECT 6 -- * * 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:metrics has no endpoints */ tcp dpt:9153 reject-with icmp-port-unreachable
0 0 REJECT 17 -- * * 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:dns has no endpoints */ udp dpt:53 reject-with icmp-port-unreachable
0 0 REJECT 6 -- * * 0.0.0.0/0 10.96.0.10 /* kube-system/kube-dns:dns-tcp has no endpoints */ tcp dpt:53 reject-with icmp-port-unreachable
kubectl cluster-info
Kubernetes control plane is running at https://10.0.32.163:6443
CoreDNS is running at https://10.0.32.163:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'
cat /etc/resolv.conf
(on my master node)
admin@ip-10-0-32-163:~$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "resolvectl status" to see details about the actual nameservers.
nameserver 10.0.0.2
kubectl -n kube-system exec dnsutils -- cat /etc/resolv.conf
(from a test pod deployed to kube-system for debugging dns)
admin@ip-10-0-32-163:~$ kubectl -n kube-system exec dnsutils -- cat /etc/resolv.conf
search kube-system.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.97.0.10
options ndots:5
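To separate a service-routing failure from a coredns failure, the same pod can also point nslookup at the kube-dns ClusterIP and at a coredns pod IP directly (a sketch; the pod IP 10.0.43.148 is taken from the kubectl get pods -o wide output above):

# Against the kube-dns ClusterIP (exercises the kube-proxy rules):
kubectl -n kube-system exec dnsutils -- nslookup kubernetes.default.svc.cluster.local 10.97.0.10
# Against a coredns pod IP directly (bypasses the service entirely):
kubectl -n kube-system exec dnsutils -- nslookup kubernetes.default.svc.cluster.local 10.0.43.148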
kubectl describe cm coredns -n kube-system
admin@ip-10-0-32-163:~$ kubectl describe cm coredns -n kube-system
Name: coredns
Namespace: kube-system
Labels: <none>
Annotations: <none>
Data
====
Corefile:
----
.:53 {
log
errors
health {
lameduck 5s
}
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
ttl 30
}
prometheus :9153
forward . /etc/resolv.conf {
max_concurrent 1000
}
cache 30
loop
reload
loadbalance
}
BinaryData
====
Events: <none>
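If the forward . /etc/resolv.conf indirection is part of the problem, one experiment (hedged: this edits the live ConfigMap) would be to point the forward plugin at an explicit upstream and restart coredns:

kubectl -n kube-system edit configmap coredns
#   in the Corefile, change:
#     forward . /etc/resolv.conf   ->   forward . 10.0.0.2
kubectl -n kube-system rollout restart deployment coredns

Given that the logs above already show queries reaching 10.0.0.2 and timing out, I suspect this would change nothing, which again points at pod-network connectivity.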
Cluster information:
- Kubernetes version: v1.29.2
- Cloud being used: AWS
- Installation method: kubeadm init --pod-network-cidr=10.96.0.0/16 --service-cidr=10.97.0.0/16 --token-ttl 0
- Host OS: Linux ip-10-0-32-163.ap-south-1.compute.internal 6.1.0-18-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux
- CNI and version: aws-vpc-cni-k8s v1.16.4
- CRI and version: containerd containerd.io 1.6.28 ae07eda36dd25f8a1b98dfbf587313b99c0190bb