kube-proxy not working for service cluster IPs

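I installed a k8s 1.23.3 cluster on four Raspberry Pis running Raspberry Pi OS 11 (bullseye) arm64, mostly by following this guide.
The gist of it is that the control plane was created with this command: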

kubeadm init --token={some_token} --kubernetes-version=v1.23.3 --pod-network-cidr=10.1.0.0/16 --service-cidr=10.11.0.0/16 --control-plane-endpoint=10.0.4.16 --node-name=rpi-1-1

I then created my own kube-verify namespace, deployed an echo server into it, and created a service for it.

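However, I cannot reach the service's cluster IP from any of the nodes. Why? Requests simply time out, while requests to the pod's cluster IP work fine.
I suspect my kube-proxy is not working as it should. Below is what I have investigated so far.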

$ kubectl get services -n kube-verify -o=wide

NAME          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE   SELECTOR
echo-server   ClusterIP   10.11.213.180   <none>        8080/TCP   24h   app=echo-server
$ kubectl get pods -n kube-system -o=wide

NAME                              READY   STATUS    RESTARTS      AGE   IP          NODE      NOMINATED NODE   READINESS GATES
coredns-64897985d-47gpr           1/1     Running   1 (69m ago)   41h   10.1.0.5    rpi-1-1   <none>           <none>
coredns-64897985d-nf55w           1/1     Running   1 (69m ago)   41h   10.1.0.4    rpi-1-1   <none>           <none>
etcd-rpi-1-1                      1/1     Running   2 (69m ago)   41h   10.0.4.16   rpi-1-1   <none>           <none>
kube-apiserver-rpi-1-1            1/1     Running   2 (69m ago)   41h   10.0.4.16   rpi-1-1   <none>           <none>
kube-controller-manager-rpi-1-1   1/1     Running   2 (69m ago)   41h   10.0.4.16   rpi-1-1   <none>           <none>
kube-flannel-ds-5467m             1/1     Running   1 (69m ago)   28h   10.0.4.17   rpi-1-2   <none>           <none>
kube-flannel-ds-7wpvz             1/1     Running   1 (69m ago)   28h   10.0.4.18   rpi-1-3   <none>           <none>
kube-flannel-ds-9chxk             1/1     Running   1 (69m ago)   28h   10.0.4.19   rpi-1-4   <none>           <none>
kube-flannel-ds-x5rvx             1/1     Running   1 (69m ago)   29h   10.0.4.16   rpi-1-1   <none>           <none>
kube-proxy-8bbjn                  1/1     Running   1 (69m ago)   28h   10.0.4.17   rpi-1-2   <none>           <none>
kube-proxy-dw45d                  1/1     Running   1 (69m ago)   28h   10.0.4.18   rpi-1-3   <none>           <none>
kube-proxy-gkkxq                  1/1     Running   2 (69m ago)   41h   10.0.4.16   rpi-1-1   <none>           <none>
kube-proxy-ntl5w                  1/1     Running   1 (69m ago)   28h   10.0.4.19   rpi-1-4   <none>           <none>
kube-scheduler-rpi-1-1            1/1     Running   2 (69m ago)   41h   10.0.4.16   rpi-1-1   <none>           <none>
$ kubectl logs kube-proxy-gkkxq -n kube-system

I0220 13:52:02.281289       1 node.go:163] Successfully retrieved node IP: 10.0.4.16
I0220 13:52:02.281535       1 server_others.go:138] "Detected node IP" address="10.0.4.16"
I0220 13:52:02.281610       1 server_others.go:561] "Unknown proxy mode, assuming iptables proxy" proxyMode=""
I0220 13:52:02.604880       1 server_others.go:206] "Using iptables Proxier"
I0220 13:52:02.604966       1 server_others.go:213] "kube-proxy running in dual-stack mode" ipFamily=IPv4
I0220 13:52:02.605026       1 server_others.go:214] "Creating dualStackProxier for iptables"
I0220 13:52:02.605151       1 server_others.go:491] "Detect-local-mode set to ClusterCIDR, but no IPv6 cluster CIDR defined, , defaulting to no-op detect-local for IPv6"
I0220 13:52:02.606905       1 server.go:656] "Version info" version="v1.23.3"
W0220 13:52:02.614777       1 sysinfo.go:203] Nodes topology is not available, providing CPU topology
I0220 13:52:02.619535       1 conntrack.go:52] "Setting nf_conntrack_max" nf_conntrack_max=131072
I0220 13:52:02.620869       1 conntrack.go:100] "Set sysctl" entry="net/netfilter/nf_conntrack_tcp_timeout_close_wait" value=3600
I0220 13:52:02.660947       1 config.go:317] "Starting service config controller"
I0220 13:52:02.661015       1 shared_informer.go:240] Waiting for caches to sync for service config
I0220 13:52:02.662669       1 config.go:226] "Starting endpoint slice config controller"
I0220 13:52:02.662726       1 shared_informer.go:240] Waiting for caches to sync for endpoint slice config
I0220 13:52:02.762734       1 shared_informer.go:247] Caches are synced for service config 
I0220 13:52:02.762834       1 shared_informer.go:247] Caches are synced for endpoint slice config

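What I notice here is the "Nodes topology is not available" warning, so I dug further into the kube-proxy configuration, but nothing stands out to me.
If there is indeed an issue with node topology in my cluster, please point me to some resources on how to troubleshoot it, as I could not find anything meaningful based on this error message.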

$ kubectl describe configmap kube-proxy -n kube-system

Name:         kube-proxy
Namespace:    kube-system
Labels:       app=kube-proxy
Annotations:  kubeadm.kubernetes.io/component-config.hash: sha256:edce433d45f2ed3a58ee400690184ad033594e8275fdbf52e9c8c852caa7124d

Data
====
config.conf:
----
apiVersion: kubeproxy.config.k8s.io/v1alpha1
bindAddress: 0.0.0.0
bindAddressHardFail: false
clientConnection:
  acceptContentTypes: ""
  burst: 0
  contentType: ""
  kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
  qps: 0
clusterCIDR: 10.1.0.0/16
configSyncPeriod: 0s
conntrack:
  maxPerCore: null
  min: null
  tcpCloseWaitTimeout: null
  tcpEstablishedTimeout: null
detectLocalMode: ""
enableProfiling: false
healthzBindAddress: ""
hostnameOverride: ""
iptables:
  masqueradeAll: false
  masqueradeBit: null
  minSyncPeriod: 0s
  syncPeriod: 0s
ipvs:
  excludeCIDRs: null
  minSyncPeriod: 0s
  scheduler: ""
  strictARP: false
  syncPeriod: 0s
  tcpFinTimeout: 0s
  tcpTimeout: 0s
  udpTimeout: 0s
kind: KubeProxyConfiguration
metricsBindAddress: ""
mode: ""
nodePortAddresses: null
oomScoreAdj: null
portRange: ""
showHiddenMetricsForVersion: ""
udpIdleTimeout: 0s
winkernel:
  enableDSR: false
  networkName: ""
  sourceVip: ""
kubeconfig.conf:
----
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    server: https://10.0.4.16:6443
  name: default
contexts:
- context:
    cluster: default
    namespace: default
    user: default
  name: default
current-context: default
users:
- name: default
  user:
    tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token

BinaryData
====

Events:  <none>
$ kubectl -n kube-system exec kube-proxy-gkkxq cat /var/lib/kube-proxy/kubeconfig.conf

kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    server: https://10.0.4.16:6443
  name: default
contexts:
- context:
    cluster: default
    namespace: default
    user: default
  name: default
current-context: default
users:
- name: default
  user:
    tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token

The mode defaults to iptables, as the logs above confirm.
I also have IP forwarding enabled on all nodes.

$ sudo sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1
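
One more check that fits at this point is whether kube-proxy actually programmed a NAT rule for the service. The sketch below assumes iptables mode (as the logs above indicate) and assumes the usual rule shape, in which a KUBE-SERVICES rule matches the cluster IP with -d <ip>/32. The helper name has_service_rule is made up for illustration.

```shell
#!/bin/bash
# Reads `iptables-save` output on stdin and succeeds if a KUBE-SERVICES rule
# has the given cluster IP as its destination.
# has_service_rule is a hypothetical helper name, not part of any tool.
has_service_rule() {
  local cluster_ip=$1
  grep -F -- "-A KUBE-SERVICES" | grep -qF -- "-d ${cluster_ip}/32"
}

# Usage on a node (needs root):
#   sudo iptables-save -t nat | has_service_rule 10.11.213.180 && echo "rule present"
```

If the rule is present, kube-proxy has done its part and the problem more likely sits in the pod network underneath it.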

Answer 1

Flannel can be installed by applying a manifest from its repository.

Flannel can be added to any existing Kubernetes cluster, though it's simplest to add flannel before any pods using the pod network have been started. For Kubernetes v1.17+

kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml

As you can see in this yaml file, the network subnet is set to 10.244.0.0/16 by default:

  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
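
To see which network a given manifest (or the ConfigMap already deployed) is using, the value can be pulled out with sed. This is a minimal sketch: flannel_network is a hypothetical helper name, and the kube-system namespace in the usage line is an assumption based on where the flannel pods run in the question.

```shell
#!/bin/bash
# Prints the first "Network" value found in the text read from stdin.
# flannel_network is a hypothetical helper name used for illustration.
flannel_network() {
  sed -n 's/.*"Network"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage:
#   flannel_network < kube-flannel.yml
#   kubectl -n kube-system get configmap kube-flannel-cfg \
#     -o jsonpath='{.data.net-conf\.json}' | flannel_network
```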

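kubeadm init is the command that initializes a cluster. It needs a subnet for the cluster network, and that subnet must be the same as the one configured in your CNI. You can check for more options.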

--pod-network-cidr string    Specify range of IP addresses for the pod network. If set, the control plane will automatically allocate CIDRs for every node.

You initialized the cluster with --pod-network-cidr=10.1.0.0/16, so the cluster's subnet differs from the subnet in the flannel manifest's yaml file, "10.244.0.0/16". That is why it did not work.
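
The mismatch can be checked with a little arithmetic. The sketch below tests whether a pod IP from the question (10.1.0.5, one of the coredns pods) falls inside each of the two ranges. The function names ip_to_int and ip_in_cidr are made up for illustration, and it handles IPv4 only.

```shell
#!/bin/bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if IPv4 address $1 lies inside CIDR $2 (for example 10.1.0.0/16).
ip_in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/}
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}

ip_in_cidr 10.1.0.5 10.1.0.0/16   && echo "10.1.0.5 is in 10.1.0.0/16"
ip_in_cidr 10.1.0.5 10.244.0.0/16 || echo "10.1.0.5 is not in 10.244.0.0/16"
```

The pods received addresses from the kubeadm range, while flannel was configured for a range that contains none of them.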

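There are two options to fix it:
First - change the subnet in the flannel configuration yaml to the one that was applied when the cluster was initialized, in this case --pod-network-cidr=10.1.0.0/16 (see the script below)
OR
Second - if the cluster is for testing purposes and was just initialized, destroy the cluster and start over with the same subnet as in the flannel configuration yaml, "Network": "10.244.0.0/16"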
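
For the second option, the commands would look roughly like the following. This sketch only prints them instead of running them, since kubeadm reset is destructive. It reuses the flags from the question and leaves out --token so that kubeadm generates one. The function name print_reinit_commands is made up. Worker nodes would also need a reset and a fresh join, which is not shown.

```shell
#!/bin/bash
# Prints the commands for rebuilding the control plane with flannel's default
# pod CIDR. Nothing is executed here; review the output and run it by hand.
FLANNEL_CIDR="10.244.0.0/16"   # default "Network" value in kube-flannel.yml

# print_reinit_commands is a hypothetical helper name.
print_reinit_commands() {
  echo "sudo kubeadm reset"
  echo "sudo kubeadm init --kubernetes-version=v1.23.3" \
       "--pod-network-cidr=${FLANNEL_CIDR}" \
       "--service-cidr=10.11.0.0/16" \
       "--control-plane-endpoint=10.0.4.16" \
       "--node-name=rpi-1-1"
}

print_reinit_commands
```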

To modify kube-flannel.yml automatically, the following script, based on the yq and jq commands, can be used:

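#!/bin/bash
# Requires jq and the Python yq wrapper (the yq that accepts --yaml-output).

input=$1
output=$2

echo "Converting $input to $output"

# Extract net-conf.json, set the new pod network, and re-encode the result as a JSON string.
netconf=$( yq '. | select(.kind == "ConfigMap") | select(.metadata.name == "kube-flannel-cfg") | .data."net-conf.json"' "$input" | jq 'fromjson | .Network="10.1.0.0/16"' | yq -R '.' )
# Emit the kube-flannel-cfg ConfigMap with the patched net-conf.json.
kube_flannel_cfg=$( yq --yaml-output '. | select(.kind == "ConfigMap") | select(.metadata.name == "kube-flannel-cfg") | .data."net-conf.json"='"$netconf" "$input" )
# Emit every other document in the manifest unchanged.
everything_else=$( yq --yaml-output '. | select((.kind == "ConfigMap" and .metadata.name == "kube-flannel-cfg") | not)' "$input" )
# Overwrite the output file first so that a rerun does not append duplicates.
echo "$kube_flannel_cfg" > "$output"
echo '---' >> "$output"
echo "$everything_else" >> "$output"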
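
Where yq and jq are not available, a plain sed substitution may be enough. This is a swapped-in, simpler technique than the script above, and it assumes the default value appears verbatim in the manifest as "Network": "10.244.0.0/16". The function name set_flannel_network and the output file name are made up for illustration.

```shell
#!/bin/bash
# Rewrites flannel's default pod network in a manifest read from stdin and
# writes the result to stdout. Assumes the default value appears verbatim.
# set_flannel_network is a hypothetical helper name.
set_flannel_network() {
  local new_cidr=$1
  sed 's#"Network": "10\.244\.0\.0/16"#"Network": "'"$new_cidr"'"#'
}

# Usage:
#   set_flannel_network 10.1.0.0/16 < kube-flannel.yml > kube-flannel-custom.yml
#   kubectl apply -f kube-flannel-custom.yml
```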
