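I am deploying the following StatefulSet together with two different Services. One Service gives the rest of the cluster access to the pods (crdb-service.yaml), and the other, a headless Service, handles internal communication between the pods (crdb.yaml).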
crdb-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: crdb-service
  labels:
    app: crdb
spec:
  ports:
  - port: 26257
    targetPort: 26257
    name: grpc
  - port: 80
    targetPort: 8080
    name: http
  selector:
    app: crdb
crdb.yaml
apiVersion: v1
kind: Service
metadata:
  name: crdb
  labels:
    app: crdb
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
    prometheus.io/scrape: "true"
    prometheus.io/path: "_status/vars"
    prometheus.io/port: "8080"
spec:
  ports:
  - port: 26257
    targetPort: 26257
    name: grpc
  - port: 8080
    targetPort: 8080
    name: http
  publishNotReadyAddresses: true
  clusterIP: None
  selector:
    app: crdb
statefulset.yaml
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: crdb
  labels:
    app: crdb
spec:
  serviceName: "crdb"
  replicas: 5
  template:
    metadata:
      labels:
        app: crdb
    spec:
      serviceAccountName: crdb
      containers:
      - name: crdb
        image: cockroachdb/cockroach:v19.1.2
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 26257
          name: grpc
        - containerPort: 8080
          name: http
        livenessProbe:
          httpGet:
            path: "/health"
            port: http
            scheme: HTTPS
          initialDelaySeconds: 30
          periodSeconds: 5
        readinessProbe:
          httpGet:
            path: "/health?ready=1"
            port: http
            scheme: HTTPS
          initialDelaySeconds: 10
          periodSeconds: 5
          failureThreshold: 2
        volumeMounts:
        - name: datadir
          mountPath: /cockroach/cockroach-data
        - name: certs
          mountPath: /cockroach/cockroach-certs
        env:
        - name: STATEFULSET_NAME
          value: "crdb"
        - name: STATEFULSET_FQDN
          value: "crdb.default.svc.cluster.local"
        - name: COCKROACH_CHANNEL
          value: kubernetes-secure
        command:
          - "/bin/bash"
          - "-ecx"
          - "exec /cockroach/cockroach start --logtostderr --certs-dir /cockroach/cockroach-certs --advertise-host $(hostname -f) --http-host 0.0.0.0 --join crdb-0.crdb,crdb-1.crdb,crdb-2.crdb,crdb-3.crdb,crdb-4.crdb --cache 25% --max-sql-memory 25%"
      terminationGracePeriodSeconds: 60
      volumes:
      - name: datadir
        persistentVolumeClaim:
          claimName: datadir
      - name: certs
        emptyDir: {}
  podManagementPolicy: Parallel
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: datadir
    spec:
      accessModes:
        - "ReadWriteOnce"
      storageClassName: local-crdb-space
      resources:
        requests:
          storage: 1800Gi
Now I check the deployed Services:
$ kubectl describe service crdb
Name: crdb
Namespace: default
Labels: app=crdb
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"prometheus.io/path":"_status/vars","prometheus.io/port":"8080","prometheus.io/scrape":"...
             prometheus.io/path=_status/vars
             prometheus.io/port=8080
             prometheus.io/scrape=true
             service.alpha.kubernetes.io/tolerate-unready-endpoints=true
Selector: app=crdb
Type: ClusterIP
IP: None
Port: grpc 26257/TCP
TargetPort: 26257/TCP
Endpoints: 10.244.10.24:26257,10.244.2.23:26257,10.244.3.18:26257 + 2 more...
Port: http 8080/TCP
TargetPort: 8080/TCP
Endpoints: 10.244.10.24:8080,10.244.2.23:8080,10.244.3.18:8080 + 2 more...
Session Affinity: None
Events: <none>
$ kubectl describe service crdb-service
Name: crdb-service
Namespace: default
Labels: app=crdb
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app":"crdb"},"name":"crdb-service","namespace":"default"},"spec":{"ports":[...
Selector: app=crdb
Type: ClusterIP
IP: 10.100.71.172
Port: grpc 26257/TCP
TargetPort: 26257/TCP
Endpoints:
Port: http 80/TCP
TargetPort: 8080/TCP
Endpoints:
Session Affinity: None
Events: <none>
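Even though both Services have exactly the same label selector, the Endpoints field of the cluster Service (crdb-service) is empty. Going through https://github.com/kubernetes/kubernetes/issues/11795 and https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service did not turn up the cause.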
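Some other information that may be relevant: I upgraded the cluster from 1.13 to 1.14 and then to 1.15. The pods run on nodes that were recently added to the cluster. Pods previously deployed on these new nodes had network problems (they were unreachable because DNS failed), which I fixed by setting net.ipv4.ip_forward = 1 on the new nodes.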
How can I get the Service to pick up these pods?
Answer 1
publishNotReadyAddresses: true
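Never mind, I wasted 2 hours on this. It is simply a field that has to be added to the Service so that it publishes the pods' IPs while they are starting up, before they pass their readiness probe. Without it, a Service only lists pods whose readinessProbe succeeds. The headless crdb Service already sets publishNotReadyAddresses: true, which is why it shows all five endpoints while crdb-service, with the same selector, shows none.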
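For reference, here is a sketch of crdb-service.yaml with the field added. Everything except the `publishNotReadyAddresses` line is copied unchanged from the question; the field goes directly under `spec`, the same place it sits in crdb.yaml.

```yaml
# crdb-service.yaml with the fix applied (sketch; all other fields as in the question)
apiVersion: v1
kind: Service
metadata:
  name: crdb-service
  labels:
    app: crdb
spec:
  # List matching pods in Endpoints even before their readinessProbe passes
  publishNotReadyAddresses: true
  ports:
  - port: 26257
    targetPort: 26257
    name: grpc
  - port: 80
    targetPort: 8080
    name: http
  selector:
    app: crdb
```

After re-applying it, `kubectl describe service crdb-service` should list the same pod IPs as the crdb Service. The trade-off is that this ClusterIP Service will then also route client connections to pods that are not ready, so it may still be worth finding out why the `/health?ready=1` readiness probe is failing (for example, a CockroachDB cluster that has not been initialized yet reports its nodes as not ready).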