I have two CoreOS stable v1122.2.0 machines, each with etcd2 configured for TLS.
I created the certificates using https://github.com/coreos/etcd/tree/master/hack/tls-setup.
Now I am trying to configure calico-node to run with rkt on my CoreOS master node.
I have the following in my cloud-config:
write_files:
  - path: "/etc/kubernetes/cni/net.d/10-calico.conf"
    content: |
      {
        "name": "calico",
        "type": "flannel",
        "delegate": {
          "type": "calico",
          "etcd_endpoints": "https://10.79.218.2:2379,https://10.79.218.3:2379",
          "log_level": "none",
          "log_level_stderr": "info",
          "hostname": "10.79.218.2",
          "policy": {
            "type": "k8s",
            "k8s_api_root": "http://127.0.0.1:8080/api/v1/"
          }
        }
      }
  - path: "/etc/kubernetes/manifests/policy-controller.yaml"
    content: |
      apiVersion: v1
      kind: Pod
      metadata:
        name: calico-policy-controller
        namespace: calico-system
      spec:
        hostNetwork: true
        containers:
          # The Calico policy controller.
          - name: k8s-policy-controller
            image: calico/kube-policy-controller:v0.2.0
            env:
              - name: ETCD_ENDPOINTS
                value: "https://10.79.218.2:2379,https://10.79.218.3:2379"
              - name: K8S_API
                value: "http://127.0.0.1:8080"
              - name: LEADER_ELECTION
                value: "true"
          # Leader election container used by the policy controller.
          - name: leader-elector
            image: quay.io/calico/leader-elector:v0.1.0
            imagePullPolicy: IfNotPresent
            args:
              - "--election=calico-policy-election"
              - "--election-namespace=calico-system"
              - "--http=127.0.0.1:4040"
...
units:
  - name: calico-node.service
    enable: true
    command: start
    content: |
      [Unit]
      Description=Calico per-host agent
      Requires=network-online.target
      After=network-online.target
      [Service]
      Slice=machine.slice
      Environment=CALICO_DISABLE_FILE_LOGGING=true
      Environment=HOSTNAME=10.79.218.2
      Environment=IP=10.79.218.2
      Environment=FELIX_FELIXHOSTNAME=10.79.218.2
      Environment=CALICO_NETWORKING=false
      Environment=NO_DEFAULT_POOLS=true
      Environment=ETCD_ENDPOINTS=https://10.79.218.2:2379,https://10.79.218.3:2379
      ExecStart=/usr/bin/rkt run --inherit-env --stage1-from-dir=stage1-fly.aci \
        --volume=modules,kind=host,source=/lib/modules,readOnly=false \
        --mount=volume=modules,target=/lib/modules \
        --trust-keys-from-https quay.io/calico/node:v0.19.0
      KillMode=mixed
      Restart=always
      TimeoutStartSec=0
      [Install]
      WantedBy=multi-user.target
Please ignore the whitespace indentation; I don't think I copied/pasted it correctly :)
When I try to start the calico-node service, I get the following error:
Sep 14 05:45:17 localhost systemd[1]: Started Calico per-host agent.
Sep 14 05:45:17 localhost rkt[1644]: image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
Sep 14 05:45:18 localhost rkt[1644]: image: using image from local store for image name quay.io/calico/node:v0.19.0
Sep 14 05:45:25 localhost rkt[1644]: Traceback (most recent call last):
Sep 14 05:45:25 localhost rkt[1644]: File "startup.py", line 292, in <module>
Sep 14 05:45:25 localhost rkt[1644]: client = IPAMClient()
Sep 14 05:45:25 localhost rkt[1644]: File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 228, in __init__
Sep 14 05:45:25 localhost rkt[1644]: "%s" % (ETCD_CA_CERT_FILE_ENV, etcd_ca))
Sep 14 05:45:25 localhost rkt[1644]: pycalico.datastore_errors.DataStoreError: Invalid ETCD_CA_CERT_FILE. Certificate Authority cert is required and m
Sep 14 05:45:25 localhost rkt[1644]: Calico node failed to start
Sep 14 05:45:25 localhost systemd[1]: calico-node.service: Main process exited, code=exited, status=1/FAILURE
Sep 14 05:45:25 localhost systemd[1]: calico-node.service: Unit entered failed state.
Sep 14 05:45:25 localhost systemd[1]: calico-node.service: Failed with result 'exit-code'.
Sep 14 05:45:25 localhost systemd[1]: calico-node.service: Service hold-off time over, scheduling restart.
Sep 14 05:45:25 localhost systemd[1]: Stopped Calico per-host agent.
Sep 14 05:45:25 localhost systemd[1]: Started Calico per-host agent.
Sep 14 05:45:25 localhost rkt[1714]: image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
Sep 14 05:45:26 localhost rkt[1714]: image: using image from local store for image name quay.io/calico/node:v0.19.0
Sep 14 05:45:28 localhost rkt[1714]: Traceback (most recent call last):
Sep 14 05:45:28 localhost rkt[1714]: File "startup.py", line 292, in <module>
Sep 14 05:45:28 localhost rkt[1714]: client = IPAMClient()
Sep 14 05:45:28 localhost rkt[1714]: File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 228, in __init__
Sep 14 05:45:28 localhost rkt[1714]: "%s" % (ETCD_CA_CERT_FILE_ENV, etcd_ca))
Sep 14 05:45:28 localhost rkt[1714]: pycalico.datastore_errors.DataStoreError: Invalid ETCD_CA_CERT_FILE. Certificate Authority cert is required and m
So I'm getting Invalid ETCD_CA_CERT_FILE. I haven't actually told Calico which keys to use, so I guess I'm missing some configuration.
I have the following etcd-related keys in /etc/ssl/etcd:
8 -rw-------. 1 etcd etcd 1050 Sep 14 05:45 ca.pem
8 -rw-------. 1 etcd etcd 289 Sep 14 05:45 etcd1-key.pem
8 -rw-------. 1 etcd etcd 1058 Sep 14 05:45 etcd1.pem
8 -rw-------. 1 etcd etcd 227 Sep 12 03:49 server1-key.pem
8 -rw-------. 1 etcd etcd 822 Sep 12 03:49 server1.pem
I tried adding Environment=ETCD_CA_CERT_FILE=/etc/ssl/etcd/ca.pem to the calico-node systemd unit, but got exactly the same result.
Any ideas?
UPDATE
So I tried running Calico manually instead of through systemd. I also exported all the environment variables Calico needs:
export CALICO_DISABLE_FILE_LOGGING=true
export HOSTNAME=10.79.218.2
export IP=10.79.218.2
export FELIX_FELIXHOSTNAME=10.79.218.2
export CALICO_NETWORKING=false
export NO_DEFAULT_POOLS=true
export ETCD_ENDPOINTS=https://10.79.218.2:2379,https://10.79.218.3:2379
export ETCD_AUTHORITY=10.79.218.2:2379
export ETCD_SCHEME=https
export ETCD_CA_CERT_FILE=/etc/ssl/etcd/ca.pem
export ETCD_CERT_FILE=/etc/ssl/etcd/etcd1.pem
export ETCD_KEY_FILE=/etc/ssl/etcd/etcd1-key.pem
When I try to run the Calico container with the following command:
/usr/bin/rkt run --inherit-env --stage1-from-dir=stage1-fly.aci \
--volume=modules,kind=host,source=/lib/modules,readOnly=false \
--mount=volume=modules,target=/lib/modules \
--trust-keys-from-https quay.io/calico/node:v0.19.0
I get:
image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
image: using image from local store for image name quay.io/calico/node:v0.19.0
Traceback (most recent call last):
File "startup.py", line 292, in <module>
client = IPAMClient()
File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 221, in __init__
ETCD_CERT_FILE_ENV, etcd_cert))
pycalico.datastore_errors.DataStoreError: Cannot read ETCD_KEY_FILE and/or ETCD_CERT_FILE. Both must be readable file paths. Values provided: ETCD_KEY_FILE=/etc/ssl/etcd/etcd1-key.pem, ETCD_CERT_FILE=/etc/ssl/etcd/etcd1.pem
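I changed the permissions of the certificate files to 666, but that didn't fix it. I also know the certificates are valid, because etcd TLS works fine. So what am I missing?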
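Both errors are raised by startup.py inside the container, so the certificate paths are checked against the container's filesystem, not the host's. That would explain why setting ETCD_CA_CERT_FILE in the unit changed nothing and why chmod 666 on the host cannot help while /etc/ssl/etcd is not mounted into the container. The sketch below is a hypothetical re-creation of the kind of checks the two error messages describe; it is not pycalico's code, the function name check_etcd_tls_env is made up, and the rule "a CA file is required whenever https is in use" is an assumption based on the truncated first message.

```python
import os


def check_etcd_tls_env(env):
    """Hypothetical sketch: list problems with the etcd TLS settings in env.

    Mirrors the conditions described by the pycalico error messages;
    it is not pycalico's implementation.
    """

    def readable(path):
        # Must be an existing regular file that this process may read.
        return bool(path) and os.path.isfile(path) and os.access(path, os.R_OK)

    problems = []
    uses_tls = (env.get("ETCD_SCHEME") == "https"
                or "https://" in env.get("ETCD_ENDPOINTS", ""))
    ca = env.get("ETCD_CA_CERT_FILE")
    key = env.get("ETCD_KEY_FILE")
    cert = env.get("ETCD_CERT_FILE")

    # Assumption: a CA file is mandatory as soon as https is used.
    if uses_tls and not readable(ca):
        problems.append("Invalid ETCD_CA_CERT_FILE: %r" % ca)
    # Client key and cert are optional, but if either is set both must be readable.
    if (key or cert) and not (readable(key) and readable(cert)):
        problems.append(
            "Cannot read ETCD_KEY_FILE and/or ETCD_CERT_FILE: %r, %r" % (key, cert)
        )
    return problems


# The same environment can pass on the host and fail inside the container
# when /etc/ssl/etcd is not mounted there.
for problem in check_etcd_tls_env(os.environ):
    print(problem)
```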
UPDATE 2
It looks like I was missing a mount of the certificate directory into the Calico container.
So now I'm running the Calico container with:
/usr/bin/rkt run --volume etcd-ssl,kind=host,source=/etc/ssl/etcd/,readOnly=true \
  --inherit-env --stage1-from-dir=stage1-fly.aci \
  --volume=modules,kind=host,source=/lib/modules,readOnly=false \
  --mount=volume=modules,target=/lib/modules \
  --trust-keys-from-https quay.io/calico/node:v0.19.0 \
  --mount volume=etcd-ssl,target=/etc/ssl/etcd
and I get the following output:
image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
image: using image from local store for image name quay.io/calico/node:v0.19.0
Traceback (most recent call last):
File "startup.py", line 292, in <module>
client = IPAMClient()
File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 246, in __init__
allow_reconnect=True)
File "/usr/lib/python2.7/site-packages/etcd/client.py", line 204, in __init__
set(self.machines))
File "/usr/lib/python2.7/site-packages/etcd/client.py", line 299, in machines
return self.machines
File "/usr/lib/python2.7/site-packages/etcd/client.py", line 301, in machines
raise etcd.EtcdException("Could not get the list of servers, "
etcd.EtcdException: Could not get the list of servers, maybe you provided the wrong host(s) to connect to?
Calico node failed to start
I'm a bit closer... but still no solution.
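The traceback shows pycalico creating the python-etcd client with allow_reconnect=True and the client failing while building its list of machines. One plausible reading, consistent with Update 3, is that every TLS connection attempt failed and the client could only report that no server answered. For reference, python-etcd's Client accepts several servers as a tuple of (host, port) pairs plus a single protocol setting. The helper below is a hypothetical sketch of turning ETCD_ENDPOINTS into that shape; parse_etcd_endpoints is a made-up name, and the sketch does not claim to match how pycalico itself parses the variable.

```python
from urllib.parse import urlsplit


def parse_etcd_endpoints(value):
    """Hypothetical helper: split ETCD_ENDPOINTS into (scheme, ((host, port), ...)).

    All endpoints must share one scheme, because the client is configured
    with a single protocol for every server.
    """
    schemes = set()
    hosts = []
    for raw in value.split(","):
        url = urlsplit(raw.strip())
        if not url.hostname or url.port is None:
            raise ValueError("endpoint needs a host and a port: %r" % raw)
        schemes.add(url.scheme)
        hosts.append((url.hostname, url.port))
    if len(schemes) != 1:
        raise ValueError("mixed schemes in ETCD_ENDPOINTS: %s" % sorted(schemes))
    return schemes.pop(), tuple(hosts)
```

With the two-member value from the question this yields ("https", (("10.79.218.2", 2379), ("10.79.218.3", 2379))); with the single endpoint used in Update 3 the tuple has one entry, and in that case the underlying SSL error surfaced instead of the generic one.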
UPDATE 3
I tried setting ETCD_ENDPOINTS to only the etcd server on this CoreOS machine by running
export ETCD_ENDPOINTS=https://10.79.218.2:2379
and now when I run the Calico rkt image, I get:
image: using image from file /usr/lib64/rkt/stage1-images/stage1-fly.aci
image: using image from local store for image name quay.io/calico/node:v0.19.0
Traceback (most recent call last):
File "startup.py", line 295, in <module>
main()
File "startup.py", line 251, in main
warn_if_hostname_conflict(ip)
File "startup.py", line 192, in warn_if_hostname_conflict
current_ipv4, _ = client.get_host_bgp_ips(hostname)
File "/usr/lib/python2.7/site-packages/pycalico/datastore.py", line 132, in wrapped
"running?" % (fn.__name__, e.message))
pycalico.datastore_errors.DataStoreError: get_host_bgp_ips: Error accessing etcd (Connection to etcd failed due to SSLError(CertificateError("hostname '10.79.218.2' doesn't match u'etcd'",),)). Is etcd running?
Calico node failed to start
Answer 1
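I also ran into this problem, and eventually found the root cause by reading the etcd connection logic and the code of the libraries it uses, plus a few pointers from the Calico team on their Slack channel.
The problem is that the Python etcd client used by current versions of Calico (at least up to 0.22.0) does not support IP SANs (Subject Alternative Names) in TLS certificates. As a result, certificates that identify the etcd servers only by IP address cannot be matched against the servers they were issued for.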
This is described in this GitHub issue.
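To see why an IP-only SAN fails, here is a simplified model of the hostname check in older Python TLS code (the match_hostname logic from before IP SAN support): only DNS entries in subjectAltName are compared, the subject commonName is consulted only when there are no DNS entries, and IP Address entries are ignored. Wildcard matching is left out. The certificate is given as a dict in the layout returned by ssl.SSLSocket.getpeercert(); the example certificate contents are assumptions chosen to reproduce the Update 3 message, not a dump of the real certificate.

```python
class CertificateError(ValueError):
    """Stand-in for ssl.CertificateError so the sketch is self-contained."""


def legacy_match_hostname(cert, hostname):
    """Simplified pre-IP-SAN hostname check (exact names only, no wildcards)."""
    names = []
    for kind, value in cert.get("subjectAltName", ()):
        if kind == "DNS":  # "IP Address" entries are never looked at
            if value.lower() == hostname.lower():
                return
            names.append(value)
    if not names:
        # The subject CN is only used when the SAN has no DNS entry.
        for rdn in cert.get("subject", ()):
            for key, value in rdn:
                if key == "commonName":
                    if value.lower() == hostname.lower():
                        return
                    names.append(value)
    if len(names) > 1:
        raise CertificateError("hostname %r doesn't match either of %s"
                               % (hostname, ", ".join(map(repr, names))))
    if len(names) == 1:
        raise CertificateError("hostname %r doesn't match %r" % (hostname, names[0]))
    raise CertificateError("no appropriate commonName or subjectAltName fields were found")


# Hypothetical etcd server cert: CN "etcd", member IP present only as an IP SAN.
etcd_cert = {
    "subject": ((("commonName", "etcd"),),),
    "subjectAltName": (("IP Address", "10.79.218.2"),),
}
try:
    legacy_match_hostname(etcd_cert, "10.79.218.2")
except CertificateError as err:
    print(err)  # hostname '10.79.218.2' doesn't match 'etcd'
```

The same model accepts a certificate whose SAN lists the DNS name the client actually connects to (the FQDN approach below), and also one whose CN is the IP address and which has no DNS SAN entries at all (the workaround in Answer 2).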
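To fix this, you can either wait until a new version of the urllib3 library is released, the etcd client adopts it and publishes a new release, and Calico is updated to use that new etcd client; or you can regenerate your certificates with FQDNs instead of IP addresses in the SAN field. That means making sure your servers are reachable by those names, with DNS or /etc/hosts set up correctly. The OpenSSL config used to generate the certificates should contain something like this: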
[alt_names]
DNS.1 = $ENV::FQDN
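The link describing how you generated your certificates uses CFSSL, so I suggest reading its documentation on using hostnames instead of IP addresses. I believe it may be as simple as changing the JSON config along these lines: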
"hosts": [
"example.com",
"www.example.com"
],
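For the FQDN route with CFSSL, the per-member CSR JSON mainly needs hostnames rather than IPs in "hosts". Below is a small sketch that builds such a request and rejects IP addresses; make_csr is a hypothetical helper, and the hostname etcd1.example.internal and the ecdsa/256 key settings are placeholders, not values taken from the tls-setup repository. The resulting JSON would be the CSR you pass to cfssl gencert.

```python
import ipaddress
import json


def _is_ip(name):
    """True if name parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(name)
    except ValueError:
        return False
    return True


def make_csr(common_name, hostnames, algo="ecdsa", size=256):
    """Hypothetical helper: a CFSSL CSR request with DNS names only in "hosts".

    The names must resolve (DNS or /etc/hosts) to the etcd members on every
    machine that connects to them.
    """
    for name in [common_name] + list(hostnames):
        if _is_ip(name):
            raise ValueError("use a hostname, not an IP address: %r" % name)
    return {
        "CN": common_name,
        "hosts": list(hostnames),
        "key": {"algo": algo, "size": size},
    }


if __name__ == "__main__":
    csr = make_csr("etcd1.example.internal", ["etcd1.example.internal"])
    print(json.dumps(csr, indent=2))
```

Because the check compares against whatever host the client connects to, Calico's ETCD_ENDPOINTS would then also have to use these names instead of the IP addresses.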
Answer 2
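I found that I can succeed with this quirky library when all of the following hold: the client opens the connection to an IP address; the server's certificate declares that IP address in the Subject; and the server's certificate has no DNS-type entries in its Subject Alternative Name list. Below is selected output of openssl x509 -text ... for an example server certificate that works when the client opens the connection using the IP address 10.10.10.1 to identify the server: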
...
Subject: CN=10.10.10.1
...
X509v3 extensions:
X509v3 Basic Constraints:
CA:FALSE
X509v3 Key Usage:
Digital Signature, Non Repudiation, Key Encipherment
X509v3 Subject Alternative Name:
IP Address:100.127.0.2, IP Address:100.127.0.2, IP Address:10.10.10.1
...
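Also, there are newer versions of the Calico image. I have heard of only two bad things about calico/node:v0.23.0. One came from someone else: https://calicousers.slack.com/archives/kubernetes/p1478206011002345. I also did some testing of that image myself and found only one problem: https://github.com/projectcalico/calico-containers/issues/1107. There are currently v1.0.0 betas and an rc1, and I haven't heard anything bad about them.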