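I am running into several problems while trying to install OpenShift 3 with Ansible. The errors are different on every run, but they always involve a pair of nodes.
Version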
openshift-ansible from git repo: openshift-ansible-3.6.173.0.32-1
ansible: 2.3.0.0
Steps To Reproduce
- Take two nodes and one master:
- node1.my-site.com
- node2.my-site.com
- master.my-site.com
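- Follow the prerequisites in the OpenShift documentation.
- Follow the host preparation steps in the OpenShift documentation.
- Write roughly the same hosts file as the single-master, multiple-nodes example, then run: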
ansible-playbook playbooks/byo/config.yml
Here is my actual inventory (hosts) file:
# to be saved in /etc/ansible/hosts.
# coming from https://docs.openshift.org/latest/install_config/install/advanced_install.html#single-master
# Create an OSEv3 group that contains the masters and nodes groups
[OSEv3:children]
masters
nodes
# Set variables common for all OSEv3 hosts
[OSEv3:vars]
# SSH user, this user should allow ssh based auth without requiring a password
ansible_ssh_user=root
# If ansible_ssh_user is not root, ansible_become must be set to true
#ansible_become=true
openshift_deployment_type=origin
openshift_disable_check=memory_availability
# uncomment the following to enable htpasswd authentication; defaults to DenyAllPasswordIdentityProvider
#openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
# host group for masters
[masters]
master.my-site.com
# host group for etcd
[etcd]
master.my-site.com
# host group for nodes, includes region info
[nodes]
node1.my-site.com openshift_node_labels="{'region': 'primary', 'zone': 'default'}"
node2.my-site.com openshift_node_labels="{'region': 'primary', 'zone': 'default'}"
infra-node1.my-site.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
infra-node2.my-site.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
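This is basically a copy-paste from the documentation, except that I changed the regions of the nodes.
Important note, since I may have done this wrong: node1.my-site.com resolves to the same IP as infra-node1.my-site.com, and node2.my-site.com resolves to the same IP as infra-node2.my-site.com.
I don't know whether this is the right approach, but the documentation only talks about 2 nodes, so I assumed the names should resolve to the same IPs.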
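To make the aliasing described above visible, here is a minimal sketch that groups inventory hostnames by the IP they resolve to. The function name `find_shared_ips` and the lookup table are hypothetical, and the addresses come from the 192.0.2.0/24 documentation range, not from the real setup. Pass the default `socket.gethostbyname` resolver to check real DNS instead.

```python
import socket
from collections import defaultdict


def find_shared_ips(hostnames, resolve=socket.gethostbyname):
    """Group hostnames by resolved IP and keep only IPs shared by 2+ names."""
    by_ip = defaultdict(list)
    for name in hostnames:
        try:
            by_ip[resolve(name)].append(name)
        except OSError:
            # Names that do not resolve are skipped in this sketch.
            continue
    return {ip: names for ip, names in by_ip.items() if len(names) > 1}


# Hypothetical lookup table standing in for real DNS (assumed addresses).
fake_dns = {
    "master.my-site.com": "192.0.2.10",
    "node1.my-site.com": "192.0.2.11",
    "infra-node1.my-site.com": "192.0.2.11",
    "node2.my-site.com": "192.0.2.12",
    "infra-node2.my-site.com": "192.0.2.12",
}

shared = find_shared_ips(fake_dns, resolve=fake_dns.__getitem__)
for ip, names in sorted(shared.items()):
    print(ip, "->", ", ".join(names))
```

If two inventory entries map to one IP, Ansible treats them as separate hosts and may run the same tasks against the same machine in parallel. That would be one plausible explanation for concurrent yum runs stepping on each other's cache files, though it is not confirmed here.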
Observed Results
After about 10 to 20 minutes, the deployment fails with seemingly random errors:
Failure summary:
1. Host: node2.my-site.com
Play: Verify Requirements
Task: openshift_health_check
Message: One or more checks failed
Details: check "docker_storage":
Some dependencies are required in order to query docker storage on host:
Unable to install required packages on this host:
python-docker-py
Failure talking to yum: [Errno 2] No such file or directory: '/var/cache/yum/x86_64/7/epel/gen/primary_db.sqlite'
check "package_availability":
Error with yum repository configuration: updates: Check uncompressed DB failed
2. Host: infra-node2.my-site.com
Play: Verify Requirements
Task: openshift_health_check
Message: One or more checks failed
Details: check "docker_storage":
Some dependencies are required in order to query docker storage on host:
Unable to install required packages on this host:
python-docker-py
Failure talking to yum: updates: Check uncompressed DB failed
This one is strange: '/var/cache/yum/x86_64/7/epel/gen/primary_db.sqlite'
Is there some verification going on? Another run:
Failure summary:
1. Host: infra-node1.my-site.com
Play: Verify Requirements
Task: openshift_health_check
Message: One or more checks failed
Details: check "package_version":
MODULE FAILURE
check "package_availability":
Error with yum repository configuration: File /var/cache/yum/x86_64/7/epel/metalink.xml is not XML
2. Host: node1.my-site.com
Play: Verify Requirements
Task: openshift_health_check
Message: One or more checks failed
Details: check "package_version":
MODULE FAILURE
3. Host: infra-node2.my-site.com
Play: Verify Requirements
Task: openshift_health_check
Message: One or more checks failed
Details: check "package_version":
MODULE FAILURE
check "package_availability":
Error with yum repository configuration: File /var/cache/yum/x86_64/7/epel/metalink.xml does not exist
4. Host: node2.my-site.com
Play: Verify Requirements
Task: openshift_health_check
Message: One or more checks failed
Details: check "package_version":
MODULE FAILURE
check "package_availability":
Error with yum repository configuration: File /var/cache/yum/x86_64/7/epel/metalink.xml does not exist
I confirm that this file is not valid XML:
<?xml version="1.0" encoding="utf-8"?>
<metalink version="3.0" xmlns="http://www.metalinker.org/" type="dynamic" pubdate="Tue, 12 Sep 2017 20:22:13 GMT" generator="mirrormanager" xmlns:mm0="http://fedorahosted.org/mirrormanager">
<files>
<file name="repomd.xml">
<resources maxconnections="1">
<url protocol="blablabla">http://blablabla</url>
...
...
<url protocol="blablabla">http://blablabla</url>
</resources>
</file>
</files>
</metalink>
x86_64/repodata/repomd.xml</url> <==== What?????
<url protocol="blablabla">http://blablabla</url>
</resources>
</file>
</files>
</metalink>
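The dump above has a stray fragment after the closing `</metalink>` tag, which is enough to make a parser reject the whole file. A minimal sketch of that check, using Python's standard `xml.etree.ElementTree`, follows. The helper name `xml_error` and the shortened sample documents are illustrative assumptions, not the real metalink content.

```python
import xml.etree.ElementTree as ET


def xml_error(data):
    """Return None if data is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Shortened stand-in for a complete metalink file (assumed content).
complete = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b"<metalink><files><file name=\"repomd.xml\"/></files></metalink>\n"
)

# The same document followed by a leftover tail, as in the dump above.
with_tail = complete + b"x86_64/repodata/repomd.xml</url>\n</metalink>\n"

print("complete:", xml_error(complete))
print("with tail:", xml_error(with_tail))
```

A leftover tail like this is what one might expect if a shorter file was written over a longer one without truncating it, or if two processes wrote the same file. That is an assumption about the cause, not something the logs prove.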
I don't understand the third error at all, because it concerns an old.tmp file (repomd.xml itself is there):
Failure summary:
1. Host: infra-node1.my-site.com
Play: Disable excluders
Task: openshift_excluder : Install docker excluder
Message: Failure talking to yum: [Errno 2] No such file or directory: '/var/cache/yum/x86_64/7/centos-openshift-origin/repomd.xml.old.tmp'
Another one:
Failure summary:
1. Host: node2.my-site.com
Play: Verify Requirements
Task: openshift_health_check
Message: One or more checks failed
Details: check "package_availability":
Error with yum repository configuration: updates: Check uncompressed DB failed
2. Host: infra-node2.my-site.com
Play: Verify Requirements
Task: openshift_health_check
Message: One or more checks failed
Details: check "package_availability":
Error with yum repository configuration: updates: Check uncompressed DB failed
The last one:
Failure summary:
1. Host: node1.my-site.com
Play: Verify Requirements
Task: openshift_health_check
Message: One or more checks failed
Details: check "package_availability":
Unexpected error with yum repository: [Errno 2] No such file or directory: '/var/cache/yum/x86_64/7/epel/gen/primary_db.sqlite'
2. Host: node2.my-site.com
Play: Verify Requirements
Task: openshift_health_check
Message: One or more checks failed
Details: check "package_version":
MODULE FAILURE
check "package_availability":
Unexpected error with yum repository: /builddir/build/BUILD/Python-2.7.5/Objects/stringobject.c:3902: bad argument to internal function
3. Host: infra-node2.my-site.com
Play: Verify Requirements
Task: openshift_health_check
Message: One or more checks failed
Details: check "package_availability":
Unexpected error with yum install/update: database disk image is malformed
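To check whether the failures in the summaries above really cluster on the same machines, the `Host:` lines can be tallied per physical machine. This is a minimal sketch: the function name, the alias map, and the shortened sample log are hypothetical, with the alias map encoding the assumption that each infra-node name points at the matching node.

```python
import re
from collections import Counter

# Matches lines such as "1. Host: node2.my-site.com" in a failure summary.
HOST_LINE = re.compile(r"^\s*\d+\.\s+Host:\s+(\S+)\s*$", re.MULTILINE)


def count_failures_by_machine(log_text, aliases):
    """Count failed hosts, folding alias names onto one machine name."""
    hosts = HOST_LINE.findall(log_text)
    return Counter(aliases.get(host, host) for host in hosts)


# Assumed mapping: each infra name is an alias of the matching node.
aliases = {
    "infra-node1.my-site.com": "node1.my-site.com",
    "infra-node2.my-site.com": "node2.my-site.com",
}

# Shortened sample in the same shape as the summaries above.
sample = """\
Failure summary:
1. Host: node2.my-site.com
Play: Verify Requirements
2. Host: infra-node2.my-site.com
Play: Verify Requirements
"""

counts = count_failures_by_machine(sample, aliases)
for machine, failures in counts.most_common():
    print(machine, failures)
```

If most machines show an even count made of one node entry plus one infra entry, that supports the idea that the two inventory names are colliding on the same host.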
Additional Information
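- Operating system:
centos-release-7-3.1611.el7.centos.x86_64
- I think I did not set up the hosts file correctly, because the failures almost always concern the pairs node1/infra-node1 and node2/infra-node2.
- The master has 15 GB of RAM and 2 vCPUs; the nodes have 8 GB of RAM and 2 vCPUs.
- I am hosted at OVH.
- I have shut down my platform for security reasons.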