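It has been a few months since I originally built my OpenStack cloud, and against my better judgment I ran sudo apt-get update ; sudo apt-get upgrade on my nodes. That was a bad idea.
After the reboot everything seemed fine and nagios reported no problems with the services, but when I started my instances, none of them could get an IP. When I started investigating in neutron, I saw a large number of errors in Juju. I don't even know where to start.
While the nodes were upgrading, I was asked about some configuration file changes, and I chose (N) to not make any modifications. I'm guessing that is the problem?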
landscape@juju-machine-0-lxc-1:~$ juju status --format=tabular
[Services]
NAME STATUS EXPOSED CHARM
base-machine error false cs:trusty/ubuntu-6
ceilometer active false cs:trusty/ceilometer-171
ceilometer-agent false cs:trusty/ceilometer-agent-167
ceph-mon active false cs:~openstack-charmers-next/trusty/ceph-mon-137
ceph-osd active false cs:trusty/ceph-osd-169
ceph-radosgw active false cs:trusty/ceph-radosgw-173
cinder error false cs:trusty/cinder-188
glance active false cs:trusty/glance-185
keystone active false cs:trusty/keystone-253
landscape-client false cs:trusty/landscape-client-12
mongodb unknown false cs:trusty/mongodb-35
mysql active false cs:trusty/percona-cluster-178
nagios unknown false cs:trusty/nagios-10
neutron-api active false cs:trusty/neutron-api-177
neutron-gateway error false cs:trusty/neutron-gateway-163
neutron-openvswitch false cs:trusty/neutron-openvswitch-169
nova-cloud-controller active false cs:trusty/nova-cloud-controller-220
nova-compute error false cs:trusty/nova-compute-190
nrpe false cs:trusty/nrpe-7
ntp false cs:trusty/ntp-15
ntpmaster unknown false cs:trusty/ntpmaster-2
openstack-dashboard active false cs:trusty/openstack-dashboard-175
rabbitmq-server error false cs:trusty/rabbitmq-server-43
[Units]
ID WORKLOAD-STATE AGENT-STATE VERSION MACHINE PORTS PUBLIC-ADDRESS MESSAGE
base-machine/0 error idle 1.25.6 0 node01.maas hook failed: "leader-elected"
landscape-client/0 unknown idle 1.25.6 node01.maas
ntp/0 unknown idle 1.25.6 node01.maas
base-machine/1 unknown idle 1.25.6 2 node02.maas
landscape-client/9 unknown idle 1.25.6 node02.maas
ntp/1 error idle 1.25.6 node02.maas hook failed: "leader-elected"
base-machine/2 unknown idle 1.25.6 1 node03.maas
landscape-client/10 unknown idle 1.25.6 node03.maas
ntp/2 unknown idle 1.25.6 node03.maas
ceilometer/0 active idle 1.25.6 0/lxc/2 8777/tcp 10.14.0.47 Unit is ready
landscape-client/5 unknown idle 1.25.6 10.14.0.47
nrpe/4 unknown idle 1.25.6 10.14.0.47
ceph-mon/0 active idle 1.25.6 0/lxc/4 10.14.0.53 Unit is ready and clustered
landscape-client/2 unknown idle 1.25.6 10.14.0.53
nrpe/1 unknown idle 1.25.6 10.14.0.53
ceph-mon/1 active idle 1.25.6 2/lxc/4 10.14.0.60 Unit is ready and clustered
landscape-client/14 unknown idle 1.25.6 10.14.0.60
nrpe/10 unknown idle 1.25.6 10.14.0.60
ceph-mon/2 active idle 1.25.6 1/lxc/0 10.14.0.62 Unit is ready and clustered
landscape-client/19 unknown idle 1.25.6 10.14.0.62
nrpe/13 unknown idle 1.25.6 10.14.0.62
ceph-osd/0 active idle 1.25.6 0 node01.maas Unit is ready (2 OSD)
landscape-client/1 unknown idle 1.25.6 node01.maas
nrpe/0 unknown idle 1.25.6 node01.maas
ceph-osd/1 active idle 1.25.6 2 node02.maas Unit is ready (5 OSD)
landscape-client/11 unknown idle 1.25.6 node02.maas
nrpe/8 unknown idle 1.25.6 node02.maas
ceph-osd/2 active idle 1.25.6 1 node03.maas Unit is ready (5 OSD)
landscape-client/12 unknown idle 1.25.6 node03.maas
nrpe/9 error idle 1.25.6 node03.maas hook failed: "config-changed"
ceph-radosgw/0 active idle 1.25.6 2/lxc/0 80/tcp 10.14.0.56 Unit is ready
landscape-client/16 unknown idle 1.25.6 10.14.0.56
cinder/0 error idle 1.25.6 1/lxc/2 10.14.0.64 hook failed: "update-status"
landscape-client/22 unknown idle 1.25.6 10.14.0.64
nrpe/16 unknown idle 1.25.6 10.14.0.64
glance/0 active idle 1.25.6 0/lxc/5 9292/tcp 10.14.0.54 Unit is ready
landscape-client/4 unknown idle 1.25.6 10.14.0.54
nrpe/3 unknown idle 1.25.6 10.14.0.54
keystone/0 active idle 1.25.6 2/lxc/2 10.14.0.58 Unit is ready
landscape-client/18 unknown idle 1.25.6 10.14.0.58
nrpe/12 unknown idle 1.25.6 10.14.0.58
mongodb/0 unknown idle 1.25.6 1/lxc/3 27017/tcp,27019/tcp,27021/tcp,28017/tcp 10.14.0.65
landscape-client/20 unknown idle 1.25.6 10.14.0.65
nrpe/14 unknown idle 1.25.6 10.14.0.65
mysql/0 active idle 1.25.6 0/lxc/1 10.14.0.50 Unit is ready
landscape-client/7 unknown idle 1.25.6 10.14.0.50
nrpe/6 unknown idle 1.25.6 10.14.0.50
nagios/0 unknown idle 1.25.6 2/lxc/3 80/tcp 10.14.0.59
landscape-client/15 unknown idle 1.25.6 10.14.0.59
neutron-api/0 active idle 1.25.6 1/lxc/4 9696/tcp 10.14.0.66 Unit is ready
landscape-client/23 unknown idle 1.25.6 10.14.0.66
nrpe/17 unknown idle 1.25.6 10.14.0.66
neutron-gateway/0 error idle 1.25.6 0 node01.maas hook failed: "config-changed"
landscape-client/6 unknown idle 1.25.6 node01.maas
nrpe/5 unknown idle 1.25.6 node01.maas
nova-cloud-controller/0 active idle 1.25.6 0/lxc/0 3333/tcp,8773/tcp,8774/tcp,9696/tcp 10.14.0.49 Unit is ready
landscape-client/8 unknown idle 1.25.6 10.14.0.49
nrpe/7 unknown idle 1.25.6 10.14.0.49
nova-compute/0 error idle 1.25.6 2 node02.maas hook failed: "update-status"
ceilometer-agent/0 active idle 1.25.6 node02.maas Unit is ready
landscape-client/17 unknown idle 1.25.6 node02.maas
neutron-openvswitch/0 active idle 1.25.6 node02.maas Unit is ready
nrpe/11 unknown idle 1.25.6 node02.maas
nova-compute/1 error idle 1.25.6 1 node03.maas hook failed: "update-status"
ceilometer-agent/1 active idle 1.25.6 node03.maas Unit is ready
landscape-client/21 unknown idle 1.25.6 node03.maas
neutron-openvswitch/1 active idle 1.25.6 node03.maas Unit is ready
nrpe/15 unknown idle 1.25.6 node03.maas
ntpmaster/0 unknown idle 1.25.6 2/lxc/1 123/udp 10.14.0.57
landscape-client/13 unknown idle 1.25.6 10.14.0.57
openstack-dashboard/0 active idle 1.25.6 1/lxc/1 80/tcp,443/tcp 10.14.0.63 Unit is ready
landscape-client/24 unknown idle 1.25.6 10.14.0.63
nrpe/18 unknown idle 1.25.6 10.14.0.63
rabbitmq-server/0 error idle 1.25.6 0/lxc/3 5672/tcp 10.14.0.52 hook failed: "update-status"
landscape-client/3 unknown idle 1.25.6 10.14.0.52
nrpe/2 unknown idle 1.25.6 10.14.0.52
[Machines]
ID STATE VERSION DNS INS-ID SERIES HARDWARE
0 started 1.25.6 node01.maas /MAAS/api/1.0/nodes/node-be8673ca-1d31-11e6-a83b-0015c5efa6ff/ trusty arch=amd64 cpu-cores=8 mem=32768M
1 started 1.25.6 node03.maas /MAAS/api/1.0/nodes/node-b672c22e-1d31-11e6-82b6-0015c5efa6ff/ trusty arch=amd64 cpu-cores=8 mem=32768M
2 started 1.25.6 node02.maas /MAAS/api/1.0/nodes/node-ba12aac0-1d31-11e6-89e9-0015c5efa6ff/ trusty arch=amd64 cpu-cores=8 mem=32768M
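Answer 1
If the OpenStack cluster has gone down and multiple charm units are showing an error or blocked state, work through the following steps to bring the cluster back up.
- Make sure the nodes can reach each other and the internet.
- Network connectivity is required because hooks are rerun when units are resolved. Most charm hooks run commands such as apt-get update, which need internet access.
- If juju commands hang, reboot the juju controller/bootstrap node, or restart the juju-* services on that node.
- If you get any "agent lost" errors, restart the jujud-unit-charm-name-unit service inside the affected nodes/containers.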
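The connectivity check in the first step can be sketched in Python. This is only an illustration: the node names come from the status output in the question, while port 22 (SSH) and `archive.ubuntu.com` on port 80 are assumptions about what the nodes and apt need to reach, so adjust both to your environment.

```python
import socket


def reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


# Hypothetical targets: the three MAAS nodes over SSH, plus an Ubuntu
# archive mirror that apt-get update would contact (assumed, not confirmed).
CHECKS = [
    ("node01.maas", 22),
    ("node02.maas", 22),
    ("node03.maas", 22),
    ("archive.ubuntu.com", 80),
]


def report(checks=CHECKS) -> dict:
    """Map "host:port" to True/False for every target in checks."""
    return {f"{host}:{port}": reachable(host, port) for host, port in checks}
```

Running `print(report())` from each node gives a quick view of which targets that node cannot reach before any hooks are retried.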
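Resolve the charm units that are in an error state.
$ juju resolved --retry charm-name/unit
This reruns the hook that originally failed. Resolve the charm units in the following order: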
- mysql
- keystone
- rabbitmq-server
- ceph
- swift
- nova-cloud-controller
- cinder
- glance
- neutron-api
- neutron-gateway
- nova-compute
- openstack-dashboard
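As a rough sketch, the ordering above can be applied to a list of failed units in Python so the retry commands come out in the right sequence. The prefix matching (treating ceph-mon, ceph-osd and ceph-radosgw as "ceph") and putting services that are not in the list last are assumptions of this sketch, not part of the procedure itself.

```python
# Resolution order from the list above. Service names are written as they
# appear in `juju status`; adjust them if your deployment names differ.
RESOLVE_ORDER = [
    "mysql", "keystone", "rabbitmq-server", "ceph", "swift",
    "nova-cloud-controller", "cinder", "glance", "neutron-api",
    "neutron-gateway", "nova-compute", "openstack-dashboard",
]


def order_index(unit: str) -> int:
    """Position of a unit such as "cinder/0" in the resolution order."""
    service = unit.split("/", 1)[0]
    for index, name in enumerate(RESOLVE_ORDER):
        # Assumption: "ceph" also covers ceph-mon, ceph-osd, ceph-radosgw.
        if service == name or service.startswith(name + "-"):
            return index
    # Assumption: services missing from the list (ntp, nrpe, ...) go last.
    return len(RESOLVE_ORDER)


def resolve_commands(units):
    """Return one `juju resolved --retry` argv list per unit, in order."""
    ordered = sorted(units, key=order_index)  # stable sort keeps ties as given
    return [["juju", "resolved", "--retry", unit] for unit in ordered]


if __name__ == "__main__":
    # Units shown in error in the status output from the question.
    failed = ["base-machine/0", "ntp/1", "nrpe/9", "cinder/0",
              "neutron-gateway/0", "nova-compute/0", "nova-compute/1",
              "rabbitmq-server/0"]
    for command in resolve_commands(failed):
        print(" ".join(command))
```

The script only prints the commands. It is probably safer to run them one at a time and check `juju status` after each, since a later unit may recover on its own once the service it depends on is healthy.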
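If resolving a unit does not help, check the juju logs to find out what the error is, then try to fix it manually. Make sure all units end up in the active state.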
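To see which units still need attention, the tabular status output can be filtered with a small script. The following is a minimal sketch that assumes the row layout shown in the question (unit name first, workload state second, and a `hook failed: "..."` message where one exists); other Juju versions may format the table differently.

```python
import re

# A unit row starts with "<service>/<number>" followed by its workload state.
UNIT_ROW = re.compile(r"^\s*([a-z0-9-]+/\d+)\s+(\S+)")
HOOK_FAILED = re.compile(r'hook failed: "([^"]+)"')


def units_needing_attention(status_text, bad_states=("error", "blocked")):
    """Return (unit, state, failed_hook_or_None) for units in a bad state."""
    problems = []
    for line in status_text.splitlines():
        row = UNIT_ROW.match(line)
        if not row or row.group(2) not in bad_states:
            continue
        hook = HOOK_FAILED.search(line)
        problems.append((row.group(1), row.group(2),
                         hook.group(1) if hook else None))
    return problems


if __name__ == "__main__":
    # Three rows copied from the status output in the question.
    sample = "\n".join([
        'cinder/0 error idle 1.25.6 1/lxc/2 10.14.0.64 hook failed: "update-status"',
        "glance/0 active idle 1.25.6 0/lxc/5 9292/tcp 10.14.0.54 Unit is ready",
        'nrpe/9 error idle 1.25.6 node03.maas hook failed: "config-changed"',
    ])
    for unit, state, hook in units_needing_attention(sample):
        print(unit, state, hook)
```

An empty result means no unit is in an error or blocked state. Units that report unknown are left out on purpose, because several charms in the status above never set a workload status.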
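Verify that the cluster is back up
- Log in to Horizon and check that all services are active.
- Start all the OpenStack instances and make sure volumes and networks are provisioned correctly.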