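This question follows the one I just asked about the OpenStack metadata server. It concerns a different setup, on different machines, and the two are apparently not connected to each other.
I have set up OpenStack on 8 nodes (gn008..gn015), where all nodes are compute (libvirt/kvm), network (linuxbridge), and storage (lvm) nodes; gn011 additionally runs all the OpenStack management services. I occasionally run into problems when the /var/log partition fills up, especially on gn011, but deleting large log files and restarting the affected daemons fixes them.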
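To see what is filling the partition before deleting anything, something along these lines can help. It is only a sketch using the Python standard library; the function names `usage_percent` and `largest_files` are made up for illustration.

```python
import os
import shutil


def usage_percent(root):
    """Return how full the filesystem holding `root` is, in percent."""
    usage = shutil.disk_usage(root)
    return 100.0 * usage.used / usage.total


def largest_files(root, limit=10):
    """Return up to `limit` (size_in_bytes, path) pairs, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are not followed
                sizes.append((os.lstat(path).st_size, path))
            except OSError:
                # the file vanished or is unreadable; skip it
                continue
    sizes.sort(reverse=True)
    return sizes[:limit]
```

On gn011 this would be called as `usage_percent("/var/log")` and `largest_files("/var/log")`.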
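Now the volume service part of OpenStack fails when creating any new volume, even a blank one. To rule out a lack of storage space, I deleted some VMs and their associated volumes, but the volume deletion failed as well. I now find volumes attached to VMs that do not exist (see the block1 volume in the trace below):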
[root@gn011 ~]# openstack volume list
+--------------------------------------+---------------------+-----------+------+---------------------------------------------------------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+---------------------+-----------+------+---------------------------------------------------------------+
| 217d9087-3175-4565-91f9-dcca2e1be383 | cpu1_instances | in-use | 50 | Attached to cpu1 on /dev/vdb |
| dde062b6-0fc6-4e76-b936-1d2cfed14af4 | cpu1 | in-use | 16 | Attached to cpu1 on /dev/vda |
| c7d11144-278e-4785-9024-a685a0406215 | block1_volumes | available | 50 | |
| 67d7882d-903a-4e2a-b386-5d0af65b6c65 | block1 | in-use | 16 | Attached to c0866654-fcc5-48f1-a446-1c33e518a10e on /dev/vda |
| 680c085c-2959-45ee-85bf-b249b8f0a6bd | block0_volumes | in-use | 50 | Attached to 399aa8dd-9aea-4059-bed7-eb66209813f9 on /dev/vdb |
| 9a272ff6-c443-49d6-befd-730d1635d6eb | block0 | in-use | 16 | Attached to 399aa8dd-9aea-4059-bed7-eb66209813f9 on /dev/vda |
| 1e716bcc-a381-4da6-80c6-44572437f610 | designate01 | in-use | 16 | Attached to 5ca79be8-4fe1-43fe-a04a-aed4dfcf2158 on /dev/vda |
| 53adcf54-543c-4d37-95c4-37696e76b747 | cinder01_conversion | in-use | 20 | Attached to d2bd428b-dbc7-45a1-bf6c-6f82d05c3a89 on /dev/vdb |
| 356d474f-13a5-42c1-8ee7-0411e5671f98 | cinder01 | in-use | 16 | Attached to d2bd428b-dbc7-45a1-bf6c-6f82d05c3a89 on /dev/vda |
| 4f13a989-ee46-48cf-99ef-19922f8f9564 | horizon01 | in-use | 16 | Attached to f2ba2db1-c0aa-4256-a2c7-a0d9047cd374 on /dev/vda |
| b9c3b5b6-d242-408f-b939-503096ab51c7 | neutron01 | in-use | 16 | Attached to a6d4e0d3-ac6e-40bd-ac5d-5e2347b0b9e4 on /dev/vda |
| a212296b-6380-4f22-ad01-749d28dec198 | nova01 | in-use | 16 | Attached to e76a0843-f093-49d6-80a1-a44cd55335be on /dev/vda |
| 4c6e239c-67f7-43a7-bdb8-178fbb639159 | placement01 | in-use | 16 | Attached to 044290a6-ffc4-48d0-a709-0628e4a1fa57 on /dev/vda |
| 50f6ca48-2d79-40ac-8a37-b7719eadbce1 | glance01_images | in-use | 100 | Attached to c7fb6b3b-f13a-45f9-95c2-248233d7a982 on /dev/vdb |
| e88d302e-c06c-4751-b117-8a3c44ced804 | glance01 | in-use | 16 | Attached to c7fb6b3b-f13a-45f9-95c2-248233d7a982 on /dev/vda |
| 50621c38-6c38-496b-89df-c6fcbf466186 | keystone01 | in-use | 16 | Attached to df51164e-15ff-460c-bfcd-fb91bc569f47 on /dev/vda |
| eb48421c-121e-4cf0-9833-c617cc7fadec | rabbitmq01 | in-use | 16 | Attached to e64f26d1-6405-4f0c-9427-b5ae1656f30f on /dev/vda |
| 844cacae-ec0c-42b4-9229-423b48fd9eb6 | memcached01 | in-use | 16 | Attached to edabcee3-69f3-4a60-861e-b4f961501102 on /dev/vda |
| d54a0aff-efea-43fe-9ce8-7eba4597b7ea | openstack_base | available | 16 | |
+--------------------------------------+---------------------+-----------+------+---------------------------------------------------------------+
[root@gn011 ~]# openstack server show c0866654-fcc5-48f1-a446-1c33e518a10e
No server with a name or ID of 'c0866654-fcc5-48f1-a446-1c33e518a10e' exists.
[root@gn011 ~]#
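Checking every attachment by hand is tedious, so here is a minimal sketch that pulls the attachment targets out of the table text shown above. It assumes the five-column layout of that listing; the function names are hypothetical. Targets that appear as a raw UUID are only candidates: each one still has to be checked with `openstack server show`.

```python
import re

UUID_RE = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")
ATTACH_RE = re.compile(r"Attached to (\S+) on (\S+)")


def parse_attachments(table_text):
    """Return (volume_name, target, device) tuples from the table rows."""
    rows = []
    for line in table_text.splitlines():
        if not line.startswith("|"):
            continue  # skip borders and shell prompts
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 5 or cells[0] == "ID":
            continue  # skip the header and anything unexpected
        for target, device in ATTACH_RE.findall(cells[4]):
            rows.append((cells[1], target, device))
    return rows


def uuid_targets(rows):
    """Return the sorted set of targets that are shown as a bare UUID."""
    return sorted({target for _name, target, _dev in rows if UUID_RE.match(target)})
```

Feeding it the saved output of `openstack volume list` gives a list of server IDs to look up one by one.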
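I did manage to set some of the volumes I was trying to delete to error and then delete them, but the deletion has now been stuck for days and the volumes are still there. I did notice that every cinder daemon was deadlocked because of some problem with greenlet or eventlet. I no longer have the exact trace, but it looked like this bug report. I had run into that bug before, so I simply set heartbeat_in_pthread = true in /etc/cinder/cinder.conf on every machine running the openstack-cinder-volume service (gn008-gn015) and restarted the service on all of them, but everything is still stuck.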
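One thing worth confirming is which section of cinder.conf the option ended up in, since an option placed in a section the service does not read has no effect. To my knowledge oslo.messaging reads `heartbeat_in_pthread` from `[oslo_messaging_rabbit]`, but treat that as an assumption to verify against the documentation for the installed release. The sketch below only reports where the option is set; `find_option` is a hypothetical helper.

```python
import configparser


def find_option(conf_text, name):
    """Return (section, value) pairs for every section that sets `name`."""
    parser = configparser.ConfigParser(
        strict=False,                  # tolerate repeated keys
        interpolation=None,            # do not treat '%' specially
        default_section="__unused__",  # report [DEFAULT] like any other section
    )
    parser.read_string(conf_text)
    return [
        (section, parser.get(section, name))
        for section in parser.sections()
        if parser.has_option(section, name)
    ]
```

It can be run on each node against the contents of /etc/cinder/cinder.conf, for example `find_option(open("/etc/cinder/cinder.conf").read(), "heartbeat_in_pthread")`.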
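Back when I could still delete volumes, I noticed that restarting rabbitmq-server helped a little, even if only for a few tens of seconds after the restart. That no longer helps either. Restarting httpd on gn011 takes a long time and does not help. I can no longer log in through the dashboard, but I found this line in /var/log/httpd/error_log: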
Timeout when reading response headers from daemon process 'dashboard': /usr/share/openstack-dashboard/openstack_dashboard/wsgi.py, referer: http://openstack.svc.lunarc/dashboard/auth/login/?next=/dashboard/project/
I tried to get more information by adding logging settings to /usr/share/openstack-dashboard/openstack_dashboard/local/local_settings.py:
LOGGING = {
    'version': 1,
    # When set to True this will disable all logging except
    # for loggers specified in this configuration dictionary. Note that
    # if nothing is specified here and disable_existing_loggers is True,
    # django.db.backends will still log unless it is disabled explicitly.
    'disable_existing_loggers': False,
    # If apache2 mod_wsgi is used to deploy OpenStack dashboard
    # timestamp is output by mod_wsgi. If WSGI framework you use does not
    # output timestamp for logging, add %(asctime)s in the following
    # format definitions.
    'formatters': {
        'console': {
            'format': '%(levelname)s %(name)s %(message)s'
        },
        'operation': {
            # The format of "%(message)s" is defined by
            # OPERATION_LOG_OPTIONS['format']
            'format': '%(message)s'
        },
        'verbose': {
            'format': '%(levelname)s %(asctime)s %(module)s %(process)d %(thread)d %(message)s'
        },
    },
    'handlers': {
        ...
        #'file': {
        #    'level': 'DEBUG' if DEBUG else 'INFO',
        #    'class': 'logging.FileHandler',
        #    'filename': '/var/log/httpd/dashboard.log',
        #    'formatter': 'console',
        #},
        'syslog': {
            'level': 'DEBUG' if DEBUG else 'INFO',
            'class': 'logging.handlers.SysLogHandler',
            'formatter': 'console',
            'facility': 'user',
        },
    },
    'loggers': {
        'horizon': {
            'handlers': ['syslog'],
            'level': 'DEBUG',
            'propagate': False,
        },
        'horizon.file_log': {
            'handlers': ['syslog'],
            'level': 'DEBUG',
            'propagate': False,
        },
        'openstack_dashboard': {
            'handlers': ['syslog'],
            'level': 'DEBUG',
            'propagate': False,
        },
        'novaclient': {
            'handlers': ['syslog'],
            'level': 'DEBUG',
            'propagate': False,
        },
        ... long long list, all set to debug and "handler" set to 'syslog' ...
}
and changed /etc/rsyslog.conf to:
...
user.* /var/log/user.log
...
But /var/log/user.log does not show anything useful:
Jan 4 18:50:01 gn011 httpd[2275299]: Server configured, listening on: port 5000, port 8778, port 80
Jan 5 10:51:00 gn011 httpd[2375720]: Server configured, listening on: port 5000, port 8778, port 80
Jan 5 11:22:08 gn011 httpd[2380573]: Server configured, listening on: port 5000, port 8778, port 80
Jan 5 15:32:42 gn011 httpd[2412748]: Server configured, listening on: port 5000, port 8778, port 80
Jan 5 16:28:21 gn011 httpd[2420940]: Server configured, listening on: port 5000, port 8778, port 80
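I also tried writing the log directly to a file in /tmp, but that always failed, reporting missing permissions.
At this point I would reboot gn011 (controller and compute node), but I am reluctant to do so before migrating or deleting all the VMs and volumes it hosts, which I cannot do because everything seems to be stuck. If rebooting gn011 does not help, I will reboot all nodes. That has worked in the past, although every volume that was in use was effectively lost, along with the VMs using it. I do not mind losing them here, but if (when?) I hit this in a production installation, I would really like to be able to fix it without losing VMs and data.
Can anyone give me some guidance on getting my setup back online?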
Operating system (same on all nodes):
[root@gn011 httpd]# cat /etc/redhat-release
Rocky Linux release 8.6 (Green Obsidian)
[root@gn011 httpd]#
OpenStack version (I do not remember which release this is):
[root@gn011 httpd]# rpm -qa | grep openstack
openstack-dashboard-20.1.2-1.el8.noarch
openstack-designate-api-13.0.0-1.el8.noarch
openstack-placement-common-6.0.0-1.el8.noarch
openstack-designate-producer-13.0.0-1.el8.noarch
openstack-nova-novncproxy-24.1.0-1.el8.noarch
openstack-designate-ui-13.0.0-2.el8.noarch
openstack-neutron-linuxbridge-19.3.0-1.el8.noarch
openstack-neutron-common-19.3.0-1.el8.noarch
openstack-neutron-19.3.0-1.el8.noarch
python-openstackclient-lang-5.6.0-1.el8.noarch
openstack-designate-sink-13.0.0-1.el8.noarch
openstack-dashboard-theme-20.1.2-1.el8.noarch
openstack-cinder-19.1.0-1.el8.noarch
openstack-designate-agent-13.0.0-1.el8.noarch
openstack-nova-common-24.1.0-1.el8.noarch
openstack-neutron-ml2-19.3.0-1.el8.noarch
openstack-keystone-20.0.0-2.el8.noarch
openstack-placement-api-6.0.0-1.el8.noarch
openstack-designate-mdns-13.0.0-1.el8.noarch
openstack-nova-conductor-24.1.0-1.el8.noarch
openstack-designate-worker-13.0.0-1.el8.noarch
openstack-nova-api-24.1.0-1.el8.noarch
openstack-glance-23.0.0-2.el8.noarch
openstack-nova-scheduler-24.1.0-1.el8.noarch
python3-openstacksdk-0.59.0-1.el8.noarch
python3-openstackclient-5.6.0-1.el8.noarch
openstack-selinux-0.8.27-1.el8.noarch
openstack-designate-common-13.0.0-1.el8.noarch
openstack-nova-compute-24.1.0-1.el8.noarch
openstack-designate-central-13.0.0-1.el8.noarch
[root@gn011 httpd]#
(Tagged centos for lack of a rocky tag.)