OpenStack storage unresponsive

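This problem is separate from the question I just asked about the OpenStack metadata server: it is a different setup, on different machines, and the two installations are not connected to each other.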

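I set up OpenStack on 8 nodes (gn008..gn015), where every node is a compute (libvirt/kvm), network (linuxbridge) and storage (lvm) node; gn011 additionally runs all the OpenStack management services. I occasionally ran into trouble when the /var/log partition filled up, especially on gn011, but deleting the large log files and restarting the affected daemons fixed it.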

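Now the volume service of OpenStack fails when creating any new volume, even a blank one. To rule out a lack of storage space, I deleted some VMs and their associated volumes, but the volume deletions failed too. I now see volumes attached to VMs that no longer exist (see the volume block1 in the trace below):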

[root@gn011 ~]# openstack volume list
+--------------------------------------+---------------------+-----------+------+---------------------------------------------------------------+
| ID                                   | Name                | Status    | Size | Attached to                                                   |
+--------------------------------------+---------------------+-----------+------+---------------------------------------------------------------+
| 217d9087-3175-4565-91f9-dcca2e1be383 | cpu1_instances      | in-use    |   50 | Attached to cpu1 on /dev/vdb                                  |
| dde062b6-0fc6-4e76-b936-1d2cfed14af4 | cpu1                | in-use    |   16 | Attached to cpu1 on /dev/vda                                  |
| c7d11144-278e-4785-9024-a685a0406215 | block1_volumes      | available |   50 |                                                               |
| 67d7882d-903a-4e2a-b386-5d0af65b6c65 | block1              | in-use    |   16 | Attached to c0866654-fcc5-48f1-a446-1c33e518a10e on /dev/vda  |
| 680c085c-2959-45ee-85bf-b249b8f0a6bd | block0_volumes      | in-use    |   50 | Attached to 399aa8dd-9aea-4059-bed7-eb66209813f9 on /dev/vdb  |
| 9a272ff6-c443-49d6-befd-730d1635d6eb | block0              | in-use    |   16 | Attached to 399aa8dd-9aea-4059-bed7-eb66209813f9 on /dev/vda  |
| 1e716bcc-a381-4da6-80c6-44572437f610 | designate01         | in-use    |   16 | Attached to 5ca79be8-4fe1-43fe-a04a-aed4dfcf2158 on /dev/vda  |
| 53adcf54-543c-4d37-95c4-37696e76b747 | cinder01_conversion | in-use    |   20 | Attached to d2bd428b-dbc7-45a1-bf6c-6f82d05c3a89 on /dev/vdb  |
| 356d474f-13a5-42c1-8ee7-0411e5671f98 | cinder01            | in-use    |   16 | Attached to d2bd428b-dbc7-45a1-bf6c-6f82d05c3a89 on /dev/vda  |
| 4f13a989-ee46-48cf-99ef-19922f8f9564 | horizon01           | in-use    |   16 | Attached to f2ba2db1-c0aa-4256-a2c7-a0d9047cd374 on /dev/vda  |
| b9c3b5b6-d242-408f-b939-503096ab51c7 | neutron01           | in-use    |   16 | Attached to a6d4e0d3-ac6e-40bd-ac5d-5e2347b0b9e4 on /dev/vda  |
| a212296b-6380-4f22-ad01-749d28dec198 | nova01              | in-use    |   16 | Attached to e76a0843-f093-49d6-80a1-a44cd55335be on /dev/vda  |
| 4c6e239c-67f7-43a7-bdb8-178fbb639159 | placement01         | in-use    |   16 | Attached to 044290a6-ffc4-48d0-a709-0628e4a1fa57 on /dev/vda  |
| 50f6ca48-2d79-40ac-8a37-b7719eadbce1 | glance01_images     | in-use    |  100 | Attached to c7fb6b3b-f13a-45f9-95c2-248233d7a982 on /dev/vdb  |
| e88d302e-c06c-4751-b117-8a3c44ced804 | glance01            | in-use    |   16 | Attached to c7fb6b3b-f13a-45f9-95c2-248233d7a982 on /dev/vda  |
| 50621c38-6c38-496b-89df-c6fcbf466186 | keystone01          | in-use    |   16 | Attached to df51164e-15ff-460c-bfcd-fb91bc569f47 on /dev/vda  |
| eb48421c-121e-4cf0-9833-c617cc7fadec | rabbitmq01          | in-use    |   16 | Attached to e64f26d1-6405-4f0c-9427-b5ae1656f30f on /dev/vda  |
| 844cacae-ec0c-42b4-9229-423b48fd9eb6 | memcached01         | in-use    |   16 | Attached to edabcee3-69f3-4a60-861e-b4f961501102 on /dev/vda  |
| d54a0aff-efea-43fe-9ce8-7eba4597b7ea | openstack_base      | available |   16 |                                                               |
+--------------------------------------+---------------------+-----------+------+---------------------------------------------------------------+
[root@gn011 ~]# openstack server show c0866654-fcc5-48f1-a446-1c33e518a10e
No server with a name or ID of 'c0866654-fcc5-48f1-a446-1c33e518a10e' exists.
[root@gn011 ~]# 
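
The manual check above (look up each "Attached to" server ID and see whether it still exists) could be scripted. The following is only a sketch: it assumes the lists come from `openstack volume list -f json` and `openstack server list --all-projects -f json`, and that the JSON "Attached to" field is a list of objects carrying a `server_id` key. The function name and the sample data are hypothetical.

```python
import json


def orphaned_attachments(volumes, existing_server_ids):
    """Return (volume name, server ID) pairs whose server no longer exists.

    `volumes` is assumed to have the shape of `openstack volume list -f json`:
    a list of dicts with "Name" and "Attached to" (a list of attachment dicts).
    """
    orphans = []
    for vol in volumes:
        for attachment in vol.get("Attached to") or []:
            server_id = attachment.get("server_id")
            if server_id and server_id not in existing_server_ids:
                orphans.append((vol["Name"], server_id))
    return orphans


# Sample data in the assumed JSON shape, using two volumes from the trace.
sample_volumes = json.loads("""
[
  {"Name": "block1", "Status": "in-use",
   "Attached to": [{"server_id": "c0866654-fcc5-48f1-a446-1c33e518a10e",
                    "device": "/dev/vda"}]},
  {"Name": "keystone01", "Status": "in-use",
   "Attached to": [{"server_id": "df51164e-15ff-460c-bfcd-fb91bc569f47",
                    "device": "/dev/vda"}]},
  {"Name": "openstack_base", "Status": "available", "Attached to": []}
]
""")

# For illustration only: which servers really exist is not known from the post.
sample_existing = {"df51164e-15ff-460c-bfcd-fb91bc569f47"}

for name, server_id in orphaned_attachments(sample_volumes, sample_existing):
    print(f"{name} is attached to missing server {server_id}")
```

On a live system the two inputs would be read from the CLI output instead of the sample strings. The sketch only reports the dangling attachments and does not try to repair them.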

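I did manage to set some of the volumes I was trying to delete to error and then delete them, but the remaining deletions have now been stuck for days and the volumes are still there. I also noticed that every cinder daemon was deadlocking because of some problem with greenlet or eventlet. I no longer have the exact trace, but it looked like this bug report. I used to hit that error a lot, so I set heartbeat_in_pthread = true in /etc/cinder/cinder.conf on every machine running the openstack-cinder-volume service (gn008-gn015) and restarted the service on all of them, but everything is still stuck.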
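
One thing that may be worth verifying on each node is where the option landed in the file: oslo.messaging reads `heartbeat_in_pthread` from the `[oslo_messaging_rabbit]` section, so a copy placed under `[DEFAULT]` would likely be ignored. Below is a minimal sketch of such a check; the helper name is hypothetical, and it uses Python's `configparser` as an approximation of how oslo.config parses the file.

```python
import configparser


def heartbeat_in_pthread_enabled(conf_text):
    """Report whether heartbeat_in_pthread is true in [oslo_messaging_rabbit].

    default_section is set to an unused name so that values under [DEFAULT]
    are not inherited by every section, which configparser would otherwise do.
    """
    parser = configparser.ConfigParser(
        interpolation=None,        # cinder.conf values may contain '%'
        strict=False,              # tolerate duplicated sections or options
        default_section="__unused__",
    )
    parser.read_string(conf_text)
    return parser.getboolean(
        "oslo_messaging_rabbit", "heartbeat_in_pthread", fallback=False
    )


good = "[oslo_messaging_rabbit]\nheartbeat_in_pthread = true\n"
misplaced = "[DEFAULT]\nheartbeat_in_pthread = true\n"

print(heartbeat_in_pthread_enabled(good))
print(heartbeat_in_pthread_enabled(misplaced))
```

To run it against a real node, the text would be read from /etc/cinder/cinder.conf on each of gn008-gn015.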

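While I was still able to delete volumes, I noticed that restarting rabbitmq-server helped a little, even if only for a few dozen seconds after the restart. That no longer helps either. Restarting httpd on gn011 takes a long time and does not help. I can no longer log in through the dashboard, but I found this line in /var/log/httpd/error_log: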

Timeout when reading response headers from daemon process 'dashboard': /usr/share/openstack-dashboard/openstack_dashboard/wsgi.py, referer: http://openstack.svc.lunarc/dashboard/auth/login/?next=/dashboard/project/

I tried to get more information by adding logging settings to /usr/share/openstack-dashboard/openstack_dashboard/local/local_settings.py:

LOGGING = {
    'version': 1,
    # When set to True this will disable all logging except
    # for loggers specified in this configuration dictionary. Note that
    # if nothing is specified here and disable_existing_loggers is True,
    # django.db.backends will still log unless it is disabled explicitly.
    'disable_existing_loggers': False,
    # If apache2 mod_wsgi is used to deploy OpenStack dashboard
    # timestamp is output by mod_wsgi. If WSGI framework you use does not
    # output timestamp for logging, add %(asctime)s in the following
    # format definitions.
    'formatters': {
        'console': {
            'format': '%(levelname)s %(name)s %(message)s'
        },
        'operation': {
            # The format of "%(message)s" is defined by
            # OPERATION_LOG_OPTIONS['format']
            'format': '%(message)s'
        },
        'verbose': {
            'format': '%(levelname)s %(asctime)s %(module)s %(process)d %(thread)d %(message)s'
        },
    },
    'handlers': {
...
        #'file': {
        #    'level': 'DEBUG' if DEBUG else 'INFO',
        #    'class': 'logging.FileHandler',
        #    'filename': '/var/log/httpd/dashboard.log',
        #    'formatter': 'console',
        #},
        'syslog': {
            'level': 'DEBUG' if DEBUG else 'INFO',
            'class': 'logging.handlers.SysLogHandler',
            'formatter': 'console',
            'facility': 'user',
        },
    },
    'loggers': {
        'horizon': {
            'handlers': ['syslog'],
            'level': 'DEBUG',
            'propagate': False,
        },
        'horizon.file_log': {
            'handlers': ['syslog'],
            'level': 'DEBUG',
            'propagate': False,
        },
        'openstack_dashboard': {
            'handlers': ['syslog'],
            'level': 'DEBUG',
            'propagate': False,
        },
        'novaclient': {
            'handlers': ['syslog'],
            'level': 'DEBUG',
            'propagate': False,
        },
... long long list, all set to debug and "handler" set to 'syslog' ...
}

and changed /etc/rsyslog.conf to:

...
user.*                          /var/log/user.log
...

/var/log/user.log does not show anything useful:

Jan  4 18:50:01 gn011 httpd[2275299]: Server configured, listening on: port 5000, port 8778, port 80
Jan  5 10:51:00 gn011 httpd[2375720]: Server configured, listening on: port 5000, port 8778, port 80
Jan  5 11:22:08 gn011 httpd[2380573]: Server configured, listening on: port 5000, port 8778, port 80
Jan  5 15:32:42 gn011 httpd[2412748]: Server configured, listening on: port 5000, port 8778, port 80
Jan  5 16:28:21 gn011 httpd[2420940]: Server configured, listening on: port 5000, port 8778, port 80
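
A possible explanation for the missing Horizon messages, offered as an assumption rather than something confirmed on this setup: Python's `logging.handlers.SysLogHandler` sends to UDP `localhost:514` unless an `address` is given, so rsyslog only receives those messages if its UDP input is enabled. A handler entry that points at the local syslog socket could look like the sketch below; the helper name is hypothetical and `/dev/log` is the usual socket path on Linux.

```python
def syslog_handler_config(level="DEBUG", address="/dev/log"):
    """Build a handler entry for the LOGGING dict in local_settings.py.

    dictConfig passes the extra keys ('address', 'facility') to the
    SysLogHandler constructor as keyword arguments.
    """
    return {
        "level": level,
        "class": "logging.handlers.SysLogHandler",
        "formatter": "console",   # formatter name taken from the existing config
        "facility": "user",       # matches the user.* rule in rsyslog.conf
        "address": address,       # local socket instead of UDP localhost:514
    }


handler = syslog_handler_config()
print(handler["address"])
```

In local_settings.py this would replace the existing entry, for example `LOGGING['handlers']['syslog'] = syslog_handler_config()`.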

I also tried writing the log directly to a file in /tmp, but that always fails with a missing-permissions error.

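At this point I would reboot gn011 (controller and compute node), but I am reluctant to do so before migrating or deleting all the VMs and volumes it hosts, which I cannot do because everything seems to be stuck. If that still does not work, I will reboot all the nodes. That has worked in the past, although every in-use volume was effectively lost, along with the VMs using them. I do not mind losing them here, but if I run into this on a production installation (when?), I really want to be able to fix it without losing VMs and data.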

Can anyone give me some guidance on how to bring my setup back online?

Operating system (same on all nodes):

[root@gn011 httpd]# cat /etc/redhat-release 
Rocky Linux release 8.6 (Green Obsidian)
[root@gn011 httpd]# 

OpenStack version (I do not remember which release this is):

[root@gn011 httpd]# rpm -qa | grep openstack
openstack-dashboard-20.1.2-1.el8.noarch
openstack-designate-api-13.0.0-1.el8.noarch
openstack-placement-common-6.0.0-1.el8.noarch
openstack-designate-producer-13.0.0-1.el8.noarch
openstack-nova-novncproxy-24.1.0-1.el8.noarch
openstack-designate-ui-13.0.0-2.el8.noarch
openstack-neutron-linuxbridge-19.3.0-1.el8.noarch
openstack-neutron-common-19.3.0-1.el8.noarch
openstack-neutron-19.3.0-1.el8.noarch
python-openstackclient-lang-5.6.0-1.el8.noarch
openstack-designate-sink-13.0.0-1.el8.noarch
openstack-dashboard-theme-20.1.2-1.el8.noarch
openstack-cinder-19.1.0-1.el8.noarch
openstack-designate-agent-13.0.0-1.el8.noarch
openstack-nova-common-24.1.0-1.el8.noarch
openstack-neutron-ml2-19.3.0-1.el8.noarch
openstack-keystone-20.0.0-2.el8.noarch
openstack-placement-api-6.0.0-1.el8.noarch
openstack-designate-mdns-13.0.0-1.el8.noarch
openstack-nova-conductor-24.1.0-1.el8.noarch
openstack-designate-worker-13.0.0-1.el8.noarch
openstack-nova-api-24.1.0-1.el8.noarch
openstack-glance-23.0.0-2.el8.noarch
openstack-nova-scheduler-24.1.0-1.el8.noarch
python3-openstacksdk-0.59.0-1.el8.noarch
python3-openstackclient-5.6.0-1.el8.noarch
openstack-selinux-0.8.27-1.el8.noarch
openstack-designate-common-13.0.0-1.el8.noarch
openstack-nova-compute-24.1.0-1.el8.noarch
openstack-designate-central-13.0.0-1.el8.noarch
[root@gn011 httpd]# 
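
The release name can probably be inferred from the major version numbers in the package list. The sketch below parses `rpm -qa` lines and compares them against a small table; the table is an assumption written from memory of the upstream release series (these major versions correspond to Xena as far as I know) and should be checked against releases.openstack.org. All names in the code are hypothetical.

```python
import re

# Assumed major versions for the Xena series; verify before relying on them.
XENA_MAJORS = {"nova": 24, "cinder": 19, "neutron": 19, "keystone": 20, "glance": 23}

# name-version-release.arch, where neither version nor release contains '-'
RPM_RE = re.compile(r"^(?P<name>.+)-(?P<version>[^-]+)-(?P<release>[^-]+)$")


def project_majors(rpm_lines):
    """Map project name (second component of openstack-* packages) to major version."""
    majors = {}
    for line in rpm_lines:
        match = RPM_RE.match(line.strip())
        if not match or not match.group("name").startswith("openstack-"):
            continue
        project = match.group("name").split("-")[1]
        majors[project] = int(match.group("version").split(".")[0])
    return majors


def matches_xena(rpm_lines):
    """For each project in the table that is installed, say whether it matches."""
    majors = project_majors(rpm_lines)
    return {p: majors[p] == v for p, v in XENA_MAJORS.items() if p in majors}


sample_lines = [
    "openstack-cinder-19.1.0-1.el8.noarch",
    "openstack-nova-api-24.1.0-1.el8.noarch",
    "python3-openstackclient-5.6.0-1.el8.noarch",
]
print(matches_xena(sample_lines))
```

Feeding it the full `rpm -qa | grep openstack` output from above would show whether all core services agree on one release.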

(Tagged centos because there is no rocky tag.)
