OpenStack: Unable to launch instances after enabling GPU passthrough

I installed OpenStack with devstack in a single-node configuration on Ubuntu 22.04 (Jammy) LTS. I followed this tutorial to set up GPU passthrough on my OpenStack: https://superuser.openinfra.dev/articles/a-comprehensive-guide-to-configuring-gpu-passthrough-in-openstack-for-high-performance-computing/

I currently have a GTX 1630 (I know it isn't suited for HPC, but this is a test setup that I plan to build on later). VT-d is enabled in my BIOS.

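My GRUB and initramfs configuration is exactly the same as in the article; the only change is that I put the correct vendor ID and product ID in the configuration files. Here are some details about the hardware and drivers: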

$ sudo lspci -nn | grep NVIDIA
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1f83] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10fa] (rev a1)
$ sudo lspci -s 01:00.0 -k
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1f83 (rev a1)
        Subsystem: NVIDIA Corporation Device 169c
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau
$ sudo lspci -s 01:00.1 -k
01:00.1 Audio device: NVIDIA Corporation Device 10fa (rev a1)
        Subsystem: NVIDIA Corporation Device 169c
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel
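The vendor and product IDs in the nova [pci] section have to match what `lspci -nn` reports. A minimal Python sketch for cross-checking them, assuming the `lspci -nn` output format shown above; `pci_ids` is a hypothetical helper, not part of nova or pciutils:

```python
import re

# `lspci -nn` prints the vendor:product pair in brackets, e.g. "[10de:1f83]".
# The class code ("[0300]") is also bracketed but has no colon, so it does not match.
ID_PAIR = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]")


def pci_ids(lspci_nn_output: str) -> dict[str, tuple[str, str]]:
    """Map each PCI address (e.g. '01:00.0') to its (vendor_id, product_id)."""
    ids = {}
    for line in lspci_nn_output.splitlines():
        match = ID_PAIR.search(line)
        if match:
            address = line.split(maxsplit=1)[0]
            ids[address] = (match.group(1), match.group(2))
    return ids


# The two lines reported by `lspci -nn | grep NVIDIA` above.
sample = """\
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1f83] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10fa] (rev a1)
"""
print(pci_ids(sample))
```

For this output it prints `{'01:00.0': ('10de', '1f83'), '01:00.1': ('10de', '10fa')}`: the VGA function carries the `10de:1f83` pair used in the nova config, while the audio function at 01:00.1 has its own product ID, `10fa`.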

Here is what I added to my nova configuration:

...
[pci]
device_spec = { "vendor_id":"10de", "product_id":"1f83" }
alias: { "vendor_id":"10de", "product_id":"1f83", "device_type":"type-PCI", "name":"geforce-gtx-1630" }

[filter_scheduler]
enabled_filters = PciPassthroughFilter
available_filters = nova.scheduler.filters.all_filters
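One quick sanity check is that the `device_spec` and `alias` values are valid JSON and describe the same device. A minimal sketch, assuming a single `device_spec` line and a single `alias` line. It uses Python's standard `configparser`, not nova's own oslo.config loader; nova allows these options to be repeated, which strict `configparser` would reject as duplicates. `check_pci_section` is a hypothetical helper. It only inspects the text, so it cannot tell which configuration file the running nova services actually loaded.

```python
import configparser
import json

# The [pci] lines from the nova configuration above, copied verbatim.
NOVA_PCI_EXCERPT = """\
[pci]
device_spec = { "vendor_id":"10de", "product_id":"1f83" }
alias: { "vendor_id":"10de", "product_id":"1f83", "device_type":"type-PCI", "name":"geforce-gtx-1630" }
"""


def check_pci_section(text: str) -> list[str]:
    """Return problems found in a single-entry [pci] section (empty list if none)."""
    # Python's configparser accepts both "=" and ":" as key/value delimiters.
    # interpolation=None so a stray "%" in a value is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    try:
        spec = json.loads(parser["pci"]["device_spec"])
        alias = json.loads(parser["pci"]["alias"])
    except (KeyError, json.JSONDecodeError) as exc:
        return [f"cannot read [pci] device_spec/alias: {exc!r}"]

    problems = []
    # The alias should describe the same device that device_spec exposes.
    for key in ("vendor_id", "product_id"):
        if spec.get(key) != alias.get(key):
            problems.append(
                f"{key} differs: device_spec={spec.get(key)!r}, alias={alias.get(key)!r}"
            )
    if not alias.get("name"):
        problems.append("alias has no 'name', so a flavor cannot refer to it")
    return problems


print(check_pci_section(NOVA_PCI_EXCERPT))
```

For the excerpt above this prints an empty list, meaning the two entries agree on vendor and product ID.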

Here are the image and the flavor I am using to try to launch the instance:

$ openstack image show bc01668d-6716-4c19-8b20-f9e60f98a4dc
+------------------+-------------------------------------------------------------------------------------------------------+
| Field            | Value                                                                                                 |
+------------------+-------------------------------------------------------------------------------------------------------+
| checksum         | fd981e3a7528b5911631886a03fa5693                                                                      |
| container_format | bare                                                                                                  |
| created_at       | 2023-11-20T07:06:27Z                                                                                  |
| disk_format      | qcow2                                                                                                 |
| file             | /v2/images/bc01668d-6716-4c19-8b20-f9e60f98a4dc/file                                                  |
| id               | bc01668d-6716-4c19-8b20-f9e60f98a4dc                                                                  |
| min_disk         | 0                                                                                                     |
| min_ram          | 0                                                                                                     |
| name             | Ubuntu 20.04 LTS (Focal Fossa)                                                                        |
| owner            | 6209bfb566b749fe943f27521b7519ea                                                                      |
| properties       | img_hide_hypervisor_id='true', os_hash_algo='sha512', os_hash_value='48059a837a24997117c48456c985d9b0 |
|                  | d9c4fb89b2a0b81d6e9e9589f02216d925a8b3d18848acb89e0fb7cbccacc1fbb08d95115d444cca5cbb093cfa37e830',    |
|                  | os_hidden='False'                                                                                     |
| protected        | False                                                                                                 |
| schema           | /v2/schemas/image                                                                                     |
| size             | 620167168                                                                                             |
| status           | active                                                                                                |
| tags             |                                                                                                       |
| updated_at       | 2023-11-20T07:07:07Z                                                                                  |
| virtual_size     | 2361393152                                                                                            |
| visibility       | private                                                                                               |
+------------------+-------------------------------------------------------------------------------------------------------+
$ openstack flavor show gpu_flavor
+----------------------------+--------------------------------------------+
| Field                      | Value                                      |
+----------------------------+--------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                      |
| OS-FLV-EXT-DATA:ephemeral  | 0                                          |
| access_project_ids         | None                                       |
| description                | None                                       |
| disk                       | 25                                         |
| id                         | 025649e0-3904-4322-b43a-d5d55c780198       |
| name                       | gpu_flavor                                 |
| os-flavor-access:is_public | True                                       |
| properties                 | pci_passthrough:alias='geforce-gtx-1630:1' |
| ram                        | 4096                                       |
| rxtx_factor                | 1.0                                        |
| swap                       | 0                                          |
| vcpus                      | 2                                          |
+----------------------------+--------------------------------------------+
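The flavor's `pci_passthrough:alias` property refers to the alias by its `name` and asks for a device count. A minimal sketch for checking that every requested alias is actually defined. It assumes the `<alias name>:<count>` form with comma-separated entries, and `parse_pci_alias_property` is a hypothetical helper, not nova's parser:

```python
def parse_pci_alias_property(value: str) -> dict[str, int]:
    """Parse a flavor's pci_passthrough:alias value into {alias name: count}.

    Assumes the '<alias name>:<count>' form, with multiple entries separated by commas.
    """
    requests = {}
    for entry in value.split(","):
        # rpartition splits on the last ':', so the count is always the final field.
        name, _, count = entry.strip().rpartition(":")
        if not name or not count.isdigit():
            raise ValueError(f"malformed alias entry: {entry!r}")
        requests[name] = int(count)
    return requests


# 'name' values of the [pci] alias entries (only one in the nova config above).
configured_aliases = {"geforce-gtx-1630"}

requested = parse_pci_alias_property("geforce-gtx-1630:1")
undefined = set(requested) - configured_aliases
print(requested, undefined)
```

With the flavor shown above this prints `{'geforce-gtx-1630': 1} set()`: one device is requested through an alias name that exists in the nova config.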

Here are the nova-scheduler logs from the attempt to launch the instance:

$ journalctl -xeu [email protected] | grep -i "Nov 20 09:08:57"
Hint: You are currently not seeing messages from other users and the system.
      Users in groups 'adm', 'systemd-journal' can see all messages.
      Pass -q to turn off this notice.
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG nova.scheduler.manager [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Starting to schedule for instances: ['41bdc44a-3e8c-475a-9e65-4331666abd75'] {{(pid=127748) select_destinations /opt/stack/nova/nova/scheduler/manager.py:175}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG nova.scheduler.request_filter [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] compute_status_filter request filter added forbidden trait COMPUTE_STATUS_DISABLED {{(pid=127748) compute_status_filter /opt/stack/nova/nova/scheduler/request_filter.py:253}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG nova.scheduler.request_filter [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Request filter 'compute_status_filter' took 0.0 seconds {{(pid=127748) wrapper /opt/stack/nova/nova/scheduler/request_filter.py:46}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG nova.scheduler.request_filter [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Request filter 'accelerators_filter' took 0.0 seconds {{(pid=127748) wrapper /opt/stack/nova/nova/scheduler/request_filter.py:46}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG nova.scheduler.request_filter [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Request filter 'remote_managed_ports_filter' took 0.0 seconds {{(pid=127748) wrapper /opt/stack/nova/nova/scheduler/request_filter.py:46}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG nova.scheduler.request_filter [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] ephemeral_encryption_filter skipped {{(pid=127748) ephemeral_encryption_filter /opt/stack/nova/nova/scheduler/request_filter.py:410}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG oslo_concurrency.lockutils [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Acquiring lock "13ea0d9f-bae6-45df-9ee2-6fbc0a0080ad" by "nova.context.set_target_cell.<locals>.get_or_set_cached_cell_and_set_connections" {{(pid=127748) inner /opt/stack/data/venv/lib/python3.10/site-packages/oslo_concurrency/lockutils.py:404}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG oslo_concurrency.lockutils [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Lock "13ea0d9f-bae6-45df-9ee2-6fbc0a0080ad" acquired by "nova.context.set_target_cell.<locals>.get_or_set_cached_cell_and_set_connections" :: waited 0.000s {{(pid=127748) inner /opt/stack/data/venv/lib/python3.10/site-packages/oslo_concurrency/lockutils.py:409}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG oslo_concurrency.lockutils [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Lock "13ea0d9f-bae6-45df-9ee2-6fbc0a0080ad" "released" by "nova.context.set_target_cell.<locals>.get_or_set_cached_cell_and_set_connections" :: held 0.000s {{(pid=127748) inner /opt/stack/data/venv/lib/python3.10/site-packages/oslo_concurrency/lockutils.py:423}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG oslo_concurrency.lockutils [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Acquiring lock "13ea0d9f-bae6-45df-9ee2-6fbc0a0080ad" by "nova.context.set_target_cell.<locals>.get_or_set_cached_cell_and_set_connections" {{(pid=127748) inner /opt/stack/data/venv/lib/python3.10/site-packages/oslo_concurrency/lockutils.py:404}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG oslo_concurrency.lockutils [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Lock "13ea0d9f-bae6-45df-9ee2-6fbc0a0080ad" acquired by "nova.context.set_target_cell.<locals>.get_or_set_cached_cell_and_set_connections" :: waited 0.000s {{(pid=127748) inner /opt/stack/data/venv/lib/python3.10/site-packages/oslo_concurrency/lockutils.py:409}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG oslo_concurrency.lockutils [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Lock "13ea0d9f-bae6-45df-9ee2-6fbc0a0080ad" "released" by "nova.context.set_target_cell.<locals>.get_or_set_cached_cell_and_set_connections" :: held 0.000s {{(pid=127748) inner /opt/stack/data/venv/lib/python3.10/site-packages/oslo_concurrency/lockutils.py:423}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG oslo_concurrency.lockutils [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Acquiring lock "('itopenstack', 'itopenstack')" by "nova.scheduler.host_manager.HostState.update.<locals>._locked_update" {{(pid=127748) inner /opt/stack/data/venv/lib/python3.10/site-packages/oslo_concurrency/lockutils.py:404}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG oslo_concurrency.lockutils [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Lock "('itopenstack', 'itopenstack')" acquired by "nova.scheduler.host_manager.HostState.update.<locals>._locked_update" :: waited 0.000s {{(pid=127748) inner /opt/stack/data/venv/lib/python3.10/site-packages/oslo_concurrency/lockutils.py:409}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG nova.scheduler.host_manager [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Update host state from compute node: ComputeNode(cpu_allocation_ratio=4.0,cpu_info='{"arch": "x86_64", "model": "Skylake-Client-noTSX-IBRS", "vendor": "Intel", "topology": {"cells": 1, "sockets": 1, "cores": 6, "threads": 1}, "features": ["smx", "abm", "mce", "sse4.2", "vmx", "lm", "msr", "mpx", "xtpr", "tm2", "ht", "fma", "pat", "de", "adx", "tsc", "tsc-deadline", "clflushopt", "est", "dtes64", "popcnt", "arch-capabilities", "apic", "pclmuldq", "tsc_adjust", "rsba", "acpi", "vme", "movbe", "md-clear", "bmi1", "avx", "pni", "f16c", "pse36", "xsavec", "pge", "xsaves", "cx16", "ss", "sse4.1", "cx8", "xgetbv1", "smep", "nx", "mtrr", "lahf_lm", "x2apic", "avx2", "pdpe1gb", "ds_cpl", "arat", "spec-ctrl", "cmov", "pcid", "xsaveopt", "fsgsbase", "invpcid", "pae", "ssbd", "sse2", "fxsr", "stibp", "bmi2", "rdtscp", "invtsc", "rdseed", "mmx", "pse", "monitor", "syscall", "xsave", "ds", "ssse3", "intel-pt", "smap", "pbe", "fpu", "3dnowprefetch", "erms", "aes", "rdrand", "tm", "sse", "pdcm", "mca", "clflush", "sep"]}',created_at=2023-11-20T06:55:11Z,current_workload=0,deleted=False,deleted_at=None,disk_allocation_ratio=1.0,disk_available_least=79,free_disk_gb=97,free_ram_mb=7292,host='itopenstack',host_ip=192.168.1.10,hypervisor_hostname='itopenstack',hypervisor_type='QEMU',hypervisor_version=6002000,id=1,local_gb=97,local_gb_used=0,mapped=1,memory_mb=7804,memory_mb_used=512,metrics='[]',numa_topology='{"nova_object.name": "NUMATopology", "nova_object.namespace": "nova", "nova_object.version": "1.2", "nova_object.data": {"cells": [{"nova_object.name": "NUMACell", "nova_object.namespace": "nova", "nova_object.version": "1.5", "nova_object.data": {"id": 0, "cpuset": [0, 1, 2, 3, 4, 5], "pcpuset": [0, 1, 2, 3, 4, 5], "memory": 7804, "cpu_usage": 0, "memory_usage": 0, "pinned_cpus": [], "siblings": [[2], [5], [4], [1], [0], [3]], 
"mempages": [{"nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", "nova_object.version": "1.1", "nova_object.data": {"size_kb": 4, "total": 1997843, "used": 0, "reserved": 0}, "nova_object.changes": ["used", "size_kb", "reserved", "total"]}, {"nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", "nova_object.version": "1.1", "nova_object.data": {"size_kb": 2048, "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", "size_kb", "reserved", "total"]}, {"nova_object.name": "NUMAPagesTopology", "nova_object.namespace": "nova", "nova_object.version": "1.1", "nova_object.data": {"size_kb": 1048576, "total": 0, "used": 0, "reserved": 0}, "nova_object.changes": ["used", "size_kb", "reserved", "total"]}], "network_metadata": {"nova_object.name": "NetworkMetadata", "nova_object.namespace": "nova", "nova_object.version": "1.0", "nova_object.data": {"physnets": [], "tunneled": false}, "nova_object.changes": ["physnets", "tunneled"]}, "socket": 0}, "nova_object.changes": ["id", "memory_usage", "cpu_usage", "pcpuset", "socket", "siblings", "mempages", "network_metadata", "memory", "pinned_cpus", "cpuset"]}]}, "nova_object.changes": ["cells"]}',pci_device_pools=PciDevicePoolList,ram_allocation_ratio=1.0,running_vms=0,service_id=3,stats={failed_builds='0'},supported_hv_specs=[HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec,HVSpec],updated_at=2023-11-20T08:25:01Z,uuid=afec2be8-dc5c-41ef-8d6a-e1729719bece,vcpus=6,vcpus_used=0) {{(pid=127748) _locked_update /opt/stack/nova/nova/scheduler/host_manager.py:169}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG nova.scheduler.host_manager [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Update host state with aggregates: [] {{(pid=127748) _locked_update /opt/stack/nova/nova/scheduler/host_manager.py:172}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG nova.scheduler.host_manager [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Update host state with service dict: {'id': 3, 'uuid': 'fba4961f-4107-45bf-9408-7a2af793beb7', 'host': 'itopenstack', 'binary': 'nova-compute', 'topic': 'compute', 'report_count': 73, 'disabled': False, 'disabled_reason': None, 'last_seen_up': datetime.datetime(2023, 11, 20, 9, 7, 6, tzinfo=datetime.timezone.utc), 'forced_down': False, 'version': 66, 'created_at': datetime.datetime(2023, 11, 20, 6, 55, 11, tzinfo=datetime.timezone.utc), 'updated_at': datetime.datetime(2023, 11, 20, 9, 7, 6, tzinfo=datetime.timezone.utc), 'deleted_at': None, 'deleted': False} {{(pid=127748) _locked_update /opt/stack/nova/nova/scheduler/host_manager.py:175}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG nova.scheduler.host_manager [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Update host state with instances: [] {{(pid=127748) _locked_update /opt/stack/nova/nova/scheduler/host_manager.py:178}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG oslo_concurrency.lockutils [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Lock "('itopenstack', 'itopenstack')" "released" by "nova.scheduler.host_manager.HostState.update.<locals>._locked_update" :: held 0.002s {{(pid=127748) inner /opt/stack/data/venv/lib/python3.10/site-packages/oslo_concurrency/lockutils.py:423}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG nova.filters [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Starting with 1 host(s) {{(pid=127748) get_filtered_objects /opt/stack/nova/nova/filters.py:70}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG nova.scheduler.filters [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] PciPassthroughFilter tries allocation candidate: {'allocations': {'afec2be8-dc5c-41ef-8d6a-e1729719bece': {'resources': {'DISK_GB': 25, 'MEMORY_MB': 4096, 'VCPU': 2}}}, 'mappings': {'': ['afec2be8-dc5c-41ef-8d6a-e1729719bece']}} {{(pid=127748) filter_candidates /opt/stack/nova/nova/scheduler/filters/__init__.py:77}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG nova.pci.stats [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Not enough PCI devices left to satisfy request {{(pid=127748) _filter_pools /opt/stack/nova/nova/pci/stats.py:654}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG nova.scheduler.filters [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] PciPassthroughFilter rejected allocation candidate: {'allocations': {'afec2be8-dc5c-41ef-8d6a-e1729719bece': {'resources': {'DISK_GB': 25, 'MEMORY_MB': 4096, 'VCPU': 2}}}, 'mappings': {'': ['afec2be8-dc5c-41ef-8d6a-e1729719bece']}} {{(pid=127748) filter_candidates /opt/stack/nova/nova/scheduler/filters/__init__.py:88}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG nova.scheduler.filters.pci_passthrough_filter [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] (itopenstack, itopenstack) ram: 7292MB disk: 80896MB io_ops: 0 instances: 0, allocation_candidates: 0 doesn't have the required PCI devices (InstancePCIRequests(instance_uuid=<?>,requests=[InstancePCIRequest])) {{(pid=127748) host_passes /opt/stack/nova/nova/scheduler/filters/pci_passthrough_filter.py:68}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: INFO nova.filters [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Filter PciPassthroughFilter returned 0 hosts
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG nova.filters [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Filtering removed all hosts for the request with instance ID '41bdc44a-3e8c-475a-9e65-4331666abd75'. Filter results: [('PciPassthroughFilter', None)] {{(pid=127748) get_filtered_objects /opt/stack/nova/nova/filters.py:114}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: INFO nova.filters [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Filtering removed all hosts for the request with instance ID '41bdc44a-3e8c-475a-9e65-4331666abd75'. Filter results: ['PciPassthroughFilter: (start: 1, end: 0)']
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG nova.scheduler.manager [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Filtered [] {{(pid=127748) _get_sorted_hosts /opt/stack/nova/nova/scheduler/manager.py:708}}
Nov 20 09:08:57 itopenstack nova-scheduler[127748]: DEBUG nova.scheduler.manager [None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] There are 0 hosts available but 1 instances requested to build. {{(pid=127748) _ensure_sufficient_hosts /opt/stack/nova/nova/scheduler/manager.py:527}}
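Most of this log is lock bookkeeping. The decisive lines are the `nova.pci.stats` message and the filter summary. A minimal sketch that extracts per-filter host counts from such a journal excerpt; `filter_summary` is a hypothetical helper, and it assumes the INFO-level `Filter results: [...]` format shown above:

```python
import re

# Matches entries such as "PciPassthroughFilter: (start: 1, end: 0)" in the
# INFO-level "Filter results: [...]" summary that nova.filters logs.
FILTER_RESULT = re.compile(r"(\w+): \(start: (\d+), end: (\d+)\)")


def filter_summary(log_text: str) -> list[tuple[str, int, int]]:
    """Return (filter name, hosts in, hosts out) for each summary entry in the log."""
    summary = []
    for line in log_text.splitlines():
        if "Filter results: [" in line:
            for name, start, end in FILTER_RESULT.findall(line):
                summary.append((name, int(start), int(end)))
    return summary


# The INFO line from the journal excerpt above.
log_excerpt = (
    "Nov 20 09:08:57 itopenstack nova-scheduler[127748]: INFO nova.filters "
    "[None req-20b0843e-0997-4c83-9abc-b20fd73706a4 demo admin] Filtering removed all "
    "hosts for the request with instance ID '41bdc44a-3e8c-475a-9e65-4331666abd75'. "
    "Filter results: ['PciPassthroughFilter: (start: 1, end: 0)']"
)
print(filter_summary(log_excerpt))
```

This prints `[('PciPassthroughFilter', 1, 0)]`: one host entered the filter and none remained. The DEBUG variant of the summary (`[('PciPassthroughFilter', None)]`) has no start/end counts and is skipped.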

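I don't understand where the error comes from. This is my first post on StackExchange, and I have tried to provide as much detail as possible. Please let me know if I need to provide any other details. Any advice would be greatly appreciated, thanks.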