How to monitor OpenStack with Prometheus and Grafana

Answer 1

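You wrote that you are already running Prometheus, so I won't go into detail on that. The documentation is pretty straightforward. It offers different installation methods (a docker/podman container or a snap). Both ways of starting the exporter worked for me (podman container and snap).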

Configure the exporter

1. Create a clouds.yaml    
root@control01:~# cat clouds.yaml 
clouds:
  admin:
    region_name: "RegionOne"
    identity_api_version: 3
    identity_interface: "internal"
    auth:
      username: admin
      password: ****
      project_name: ADMIN
      project_domain_name: Default
      user_domain_name: Default
      auth_url: 'https://control.example.com:5000/v3'

2. Start openstack-exporter (I recommend the legacy mode first)
root@control01:~# podman run -v "$HOME/clouds.yaml":/etc/openstack/clouds.yaml -it -p 9180:9180 ghcr.io/openstack-exporter/openstack-exporter:latest admin
ts=2023-10-07T12:23:21.616Z caller=main.go:64 level=info msg="Build context" build_context="(go=go1.18.10, platform=linux/amd64, user=, date=, tags=unknown)"
ts=2023-10-07T12:23:21.616Z caller=main.go:85 level=info msg="openstack exporter started in legacy mode"
ts=2023-10-07T12:23:21.617Z caller=tls_config.go:274 level=info msg="Listening on" address=[::]:9180
ts=2023-10-07T12:23:21.617Z caller=tls_config.go:277 level=info msg="TLS is disabled." http2=false address=[::]:9180

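2a. With the snap, you should be able to start the exporter directly from the command line (the last argument is the name of the cloud entry in clouds.yaml; with the file above that would be admin rather than myregion):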
$ ./openstack-exporter --os-client-config /etc/openstack/clouds.yaml myregion

3. Verify
root@control01:~# curl http://localhost:9180/metrics
# HELP openstack_cinder_agent_state agent_state
# TYPE openstack_cinder_agent_state counter
openstack_cinder_agent_state{adminState="enabled",disabledReason="",hostname="control",service="cinder-backup",uuid="94ecf3be-864e-a2a0-dd93-7e9f8e38adf5",zone="nova"} 0
openstack_cinder_agent_state{adminState="enabled",disabledReason="",hostname="control",service="cinder-scheduler",uuid="c4cda40e-9d17-f960-fa84-cc3897ddda9f",zone="nova"} 1
[and many more]

4. Add openstack-exporter to prometheus (just an excerpt of the new job)
root@control01:~# tail -5 /etc/prometheus/prometheus.yml
  - job_name: openstack
    static_configs:
      - targets: ['localhost:9180']
    scrape_interval: 120s
    scrape_timeout: 120s
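Prometheus rejects a scrape job whose scrape_timeout is longer than its scrape_interval, so the two 120s values above sit exactly at the allowed limit. A minimal Python sketch of that check, assuming only simple single-unit durations such as 120s or 2m; the helper names are hypothetical and this is not Prometheus' own duration parser:

```python
import re

# Seconds per unit for the simple Prometheus duration suffixes handled here.
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text):
    """Convert a single-unit duration like '120s' or '2m' to seconds.

    Compound durations such as '1m30s' are deliberately not handled.
    """
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNITS[unit]


def timeout_is_valid(scrape_interval, scrape_timeout):
    """True if the timeout does not exceed the interval (equal is allowed)."""
    return parse_duration(scrape_timeout) <= parse_duration(scrape_interval)


# Values from the openstack job above.
print(timeout_is_valid("120s", "120s"))  # True
```

Raising the timeout above the interval (for example 120s interval with a 180s timeout) would make the check return False, which matches the case Prometheus refuses to load.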

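In my case the Prometheus server also runs on the control node (it is only a test environment), which is why I defined the target as "localhost". Now you can use any of these metrics to build your own Grafana dashboards that visualize them as graphs.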
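The same metrics can also be consumed outside Grafana, for example in a quick health-check script that reads the text returned by /metrics. A minimal sketch, assuming label values contain no escaped quotes, commas or braces and that sample lines carry no timestamps (true for the output shown above, but not guaranteed in general); it also assumes an agent_state of 0 means the service is not reported as up. For real use, the prometheus_client package ships a complete parser (prometheus_client.parser.text_string_to_metric_families):

```python
import re

# name{label="value",...} value   (the label block is optional)
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Yield (name, labels, value) for every sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or # HELP / # TYPE metadata
        match = _SAMPLE.match(line)
        if match is None:
            continue  # e.g. lines with timestamps, not handled in this sketch
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        yield name, labels, float(value)


sample = '''
# HELP openstack_cinder_agent_state agent_state
# TYPE openstack_cinder_agent_state counter
openstack_cinder_agent_state{adminState="enabled",disabledReason="",hostname="control",service="cinder-backup",uuid="94ecf3be-864e-a2a0-dd93-7e9f8e38adf5",zone="nova"} 0
openstack_cinder_agent_state{adminState="enabled",disabledReason="",hostname="control",service="cinder-scheduler",uuid="c4cda40e-9d17-f960-fa84-cc3897ddda9f",zone="nova"} 1
'''

# Cinder services reporting agent_state 0 in the output above.
down = [labels["service"] for name, labels, value in parse_samples(sample)
        if name == "openstack_cinder_agent_state" and value == 0]
print(down)  # ['cinder-backup']
```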

Configure Grafana

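You need a running Grafana instance in which you are allowed to add dashboards; I won't go into detail on that. You can import the dashboard you mentioned; I imported the JSON file. During the import you are asked for the JSON file and a Prometheus data source. Once the exporter is running and Prometheus has scraped its metrics, you will see the graphs in the imported dashboard.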
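If you prefer to script the import instead of using the UI, the data-source question from the import dialog has to be answered in the request itself. Dashboards exported for sharing usually list their data-source placeholders in a top-level "__inputs" array. The sketch below only builds a request body; it assumes Grafana's POST /api/dashboards/import endpoint and an input mapping with name/type/pluginId/value fields, so check the HTTP API documentation of your Grafana version before relying on it. The data-source name "Prometheus" and the tiny dashboard are hypothetical stand-ins, not the dashboard from the question:

```python
import json


def build_import_payload(dashboard_json, datasource):
    """Build a body for Grafana's dashboard import endpoint (assumed shape).

    Every data-source placeholder listed in the dashboard's "__inputs"
    is mapped to the same Prometheus data source.
    """
    dashboard = json.loads(dashboard_json)
    inputs = [
        {
            "name": item["name"],  # e.g. "DS_PROMETHEUS"
            "type": "datasource",
            "pluginId": item.get("pluginId", "prometheus"),
            "value": datasource,  # name of your Prometheus data source
        }
        for item in dashboard.get("__inputs", [])
        if item.get("type") == "datasource"
    ]
    return {"dashboard": dashboard, "overwrite": True, "inputs": inputs}


# Hypothetical minimal export, standing in for the real dashboard JSON file.
exported = json.dumps({
    "__inputs": [{"name": "DS_PROMETHEUS", "label": "Prometheus",
                  "type": "datasource", "pluginId": "prometheus"}],
    "title": "OpenStack",
    "panels": [],
})
payload = build_import_payload(exported, "Prometheus")
print(payload["inputs"][0]["name"])  # DS_PROMETHEUS
```

The resulting dictionary would then be sent as JSON with an API token of a user who is allowed to create dashboards.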

Answer 2

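With OpenStack Wallaby, some graphs are not displayed, for example overall memory usage, overall CPU core usage and local storage. The reason is that the hypervisor values were removed from the Nova API in microversion 2.88.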

https://docs.openstack.org/nova/wallaby/reference/api-microversion-history.html#maximum-in-wallaby

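Presumably the reason is to keep these queries from putting load on the Nova API. The data has been consolidated and moved to the Placement API, so it can be requested through the placement metrics, even with openstack-exporter version 1.6.0 (and earlier versions as well).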
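When checking which microversion a client or exporter requests, note that Nova microversions are major.minor pairs and must be compared numerically: as plain text "2.9" sorts after "2.88", although it is the older microversion. A minimal sketch that decides whether responses at a given microversion still contain the hypervisor usage fields, based only on the 2.88 cut-off described above; the function names are hypothetical:

```python
def parse_microversion(version):
    """Turn '2.88' into (2, 88) so versions compare numerically."""
    major, minor = version.split(".")
    return int(major), int(minor)


# Microversion from which the hypervisor usage fields are no longer returned.
HYPERVISOR_STATS_REMOVED = parse_microversion("2.88")


def has_hypervisor_stats(requested):
    """True if responses at this microversion still include the usage fields."""
    return parse_microversion(requested) < HYPERVISOR_STATS_REMOVED


print("2.9" > "2.88")               # True: text comparison gets the order wrong
print(has_hypervisor_stats("2.9"))  # True
print(has_hypervisor_stats("2.88")) # False
```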

Some background on the hypervisor API changes and the Placement API:

https://specs.openstack.org/openstack/nova-specs/specs/wallaby/implemented/modernize-os-hypervisors-api.html

Placement metrics provided by the OpenStack exporter:

openstack_placement_resource_allocation_ratio{hostname="nova-a-01.cloud",resourcetype="DISK_GB"} 1
openstack_placement_resource_allocation_ratio{hostname="nova-a-01.cloud",resourcetype="MEMORY_MB"} 1
openstack_placement_resource_allocation_ratio{hostname="nova-a-01.cloud",resourcetype="VCPU"} 16
openstack_placement_resource_allocation_ratio{hostname="nova-a-02.cloud",resourcetype="DISK_GB"} 1
openstack_placement_resource_allocation_ratio{hostname="nova-a-02.cloud",resourcetype="MEMORY_MB"} 1
openstack_placement_resource_allocation_ratio{hostname="nova-a-02.cloud",resourcetype="VCPU"} 16

openstack_placement_resource_reserved{hostname="nova-a-01.cloud",resourcetype="DISK_GB"} 0
openstack_placement_resource_reserved{hostname="nova-a-01.cloud",resourcetype="MEMORY_MB"} 4096
openstack_placement_resource_reserved{hostname="nova-a-01.cloud",resourcetype="VCPU"} 0
openstack_placement_resource_reserved{hostname="nova-a-02.cloud",resourcetype="DISK_GB"} 0
openstack_placement_resource_reserved{hostname="nova-a-02.cloud",resourcetype="MEMORY_MB"} 4096
openstack_placement_resource_reserved{hostname="nova-a-02.cloud",resourcetype="VCPU"} 0

openstack_placement_resource_total{hostname="nova-a-01.cloud",resourcetype="DISK_GB"} 1314
openstack_placement_resource_total{hostname="nova-a-01.cloud",resourcetype="MEMORY_MB"} 385046
openstack_placement_resource_total{hostname="nova-a-01.cloud",resourcetype="VCPU"} 64
openstack_placement_resource_total{hostname="nova-a-02.cloud",resourcetype="DISK_GB"} 1314
openstack_placement_resource_total{hostname="nova-a-02.cloud",resourcetype="MEMORY_MB"} 385046
openstack_placement_resource_total{hostname="nova-a-02.cloud",resourcetype="VCPU"} 64

openstack_placement_resource_usage{hostname="nova-a-01.cloud",resourcetype="DISK_GB"} 0
openstack_placement_resource_usage{hostname="nova-a-01.cloud",resourcetype="MEMORY_MB"} 256
openstack_placement_resource_usage{hostname="nova-a-01.cloud",resourcetype="VCPU"} 1
openstack_placement_resource_usage{hostname="nova-a-02.cloud",resourcetype="DISK_GB"} 0
openstack_placement_resource_usage{hostname="nova-a-02.cloud",resourcetype="MEMORY_MB"} 0
openstack_placement_resource_usage{hostname="nova-a-02.cloud",resourcetype="VCPU"} 0

openstack_placement_up 1
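These four placement series can be combined into a per-host view. Placement computes the schedulable capacity of a resource as (total - reserved) * allocation_ratio, and usage is counted against that capacity. The sketch below is plain Python for illustration, working on the sample values above keyed by (hostname, resourcetype); the helper names are hypothetical:

```python
# (total, reserved, allocation_ratio, usage) per (hostname, resourcetype),
# copied from the sample exporter output above.
SAMPLE = {
    ("nova-a-01.cloud", "VCPU"): (64, 0, 16, 1),
    ("nova-a-01.cloud", "MEMORY_MB"): (385046, 4096, 1, 256),
    ("nova-a-01.cloud", "DISK_GB"): (1314, 0, 1, 0),
    ("nova-a-02.cloud", "VCPU"): (64, 0, 16, 0),
    ("nova-a-02.cloud", "MEMORY_MB"): (385046, 4096, 1, 0),
    ("nova-a-02.cloud", "DISK_GB"): (1314, 0, 1, 0),
}


def capacity(total, reserved, allocation_ratio):
    """Schedulable amount of a resource, as Placement computes it."""
    return (total - reserved) * allocation_ratio


def headroom(sample):
    """Remaining schedulable amount for every (hostname, resourcetype)."""
    return {
        key: capacity(total, reserved, ratio) - usage
        for key, (total, reserved, ratio, usage) in sample.items()
    }


for (host, rtype), free in sorted(headroom(SAMPLE).items()):
    print(f"{host:16} {rtype:10} {free}")
```

For nova-a-01 this gives 64 * 16 - 1 = 1023 VCPUs and (385046 - 4096) - 256 = 380694 MB of memory still schedulable. In a Grafana panel the same arithmetic should be expressible directly in PromQL, since the four series share the same hostname and resourcetype labels: (openstack_placement_resource_total - openstack_placement_resource_reserved) * openstack_placement_resource_allocation_ratio - openstack_placement_resource_usage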

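In this case,

sum(openstack_placement_resource_usage{resourcetype="MEMORY_MB"})

should be used instead of

sum(openstack_nova_memory_used_bytes)

The resourcetype filter is needed because openstack_placement_resource_usage carries DISK_GB, MEMORY_MB and VCPU series side by side. Also note that the placement value is in MB, whereas the old nova metric was in bytes.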
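A small illustration of the memory aggregation on the sample usage values above, including the conversion a panel originally built for the byte-based nova metric would need. It assumes 1 MB = 1024 * 1024 bytes, as Nova counts memory; the helper name is hypothetical:

```python
# (hostname, resourcetype) -> openstack_placement_resource_usage, from the sample above.
USAGE = {
    ("nova-a-01.cloud", "DISK_GB"): 0,
    ("nova-a-01.cloud", "MEMORY_MB"): 256,
    ("nova-a-01.cloud", "VCPU"): 1,
    ("nova-a-02.cloud", "DISK_GB"): 0,
    ("nova-a-02.cloud", "MEMORY_MB"): 0,
    ("nova-a-02.cloud", "VCPU"): 0,
}


def sum_usage(usage, resourcetype):
    """Like sum(openstack_placement_resource_usage{resourcetype="..."})."""
    return sum(v for (_, rtype), v in usage.items() if rtype == resourcetype)


memory_mb = sum_usage(USAGE, "MEMORY_MB")
memory_bytes = memory_mb * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes

print(sum(USAGE.values()))  # 257: an unfiltered sum mixes GB, MB and VCPU counts
print(memory_mb)            # 256
print(memory_bytes)         # 268435456
```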
