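I have deployed Ceph using `cephadm`, including the monitoring stack: `prometheus`, `alertmanager`, and `node-exporter`.
I am currently trying to add a `telegram` receiver to `alertmanager` (Telegram is supported from v0.24.0 onward, so I manually updated `mgr/container_image_alertmanager` from 0.23 to 0.24), but I cannot find in the documentation where `alertmanager.yml` should be created.
I can see that this file is generated inside the Ceph cluster at /var/lib/ceph/{hash}/alertmanager.ceph-1/etc/alertmanager/alertmanager.yml
I added my configuration to that file as follows: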
templates:
- '/etc/alertmanager/config/*.tmpl'
route:
  receiver: 'default'
  routes:
    - group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 30m
      receiver: 'telegram'
receivers:
- name: 'default'
  webhook_configs:
- name: 'ceph-dashboard'
  webhook_configs:
  - url: 'https://ceph-1:8443/api/prometheus_receiver'
- name: 'telegram'
  telegram_configs:
  - bot_token: <bot_token>
    chat_id: <chat_id>
    send_resolved: true
    parse_mode: 'HTML'
    api_url: 'https://api.telegram.org'
    message: '{{ template "telegram.text" . }}'
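The receiver works fine, but after redeploying `alertmanager` from the Ceph dashboard the configuration disappears, which is logical because I am editing a generated file.
Could anyone help and/or point me in the right direction: where should I create the Alertmanager configuration so that it extends or overrides the defaults?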
Answer 1
Read the monitoring section of the Ceph manual; it has a part on using custom configuration for the stack components.
Option names
The following templates for files that will be generated by cephadm can be overridden. These are the names to be used when storing with `ceph config-key set`:
- services/alertmanager/alertmanager.yml
- services/grafana/ceph-dashboard.yml
- services/grafana/grafana.ini
- services/prometheus/prometheus.yml
- services/prometheus/alerting/custom_alerts.yml
- services/loki.yml
- services/promtail.yml
You can look up the file templates that are currently used by cephadm in src/pybind/mgr/cephadm/templates:
- services/alertmanager/alertmanager.yml.j2
- services/grafana/ceph-dashboard.yml.j2
- services/grafana/grafana.ini.j2
- services/prometheus/prometheus.yml.j2
- services/loki.yml.j2
- services/promtail.yml.j2
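Just take the Ceph template, edit it to your liking, and store it with `ceph config-key set` so that it is used as the template when the Alertmanager configuration is generated.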
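As a rough sketch of what that could look like for Alertmanager: the snippet below stores an edited template and asks cephadm to regenerate the config. It assumes, as I recall from the Ceph documentation, that the option names listed above are prefixed with `mgr/cephadm/` when used as config-key names and that `ceph orch reconfig` is what triggers regeneration. The local file name `./alertmanager.yml.j2` is hypothetical, so point it at your own edited copy of the template.

```shell
#!/bin/sh
# Assumption: config-key name = "mgr/cephadm/" prefix + option name from the list above.
KEY="mgr/cephadm/services/alertmanager/alertmanager.yml"
# Hypothetical local path to your edited copy of alertmanager.yml.j2.
TEMPLATE="${TEMPLATE:-./alertmanager.yml.j2}"

# Print the commands instead of running them when the ceph CLI is not installed.
if command -v ceph >/dev/null 2>&1; then RUN=""; else RUN="echo"; fi

# Store the edited template so cephadm uses it instead of the built-in one.
$RUN ceph config-key set "$KEY" -i "$TEMPLATE"

# Ask cephadm to regenerate the alertmanager configuration from the stored template.
$RUN ceph orch reconfig alertmanager
```

Because the template is stored in the cluster rather than in the generated file, it should survive a redeploy from the dashboard. Keeping the default `ceph-dashboard` webhook receiver in your edited template means the dashboard continues to receive alerts alongside Telegram.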