I have an alerting rule:
groups:
- name: somename
  rules:
  - alert: CertificateExpiry
    expr: certificate_expires_in_days < 20
    for: 1h
    labels:
      severity: critical
    annotations:
      summary: Certificate for {{ $labels.instance }} will expire in {{ $value }} days.
and a unit test for it:
rule_files:
  - 'path/to/alert_rules.yml'
evaluation_interval: 15m
tests:
  - interval: 15m
    input_series:
      - series: 'certificate_expires_in_days{instance="foo"}'
        values: '5 5 5 5 5 5 5 5'
    alert_rule_test:
      - eval_time: 2h
        alertname: CertificateExpiry
        exp_alerts:
          - exp_labels:
              severity: critical
              instance: foo
            exp_annotations:
              summary: Certificate for foo will expire in 5 days.
The test fails, reporting that the alert did not fire at all:
$ promtool test rules certificates.yml
Unit Testing: certificates.yml
  FAILED:
    alertname:CertificateExpiry, time:2h,
        exp:"[Labels:{alertname=\"CertificateExpiry\", instance=\"foo\", severity=\"critical\"} Annotations:{summary=\"Certificate for foo will expire in 5 days.\"}]",
        got:"[]"
However, if I change the rule's "for" duration and the test's evaluation and series intervals as follows:
  - alert: CertificateExpiry
    expr: certificate_expires_in_days < 20
    for: 8m
    labels:
      severity: critical
    annotations:
      summary: Certificate for {{ $labels.instance }} will expire in {{ $value }} days.
and
rule_files:
  - 'path/to/alert_rules.yml'
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      - series: 'certificate_expires_in_days{instance="foo"}'
        values: '5 5 5 5 5 5 5 5'
    alert_rule_test:
      - eval_time: 8m
        alertname: CertificateExpiry
        exp_alerts:
          - exp_labels:
              severity: critical
              instance: foo
            exp_annotations:
              summary: Certificate for foo will expire in 5 days.
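the test passes. So the problem lies in the timing, and a typo elsewhere can be ruled out.
This is clearly some silly oversight on my part, but I can't see it. Can anyone help?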
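To make the timing concrete, here is a minimal Python sketch of which timestamps the eight values cover in each test. It assumes promtool places the first value at t=0 and each following value one interval later. The helper names (`sample_times`, `gap_to_eval`) are mine and purely illustrative, not part of promtool.

```python
def sample_times(interval_min, values):
    """Timestamps in minutes for each value, assuming the first sample is at t=0."""
    count = len(values.split())
    return [i * interval_min for i in range(count)]


def gap_to_eval(case):
    """Minutes between the last input sample and the alert's eval_time."""
    last = sample_times(case["interval"], case["values"])[-1]
    return case["eval_time"] - last


# Numbers copied from the two test files above (all in minutes).
cases = {
    "failing": {"interval": 15, "values": "5 5 5 5 5 5 5 5", "eval_time": 120},
    "passing": {"interval": 1, "values": "5 5 5 5 5 5 5 5", "eval_time": 8},
}

for name, case in cases.items():
    last = sample_times(case["interval"], case["values"])[-1]
    print(f"{name}: last sample at {last}m, eval at {case['eval_time']}m, "
          f"gap {gap_to_eval(case)}m")
```

Under that assumption, the last sample in the failing test sits at 105m, 15 minutes before the 2h evaluation, while in the passing test it sits at 7m, 1 minute before the 8m evaluation. If Prometheus's default 5-minute lookback also applies in unit tests (I have not confirmed this), that difference may be relevant.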