Prometheus alert rule test fails with longer intervals

I have an alert rule:

groups:
  - name: somename                                                                    
    rules:
      - alert: CertificateExpiry
        expr: certificate_expires_in_days < 20
        for: 1h
        labels:
          severity: critical
        annotations:
          summary: Certificate for {{ $labels.instance }} will expire in {{ $value }} days.   

and a test for it:

rule_files:
  - 'path/to/alert_rules.yml'
                                                                                         
evaluation_interval: 15m 

tests:
  - interval: 15m 
    input_series:
      - series: 'certificate_expires_in_days{instance="foo"}'
        values: '5 5 5 5 5 5 5 5'
    alert_rule_test:
      - eval_time: 2h
        alertname: CertificateExpiry
        exp_alerts:
          - exp_labels:
              severity: critical
              instance: foo 
            exp_annotations:
              summary: Certificate for foo will expire in 5 days.

The test fails, reporting that the alert did not fire at all:

$ promtool test rules certificates.yml
Unit Testing:  certificates.yml
  FAILED:
    alertname:CertificateExpiry, time:2h, 
        exp:"[Labels:{alertname=\"CertificateExpiry\", instance=\"foo\", severity=\"critical\"} Annotations:{summary=\"Certificate for foo will expire in 5 days.\"}]", 
        got:"[]"

However, if I change the evaluation and series timing (and shorten the rule's for duration to match), as follows:

      - alert: CertificateExpiry                                                         
        expr: certificate_expires_in_days < 20                                           
        for: 8m                                                                          
        labels:                                                                          
          severity: critical                                                             
        annotations:                                                                     
          summary: Certificate for {{ $labels.instance }} will expire in {{ $value }} days. 

rule_files:
  - 'path/to/alert_rules.yml'

evaluation_interval: 1m                                                                  
                                                                                         
tests:                                                                                   
  - interval: 1m                                                                         
    input_series:                                                                        
      - series: 'certificate_expires_in_days{instance="foo"}'                            
        values: '5 5 5 5 5 5 5 5'                                                        
    alert_rule_test:                                                                     
      - eval_time: 8m                                                                    
        alertname: CertificateExpiry                                                     
        exp_alerts:                                                                      
          - exp_labels:                                                                  
              severity: critical                                                         
              instance: foo                                                              
            exp_annotations:                                                             
              summary: Certificate for foo will expire in 5 days.

the test passes. So the problem lies in the timing, and typos elsewhere can be ruled out.
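
To compare the two setups, it may help to lay out when the input samples actually land relative to eval_time. The sketch below is only arithmetic on the numbers from the two tests above. The helper names (`sample_times`, `has_fresh_sample`) are hypothetical, and the 5-minute window is an assumption: it is Prometheus's default query lookback delta, and I am assuming promtool applies the same default when it evaluates the rule.

```python
from datetime import timedelta

# Assumption: samples older than this window are not returned by the
# rule's query (Prometheus's default lookback delta is 5 minutes).
LOOKBACK = timedelta(minutes=5)


def sample_times(interval, count):
    """Timestamps of `count` samples spaced `interval` apart, starting at 0."""
    return [interval * i for i in range(count)]


def has_fresh_sample(interval, count, eval_time, lookback=LOOKBACK):
    """True if at least one sample falls inside [eval_time - lookback, eval_time]."""
    return any(
        eval_time - lookback <= t <= eval_time
        for t in sample_times(interval, count)
    )


# First test: 8 values at 15m spacing, evaluated at 2h.
slow = has_fresh_sample(timedelta(minutes=15), 8, timedelta(hours=2))

# Second test: 8 values at 1m spacing, evaluated at 8m.
fast = has_fresh_sample(timedelta(minutes=1), 8, timedelta(minutes=8))

print("last 15m sample at:", sample_times(timedelta(minutes=15), 8)[-1])
print("15m test has a fresh sample at 2h:", slow)
print("1m test has a fresh sample at 8m:", fast)
```

With 8 values at 15m spacing, the last sample lands at 105 minutes, which is 15 minutes before the 2h evaluation. In the 1m variant the last sample is only 1 minute before the 8m evaluation. If the lookback assumption holds, the first series would already look absent at 2h. Extending the series so that it reaches eval_time (for example with promtool's expanding notation, values: '5+0x8', which yields nine samples) may be worth trying.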

This is obviously some silly oversight on my part, but I can't see it. Can anyone help?
