I am trying to set up a Prometheus exporter for an Oracle database. For database monitoring, I don't want to collect every metric on every scrape; instead, I want to collect different metrics at different frequencies. For example, CPU usage needs to be monitored roughly every minute, but a level 0 backup only needs to be checked once a day.
Is there a way to achieve this?
The goal is to protect the database by avoiding unnecessary repeated queries for metrics that only need to be collected once a day.
Here is my Grafana Agent configuration file:
server:
  log_level: warn
metrics:
  global:
    scrape_interval: 60s
    remote_write:
      - url:
        oauth2:
          client_id:
          client_secret:
          token_url:
  configs:
    - name: default
      scrape_configs:
        - job_name: traces
          honor_labels: true
          static_configs:
            - targets:
        - job_name: oracle_pdb_default
          honor_labels: true
          static_configs:
            - targets:
                -
              labels:
                service_name:
                instance:
                role: "primary"
        - job_name: oracle_cdb_default
          honor_labels: true
          static_configs:
            - targets:
                -
              labels:
                service_name:
                instance:
                role: "primary"
integrations:
  agent:
    enabled: true
  node_exporter:
    enabled: true
    include_exporter_metrics: true
    enable_collectors:
      - "systemd"
According to this, the scrape interval is 60 seconds, so all metrics are collected every 60 seconds.
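As far as I understand, a Prometheus-style scrape_config allows a job-level scrape_interval that overrides the global one, so in principle an extra job could be scraped less often, along these lines (a sketch only; the target address is a placeholder):

scrape_configs:
  - job_name: oracle_cdb_slow          # hypothetical extra job
    scrape_interval: 1h                # job-level value overrides the 60s global
    honor_labels: true
    static_configs:
      - targets: ['db-host:9162']      # placeholder exporter address

The catch, as far as I can tell, is that this changes how often the whole exporter endpoint is scraped, not individual metrics, and very long intervals also interact badly with Prometheus staleness handling (instant queries may see no data between scrapes unless something like last_over_time() is used on the query side).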
Here is my Oracle exporter custom metrics file:
cat /opt/custom-metrics-cdb.toml
[[metric]]
context = "index_unusable"
metricsdesc = { count="Gauge metric with count of index_unusable." }
labels = ["segment_name"]
request = "SELECT COUNT(*) as count from dba_indexes, v$database where status not in ('VALID','N/A') and database_role='PRIMARY'"
scrape_interval = "1h"
[[metric]]
context = "invalid_objects"
labels = [ "owner","object_name","object_type" ]
metricsdesc = {count="Generic counter metric from CDB_invalid_objects in Oracle." }
request = "select owner,object_name,object_type,count(*) as count from cdb_invalid_objects, v$database where database_role='PRIMARY' group by owner,object_name,object_type"
ignorezeroresult = true
[[metric]]
context = "stale_stats_checks"
labels = [ "owner","table_name","object_type" ,"partition_name","last_analyzed"]
metricsdesc = {count="Generic counter metric from CDB_TAB_STATISTICS in Oracle." }
request = "select owner,table_name,object_type,partition_name,to_char(last_analyzed,'dd-Mon-yyyy hh24:mi') last_analyzed,count(*) as count from CDB_TAB_STATISTICS, v$database where stale_stats = 'YES' and owner not in ('SYS','SYSTEM','DBSNMP','SYSMAN','ODSOGG') and table_name not like '%INTERIM%' and table_name not like '%TEMP%' and owner not like '%XDB%' and last_analyzed< trunc(sysdate-1)+22/24 and database_role='PRIMARY' group by owner,table_name,object_type,partition_name,last_analyzed order by 1,2"
[[metric]]
context = "number_logs_switches"
labels = [ "period","thread","hour"]
metricsdesc = {count="Generic counter metric in Oracle." }
request = "SELECT comp_time_start || '-' || comp_time_end as period, thread# as thread,comp_time_end as hour, no_logs as count FROM ( SELECT TO_CHAR(completion_time, 'dd/mm/yyyy hh24')|| ':00' comp_time_start ,TO_CHAR(completion_time +(1 / 24), 'hh24')|| ':00' comp_time_end, THREAD#,DEST_ID,COUNT(*) no_logs FROM v$archived_log, v$database WHERE completion_time BETWEEN SYSDATE - 1/24 AND SYSDATE AND standby_dest = 'NO' and database_role='PRIMARY' GROUP BY TO_CHAR(completion_time, 'dd/mm/yyyy hh24'), TO_CHAR(completion_time +(1 / 24), 'hh24') ,THREAD#,DEST_ID ) where rownum =1"
Here, for the index_unusable metric, I tried setting scrape_interval = "1h", but it has no effect: every metric defined in this file is still collected every 60 seconds, as dictated by the global scrape interval in the Grafana Agent config.
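Since the per-metric scrape_interval in the .toml does not seem to be honored by my exporter version, the workaround I have been considering (sketched below, with placeholder hostnames/ports and a hypothetical second metrics file) is to split the expensive queries into their own custom metrics file, run a second exporter instance that loads only that file, and scrape it from a separate job with a longer interval:

scrape_configs:
  # fast job: existing exporter instance, cheap metrics, global 60s interval
  - job_name: oracle_cdb_fast
    honor_labels: true
    static_configs:
      - targets: ['db-host:9161']      # placeholder target
  # slow job: hypothetical second exporter instance that loads only a separate
  # file (e.g. /opt/custom-metrics-cdb-slow.toml) containing index_unusable,
  # stale_stats_checks and the other expensive queries
  - job_name: oracle_cdb_slow
    honor_labels: true
    scrape_interval: 1h                # overrides the 60s global for this job only
    scrape_timeout: 5m                 # the DBA-view queries can be slow
    static_configs:
      - targets: ['db-host:9162']      # placeholder target

How the second instance is pointed at its own metrics file depends on the exporter build (for example a --custom.metrics style flag or a CUSTOM_METRICS environment variable), so treat that part as an assumption; the point is simply that the 60s job would never run the hourly/daily queries against the database.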
Looking forward to any suggestions on how to achieve this.
Thanks!