I am trying to set up a Prometheus exporter for an Oracle database. For database monitoring, I don't want to collect every metric on every scrape; instead, I want to collect different metrics at different frequencies. For example, CPU usage needs to be monitored roughly every minute, but a level 0 backup only needs to be checked once a day.
Is there a way to achieve this?
The goal is to protect the database by avoiding unnecessary repeated queries for metrics that only need to be collected once a day.
Here is my Grafana Agent configuration file:
server:
  log_level: warn
metrics:
  global:
    scrape_interval: 60s
    remote_write:
      - url:
        oauth2:
          client_id:
          client_secret:
          token_url:
  configs:
    - name: default
      scrape_configs:
        - job_name: traces
          honor_labels: true
          static_configs:
            - targets:
        - job_name: oracle_pdb_default
          honor_labels: true
          static_configs:
            - targets:
                -
              labels:
                service_name:
                instance:
                role: "primary"
        - job_name: oracle_cdb_default
          honor_labels: true
          static_configs:
            - targets:
                -
              labels:
                service_name:
                instance:
                role: "primary"
integrations:
  agent:
    enabled: true
  node_exporter:
    enabled: true
    include_exporter_metrics: true
    enable_collectors:
      - "systemd"
According to this, the scrape interval is 60 seconds, so all metrics are collected every 60 seconds.
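As far as I understand, a Prometheus-style scrape_config allows a job-level scrape_interval that overrides the global one, so in principle an extra job could be scraped less often, along these lines (a sketch only; the target address is a placeholder):

scrape_configs:
  - job_name: oracle_cdb_slow          # hypothetical extra job
    scrape_interval: 1h                # job-level value overrides the 60s global
    honor_labels: true
    static_configs:
      - targets: ['db-host:9162']      # placeholder exporter address

The catch, as far as I can tell, is that this changes how often the whole exporter endpoint is scraped, not individual metrics, and very long intervals also interact badly with Prometheus staleness handling (instant queries may see no data between scrapes unless something like last_over_time() is used on the query side).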
Here is my Oracle exporter custom metrics file:
cat /opt/custom-metrics-cdb.toml
[[metric]]
context = "index_unusable"
metricsdesc = { count="Gauge metric with count of index_unusable." }
labels = ["segment_name"]
request = "SELECT COUNT(*) as count from dba_indexes, v$database where status not in ('VALID','N/A') and database_role='PRIMARY'"
scrape_interval = "1h"
[[metric]]
context = "invalid_objects"
labels = [ "owner","object_name","object_type" ]
metricsdesc = {count="Generic counter metric from CDB_invalid_objects in Oracle." }
request = "select owner,object_name,object_type,count(*) as count from cdb_invalid_objects, v$database where database_role='PRIMARY' group by owner,object_name,object_type"
ignorezeroresult = true
[[metric]]
context = "stale_stats_checks"
labels = [ "owner","table_name","object_type" ,"partition_name","last_analyzed"]
metricsdesc = {count="Generic counter metric from CDB_TAB_STATISTICS in Oracle." }
request = "select owner,table_name,object_type,partition_name,to_char(last_analyzed,'dd-Mon-yyyy hh24:mi') last_analyzed,count(*) as count from CDB_TAB_STATISTICS, v$database where stale_stats = 'YES' and owner not in ('SYS','SYSTEM','DBSNMP','SYSMAN','ODSOGG') and table_name not like '%INTERIM%' and table_name not like '%TEMP%' and owner not like '%XDB%' and last_analyzed< trunc(sysdate-1)+22/24 and database_role='PRIMARY' group by owner,table_name,object_type,partition_name,last_analyzed order by 1,2"
[[metric]]
context = "number_logs_switches"
labels = [ "period","thread","hour"]
metricsdesc = {count="Generic counter metric in Oracle." }
request = "SELECT comp_time_start || '-' || comp_time_end as period, thread# as thread,comp_time_end as hour, no_logs as count FROM ( SELECT TO_CHAR(completion_time, 'dd/mm/yyyy hh24')|| ':00' comp_time_start ,TO_CHAR(completion_time +(1 / 24), 'hh24')|| ':00' comp_time_end, THREAD#,DEST_ID,COUNT(*) no_logs FROM v$archived_log, v$database WHERE completion_time BETWEEN SYSDATE - 1/24 AND SYSDATE AND standby_dest = 'NO' and database_role='PRIMARY' GROUP BY TO_CHAR(completion_time, 'dd/mm/yyyy hh24'), TO_CHAR(completion_time +(1 / 24), 'hh24') ,THREAD#,DEST_ID ) where rownum =1"
Here, for the index_unusable metric, I tried setting scrape_interval = "1h", but it has no effect: every metric defined in this file is still collected every 60 seconds, as dictated by the global scrape interval in the Grafana Agent config.
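Since the per-metric scrape_interval in the .toml does not seem to be honored by my exporter version, the workaround I have been considering (sketched below, with placeholder hostnames/ports and a hypothetical second metrics file) is to split the expensive queries into their own custom metrics file, run a second exporter instance that loads only that file, and scrape it from a separate job with a longer interval:

scrape_configs:
  # fast job: existing exporter instance, cheap metrics, global 60s interval
  - job_name: oracle_cdb_fast
    honor_labels: true
    static_configs:
      - targets: ['db-host:9161']      # placeholder target
  # slow job: hypothetical second exporter instance that loads only a separate
  # file (e.g. /opt/custom-metrics-cdb-slow.toml) containing index_unusable,
  # stale_stats_checks and the other expensive queries
  - job_name: oracle_cdb_slow
    honor_labels: true
    scrape_interval: 1h                # overrides the 60s global for this job only
    scrape_timeout: 5m                 # the DBA-view queries can be slow
    static_configs:
      - targets: ['db-host:9162']      # placeholder target

How the second instance is pointed at its own metrics file depends on the exporter build (for example a --custom.metrics style flag or a CUSTOM_METRICS environment variable), so treat that part as an assumption; the point is simply that the 60s job would never run the hourly/daily queries against the database.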
Looking forward to any suggestions on how to achieve this.
Thanks!