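I have 4 servers running collectd:
PVE (Proxmox VE) and AP1 (Zyxel AP with OpenWRT) run collectd and write to ROUTER.
ROUTER collects its own metrics, acts as a collectd proxy, and sends everything on to SERVER.
PVE and AP1 are configured identically: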
Hostname "pve or ap1"
FQDNLookup false
Interval 5
LoadPlugin network
LoadPlugin cpu
LoadPlugin memory
LoadPlugin uptime
<Plugin "network">
<Server "ip-of-router" "25826">
ResolveInterval 30
</Server>
ReportStats false
</Plugin>
<Plugin "cpu">
ReportByState true
ReportByCpu false
ValuesPercentage false
ReportNumCpu true
</Plugin>
The router is configured like this:
Hostname "router"
FQDNLookup false
Interval 5
LoadPlugin network
LoadPlugin cpu
LoadPlugin memory
LoadPlugin interface
LoadPlugin uptime
<Plugin "network">
<Listen "0.0.0.0" "25826">
</Listen>
<Server "ip-of-collectd-server" "25826">
Interface wgcli_hub
ResolveInterval 30
</Server>
Forward true
ReportStats true
</Plugin>
<Plugin "cpu">
ReportByState true
ReportByCpu false
ValuesPercentage false
ReportNumCpu true
</Plugin>
<Plugin "interface">
Interface eth1
Interface eth2
IgnoreSelected false
ReportInactive true
</Plugin>
The data flows fine this way, but in the router log I get "Value too old" errors for pve and ap1:
Mon Mar 13 19:44:04 2023 daemon.err collectd[2616]: uc_update: Value too old: name = pve/cpufreq-2/cpufreq; value time = 1678725834.442; last cache update = 1678725839.442;
Mon Mar 13 19:44:59 2023 daemon.err collectd[2616]: uc_update: Value too old: name = ap1/memory/memory-buffered; value time = 1678725889.720; last cache update = 1678725894.720;
Mon Mar 13 19:44:59 2023 daemon.err collectd[2616]: uc_update: Value too old: name = ap1/cpu/percent-interrupt; value time = 1678725889.721; last cache update = 1678725894.721;
Mon Mar 13 19:45:49 2023 daemon.err collectd[2616]: uc_update: Value too old: name = pve/cpu/percent-wait; value time = 1678725944.382; last cache update = 1678725949.382;
Mon Mar 13 19:45:49 2023 daemon.err collectd[2616]: uc_update: Value too old: name = pve/cpu/percent-nice; value time = 1678725944.382; last cache update = 1678725949.382;
Mon Mar 13 19:46:59 2023 daemon.err collectd[2616]: uc_update: Value too old: name = ap1/cpu/percent-wait; value time = 1678726009.721; last cache update = 1678726014.721;
Mon Mar 13 19:46:59 2023 daemon.err collectd[2616]: uc_update: Value too old: name = ap1/memory/memory-slab_recl; value time = 1678726009.720; last cache update = 1678726014.720;
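I have double-checked: there is no second collectd process on ap1 or pve, no other collectd instance is sending data under the same hostname, and the network plugin is loaded only once.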
I noticed that "value time" is always exactly 5 seconds before "last cache update", and 5 seconds is the collectd interval.
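To make the 5-second pattern easy to verify across the whole log, here is a minimal Python sketch that extracts both timestamps from a uc_update line and prints their difference. The names PATTERN and lag_seconds are my own for illustration, and the regular expression assumes the message format shown in the log excerpt above.

```python
import re

# Assumed format: the "uc_update: Value too old" lines from the router log.
PATTERN = re.compile(
    r"name = (?P<name>\S+); "
    r"value time = (?P<value>\d+\.\d+); "
    r"last cache update = (?P<cache>\d+\.\d+);"
)


def lag_seconds(line):
    """Return (metric name, last cache update - value time), or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    delta = float(m.group("cache")) - float(m.group("value"))
    # Round to milliseconds, the precision collectd prints in the log.
    return m.group("name"), round(delta, 3)


sample = (
    "Mon Mar 13 19:44:04 2023 daemon.err collectd[2616]: uc_update: "
    "Value too old: name = pve/cpufreq-2/cpufreq; "
    "value time = 1678725834.442; last cache update = 1678725839.442;"
)
print(lag_seconds(sample))  # ('pve/cpufreq-2/cpufreq', 5.0)
```

Running this over every error line should show whether the difference is always exactly one Interval, or whether some lines deviate.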
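I also noticed that it happens regularly, but with a different metric each time (and a different host as well, although ap1 is not included in this excerpt):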
18:30:40 <..> pve/memory/memory-free; value time = 1678721430.332; last cache update = 1678721435.332;
18:31:00 <..> pve/cpu/percent-softirq; value time = 1678721455.332; last cache update = 1678721460.332;
18:34:10 <..> pve/cpu/percent-nice; value time = 1678721645.332; last cache update = 1678721650.332;
18:34:30 <..> pve/cpu/percent-idle; value time = 1678721665.332; last cache update = 1678721670.332;
18:34:30 <..> pve/cpu/percent-wait; value time = 1678721665.332; last cache update = 1678721670.332;
18:36:15 <..> pve/memory/memory-free; value time = 1678721765.332; last cache update = 1678721770.332;
18:36:15 <..> pve/sensors-coretemp-isa-0000/temperature-temp1; value time = 1678721765.333; last cache update = 1678721770.333;
18:36:35 <..> pve/cpu/count; value time = 1678721790.332; last cache update = 1678721795.332;
18:40:05 <..> pve/memory/memory-used; value time = 1678722000.332; last cache update = 1678722005.332;
18:40:05 <..> pve/cpu/percent-idle; value time = 1678722000.332; last cache update = 1678722005.332;
18:42:30 <..> pve/memory/memory-slab_unrecl; value time = 1678722145.332; last cache update = 1678722150.332;
18:42:30 <..> pve/memory/memory-free; value time = 1678722145.332; last cache update = 1678722150.332;
18:43:00 <..> pve/memory/memory-used; value time = 1678722175.332; last cache update = 1678722180.332;
18:44:35 <..> pve/cpu/percent-steal; value time = 1678722270.332; last cache update = 1678722275.332;
18:44:50 <..> pve/memory/memory-used; value time = 1678722285.332; last cache update = 1678722290.332;
18:44:50 <..> pve/memory/memory-free; value time = 1678722285.332; last cache update = 1678722290.332;
18:46:25 <..> pve/memory/memory-slab_unrecl; value time = 1678722380.332; last cache update = 1678722385.332;
18:47:10 <..> pve/cpu/count; value time = 1678722425.332; last cache update = 1678722430.332;
18:47:15 <..> pve/cpufreq-1/cpufreq; value time = 1678722430.374; last cache update = 1678722435.374;
18:49:05 <..> pve/memory/memory-used; value time = 1678722540.332; last cache update = 1678722545.332;
18:50:40 <..> pve/memory/memory-buffered; value time = 1678722635.332; last cache update = 1678722640.332;
18:54:45 <..> pve/memory/memory-slab_recl; value time = 1678722875.332; last cache update = 1678722880.332;
19:01:05 <..> pve/cpufreq-1/cpufreq; value time = 1678723255.374; last cache update = 1678723260.374;
19:07:10 <..> pve/cpu/percent-softirq; value time = 1678723625.332; last cache update = 1678723630.332;
19:08:00 <..> pve/cpu/percent-user; value time = 1678723675.332; last cache update = 1678723680.332;
19:08:20 <..> pve/memory/memory-slab_recl; value time = 1678723695.332; last cache update = 1678723700.332;
19:08:20 <..> pve/memory/memory-cached; value time = 1678723695.332; last cache update = 1678723700.332;
19:14:00 <..> pve/uptime/uptime; value time = 1678724030.335; last cache update = 1678724035.335;
19:14:00 <..> pve/cpufreq-0/cpufreq; value time = 1678724030.354; last cache update = 1678724035.354;
19:15:50 <..> pve/uptime/uptime; value time = 1678724140.335; last cache update = 1678724145.335;
19:15:50 <..> pve/sensors-coretemp-isa-0000/temperature-temp1; value time = 1678724140.333; last cache update = 1678724145.333;
19:16:55 <..> pve/cpufreq-2/cpufreq; value time = 1678724205.394; last cache update = 1678724210.394;
19:20:05 <..> pve/cpu/percent-wait; value time = 1678724400.332; last cache update = 1678724405.332;
19:25:20 <..> pve/uptime/uptime; value time = 1678724710.335; last cache update = 1678724715.335;
19:25:20 <..> pve/cpufreq-1/cpufreq; value time = 1678724710.374; last cache update = 1678724715.374;
19:28:30 <..> pve/uptime/uptime; value time = 1678724900.335; last cache update = 1678724905.335;
19:28:30 <..> pve/cpufreq-0/cpufreq; value time = 1678724900.354; last cache update = 1678724905.354;
19:30:05 <..> pve/uptime/uptime; value time = 1678724995.335; last cache update = 1678725000.335;
19:30:55 <..> pve/cpufreq-2/cpufreq; value time = 1678725045.394; last cache update = 1678725050.394;
19:31:30 <..> pve/cpu/percent-wait; value time = 1678725085.332; last cache update = 1678725090.332;
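To check whether particular plugins are hit more often than others, the errors can be tallied per host and plugin. This is only a sketch: count_by_plugin is a hypothetical helper, and it assumes the metric identifier (host/plugin/type) is the last whitespace-separated token before the first semicolon, which holds for both the full and the shortened log lines above.

```python
from collections import Counter


def count_by_plugin(lines):
    """Count 'Value too old' errors per (host, plugin) pair."""
    counts = Counter()
    for line in lines:
        # Everything before the first ';' ends with the metric identifier.
        tokens = line.split(";", 1)[0].split()
        if not tokens:
            continue
        parts = tokens[-1].split("/")
        if len(parts) < 3:
            continue  # not a host/plugin/type identifier
        counts[(parts[0], parts[1])] += 1
    return counts


sample_lines = [
    "18:30:40 <..> pve/memory/memory-free; value time = 1678721430.332; last cache update = 1678721435.332;",
    "18:31:00 <..> pve/cpu/percent-softirq; value time = 1678721455.332; last cache update = 1678721460.332;",
    "18:36:15 <..> pve/memory/memory-free; value time = 1678721765.332; last cache update = 1678721770.332;",
]
for key, n in count_by_plugin(sample_lines).most_common():
    print(key, n)
```

If the counts come out roughly proportional to the number of metrics each plugin reports, that would suggest the errors are not tied to any specific plugin.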