I am using Graphite 0.9.10 to read ifHCInOctets and ifHCOutOctets, which I poll with collectd's snmp and write_graphite plugins; I am running collectd version 5.1.0.
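The data arrives in Graphite without problems; however, I want to graph it in bits per second. To test whether my stats were right, I started downloading a CD iso and watched the download rate... it varied between 1.0Mbps and 2.0Mbps.
Common sense says you need to multiply an octet counter by 8 to get bits; however, it seems that I need to divide by 8 to make Graphite display correctly.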
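When I multiply by a 0.125 scale factor, scale(scaleToSeconds(nonNegativeDerivative(<SERIES>), 60),0.125), the formula seems to convert to bits per second correctly, and I see numbers between 1Mbps and 2Mbps...
When I multiply by an 8.0 scale factor, scale(scaleToSeconds(nonNegativeDerivative(<SERIES>), 60),8), the results are obviously wrong... the graph peaks at 120Mbps. I know this is wrong, because this is a residential cable modem capped at 5Mbps.
Question: if I am sending octets to Graphite, why does scale(<foo>, 8) produce incorrect results?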
/opt/collectd/etc/collectd.conf:
LoadPlugin syslog
LoadPlugin cpu
LoadPlugin interface
LoadPlugin load
LoadPlugin memory
LoadPlugin network
LoadPlugin snmp
LoadPlugin write_graphite
<Plugin snmp>
  <Data "std_traffic">
    Type "if_octets"
    Table true
    Instance "IF-MIB::ifName"
    Values "IF-MIB::ifHCInOctets" "IF-MIB::ifHCOutOctets"
  </Data>
  <Host "fw.pennington.net">
    Address "172.16.1.1"
    Version 2
    Community "public"
    Collect "std_traffic"
    Interval 60
  </Host>
</Plugin>
<Plugin write_graphite>
  <Carbon>
    Host "localhost"
    Port "2003"
    Prefix ""
    Postfix ""
    StoreRates false
    AlwaysAppendDS false
    EscapeCharacter "_"
  </Carbon>
</Plugin>
/opt/graphite/conf/storage-schemas.conf:
[carbon]
pattern = ^carbon\.
retentions = 60s:90d
[default]
pattern = .*
retentions = 60s:1w, 5m:1y
/opt/graphite/conf/carbon.conf:
[cache]
USER = carbon
MAX_CACHE_SIZE = inf
MAX_UPDATES_PER_SECOND = 500
MAX_CREATES_PER_MINUTE = 50
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2003
ENABLE_UDP_LISTENER = False
UDP_RECEIVER_INTERFACE = 0.0.0.0
UDP_RECEIVER_PORT = 2003
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2004
USE_INSECURE_UNPICKLER = False
CACHE_QUERY_INTERFACE = 0.0.0.0
CACHE_QUERY_PORT = 7002
USE_FLOW_CONTROL = True
LOG_UPDATES = False
WHISPER_AUTOFLUSH = False
[relay]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2013
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2014
RELAY_METHOD = rules
REPLICATION_FACTOR = 1
DESTINATIONS = 127.0.0.1:2004
MAX_DATAPOINTS_PER_MESSAGE = 500
MAX_QUEUE_SIZE = 10000
USE_FLOW_CONTROL = True
[aggregator]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2023
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2024
DESTINATIONS = 127.0.0.1:2004
REPLICATION_FACTOR = 1
MAX_QUEUE_SIZE = 10000
USE_FLOW_CONTROL = True
MAX_DATAPOINTS_PER_MESSAGE = 500
MAX_AGGREGATION_INTERVALS = 5
Output from whisper-fetch.py:
root@tsunami:/opt/graphite/conf# python /usr/local/bin/whisper-fetch.py --pretty /opt/graphite/storage/whisper/fw_pennington_net/snmp/if_octets-Ethernet0_0/rx.wsp
Mon Sep 10 02:53:00 2012 110454375894.000000
...
Tue Sep 11 02:50:00 2012 110532796093.000000
Tue Sep 11 02:51:00 2012 110532819931.000000 <------------ Correct
Tue Sep 11 02:52:00 2012 None
root@tsunami:/opt/graphite/conf#
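As a sanity check on the stored data, the two consecutive samples shown above can be turned into a rate by hand. The sketch below assumes the 60-second step from the collectd Interval and the retention rule; the variable names are illustrative only.

```python
# Two consecutive counter samples copied from the whisper-fetch output above.
STEP_SECONDS = 60  # assumption: matches "Interval 60" and the 60s retention

prev_octets = 110532796093  # Tue Sep 11 02:50:00 2012
curr_octets = 110532819931  # Tue Sep 11 02:51:00 2012

delta_octets = curr_octets - prev_octets          # octets received during one step
octets_per_second = delta_octets / STEP_SECONDS   # per-step delta -> per-second rate
bits_per_second = octets_per_second * 8           # 1 octet = 8 bits

print(delta_octets)               # 23838
print(round(bits_per_second, 1))  # 3178.4
```

Note that the division by the step comes before the multiplication by 8; skipping it leaves a per-minute figure.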
Output from show interface eth0/0:
mpenning-fw# sh int eth0/0
Interface Ethernet0/0 "", is up, line protocol is up
Hardware is 88E6095, BW 100 Mbps, DLY 100 usec
Auto-Duplex(Full-duplex), Auto-Speed(100 Mbps)
Description: TIME WARNER 5Mbps circuit
Available but not configured via nameif
MAC address 0019.0726.4a39, MTU not set
IP address unassigned
157040376 packets input, 110532814004 bytes, 0 no buffer
^^^^^^^^^^^^^^^^^^
Received 68921847 broadcasts, 0 runts, 0 giants
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
0 L2 decode drops
8589974681 switch ingress policy drops
57851429 packets output, 8036229250 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 babbles, 0 late collisions, 0 deferred
0 lost carrier, 0 no carrier
0 rate limit drops
0 switch egress policy drops
mpenning-fw#
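Answer 1
If you want bits per second, you need to pass 1 rather than 60 as the second argument to scaleToSeconds(), because you want bits per second, not bits per minute. (Mnemonic: it scales to seconds, not from seconds :) With 60-second datapoints, scaleToSeconds(..., 60) leaves each per-minute delta unchanged, so scale(..., 8) yields bits per minute, which is 60 times too high; scale(..., 0.125) only looks right because 1/8 happens to be close to 8/60.
Here is the original patch that implemented the function; it may clarify.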
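To make the factor-of-60 effect concrete, here is a minimal sketch in plain Python. The three helpers are simplified stand-ins written for this illustration, not Graphite's own code; they assume scaleToSeconds multiplies each value by seconds / step, and the 2 Mbit/s input series is made up.

```python
def non_negative_derivative(values):
    """Per-step deltas of a counter; None where the delta is unknown or negative."""
    out = [None]
    for prev, curr in zip(values, values[1:]):
        if prev is None or curr is None or curr < prev:
            out.append(None)
        else:
            out.append(curr - prev)
    return out


def scale_to_seconds(values, step, seconds):
    """Rescale per-step values to 'per <seconds> seconds' (factor = seconds / step)."""
    factor = seconds / step
    return [None if v is None else v * factor for v in values]


def scale(values, factor):
    """Multiply every known value by a constant."""
    return [None if v is None else v * factor for v in values]


STEP = 60                               # seconds between datapoints
TRUE_BPS = 2_000_000                    # hypothetical steady 2 Mbit/s download
octets_per_step = TRUE_BPS // 8 * STEP  # 15,000,000 octets counted per minute
counter = [i * octets_per_step for i in range(4)]

deltas = non_negative_derivative(counter)

wrong_times_8 = scale(scale_to_seconds(deltas, STEP, 60), 8)[-1]
lucky_times_0125 = scale(scale_to_seconds(deltas, STEP, 60), 0.125)[-1]
right = scale(scale_to_seconds(deltas, STEP, 1), 8)[-1]

print(wrong_times_8)      # 120000000.0 -> bits per MINUTE, reads as "120 Mbps"
print(lucky_times_0125)   # 1875000.0   -> close to 2 Mbps only by coincidence
print(round(right))       # 2000000     -> actual bits per second
```

Under these assumptions a true 2 Mbit/s stream reproduces the 120Mbps peak from the question, and the 0.125 variant lands within about 6% of the true value.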
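Answer 2
I spent hours trying to get this working properly in Grafana v2.6 and could not find a proper solution, so here it is:
- Make sure you have an entry like the following in /etc/carbon/storage-aggregation.conf (note: pattern/retentions entries are normally read from storage-schemas.conf). All my network devices are prefixed with net:
[net]
pattern = ^net.*
retentions = 10s:7d,5m:180d,5m:5y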
- Create /etc/collectd/collectd.conf.d/snmp.conf and add the following:
FQDNLookup true
AutoLoadPlugin true
Interval 10

LoadPlugin syslog
<Plugin syslog>
  LogLevel info
</Plugin>

LoadPlugin contextswitch
LoadPlugin cpu
LoadPlugin df
LoadPlugin entropy
LoadPlugin interface
LoadPlugin irq
LoadPlugin load
LoadPlugin memory
LoadPlugin processes
#LoadPlugin sensors
LoadPlugin swap
LoadPlugin unixsock
LoadPlugin users
LoadPlugin write_graphite

<Plugin write_graphite>
  <Node "stats">
    Host "stats.foo.com"
    Port "2003"
    Protocol "tcp"
    LogSendErrors true
    Prefix "net."
    SeparateInstances true
    StoreRates true
    AlwaysAppendDS false
    EscapeCharacter "_"
  </Node>
</Plugin>

<Plugin unixsock>
  SocketFile "/var/run/collectd-unixsock"
  SocketGroup "adm"
  SocketPerms "0660"
</Plugin>

<Plugin df>
  # ignore rootfs; else, the root file-system would appear twice, causing
  # one of the updates to fail and spam the log
  FSType rootfs
  # ignore the usual virtual / temporary file-systems
  FSType sysfs
  FSType proc
  FSType devtmpfs
  FSType devpts
  FSType tmpfs
  FSType fusectl
  FSType cgroup
  IgnoreSelected true
</Plugin>

# added a special types.db for cisco devices
TypesDB "/usr/share/collectd/types.db" "/usr/share/collectd/types.custom.db"

LoadPlugin snmp
<Plugin snmp>
  # the <Data> name is what to <Collect> down in the <Host> blocks
  # the "Type" must be from the list included in /usr/share/collectd/types.db (or custom.db)
  <Data "if_octets">
    Type "if_octets"
    Table true
    # Note: I use ifAlias so that it shows the Interface Descriptions instead of just "GigabitEthernet_1_0_0", etc.
    # But of course, make sure you have interface descriptions if you use this :)
    Instance "IF-MIB::ifAlias"
    Values "IF-MIB::ifHCInOctets" "IF-MIB::ifHCOutOctets"
  </Data>
  <Data "if_errors">
    Type "if_errors"
    Table true
    Instance "IF-MIB::ifAlias"
    Values "IF-MIB::ifInErrors" "IF-MIB::ifOutErrors"
  </Data>
  <Data "cisco_cpu">
    Type "cisco_cpu"
    Table true
    Values "CISCO-PROCESS-MIB::cpmCPUTotal5secRev" "CISCO-PROCESS-MIB::cpmCPUTotal1minRev" "CISCO-PROCESS-MIB::cpmCPUTotal5minRev"
  </Data>
  <Data "uptime">
    Type "uptime"
    Table false
    Instance "Uptime"
    scale 0.01
    Values "DISMAN-EVENT-MIB::sysUpTimeInstance"
  </Data>
  <Data "memory_free">
    Type "memory_free"
    Table true
    Instance "CISCO-MEMORY-POOL-MIB::ciscoMemoryPoolName"
    Values "CISCO-MEMORY-POOL-MIB::ciscoMemoryPoolFree"
  </Data>
  <Data "memory_used">
    Type "memory_used"
    Table true
    Instance "CISCO-MEMORY-POOL-MIB::ciscoMemoryPoolName"
    Values "CISCO-MEMORY-POOL-MIB::ciscoMemoryPoolUsed"
  </Data>

  # Hosts:
  <Host "rtr">
    Address "192.168.1.1"
    Version 2
    Community "public"
    Collect "if_octets" "cisco_cpu" "uptime"
    Interval 10
  </Host>
  <Host "switch">
    Address "192.168.1.254"
    Version 2
    Community "public"
    Collect "if_octets" "cisco_cpu" "uptime"
    Interval 10
  </Host>
</Plugin>
Here is types.custom.db (I don't remember where I got the original, but thanks to that guy anyway!):
if_stats ifHCInOctets:COUNTER:0:U, ifHCOutOctets:COUNTER:0:U, ifHCInUcastPkts:COUNTER:0:U, ifHCInMulticastPkts:COUNTER:0:U, ifHCInBroadcastPkts:COUNTER:0:U, ifHCOutUcastPkts:COUNTER:0:U, ifHCOutMulticastPkts:COUNTER:0:U, ifHCOutBroadcastPkts:COUNTER:0:U, ifInDiscards:COUNTER:0:U, ifInErrors:COUNTER:0:U, ifOutDiscards:COUNTER:0:U, ifOutErrors:COUNTER:0:U
if_octets_hc ifHCInOctets:COUNTER:0:U, ifHCOutOctets:COUNTER:0:U
if_packets_hc ifHCInUcastPkts:COUNTER:0:U, ifHCInMcastPkts:COUNTER:0:U, ifHCInBcastPkts:COUNTER:0:U, ifHCOutUcastPkts:COUNTER:0:U, ifHCOutMcastPkts:COUNTER:0:U, ifHCOutBcastPkts:COUNTER:0:U
if_drop_discard_err_que ifInDiscards:COUNTER:0:U, ifInErrors:COUNTER:0:U, ifOutDiscards:COUNTER:0:U, ifOutErrors:COUNTER:0:U
if_rgpackets ifInUcastPkts:COUNTER:0:U, ifInNUcastPkts:COUNTER:0:U, ifOutUcastPkts:COUNTER:0:U, ifOutNUcastPkts:COUNTER:0:U
sensors sensorValue:GAUGE:U:U, sensorThreshold:GAUGE:U:U
uptime uptime:GAUGE:U:U
cisco_cpu cpu5sec:GAUGE:0:100, cpu1min:GAUGE:0:100, cpu5min:GAUGE:0:100
routes ipv4routes:GAUGE:0:U, ipv6routes:GAUGE:0:U, mcastroutes:GAUGE:0:U
ipsla rttAdmNumDistBkt:GAUGE:0:200, rttAdmDistInt:GAUGE:0:200, rttTotalsInit:COUNTER:0:U, rttCollectDrops:COUNTER:0:U, rttCollectTimeouts:COUNTER:0:U, rttCptComplTimeMn:GAUGE:0:100000, rttCptComplTimeMx:GAUGE:0:100000, rttCptSumCmpTm2Hi:COUNTER:0:U, rttCptSumCmpTm2Lo:COUNTER:0:U, rttCptSumCmpTm:COUNTER:0:U, rttCptOverThres:COUNTER:0:U
ipslaminimal rttCptCompletions:COUNTER:0:U
ipsla2 rttCollectTimeouts:COUNTER:0:U
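Each line in the listing above follows the types.db layout: a type name, then one or more data-source specs of the form name:type:min:max, where U means unbounded. As a rough illustration, the Python sketch below parses one such line; parse_types_line and DataSource are hypothetical helper names made up for this example, not part of collectd.

```python
from typing import NamedTuple, Optional


class DataSource(NamedTuple):
    name: str
    kind: str                  # e.g. COUNTER or GAUGE
    minimum: Optional[float]   # None means "U" (unbounded)
    maximum: Optional[float]


def _bound(text):
    """Convert a min/max field; 'U' marks an unbounded limit."""
    return None if text == "U" else float(text)


def parse_types_line(line):
    """Split 'type ds:kind:min:max, ds:kind:min:max' into (type, [DataSource, ...])."""
    type_name, spec = line.split(None, 1)
    sources = []
    # Specs are separated by spaces and, optionally, commas.
    for field in spec.replace(",", " ").split():
        name, kind, low, high = field.split(":")
        sources.append(DataSource(name, kind, _bound(low), _bound(high)))
    return type_name, sources


type_name, sources = parse_types_line(
    "cisco_cpu cpu5sec:GAUGE:0:100, cpu1min:GAUGE:0:100, cpu5min:GAUGE:0:100"
)
print(type_name)                  # cisco_cpu
print([s.name for s in sources])  # ['cpu5sec', 'cpu1min', 'cpu5min']
```

The number of data sources in a type has to match the number of Values listed in the corresponding <Data> block, which is why cisco_cpu carries three.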
If anyone has proper configs (and a types.custom.db) for tracking other Cisco metrics such as power, duplex, fans, and (especially) NBAR, please share!
In Grafana, configure the graph like this:
alias(scale(scaleToSeconds(net.rtr.snmp.if_octets.RTR-Outside-Gi0_0.rx, 0.125), 3600), 'Download')
You will need to replace net.rtr.snmp.if_octets.RTR-Outside-Gi0_0 with your own device name and ifAlias.
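Answer 3
I am running Graphite 0.9.9, and scaleToSeconds is not available to me. To work around this, you need to understand the metric in detail... Taking my case as an example:
First, the Y values are in millions, not Mbps. You can verify this by setting yUnitSystem=none in the graph URL. Second, an octet is 8 bits of data, i.e. one byte. My peak of 2000000000 bytes (octets) is a per-minute figure, so to make better sense of it, let's do the math (using binary prefixes, 1 MByte = 2^20 bytes):
2000000000B/60s ≈ 33333333B/s ≈ 32 MBytes/s ≈ 254 Mbits/s
254 Mbps on my Gigabit Ethernet (1000 Mbps) interface is well within its capacity. Hope this helps.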
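The same calculation can be reproduced in a few lines of Python. This is only a sketch of the arithmetic above; it assumes the peak is a per-minute delta, and it also shows the decimal (SI) figure for comparison, which the answer itself does not quote.

```python
octets_per_minute = 2_000_000_000   # peak per-minute value from the graph
octets_per_second = octets_per_minute / 60

# Binary prefixes, as the rounded figures in the answer imply (1 MiB = 2**20 bytes).
mib_per_second = octets_per_second / 2**20
mibit_per_second = mib_per_second * 8

# Decimal prefixes, the convention usually used for link speeds (1 Mbit = 10**6 bits).
mbit_per_second = octets_per_second * 8 / 1e6

print(round(octets_per_second))     # 33333333
print(round(mib_per_second, 1))     # 31.8
print(round(mibit_per_second, 1))   # 254.3
print(round(mbit_per_second, 1))    # 266.7
```

Either way the result is well under the 1000 Mbps line rate of the interface.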