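I am using collectd to collect system metrics and push them to InfluxDB.
To monitor running processes, I wrote a script that uses the collectd exec plugin to push my custom metrics to InfluxDB. My exec plugin script is: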
#!/bin/bash
tmpfile=$(mktemp)
HOSTNAME="${COLLECTD_HOSTNAME:-slave1-collectd}"
INTERVAL="${COLLECTD_INTERVAL:-6}"
while sleep "$INTERVAL"; do
sudo systemctl list-units --type service --all | grep running | awk -v OFS='\t' '{ print $1, $2, $4 }' > "$tmpfile"
done
My collectd conf file:
<Plugin exec>
Exec developer "/home/developer/process.sh"
</Plugin>
If I run SHOW MEASUREMENTS in InfluxDB, it lists the measurements from the other plugins but nothing from the exec plugin:
cpu_value
df_value
disk_io_time
disk_read
disk_value
disk_weighted_io_time
disk_write
interface_rx
interface_tx
load_longterm
load_midterm
load_shortterm
memory_value
processes_majflt
processes_minflt
processes_processes
processes_read
processes_rx
processes_syst
processes_threads
processes_tx
processes_user
processes_value
processes_write
table_value
uptime_value
users_value
Can anyone help me, please?
My end goal is to monitor all running services, as shown below:
[developer@slave1-collectd ~]$ systemctl list-units --type service --state=running
UNIT LOAD ACTIVE SUB DESCRIPTION
amazon-ssm-agent.service loaded active running amazon-ssm-agent
auditd.service loaded active running Security Auditing Service
chronyd.service loaded active running NTP client/server
collectd.service loaded active running Collectd statistics daemon
crond.service loaded active running Command Scheduler
dbus.service loaded active running D-Bus System Message Bus
getty@tty1.service loaded active running Getty on tty1
gssproxy.service loaded active running GSSAPI Proxy Daemon
httpd.service loaded active running The Apache HTTP Server
network.service loaded active running LSB: Bring up/down networking
polkit.service loaded active running Authorization Manager
postfix.service loaded active running Postfix Mail Transport Agent
rpcbind.service loaded active running RPC bind service
rsyslog.service loaded active running System Logging Service
serial-getty@ttyS0.service loaded active running Serial Getty on ttyS0
sshd.service loaded active running OpenSSH server daemon
systemd-journald.service loaded active running Journal Service
systemd-logind.service loaded active running Login Service
systemd-udevd.service loaded active running udev Kernel Device Manager
tuned.service loaded active running Dynamic System Tuning Daemon
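I want to push every running process along with its unit name and state, for example sshd, kafka, tsdb, collectd and grafana (I want to monitor all of these systemctl services on a Grafana dashboard). I also tried this script: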
#!/bin/bash
HOSTNAME="${COLLECTD_HOSTNAME:-`hostname -f`}"
INTERVAL="${COLLECTD_INTERVAL:-10}"
PORT=6379
while sleep "$INTERVAL"
do
b=$(systemctl list-units --type service --all | awk 'BEGIN{print "Service State Status"};$4 ~ /^running$/{print $1,$2,$4}' | column -t )
echo "PUTVAL $HOSTNAME/vs_processes/if_octets interval=$INTERVAL N:$b"
done
When I start collectd I get this error: parse_value: Failed to parse string as derive: "Service".
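The error is consistent with how PUTVAL works: the if_octets type is defined in types.db with two DERIVE data sources, so everything after "N:" has to be numbers, while the script above puts the whole text table (starting with the header word "Service") there. One possible way around it, sketched below, is to emit one numeric gauge per running service and carry the service name in the identifier. The plugin name systemd_services, the function name services_to_putval, and the convention "1 means running" are assumptions made for this example, not something collectd prescribes.

```shell
#!/bin/bash
# Sketch: turn `systemctl list-units` rows into collectd PUTVAL lines.
# Assumptions (hypothetical, chosen for this example only):
#   - plugin name "systemd_services"
#   - type "gauge" with value 1 meaning "this unit is running"

host="${COLLECTD_HOSTNAME:-$(hostname -f 2>/dev/null || echo localhost)}"
interval="${COLLECTD_INTERVAL:-10}"

# Reads list-units rows on stdin (UNIT LOAD ACTIVE SUB DESCRIPTION)
# and prints one PUTVAL line for every row whose SUB column is "running".
services_to_putval() {
  awk -v host="$host" -v interval="$interval" '
    $4 == "running" {
      unit = $1
      sub(/\.service$/, "", unit)        # drop the ".service" suffix
      gsub(/[^A-Za-z0-9_]/, "_", unit)   # replace "@", "-", "." etc. with "_"
      printf "PUTVAL \"%s/systemd_services/gauge-%s\" interval=%s N:1\n", host, unit, interval
    }'
}
```

Inside the exec script this would be called from the existing `while sleep "$INTERVAL"` loop, for example as `systemctl list-units --type=service --state=running --no-legend --plain | services_to_putval`. Note that the lines have to go to standard output; the first script writes to a temporary file and never prints a PUTVAL line, which would explain why no exec measurement shows up in InfluxDB.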