I want to run a custom script when a kernel error occurs. Is this possible? If so, how?
Answer 1
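My answer would be that you can install rsyslog and run the script via its Shell Execute operator (^program-to-execute;template). However, this probably won't work: after a kernel error the system will most likely be unresponsive and will not be able to run the custom script.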
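So I suggest running the script on another server when a kernel oops occurs. For example:
On the server that will eventually produce the kernel oops, use the netconsole module.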
    # /etc/modprobe.d/netconsole.conf
    # This example assumes 10.0.0.1 as the "bad" server and 10.0.0.2 as the "monitor" server
    # Format: netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
    # The source port is left empty here, so the module default applies
    options netconsole netconsole=@10.0.0.1/eth0,30514@10.0.0.2/01:23:45:67:89:AB
    options netconsole oops_only=1

    # /etc/modules-load.d/netconsole.conf
    # Tells 'systemd-modules-load' to load 'netconsole' automatically at boot
    netconsole
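The value of the netconsole parameter packs several fields into one string, which is easy to get wrong. As a minimal sketch, the helper below (the name `netconsole_param` is hypothetical, not something netconsole provides) assembles that string from its parts, leaving the source port empty so the module default is used. It can be handy for trying the setting once with `modprobe` before making it persistent.

```shell
#!/bin/bash
# Hypothetical helper (not part of netconsole): assemble the value of the
# 'netconsole=' module parameter from its parts.
# Format: [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
netconsole_param() {
    local src_ip="$1" dev="$2" tgt_port="$3" tgt_ip="$4" tgt_mac="$5"
    # The source port is left empty so the module's default is used.
    printf '@%s/%s,%s@%s/%s\n' "$src_ip" "$dev" "$tgt_port" "$tgt_ip" "$tgt_mac"
}

# Example (as root on the "bad" server), using the addresses assumed in this answer:
#   modprobe netconsole \
#     "netconsole=$(netconsole_param 10.0.0.1 eth0 30514 10.0.0.2 01:23:45:67:89:AB)" \
#     oops_only=1
```

If the module loads without errors, the same string can go into the modprobe.d file.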
On the monitor server (the one that receives the kernel oops), run the custom script through rsyslog.

    # /etc/rsyslog.d/kernel-oops-handler.conf
    module(load="imudp")
    input(type="imudp" port="30514" ruleset="KernelOopsRuleSet")

    # This aims to supply the IP address of the "bad" server on the command line
    template(name="KernelOopsArgs" type="string" string="%fromhost-ip%")

    ruleset(name="KernelOopsRuleSet") {
        # This assumes that the '--[ cut here ]--' string is evidence of a kernel oops
        if ($msg contains "------------[ cut here ]------------") then {
            kern.crit ^/path/to/custom/script.sh;KernelOopsArgs
        }
    }
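Before relying on this, it may be worth checking that datagrams actually reach the monitor. The sketch below is one way to do that, assuming bash with its built-in /dev/udp redirection; the helper name `send_test_marker` is hypothetical. A datagram sent this way carries no syslog priority, so whether it also passes the kern.crit selector depends on the priority rsyslog assigns to it. Treat it as a connectivity check rather than a full end-to-end test.

```shell
#!/bin/bash
# The marker string the ruleset looks for in incoming messages.
OOPS_MARKER='------------[ cut here ]------------'

# Hypothetical helper: send one UDP datagram carrying the marker to the monitor.
# Relies on bash's /dev/udp redirection, so it needs bash rather than plain sh.
send_test_marker() {
    local host="$1" port="${2:-30514}"
    printf '%s\n' "$OOPS_MARKER" > "/dev/udp/${host}/${port}"
}

# On the monitor: check the configuration syntax, then restart rsyslog.
#   rsyslogd -N1 && systemctl restart rsyslog
# From another host: send the marker to the monitor assumed in this answer.
#   send_test_marker 10.0.0.2 30514
```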
The custom script can then reboot the machine through its out-of-band management interface (iDRAC on Dell servers):
    #!/bin/bash
    # /path/to/custom/script.sh

    # A successful SSH connection to the host indicates the server is still responsive
    sleep 3
    server="${1}"
    if ! ssh -n -o ConnectTimeout=10 -o ControlPath=none "${server}" true; then
        # Let me suppose 10.100.0.1 is the iDRAC IP address of a server whose IP is 10.0.0.1
        idrac="$(echo "${server}" | sed 's/^10\.0\./10.100./')"
        # Trigger a forced reboot using 'ipmitool'
        ipmitool -H "${idrac}" -U root -P root chassis power reset
        # Notify administrators (placeholder address, replace with your own)
        mail -s "Server '${server}' was restarted!" admin@example.com < /dev/null
    fi
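One detail worth noting: if the address passed in does not start with 10.0., the sed substitution leaves it unchanged, and ipmitool would then be pointed at the server's own address. A minimal sketch of a stricter mapping, assuming the same 10.0.x.y to 10.100.x.y convention (the function name `idrac_for` is hypothetical):

```shell
#!/bin/bash
# Hypothetical helper: derive the iDRAC address from a server address,
# assuming the 10.0.x.y -> 10.100.x.y convention used in this answer.
# Returns non-zero (and prints nothing on stdout) for any other address.
idrac_for() {
    local server="$1"
    case "$server" in
        10.0.*) printf '10.100.%s\n' "${server#10.0.}" ;;
        *)      echo "no iDRAC mapping for '${server}'" >&2; return 1 ;;
    esac
}

# Possible use inside the script:
#   idrac="$(idrac_for "${server}")" || exit 1
```

Failing early here means an unexpected sender address never results in a power reset being issued to the wrong target.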