How do I run a script when a kernel error (oops) occurs?

I want to run a custom script when a kernel error occurs. Is this possible? If so, how?

Answer 1

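You could install rsyslog and run the script through its Shell Execute action ( ^program-to-execute;template ). However, this may not work: after a kernel oops the affected system will most likely be unresponsive and unable to run a custom script.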

So I suggest running the script on another server when the kernel oops occurs. For example:

  1. On the server that may eventually produce the kernel oops, use the netconsole module:

    # /etc/modprobe.d/netconsole.conf
    # This example assumes 10.0.0.1 as the "bad" server and 10.0.0.2 as the "monitor" server
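    # Syntax: netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-mac]
    # Source port left at its default; target port 30514 matches the rsyslog input on the monitor
    options netconsole netconsole=@10.0.0.1/eth0,30514@10.0.0.2/01:23:45:67:89:AB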
    options netconsole oops_only=1
    
    
    # /etc/modules-load.d/netconsole.conf
    # Tells 'systemd-modules-load' to load 'netconsole' automatically at boot
    netconsole
    
  2. On the monitor server (the one that receives the kernel oops messages), run the custom script through rsyslog:

    # /etc/rsyslog.d/kernel-oops-handler.conf
    
    module(load="imudp")
    
    input(type="imudp" 
        port="30514"
        ruleset="KernelOopsRuleSet")
    
    # This aims to supply the IP address of the "bad" server in command line
    template(name="KernelOopsArgs"
        type="string"
        string="%fromhost-ip%")
    
    ruleset(name="KernelOopsRuleSet") {
        # This assumes that the '--[ cut here ]--' string is evidence of a kernel oops
        if ($msg contains "------------[ cut here ]------------") then {
            kern.crit ^/path/to/custom/script.sh;KernelOopsArgs
        }
    }
    
  3. The custom script can reboot the machine through an out-of-band management interface (iDRAC on Dell servers):

    #!/bin/bash
    # /path/to/custom/script.sh
    # A successful SSH to the host indicates the server is still responsive
    sleep 3
    server="${1}"
    if ! ssh -n -o ConnectTimeout=10 -o ControlPath=none "${server}" true; then
        # Let me suppose 10.100.0.1 is the iDRAC IP address of a server whose IP is 10.0.0.1
        idrac="$(echo "${server}" | sed 's/^10\.0\./10.100./')"
        # Trigger a forced reboot using 'ipmitool'
        ipmitool -H "${idrac}" -U root -P root chassis power reset
        # Notify administrators
        mail -s "Server '${server}' was restarted!" [email protected] < /dev/null
    fi
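
The iDRAC address mapping in step 3 can also be done without spawning sed, and it may be worth rejecting addresses outside the expected range before calling ipmitool. A minimal sketch, assuming the same 10.0.x.y to 10.100.x.y addressing convention that the script above supposes; the function name idrac_ip_for is hypothetical and only for illustration:

```shell
#!/bin/bash
# Hypothetical helper: map a server IP (10.0.x.y) to its iDRAC IP (10.100.x.y).
# The addressing convention is the assumption made in the script above.
idrac_ip_for() {
    local server="$1"
    case "${server}" in
        10.0.*)
            # Strip the leading "10.0." and prepend "10.100." instead
            printf '%s\n' "10.100.${server#10.0.}"
            ;;
        *)
            # Refuse to guess for addresses outside the assumed range
            echo "unexpected server address: ${server}" >&2
            return 1
            ;;
    esac
}
```

In the script this could be used as `idrac="$(idrac_ip_for "${server}")" || exit 1`, so that an unexpected sender address never reaches the power reset command.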
    

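To check that the monitor is reachable on the UDP port from step 2 without waiting for a real oops, a test datagram carrying the marker string can be sent from another machine. This is only a sketch: the helper name send_fake_oops is hypothetical, it relies on bash's built-in /dev/udp redirection, and whether the rsyslog rule actually fires depends on how rsyslog parses the raw datagram (for example its facility and severity), so it is best treated as a connectivity check.

```shell
#!/bin/bash
# Marker string that the rsyslog rule in step 2 matches on
OOPS_MARKER='------------[ cut here ]------------'

# Hypothetical helper: send one fake "oops" line to the monitor over UDP.
# Uses bash's /dev/udp pseudo-device, so no extra tools are required.
# $1 = monitor address, $2 = UDP port (defaults to 30514, as in step 2)
send_fake_oops() {
    local monitor="$1" port="${2:-30514}"
    printf '%s\n' "${OOPS_MARKER}" > "/dev/udp/${monitor}/${port}"
}
```

For example, `send_fake_oops 10.0.0.2` sends the marker to the monitor address assumed above. While testing, point the rsyslog action at a harmless script rather than the one that resets the server.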