我是 Sensu 修复的新手。我尝试使用自定义脚本重新启动进程。
场景:如果我的 http 链接断开,那么我想手动运行脚本并启动它。
我尝试了 sensu 上的修复,它允许在监控出现问题时自动运行脚本,并使用自定义脚本进行监控检查。然而,我面临的问题是,所有检查和连接都很好,但当我的链接断开时,sensu 修复程序不会触发客户端。我已经发布了日志和配置,请告诉我我哪里出错了……
这是 Sensu 服务器日志
{"timestamp":"2016-05-16T09:44:52.768622+0000","level":"info","message":"processing event","event":{"id":"9a9f66c2-e70e-45fb-87fb-c9e9085c8e05","client":{"name":"zubron","address":"10.0.0.110","subscriptions":["zubron"],"version":"0.20.3","timestamp":1463391880},"check":{"command":"/etc/sensu/plugins/check_http -H 10.0.0.110 -p 7077","interval":60,"occurrences":2,"handlers":["remediator"],"subscribers":["zubron"],"standalone":false,"remediation":{"remediate-zubron":{"occurrences":[1,3],"severities":[2]},"trigger_on":["zubron"]},"name":"check-zubron-port","issued":1463391892,"executed":1463391892,"duration":0.002,"output":"connect to address 10.0.0.110 and port 7077: Connection refused\nHTTP CRITICAL - Unable to open TCP socket\n","status":2,"history":["0","0","0","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2","2"],"total_state_change":4},"occurrences":18,"action":"create","timestamp":1463391892}}
{"timestamp":"2016-05-16T09:44:52.864908+0000","level":"info","message":"handler output","handler":{"command":"/etc/sensu/handlers/sensu.rb","type":"pipe","severities":["critical"],"name":"remediator"},"output":["/etc/sensu/handlers/sensu.rb:108:in `[]': can't convert String into Integer (TypeError)\n","\tfrom /etc/sensu/handlers/sensu.rb:108:in `block in parse_remediations'\n","\tfrom /etc/sensu/handlers/sensu.rb:106:in `each'\n","\tfrom /etc/sensu/handlers/sensu.rb:106:in `parse_remediations'\n","\tfrom /etc/sensu/handlers/sensu.rb:90:in `handle'\n","\tfrom /var/lib/gems/1.9.1/gems/sensu-plugin-1.2.0/lib/sensu-handler.rb:55:in `block in <class:Handler>'\n","REMEDIATION: Evaluating remediation: zubron {\"remediate-zubron\"=>{\"occurrences\"=>[1, 3], \"severities\"=>[2]}, \"trigger_on\"=>[\"zubron\"]} #=18 sev=2\n"]}
这是我在 Sensu-server 上的检查文件......
{
"checks": {
"check-zubron-port": {
"command": "/etc/sensu/plugins/check_http -H 10.0.0.110 -p 7077",
"interval": 60,
"occurrences": 2,
"handlers": [
"remediator"
],
"subscribers": [
"zubron"
],
"standalone": false,
"remediation": {
"remediate-zubron": {
"occurrences": [
1,
3
],
"severities": [
2
]
},
"trigger_on": [
"zubron"
]
}
}
}
}
这是我的补救文件...
{
"remediate-zubron": {
"command": "sudo /bin/bash ~/zubron/home/moofwd-zubron-server/bin/start-moofwd.sh",
"handlers": [],
"subscribers": [
"zubron"
],
"standalone": false,
"publish": false
}
}
我从这里使用了 Rest sensu.rb关联
我是否遗漏了什么?
如果出现任何问题,是否有其他监控系统可供我们运行脚本或命令?
我已经尝试过 nagios nectar 和 monit。
答案1
我发现您的补救文件有错误。您缺少“检查”键。
应该是这样的,
{
"checks": {
"remediate-zubron": {
"command": "sudo /bin/bash ~/zubron/home/moofwd-zubron-server/bin/start-moofwd.sh",
"handlers": [],
"subscribers": [
"zubron"
],
"standalone": false,
"publish": false
}
}
}
其他一些可能的问题是您的客户端配置应该订阅您的名称,即名称
university
应该university
在订阅者中订阅。{ "client": { "name": "university", "address": "IP ADDRESS", "subscriptions": [ "linux", "web-server", "system", "university" ] } }
另一个问题可能与 remedediator.rb 或 sensu.rb 中的 api_request(:POST) 有关,以下方法应与下面给出的完全相同。在某些旧代码中,它有 ,
'/checks/request'
而不是 ,'/request'
并导致中断。def trigger_remediation(check, subscribers) api_request(:POST, '/request') do |req| req.body = JSON.dump('check' => check, 'subscribers' => subscribers) end end
在某些情况下,您需要同时修复情况 2 和情况 3。
以下是参考链接。让您更清楚地了解该问题。
https://github.com/sensu/sensu-community-plugins/issues/1162