我刚刚在 CentOS 中使用 Apache 部署了我的网络服务器,我想知道是否有人对如何检查每个指定的时间量(如果服务器出现故障)有什么好的想法,然后我可以在发生这种情况时使用 postfix 给我发电子邮件,这样我就可以回去立即连接到我的服务器并修复问题并查看导致问题的原因。我猜想许多网站都会使用一些软件/脚本来让他们在客户开始抱怨问题之前知道他们的服务何时停止。
答案1
答案2
您可以在服务器中放置一个脚本,如果服务器处于活动状态,则可以在其中进行各种测试,如下所示:
#!/bin/bash
date;
echo "uptime:"
uptime
echo "Currently connected:"
w
echo "--------------------"
echo "Last logins:"
last -a |head -3
echo "--------------------"
echo "Disk and memory usage:"
df -h | xargs | awk '{print "Free/total disk: " $11 " / " $9}'
free -m | xargs | awk '{print "Free/total memory: " $17 " / " $8 " MB"}'
echo "--------------------"
start_log=`head -1 /var/log/messages |cut -c 1-12`
oom=`grep -ci kill /var/log/messages`
echo -n "OOM errors since $start_log :" $oom
echo ""
echo "--------------------"
echo "Utilization and most expensive processes:"
top -b |head -3
echo
top -b |head -10 |tail -4
echo "--------------------"
echo "Open TCP ports:"
nmap -p- -T4 127.0.0.1
echo "--------------------"
echo "Current connections:"
ss -s
echo "--------------------"
echo "processes:"
ps auxf --width=200
echo "--------------------"
echo "vmstat:"
vmstat 1 5
这会给你以下结果:
./Server-Health.sh
Tue Jul 16 22:01:06 IST 2013
uptime:
22:01:06 up 174 days, 4:42, 1 user, load average: 0.36, 0.25, 0.18
Currently connected:
22:01:06 up 174 days, 4:42, 1 user, load average: 0.36, 0.25, 0.18
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
tecmint pts/0 116.72.134.162 21:48 0.00s 0.03s 0.03s sshd: tecmint [priv]
--------------------
Last logins:
tecmint pts/0 Tue Jul 16 21:48 still logged in 116.72.134.162
tecmint pts/0 Tue Jul 16 21:24 - 21:43 (00:19) 116.72.134.162
--------------------
Disk and memory usage:
Free/total disk: 292G / 457G
Free/total memory: 3510 / 3838 MB
--------------------
OOM errors since Jul 14 03:37 : 0
--------------------
Utilization and most expensive processes:
top - 22:01:07 up 174 days, 4:42, 1 user, load average: 0.36, 0.25, 0.18
Tasks: 149 total, 1 running, 148 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1%us, 0.0%sy, 0.0%ni, 99.3%id, 0.6%wa, 0.0%hi, 0.0%si, 0.0%st
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 3788 1128 932 S 0.0 0.0 0:32.94 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root RT 0 0 0 0 S 0.0 0.0 0:14.07 migration/0