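So ping succeeded, but SSH and the other services (apache2, etc.) did not respond. I had to hard-reboot the server. Everything works again now: I can connect over SSH and the web server is up.
How can I debug this after the reboot? I am running Debian 7.10 (Wheezy) and have root access.
When I check /var/log/messages,
I see a gap between 14:36 and 14:53 (I did the hard reboot at 14:53). Here are the messages from 14:36: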
Dec 2 14:36:11 nsserver kernel: apache2 invoked oom-killer: gfp_mask=0x3000d0, order=2, oom_score_adj=0
Dec 2 14:36:16 nsserver kernel: apache2 cpuset=/ mems_allowed=0
Dec 2 14:36:16 nsserver kernel: CPU: 0 PID: 19242 Comm: apache2 Tainted: G W 3.14.32-xxxx-grs-ipv6-64 #7
Dec 2 14:36:16 nsserver kernel: Hardware name: OVH SP/DG31PR, BIOS PRG3110H.86A.0071.2010.0318.1704 03/18/2010
Dec 2 14:36:16 nsserver kernel: 0000000000000000 ffffc9000516baf0 ffffffff81efbbb8 0000000000000007
Dec 2 14:36:16 nsserver kernel: ffff880129678000 ffffc9000516bb90 ffffffff81ef504f ffffc9000516bb30
Dec 2 14:36:16 nsserver kernel: ffffffff81136cc7 0000000000000000 ffff8800abafde50 ffff8800abafde68
Dec 2 14:36:16 nsserver kernel: Call Trace:
Dec 2 14:36:16 nsserver kernel: [<ffffffff81efbbb8>] dump_stack+0x46/0x58
Dec 2 14:36:16 nsserver kernel: [<ffffffff81ef504f>] dump_header+0x75/0x1ea
Dec 2 14:36:16 nsserver kernel: [<ffffffff81136cc7>] ? ktime_get_ts+0x47/0xe0
Dec 2 14:36:16 nsserver kernel: [<ffffffff81158134>] ? delayacct_end+0x84/0xa0
Dec 2 14:36:16 nsserver kernel: [<ffffffff8169faa6>] ? ___ratelimit+0x96/0x110
Dec 2 14:36:16 nsserver kernel: [<ffffffff81169b81>] oom_kill_process+0x201/0x350
Dec 2 14:36:16 nsserver kernel: [<ffffffff810f4612>] ? has_capability_noaudit+0x12/0x20
Dec 2 14:36:16 nsserver kernel: [<ffffffff8116a2cc>] out_of_memory+0x41c/0x510
Dec 2 14:36:16 nsserver kernel: [<ffffffff8116fb06>] __alloc_pages_nodemask+0x776/0x810
Dec 2 14:36:16 nsserver kernel: [<ffffffff81165f62>] ? unlock_page+0x62/0x70
Dec 2 14:36:16 nsserver kernel: [<ffffffff810e9674>] copy_process.part.47+0x124/0x17d0
Dec 2 14:36:16 nsserver kernel: [<ffffffff816b31e1>] ? __list_del_entry+0x11/0x30
Dec 2 14:36:16 nsserver kernel: [<ffffffff816b3211>] ? list_del+0x11/0x30
Dec 2 14:36:16 nsserver kernel: [<ffffffff81136cc7>] ? ktime_get_ts+0x47/0xe0
Dec 2 14:36:16 nsserver kernel: [<ffffffff811cf7d8>] ? poll_select_copy_remaining+0x138/0x280
Dec 2 14:36:16 nsserver kernel: [<ffffffff810eaee9>] do_fork+0xd9/0x310
Dec 2 14:36:16 nsserver kernel: [<ffffffff811d090b>] ? SyS_select+0x12b/0x1b0
Dec 2 14:36:16 nsserver kernel: [<ffffffff810eb1a1>] SyS_clone+0x11/0x20
Dec 2 14:36:16 nsserver kernel: [<ffffffff81f05b35>] stub_clone+0x65/0x90
Dec 2 14:36:16 nsserver kernel: [<ffffffff81f0589e>] ? system_call_fastpath+0x16/0x1b
Dec 2 14:36:16 nsserver kernel: Mem-Info:
Dec 2 14:36:16 nsserver kernel: Node 0 DMA per-cpu:
Dec 2 14:36:16 nsserver kernel: CPU 0: hi: 0, btch: 1 usd: 0
Dec 2 14:36:16 nsserver kernel: CPU 1: hi: 0, btch: 1 usd: 0
Dec 2 14:36:16 nsserver kernel: Node 0 DMA32 per-cpu:
Dec 2 14:36:16 nsserver kernel: CPU 0: hi: 186, btch: 31 usd: 0
Dec 2 14:36:16 nsserver kernel: CPU 1: hi: 186, btch: 31 usd: 0
Dec 2 14:36:16 nsserver kernel: Node 0 Normal per-cpu:
Dec 2 14:36:16 nsserver kernel: CPU 0: hi: 186, btch: 31 usd: 0
Dec 2 14:36:16 nsserver kernel: CPU 1: hi: 186, btch: 31 usd: 0
Dec 2 14:36:16 nsserver kernel: active_anon:28548 inactive_anon:28612 isolated_anon:32
Dec 2 14:36:16 nsserver kernel: active_file:295 inactive_file:425 isolated_file:0
Dec 2 14:36:16 nsserver kernel: unevictable:0 dirty:0 writeback:120 unstable:0
Dec 2 14:36:16 nsserver kernel: free:501981 slab_reclaimable:117091 slab_unreclaimable:218928
Dec 2 14:36:16 nsserver kernel: mapped:85 shmem:14 pagetables:14224 bounce:0
Dec 2 14:36:16 nsserver kernel: free_cma:0
Dec 2 14:36:16 nsserver kernel: Node 0 DMA free:15432kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:256kB slab_unreclaimable:36kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Dec 2 14:36:16 nsserver kernel: lowmem_reserve[]: 0 3212 3915 3915
Dec 2 14:36:16 nsserver kernel: Node 0 DMA32 free:1724844kB min:6556kB low:8192kB high:9832kB active_anon:74240kB inactive_anon:74456kB active_file:740kB inactive_file:940kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3379436kB managed:3290224kB mlocked:0kB dirty:0kB writeback:260kB mapped:204kB shmem:52kB slab_reclaimable:419520kB slab_unreclaimable:625416kB kernel_stack:296448kB pagetables:29336kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:10388 all_unreclaimable? yes
Dec 2 14:36:16 nsserver kernel: lowmem_reserve[]: 0 0 702 702
Dec 2 14:36:16 nsserver kernel: Node 0 Normal free:267648kB min:1432kB low:1788kB high:2148kB active_anon:39952kB inactive_anon:39992kB active_file:440kB inactive_file:760kB unevictable:0kB isolated(anon):128kB isolated(file):0kB present:786432kB managed:719824kB mlocked:0kB dirty:0kB writeback:220kB mapped:136kB shmem:4kB slab_reclaimable:48588kB slab_unreclaimable:250260kB kernel_stack:5168kB pagetables:27560kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:7288 all_unreclaimable? yes
Dec 2 14:36:16 nsserver kernel: lowmem_reserve[]: 0 0 0 0
Dec 2 14:36:16 nsserver kernel: Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 1*64kB (U) 0*128kB 2*256kB (UE) 1*512kB (E) 2*1024kB (UE) 2*2048kB (UE) 2*4096kB (MR) = 15432kB
Dec 2 14:36:16 nsserver kernel: Node 0 DMA32: 140964*4kB (EM) 144937*8kB (EM) 95*16kB (M) 1*32kB (R) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1724904kB
Dec 2 14:36:16 nsserver kernel: Node 0 Normal: 57353*4kB (EM) 4777*8kB (EM) 4*16kB (M) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 267692kB
Dec 2 14:36:16 nsserver kernel: 1351 total pagecache pages
Dec 2 14:36:16 nsserver kernel: 568 pages in swap cache
Dec 2 14:36:16 nsserver kernel: Swap cache stats: add 909368, delete 908800, find 1781721897/1781923273
Dec 2 14:36:16 nsserver kernel: Free swap = 0kB
Dec 2 14:36:16 nsserver kernel: Total swap = 523260kB
Dec 2 14:36:16 nsserver kernel: 1045465 pages RAM
Dec 2 14:36:16 nsserver kernel: 0 pages HighMem/MovableOnly
Dec 2 14:36:16 nsserver kernel: 16652 pages reserved
Dec 2 14:36:16 nsserver kernel: 0 pages hwpoisoned
...
The full log is here: https://pastebin.com/hrQM5GmB
Answer 1
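If I had to guess, apache was the pseudo-random :-) victim of the OOM killer. The gap in the log is probably there because syslog was not working: it may have crashed, been blocked for some reason, or been killed as well. I would bet that some filesystem (/var? /tmp? maybe another one) or memory filled up. Do you use tmpfs?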
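To locate such gaps and the OOM-killer invocations after the fact, a small script can walk the log. This is only a sketch: the function names, the 5-minute threshold, and the placeholder year (classic syslog timestamps carry no year) are my assumptions, not something taken from your system.

```python
import re
from datetime import datetime, timedelta

# Classic syslog prefix, e.g. "Dec 2 14:36:11 nsserver kernel: ..."
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}):(\d{2}):(\d{2})\s")
MONTHS = {name: num for num, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), 1)}


def parse_stamp(line, year=2000):
    """Return the line's timestamp, or None if it has no syslog prefix.

    The year is a placeholder (assumption): the log format omits it.
    """
    match = STAMP.match(line)
    if not match or match.group(1) not in MONTHS:
        return None
    mon, day, hh, mm, ss = match.groups()
    try:
        return datetime(year, MONTHS[mon], int(day), int(hh), int(mm), int(ss))
    except ValueError:  # impossible date or time, treat as unparsable
        return None


def scan_syslog(lines, max_gap=timedelta(minutes=5)):
    """Return (gaps, oom_lines).

    gaps      -- list of (last_seen, next_seen) pairs further apart than max_gap
    oom_lines -- lines containing the kernel's "invoked oom-killer" message
    """
    gaps, oom_lines = [], []
    previous = None
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is None:
            continue
        if "invoked oom-killer" in line:
            oom_lines.append(line.rstrip("\n"))
        if previous is not None and stamp - previous > max_gap:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps, oom_lines


# Typical use (run as root, path as in the question):
#   with open("/var/log/messages", errors="replace") as fh:
#       gaps, oom_lines = scan_syslog(fh)
```

A quiet machine can legitimately log nothing for several minutes, so a gap on its own proves little. A gap that starts right after an oom-killer entry is the more telling pattern.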
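A program can create a file and then remove its directory entry while still holding the file open. The space stays allocated on disk until the process closes the file (and obviously you will not see this after a reboot).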
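The effect is easy to reproduce. The following sketch (the function name is made up, and it assumes a POSIX system such as Linux) writes to a temporary file, unlinks it, and shows that the data is still readable through the open descriptor even though no path refers to it any more.

```python
import os
import tempfile


def unlinked_file_demo(payload=b"x" * 4096):
    """Write a file, unlink it while open, and check the data survives.

    Returns (path_gone, link_count, still_readable).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                    # remove the directory entry only
        path_gone = not os.path.exists(path)
        link_count = os.fstat(fd).st_nlink  # typically 0 on Linux filesystems
        os.lseek(fd, 0, os.SEEK_SET)
        still_readable = os.read(fd, len(payload)) == payload
        return path_gone, link_count, still_readable
    finally:
        os.close(fd)                       # only now is the space released


if __name__ == "__main__":
    print(unlinked_file_demo())
```

On a live system, `lsof +L1` lists open files whose link count has dropped to zero, and the entries under /proc/<pid>/fd are marked "(deleted)". None of that survives a reboot, which is why it has to be checked while the problem is happening.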
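If there is no free space to write to the filesystem, or no more memory can be allocated, SSH will fail. I am not sure of the details, but I know it stops working under some conditions. Precision matters here: when you tried to connect over ssh, did the TCP connection open, or not even that?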
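To answer that question precisely next time, you can probe the port from another machine. This is a rough sketch with hypothetical names (`probe_tcp` and its stage labels are mine): it separates "connection refused", "no answer at all", and "TCP connected but the server never sent its banner".

```python
import socket


def probe_tcp(host, port=22, timeout=5.0):
    """Return (stage, detail) describing how far a TCP connection gets.

    Stages (labels are illustrative):
      "refused"             -- nothing is listening on the port
      "connect-timeout"     -- no reply to the connection attempt
      "connected-no-banner" -- handshake completed, but no data arrived
      "connected-closed"    -- peer closed without sending anything
      "banner"              -- the service sent its greeting (detail holds it)
      "error"               -- any other socket error (detail holds the message)
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused", ""
    except socket.timeout:
        return "connect-timeout", ""
    except OSError as exc:
        return "error", str(exc)
    with sock:
        try:
            banner = sock.recv(256)  # an SSH server speaks first
        except socket.timeout:
            return "connected-no-banner", ""
        except OSError as exc:
            return "error", str(exc)
    if not banner:
        return "connected-closed", ""
    return "banner", banner.decode("ascii", "replace").strip()
```

The kernel completes the TCP handshake for a listening socket on its own, before the daemon accepts the connection. So "connected-no-banner" would fit the picture of a kernel that is alive while sshd is stuck, whereas "refused" would suggest that sshd is no longer listening at all.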
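Ultimately this is a runtime problem, and it can be hard to diagnose after a reboot. The kernel was still running (the machine answered ping), but no userspace program could do anything. There are many possible causes, some more likely than others. It is unlikely to be the result of a hack, though.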
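The solution: if it happens again, look at the console before powering the machine off. If you can, keep a session open. Start monitoring CPU/disk/memory from another machine. Send syslog output to an external machine, so that you can still see log entries even if the local filesystem is full.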
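As one way to combine the last two suggestions, the sketch below reads /proc/meminfo and pushes a summary line to a remote syslog server over UDP using Python's standard `logging.handlers.SysLogHandler`. The logger name, the function names, and the host `loghost.example.org` are placeholders I made up, and it assumes Python 3 on the server plus a remote syslog daemon configured to accept UDP on port 514.

```python
import logging
import logging.handlers


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def make_remote_logger(host, port=514):
    """Build a logger that sends records to a remote syslog over UDP."""
    logger = logging.getLogger("memwatch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.handlers.SysLogHandler(address=(host, port)))
    return logger


def report_once(logger, path="/proc/meminfo"):
    """Log one memory summary line and return the parsed values."""
    with open(path) as fh:
        info = parse_meminfo(fh.read())
    logger.info("memwatch: MemFree=%s kB SwapFree=%s kB",
                info.get("MemFree"), info.get("SwapFree"))
    return info


# Typical use (placeholder host, one report per minute):
#   import time
#   log = make_remote_logger("loghost.example.org")
#   while True:
#       report_once(log)
#       time.sleep(60)
```

Because the records leave the machine immediately, the last values before a hang remain visible on the log host even if local syslog stops writing. In practice you would also forward the regular system log from the syslog daemon itself; this script only covers the memory figures.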
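PS. There is a small chance that the kernel itself had a problem, but you would certainly have seen that on the console before rebooting.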