我遇到了一个奇怪的问题。每周有几次,我在 Amazon Lightsail 上运行的网站会崩溃并成为 IO 限制(发生这种情况时我通常无法使用 ssh),并且服务器内存不足。这是一个流量不大的网站,每天的页面浏览量不到 1000 次,并且它位于 varnish 服务器后面。我无法运行命令top
来查看问题,因为服务器成为 IO 限制。我认为它是由于内存不足而导致的抖动。服务器有 2GB 的 RAM,我认为应该足够了。发生这种情况时,网站负载并不重。它只是有点出乎意料。我今天能够捕获它并运行top
,看到 CPU 负载平均值在 30 到 40 之间。这些情况可能会持续约 30 分钟到一个小时,然后就会像它们来时一样迅速消失。
我思考大型 mysql 查询的问题,但我不确定。以下是每次运行二进制文件时 top 所显示的内容:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1946 mysql 20 0 1502.5m 615.4m 27.5m S 0.3 30.8 0:35.33 mysqld.bin
1502.5 MB VIRT 数字和 30.8% 的内存使用率对我来说似乎太高了。
我今天遇到了一个问题,查看了系统日志,发现了这一点:
3748 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901476] .php-fpm.bin invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
3749 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901481] .php-fpm.bin cpuset=/ mems_allowed=0
3750 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901486] CPU: 0 PID: 13816 Comm: .php-fpm.bin Not tainted 4.4.0-1114-aws #127-Ubuntu
3751 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901487] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
3752 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901489] 0000000000000286 b444799d441c3af0 ffff88003ec079e0 ffffffff8140accb
3753 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901491] ffff88003ec07b98 ffff880021027000 ffff88003ec07a50 ffffffff81216c47
3754 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901493] 0000000000000000 ffff88007a349f00 ffff88007b59e200 ffff88003ec07a38
3755 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901496] Call Trace:
3756 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901502] [<ffffffff8140accb>] dump_stack+0x6d/0x92
3757 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901507] [<ffffffff81216c47>] dump_header+0x5a/0x1c3
3758 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901512] [<ffffffff813a25f1>] ? apparmor_capable+0x131/0x1b0
3759 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901517] [<ffffffff8119b35b>] oom_kill_process+0x20b/0x3d0
3760 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901519] [<ffffffff8119b768>] out_of_memory+0x1f8/0x470
3761 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901523] [<ffffffff811a17d3>] __alloc_pages_slowpath.constprop.89+0x943/0xaf0
3762 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901526] [<ffffffff811a1c1f>] __alloc_pages_nodemask+0x29f/0x2b0
3763 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901529] [<ffffffff811ed00c>] alloc_pages_current+0x8c/0x110
3764 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901531] [<ffffffff8119748b>] __page_cache_alloc+0xab/0xc0
3765 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901534] [<ffffffff81199dc0>] filemap_fault+0x160/0x440
3766 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901536] [<ffffffff812b07b6>] ext4_filemap_fault+0x36/0x50
3767 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901539] [<ffffffff811c7537>] __do_fault+0x77/0x110
3768 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901542] [<ffffffff811cb4d9>] handle_mm_fault+0x1259/0x1b80
3769 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901546] [<ffffffff8106de84>] __do_page_fault+0x1a4/0x410
3770 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901548] [<ffffffff8106e112>] do_page_fault+0x22/0x30
3771 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901552] [<ffffffff81849b38>] page_fault+0x28/0x30
3772 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901568] Mem-Info:
3773 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901572] active_anon:463535 inactive_anon:6034 isolated_anon:0
3774 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901572] active_file:247 inactive_file:446 isolated_file:0
3775 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901572] unevictable:913 dirty:0 writeback:0 unstable:0
3776 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901572] slab_reclaimable:5581 slab_unreclaimable:7888
3777 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901572] mapped:22790 shmem:30428 pagetables:3630 bounce:0
3778 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901572] free:13737 free_pcp:59 free_cma:0
3779 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901576] Node 0 DMA free:8212kB min:352kB low:440kB high:528kB active_anon:6976kB inactive_anon:48kB active_file:16kB inactive_file:128kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:372kB shmem:512kB slab_reclaimable:32kB slab_unreclaimable:148kB kernel_stack:0kB pagetables:176kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:876 all_unreclaimable? yes
3780 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901582] lowmem_reserve[]: 0 1967 1967 1967 1967
3781 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901584] Node 0 DMA32 free:46736kB min:44700kB low:55872kB high:67048kB active_anon:1847164kB inactive_anon:24088kB active_file:972kB inactive_file:1656kB unevictable:3652kB isolated(anon):0kB isolated(file):0kB present:2080768kB managed:2030696kB mlocked:3652kB dirty:0kB writeback:0kB mapped:90788kB shmem:121200kB slab_reclaimable:22292kB slab_unreclaimable:31404kB kernel_stack:5696kB pagetables:143 44kB unstable:0kB bounce:0kB free_pcp:236kB local_pcp:236kB free_cma:0kB writeback_tmp:0kB pages_scanned:20120 all_unreclaimable? yes
3782 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901589] lowmem_reserve[]: 0 0 0 0 0
3783 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901592] Node 0 DMA: 3*4kB (UM) 3*8kB (UM) 1*16kB (M) 5*32kB (UME) 3*64kB (UME) 3*128kB (UME) 3*256kB (UME) 3*512kB (UME) 1*1024kB (E) 2*2048kB (ME) 0*4096kB = 8212kB
3784 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901601] Node 0 DMA32: 366*4kB (UMEH) 809*8kB (UMEH) 789*16kB (UEH) 322*32kB (UMEH) 142*64kB (UMEH) 43*128kB (UMEH) 5*256kB (ME) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 46736kB
3785 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901610] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
3786 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901611] 31732 total pagecache pages
3787 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901612] 0 pages in swap cache
3788 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901614] Swap cache stats: add 0, delete 0, find 0/0
3789 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901614] Free swap = 0kB
3790 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901615] Total swap = 0kB
3791 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901616] 524189 pages RAM
3792 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901617] 0 pages HighMem/MovableOnly
3793 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901618] 12540 pages reserved
3794 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901619] 0 pages cma reserved
3795 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901620] 0 pages hwpoisoned
3796 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901621] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
3797 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901625] [ 428] 0 428 23692 154 17 3 0 0 lvmetad
3798 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901628] [ 458] 0 458 10902 860 22 3 0 -1000 systemd-udevd
3799 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901630] [ 469] 0 469 3050 1149 13 3 0 0 haveged
3800 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901632] [ 472] 0 472 9561 857 22 3 0 0 systemd-journal
3801 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901634] [ 689] 100 689 25080 104 18 3 0 0 systemd-timesyn
3802 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901636] [ 1043] 0 1043 4030 471 13 3 0 0 dhclient
3803 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901638] [ 1242] 0 1242 1098 310 8 3 0 0 acpid
3804 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901640] [ 1244] 0 1244 100456 3563 41 4 0 0 amazon-ssm-agen
3805 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901642] [ 1250] 0 1250 159000 581 32 3 0 0 lxcfs
3806 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901644] [ 1261] 0 1261 68965 201 37 3 0 0 accounts-daemon
3807 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901647] [ 1265] 107 1265 10721 131 26 3 0 -900 dbus-daemon
3808 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901649] [ 1273] 0 1273 1304 29 8 3 0 0 iscsid
3809 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901651] [ 1274] 0 1274 1429 877 9 3 0 -17 iscsid
3810 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901653] [ 1317] 104 1317 64097 439 28 3 0 0 rsyslogd
3811 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901655] [ 1320] 0 1320 6510 370 18 3 0 0 atd
3812 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901656] [ 1324] 0 1324 7251 326 19 3 0 0 cron
3813 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901658] [ 1326] 0 1326 7153 187 18 3 0 0 systemd-logind
3814 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901660] [ 1349] 0 1349 43655 2454 52 3 0 0 unattended-upgr
3815 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901662] [ 1356] 0 1356 3342 227 11 3 0 0 mdadm
3816 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901664] [ 1405] 0 1405 69271 205 39 3 0 0 polkitd
3817 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901666] [ 1522] 0 1522 26724 526 53 3 0 0 sendmail-mta
3818 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901668] [ 2479] 0 2479 3596 561 12 5 0 0 gonit
3819 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901670] [ 2519] 0 2519 16377 379 35 3 0 -1000 sshd
3820 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901672] [11494] 1000 11494 11296 465 27 3 0 0 systemd
3821 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901674] [11499] 1000 11499 15834 471 34 3 0 0 (sd-pam)
3822 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901676] [ 537] 0 537 75724 1259 62 3 0 0 .php-fpm.bin
3823 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901678] [ 5327] 0 5327 51057 8905 101 3 0 0 httpd.bin
3824 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901680] [17152] 1 17152 334748 40721 260 4 0 0 httpd.bin
3825 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901682] [ 990] 1 990 105420 25430 120 3 0 0 .php-fpm.bin
3826 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901684] [ 991] 1 991 158017 16922 227 3 0 0 .php-fpm.bin
3827 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901686] [ 3368] 1 3368 103879 24095 118 3 0 0 .php-fpm.bin
3828 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901687] [ 8528] 1 8528 104911 25139 119 3 0 0 .php-fpm.bin
3829 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901689] [13391] 1 13391 334623 43165 266 4 0 0 httpd.bin
3830 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901691] [13707] 0 13707 26724 598 52 3 0 0 sendmail-mta
3831 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901693] [13815] 1 13815 79148 14627 91 3 0 0 .php-fpm.bin
3832 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901695] [13816] 1 13816 79423 14907 91 3 0 0 .php-fpm.bin
3833 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901697] [13817] 1 13817 79423 14586 91 3 0 0 .php-fpm.bin
3834 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901699] [13818] 1 13818 79147 14479 91 3 0 0 .php-fpm.bin
3835 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901701] [13819] 1 13819 79423 14589 91 3 0 0 .php-fpm.bin
3836 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901703] [13823] 1 13823 79407 14762 91 3 0 0 .php-fpm.bin
3837 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901705] [13989] 0 13989 1125 230 8 3 0 0 ctl.sh
3838 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901707] [13992] 0 13992 1125 210 8 3 0 0 ctl.sh
3839 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901709] [13993] 0 13993 1125 237 8 3 0 0 ctl.sh
3840 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901711] [13994] 0 13994 1125 224 8 3 0 0 ctl.sh
3841 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901713] [13995] 0 13995 1125 254 7 3 0 0 ctl.sh
3842 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901714] [13996] 0 13996 1125 261 8 3 0 0 ctl.sh
3843 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901716] [14006] 0 14006 1125 396 8 3 0 0 mysqld_safe
3844 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901718] [14010] 0 14010 1125 387 7 3 0 0 mysqld_safe
3845 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901720] [14012] 0 14012 1125 384 8 3 0 0 mysqld_safe
3846 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901722] [14014] 0 14014 1125 388 8 3 0 0 mysqld_safe
3847 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901724] [14552] 0 14552 75229 3448 37 5 0 -900 snapd
3848 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901726] [16057] 0 16057 92857 70693 180 3 0 0 mysqld.bin
3849 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901728] [16066] 0 16066 92857 70677 179 3 0 0 mysqld.bin
3850 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901730] [16067] 0 16067 92857 70718 179 3 0 0 mysqld.bin
3851 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901732] [16068] 0 16068 107007 70665 180 3 0 0 mysqld.bin
3852 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901734] [16097] 1 16097 51090 1094 73 3 0 0 httpd.bin
3853 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901736] [16110] 1 16110 51057 1191 73 3 0 0 httpd.bin
3854 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901738] [16111] 1 16111 51057 1061 73 3 0 0 httpd.bin
3855 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901740] [16136] 0 16136 9756 46 20 3 0 0 sshd
3856 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901741] [16155] 0 16155 1821 83 8 3 0 0 sleep
3857 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901743] [16156] 0 16156 1821 35 10 3 0 0 sleep
3858 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901745] [16157] 0 16157 1821 37 9 3 0 0 sleep
3859 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901747] [16158] 0 16158 1821 52 8 3 0 0 sleep
3860 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901749] [16159] 0 16159 1821 41 9 3 0 0 sleep
3861 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901751] [16160] 0 16160 1821 36 9 3 0 0 sleep
3862 Oct 7 17:41:54 ip-172-26-1-34 kernel: [1128654.901753] Out of memory: Kill process 16067 (mysqld.bin) score 134 or sacrifice child
我不确定该如何解释这一点。有人能帮我找到导致这台机器崩溃的原因吗?
答案1
另外,你似乎没有交换文件?所以我要做的第一件事就是添加一个交换文件。(非常临时的解决方案,可能会为你争取一些时间)真正的问题是这种情况是什么时候开始发生的,为什么?使用sar -f /var/log/sa/saXX
查找过去的事件,看看是否可以关联时间。我相信你可能是对的,你可能正在运行一个 sqlquery,它可能正在查看整个表空间。自从这种情况开始发生以来,是否引入了任何 SQL 查询?