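I have a web application with the following stack:
Nginx (proxy) + Tomcat (backend) + PostgreSQL (database).
The web application runs on an Amazon Free Tier instance (http://aws.amazon.com/free/), and PostgreSQL crashes 2 to 3 times a month.
Here is the log from the instance: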
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.376890] postmaster invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.376908] postmaster cpuset=/ mems_allowed=0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.376916] Pid: 10506, comm: postmaster Tainted: G W 2.6.35.11-83.9.amzn1.i686 #1
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.376924] Call Trace:
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.376938] [<c10a0ce5>] dump_header.clone.1+0x65/0x180
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.376948] [<c116a899>] ? ___ratelimit+0x89/0x110
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.376957] [<c10a0e53>] oom_kill_process.clone.0+0x53/0x130
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.376965] [<c10a100a>] __out_of_memory+0xda/0x140
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.376973] [<c10a10c2>] out_of_memory+0x52/0xc0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.376982] [<c10a3c62>] __alloc_pages_nodemask+0x582/0x5a0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.376992] [<c10a5b92>] __do_page_cache_readahead+0xd2/0x1f0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377000] [<c10a5cd1>] ra_submit+0x21/0x30
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377008] [<c109fc82>] filemap_fault+0x392/0x3c0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377016] [<c10b3a97>] __do_fault+0x47/0x530
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377024] [<c10b585e>] handle_mm_fault+0x19e/0xdc0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377034] [<c12aefd0>] ? do_page_fault+0x0/0x400
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377043] [<c12af0fc>] do_page_fault+0x12c/0x400
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377051] [<c10e219d>] ? sys_select+0x3d/0xb0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377060] [<c12aefd0>] ? do_page_fault+0x0/0x400
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377069] [<c12ac637>] error_code+0x73/0x78
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377076] Mem-Info:
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377081] DMA per-cpu:
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377086] CPU 0: hi: 0, btch: 1 usd: 0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377092] Normal per-cpu:
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377098] CPU 0: hi: 186, btch: 31 usd: 30
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377107] active_anon:73805 inactive_anon:73891 isolated_anon:0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377108] active_file:193 inactive_file:343 isolated_file:0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377109] unevictable:0 dirty:0 writeback:0 unstable:0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377110] free:1360 slab_reclaimable:538 slab_unreclaimable:1366
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377111] mapped:2818 shmem:3457 pagetables:966 bounce:0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377137] DMA free:2464kB min:80kB low:100kB high:120kB active_anon:5752kB inactive_anon:5900kB active_file:88kB inactive_file:164kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15868kB mlocked:0kB dirty:0kB writeback:0kB mapped:244kB shmem:248kB slab_reclaimable:8kB slab_unreclaimable:260kB kernel_stack:60kB pagetables:40kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:384 all_unreclaimable? yes
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377273] lowmem_reserve[]: 0 594 594 594
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377293] Normal free:2976kB min:3076kB low:3844kB high:4612kB active_anon:289468kB inactive_anon:289664kB active_file:684kB inactive_file:1208kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:608584kB mlocked:0kB dirty:0kB writeback:0kB mapped:11028kB shmem:13580kB slab_reclaimable:2144kB slab_unreclaimable:5204kB kernel_stack:1816kB pagetables:3824kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2848 all_unreclaimable? yes
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377328] lowmem_reserve[]: 0 0 0 0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377344] DMA: 43*4kB 31*8kB 48*16kB 18*32kB 9*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2468kB
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377380] Normal: 192*4kB 14*8kB 1*16kB 1*32kB 6*64kB 7*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2976kB
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377416] 3992 total pagecache pages
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377421] 0 pages in swap cache
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377427] Swap cache stats: add 0, delete 0, find 0/0
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377434] Free swap = 0kB
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.377439] Total swap = 0kB
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.379976] 157439 pages RAM
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.379990] 0 pages HighMem
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.379995] 3185 pages reserved
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.380004] 18287 pages shared
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.380040] 149569 pages non-shared
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.380047] Out of memory: kill process 18126 (postmaster) score 26011 or a child
Jan 4 20:50:32 ip-10-227-10-239 kernel: [516661.380058] Killed process 18126 (postmaster) vsz:104044kB, anon-rss:2476kB, file-rss:7152kB
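In addition, Amazon CloudWatch monitoring shows a maximum peak in outbound network traffic.
What is the problem? Has anyone run into something like this?
PS: here are the memory settings from postgresql.conf: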
# - Memory -
shared_buffers = 80MB # min 128kB
# (change requires restart)
#temp_buffers = 8MB # min 800kB
#max_prepared_transactions = 0 # zero disables the feature
# (change requires restart)
# Note: Increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
# It is not advisable to set max_prepared_transactions nonzero unless you
# actively intend to use prepared transactions.
#work_mem = 1MB # min 64kB
#maintenance_work_mem = 16MB # min 1MB
#max_stack_depth = 2MB # min 100kB
# - Kernel Resource Usage -
#max_files_per_process = 1000 # min 25
# (change requires restart)
#shared_preload_libraries = '' # (change requires restart)
# - Cost-Based Vacuum Delay -
#vacuum_cost_delay = 0ms # 0-100 milliseconds
#vacuum_cost_page_hit = 1 # 0-10000 credits
#vacuum_cost_page_miss = 10 # 0-10000 credits
#vacuum_cost_page_dirty = 20 # 0-10000 credits
#vacuum_cost_limit = 200 # 1-10000 credits
# - Background Writer -
#bgwriter_delay = 200ms # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100 # 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0 # 0-10.0 multipler on buffers scanned/round
# - Asynchronous Behavior -
#effective_io_concurrency = 1 # 1-1000. 0 disables prefetching
Answer 1
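I think the log is quite clear: the Linux kernel killed your PostgreSQL process because the system ran out of memory. This is a standard feature of the Linux kernel: when memory runs out, it does not take down every application but picks one process to kill. The log also shows Total swap = 0kB, so the kernel had no swap space to fall back on. More information: http://linux-mm.org/OOM_Killer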
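To make the numbers in the log easier to read, here is a minimal Python sketch that converts the page counters from the Mem-Info section into megabytes. The counter lines are copied from the log in the question; the 4 kB page size is an assumption for this i686 kernel (it is consistent with the per-zone kB totals in the same log). The helper names `parse_counters` and `pages_to_mb` are made up for this illustration.

```python
import re

PAGE_KB = 4  # assumed page size for this i686 kernel (matches the zone totals in the log)

# Counter lines copied from the Mem-Info section of the log (syslog prefix removed).
MEM_INFO = """\
active_anon:73805 inactive_anon:73891 isolated_anon:0
active_file:193 inactive_file:343 isolated_file:0
free:1360 slab_reclaimable:538 slab_unreclaimable:1366
157439 pages RAM
"""


def parse_counters(text):
    """Return a dict mapping each 'name:number' pair in the text to its page count."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", text)}


def pages_to_mb(pages):
    """Convert a page count to megabytes using the assumed page size."""
    return pages * PAGE_KB / 1024


counters = parse_counters(MEM_INFO)
total_pages = int(re.search(r"(\d+) pages RAM", MEM_INFO).group(1))
anon_pages = counters["active_anon"] + counters["inactive_anon"]
file_pages = counters["active_file"] + counters["inactive_file"]

print(f"total RAM        : {pages_to_mb(total_pages):7.1f} MB")
print(f"anonymous memory : {pages_to_mb(anon_pages):7.1f} MB")
print(f"file cache       : {pages_to_mb(file_pages):7.1f} MB")
print(f"free             : {pages_to_mb(counters['free']):7.1f} MB")
print(f"anonymous share  : {anon_pages / total_pages:.0%}")
```

Almost all of the RAM is held as anonymous (process) memory, while the killed postmaster itself reports only anon-rss:2476kB. The log does not say which process holds the rest, so it is worth checking the other processes on the instance (for example the Tomcat JVM) as well as PostgreSQL.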
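You may want to check the memory settings in postgresql.conf, which by default is located at /var/lib/pgsql/data/postgresql.conf
Note that the memory (buffer) sizes are counted in pages (4K or 8K, depending on the build) when no unit is given; a value with an explicit unit, such as shared_buffers = 80MB in your file, is taken as written.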
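As a rough way to sanity-check those settings against the instance's RAM, the sketch below converts postgresql.conf memory values to kB and adds up a simple budget. It assumes an 8 kB block size for unit-less shared_buffers (the usual build default), uses the commented-out defaults from the posted file for work_mem and maintenance_work_mem, and uses a hypothetical max_connections of 100, since that setting is not shown in the question. The function `setting_to_kb` is an illustrative helper, not part of PostgreSQL.

```python
import re

UNIT_KB = {"kB": 1, "MB": 1024, "GB": 1024 * 1024}
BLOCK_KB = 8  # assumed PostgreSQL block size; applies to unit-less shared_buffers


def setting_to_kb(value, default_unit_kb):
    """Convert a memory value such as '80MB' to kB.

    A bare number is multiplied by default_unit_kb (blocks for
    shared_buffers, 1 for settings that default to kB such as work_mem).
    """
    match = re.fullmatch(r"(\d+)\s*(kB|MB|GB)?", value.strip())
    if match is None:
        raise ValueError(f"unrecognised memory value: {value!r}")
    number, unit = int(match.group(1)), match.group(2)
    return number * (UNIT_KB[unit] if unit else default_unit_kb)


shared_buffers_kb = setting_to_kb("80MB", BLOCK_KB)   # from the posted file
work_mem_kb = setting_to_kb("1MB", 1)                 # commented-out default in the posted file
maintenance_work_mem_kb = setting_to_kb("16MB", 1)    # commented-out default in the posted file
max_connections = 100                                 # hypothetical, not shown in the question

# Very rough budget: one work_mem allocation per connection.
# A single query can use several, and per-backend overhead is ignored.
budget_kb = shared_buffers_kb + max_connections * work_mem_kb + maintenance_work_mem_kb

print(f"shared_buffers : {shared_buffers_kb / 1024:.0f} MB")
print(f"rough budget   : {budget_kb / 1024:.0f} MB")
```

This is only an estimate under the stated assumptions; comparing it with the roughly 615 MB of RAM reported in the log gives a first idea of how much room is left for Tomcat and Nginx on the same instance.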