My web application's PostgreSQL database crashes frequently

I have a web application built on the following stack:

Nginx (proxy) + Tomcat (backend) + PostgreSQL (database).

The web application runs on an Amazon Free Tier instance (http://aws.amazon.com/free/), and PostgreSQL crashes 2 to 3 times a month.

Here is the log from the instance:

[516661.377137] DMA free:2464kB min:80kB low:100kB high:120kB active_anon:5752kB inactive_anon:5900kB active_file:88kB inactive_file:164kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15868kB mlocked:0kB dirty:0kB writeback:0kB mapped:244kB shmem:248kB slab_reclaimable:8kB slab_unreclaimable:260kB kernel_stack:60kB pagetables:40kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:384 all_unreclaimable? yes
[516661.377273] lowmem_reserve[]: 0 594 594 594
[516661.377293] Normal free:2976kB min:3076kB low:3844kB high:4612kB active_anon:289468kB inactive_anon:289664kB active_file:684kB inactive_file:1208kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:608584kB mlocked:0kB dirty:0kB writeback:0kB mapped:11028kB shmem:13580kB slab_reclaimable:2144kB slab_unreclaimable:5204kB kernel_stack:1816kB pagetables:3824kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2848 all_unreclaimable? yes
[516661.377328] lowmem_reserve[]: 0 0 0 0
[516661.377344] DMA: 43*4kB 31*8kB 48*16kB 18*32kB 9*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2468kB
[516661.377380] Normal: 192*4kB 14*8kB 1*16kB 1*32kB 6*64kB 7*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2976kB
[516661.377416] 3992 total pagecache pages
[516661.377421] 0 pages in swap cache
[516661.377427] Swap cache stats: add 0, delete 0, find 0/0
[516661.377434] Free swap  = 0kB
[516661.377439] Total swap = 0kB
[516661.379976] 157439 pages RAM
[516661.379990] 0 pages HighMem
[516661.379995] 3185 pages reserved
[516661.380004] 18287 pages shared
[516661.380040] 149569 pages non-shared
[516661.380047] Out of memory: kill process 18126 (postmaster) score 26011 or a child
[516661.380058] Killed process 18126 (postmaster) vsz:104044kB, anon-rss:2476kB, file-rss:7152kB
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.376890] postmaster invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.376908] postmaster cpuset=/ mems_allowed=0
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.376916] Pid: 10506, comm: postmaster Tainted: G        W   2.6.35.11-83.9.amzn1.i686 #1
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.376924] Call Trace:
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.376938]  [<c10a0ce5>] dump_header.clone.1+0x65/0x180
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.376948]  [<c116a899>] ? ___ratelimit+0x89/0x110
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.376957]  [<c10a0e53>] oom_kill_process.clone.0+0x53/0x130
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.376965]  [<c10a100a>] __out_of_memory+0xda/0x140
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.376973]  [<c10a10c2>] out_of_memory+0x52/0xc0
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.376982]  [<c10a3c62>] __alloc_pages_nodemask+0x582/0x5a0
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.376992]  [<c10a5b92>] __do_page_cache_readahead+0xd2/0x1f0
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377000]  [<c10a5cd1>] ra_submit+0x21/0x30
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377008]  [<c109fc82>] filemap_fault+0x392/0x3c0
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377016]  [<c10b3a97>] __do_fault+0x47/0x530
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377024]  [<c10b585e>] handle_mm_fault+0x19e/0xdc0
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377034]  [<c12aefd0>] ? do_page_fault+0x0/0x400
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377043]  [<c12af0fc>] do_page_fault+0x12c/0x400
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377051]  [<c10e219d>] ? sys_select+0x3d/0xb0
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377060]  [<c12aefd0>] ? do_page_fault+0x0/0x400
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377069]  [<c12ac637>] error_code+0x73/0x78
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377076] Mem-Info:
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377081] DMA per-cpu:
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377086] CPU    0: hi:    0, btch:   1 usd:   0
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377092] Normal per-cpu:
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377098] CPU    0: hi:  186, btch:  31 usd:  30
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377107] active_anon:73805 inactive_anon:73891 isolated_anon:0
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377108]  active_file:193 inactive_file:343 isolated_file:0
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377109]  unevictable:0 dirty:0 writeback:0 unstable:0
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377110]  free:1360 slab_reclaimable:538 slab_unreclaimable:1366
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377111]  mapped:2818 shmem:3457 pagetables:966 bounce:0
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377137] DMA free:2464kB min:80kB low:100kB high:120kB active_anon:5752kB inactive_anon:5900kB active_file:88kB inactive_file:164kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15868kB mlocked:0kB dirty:0kB writeback:0kB mapped:244kB shmem:248kB slab_reclaimable:8kB slab_unreclaimable:260kB kernel_stack:60kB pagetables:40kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:384 all_unreclaimable? yes
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377273] lowmem_reserve[]: 0 594 594 594
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377293] Normal free:2976kB min:3076kB low:3844kB high:4612kB active_anon:289468kB inactive_anon:289664kB active_file:684kB inactive_file:1208kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:608584kB mlocked:0kB dirty:0kB writeback:0kB mapped:11028kB shmem:13580kB slab_reclaimable:2144kB slab_unreclaimable:5204kB kernel_stack:1816kB pagetables:3824kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2848 all_unreclaimable? yes
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377328] lowmem_reserve[]: 0 0 0 0
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377344] DMA: 43*4kB 31*8kB 48*16kB 18*32kB 9*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2468kB
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377380] Normal: 192*4kB 14*8kB 1*16kB 1*32kB 6*64kB 7*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2976kB
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377416] 3992 total pagecache pages
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377421] 0 pages in swap cache
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377427] Swap cache stats: add 0, delete 0, find 0/0
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377434] Free swap  = 0kB
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.377439] Total swap = 0kB
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.379976] 157439 pages RAM
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.379990] 0 pages HighMem
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.379995] 3185 pages reserved
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.380004] 18287 pages shared
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.380040] 149569 pages non-shared
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.380047] Out of memory: kill process 18126 (postmaster) score 26011 or a child
Jan  4 20:50:32 ip-10-227-10-239 kernel: [516661.380058] Killed process 18126 (postmaster) vsz:104044kB, anon-rss:2476kB, file-rss:7152kB

Also, in Amazon CloudWatch monitoring, I see the maximum peak load in the network-out traffic.

What is the problem? Has anyone run into something like this?

PS: Here are the memory settings from postgresql.conf:

# - Memory -

shared_buffers = 80MB                   # min 128kB
                                        # (change requires restart)
#temp_buffers = 8MB                     # min 800kB
#max_prepared_transactions = 0          # zero disables the feature
                                        # (change requires restart)
# Note:  Increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
# It is not advisable to set max_prepared_transactions nonzero unless you
# actively intend to use prepared transactions.
#work_mem = 1MB                         # min 64kB
#maintenance_work_mem = 16MB            # min 1MB
#max_stack_depth = 2MB                  # min 100kB

# - Kernel Resource Usage -

#max_files_per_process = 1000           # min 25
                                        # (change requires restart)
#shared_preload_libraries = ''          # (change requires restart)

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0ms                # 0-100 milliseconds
#vacuum_cost_page_hit = 1               # 0-10000 credits
#vacuum_cost_page_miss = 10             # 0-10000 credits
#vacuum_cost_page_dirty = 20            # 0-10000 credits
#vacuum_cost_limit = 200                # 1-10000 credits

# - Background Writer -

#bgwriter_delay = 200ms                 # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100            # 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0          # 0-10.0 multipler on buffers scanned/round

# - Asynchronous Behavior -

#effective_io_concurrency = 1           # 1-1000. 0 disables prefetching

Answer 1

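I think the log is quite clear: the Linux kernel killed your PostgreSQL process because the machine ran out of memory. This is standard Linux kernel behavior: when memory is exhausted, instead of bringing down every application, the kernel picks one process and kills it. More information: http://linux-mm.org/OOM_Killer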
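
To make the numbers in that report easier to read, here is a minimal Python sketch that pulls the headline figures out of a few lines of the log in the question. The helper names (`first_int`, `summarize`) are made up for illustration, and the 4 kB page size is an assumption based on the i686 kernel shown in the log.

```python
import re

# Lines taken from the kernel log in the question (syslog prefix omitted).
LOG = """\
[516661.377107] active_anon:73805 inactive_anon:73891 isolated_anon:0
[516661.377439] Total swap = 0kB
[516661.379976] 157439 pages RAM
[516661.380058] Killed process 18126 (postmaster) vsz:104044kB, anon-rss:2476kB, file-rss:7152kB
"""

PAGE_KB = 4  # assumption: 4 kB pages, the usual size on an i686 kernel


def first_int(pattern, text):
    """Return the first captured group of `pattern` in `text` as an int."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return int(match.group(1))


def summarize(log):
    """Extract total RAM, anonymous memory, swap and the killed process."""
    killed = re.search(
        r"Killed process \d+ \(([^)]+)\) vsz:\d+kB, "
        r"anon-rss:(\d+)kB, file-rss:(\d+)kB",
        log,
    )
    if killed is None:
        raise ValueError("no 'Killed process' line found")
    # The trailing \b skips the per-zone values that carry a kB suffix,
    # so only the system-wide page counts are matched.
    anon_pages = (
        first_int(r"(?<!in)active_anon:(\d+)\b", log)
        + first_int(r"inactive_anon:(\d+)\b", log)
    )
    return {
        "ram_kb": first_int(r"(\d+) pages RAM", log) * PAGE_KB,
        "anon_kb": anon_pages * PAGE_KB,
        "swap_kb": first_int(r"Total swap = (\d+)kB", log),
        "victim": killed.group(1),
        "victim_rss_kb": int(killed.group(2)) + int(killed.group(3)),
    }


if __name__ == "__main__":
    for key, value in summarize(LOG).items():
        print(f"{key}: {value}")
```

Under these assumptions, anonymous memory accounts for most of the machine's RAM, there is no swap, and the killed postmaster process itself held less than 10 MB resident. That may mean a different process is the main memory consumer; the log excerpt does not show which one.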

You may want to check the memory settings in postgresql.conf, which by default is located at /var/lib/pgsql/data/postgresql.conf

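Note that memory (buffer) sizes written without a unit are counted in pages (4K or 8K, depending on the setup), whereas a value such as the 80MB in your shared_buffers line states its unit explicitly.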
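
As a small illustration of the unit arithmetic, the following Python sketch converts a size written with a unit into a page count. The function name `to_pages` is hypothetical, and the default of 8 kB per page is an assumption (it matches PostgreSQL's default block size, but a given build may differ).

```python
# Multipliers for the size suffixes used in the configuration excerpt.
UNITS_KB = {"kB": 1, "MB": 1024, "GB": 1024 * 1024}


def to_pages(setting, page_kb=8):
    """Convert a size such as '80MB' into a number of pages.

    page_kb is the assumed page size in kB (8 by default).
    """
    for unit, factor in UNITS_KB.items():
        if setting.endswith(unit):
            return int(setting[: -len(unit)]) * factor // page_kb
    raise ValueError(f"unrecognized size: {setting!r}")


if __name__ == "__main__":
    print(to_pages("80MB"))             # 8 kB pages
    print(to_pages("80MB", page_kb=4))  # 4 kB pages
```

With 8 kB pages, the 80MB from the question corresponds to 10240 pages; with 4 kB pages it would be 20480.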
