My server information:
AWS Ubuntu 18.04
MongoDB 4.2.17
4 Cores
16 GB RAM
This server runs only MongoDB, and I have allocated 10 GB of memory to MongoDB:
wiredTiger:
  engineConfig:
    cacheSizeGB: 10
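For context, here is a rough sketch of how this setting compares with the documented default WiredTiger cache size, which is the larger of 50% of (RAM - 1 GB) and 256 MB. The function name is hypothetical and the numbers are only the ones listed above; note that cacheSizeGB limits the WiredTiger cache, not the total memory of the mongod process.

```python
def default_wiredtiger_cache_gb(ram_gb: float) -> float:
    """Documented default: the larger of 50% of (RAM - 1 GB) and 256 MB."""
    return max(0.5 * (ram_gb - 1.0), 0.25)


ram_gb = 16.0         # RAM from the server description above
configured_gb = 10.0  # cacheSizeGB from the config above

default_gb = default_wiredtiger_cache_gb(ram_gb)
print(f"default cache size:    {default_gb:.1f} GB")
print(f"configured cache size: {configured_gb:.1f} GB")
print(f"RAM outside the cache: {ram_gb - configured_gb:.1f} GB")
```

With 16 GB of RAM the default would be 7.5 GB, so the 10 GB setting leaves about 6 GB for everything else mongod and the operating system need.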
However, this machine runs out of memory (OOM) every few days.
System log:
Oct 19 18:25:16 kernel: [8693043.690043] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/mongod.service,task=mongod,pid=26154,uid=111
Oct 19 18:25:16 kernel: [8693043.690370] Out of memory: Killed process 26154 (mongod) total-vm:17824108kB, anon-rss:15602768kB, file-rss:0kB, shmem-rss:0kB, UID:111 pgtables:32288kB oom_score_adj:0
Oct 19 18:25:16 kernel: [8693044.284593] oom_reaper: reaped process 26154 (mongod), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
The MongoDB log contains some "serverStatus was very slow" entries:
2022-10-19T17:00:00.117+0000 I INDEX [conn41] index build: done building index _id_ on ns game.small_discard_stream_20221020
2022-10-19T17:00:00.140+0000 I INDEX [conn41] index build: starting on game.small_discard_stream_20221020 properties: { v: 2, key: { game_id: 1 }, name: "game_id_1", ns: "game.small_discard_stream_20221020", background: true } using method: Hybrid
2022-10-19T17:00:00.140+0000 I INDEX [conn41] build may temporarily use up to 200 megabytes of RAM
2022-10-19T17:00:00.142+0000 I INDEX [conn41] index build: collection scan done. scanned 1 total records in 0 seconds
2022-10-19T17:00:00.143+0000 I INDEX [conn41] index build: inserted 1 keys from external sorter into index in 0 seconds
2022-10-19T17:00:00.145+0000 I INDEX [conn41] index build: done building index game_id_1 on ns game.small_discard_stream_20221020
2022-10-19T17:06:02.695+0000 I COMMAND [conn5068] command admin.$cmd command: isMaster { ismaster: 1, $clusterTime: { clusterTime: Timestamp(1666199156, 26), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, $db: "admin", $readPreference: { mode: "primary" } } numYields:0 reslen:696 locks:{} protocol:op_msg 448ms
2022-10-19T17:06:02.695+0000 I COMMAND [conn41] command game.small_discard_stream_20221020 command: listIndexes { listIndexes: "small_discard_stream_20221020", cursor: {}, lsid: { id: UUID("a6c6fd81-d9f3-400e-b7ab-41fa83ececd7") }, $clusterTime: { clusterTime: Timestamp(1666199161, 39), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, $db: "game", $readPreference: { mode: "primaryPreferred" } } numYields:0 reslen:454 locks:{ ReplicationStateTransition: { acquireCount: { w: 1 } }, Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } }, Mutex: { acquireCount: { r: 1 } } } storage:{} protocol:op_msg 494ms
2022-10-19T17:06:02.695+0000 I COMMAND [conn5066] command admin.$cmd command: isMaster { ismaster: 1, $clusterTime: { clusterTime: Timestamp(1666199157, 51), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, $db: "admin", $readPreference: { mode: "primary" } } numYields:0 reslen:696 locks:{} protocol:op_msg 448ms
2022-10-19T17:06:15.571+0000 I COMMAND [ftdc] serverStatus was very slow: { after basic: 1, after asserts: 1, after connections: 1, after electionMetrics: 3, after extra_info: 3, after flowControl: 3, after freeMonitoring: 3, after globalLock: 3, after locks: 3, after logicalSessionRecordCache: 3, after network: 3, after opLatencies: 3, after opReadConcernCounters: 3, after opcounters: 3, after opcountersRepl: 3, after oplogTruncation: 9, after repl: 107, after scramCache: 121, after security: 121, after storageEngine: 160, after tcmalloc: 193, after trafficRecording: 193, after transactions: 206, after transportSecurity: 215, after twoPhaseCommitCoordinator: 219, after wiredTiger: 682, at end: 3549 }
2022-10-19T17:09:14.535+0000 I COMMAND [conn7423] command admin.$cmd command: isMaster { ismaster: 1, $clusterTime: { clusterTime: Timestamp(1666199158, 15), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, $db: "admin", $readPreference: { mode: "primary" } } numYields:0 reslen:696 locks:{} protocol:op_msg 159574ms
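After that the machine stopped responding: SSH and ping both failed until I rebooted the server. This server does not actually hold much data or handle many queries.
How can I avoid the MongoDB OOM? Does anyone have any ideas? Thanks!
Answer 1
The lines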
Oct 19 18:25:16 kernel: [8693043.690370] Out of memory:
Killed process 26154 (mongod) total-vm:17824108kB, anon-rss:15602768kB,
file-rss:0kB, shmem-rss:0kB, UID:111 pgtables:32288kB oom_score_adj:0
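clearly show that mongod consumed far more memory than the 10 GB you allowed it. Its resident memory was actually about 15 GB, so the OOM killer was invoked: the operating system simply ran out of memory.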
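The figure can be checked directly from the kernel message. The following is a minimal sketch that pulls the kB fields out of the quoted line and converts them to GiB; the helper names are made up for illustration.

```python
import re

# The kernel message quoted above, joined into one string.
OOM_LINE = (
    "Out of memory: Killed process 26154 (mongod) total-vm:17824108kB, "
    "anon-rss:15602768kB, file-rss:0kB, shmem-rss:0kB, UID:111 "
    "pgtables:32288kB oom_score_adj:0"
)


def parse_kb_fields(line: str) -> dict:
    """Return every 'name:<number>kB' field of the line as {name: kB}."""
    return {name: int(kb) for name, kb in re.findall(r"([\w-]+):(\d+)kB", line)}


def kb_to_gib(kb: int) -> float:
    """Convert kB as printed by the kernel (1 kB = 1024 bytes) to GiB."""
    return kb / (1024 * 1024)


fields = parse_kb_fields(OOM_LINE)
for name in ("total-vm", "anon-rss"):
    print(f"{name}: {kb_to_gib(fields[name]):.2f} GiB")
```

The anon-rss value works out to roughly 14.9 GiB of resident anonymous memory on a 16 GB machine, which leaves almost nothing for the rest of the system.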
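Whether this happened because mongod allocates additional memory for auxiliary structures on top of the configured 10 GB, or for some other reason, is a separate matter from your current question.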
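If you want to look into that separately, one possible starting point is to compare the resident memory reported by serverStatus with the WiredTiger cache usage. The sketch below only assumes a serverStatus-style document; the function name is hypothetical and the sample numbers are invented for illustration, not taken from your server.

```python
MIB = 1024 * 1024


def memory_breakdown(status: dict) -> dict:
    """Split resident memory into WiredTiger cache and everything else (MiB)."""
    cache = status["wiredTiger"]["cache"]
    resident_mib = status["mem"]["resident"]  # serverStatus reports this in MiB
    cache_used_mib = cache["bytes currently in the cache"] / MIB
    cache_max_mib = cache["maximum bytes configured"] / MIB
    return {
        "resident_mib": resident_mib,
        "cache_used_mib": cache_used_mib,
        "cache_max_mib": cache_max_mib,
        "outside_cache_mib": resident_mib - cache_used_mib,
    }


# Hypothetical sample values, NOT measured on the server in the question.
# On a live server the document could come from, for example:
#   status = pymongo.MongoClient().admin.command("serverStatus")
sample_status = {
    "mem": {"resident": 15000},
    "wiredTiger": {
        "cache": {
            "bytes currently in the cache": 8000 * MIB,
            "maximum bytes configured": 10240 * MIB,
        }
    },
}

for key, value in memory_breakdown(sample_status).items():
    print(f"{key}: {value:.0f}")
```

A large and growing outside_cache_mib value would suggest that the memory is going somewhere other than the WiredTiger cache, for example connections, sorts, or index builds.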