Background
A Spring Boot Java application deployed in a Kubernetes cluster gets killed a few times every day.
I am using openjdk:8u181-jre for my Java application.
Kubernetes version: v1.11.5
Node OS: CentOS 7.4 x64
JAVA_OPTS is set according to this post about making Java applications respect cgroup limits: https://developers.redhat.com/blog/2017/03/14/java-inside-docker/
env:
  - name: JAVA_OPTS
    value: " -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2 -Xms512M"
resources:
  requests:
    memory: "4096Mi"
    cpu: "1"
  limits:
    memory: "4096Mi"
    cpu: "1"
The nodes in the cluster have 16GiB of memory, and the pod requests 4GiB.
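As a sanity check (just a sketch, not part of the application), a tiny class like the one below, started inside the container with the same flags, would print the maximum heap the JVM actually derived from the cgroup limit. With MaxRAMFraction=2 and a 4096Mi limit it should report roughly 2048 MiB.

// PrintHeapLimit.java - hypothetical sanity-check helper, not part of the deployment.
// Launch it with the same flags explicitly, e.g.: java $JAVA_OPTS PrintHeapLimit
// (JAVA_OPTS is only a convention used by the start script; the JVM does not read it by itself.)
public class PrintHeapLimit {
    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();        // effective max heap in bytes
        int cpus = Runtime.getRuntime().availableProcessors();  // CPUs visible to the JVM
        System.out.printf("Max heap: %d MiB, processors: %d%n", maxHeap / (1024 * 1024), cpus);
    }
}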
Error
But the application is sometimes killed due to OOM.
System events:
Jan 16 23:29:58 localhost kernel: java invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=-998
Jan 16 23:29:58 localhost kernel: java cpuset=docker-aa640424ab783e441cbd26cd25b7817e5a36deff2f44b369153d7399020d1059.scope mems_allowed=0
Jan 16 23:29:58 localhost kernel: CPU: 7 PID: 19904 Comm: java Tainted: G OE ------------ T 3.10.0-693.2.2.el7.x86_64 #1
Jan 16 23:29:58 localhost kernel: Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Jan 16 23:29:58 localhost kernel: ffff880362700000 000000008b5adefc ffff88034078bc90 ffffffff816a3db1
Jan 16 23:29:58 localhost kernel: ffff88034078bd20 ffffffff8169f1a6 ffff8803b1642680 0000000000000001
Jan 16 23:29:58 localhost kernel: 0000000000000000 ffff880407eeaad0 ffff88034078bcd0 0000000000000046
Jan 16 23:29:58 localhost kernel: Call Trace:
Jan 16 23:29:58 localhost kernel: [<ffffffff816a3db1>] dump_stack+0x19/0x1b
Jan 16 23:29:58 localhost kernel: [<ffffffff8169f1a6>] dump_header+0x90/0x229
Jan 16 23:29:58 localhost kernel: [<ffffffff81185ee6>] ? find_lock_task_mm+0x56/0xc0
Jan 16 23:29:58 localhost kernel: [<ffffffff81186394>] oom_kill_process+0x254/0x3d0
Jan 16 23:29:58 localhost kernel: [<ffffffff811f52a6>] mem_cgroup_oom_synchronize+0x546/0x570
Jan 16 23:29:58 localhost kernel: [<ffffffff811f4720>] ? mem_cgroup_charge_common+0xc0/0xc0
Jan 16 23:29:58 localhost kernel: [<ffffffff81186c24>] pagefault_out_of_memory+0x14/0x90
Jan 16 23:29:58 localhost kernel: [<ffffffff8169d56e>] mm_fault_error+0x68/0x12b
Jan 16 23:29:58 localhost kernel: [<ffffffff816b0231>] __do_page_fault+0x391/0x450
Jan 16 23:29:58 localhost kernel: [<ffffffff810295da>] ? __switch_to+0x15a/0x510
Jan 16 23:29:58 localhost kernel: [<ffffffff816b03d6>] trace_do_page_fault+0x56/0x150
Jan 16 23:29:58 localhost kernel: [<ffffffff816afa6a>] do_async_page_fault+0x1a/0xd0
Jan 16 23:29:58 localhost kernel: [<ffffffff816ac578>] async_page_fault+0x28/0x30
Jan 16 23:29:58 localhost kernel: Task in /kubepods.slice/kubepods-podc4e5c355_196b_11e9_b6ba_00163e066499.slice/docker-aa640424ab783e441cbd26cd25b7817e5a36deff2f44b369153d7399020d1059.scope killed as a result of limit of /kubepods.slice/kubepods-podc4e5c355_196b_11e9_b6ba_00163e066499.slice
Jan 16 23:29:58 localhost kernel: memory: usage 4194304kB, limit 4194304kB, failcnt 7722
Jan 16 23:29:58 localhost kernel: memory+swap: usage 4194304kB, limit 9007199254740988kB, failcnt 0
Jan 16 23:29:58 localhost kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Jan 16 23:29:58 localhost kernel: Memory cgroup stats for /kubepods.slice/kubepods-podc4e5c355_196b_11e9_b6ba_00163e066499.slice: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Jan 16 23:29:58 localhost kernel: Memory cgroup stats for /kubepods.slice/kubepods-podc4e5c355_196b_11e9_b6ba_00163e066499.slice/docker-58ff049ead2b1713e8a6c736b4637b64f8b6b5c9d1232101792b4d1e8cf03d6a.scope: cache:0KB rss:40KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:40KB inactive_file:0KB active_file:0KB unevictable:0KB
Jan 16 23:29:58 localhost kernel: Memory cgroup stats for /kubepods.slice/kubepods-podc4e5c355_196b_11e9_b6ba_00163e066499.slice/docker-aa640424ab783e441cbd26cd25b7817e5a36deff2f44b369153d7399020d1059.scope: cache:32KB rss:4194232KB rss_huge:3786752KB mapped_file:8KB swap:0KB inactive_anon:0KB active_anon:4194232KB inactive_file:0KB active_file:32KB unevictable:0KB
Jan 16 23:29:58 localhost kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Jan 16 23:29:58 localhost kernel: [19357] 0 19357 254 1 4 0 -998 pause
Jan 16 23:29:58 localhost kernel: [19485] 0 19485 1071 161 7 0 -998 sh
Jan 16 23:29:58 localhost kernel: [19497] 0 19497 2008713 1051013 2203 0 -998 java
Jan 16 23:29:58 localhost kernel: Memory cgroup out of memory: Kill process 31404 (java) score 6 or sacrifice child
Jan 16 23:29:58 localhost kernel: Killed process 19497 (java) total-vm:8034852kB, anon-rss:4188424kB, file-rss:15628kB, shmem-rss:0kB
I am confused: with MaxRAMFraction set to 2, the heap size should be limited to roughly 2GiB (by my estimate). Yet the container was killed. :(
Can you help me find the right way or approach to dig into this error?
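To dig a bit deeper, a small sketch like the one below (again only a diagnostic idea, not part of the application) would at least show how the usage splits between heap and non-heap (metaspace, code cache, and so on) as seen from inside the JVM. Native allocations such as thread stacks and direct buffers do not show up here, but they still count against the 4096Mi cgroup limit.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// MemoryBreakdown.java - hypothetical diagnostic sketch, not part of the application.
// Prints heap vs. non-heap usage as reported by the JVM itself.
public class MemoryBreakdown {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        MemoryUsage nonHeap = mem.getNonHeapMemoryUsage();
        System.out.printf("heap     used=%d MiB committed=%d MiB max=%d MiB%n",
                mib(heap.getUsed()), mib(heap.getCommitted()), mib(heap.getMax()));
        System.out.printf("non-heap used=%d MiB committed=%d MiB%n",
                mib(nonHeap.getUsed()), mib(nonHeap.getCommitted()));
    }

    private static long mib(long bytes) { return bytes / (1024 * 1024); }
}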
Answer 1
The article you refer to has since been updated twice with some new information. There are many similar blog posts, and tools such as java-buildpack-memory-calculator may still be useful. But the overall conclusion is that Java 10 and later are ultimately a better fit for running in containers.
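For what it is worth, and as an assumption on my side rather than something the original article spells out: on Java 10 and later (and on later JDK 8 updates), container support is enabled by default and the experimental flags above are superseded by percentage-based sizing via -XX:MaxRAMPercentage, so the env block could look something like this:

env:
  - name: JAVA_OPTS
    value: "-XX:MaxRAMPercentage=50.0 -Xms512M"

Even then, some headroom below the 4096Mi limit has to be left for metaspace, thread stacks and direct buffers; the java-buildpack-memory-calculator mentioned above is one tool meant to compute that kind of split.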