Java application in Kubernetes is killed when resource limits and heap size are specified

Background

A Spring Boot Java application deployed in a Kubernetes cluster gets killed several times a day.

I am using openjdk:8u181-jre for my Java application.

Kubernetes version: v1.11.5

Node OS: CentOS 7.4 x64

JAVA_OPTS is set according to this post about making Java applications respect cgroup limits: https://developers.redhat.com/blog/2017/03/14/java-inside-docker/

env:
- name: JAVA_OPTS
  value: " -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2 -Xms512M"
resources:
  requests:
    memory: "4096Mi"
    cpu: "1"
  limits:
    memory: "4096Mi"
    cpu: "1"

The nodes in the cluster have 16 GiB of memory, and the pod requests 4 GiB.
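For reference, the maximum heap the JVM actually derives from these flags can be checked inside the running pod. This is only a sketch, with <pod-name> as a placeholder:

kubectl exec <pod-name> -- java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=2 -XX:+PrintFlagsFinal -version | grep -i MaxHeapSize

With a 4096 MiB limit and MaxRAMFraction=2, this should report a value close to 2 GiB.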

Error

But the application is sometimes killed by the OOM killer.

System events:

Jan 16 23:29:58 localhost kernel: java invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=-998
Jan 16 23:29:58 localhost kernel: java cpuset=docker-aa640424ab783e441cbd26cd25b7817e5a36deff2f44b369153d7399020d1059.scope mems_allowed=0
Jan 16 23:29:58 localhost kernel: CPU: 7 PID: 19904 Comm: java Tainted: G           OE  ------------ T 3.10.0-693.2.2.el7.x86_64 #1
Jan 16 23:29:58 localhost kernel: Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Jan 16 23:29:58 localhost kernel: ffff880362700000 000000008b5adefc ffff88034078bc90 ffffffff816a3db1
Jan 16 23:29:58 localhost kernel: ffff88034078bd20 ffffffff8169f1a6 ffff8803b1642680 0000000000000001
Jan 16 23:29:58 localhost kernel: 0000000000000000 ffff880407eeaad0 ffff88034078bcd0 0000000000000046
Jan 16 23:29:58 localhost kernel: Call Trace:
Jan 16 23:29:58 localhost kernel: [<ffffffff816a3db1>] dump_stack+0x19/0x1b
Jan 16 23:29:58 localhost kernel: [<ffffffff8169f1a6>] dump_header+0x90/0x229
Jan 16 23:29:58 localhost kernel: [<ffffffff81185ee6>] ? find_lock_task_mm+0x56/0xc0
Jan 16 23:29:58 localhost kernel: [<ffffffff81186394>] oom_kill_process+0x254/0x3d0
Jan 16 23:29:58 localhost kernel: [<ffffffff811f52a6>] mem_cgroup_oom_synchronize+0x546/0x570
Jan 16 23:29:58 localhost kernel: [<ffffffff811f4720>] ? mem_cgroup_charge_common+0xc0/0xc0
Jan 16 23:29:58 localhost kernel: [<ffffffff81186c24>] pagefault_out_of_memory+0x14/0x90
Jan 16 23:29:58 localhost kernel: [<ffffffff8169d56e>] mm_fault_error+0x68/0x12b
Jan 16 23:29:58 localhost kernel: [<ffffffff816b0231>] __do_page_fault+0x391/0x450
Jan 16 23:29:58 localhost kernel: [<ffffffff810295da>] ? __switch_to+0x15a/0x510
Jan 16 23:29:58 localhost kernel: [<ffffffff816b03d6>] trace_do_page_fault+0x56/0x150
Jan 16 23:29:58 localhost kernel: [<ffffffff816afa6a>] do_async_page_fault+0x1a/0xd0
Jan 16 23:29:58 localhost kernel: [<ffffffff816ac578>] async_page_fault+0x28/0x30
Jan 16 23:29:58 localhost kernel: Task in /kubepods.slice/kubepods-podc4e5c355_196b_11e9_b6ba_00163e066499.slice/docker-aa640424ab783e441cbd26cd25b7817e5a36deff2f44b369153d7399020d1059.scope killed as a result of limit of /kubepods.slice/kubepods-podc4e5c355_196b_11e9_b6ba_00163e066499.slice
Jan 16 23:29:58 localhost kernel: memory: usage 4194304kB, limit 4194304kB, failcnt 7722
Jan 16 23:29:58 localhost kernel: memory+swap: usage 4194304kB, limit 9007199254740988kB, failcnt 0
Jan 16 23:29:58 localhost kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Jan 16 23:29:58 localhost kernel: Memory cgroup stats for /kubepods.slice/kubepods-podc4e5c355_196b_11e9_b6ba_00163e066499.slice: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Jan 16 23:29:58 localhost kernel: Memory cgroup stats for /kubepods.slice/kubepods-podc4e5c355_196b_11e9_b6ba_00163e066499.slice/docker-58ff049ead2b1713e8a6c736b4637b64f8b6b5c9d1232101792b4d1e8cf03d6a.scope: cache:0KB rss:40KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:40KB inactive_file:0KB active_file:0KB unevictable:0KB
Jan 16 23:29:58 localhost kernel: Memory cgroup stats for /kubepods.slice/kubepods-podc4e5c355_196b_11e9_b6ba_00163e066499.slice/docker-aa640424ab783e441cbd26cd25b7817e5a36deff2f44b369153d7399020d1059.scope: cache:32KB rss:4194232KB rss_huge:3786752KB mapped_file:8KB swap:0KB inactive_anon:0KB active_anon:4194232KB inactive_file:0KB active_file:32KB unevictable:0KB
Jan 16 23:29:58 localhost kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Jan 16 23:29:58 localhost kernel: [19357]     0 19357      254        1       4        0          -998 pause
Jan 16 23:29:58 localhost kernel: [19485]     0 19485     1071      161       7        0          -998 sh
Jan 16 23:29:58 localhost kernel: [19497]     0 19497  2008713  1051013    2203        0          -998 java
Jan 16 23:29:58 localhost kernel: Memory cgroup out of memory: Kill process 31404 (java) score 6 or sacrifice child
Jan 16 23:29:58 localhost kernel: Killed process 19497 (java) total-vm:8034852kB, anon-rss:4188424kB, file-rss:15628kB, shmem-rss:0kB
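From the Kubernetes side, the same event can be confirmed by looking at the container status; a sketch, with <pod-name> as a placeholder, and assuming the killed java process is the container's main process so the reason is reported as OOMKilled:

kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'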

I am confused: since MaxRAMFraction is set to 2, the heap size should be limited to roughly 2 GiB. Yet the container was killed. :(
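For context, the cgroup limit applies to the whole JVM process (heap plus metaspace, thread stacks, code cache, direct buffers and so on), not only to the heap. A minimal sketch of how the non-heap portion could be inspected with Native Memory Tracking, assuming a JDK-based image (jcmd is not shipped in the JRE-only image used here) and <java-pid> as a placeholder:

# add to JAVA_OPTS:
-XX:NativeMemoryTracking=summary
# then, inside the container:
jcmd <java-pid> VM.native_memory summary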

Could you help me find the right way or method to dig into this error?

Answer 1

The article you mentioned has been updated twice with some new information. There are many similar blog posts, and tools such as java-buildpack-memory-calculator may still be useful. But the overall conclusion is that Java 10 and later will eventually be a better fit for running in containers.
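As a rough sketch of what a container-aware configuration could look like on such a newer JVM (assuming JDK 10+, or 8u191+ where these options were backported), the experimental flags above are replaced by percentage-based ones:

java -XX:+UseContainerSupport -XX:MaxRAMPercentage=50.0 -XX:+PrintFlagsFinal -version | grep -i MaxHeapSize

Here MaxRAMPercentage=50.0 plays the same role as MaxRAMFraction=2, i.e. half of the container memory limit.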
