Running container was deleted automatically.
We were running a large container on an Ubuntu server, and the container suddenly disappeared.
- OS: Version=16.04.5 LTS (Xenial Xerus)
- Docker: version 18.06.1-ce, build e68fc7a
- Storage driver: aufs
- Observation: host memory utilization was high that day, close to 100%, starting about 30 minutes before the container disappeared.
The logs captured from various sources during the analysis are shown below.
apport.log:
ERROR: apport (pid 195722) Thu Jan 10 01:26:24 2019: 2 18446744073709551615, dump mode 1
ERROR: apport (pid 195722) Thu Jan 10 01:26:24 2019: ignoring implausibly big core limit, treating as unlimited
ERROR: apport (pid 195722) Thu Jan 10 01:26:25 2019: executable: /sbin/auplink (command line “auplink /var/lib/docker/aufs/mnt/ad9ab9225c557c10acf97aab3ae2f57fe56f0ad2fbae1798485b48b627a1f198 flush”)
journal.log:
Jan 10 01:26:23 xxxxxx dockerd[5506]: time=“2019-01-10T02:26:23-05:00” level=info msg=“shim reaped” id=8bdd4adeda1b84cc59cbd474d58a19bd2a8e97e63978e0d58f4bc7f3ce3a37e3
Jan 10 01:26:23 xxxxxxx dockerd[5506]: time=“2019-01-10T02:26:23.683906749-05:00” level=info msg=“ignoring event” module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Jan 10 01:26:25 xxxxxxx dockerd[5506]: time=“2019-01-10T02:26:25.020765635-05:00” level=warning msg=“Couldn’t run auplink before unmount /var/lib/docker/aufs/mnt/ad9ab9225c557c10acf97aab3ae2f57fe56f0ad2fbae1798485b48b627a1f198” error=“signal: segmentation fault (core dumped)” storage-driver=aufs
Kernel.log:
Jan 10 02:26:23 xxxxxx kernel: [2834598.543878] docker0: port 1(veth0c49618) entered disabled state
Jan 10 02:26:23 xxxxx kernel: [2834598.544926] docker0: port 1(veth0c49618) entered disabled state
Jan 10 02:26:23 xxxxxx kernel: [2834598.547121] device veth0c49618 left promiscuous mode
Jan 10 02:26:23 xxxxxx kernel: [2834598.547124] docker0: port 1(veth0c49618) entered disabled state
Jan 10 02:26:24 xxxxxx kernel: [2834599.394252] auplink[195721]: segfault at 7ffd0d1a1108 ip 00007f06362775b9 sp 00007ffd0d1a1110 error 6 in libc-2.23.so[7f063617e000+1c0000]
Syslog.log:
Jan 10 02:26:23 xxxxxx dockerd[5506]: time=“2019-01-10T02:26:23-05:00” level=info msg=“shim reaped” id=8bdd4adeda1b84cc59cbd474d58a19bd2a8e97e63978e0d58f4bc7f3ce3a37e3
Jan 10 02:26:23 xxxxxx dockerd[5506]: time=“2019-01-10T02:26:23.683906749-05:00” level=info msg=“ignoring event” module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Jan 10 02:26:23 xxxxxxx kernel: [2834598.509967] veth5b6df3a: renamed from eth0
Jan 10 02:26:23 xxxxxxx kernel: [2834598.543878] docker0: port 1(veth0c49618) entered disabled state
Jan 10 02:26:23 xxxxxxxx kernel: [2834598.544926] docker0: port 1(veth0c49618) entered disabled state
Jan 10 02:26:23 xxxxxx kernel: [2834598.547121] device veth0c49618 left promiscuous mode
Jan 10 02:26:23 xxxxxx kernel: [2834598.547124] docker0: port 1(veth0c49618) entered disabled state
Jan 10 02:26:24 xxxxxxx kernel: [2834599.394252] auplink[195721]: segfault at 7ffd0d1a1108 ip 00007f06362775b9 sp 00007ffd0d1a1110 error 6 in libc-2.23.so[7f063617e000+1c0000]
Jan 10 02:26:25 xxxxxxx dockerd[5506]: time=“2019-01-10T02:26:25.020765635-05:00” level=warning msg=“Couldn’t run auplink before unmount /var/lib/docker/aufs/mnt/ad9ab9225c557c10acf97aab3ae2f57fe56f0ad2fbae1798485b48b627a1f198” error=“signal: segmentation fault (core dumped)” storage-driver=aufs
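For anyone reading the auplink segfault line in the kernel and syslog excerpts, here is a minimal Python sketch that decodes its fields. It is not part of the original analysis. The function name `decode_segfault` and the returned keys are hypothetical, and it assumes the x86 page-fault error-code layout (bit 0 = protection violation vs. page not present, bit 1 = write, bit 2 = user mode) and that the kernel prints the error code in hex.

```python
import re

# Matches the x86 kernel message format seen in the log lines above.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return a dict describing a kernel segfault line, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    addr = int(m["addr"], 16)   # faulting address
    ip = int(m["ip"], 16)       # instruction pointer
    sp = int(m["sp"], 16)       # stack pointer
    err = int(m["err"], 16)     # page-fault error code bits
    base = int(m["base"], 16)   # load address of the mapped object
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "object": m["obj"],
        # Offset of the faulting instruction inside the mapped object,
        # usable with tools such as addr2line if matching symbols exist.
        "offset_in_object": ip - base,
        "cause": "protection violation" if err & 1 else "page not present",
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
        # Distance of the faulting address below the stack pointer.
        "sp_minus_addr": sp - addr,
    }


if __name__ == "__main__":
    sample = (
        "Jan 10 02:26:24 xxxxxx kernel: [2834599.394252] auplink[195721]: "
        "segfault at 7ffd0d1a1108 ip 00007f06362775b9 sp 00007ffd0d1a1110 "
        "error 6 in libc-2.23.so[7f063617e000+1c0000]"
    )
    info = decode_segfault(sample)
    print(info)
    print(hex(info["offset_in_object"]))
```

For the line above, this reports a user-mode write to a page that was not present, at an address 8 bytes below the stack pointer, from an instruction at offset 0xf95b9 inside libc-2.23.so. That pattern may point to auplink running out of stack, but the logs alone do not confirm it.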
A week before the incident, we noticed the following errors in the kernel log:
kernel: [2492882.553551] SLUB: Unable to allocate memory on node -1 (gfp=0x2080020)
kernel: [2492882.553556] cache: kmalloc-128(1480:8bdd4adeda1b84cc59cbd474d58a19bd2a8e97e63978e0d58f4bc7f3ce3a37e3), object size