My GCE VM instance resets for an unknown reason?

I am running a VM with 48 vCPUs and 96 GB of RAM on Google Compute Engine. When I run several Docker containers that carry out some tasks, the VM instance seems to "reset".

gcloud compute operations list offers no clues either, because it contains no entries around the time the reset happened.
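One way to double-check this is to dump the operations as JSON and filter them by time yourself, rather than scanning the default table. The sketch below is a minimal example under stated assumptions: the gcloud CLI is installed and authenticated, each operation carries an RFC 3339 insertTime field (as the Compute Engine API returns it), and the syslog times in this question are UTC. The helper names parse_rfc3339, operations_in_window and load_operations are hypothetical.

```python
import json
import subprocess
from datetime import datetime, timezone
from typing import Any, Dict, List


def parse_rfc3339(ts: str) -> datetime:
    """Parse an RFC 3339 timestamp such as '2019-01-14T09:18:40.123-08:00'."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    return datetime.fromisoformat(ts)


def operations_in_window(
    ops: List[Dict[str, Any]], start: datetime, end: datetime
) -> List[Dict[str, Any]]:
    """Keep only operations whose insertTime falls within [start, end]."""
    return [op for op in ops if start <= parse_rfc3339(op["insertTime"]) <= end]


def load_operations() -> List[Dict[str, Any]]:
    """Run the gcloud CLI (assumed installed and authenticated) and parse its JSON."""
    result = subprocess.run(
        ["gcloud", "compute", "operations", "list", "--format=json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

For example, calling operations_in_window(load_operations(), start, end) with start and end a few minutes either side of 17:18 UTC on Jan 14 would show whether any operation at all was recorded near the reset.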

Below are the monitoring data and the syslog from the affected machine.

Jan 14 17:18:08 vehicle-fleet-big-2 kernel: [ 3203.812836] br-0d70adaeac7e: port 46(veth5ec217a) entered disabled state
Jan 14 17:18:08 vehicle-fleet-big-2 kernel: [ 3203.813443] br-0d70adaeac7e: port 47(veth2e644f5) entered disabled state
Jan 14 17:18:08 vehicle-fleet-big-2 kernel: [ 3203.813824] br-0d70adaeac7e: port 48(veth83e9ba8) entered disabled state
Jan 14 17:18:08 vehicle-fleet-big-2 kernel: [ 3204.008971] eth0: renamed from veth7e29e09
Jan 14 17:18:08 vehicle-fleet-big-2 kernel: [ 3204.057313] IPv6: ADDRCONF(NETDEV_CHANGE): vethd8ccbfc: link becomes ready
Jan 14 17:18:08 vehicle-fleet-big-2 kernel: [ 3204.057405] br-0d70adaeac7e: port 45(vethd8ccbfc) entered blocking state
Jan 14 17:18:08 vehicle-fleet-big-2 kernel: [ 3204.057408] br-0d70adaeac7e: port 45(vethd8ccbfc) entered forwarding state
Jan 14 17:18:08 vehicle-fleet-big-2 systemd-networkd[1204]: vethd8ccbfc: Gained carrier
Jan 14 17:18:09 vehicle-fleet-big-2 kernel: [ 3205.125463] eth0: renamed from vethb02bb32
Jan 14 17:18:09 vehicle-fleet-big-2 systemd-networkd[1204]: veth5ec217a: Gained carrier
Jan 14 17:18:09 vehicle-fleet-big-2 kernel: [ 3205.161119] IPv6: ADDRCONF(NETDEV_CHANGE): veth5ec217a: link becomes ready
Jan 14 17:18:09 vehicle-fleet-big-2 kernel: [ 3205.161222] br-0d70adaeac7e: port 46(veth5ec217a) entered blocking state
Jan 14 17:18:09 vehicle-fleet-big-2 kernel: [ 3205.161225] br-0d70adaeac7e: port 46(veth5ec217a) entered forwarding state
Jan 14 17:18:10 vehicle-fleet-big-2 systemd-networkd[1204]: vethd8ccbfc: Gained IPv6LL
Jan 14 17:18:11 vehicle-fleet-big-2 kernel: [ 3206.284834] eth0: renamed from veth2ab704d
Jan 14 17:18:11 vehicle-fleet-big-2 systemd-networkd[1204]: veth83e9ba8: Gained carrier
Jan 14 17:18:11 vehicle-fleet-big-2 kernel: [ 3206.336989] IPv6: ADDRCONF(NETDEV_CHANGE): veth83e9ba8: link becomes ready
Jan 14 17:18:11 vehicle-fleet-big-2 kernel: [ 3206.337073] br-0d70adaeac7e: port 48(veth83e9ba8) entered blocking state
Jan 14 17:18:11 vehicle-fleet-big-2 kernel: [ 3206.337075] br-0d70adaeac7e: port 48(veth83e9ba8) entered forwarding state
Jan 14 17:18:11 vehicle-fleet-big-2 systemd-networkd[1204]: veth5ec217a: Gained IPv6LL
Jan 14 17:18:12 vehicle-fleet-big-2 kernel: [ 3207.220883] eth0: renamed from veth35b659d
Jan 14 17:18:12 vehicle-fleet-big-2 systemd-networkd[1204]: veth2e644f5: Gained carrier
Jan 14 17:18:12 vehicle-fleet-big-2 kernel: [ 3207.260969] IPv6: ADDRCONF(NETDEV_CHANGE): veth2e644f5: link becomes ready
Jan 14 17:18:12 vehicle-fleet-big-2 kernel: [ 3207.261052] br-0d70adaeac7e: port 47(veth2e644f5) entered blocking state
Jan 14 17:18:12 vehicle-fleet-big-2 kernel: [ 3207.261056] br-0d70adaeac7e: port 47(veth2e644f5) entered forwarding state
Jan 14 17:18:12 vehicle-fleet-big-2 systemd-networkd[1204]: veth83e9ba8: Gained IPv6LL
Jan 14 17:18:12 vehicle-fleet-big-2 systemd[1]: Stopping User Manager for UID 1001...
Jan 14 17:18:12 vehicle-fleet-big-2 systemd[17335]: Stopped target Default.
Jan 14 17:18:12 vehicle-fleet-big-2 systemd[17335]: Stopped target Basic System.
Jan 14 17:18:12 vehicle-fleet-big-2 systemd[17335]: Stopped target Sockets.
Jan 14 17:18:12 vehicle-fleet-big-2 systemd[17335]: Closed GnuPG cryptographic agent and passphrase cache.
Jan 14 17:18:12 vehicle-fleet-big-2 systemd[17335]: Closed GnuPG cryptographic agent (ssh-agent emulation).
Jan 14 17:18:12 vehicle-fleet-big-2 systemd[17335]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers).
Jan 14 17:18:12 vehicle-fleet-big-2 systemd[17335]: Stopped target Paths.
Jan 14 17:18:12 vehicle-fleet-big-2 systemd[17335]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
Jan 14 17:18:12 vehicle-fleet-big-2 systemd[17335]: Closed GnuPG network certificate management daemon.
Jan 14 17:18:12 vehicle-fleet-big-2 systemd[17335]: Reached target Shutdown.
Jan 14 17:18:12 vehicle-fleet-big-2 systemd[17335]: Starting Exit the Session...
Jan 14 17:18:12 vehicle-fleet-big-2 systemd[17335]: Stopped target Timers.
Jan 14 17:18:12 vehicle-fleet-big-2 systemd[17335]: Received SIGRTMIN+24 from PID 21047 (kill).
Jan 14 17:18:12 vehicle-fleet-big-2 systemd[1]: Stopped User Manager for UID 1001.
Jan 14 17:18:12 vehicle-fleet-big-2 systemd[1]: Removed slice User Slice of filip.
Jan 14 17:18:13 vehicle-fleet-big-2 systemd-networkd[1204]: veth2e644f5: Gained IPv6LL
Jan 14 17:18:21 vehicle-fleet-big-2 systemd[1]: Created slice User Slice of filip.
Jan 14 17:18:21 vehicle-fleet-big-2 systemd[1]: Starting User Manager for UID 1001...
Jan 14 17:18:21 vehicle-fleet-big-2 systemd[1]: Started Session 360 of user filip.
Jan 14 17:18:21 vehicle-fleet-big-2 systemd[25738]: Reached target Paths.
Jan 14 17:18:21 vehicle-fleet-big-2 systemd[25738]: Listening on GnuPG network certificate management daemon.
Jan 14 17:18:21 vehicle-fleet-big-2 systemd[25738]: Listening on GnuPG cryptographic agent (ssh-agent emulation).
Jan 14 17:18:21 vehicle-fleet-big-2 systemd[25738]: Listening on GnuPG cryptographic agent and passphrase cache.
Jan 14 17:18:21 vehicle-fleet-big-2 systemd[25738]: Listening on GnuPG cryptographic agent and passphrase cache (access for web browsers).
Jan 14 17:18:21 vehicle-fleet-big-2 systemd[25738]: Listening on GnuPG cryptographic agent and passphrase cache (restricted).
Jan 14 17:18:21 vehicle-fleet-big-2 systemd[25738]: Reached target Sockets.
Jan 14 17:18:21 vehicle-fleet-big-2 systemd[25738]: Reached target Timers.
Jan 14 17:18:21 vehicle-fleet-big-2 systemd[25738]: Reached target Basic System.
Jan 14 17:18:21 vehicle-fleet-big-2 systemd[25738]: Reached target Default.
Jan 14 17:18:21 vehicle-fleet-big-2 systemd[25738]: Startup finished in 73ms.
Jan 14 17:18:21 vehicle-fleet-big-2 systemd[1]: Started User Manager for UID 1001.
Jan 14 17:18:22 vehicle-fleet-big-2 systemd[1]: Stopping User Manager for UID 1001...
Jan 14 17:18:22 vehicle-fleet-big-2 systemd[25738]: Stopped target Default.
Jan 14 17:18:22 vehicle-fleet-big-2 systemd[25738]: Stopped target Basic System.
Jan 14 17:18:22 vehicle-fleet-big-2 systemd[25738]: Stopped target Paths.
Jan 14 17:18:22 vehicle-fleet-big-2 systemd[25738]: Stopped target Timers.
Jan 14 17:18:22 vehicle-fleet-big-2 systemd[25738]: Stopped target Sockets.
Jan 14 17:18:22 vehicle-fleet-big-2 systemd[25738]: Closed GnuPG network certificate management daemon.
Jan 14 17:18:22 vehicle-fleet-big-2 systemd[25738]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
Jan 14 17:18:22 vehicle-fleet-big-2 systemd[25738]: Closed GnuPG cryptographic agent and passphrase cache.
Jan 14 17:18:22 vehicle-fleet-big-2 systemd[25738]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers).
Jan 14 17:18:22 vehicle-fleet-big-2 systemd[25738]: Closed GnuPG cryptographic agent (ssh-agent emulation).
Jan 14 17:18:22 vehicle-fleet-big-2 systemd[25738]: Reached target Shutdown.
Jan 14 17:18:22 vehicle-fleet-big-2 systemd[25738]: Starting Exit the Session...
Jan 14 17:18:22 vehicle-fleet-big-2 systemd[25738]: Received SIGRTMIN+24 from PID 27228 (kill).
Jan 14 17:18:22 vehicle-fleet-big-2 systemd[1]: Stopped User Manager for UID 1001.
Jan 14 17:18:22 vehicle-fleet-big-2 systemd[1]: Removed slice User Slice of filip.

---- HERE IS WHERE THE RESET SEEMS TO HAPPEN ---

Jan 14 17:18:54 vehicle-fleet-big-2 systemd-modules-load[808]: Inserted module 'iscsi_tcp'
Jan 14 17:18:54 vehicle-fleet-big-2 systemd-modules-load[808]: Inserted module 'ib_iser'
Jan 14 17:18:54 vehicle-fleet-big-2 systemd[1]: Started Remount Root and Kernel File Systems.
Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000] Linux version 4.15.0-1026-gcp (buildd@lgw01-amd64-013) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #27-Ubuntu SMP Thu Dec 6 18:27:01 UTC 2018 (Ubuntu 4.15.0-1026.27-gcp 4.15.18)
Jan 14 17:18:54 vehicle-fleet-big-2 systemd[1]: Started Uncomplicated firewall.
Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.15.0-1026-gcp root=UUID=3d6dfdd5-865f-4188-80fb-f09f9f8b3269 ro scsi_mod.use_blk_mq=Y console=ttyS0
Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000] KERNEL supported cpus:
Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000]   Intel GenuineIntel
Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000]   AMD AuthenticAMD
Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000]   Centaur CentaurHauls
Jan 14 17:18:54 vehicle-fleet-big-2 systemd[1]: Started Set the console keyboard layout.
Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
Jan 14 17:18:54 vehicle-fleet-big-2 systemd[1]: Mounted POSIX Message Queue File System.
Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000] e820: BIOS-provided physical RAM map:
Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bfffafff] usable
Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000] BIOS-e820: [mem 0x00000000bfffb000-0x00000000bfffffff] reserved
Jan 14 17:18:54 vehicle-fleet-big-2 systemd[1]: Mounted Kernel Debug File System.
Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000] BIOS-e820: [mem 0x00000000fffbc000-0x00000000ffffffff] reserved
Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000183fffffff] usable
Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000] NX (Execute Disable) protection: active
Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000] SMBIOS 2.4 present.
Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000] DMI: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000] Hypervisor detected: KVM
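The reset point in a classic syslog like this can also be located mechanically: the bracketed number on kernel lines is the kernel's uptime in seconds, so a jump back to 0.000000 marks a new boot. The wall-clock stamp on those early boot lines most likely reflects when they were written to syslog rather than the exact instant the kernel started. A minimal Python sketch, assuming the /var/log/syslog line format shown above; find_reboots is a hypothetical helper.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Matches classic syslog kernel lines, e.g.
# "Jan 14 17:18:54 vehicle-fleet-big-2 kernel: [    0.000000] Linux version ..."
# The bracketed number is the kernel's uptime in seconds.
KERNEL_LINE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: \[\s*(?P<uptime>\d+\.\d+)\]"
)


def find_reboots(lines: Iterable[str]) -> List[Tuple[str, float, str]]:
    """Return (last stamp before, last uptime before, first stamp after) for every
    point where the kernel uptime goes backwards, i.e. where a new boot starts."""
    reboots: List[Tuple[str, float, str]] = []
    prev_uptime: Optional[float] = None
    prev_stamp: Optional[str] = None
    for line in lines:
        match = KERNEL_LINE.match(line)
        if not match:
            continue  # not a kernel line (systemd, systemd-networkd, ...)
        stamp, uptime = match.group("stamp"), float(match.group("uptime"))
        if prev_uptime is not None and uptime < prev_uptime:
            reboots.append((prev_stamp, prev_uptime, stamp))
        prev_stamp, prev_uptime = stamp, uptime
    return reboots
```

Fed the excerpt above (for example via an open file handle on /var/log/syslog), it reports the last pre-reset kernel line at 17:18:12 with uptime 3207.261056 and the new boot logged at 17:18:54.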

Answer 1

Based on the Google Cloud Platform documentation on live migration, I would rule out live migration as the cause:

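"Live migration does not change any attributes or properties of the VM itself. The live migration process just transfers a running VM from one host machine to another host machine within the same zone. All VM properties and attributes remain unchanged, including internal and external IP addresses, instance metadata, block storage data and volumes, OS and application state, network settings, network connections, and so on." https://cloud.google.com/compute/docs/instances/live-migration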

So live migration cannot change anything inside the instance, and the preemptible option is turned off as well.

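As you mentioned, none of the operations shown by gcloud compute operations list match the timestamp of this event, so it may be worth checking the Activity dashboard and Stackdriver Logging.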

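Looking at your logs, it is interesting that the system creates a user slice, reaches the Shutdown target, and removes the slice again a few seconds later. I suggest monitoring the instance's control groups with systemd-cgtop, and using the last command to see which users connected to your instance, when, and from which source IP.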
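As a complement to last (which reads /var/log/wtmp), the same user-slice and session lifecycle can be pulled out of syslog and lined up against the reset. A minimal Python sketch, assuming the exact message texts that appear in the log above; user_session_events and the event labels are hypothetical names chosen for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Syslog prefix for messages from the system manager (PID 1) only.
PID1_LINE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ systemd\[1\]: (.*)$")

# Message texts as they appear in the log; the group captures the user name.
EVENTS = [
    ("slice-created", re.compile(r"Created slice User Slice of (\S+)\.")),
    ("session-started", re.compile(r"Started Session \d+ of user (\S+)\.")),
    ("slice-removed", re.compile(r"Removed slice User Slice of (\S+)\.")),
]


def user_session_events(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (syslog stamp, event kind, user) for each user slice/session event."""
    events: List[Tuple[str, str, str]] = []
    for line in lines:
        match = PID1_LINE.match(line.rstrip("\n"))
        if not match:
            continue  # skip kernel, networkd and per-user systemd lines
        stamp, message = match.groups()
        for kind, pattern in EVENTS:
            found = pattern.fullmatch(message)
            if found:
                events.append((stamp, kind, found.group(1)))
                break
    return events
```

On the excerpt above it lists the filip slice being removed at 17:18:12, recreated together with Session 360 at 17:18:21, and removed again at 17:18:22, roughly half a minute before the new boot.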

Finally, what kind of tasks are you running with Docker?

Answer 2

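This looks like a regular reboot initiated from inside the VM: note all the systemd "Stopped" log lines, which indicate that systemd did this deliberately. A GCE VM can simply die (for example, on a sudden hardware failure), but that is very unlikely, and it would look completely different from what you see here.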
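One way to check which systemd instance emitted the "Stopped" lines is to group them by the PID in brackets: systemd[1] is the system manager, while other PIDs (17335 and 25738 in this log) belong to the per-user systemd instances started for UID 1001. A minimal Python sketch, assuming the syslog format shown above; stop_messages_by_pid is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Captures the PID of the emitting systemd instance and the message text.
SYSTEMD_LINE = re.compile(r"^\w{3} +\d+ \d{2}:\d{2}:\d{2} \S+ systemd\[(\d+)\]: (.*)$")


def stop_messages_by_pid(lines: Iterable[str]) -> Dict[int, List[str]]:
    """Group systemd 'Stopped ...' and 'Reached target Shutdown.' messages by PID."""
    by_pid: Dict[int, List[str]] = defaultdict(list)
    for line in lines:
        match = SYSTEMD_LINE.match(line.rstrip("\n"))
        if not match:
            continue  # other daemons (systemd-networkd, kernel, ...) are skipped
        pid, message = int(match.group(1)), match.group(2)
        if message.startswith(("Stopped", "Reached target Shutdown")):
            by_pid[pid].append(message)
    return dict(by_pid)
```

Running it over the full syslog rather than this excerpt shows which PIDs were shutting things down in the seconds before the boot.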

Also note the log line "Jan 14 17:18:12 vehicle-fleet-big-2 systemd[1]: Stopping User Manager for UID 1001...". I suggest finding out which user has UID 1001 and checking what they were doing around that time.
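On the VM itself, the UID can be resolved against the account database (getent passwd 1001 does the same from a shell). The neighbouring "User Slice of filip" lines hint at the answer, but checking on the machine is more reliable. A minimal Python sketch; user_for_uid is a hypothetical helper.

```python
import pwd
from typing import Optional


def user_for_uid(uid: int) -> Optional[str]:
    """Return the login name for a numeric UID, or None if it has no passwd entry."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:  # pwd.getpwuid raises KeyError for unknown UIDs
        return None
```

On the VM, user_for_uid(1001) would return the account name behind the "User Manager for UID 1001" lines, which can then be matched against the output of last.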
