是什么导致我的重新启动导致非正常关机?

是什么导致我的重新启动导致非正常关机?

我的问题

在哪里可以找到非正常关机时的重启/关机日志?

我的情况

在我的redhat 7.9 (3.10.0-1160.62.1.el7.x86_64)升级版上redhat 6,一个简单的sudo reboot原因是导致虚拟机崩溃。我可以在下面的日志中看到,这是由于存在两个连续的 SYSTEM_BOOT 导致的非正常关闭引起的:

[root@rsvsiesigmprd02 wasadmin]# ausearch -i -m system_boot,system_shutdown | tail -7
type=SYSTEM_BOOT msg=audit(11/07/2022 06:18:15.281:6) : pid=765 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg=' comm=systemd-update-utmp exe=/usr/lib/systemd/systemd-update-utmp hostname=? addr=? terminal=? res=success'
----
type=SYSTEM_SHUTDOWN msg=audit(15/07/2022 03:12:06.064:5192) : pid=10697 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg=' comm=systemd-update-utmp exe=/usr/lib/systemd/systemd-update-utmp hostname=? addr=? terminal=? res=success'
----
type=SYSTEM_BOOT msg=audit(15/07/2022 08:38:48.151:6) : pid=767 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg=' comm=systemd-update-utmp exe=/usr/lib/systemd/systemd-update-utmp hostname=? addr=? terminal=? res=success'
----
type=SYSTEM_BOOT msg=audit(18/07/2022 06:17:40.034:6) : pid=767 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg=' comm=systemd-update-utmp exe=/usr/lib/systemd/systemd-update-utmp hostname=? addr=? terminal=? res=success'

另外,我无法通过 SSH 访问虚拟机,最后的手段是通过 VMWare 重新启动并硬关机。

我在日志中找不到任何导致此问题的信息。我检查了 :

    1. /var/日志/消息
    1. 日志控制

1. /var/log/messages 中

Jul 15 03:01:31 xxxxxxxxxx systemd: Started Session 722 of user ansible.
Jul 15 03:01:34 xxxxxxxxxx python: ansible-ansible.legacy.setup Invoked with filter=[] gather_subset=['all'] fact_path=/etc/ansible/facts.d gather_timeout=10
Jul 15 03:01:43 xxxxxxxxxx python: ansible-ansible.legacy.command Invoked with creates=None executable=None _uses_shell=False strip_empty_ends=True _raw_params=/usr/bin/package-cleanup -y --oldkernels --count=2 removes=None argv=None warn=False chdir=None stdin_add_newline=True stdin=None
Jul 15 03:01:53 xxxxxxxxxx python: ansible-ansible.legacy.yum Invoked with lock_timeout=30 update_cache=False conf_file=None exclude=[] allow_downgrade=False disable_gpg_check=False disable_excludes=None use_backend=auto state=latest disablerepo=[] skip_broken=False releasever=None autoremove=False download_dir=None enable_plugin=[] installroot=/ install_weak_deps=True name=['*'] download_only=False bugfix=False list=None install_repoquery=True update_only=False disable_plugin=[] enablerepo=[] security=True validate_certs=True
Jul 15 03:05:03 xxxxxxxxxx systemd-logind: New session 723 of user wasadmin.
Jul 15 03:05:03 xxxxxxxxxx systemd: Started Session 723 of user wasadmin.
Jul 15 03:05:04 xxxxxxxxxx systemd-logind: Removed session 723.
Jul 15 03:07:54 xxxxxxxxxx python: ansible-ansible.legacy.setup Invoked with filter=[] gather_subset=['all'] fact_path=/etc/ansible/facts.d gather_timeout=10
Jul 15 03:08:04 xxxxxxxxxx python: ansible-ansible.legacy.command Invoked with creates=None executable=None _uses_shell=False strip_empty_ends=True _raw_params=/sbin/shutdown -r +1 removes=None argv=None warn=False chdir=None stdin_add_newline=True stdin=None
Jul 15 03:08:04 xxxxxxxxxx systemd: Started Delayed Shutdown Service.
Jul 15 03:08:04 xxxxxxxxxx systemd-shutdownd: Shutting down at Fri 2022-07-15 03:09:04 CEST (reboot)...
Jul 15 03:08:04 xxxxxxxxxx systemd-shutdownd: Creating /run/nologin, blocking further logins...
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopping Session 721 of user root.
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopping NFS status monitor for NFSv2/3 locking....
Jul 15 03:09:04 xxxxxxxxxx systemd: Unmounting /opt/maximo/data...
Jul 15 03:09:04 xxxxxxxxxx systemd: Removed slice system-selinux\x2dpolicy\x2dmigrate\x2dlocal\x2dchanges.slice.
Jul 15 03:09:04 xxxxxxxxxx systemd: Unmounting /opt/maximo/sedif...
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopped Dump dmesg to /var/log/dmesg.
Jul 15 03:09:04 xxxxxxxxxx systemd: Unmounting /opt/maximo/agora...
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopping RPC bind service...
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopped target RPC Port Mapper.
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopping Session c2 of user wasadmin.
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopped target rpc_pipefs.target.
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopping Session 722 of user ansible.
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopped target Timers.
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopped Daily Cleanup of Temporary Directories.
Jul 15 03:09:04 xxxxxxxxxx systemd: Closed LVM2 poll daemon socket.
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopping Session 2 of user wasadmin.
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopped target Multi-User System.
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopping LSB: Starts and stops WebSphere Application Server instances...
Jul 15 03:09:04 xxxxxxxxxx umount: umount.nfs: /opt/maximo/data: device is busy
Jul 15 03:09:04 xxxxxxxxxx journal: IBM Java[2311]: JVMDUMP039I Traitement de l'événement de vidage "user", détails "" à 2022/07/15 03:09:04 - Veuillez patienter.
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopping LSB: Starts the Spacewalk Daemon...
Jul 15 03:09:04 xxxxxxxxxx journal: IBM Java[2311]: JVMDUMP032I La machine virtuelle Java a demandé un vidage Java en utilisant '/opt/dump/javacore.20220715.030904.2311.0001.txt' en réponse à un événement
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopping Command Scheduler...
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopped target Login Prompts.
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopping Getty on tty1...
Jul 15 03:09:04 xxxxxxxxxx journal: IBM Java[3796]: JVMDUMP039I Traitement de l'événement de vidage "user", détails "" à 2022/07/15 03:09:04 - Veuillez patienter.
Jul 15 03:09:04 xxxxxxxxxx journal: IBM Java[2310]: JVMDUMP039I Traitement de l'événement de vidage "user", détails "" à 2022/07/15 03:09:04 - Veuillez patienter.
Jul 15 03:09:04 xxxxxxxxxx journal: IBM Java[3220]: JVMDUMP039I Traitement de l'événement de vidage "user", détails "" à 2022/07/15 03:09:04 - Veuillez patienter.
Jul 15 03:09:04 xxxxxxxxxx RSVSIESIGMPRD01ManagerNode_was.init: Stopping WebSphere Application Server - dmgr ...
Jul 15 03:09:04 xxxxxxxxxx journal: IBM Java[3796]: JVMDUMP032I La machine virtuelle Java a demandé un vidage Java en utilisant '/opt/IBM/WebSphere/AppServer/profiles/AppSrvUI01/javacore.20220715.030904.3796.0001.txt' en réponse à un événement
Jul 15 03:09:04 xxxxxxxxxx journal: IBM Java[3220]: JVMDUMP032I La machine virtuelle Java a demandé un vidage Java en utilisant '/opt/IBM/WebSphere/AppServer/profiles/dmgr/javacore.20220715.030904.3220.0001.txt' en réponse à un événement
Jul 15 03:09:04 xxxxxxxxxx journal: IBM Java[2310]: JVMDUMP032I La machine virtuelle Java a demandé un vidage Java en utilisant '/opt/dump/javacore.20220715.030904.2310.0001.txt' en réponse à un événement
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopping LSB: Starts and stops IHS...
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopping ACPI Event Daemon...
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopped Resets System Activity Logs.
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopping Zabbix Agent...
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopping System Logging Service...
Jul 15 08:38:30 xxxxxxxxxx kernel: Initializing cgroup subsys cpuset
Jul 15 08:38:30 xxxxxxxxxx kernel: Initializing cgroup subsys cpu
Jul 15 08:38:30 xxxxxxxxxx kernel: Initializing cgroup subsys cpuacct
Jul 15 08:38:30 xxxxxxxxxx kernel: Linux version 3.10.0-1160.62.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Wed Mar 23 09:04:02 UTC 2022
Jul 15 08:38:30 xxxxxxxxxx kernel: Command line: root=/dev/mapper/vg_main-lv_root ro rd.luks=0 rd.lvm.lv=vg_main/lv_swap rd.md=0 rd.locale.LANG=fr_FR.UTF-8 vconsole.font=latarcyrheb-sun16 KEYBOARDTYPE=pc vconsole.keymap=fr-latin9 crashkernel=auto rd.lvm.lv=vg_main/lv_root rd.dm=0 rhgb quiet LANG=fr_FR.UTF-8
Jul 15 08:38:30 xxxxxxxxxx kernel: Disabled fast string operations
Jul 15 08:38:30 xxxxxxxxxx kernel: e820: BIOS-provided physical RAM map:
Jul 15 08:38:30 xxxxxxxxxx kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009f7ff] usable
Jul 15 08:38:30 xxxxxxxxxx kernel: BIOS-e820: [mem 0x000000000009f800-0x000000000009ffff] reserved
Jul 15 08:38:30 xxxxxxxxxx kernel: BIOS-e820: [mem 0x00000000000dc000-0x00000000000fffff] reserved
Jul 15 08:38:30 xxxxxxxxxx kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000bfedffff] usable
Jul 15 08:38:30 xxxxxxxxxx kernel: BIOS-e820: [mem 0x00000000bfee0000-0x00000000bfefefff] ACPI data
Jul 15 08:38:30 xxxxxxxxxx kernel: BIOS-e820: [mem 0x00000000bfeff000-0x00000000bfefffff] ACPI NVS
Jul 15 08:38:30 xxxxxxxxxx kernel: BIOS-e820: [mem 0x00000000bff00000-0x00000000bfffffff] usable
Jul 15 08:38:30 xxxxxxxxxx kernel: BIOS-e820: [mem 0x00000000f0000000-0x00000000f7ffffff] reserved
Jul 15 08:38:30 xxxxxxxxxx kernel: BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
Jul 15 08:38:30 xxxxxxxxxx kernel: BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
Jul 15 08:38:30 xxxxxxxxxx kernel: BIOS-e820: [mem 0x00000000fffe0000-0x00000000ffffffff] reserved
Jul 15 08:38:30 xxxxxxxxxx kernel: BIOS-e820: [mem 0x0000000100000000-0x000000063fffffff] usable
Jul 15 08:38:30 xxxxxxxxxx kernel: NX (Execute Disable) protection: active
Jul 15 08:38:30 xxxxxxxxxx kernel: SMBIOS 2.4 present.
Jul 15 08:38:30 xxxxxxxxxx kernel: DMI: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
Jul 15 08:38:30 xxxxxxxxxx kernel: Hypervisor detected: VMware
Jul 15 08:38:30 xxxxxxxxxx kernel: vmware: TSC freq read from hypervisor : 2893.203 MHz
Jul 15 08:38:30 xxxxxxxxxx kernel: vmware: Host bus clock speed read from hypervisor : 66000000 Hz
Jul 15 08:38:30 xxxxxxxxxx kernel: vmware: using sched offset of 7222055233 ns
Jul 15 08:38:30 xxxxxxxxxx kernel: e820: last_pfn = 0x640000 max_arch_pfn = 0x400000000
Jul 15 08:38:30 xxxxxxxxxx kernel: PAT configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- UC
Jul 15 08:38:30 xxxxxxxxxx kernel: total RAM covered: 31744M

==> 如您所见:系统从 3:00 到 8:00 停机

注意:当我这样做时,sudo reboot -f重新启动就成功了。

谢谢。

答案1

日志行换行以提高可读性:

Jul 15 03:08:04 xxxxxxxxxx python: ansible-ansible.legacy.command Invoked with creates=None \
    executable=None _uses_shell=False strip_empty_ends=True _raw_params=/sbin/shutdown -r +1 \
    removes=None argv=None warn=False chdir=None stdin_add_newline=True stdin=None

这看起来像是某人或某物正在shutdown -r +1使用 Ansible 进行调用。

关闭前的最后一条消息是:

Jul 15 03:09:04 xxxxxxxxxx systemd: Stopping LSB: Starts and stops IHS...
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopping ACPI Event Daemon...
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopped Resets System Activity Logs.
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopping Zabbix Agent...
Jul 15 03:09:04 xxxxxxxxxx systemd: Stopping System Logging Service...

...这意味着受控关闭已经取得了很大进展。系统日志服务停止后,显然在重新启动之前不会有任何进一步的日志条目/var/log/messages。如果您启用了持久性 systemd 日志(通过创建/var/log/journal目录),您可能会在关闭之前看到更多条目,可以使用journalctl.

在关闭作业之前,似乎有一个 Ansible 作业正在运行yum

Jul 15 03:01:53 xxxxxxxxxx python: ansible-ansible.legacy.yum Invoked with lock_timeout=30 \
    update_cache=False conf_file=None exclude=[] allow_downgrade=False disable_gpg_check=False \
    disable_excludes=None use_backend=auto state=latest disablerepo=[] skip_broken=False \
    releasever=None autoremove=False download_dir=None enable_plugin=[] installroot=/ \
    install_weak_deps=True name=['*'] download_only=False bugfix=False list=None \
    install_repoquery=True update_only=False disable_plugin=[] enablerepo=[] security=True \
    validate_certs=True

最好查看此yum操作是否带来了任何内核更新,如果带来了,请验证 initramfs 创建是否成功。我想我以前见过 RHEL 7.x 的 initramfs 创建失败的情况,因为例如文件系统由于yum包缓存而已满,然后使用新内核的引导将失败,有时甚至在它可以挂载根之前文件系统处于读/写模式,这显然意味着该引导尝试不会写入任何日志。

由于这是一个 VMware 虚拟机,虚拟化管理程序(和/或 vSphere 管理,如果可用)的日志可能会详细说明系统崩溃后发生的情况;也许当时有一个预定的虚拟机管理程序服务中断,并且 VMware 管理员阻止虚拟机启动,直到他们的服务中断部分完成为止?

另外,我不会在重新启动后立即盲目相信记录的时间值,直到系统开始联网并启用 NTP 时间同步。在混合 Windows/Linux VMware 环境中,可能会忘记禁用 VMware 时间同步。如果环境以 Windows 虚拟机为主,则 VMware 主机可能会以本地时间运行其系统时钟...并且虚拟机虚拟时钟在启动时也会默认为该时间。

如果确实如此,那么当 NTP 同步服务启动时,您会在启动时看到时间跳跃,可能大致等于您站点当前的 UTC/DST 偏移量。如果启动后时间戳后来从 Jul 15 08:3x:xx 跳到大约 Jul 15 03:xx:xx,则表明您的 VM 配置为(错误地)假设其虚拟硬件时钟为 UTC,并且您正在运行以 Windows 为中心的 VMware 环境,位于 UTC+4(+DST,北半球)或 UTC+5(无 DST)时区。

您的非正常关机发生在系统于 15/07/2022 08:38:48.151:6 启动和下次启动于 18/07/2022 06:17:40.034:6 之间的某个时间。您的/var/log/messages摘录不涵盖该时间间隔。

要弄清楚非正常关闭发生了什么,您需要找到时间 18/07/2022 06:17:40 之前的最后日志条目

如果有/var/log/messages7 月 15 日 08:38:30 之后不再登录,然后我会考虑虚拟机可能已被黑客攻击并(不熟练地)被 rootkit 的可能性。如果事实证明这是真的,那么关机时的任何崩溃都可能是虚拟机中 Rootkit 和/或其他恶意软件的副作用,但这可能是您此时最不需要担心的......

相关内容