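I am using a program called zBackup to process a large number of tar files (100+ files, each over 90GB). After roughly every fifth one, the system reboots. I ran
last -x
which shows the crash: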
kayot pts/2 --.--.--.-- Wed Dec 5 14:19 still logged in
runlevel (to lvl 5) 4.15.0-42-generi Wed Dec 5 14:19 still running
reboot system boot 4.15.0-42-generi Wed Dec 5 14:18 still running
kayot pts/2 --.--.--.-- Wed Dec 5 11:59 - crash (02:19) <-- Here
runlevel (to lvl 5) 4.15.0-42-generi Wed Dec 5 11:58 - 14:19 (02:20)
reboot system boot 4.15.0-42-generi Wed Dec 5 11:58 still running
shutdown system down 4.15.0-39-generi Wed Dec 5 11:57 - 11:58 (00:00)
kayot pts/2 --.--.--.-- Wed Dec 5 11:57 - 11:57 (00:00)
kayot pts/2 --.--.--.-- Tue Dec 4 10:23 - 10:35 (00:11)
** I replaced my IP address with --.--.--.--
The crash entry lines up with the reboot.
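As a rough sketch, the crash entry can be picked out of the `last -x` output with a few lines of Python instead of by eye. The function name `find_crash_sessions` and the `SAMPLE` string are only illustrative; the sample lines are copied from the output above, and the sketch assumes `last` marks an unclean session end with `- crash`, as it does there.

```python
import re

# Lines copied from the `last -x` output in this post.
SAMPLE = """\
reboot system boot 4.15.0-42-generi Wed Dec 5 14:18 still running
kayot pts/2 --.--.--.-- Wed Dec 5 11:59 - crash (02:19) <-- Here
runlevel (to lvl 5) 4.15.0-42-generi Wed Dec 5 11:58 - 14:19 (02:20)
kayot pts/2 --.--.--.-- Tue Dec 4 10:23 - 10:35 (00:11)
"""


def find_crash_sessions(last_output):
    """Return the lines of `last -x` output whose session ended in a crash."""
    crashes = []
    for line in last_output.splitlines():
        # An unclean end shows "- crash" where the logout time would be.
        if re.search(r"\s-\s+crash\b", line):
            crashes.append(line.strip())
    return crashes


if __name__ == "__main__":
    for entry in find_crash_sessions(SAMPLE):
        print(entry)
```

To run it against the live system rather than the pasted sample, the output of `last -x` could be captured with `subprocess.run(["last", "-x"], capture_output=True, text=True).stdout` and passed to the same function.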
However, checking kern.log and syslog around those times tells me nothing about what caused the problem.
Dec 5 12:05:55 core 50-motd-news[9567]: - <Removed MOTD HTML Link>
Dec 5 12:05:55 core systemd[1]: Started Message of the Day.
Dec 5 12:13:54 core systemd[1]: Starting Cleanup of Temporary Directories...
Dec 5 12:13:54 core systemd[1]: Started Cleanup of Temporary Directories.
Dec 5 12:17:01 core CRON[29385]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Dec 5 13:17:01 core CRON[25753]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Dec 5 14:17:01 core CRON[8785]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
<Rebooted Here>
Dec 5 14:19:01 core systemd-modules-load[511]: Inserted module 'iscsi_tcp'
Dec 5 14:19:01 core kernel: [ 0.000000] microcode: microcode updated early to revision 0x25, date = 2018-04-02
Dec 5 14:19:01 core systemd-modules-load[511]: Inserted module 'ib_iser'
There are no hourly cron jobs.
Dec 5 11:59:03 core kernel: [ 62.374831] audit: type=1400 audit(1544029143.830:51): apparmor="STATUS" operation="profile_load" label="lxd-website_</var/lib/lxd>//&:lxd-website_<var-lib-lxd>:unconfined" name="/usr/lib/NetworkManager/nm-dhcp-helper" pid=5802 comm="apparmor_parser"
Dec 5 11:59:03 core kernel: [ 62.374833] audit: type=1400 audit(1544029143.830:52): apparmor="STATUS" operation="profile_load" label="lxd-website_</var/lib/lxd>//&:lxd-website_<var-lib-lxd>:unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=5802 comm="apparmor_parser"
<Rebooted Here>
Dec 5 14:19:01 core kernel: [ 0.000000] microcode: microcode updated early to revision 0x25, date = 2018-04-02
Dec 5 14:19:01 core kernel: [ 0.000000] Linux version 4.15.0-42-generic (buildd@lgw01-amd64-023) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #45-Ubuntu SMP Thu Nov 15 19:32:57 UTC 2018 (Ubuntu 4.15.0-42.45-generic 4.15.18)
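A minimal sketch of how the same boundary can be located in a longer kern.log: the kernel prefixes each message with the seconds since boot, so a drop in that counter marks a new boot, and the line just before it is the last thing the kernel managed to log. The names `KERNEL_LINE` and `find_boot_boundaries` are made up for this example, and the sample consists of two lines copied from the excerpt above plus my annotation line.

```python
import re

# Syslog-style timestamp, hostname, "kernel:", then the uptime in brackets.
KERNEL_LINE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+\[\s*(\d+\.\d+)\]"
)

# Two kern.log lines copied from this post, with the annotation between them.
SAMPLE = """\
Dec 5 11:59:03 core kernel: [ 62.374833] audit: type=1400 audit(1544029143.830:52): apparmor="STATUS" operation="profile_load" label="lxd-website_</var/lib/lxd>//&:lxd-website_<var-lib-lxd>:unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=5802 comm="apparmor_parser"
<Rebooted Here>
Dec 5 14:19:01 core kernel: [ 0.000000] microcode: microcode updated early to revision 0x25, date = 2018-04-02
"""


def find_boot_boundaries(log_text):
    """Return (last_stamp_before, first_stamp_after) for every uptime reset."""
    boundaries = []
    prev_stamp, prev_uptime = None, None
    for line in log_text.splitlines():
        match = KERNEL_LINE.match(line)
        if not match:
            continue  # skip annotations and anything that is not a kernel line
        stamp, uptime = match.group(1), float(match.group(2))
        # The uptime counter only goes backwards when the machine has rebooted.
        if prev_uptime is not None and uptime < prev_uptime:
            boundaries.append((prev_stamp, stamp))
        prev_stamp, prev_uptime = stamp, uptime
    return boundaries


if __name__ == "__main__":
    for before, after in find_boot_boundaries(SAMPLE):
        print("last kernel message:", before, "| next boot:", after)
```

In my case this only confirms what the excerpt already shows: the last kernel message is from 11:59:03, about a minute after boot, and nothing at all is logged before the next boot at 14:19:01.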
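The system is not overheating. I watched the CPU temperatures in the SuperMicro IPMI tool and they did not rise.
My OS is Ubuntu 18.04 LTS. The data is read from ZFS and written back to the same ZFS.
What should my next diagnostic step be?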