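I installed Ubuntu 18.04 on a desktop machine to run Jira, Confluence, and a few other services. It ran fine for about a year, but it recently started crashing.
The following command clearly shows many reboots within a single day.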
$ last -x reboot
reboot system boot 5.4.0-42-generic Fri Aug 28 09:51 still running
reboot system boot 5.4.0-42-generic Fri Aug 28 09:47 - 09:48 (00:00)
reboot system boot 5.4.0-42-generic Fri Aug 28 09:41 - 09:48 (00:07)
reboot system boot 5.4.0-42-generic Fri Aug 28 09:10 - 09:48 (00:38)
reboot system boot 5.4.0-42-generic Fri Aug 28 09:06 - 09:48 (00:42)
reboot system boot 5.4.0-42-generic Fri Aug 28 08:58 - 09:48 (00:49)
reboot system boot 5.4.0-42-generic Fri Aug 28 08:44 - 09:48 (01:03)
reboot system boot 5.4.0-42-generic Thu Aug 27 14:39 - 09:48 (19:08)
reboot system boot 5.4.0-42-generic Thu Aug 27 14:14 - 09:48 (19:34)
reboot system boot 5.4.0-42-generic Thu Aug 27 13:01 - 09:48 (20:46)
reboot system boot 5.4.0-42-generic Thu Aug 27 12:49 - 09:48 (20:59)
reboot system boot 5.4.0-42-generic Thu Aug 27 11:05 - 09:48 (22:42)
reboot system boot 5.4.0-42-generic Thu Aug 27 10:24 - 09:48 (23:23)
reboot system boot 5.4.0-42-generic Thu Aug 27 09:00 - 09:48 (1+00:48)
reboot system boot 5.4.0-42-generic Thu Aug 27 08:54 - 09:48 (1+00:54)
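To put a number on "many reboots", the listing above can be tallied per calendar day. This is only a rough sketch: `count_boots_per_day` is a hypothetical helper name, and it assumes the column layout shown above (weekday, month, and day in fields 5 to 7 of each `reboot` line).

```shell
# Hypothetical helper: tally boot records per day.
# Reads `last -x reboot` style output on stdin.
# Assumes fields 5-7 are weekday, month, and day, as in the listing above.
count_boots_per_day() {
    awk '$1 == "reboot" { day = $5 " " $6 " " $7; n[day]++ }
         END { for (d in n) print d, n[d] }' | sort
}
```

It would be used as `last -x reboot | count_boots_per_day`. For the listing above that should come to 7 boot records on Fri Aug 28 and 8 on Thu Aug 27.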
To figure out what was happening, I tried the following (the login name is replaced with "########" here):
$ last -x | head | tac
reboot system boot 5.4.0-42-generic Fri Aug 28 09:41 - 09:48 (00:07)
######## :0 :0 Fri Aug 28 09:41 - crash (00:06)
runlevel (to lvl 5) 5.4.0-42-generic Fri Aug 28 09:41 - 09:48 (00:06)
reboot system boot 5.4.0-42-generic Fri Aug 28 09:47 - 09:48 (00:00)
######## :0 :0 Fri Aug 28 09:47 - down (00:00)
shutdown system down 5.4.0-42-generic Fri Aug 28 09:48 - 09:51 (00:03)
reboot system boot 5.4.0-42-generic Fri Aug 28 09:51 still running
######## :0 :0 Fri Aug 28 09:51 still logged in
runlevel (to lvl 5) 5.4.0-42-generic Fri Aug 28 09:51 still running
######## pts/1 192.168.1.233 Fri Aug 28 09:59 still logged in
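In the output above, a session that ends in `- crash` has no matching logout or shutdown record, while `- down` marks a normal shutdown. A minimal sketch for counting the former over the whole history follows; `count_unclean_sessions` is a hypothetical name, and it assumes the same `last -x` layout as above.

```shell
# Hypothetical helper: count sessions that `last` reports as ended by "crash".
# Reads `last -x` style output on stdin and looks for the token pair "- crash".
count_unclean_sessions() {
    awk '{
             for (i = 1; i < NF; i++)
                 if ($i == "-" && $(i + 1) == "crash") { c++; break }
         }
         END { print c + 0 }'
}
```

It would be used as `last -x | count_unclean_sessions`. A high count next to very few `shutdown system down` records would suggest the machine is going down without a clean shutdown.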
Here is a check of the files under /var/log:
$ grep -iv ': starting\|kernel: .*: Power Button\|watching system buttons\|Stopped Cleaning Up\|Started Crash recovery kernel' /var/log/messages /var/log/syslog /var/log/apcupsd* | grep -iw 'recover[a-z]*\|power[a-z]*\|shut[a-z ]*down\|rsyslogd\|ups'
grep: /var/log/messages: No such file or directory
/var/log/syslog:Aug 28 00:06:34 server1 rsyslogd: [origin software="rsyslogd" swVersion="8.32.0" x-pid="978" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
/var/log/syslog:Aug 28 08:45:05 server1 systemd[1]: Started Update UTMP about System Boot/Shutdown.
/var/log/syslog:Aug 28 08:45:05 server1 apparmor[746]: Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd
/var/log/syslog:Aug 28 08:45:05 server1 apparmor[746]: Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd
/var/log/syslog:Aug 28 08:45:05 server1 dbus-daemon[941]: dbus[941]: Unknown group "power" in message bus configuration file
/var/log/syslog:Aug 28 08:45:05 server1 systemd[1]: Started Restore /etc/resolv.conf if the system crashed before the ppp link was shut down.
/var/log/syslog:Aug 28 08:45:05 server1 rsyslogd: imuxsock: Acquired UNIX socket '/run/systemd/journal/syslog' (fd 3) from systemd. [v8.32.0]
/var/log/syslog:Aug 28 08:45:05 server1 rsyslogd: rsyslogd's groupid changed to 106
/var/log/syslog:Aug 28 08:45:05 server1 rsyslogd: rsyslogd's userid changed to 102
/var/log/syslog:Aug 28 08:45:05 server1 rsyslogd: [origin software="rsyslogd" swVersion="8.32.0" x-pid="967" x-info="http://www.rsyslog.com"] start
/var/log/syslog:Aug 28 08:45:05 server1 kernel: [ 0.323794] ACPI: Power Resource [FN00] (off)
/var/log/syslog:Aug 28 08:45:05 server1 kernel: [ 0.323870] ACPI: Power Resource [FN01] (off)
/var/log/syslog:Aug 28 08:45:05 server1 kernel: [ 0.323939] ACPI: Power Resource [FN02] (off)
/var/log/syslog:Aug 28 08:45:05 server1 kernel: [ 0.324007] ACPI: Power Resource [FN03] (off)
/var/log/syslog:Aug 28 08:45:05 server1 kernel: [ 0.324081] ACPI: Power Resource [FN04] (off)
/var/log/syslog:Aug 28 08:45:05 server1 kernel: [ 5.842330] EXT4-fs (sda): recovery complete
/var/log/syslog:Aug 28 08:45:05 server1 kernel: [ 11.612946] EXT4-fs (sdc1): recovery complete
/var/log/syslog:Aug 28 08:45:05 server1 NetworkManager[977]: <info> [1598571905.6255] Read config: /etc/NetworkManager/NetworkManager.conf (lib: 10-dns-resolved.conf, 20-connectivity-ubuntu.conf, no-mac-addr-change.conf) (run: 10-globally-managed-devices.conf) (etc: default-wifi-powersave-on.conf)
/var/log/syslog:Aug 28 08:45:05 server1 systemd[1]: Started Unattended Upgrades Shutdown.
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) config/udev: Adding input device Power Button (/dev/input/event1)
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (**) Power Button: Applying InputClass "libinput keyboard catchall"
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) Using input driver 'libinput' for 'Power Button'
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (**) Power Button: always reports core events
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) event1 - Power Button: is tagged by udev as: Keyboard
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) event1 - Power Button: device is a keyboard
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) event1 - Power Button: device removed
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) XINPUT: Adding extended input device "Power Button" (type: KEYBOARD, id 6)
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) event1 - Power Button: is tagged by udev as: Keyboard
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) event1 - Power Button: device is a keyboard
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) config/udev: Adding input device Power Button (/dev/input/event0)
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (**) Power Button: Applying InputClass "libinput keyboard catchall"
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) Using input driver 'libinput' for 'Power Button'
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (**) Power Button: always reports core events
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) event0 - Power Button: is tagged by udev as: Keyboard
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) event0 - Power Button: device is a keyboard
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) event0 - Power Button: device removed
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) XINPUT: Adding extended input device "Power Button" (type: KEYBOARD, id 8)
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) event0 - Power Button: is tagged by udev as: Keyboard
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) event0 - Power Button: device is a keyboard
/var/log/syslog:Aug 28 08:45:07 server1 systemd[1]: Started Daemon for power management.
/var/log/syslog:Aug 28 08:45:08 server1 boltd[1541]: power: force power support: no
/var/log/syslog:Aug 28 08:45:10 server1 set-cpufreq[935]: Setting powersave scheduler for all CPUs
grep: /var/log/apcupsd*: No such file or directory
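One detail in the output above is the pair of `EXT4-fs (...): recovery complete` lines. ext4 replays its journal when a filesystem was not unmounted cleanly, so these lines are consistent with the machine losing power or hard-resetting rather than shutting down. A small sketch for counting them follows; `count_ext4_recoveries` is a hypothetical name, and the match string is taken from the log lines above.

```shell
# Hypothetical helper: count ext4 journal-recovery messages in syslog text.
# Reads log lines on stdin; prints 0 (and still succeeds) when none match.
count_ext4_recoveries() {
    grep -c 'EXT4-fs (.*): recovery complete' || true
}
```

It would be used as `count_ext4_recoveries < /var/log/syslog`. Comparing the result with the number of boots in the same period gives a rough idea of how many of them followed an unclean stop.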
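Is this just a sign that the power supply needs replacing? Or could something else be causing the frequent reboots?
Additional information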
$ sudo lshw -C memory
[sudo] password for omoroserv:
*-firmware
description: BIOS
vendor: American Megatrends Inc.
physical id: 0
version: F1
date: 01/19/2015
size: 64KiB
capacity: 4032KiB
capabilities: pci upgrade shadowing cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification uefi
*-cache:0
description: L1 cache
physical id: 4
slot: CPU Internal L1
size: 256KiB
capacity: 256KiB
capabilities: internal write-back
configuration: level=1
*-cache:1
description: L2 cache
physical id: 5
slot: CPU Internal L2
size: 1MiB
capacity: 1MiB
capabilities: internal write-back unified
configuration: level=2
*-cache:2
description: L3 cache
physical id: 6
slot: CPU Internal L3
size: 6MiB
capacity: 6MiB
capabilities: internal write-back unified
configuration: level=3
*-memory
description: System Memory
physical id: 7
slot: System board or motherboard
size: 32GiB
*-bank:0
description: DIMM DDR3 Synchronous 1600 MHz (0.6 ns)
vendor: Samsung
physical id: 0
serial: 23E50AE5
slot: ChannelA-DIMM0
size: 8GiB
width: 64 bits
clock: 1600MHz (0.6ns)
*-bank:1
description: DIMM DDR3 Synchronous 1600 MHz (0.6 ns)
product: M378B1G73DB0-CK0
vendor: Samsung
physical id: 1
serial: 1552F348
slot: ChannelA-DIMM1
size: 8GiB
width: 64 bits
clock: 1600MHz (0.6ns)
*-bank:2
description: DIMM DDR3 Synchronous 1600 MHz (0.6 ns)
vendor: Samsung
physical id: 2
serial: 23E50AE5
slot: ChannelB-DIMM0
size: 8GiB
width: 64 bits
clock: 1600MHz (0.6ns)
*-bank:3
description: DIMM DDR3 Synchronous 1600 MHz (0.6 ns)
product: M378B1G73QH0-CK0
vendor: Samsung
physical id: 3
serial: 19344C99
slot: ChannelB-DIMM1
size: 8GiB
width: 64 bits
clock: 1600MHz (0.6ns)
PCI information
$ lspci
00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 05)
00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 (rev 05)
00:1b.0 Audio device: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 (rev d5)
00:1c.2 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #3 (rev d5)
00:1d.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation B85 Express LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 05)
00:1f.3 SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 05)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
Swap information
$ sysctl vm.swappiness
vm.swappiness = 60
Memory information
$ free -m
total used free shared buff/cache available
Mem: 32017 9504 18313 566 4199 21503
Swap: 2047 0 2047