Why does my Ubuntu server keep rebooting?

I installed Ubuntu 18.04 on a desktop machine to run Jira, Confluence, and a few other services. It ran for about a year, but recently it started crashing.

The command below clearly shows many reboots within a single day.

$ last -x reboot
reboot   system boot  5.4.0-42-generic Fri Aug 28 09:51   still running
reboot   system boot  5.4.0-42-generic Fri Aug 28 09:47 - 09:48  (00:00)
reboot   system boot  5.4.0-42-generic Fri Aug 28 09:41 - 09:48  (00:07)
reboot   system boot  5.4.0-42-generic Fri Aug 28 09:10 - 09:48  (00:38)
reboot   system boot  5.4.0-42-generic Fri Aug 28 09:06 - 09:48  (00:42)
reboot   system boot  5.4.0-42-generic Fri Aug 28 08:58 - 09:48  (00:49)
reboot   system boot  5.4.0-42-generic Fri Aug 28 08:44 - 09:48  (01:03)
reboot   system boot  5.4.0-42-generic Thu Aug 27 14:39 - 09:48  (19:08)
reboot   system boot  5.4.0-42-generic Thu Aug 27 14:14 - 09:48  (19:34)
reboot   system boot  5.4.0-42-generic Thu Aug 27 13:01 - 09:48  (20:46)
reboot   system boot  5.4.0-42-generic Thu Aug 27 12:49 - 09:48  (20:59)
reboot   system boot  5.4.0-42-generic Thu Aug 27 11:05 - 09:48  (22:42)
reboot   system boot  5.4.0-42-generic Thu Aug 27 10:24 - 09:48  (23:23)
reboot   system boot  5.4.0-42-generic Thu Aug 27 09:00 - 09:48 (1+00:48)
reboot   system boot  5.4.0-42-generic Thu Aug 27 08:54 - 09:48 (1+00:54)
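
To put a number on "many reboots", the listing above can be tallied per day. This is only a rough sketch: the function name `count_reboots_per_day` is made up for illustration, and it assumes the column layout shown above, where the month and day are the sixth and seventh whitespace-separated fields.

```shell
# Tally boot records per calendar day from `last -x reboot` output on stdin.
# Assumes the field layout shown above: $6 is the month, $7 is the day.
count_reboots_per_day() {
    awk '$1 == "reboot" { n[$6 " " $7]++ }
         END { for (d in n) print d, n[d] }' | sort
}
```

It would be used as `last -x reboot | count_reboots_per_day`. For the listing above it reports 8 boots on Aug 27 and 7 on Aug 28. The sort is lexicographic, not chronological, which is good enough within a single month.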

To figure out what was happening, I tried the following (login names are replaced with "########"):

$ last -x | head | tac
reboot   system boot  5.4.0-42-generic Fri Aug 28 09:41 - 09:48  (00:07)
######## :0           :0               Fri Aug 28 09:41 - crash  (00:06)
runlevel (to lvl 5)   5.4.0-42-generic Fri Aug 28 09:41 - 09:48  (00:06)
reboot   system boot  5.4.0-42-generic Fri Aug 28 09:47 - 09:48  (00:00)
######## :0           :0               Fri Aug 28 09:47 - down   (00:00)
shutdown system down  5.4.0-42-generic Fri Aug 28 09:48 - 09:51  (00:03)
reboot   system boot  5.4.0-42-generic Fri Aug 28 09:51   still running
######## :0           :0               Fri Aug 28 09:51   still logged in
runlevel (to lvl 5)   5.4.0-42-generic Fri Aug 28 09:51   still running
######## pts/1        192.168.1.233    Fri Aug 28 09:59   still logged in
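
In this output, a session that ends in "crash" means the machine went down without a clean shutdown record, while "down" marks an orderly shutdown. A small sketch for counting the two cases over the whole history follows. The function name `count_session_endings` is hypothetical, and the patterns assume the exact " - crash " and " - down " spelling seen above.

```shell
# Count how login sessions ended in `last -x` output read from stdin.
# "crash": no clean shutdown record followed; "down": orderly shutdown.
count_session_endings() {
    awk '
        / - crash / { crash++ }
        / - down /  { down++ }
        END { printf "crash=%d down=%d\n", crash, down }
    '
}
```

It would be used as `last -x | count_session_endings`. A high crash count relative to down would suggest the machine is losing power or resetting rather than being shut down by software.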

Here is a check of the files under /var/log:

$ grep -iv ': starting\|kernel: .*: Power Button\|watching system buttons\|Stopped Cleaning Up\|Started Crash recovery kernel'   /var/log/messages /var/log/syslog /var/log/apcupsd*   | grep -iw 'recover[a-z]*\|power[a-z]*\|shut[a-z ]*down\|rsyslogd\|ups'
grep: /var/log/messages: No such file or directory
/var/log/syslog:Aug 28 00:06:34 server1 rsyslogd:  [origin software="rsyslogd" swVersion="8.32.0" x-pid="978" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
/var/log/syslog:Aug 28 08:45:05 server1 systemd[1]: Started Update UTMP about System Boot/Shutdown.
/var/log/syslog:Aug 28 08:45:05 server1 apparmor[746]: Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd
/var/log/syslog:Aug 28 08:45:05 server1 apparmor[746]: Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd
/var/log/syslog:Aug 28 08:45:05 server1 dbus-daemon[941]: dbus[941]: Unknown group "power" in message bus configuration file
/var/log/syslog:Aug 28 08:45:05 server1 systemd[1]: Started Restore /etc/resolv.conf if the system crashed before the ppp link was shut down.
/var/log/syslog:Aug 28 08:45:05 server1 rsyslogd: imuxsock: Acquired UNIX socket '/run/systemd/journal/syslog' (fd 3) from systemd.  [v8.32.0]
/var/log/syslog:Aug 28 08:45:05 server1 rsyslogd: rsyslogd's groupid changed to 106
/var/log/syslog:Aug 28 08:45:05 server1 rsyslogd: rsyslogd's userid changed to 102
/var/log/syslog:Aug 28 08:45:05 server1 rsyslogd:  [origin software="rsyslogd" swVersion="8.32.0" x-pid="967" x-info="http://www.rsyslog.com"] start
/var/log/syslog:Aug 28 08:45:05 server1 kernel: [    0.323794] ACPI: Power Resource [FN00] (off)
/var/log/syslog:Aug 28 08:45:05 server1 kernel: [    0.323870] ACPI: Power Resource [FN01] (off)
/var/log/syslog:Aug 28 08:45:05 server1 kernel: [    0.323939] ACPI: Power Resource [FN02] (off)
/var/log/syslog:Aug 28 08:45:05 server1 kernel: [    0.324007] ACPI: Power Resource [FN03] (off)
/var/log/syslog:Aug 28 08:45:05 server1 kernel: [    0.324081] ACPI: Power Resource [FN04] (off)
/var/log/syslog:Aug 28 08:45:05 server1 kernel: [    5.842330] EXT4-fs (sda): recovery complete
/var/log/syslog:Aug 28 08:45:05 server1 kernel: [   11.612946] EXT4-fs (sdc1): recovery complete
/var/log/syslog:Aug 28 08:45:05 server1 NetworkManager[977]: <info>  [1598571905.6255] Read config: /etc/NetworkManager/NetworkManager.conf (lib: 10-dns-resolved.conf, 20-connectivity-ubuntu.conf, no-mac-addr-change.conf) (run: 10-globally-managed-devices.conf) (etc: default-wifi-powersave-on.conf)
/var/log/syslog:Aug 28 08:45:05 server1 systemd[1]: Started Unattended Upgrades Shutdown.
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) config/udev: Adding input device Power Button (/dev/input/event1)
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (**) Power Button: Applying InputClass "libinput keyboard catchall"
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) Using input driver 'libinput' for 'Power Button'
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (**) Power Button: always reports core events
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) event1  - Power Button: is tagged by udev as: Keyboard
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) event1  - Power Button: device is a keyboard
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) event1  - Power Button: device removed
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) XINPUT: Adding extended input device "Power Button" (type: KEYBOARD, id 6)
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) event1  - Power Button: is tagged by udev as: Keyboard
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) event1  - Power Button: device is a keyboard
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) config/udev: Adding input device Power Button (/dev/input/event0)
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (**) Power Button: Applying InputClass "libinput keyboard catchall"
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) Using input driver 'libinput' for 'Power Button'
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (**) Power Button: always reports core events
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) event0  - Power Button: is tagged by udev as: Keyboard
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) event0  - Power Button: device is a keyboard
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) event0  - Power Button: device removed
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) XINPUT: Adding extended input device "Power Button" (type: KEYBOARD, id 8)
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) event0  - Power Button: is tagged by udev as: Keyboard
/var/log/syslog:Aug 28 08:45:06 server1 /usr/lib/gdm3/gdm-x-session[1179]: (II) event0  - Power Button: device is a keyboard
/var/log/syslog:Aug 28 08:45:07 server1 systemd[1]: Started Daemon for power management.
/var/log/syslog:Aug 28 08:45:08 server1 boltd[1541]: power: force power support: no
/var/log/syslog:Aug 28 08:45:10 server1 set-cpufreq[935]: Setting powersave scheduler for all CPUs
grep: /var/log/apcupsd*: No such file or directory
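
Almost everything the grep above matches is ordinary boot-time output logged at 08:45, after the machine had already come back up. What was logged just before each boot may be more telling. The sketch below is one possible way to pull that out. The function name `lines_before_boot` is made up, and it assumes each boot shows up in syslog as a kernel line with timestamp 0.000000 that starts with "Linux version".

```shell
# Print the N lines logged immediately before each boot marker in a
# syslog stream read from stdin (N defaults to 5).
# Assumption: a boot is marked by "kernel: [    0.000000] Linux version".
lines_before_boot() {
    n="${1:-5}"
    awk -v n="$n" '
        /kernel: \[ +0\.000000\] Linux version/ {
            print "--- boot detected; preceding lines:"
            # Replay the ring buffer, oldest entry first.
            for (i = count - n + 1; i <= count; i++)
                if (i > 0) print buf[i % n]
        }
        # Keep only the most recent n lines in a ring buffer.
        { count++; buf[count % n] = $0 }
    '
}
```

It would be used as `lines_before_boot 10 < /var/log/syslog`. If the preceding lines show nothing unusual and simply stop, that is consistent with an abrupt power loss or hardware reset, though it does not prove one.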

Is this simply a sign that the power supply needs replacing? Or could something else be causing the frequent reboots?

Additional information

$ sudo lshw -C memory
[sudo] password for omoroserv:
  *-firmware
       description: BIOS
       vendor: American Megatrends Inc.
       physical id: 0
       version: F1
       date: 01/19/2015
       size: 64KiB
       capacity: 4032KiB
       capabilities: pci upgrade shadowing cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification uefi
  *-cache:0
       description: L1 cache
       physical id: 4
       slot: CPU Internal L1
       size: 256KiB
       capacity: 256KiB
       capabilities: internal write-back
       configuration: level=1
  *-cache:1
       description: L2 cache
       physical id: 5
       slot: CPU Internal L2
       size: 1MiB
       capacity: 1MiB
       capabilities: internal write-back unified
       configuration: level=2
  *-cache:2
       description: L3 cache
       physical id: 6
       slot: CPU Internal L3
       size: 6MiB
       capacity: 6MiB
       capabilities: internal write-back unified
       configuration: level=3
  *-memory
       description: System Memory
       physical id: 7
       slot: System board or motherboard
       size: 32GiB
     *-bank:0
          description: DIMM DDR3 Synchronous 1600 MHz (0.6 ns)
          vendor: Samsung
          physical id: 0
          serial: 23E50AE5
          slot: ChannelA-DIMM0
          size: 8GiB
          width: 64 bits
          clock: 1600MHz (0.6ns)
     *-bank:1
          description: DIMM DDR3 Synchronous 1600 MHz (0.6 ns)
          product: M378B1G73DB0-CK0
          vendor: Samsung
          physical id: 1
          serial: 1552F348
          slot: ChannelA-DIMM1
          size: 8GiB
          width: 64 bits
          clock: 1600MHz (0.6ns)
     *-bank:2
          description: DIMM DDR3 Synchronous 1600 MHz (0.6 ns)
          vendor: Samsung
          physical id: 2
          serial: 23E50AE5
          slot: ChannelB-DIMM0
          size: 8GiB
          width: 64 bits
          clock: 1600MHz (0.6ns)
     *-bank:3
          description: DIMM DDR3 Synchronous 1600 MHz (0.6 ns)
          product: M378B1G73QH0-CK0
          vendor: Samsung
          physical id: 3
          serial: 19344C99
          slot: ChannelB-DIMM1
          size: 8GiB
          width: 64 bits
          clock: 1600MHz (0.6ns)

PCI information

$ lspci
00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 05)
00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 (rev 05)
00:1b.0 Audio device: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 (rev d5)
00:1c.2 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #3 (rev d5)
00:1d.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation B85 Express LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 05)
00:1f.3 SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 05)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)

Swap information

$ sysctl vm.swappiness
vm.swappiness = 60

Memory information

$ free -m
              total        used        free      shared  buff/cache   available
Mem:          32017        9504       18313         566        4199       21503
Swap:          2047           0        2047
