I originally set the machine up by installing Ubuntu Desktop, and later ran the following commands to turn it into a "server":
sudo apt install ubuntu-server
reboot
sudo systemctl set-default multi-user.target
reboot
sudo apt purge ubuntu-desktop -y && sudo apt autoremove -y && sudo apt autoclean
reboot
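The conversion can be double-checked after the final reboot. The sketch below assumes bash. `check_server_mode` is a hypothetical helper that only inspects the default systemd target and the dpkg status of the `ubuntu-desktop` metapackage, so it does not detect other GNOME components that survived `autoremove`.

```bash
#!/usr/bin/env bash
# check_server_mode (hypothetical helper): report whether the machine now
# boots to a text console and whether the ubuntu-desktop metapackage is gone.
check_server_mode() {
  local target pkg_status
  # Expected after "systemctl set-default multi-user.target": multi-user.target
  target=$(systemctl get-default 2>/dev/null)
  echo "default target: ${target:-unknown}"

  # dpkg-query prints "install ok installed" while the package is present;
  # after a purge it may print a not-installed status, or nothing at all.
  pkg_status=$(dpkg-query -W -f='${Status}' ubuntu-desktop 2>/dev/null)
  echo "ubuntu-desktop: ${pkg_status:-no dpkg record}"

  # Succeed only if both checks look like a server setup.
  [ "$target" = "multi-user.target" ] && [ "$pkg_status" != "install ok installed" ]
}
```

For example, `check_server_mode && echo "looks like a server"`. Neither command needs root, since both only read state.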
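Everything seems to work fine, except that the machine keeps "crashing" overnight while I'm asleep. The system stops responding, I can't SSH into it, and the services running in Docker stop working (websites are unreachable, cron jobs don't run, etc.).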
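Since the machine goes quiet overnight, the longest silence in the syslog is a reasonable first estimate of when it froze, and the line just before it may hint at what was running. A minimal sketch, assuming bash, awk and the `Mon DD HH:MM:SS host ...` syslog line format used in the syslog excerpt further down. `largest_log_gap` is a hypothetical helper; because it keeps only the day of the month, a gap that spans a month boundary is measured wrongly.

```bash
#!/usr/bin/env bash
# largest_log_gap (hypothetical helper): report the longest silence in one or
# more syslog-format files and the last line written before it.
# Assumption: lines start with "Mon DD HH:MM:SS" and all fall in one month.
largest_log_gap() {
  awk '
    BEGIN { best = 0 }
    # Skip lines without an HH:MM:SS third field, e.g. pager markers.
    $3 !~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ { next }
    {
      split($3, t, ":")
      # Seconds since the start of the month: day * 86400 + time of day.
      now = $2 * 86400 + t[1] * 3600 + t[2] * 60 + t[3]
      if (seen && now - prev > best) {
        best = now - prev
        before = prev_line
      }
      prev = now
      prev_line = $0
      seen = 1
    }
    END {
      if (best > 0) printf "gap of %d s after: %s\n", best, before
    }
  ' "$@"
}
```

For example, `largest_log_gap /var/log/syslog.1 /var/log/syslog` after rebooting from a freeze (older file first). On a hard lock-up the last few lines may never reach the disk, so the reported line is the last one written, not necessarily the cause.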
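The Grafana charts show the repeated crashes:
I'm not very experienced at debugging Linux system problems, but I've collected the logs that I think are relevant.
Messages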
[ 1.354981] Call Trace:
[ 1.354983] <IRQ>
[ 1.354985] show_stack+0x52/0x5c
[ 1.354989] dump_stack_lvl+0x4a/0x63
[ 1.354993] dump_stack+0x10/0x16
[ 1.354995] __report_bad_irq+0x3a/0xb3
[ 1.354997] note_interrupt.cold+0xb/0x60
[ 1.354999] handle_irq_event+0xa8/0xb0
[ 1.355003] handle_fasteoi_irq+0x7d/0x1d0
[ 1.355005] __common_interrupt+0x52/0xe0
[ 1.355008] common_interrupt+0x89/0xa0
[ 1.355011] </IRQ>
[ 1.355012] <TASK>
[ 1.355012] asm_common_interrupt+0x27/0x40
[ 1.355014] RIP: 0010:lock_page_memcg+0x29/0xb0
[ 1.355018] Code: 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 49 89 fc 53 48 8b 47 08 48 8d 50 ff a8 01 4c 0f 45 e2 66 90 eb 3d 8b 83 40 0e 00 00 <85> c0 7e 4c 4c 8d ab 40 04 00 00 4c 89 ef e8 54 31 a4 00 48 89 c6
[ 1.355019] RSP: 0018:ffffb5b640473b28 EFLAGS: 00000286
[ 1.355022] RAX: 0000000000000000 RBX: ffff910c0007c000 RCX: 0000000000000025
[ 1.355023] RDX: ffffe6d0845bcf47 RSI: 0000000000000000 RDI: ffffe6d0845bcf80
[ 1.355024] RBP: ffffb5b640473b40 R08: 00007fcafdd97000 R09: 0000000000000000
[ 1.355025] R10: 00007fcafde00000 R11: ffff910c03f66000 R12: ffffe6d0845bcf80
[ 1.355026] R13: 0000000116f3e025 R14: ffffb5b640473da0 R15: 00007fcafdd98000
[ 1.355028] page_remove_rmap+0x16/0x100
[ 1.355031] zap_pte_range+0x206/0x8b0
[ 1.355035] zap_pmd_range.isra.0+0x1cf/0x2c0
[ 1.355037] unmap_page_range+0x248/0x3e0
[ 1.355039] unmap_single_vma+0x81/0xf0
[ 1.355041] unmap_vmas+0x77/0xf0
[ 1.355044] exit_mmap+0xa2/0x200
[ 1.355046] mmput+0x63/0x150
[ 1.355049] exit_mm+0x154/0x1d0
[ 1.355051] do_exit+0x1a7/0x3c0
[ 1.355053] do_group_exit+0x3b/0xb0
[ 1.355055] __x64_sys_exit_group+0x18/0x20
[ 1.355057] do_syscall_64+0x5c/0xc0
[ 1.355059] ? irqentry_exit+0x1d/0x30
[ 1.355061] ? common_interrupt+0x55/0xa0
[ 1.355063] entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 1.355065] RIP: 0033:0x7fcafdd08ca1
[ 1.355067] Code: Unable to access opcode bytes at RIP 0x7fcafdd08c77.
[ 1.355068] RSP: 002b:00007ffc1f431a98 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 1.355069] RAX: ffffffffffffffda RBX: 00007fcafde33a00 RCX: 00007fcafdd08ca1
[ 1.355070] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001
[ 1.355071] RBP: 0000000000000001 R08: ffffffffffffff80 R09: 0000000000000038
[ 1.355072] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fcafde33a00
[ 1.355073] R13: 0000000000000000 R14: 00007fcafde38ee8 R15: 00007fcafde38f00
[ 1.355075] </TASK>
[ 1.355075] handlers:
[ 1.355077] [<00000000c191d8a3>] amd_gpio_irq_handler
[ 1.355082] Disabling IRQ #7
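The trace above passes through `note_interrupt` and `__report_bad_irq`, the kernel's path for reporting an interrupt it considers bogus or unhandled, and ends with `Disabling IRQ #7`, with `amd_gpio_irq_handler` listed under `handlers:`. The bracketed timestamps are seconds since boot, so this trace was logged about 1.35 s into startup rather than at the moment of a freeze. To check whether the same IRQ is disabled on every boot, a small filter like the following could be run over kernel logs. It is a sketch assuming bash and awk; `disabled_irqs` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# disabled_irqs (hypothetical helper): for every "Disabling IRQ #N" line in a
# kernel log, print N together with the handlers listed just before it.
disabled_irqs() {
  awk '
    # A "handlers:" line starts the list that precedes "Disabling IRQ #N".
    /handlers:$/ { collecting = 1; handlers = ""; next }
    collecting && /\[<[0-9a-f]+>]/ {
      # Keep the last field of each handler line, e.g. amd_gpio_irq_handler.
      handlers = handlers (handlers == "" ? "" : ",") $NF
      next
    }
    /Disabling IRQ #[0-9]+/ {
      match($0, /#[0-9]+/)
      irq = substr($0, RSTART + 1, RLENGTH - 1)
      print "IRQ " irq " disabled; handlers: " (handlers == "" ? "unknown" : handlers)
      collecting = 0
      handlers = ""
      next
    }
    # Any other line stops collecting handler names.
    { collecting = 0 }
  ' "$@"
}
```

For example, `sudo dmesg | disabled_irqs`, or `journalctl -k -b -1 | disabled_irqs` for the previous boot if the journal keeps it. `grep -E '^ *7:' /proc/interrupts` shows the current interrupt counts for IRQ 7.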
Syslog
...skipping...
Dec 13 12:14:35 kennel NetworkManager[1021]: <info> [1670894075.2920] manager: (veth84282fc): new Veth device (/org/freedesktop/NetworkManager/Devices/35)
Dec 13 12:14:35 kennel dockerd[1218]: time="2022-12-13T12:14:35.296651488+11:00" level=info msg="No non-localhost DNS nameservers are left in resolv.conf. Using default external servers: [nameserver 8.8.8.8 nameserver 8.8.4.4]"
Dec 13 12:14:35 kennel dockerd[1218]: time="2022-12-13T12:14:35.297141340+11:00" level=info msg="IPv6 enabled; Adding default IPv6 external servers: [nameserver 2001:4860:4860::8888 nameserver 2001:4860:4860::8844]"
Dec 13 12:14:35 kennel containerd[1103]: time="2022-12-13T12:14:35.322832442+11:00" level=info msg="loading plugin \"io.containerd.event.v1.publisher\"..." runtime=io.containerd.runc.v2 type=io.containerd.event.v1
Dec 13 12:14:35 kennel containerd[1103]: time="2022-12-13T12:14:35.322889591+11:00" level=info msg="loading plugin \"io.containerd.internal.v1.shutdown\"..." runtime=io.containerd.runc.v2 type=io.containerd.internal.v1
Dec 13 12:14:35 kennel containerd[1103]: time="2022-12-13T12:14:35.322902686+11:00" level=info msg="loading plugin \"io.containerd.ttrpc.v1.task\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
Dec 13 12:14:35 kennel containerd[1103]: time="2022-12-13T12:14:35.323032974+11:00" level=info msg="starting signal loop" namespace=moby path=/run/containerd/io.containerd.runtime.v2.task/moby/dd357ccc17ffdeeb07bc36e8035b56804e650d220e1c19e7242a778bc4f24bd5 pid=5897 runtime=io.containerd.runc.v2
Dec 13 12:14:35 kennel systemd[1]: Started libcontainer container dd357ccc17ffdeeb07bc36e8035b56804e650d220e1c19e7242a778bc4f24bd5.
Dec 13 12:14:35 kennel kernel: [ 144.255249] eth0: renamed from veth8e1c512
Dec 13 12:14:35 kennel NetworkManager[1021]: <info> [1670894075.5298] device (veth84282fc): carrier: link connected
Dec 13 12:14:35 kennel kernel: [ 144.267360] IPv6: ADDRCONF(NETDEV_CHANGE): veth84282fc: link becomes ready
Dec 13 12:14:35 kennel kernel: [ 144.267401] br-1e2ee8958cad: port 2(veth84282fc) entered blocking state
Dec 13 12:14:35 kennel kernel: [ 144.267404] br-1e2ee8958cad: port 2(veth84282fc) entered forwarding state
Dec 13 12:14:37 kennel avahi-daemon[1019]: Joining mDNS multicast group on interface veth84282fc.IPv6 with address fe80::8838:71ff:feab:bb3f.
Dec 13 12:14:37 kennel avahi-daemon[1019]: New relevant interface veth84282fc.IPv6 for mDNS.
Dec 13 12:14:37 kennel avahi-daemon[1019]: Registering new address record for fe80::8838:71ff:feab:bb3f on veth84282fc.*.
Dec 13 12:15:01 kennel CRON[6080]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 13 12:16:25 kennel systemd[1]: Started Run anacron jobs.
Dec 13 12:16:25 kennel anacron[6413]: Anacron 2.3 started on 2022-12-13
Dec 13 12:16:25 kennel anacron[6413]: Normal exit (0 jobs run)
Dec 13 12:16:25 kennel systemd[1]: anacron.service: Deactivated successfully.
Dec 13 12:17:01 kennel CRON[6595]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Dec 13 12:17:23 kennel systemd[1]: Starting Download data for packages that failed at package install time...
Dec 13 12:17:24 kennel systemd[1]: update-notifier-download.service: Deactivated successfully.
Dec 13 12:17:24 kennel systemd[1]: Finished Download data for packages that failed at package install time.
Dec 13 12:22:35 kennel dockerd[1218]: time="2022-12-13T12:22:35.202390159+11:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 13 12:22:35 kennel dockerd[1218]: time="2022-12-13T12:22:35.202397052+11:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 13 12:22:35 kennel dockerd[1218]: time="2022-12-13T12:22:35.203625352+11:00" level=error msg="Error running exec 06d9b0a452155eecb976ae7be2014c00d2ab9e00f48b7109ef9a6b7ad8a9075e in container: OCI runtime exec failed: exec failed: unable to start container process: exec: \"sh\": executable file not found in $PATH: unknown"
Dec 13 12:25:01 kennel CRON[8769]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 13 12:27:21 kennel systemd[1]: Starting Cleanup of Temporary Directories...
Dec 13 12:27:21 kennel systemd[1]: systemd-tmpfiles-clean.service: Deactivated successfully.
Dec 13 12:27:21 kennel systemd[1]: Finished Cleanup of Temporary Directories.
Dec 13 12:30:01 kennel CRON[10070]: (root) CMD ([ -x /etc/init.d/anacron ] && if [ ! -d /run/systemd/system ]; then /usr/sbin/invoke-rc.d anacron start >/dev/null; fi)
Dec 13 12:30:21 kennel systemd[1]: Started Run anacron jobs.
Dec 13 12:30:21 kennel anacron[10132]: Anacron 2.3 started on 2022-12-13
Dec 13 12:30:21 kennel anacron[10132]: Normal exit (0 jobs run)
Dec 13 12:30:21 kennel systemd[1]: anacron.service: Deactivated successfully.
Dec 13 12:32:35 kennel dockerd[1218]: time="2022-12-13T12:32:35.205264507+11:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 13 12:32:35 kennel dockerd[1218]: time="2022-12-13T12:32:35.205275098+11:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 13 12:32:35 kennel dockerd[1218]: time="2022-12-13T12:32:35.206744341+11:00" level=error msg="Error running exec 77e3bdcb14c9c5bef4c9e3996c60b9fc8e5e2adc078546aa2d94a2da55cb986b in container: OCI runtime exec failed: exec failed: unable to start container process: exec: \"sh\": executable file not found in $PATH: unknown"
Dec 13 12:35:01 kennel CRON[11439]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 13 12:42:35 kennel dockerd[1218]: time="2022-12-13T12:42:35.247134363+11:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 13 12:42:35 kennel dockerd[1218]: time="2022-12-13T12:42:35.247135706+11:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 13 12:42:35 kennel dockerd[1218]: time="2022-12-13T12:42:35.249104404+11:00" level=error msg="Error running exec 172680896cc232d28ec1218cca3d2cfa0c8e1b301546f3cc47cb70a20678201d in container: OCI runtime exec failed: exec failed: unable to start container process: exec: \"sh\": executable file not found in $PATH: unknown"
Dec 13 12:45:01 kennel CRON[14119]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 13 12:52:35 kennel dockerd[1218]: time="2022-12-13T12:52:35.192001020+11:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 13 12:52:35 kennel dockerd[1218]: time="2022-12-13T12:52:35.192018964+11:00" level=error msg="stream copy error: reading from a closed fifo"
Dec 13 12:52:35 kennel dockerd[1218]: time="2022-12-13T12:52:35.193291795+11:00" level=error msg="Error running exec 40be18819fa72f7e2288b626fc16b876fd292da9e31243417990524a2b3e4aa0 in container: OCI runtime exec failed: exec failed: unable to start container process: exec: \"sh\": executable file not found in $PATH: unknown"
Dec 13 12:55:01 kennel CRON[16742]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 13 12:57:46 kennel dbus-daemon[5223]: [session uid=1000 pid=5223] Activating via systemd: service name='org.freedesktop.Tracker3.Miner.Extract' unit='tracker-extract-3.service' requested by ':1.7' (uid=1000 pid=5250 comm="/usr/libexec/tracker-miner-fs-3 " label="unconfined")
Dec 13 12:57:46 kennel systemd[5198]: Starting Tracker metadata extractor...
Dec 13 12:57:46 kennel dbus-daemon[5223]: [session uid=1000 pid=5223] Successfully activated service 'org.freedesktop.Tracker3.Miner.Extract'
Dec 13 12:57:46 kennel systemd[5198]: Started Tracker metadata extractor.
Dec 13 12:57:46 kennel dbus-daemon[5223]: [session uid=1000 pid=5223] Activating via systemd: service name='org.gtk.vfs.Metadata' unit='gvfs-metadata.service' requested by ':1.15' (uid=1000 pid=17693 comm="/usr/libexec/tracker-extract-3 " label="unconfined")
Dec 13 12:57:46 kennel systemd[5198]: Starting Virtual filesystem metadata service...
Dec 13 12:57:46 kennel dbus-daemon[5223]: [session uid=1000 pid=5223] Successfully activated service 'org.gtk.vfs.Metadata'
Dec 13 12:57:46 kennel systemd[5198]: Started Virtual filesystem metadata service.
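The syslog excerpt shows the same Docker failure every ten minutes (12:22:35, 12:32:35, 12:42:35, 12:52:35): an exec into a container fails because `sh` is not in that container's `$PATH`. To check whether that cadence continues right up to the point where the machine stops responding, the sketch below lists each failed exec with the interval since the previous one. It assumes bash, awk and the syslog format above; `failed_execs` is a hypothetical helper, and the 12-character ID it prints is the exec ID from the message, not a container ID.

```bash
#!/usr/bin/env bash
# failed_execs (hypothetical helper): list dockerd "executable file not found"
# exec failures with the time since the previous one.
failed_execs() {
  awk '
    /OCI runtime exec failed/ && /executable file not found/ {
      split($3, t, ":")
      now = t[1] * 3600 + t[2] * 60 + t[3]
      # Short (12-character) exec ID that follows "running exec ".
      id = "?"
      if (match($0, /running exec [0-9a-f]+/)) id = substr($0, RSTART + 13, 12)
      if (seen) {
        d = now - prev
        if (d < 0) d += 86400   # crossed midnight
        printf "%s exec %s (+%d s)\n", $3, id, d
      } else {
        printf "%s exec %s\n", $3, id
      }
      prev = now
      seen = 1
    }
  ' "$@"
}
```

For example, `failed_execs /var/log/syslog`. The interval uses only the time of day, allowing for a single midnight wrap-around between consecutive failures.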