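I have a headless Ubuntu server that crashes every 1-7 days. It crashed on May 1, and then again at around 8:30 am on May 3. I've scanned the logs for information but found nothing. Here is the relevant snippet from /var/log/syslog: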
May 3 07:12:13 marvin snapd[879]: autorefresh.go:397: auto-refresh: all snaps are up-to-date
May 3 07:17:01 marvin CRON[23226]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 3 07:30:01 marvin CRON[28582]: (root) CMD ([ -x /etc/init.d/anacron ] && if [ ! -d /run/systemd/system ]; then /usr/sbin/invoke-rc.d anacron start >/dev/null; fi)
May 3 08:04:23 marvin systemd[1]: Started Run anacron jobs.
May 3 08:04:23 marvin anacron[10633]: Anacron 2.3 started on 2020-05-03
May 3 08:04:23 marvin anacron[10633]: Normal exit (0 jobs run)
May 3 08:17:01 marvin CRON[15912]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 3 14:23:35 marvin systemd-modules-load[290]: Inserted module 'lp'
May 3 14:23:35 marvin systemd-modules-load[290]: Inserted module 'ppdev'
May 3 14:23:35 marvin systemd-modules-load[290]: Inserted module 'parport_pc'
May 3 14:23:35 marvin systemd[1]: Started Uncomplicated firewall.
May 3 14:23:35 marvin systemd[1]: Started Load Kernel Modules.
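The log lines at 14:23 are from when I got home and managed to reboot the server. When the server "crashes", the power light stays on, but it doesn't respond to pings, and nothing appears on the screen when I connect a monitor.
The server is used only as a Plex media server, streaming video from a NAS mounted over NFS. Plex runs in a Docker container, and I have a few other small containers running, such as OpenVPN. I'm running Ubuntu 18.04.4. I don't know if it helps, but here is a dump of my hardware: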
max@marvin:~$ sudo lshw -short
H/W path        Device     Class          Description
=========================================================
                           system         To Be Filled By O.E.M. (To Be Filled By O.E.M.)
/0                         bus            H110M-STX
/0/0                       memory         64KiB BIOS
/0/8                       memory         128KiB L1 cache
/0/9                       memory         128KiB L1 cache
/0/a                       memory         1MiB L2 cache
/0/b                       memory         8MiB L3 cache
/0/c                       processor      Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
/0/d                       memory         16GiB System Memory
/0/d/0                     memory         8GiB SODIMM DDR4 Synchronous 2133 MHz (0.5 ns)
/0/d/1                     memory         8GiB SODIMM DDR4 Synchronous 2133 MHz (0.5 ns)
/0/100                     bridge         Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers
/0/100/1                   bridge         Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16)
/0/100/1/0                 storage        NVMe SSD Controller SM961/PM961
/0/100/2                   display        HD Graphics 530
/0/100/14                  bus            100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller
/0/100/14/0     usb1       bus            xHCI Host Controller
/0/100/14/1     usb2       bus            xHCI Host Controller
/0/100/14.2                generic        100 Series/C230 Series Chipset Family Thermal Subsystem
/0/100/16                  communication  100 Series/C230 Series Chipset Family MEI Controller #1
/0/100/17                  storage        Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode]
/0/100/1c                  bridge         100 Series/C230 Series Chipset Family PCI Express Root Port #5
/0/100/1f                  bridge         H110 Chipset LPC/eSPI Controller
/0/100/1f.2                memory         Memory controller
/0/100/1f.3                multimedia     100 Series/C230 Series Chipset Family HD Audio Controller
/0/100/1f.4                bus            100 Series/C230 Series Chipset Family SMBus
/0/100/1f.6     enp0s31f6  network        Ethernet Connection (2) I219-V
/0/1            scsi1      storage
/0/1/0.0.0      /dev/sda   disk           500GB Samsung SSD 860
/0/1/0.0.0/1    /dev/sda1  volume         465GiB EXT4 volume
I'm at my wits' end trying to figure this out, so I'd be very grateful for any help.
Edit 1: Added the output of ls -la /var/crash. There is nothing there.
max@marvin:~$ ls -la /var/crash
total 8
drwxrwsrwt 2 root whoopsie 4096 Oct 14 2019 .
drwxr-xr-x 15 root root 4096 May 3 2018 ..
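Edit 2: Additional information. I've noticed that sensors sometimes reports values that differ widely only a few seconds apart. The two outputs below were run back to back.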
max@marvin:~$ sudo dmidecode -s bios-version
P1.10
max@marvin:~$ sensors
pch_skylake-virtual-0
Adapter: Virtual device
temp1: +53.5°C
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +58.0°C (high = +80.0°C, crit = +100.0°C)
Core 0: +44.0°C (high = +80.0°C, crit = +100.0°C)
Core 1: +47.0°C (high = +80.0°C, crit = +100.0°C)
Core 2: +58.0°C (high = +80.0°C, crit = +100.0°C)
Core 3: +43.0°C (high = +80.0°C, crit = +100.0°C)
max@marvin:~$ sensors
pch_skylake-virtual-0
Adapter: Virtual device
temp1: +53.5°C
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +44.0°C (high = +80.0°C, crit = +100.0°C)
Core 0: +40.0°C (high = +80.0°C, crit = +100.0°C)
Core 1: +42.0°C (high = +80.0°C, crit = +100.0°C)
Core 2: +38.0°C (high = +80.0°C, crit = +100.0°C)
Core 3: +39.0°C (high = +80.0°C, crit = +100.0°C)
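The machine is an ASRock DeskMini 110 with an i7-6700K, 16GB of Corsair DDR4 2400MHz SODIMM memory, a Noctua NH-L9i (spinning fine), a 250GB Samsung 960 EVO NVMe drive, and a 500GB Samsung SATA SSD (an 860 EVO, I think; I don't remember exactly).
Here is the output of top. Let me know if you'd like an actual screenshot.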
top - 10:58:47 up 20:35, 2 users, load average: 0.61, 0.37, 0.33
Tasks: 345 total, 1 running, 270 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.2 us, 0.8 sy, 0.0 ni, 96.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 16120772 total, 942036 free, 4121516 used, 11057220 buff/cache
KiB Swap: 2097148 total, 371812 free, 1725336 used. 11667140 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7137 max 20 0 2872064 747632 3372 S 11.2 4.6 109:14.82 java
15027 root 20 0 4504 772 704 S 4.9 0.0 0:00.15 sh
8826 911 20 0 508872 43296 4804 S 2.0 0.3 15:14.61 deluged
6401 max 20 0 4612692 998252 18232 S 1.0 6.2 24:04.40 Plex Media Serv
1184 root 20 0 3327792 26348 15320 S 0.7 0.2 6:23.03 containerd
1321 root 20 0 4151696 49524 12680 S 0.7 0.3 6:49.66 dockerd
6438 max 35 15 1862372 204256 5680 S 0.7 1.3 2:39.45 Plex Script Hos
9831 911 20 0 147104 101632 6036 S 0.7 0.6 59:57.74 python3
22107 max 20 0 77320 6224 5324 S 0.7 0.0 3:07.15 systemd
1 root 20 0 226000 8380 6008 S 0.3 0.1 4:22.96 systemd
847 root 20 0 70704 5840 5020 S 0.3 0.0 0:52.57 systemd-logind
873 message+ 20 0 50848 4732 3336 S 0.3 0.0 2:54.19 dbus-daemon
4441 root 20 0 11828 3384 2348 S 0.3 0.0 0:58.45 containerd-shim
6894 911 20 0 603768 18392 4824 S 0.3 0.1 4:25.21 deluged
8202 911 20 0 168528 102360 3364 S 0.3 0.6 4:43.52 python
10469 911 20 0 4918084 305300 6312 S 0.3 1.9 182:10.26 sabnzbdplus
2 root 20 0 0 0 0 S 0.0 0.0 0:00.07 kthreadd
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq
7 root 20 0 0 0 0 S 0.0 0.0 0:01.52 ksoftirqd/0
8 root 20 0 0 0 0 I 0.0 0.0 0:39.70 rcu_sched
9 root 20 0 0 0 0 I 0.0 0.0 0:00.00 rcu_bh
10 root rt 0 0 0 0 S 0.0 0.0 0:00.11 migration/0
11 root rt 0 0 0 0 S 0.0 0.0 0:00.11 watchdog/0
12 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/0
13 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/1
14 root rt 0 0 0 0 S 0.0 0.0 0:00.10 watchdog/1
15 root rt 0 0 0 0 S 0.0 0.0 0:00.14 migration/1
16 root 20 0 0 0 0 S 0.0 0.0 0:01.34 ksoftirqd/1
18 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/1:0H
19 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/2
20 root rt 0 0 0 0 S 0.0 0.0 0:00.12 watchdog/2
21 root rt 0 0 0 0 S 0.0 0.0 0:00.12 migration/2
22 root 20 0 0 0 0 S 0.0 0.0 1:47.23 ksoftirqd/2
24 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/2:0H
25 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/3
26 root rt 0 0 0 0 S 0.0 0.0 0:00.12 watchdog/3
27 root rt 0 0 0 0 S 0.0 0.0 0:00.14 migration/3
28 root 20 0 0 0 0 S 0.0 0.0 0:01.39 ksoftirqd/3
30 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/3:0H
31 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/4
32 root rt 0 0 0 0 S 0.0 0.0 0:00.11 watchdog/4
33 root rt 0 0 0 0 S 0.0 0.0 0:00.15 migration/4
34 root 20 0 0 0 0 S 0.0 0.0 0:01.28 ksoftirqd/4
36 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/4:0H
37 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/5
38 root rt 0 0 0 0 S 0.0 0.0 0:00.11 watchdog/5
39 root rt 0 0 0 0 S 0.0 0.0 0:00.14 migration/5
40 root 20 0 0 0 0 S 0.0 0.0 0:01.19 ksoftirqd/5
42 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/5:0H
43 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/6
44 root rt 0 0 0 0 S 0.0 0.0 0:00.12 watchdog/6
45 root rt 0 0 0 0 S 0.0 0.0 0:00.11 migration/6
46 root 20 0 0 0 0 S 0.0 0.0 0:01.89 ksoftirqd/6
48 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/6:0H
49 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/7
50 root rt 0 0 0 0 S 0.0 0.0 0:00.12 watchdog/7
51 root rt 0 0 0 0 S 0.0 0.0 0:00.14 migration/7
52 root 20 0 0 0 0 S 0.0 0.0 0:01.23 ksoftirqd/7
54 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/7:0H
55 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kdevtmpfs
56 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 netns
57 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_tasks_kthre
The java process at the top of the list is a Minecraft server; the machine has hung whether or not it was running.
Edit 3:
More information, as requested.
max@marvin:/mnt/ssd/syrupy$ cat /etc/network/interfaces
# interfaces(5) file used by ifup(8) and ifdown(8)
auto lo
iface lo inet loopback
max@marvin:/mnt/ssd/syrupy$ cat /etc/netplan/*.yaml
# Let NetworkManager manage all devices on this system
network:
version: 2
renderer: NetworkManager
Edit 4:
max@marvin:/etc$ ps auxc | grep -i therm
root 128 0.0 0.0 0 0 ? I< 10:19 0:00 acpi_thermal_pm
root 858 0.0 0.0 187000 9336 ? Ssl 10:19 0:02 thermald
Thanks in advance!
Answer 1
BIOS
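Note: make a backup before performing a BIOS update.
Your BIOS is P1.10. If I'm reading the ASRock website correctly, version 8.10 is the latest. Check the ASRock support page, and make sure it is the correct BIOS update for your exact model.
Swap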
KiB Swap: 2097148 total, 371812 free, 1725336 used.
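Your swap usage is high, and /swapfile is only 2G, so we may need to increase it. It's also possible that one of your applications is responsible for that much swap use.
If grep -i swap /etc/fstab shows this...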
/swapfile none swap sw 0 0
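then you're using /swapfile rather than a swap partition.
Let's grow it from 2G to 4G...
Note: incorrect use of the dd command can cause data loss. Copy/paste is recommended.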
sudo swapoff -a # turn off swap
sudo rm -i /swapfile # remove old /swapfile
sudo dd if=/dev/zero of=/swapfile bs=1M count=4096
sudo chmod 600 /swapfile # set proper file protections
sudo mkswap /swapfile # init /swapfile
sudo swapon /swapfile # turn on swap
free -h # confirm 16G RAM and 4G swap
reboot # reboot and verify operation
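To check whether a single application accounts for most of that swap, a minimal sketch along these lines may help. It reads the VmSwap field that the kernel exposes in /proc/<pid>/status and lists the largest consumers first. The function name swap_by_process and its two optional arguments (an alternate proc root, mainly useful for testing, and a row limit) are illustrative assumptions, not part of any standard tool.

```shell
#!/bin/sh
# swap_by_process [PROC_ROOT] [MAX_ROWS]
# Hypothetical helper: list processes by swap usage, largest first.
# PROC_ROOT defaults to /proc, MAX_ROWS to 10.
swap_by_process() {
    proc_root="${1:-/proc}"
    for status in "$proc_root"/[0-9]*/status; do
        # Processes can exit while we iterate, so skip unreadable entries.
        [ -r "$status" ] || continue
        awk '
            /^Name:/ { sub(/^Name:[ \t]*/, ""); name = $0 }
            /^VmSwap:/ && $2 > 0 { printf "%d kB\t%s\n", $2, name }
        ' "$status"
    done | sort -rn | head -n "${2:-10}"
}
```

Calling swap_by_process with no arguments on the server prints up to ten lines. Kernel threads have no VmSwap line and are skipped.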
Samsung SSD
If you have Windows available, download Samsung Magician from Samsung's website and check the firmware on your SSD.
Update #1:
NCQ errors
grep -i FPDMA /var/log/syslog*
See if there are more of these...
May 7 12:29:22 marvin kernel: [ 70.409155] ata2.00: exception Emask 0x10 SAct 0x1 SErr 0x400100 action 0x6 frozen
May 7 12:29:22 marvin kernel: [ 70.409210] ata2.00: irq_stat 0x08000000, interface fatal error
May 7 12:29:22 marvin kernel: [ 70.409246] ata2: SError: { UnrecovData Handshk }
May 7 12:29:22 marvin kernel: [ 70.409276] ata2.00: failed command: WRITE FPDMA QUEUED
May 7 12:29:22 marvin kernel: [ 70.409311] ata2.00: cmd 61/40:00:68:08:04/05:00:1d:00:00/40 tag 0 ncq dma 688128 out
May 7 12:29:22 marvin kernel: [ 70.409311] res 40/00:00:68:08:04/00:00:1d:00:00/40 Emask 0x10 (ATA bus error)
May 7 12:29:22 marvin kernel: [ 70.409402] ata2.00: status: { DRDY }
May 7 12:29:22 marvin kernel: [ 70.409426] ata2: hard resetting link
May 7 12:29:22 marvin kernel: [ 70.723340] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
May 7 12:29:22 marvin kernel: [ 70.723562] ata2.00: supports DRM functions and may not be fully accessible
May 7 12:29:22 marvin kernel: [ 70.725647] ata2.00: supports DRM functions and may not be fully accessible
May 7 12:29:22 marvin kernel: [ 70.727418] ata2.00: configured for UDMA/133
May 7 12:29:22 marvin kernel: [ 70.727430] ata2: EH complete
May 7 12:29:22 marvin kernel: [ 70.727498] ata2.00: Enabling discard_zeroes_data
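Native Command Queuing (NCQ) is an extension of the Serial ATA protocol that allows hard disk drives to internally optimize the order in which received read and write commands are executed.
Edit /etc/default/grub with sudo -H gedit /etc/default/grub (on a headless machine, use a terminal editor instead, such as sudo nano /etc/default/grub) and change the following line to include this extra parameter. Then run sudo update-grub to write the change to disk. Reboot. Monitor for hangs, and watch /var/log/syslog or dmesg to see whether the error messages continue.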
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash libata.force=noncq"
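To watch whether the errors continue after this change, a sketch like the following tallies failed ATA commands per port and command type. The function name count_ata_failures is an illustrative assumption; the pattern it matches is taken from the "failed command" kernel lines quoted above.

```shell
#!/bin/sh
# count_ata_failures [FILE...]
# Hypothetical helper: count "failed command" kernel messages,
# grouped by ATA port and command. Reads stdin if no file is given.
count_ata_failures() {
    awk '
    /ata[0-9.]+: failed command:/ {
        port = ""
        # The port is the field that looks like "ata2.00:".
        for (i = 1; i <= NF; i++)
            if ($i ~ /^ata[0-9.]+:$/) { port = $i; break }
        sub(/:$/, "", port)
        # Keep only the command name, e.g. "WRITE FPDMA QUEUED".
        sub(/.*failed command: /, "")
        count[port " " $0]++
    }
    END { for (key in count) print count[key], key }
    ' "$@" | sort -rn
}
```

For example, count_ata_failures /var/log/syslog /var/log/syslog.1 covers the two most recent logs. Compressed rotations would need to be decompressed first, for example by piping zcat output into the function. If failures keep appearing for commands other than WRITE FPDMA QUEUED once NCQ is off, that points away from NCQ and toward the link itself.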
postconf/postfix
grep -i postfix /var/log/syslog*
Look into this fatal postfix error...
May 7 12:28:21 marvin ifup[787]: postconf: fatal: open /etc/postfix/main.cf: No such file or directory
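veth
There is a lot of veth* traffic in syslog. I don't know whether that's normal. I'm not familiar with veth devices, but I believe they are related to Docker containers.
Update #2:
While looking through today's syslog, I noticed the following...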
May 12 01:25:50 marvin kernel: [387411.971440] CPU5: Core temperature above threshold, cpu clock throttled (total events = 51751)
May 12 01:25:50 marvin kernel: [387411.971441] CPU1: Core temperature above threshold, cpu clock throttled (total events = 51751)
May 12 01:25:50 marvin kernel: [387411.971442] CPU4: Package temperature above threshold, cpu clock throttled (total events = 51751)
May 12 01:25:50 marvin kernel: [387411.971443] CPU6: Package temperature above threshold, cpu clock throttled (total events = 51751)
May 12 01:25:50 marvin kernel: [387411.971445] CPU3: Package temperature above threshold, cpu clock throttled (total events = 51751)
May 12 01:25:50 marvin kernel: [387411.971445] CPU7: Package temperature above threshold, cpu clock throttled (total events = 51751)
May 12 01:25:50 marvin kernel: [387411.971446] CPU0: Package temperature above threshold, cpu clock throttled (total events = 51751)
May 12 01:25:50 marvin kernel: [387411.971447] CPU2: Package temperature above threshold, cpu clock throttled (total events = 51750)
May 12 01:25:50 marvin kernel: [387411.971447] CPU1: Package temperature above threshold, cpu clock throttled (total events = 51751)
May 12 01:25:50 marvin kernel: [387411.971448] CPU5: Package temperature above threshold, cpu clock throttled (total events = 51751)
May 12 01:25:50 marvin kernel: [387411.973408] CPU5: Core temperature/speed normal
May 12 01:25:50 marvin kernel: [387411.973409] CPU1: Core temperature/speed normal
May 12 01:25:50 marvin kernel: [387411.973409] CPU3: Package temperature/speed normal
May 12 01:25:50 marvin kernel: [387411.973410] CPU7: Package temperature/speed normal
May 12 01:25:50 marvin kernel: [387411.973411] CPU4: Package temperature/speed normal
May 12 01:25:50 marvin kernel: [387411.973411] CPU0: Package temperature/speed normal
May 12 01:25:50 marvin kernel: [387411.973412] CPU5: Package temperature/speed normal
May 12 01:25:50 marvin kernel: [387411.973413] CPU1: Package temperature/speed normal
May 12 01:25:50 marvin kernel: [387411.973450] CPU6: Package temperature/speed normal
May 12 01:25:50 marvin kernel: [387411.973451] CPU2: Package temperature/speed normal
Check your fans.
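To see how often the throttling happens and whether it clusters around the hangs, a rough sketch such as the following counts throttle messages per day. The function name throttle_messages_per_day is an illustrative assumption; the match string comes from the kernel messages quoted above. Each episode logs one line per logical CPU, so the numbers are message counts, not distinct overheating episodes.

```shell
#!/bin/sh
# throttle_messages_per_day [FILE...]
# Hypothetical helper: count "temperature above threshold" kernel
# messages per syslog day. Reads stdin if no file is given.
throttle_messages_per_day() {
    awk '
    /temperature above threshold/ { per_day[$1 " " $2]++ }
    END { for (day in per_day) print day, per_day[day] }
    ' "$@" | sort
}
```

A day with thousands of messages shortly before a hang would make cooling a stronger suspect; an even spread across days that did not hang would make it a weaker one.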
And before the reboot...
May 12 10:19:59 marvin blkmapd[310]: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
And during the reboot...
May 12 10:19:59 marvin kernel: [ 2.720842] systemd[1]: /etc/systemd/system/docker.service.d/override.conf:2: Unknown lvalue 'After' in section 'Service'
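Check override.conf for syntax errors. The message says an After= line is under [Service]; systemd only accepts After= in the [Unit] section.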
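As a minimal sketch of that check, the following reports ordering and dependency directives that sit under [Service] in a unit or drop-in file. The function name find_misplaced_unit_keys and the short list of keys it looks for are assumptions chosen for illustration; it is not an exhaustive list of [Unit] options.

```shell
#!/bin/sh
# find_misplaced_unit_keys FILE...
# Hypothetical helper: print [Unit]-only directives found under [Service].
find_misplaced_unit_keys() {
    awk '
    # Remember which section we are in.
    /^\[.*\]$/ { section = $0; next }
    section == "[Service]" && /^(After|Before|Requires|Wants)=/ {
        print FILENAME ":" FNR ": " $0
    }
    ' "$@"
}
```

For example, find_misplaced_unit_keys /etc/systemd/system/docker.service.d/override.conf. After moving any reported line under a [Unit] header, run sudo systemctl daemon-reload so systemd rereads the file.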
After the reboot...
May 12 10:19:59 marvin thermald[858]: sysfs read failed constraint_0_max_power_uw
In terminal, show me the output of ps auxc | grep -i therm.
May 12 10:20:00 marvin nfsdcltrack[1019]: Failed to init database: -13
May 12 10:20:00 marvin systemd[1]: Started OpenBSD Secure Shell server.
May 12 10:20:00 marvin kernel: [ 4.139130] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
May 12 10:20:00 marvin kernel: [ 4.139744] NFSD: starting 90-second grace period (net f00000a9)
May 12 10:20:00 marvin dbus-daemon[883]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Disk errors... check the SATA cable on ata2.00... Download Samsung Magician from Samsung's website and check the firmware on your Samsung SSD...
May 12 10:21:01 marvin kernel: [ 64.800038] ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
May 12 10:21:01 marvin kernel: [ 64.800044] ata2.00: irq_stat 0x08000000, interface fatal error
May 12 10:21:01 marvin kernel: [ 64.800047] ata2: SError: { UnrecovData Handshk }
May 12 10:21:01 marvin kernel: [ 64.800050] ata2.00: failed command: WRITE DMA EXT
May 12 10:21:01 marvin kernel: [ 64.800054] ata2.00: cmd 35/00:70:d0:55:61/00:01:0c:00:00/e0 tag 16 dma 188416 out
May 12 10:21:01 marvin kernel: [ 64.800054] res 50/00:00:cf:55:61/00:00:0c:00:00/e0 Emask 0x10 (ATA bus error)
May 12 10:21:01 marvin kernel: [ 64.800059] ata2.00: status: { DRDY }
May 12 10:21:01 marvin kernel: [ 64.800062] ata2: hard resetting link
May 12 10:21:01 marvin kernel: [ 65.024521] eth0: renamed from veth43e132c
May 12 10:21:01 marvin NetworkManager[894]: <info> [1589304061.8663] devices removed (path: /sys/devices/virtual/net/veth43e132c, iface: veth43e132c)
May 12 10:21:01 marvin NetworkManager[894]: <info> [1589304061.8667] device (vethfcd015c): carrier: link connected
May 12 10:21:01 marvin kernel: [ 65.048579] IPv6: ADDRCONF(NETDEV_CHANGE): vethfcd015c: link becomes ready
May 12 10:21:01 marvin kernel: [ 65.048596] br-7eb901c13937: port 4(vethfcd015c) entered blocking state
May 12 10:21:01 marvin kernel: [ 65.048597] br-7eb901c13937: port 4(vethfcd015c) entered forwarding state
May 12 10:21:01 marvin kernel: [ 65.117033] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
May 12 10:21:01 marvin kernel: [ 65.119121] ata2.00: supports DRM functions and may not be fully accessible
May 12 10:21:01 marvin kernel: [ 65.120914] ata2.00: supports DRM functions and may not be fully accessible
May 12 10:21:01 marvin kernel: [ 65.122321] ata2.00: configured for UDMA/133
May 12 10:21:01 marvin kernel: [ 65.122340] ata2: EH complete
May 12 10:21:01 marvin kernel: [ 65.135296] ata2.00: Enabling discard_zeroes_data
May 12 10:21:42 marvin kernel: [ 105.460013] ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
May 12 10:21:42 marvin kernel: [ 105.460018] ata2.00: irq_stat 0x08000000, interface fatal error
May 12 10:21:42 marvin kernel: [ 105.460021] ata2: SError: { UnrecovData Handshk }
May 12 10:21:42 marvin kernel: [ 105.460024] ata2.00: failed command: WRITE DMA EXT
May 12 10:21:42 marvin kernel: [ 105.460028] ata2.00: cmd 35/00:00:38:36:31/00:0a:28:00:00/e0 tag 19 dma 1310720 out
May 12 10:21:42 marvin kernel: [ 105.460028] res 50/00:00:e7:15:d9/00:00:26:00:00/e0 Emask 0x10 (ATA bus error)
May 12 10:21:42 marvin kernel: [ 105.460032] ata2.00: status: { DRDY }
May 12 10:21:42 marvin kernel: [ 105.460036] ata2: hard resetting link
May 12 10:21:42 marvin kernel: [ 105.774643] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
May 12 10:21:42 marvin kernel: [ 105.776825] ata2.00: supports DRM functions and may not be fully accessible
May 12 10:21:42 marvin kernel: [ 105.778751] ata2.00: supports DRM functions and may not be fully accessible
May 12 10:21:42 marvin kernel: [ 105.780215] ata2.00: configured for UDMA/133
May 12 10:21:42 marvin kernel: [ 105.780225] ata2: EH complete
May 12 10:21:42 marvin kernel: [ 105.791444] ata2.00: Enabling discard_zeroes_data
May 12 10:21:50 marvin kernel: [ 113.896069] ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
May 12 10:21:50 marvin kernel: [ 113.896083] ata2.00: irq_stat 0x08000000, interface fatal error
May 12 10:21:50 marvin kernel: [ 113.896093] ata2: SError: { UnrecovData Handshk }
May 12 10:21:50 marvin kernel: [ 113.896102] ata2.00: failed command: WRITE DMA EXT
May 12 10:21:50 marvin kernel: [ 113.896115] ata2.00: cmd 35/00:00:00:f2:36/00:06:28:00:00/e0 tag 14 dma 786432 out
May 12 10:21:50 marvin kernel: [ 113.896115] res 50/00:00:ff:f1:36/00:00:28:00:00/e0 Emask 0x10 (ATA bus error)
May 12 10:21:50 marvin kernel: [ 113.896128] ata2.00: status: { DRDY }
May 12 10:21:50 marvin kernel: [ 113.896137] ata2: hard resetting link
May 12 10:21:51 marvin kernel: [ 114.210830] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
May 12 10:21:51 marvin kernel: [ 114.213040] ata2.00: supports DRM functions and may not be fully accessible
May 12 10:21:51 marvin kernel: [ 114.215030] ata2.00: supports DRM functions and may not be fully accessible
May 12 10:21:51 marvin kernel: [ 114.216557] ata2.00: configured for UDMA/133
May 12 10:21:51 marvin kernel: [ 114.216575] ata2: EH complete
May 12 10:21:51 marvin kernel: [ 114.222011] ata2.00: Enabling discard_zeroes_data
May 12 10:21:51 marvin kernel: [ 114.296031] ata2: limiting SATA link speed to 3.0 Gbps
May 12 10:21:51 marvin kernel: [ 114.296034] ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
May 12 10:21:51 marvin kernel: [ 114.296039] ata2.00: irq_stat 0x08000000, interface fatal error
May 12 10:21:51 marvin kernel: [ 114.296042] ata2: SError: { UnrecovData Handshk }
May 12 10:21:51 marvin kernel: [ 114.296045] ata2.00: failed command: WRITE DMA EXT
May 12 10:21:51 marvin kernel: [ 114.296049] ata2.00: cmd 35/00:88:48:68:d8/00:09:26:00:00/e0 tag 23 dma 1249280 out
May 12 10:21:51 marvin kernel: [ 114.296049] res 50/00:00:47:68:d8/00:00:26:00:00/e0 Emask 0x10 (ATA bus error)
May 12 10:21:51 marvin kernel: [ 114.296054] ata2.00: status: { DRDY }
May 12 10:21:51 marvin kernel: [ 114.296057] ata2: hard resetting link
May 12 10:21:51 marvin kernel: [ 114.611184] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
May 12 10:21:51 marvin kernel: [ 114.613520] ata2.00: supports DRM functions and may not be fully accessible
May 12 10:21:51 marvin kernel: [ 114.615475] ata2.00: supports DRM functions and may not be fully accessible
May 12 10:21:51 marvin kernel: [ 114.617036] ata2.00: configured for UDMA/133
May 12 10:21:51 marvin kernel: [ 114.617057] ata2: EH complete
May 12 10:21:51 marvin kernel: [ 114.633135] ata2.00: Enabling discard_zeroes_data
Update #3
The SATA data/power cables have been replaced; time will tell whether this is the final fix.
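A hard hang like this one shows up in /var/log/syslog as a silent stretch, such as the jump from 08:17:01 to 14:23:35 on May 3 in the question. As a rough sketch for checking whether hangs recur after the cable swap, the following prints every gap longer than a threshold. The function name find_log_gaps and the one-hour default are assumptions. It uses only the day of month and the time, so a gap that spans a month boundary is missed, and an intentional shutdown looks the same as a hang.

```shell
#!/bin/sh
# find_log_gaps FILE [MIN_GAP_SECONDS]
# Hypothetical helper: report silent stretches in a syslog-format file.
# MIN_GAP_SECONDS defaults to 3600.
find_log_gaps() {
    awk -v min_gap="${2:-3600}" '
    {
        # $2 is the day of month, $3 is HH:MM:SS.
        split($3, t, ":")
        now = $2 * 86400 + t[1] * 3600 + t[2] * 60 + t[3]
        stamp = $1 " " $2 " " $3
        if (NR > 1 && now - prev >= min_gap)
            printf "%d s gap: %s -> %s\n", now - prev, prev_stamp, stamp
        prev = now
        prev_stamp = stamp
    }' "$1"
}
```

For example, find_log_gaps /var/log/syslog 1800 lists every stretch of 30 minutes or more with no log lines. The hourly cron.hourly entries seen in the question mean a healthy machine should rarely be silent for much longer than an hour.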