Headless server crashes every 1-7 days

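I have a headless Ubuntu server that crashes every 1-7 days. It crashed on May 1, then again on May 3 at around 8:30 am. I've scanned the logs for information and found nothing. Here is the relevant snippet from /var/log/syslog: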

May  3 07:12:13 marvin snapd[879]: autorefresh.go:397: auto-refresh: all snaps are up-to-date
May  3 07:17:01 marvin CRON[23226]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
May  3 07:30:01 marvin CRON[28582]: (root) CMD ([ -x /etc/init.d/anacron ] && if [ ! -d /run/systemd/system ]; then /usr/sbin/invoke-rc.d anacron start >/dev/null; fi)
May  3 08:04:23 marvin systemd[1]: Started Run anacron jobs.
May  3 08:04:23 marvin anacron[10633]: Anacron 2.3 started on 2020-05-03
May  3 08:04:23 marvin anacron[10633]: Normal exit (0 jobs run)
May  3 08:17:01 marvin CRON[15912]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
May  3 14:23:35 marvin systemd-modules-load[290]: Inserted module 'lp'
May  3 14:23:35 marvin systemd-modules-load[290]: Inserted module 'ppdev'
May  3 14:23:35 marvin systemd-modules-load[290]: Inserted module 'parport_pc'
May  3 14:23:35 marvin systemd[1]: Started Uncomplicated firewall.
May  3 14:23:35 marvin systemd[1]: Started Load Kernel Modules.

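The log lines at 14:23 are from when I got home and managed to reboot the server. When the server "crashes", the power light stays on, but it doesn't respond to pings, and nothing is displayed when I connect a monitor.

The server is used only as a Plex media server, streaming video from a NAS mounted over NFS. Plex runs in a Docker container, and I have a few other small containers running, such as OpenVPN. I'm running Ubuntu 18.04.4. I don't know if this helps, but here is a dump of my hardware: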

max@marvin:~$ sudo lshw -short
H/W path      Device           Class          Description
=========================================================
                               system         To Be Filled By O.E.M. (To Be Filled By O.E.M.)
/0                             bus            H110M-STX
/0/0                           memory         64KiB BIOS
/0/8                           memory         128KiB L1 cache
/0/9                           memory         128KiB L1 cache
/0/a                           memory         1MiB L2 cache
/0/b                           memory         8MiB L3 cache
/0/c                           processor      Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
/0/d                           memory         16GiB System Memory
/0/d/0                         memory         8GiB SODIMM DDR4 Synchronous 2133 MHz (0.5 ns)
/0/d/1                         memory         8GiB SODIMM DDR4 Synchronous 2133 MHz (0.5 ns)
/0/100                         bridge         Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers
/0/100/1                       bridge         Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16)
/0/100/1/0                     storage        NVMe SSD Controller SM961/PM961
/0/100/2                       display        HD Graphics 530
/0/100/14                      bus            100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller
/0/100/14/0   usb1             bus            xHCI Host Controller
/0/100/14/1   usb2             bus            xHCI Host Controller
/0/100/14.2                    generic        100 Series/C230 Series Chipset Family Thermal Subsystem
/0/100/16                      communication  100 Series/C230 Series Chipset Family MEI Controller #1
/0/100/17                      storage        Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode]
/0/100/1c                      bridge         100 Series/C230 Series Chipset Family PCI Express Root Port #5
/0/100/1f                      bridge         H110 Chipset LPC/eSPI Controller
/0/100/1f.2                    memory         Memory controller
/0/100/1f.3                    multimedia     100 Series/C230 Series Chipset Family HD Audio Controller
/0/100/1f.4                    bus            100 Series/C230 Series Chipset Family SMBus
/0/100/1f.6   enp0s31f6        network        Ethernet Connection (2) I219-V
/0/1          scsi1            storage
/0/1/0.0.0    /dev/sda         disk           500GB Samsung SSD 860
/0/1/0.0.0/1  /dev/sda1        volume         465GiB EXT4 volume

I'm racking my brain trying to figure this out, so any help would be greatly appreciated.

Edit 1: Adding the output of ls -la /var/crash. There's nothing there.

max@marvin:~$ ls -la /var/crash
total 8
drwxrwsrwt  2 root whoopsie 4096 Oct 14  2019 .
drwxr-xr-x 15 root root     4096 May  3  2018 ..

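Edit 2: Additional information. I've noticed that sensors sometimes reports values that differ greatly only a few seconds apart. The two sensors outputs below were run back to back.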

max@marvin:~$ sudo dmidecode -s bios-version
P1.10

max@marvin:~$ sensors
pch_skylake-virtual-0
Adapter: Virtual device
temp1:        +53.5°C

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +58.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:        +44.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:        +47.0°C  (high = +80.0°C, crit = +100.0°C)
Core 2:        +58.0°C  (high = +80.0°C, crit = +100.0°C)
Core 3:        +43.0°C  (high = +80.0°C, crit = +100.0°C)

max@marvin:~$ sensors
pch_skylake-virtual-0
Adapter: Virtual device
temp1:        +53.5°C

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +44.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:        +40.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:        +42.0°C  (high = +80.0°C, crit = +100.0°C)
Core 2:        +38.0°C  (high = +80.0°C, crit = +100.0°C)
Core 3:        +39.0°C  (high = +80.0°C, crit = +100.0°C)
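
Since two readings a few seconds apart can differ this much, one way to see what the package temperature does between samples is to log it continuously. The following is only a sketch: `package_temp` is a made-up helper name, and it assumes the `sensors` output format shown above.

```shell
#!/bin/sh
# Extract the "Package id 0" temperature (digits only, in °C) from
# `sensors` output supplied on stdin.
package_temp() {
    awk '/^Package id 0:/ { t = $4; gsub(/[^0-9.]/, "", t); print t; exit }'
}

# Hypothetical usage: append one timestamped reading every 10 seconds.
# while sleep 10; do
#     printf '%s %s\n' "$(date '+%F %T')" "$(sensors | package_temp)"
# done >> "$HOME/temps.log"
```

If the machine hangs, the tail of such a log shows the last temperature that reached the disk, although the final few lines may be lost if they were still in the page cache.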

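The computer is an ASRock DeskMini 110 with an i7-6700K, 16GB of Corsair DDR4 2400MHz SODIMM memory, a Noctua NH-L9i (spinning fine), a 250GB Samsung 960 EVO NVMe drive, and a 500GB Samsung SATA SSD (an 860 EVO I think, I don't remember exactly).

Here is the output of top. Let me know if you want an actual screenshot.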

top - 10:58:47 up 20:35,  2 users,  load average: 0.61, 0.37, 0.33
Tasks: 345 total,   1 running, 270 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.2 us,  0.8 sy,  0.0 ni, 96.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 16120772 total,   942036 free,  4121516 used, 11057220 buff/cache
KiB Swap:  2097148 total,   371812 free,  1725336 used. 11667140 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 7137 max       20   0 2872064 747632   3372 S  11.2  4.6 109:14.82 java
15027 root      20   0    4504    772    704 S   4.9  0.0   0:00.15 sh
 8826 911       20   0  508872  43296   4804 S   2.0  0.3  15:14.61 deluged
 6401 max       20   0 4612692 998252  18232 S   1.0  6.2  24:04.40 Plex Media Serv
 1184 root      20   0 3327792  26348  15320 S   0.7  0.2   6:23.03 containerd
 1321 root      20   0 4151696  49524  12680 S   0.7  0.3   6:49.66 dockerd
 6438 max       35  15 1862372 204256   5680 S   0.7  1.3   2:39.45 Plex Script Hos
 9831 911       20   0  147104 101632   6036 S   0.7  0.6  59:57.74 python3
22107 max       20   0   77320   6224   5324 S   0.7  0.0   3:07.15 systemd
    1 root      20   0  226000   8380   6008 S   0.3  0.1   4:22.96 systemd
  847 root      20   0   70704   5840   5020 S   0.3  0.0   0:52.57 systemd-logind
  873 message+  20   0   50848   4732   3336 S   0.3  0.0   2:54.19 dbus-daemon
 4441 root      20   0   11828   3384   2348 S   0.3  0.0   0:58.45 containerd-shim
 6894 911       20   0  603768  18392   4824 S   0.3  0.1   4:25.21 deluged
 8202 911       20   0  168528 102360   3364 S   0.3  0.6   4:43.52 python
10469 911       20   0 4918084 305300   6312 S   0.3  1.9 182:10.26 sabnzbdplus
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.07 kthreadd
    4 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/0:0H
    6 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 mm_percpu_wq
    7 root      20   0       0      0      0 S   0.0  0.0   0:01.52 ksoftirqd/0
    8 root      20   0       0      0      0 I   0.0  0.0   0:39.70 rcu_sched
    9 root      20   0       0      0      0 I   0.0  0.0   0:00.00 rcu_bh
   10 root      rt   0       0      0      0 S   0.0  0.0   0:00.11 migration/0
   11 root      rt   0       0      0      0 S   0.0  0.0   0:00.11 watchdog/0
   12 root      20   0       0      0      0 S   0.0  0.0   0:00.00 cpuhp/0
   13 root      20   0       0      0      0 S   0.0  0.0   0:00.00 cpuhp/1
   14 root      rt   0       0      0      0 S   0.0  0.0   0:00.10 watchdog/1
   15 root      rt   0       0      0      0 S   0.0  0.0   0:00.14 migration/1
   16 root      20   0       0      0      0 S   0.0  0.0   0:01.34 ksoftirqd/1
   18 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/1:0H
   19 root      20   0       0      0      0 S   0.0  0.0   0:00.00 cpuhp/2
   20 root      rt   0       0      0      0 S   0.0  0.0   0:00.12 watchdog/2
   21 root      rt   0       0      0      0 S   0.0  0.0   0:00.12 migration/2
   22 root      20   0       0      0      0 S   0.0  0.0   1:47.23 ksoftirqd/2
   24 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/2:0H
   25 root      20   0       0      0      0 S   0.0  0.0   0:00.00 cpuhp/3
   26 root      rt   0       0      0      0 S   0.0  0.0   0:00.12 watchdog/3
   27 root      rt   0       0      0      0 S   0.0  0.0   0:00.14 migration/3
   28 root      20   0       0      0      0 S   0.0  0.0   0:01.39 ksoftirqd/3
   30 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/3:0H
   31 root      20   0       0      0      0 S   0.0  0.0   0:00.00 cpuhp/4
   32 root      rt   0       0      0      0 S   0.0  0.0   0:00.11 watchdog/4
   33 root      rt   0       0      0      0 S   0.0  0.0   0:00.15 migration/4
   34 root      20   0       0      0      0 S   0.0  0.0   0:01.28 ksoftirqd/4
   36 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/4:0H
   37 root      20   0       0      0      0 S   0.0  0.0   0:00.00 cpuhp/5
   38 root      rt   0       0      0      0 S   0.0  0.0   0:00.11 watchdog/5
   39 root      rt   0       0      0      0 S   0.0  0.0   0:00.14 migration/5
   40 root      20   0       0      0      0 S   0.0  0.0   0:01.19 ksoftirqd/5
   42 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/5:0H
   43 root      20   0       0      0      0 S   0.0  0.0   0:00.00 cpuhp/6
   44 root      rt   0       0      0      0 S   0.0  0.0   0:00.12 watchdog/6
   45 root      rt   0       0      0      0 S   0.0  0.0   0:00.11 migration/6
   46 root      20   0       0      0      0 S   0.0  0.0   0:01.89 ksoftirqd/6
   48 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/6:0H
   49 root      20   0       0      0      0 S   0.0  0.0   0:00.00 cpuhp/7
   50 root      rt   0       0      0      0 S   0.0  0.0   0:00.12 watchdog/7
   51 root      rt   0       0      0      0 S   0.0  0.0   0:00.14 migration/7
   52 root      20   0       0      0      0 S   0.0  0.0   0:01.23 ksoftirqd/7
   54 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker/7:0H
   55 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kdevtmpfs
   56 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 netns
   57 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_tasks_kthre

The java process at the top is a Minecraft server; the machine has hung both with and without it running.

Edit 3:

More information, as requested.

max@marvin:/mnt/ssd/syrupy$ cat /etc/network/interfaces
# interfaces(5) file used by ifup(8) and ifdown(8)
auto lo
iface lo inet loopback

max@marvin:/mnt/ssd/syrupy$ cat /etc/netplan/*.yaml
# Let NetworkManager manage all devices on this system
network:
  version: 2
  renderer: NetworkManager

Edit 4:

max@marvin:/etc$ ps auxc | grep -i therm
root       128  0.0  0.0      0     0 ?        I<   10:19   0:00 acpi_thermal_pm
root       858  0.0  0.0 187000  9336 ?        Ssl  10:19   0:02 thermald

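Thanks in advance!

Answer 1

BIOS

Note: make backups before performing a BIOS update.

Your BIOS is P1.10. If I'm reading the ASRock website correctly, version 8.10 is the newest. Check the ASRock website, and make sure you are looking at the correct BIOS update for your exact model.

Swap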

KiB Swap:  2097148 total,   371812 free,  1725336 used.

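Your swap usage is high, and /swapfile is only 2G, so we may need to increase it. It's also possible that one of your applications is responsible for this much swap use.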
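
To find out which application that might be, the per-process `VmSwap` figure in `/proc/<pid>/status` can be summed up. A rough sketch, assuming a Linux `/proc`; `swap_by_process` is a hypothetical helper, and its optional argument (an alternative proc root) exists only so it can be tried against sample files:

```shell
#!/bin/sh
# Print "<VmSwap kB> <pid> <name>" for every process with pages in swap,
# largest first. Only the first word of a process name is printed.
swap_by_process() {
    for status in "${1:-/proc}"/[0-9]*/status; do
        [ -r "$status" ] || continue
        awk '/^Name:/ {name=$2} /^Pid:/ {pid=$2} /^VmSwap:/ {swap=$2}
             END { if (swap > 0) print swap, pid, name }' "$status" 2>/dev/null
    done | sort -rn
}

# Hypothetical usage: show the ten largest swap users.
# swap_by_process | head -n 10
```

Kernel threads have no `VmSwap` line and are skipped.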

If grep -i swap /etc/fstab shows this...

/swapfile    none    swap    sw      0   0

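then you are using a /swapfile rather than a swap partition.

Let's increase it from 2G to 4G...

Note: Incorrect use of the dd command can cause data loss. Copy/paste is recommended.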

sudo swapoff -a           # turn off swap
sudo rm -i /swapfile      # remove old /swapfile

sudo dd if=/dev/zero of=/swapfile bs=1M count=4096

sudo chmod 600 /swapfile  # set proper file protections
sudo mkswap /swapfile     # init /swapfile
sudo swapon /swapfile     # turn on swap
free -h                   # confirm 16G RAM and 4G swap
sudo reboot               # reboot and verify operation

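Samsung SSD

If you have access to Windows, download Samsung Magician from Samsung's website and check the firmware on your SSDs.

Update #1:

NCQ errors

Run grep -i FPDMA /var/log/syslog* and see whether there are more of these...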

May  7 12:29:22 marvin kernel: [   70.409155] ata2.00: exception Emask 0x10 SAct 0x1 SErr 0x400100 action 0x6 frozen
May  7 12:29:22 marvin kernel: [   70.409210] ata2.00: irq_stat 0x08000000, interface fatal error
May  7 12:29:22 marvin kernel: [   70.409246] ata2: SError: { UnrecovData Handshk }
May  7 12:29:22 marvin kernel: [   70.409276] ata2.00: failed command: WRITE FPDMA QUEUED
May  7 12:29:22 marvin kernel: [   70.409311] ata2.00: cmd 61/40:00:68:08:04/05:00:1d:00:00/40 tag 0 ncq dma 688128 out
May  7 12:29:22 marvin kernel: [   70.409311]          res 40/00:00:68:08:04/00:00:1d:00:00/40 Emask 0x10 (ATA bus error)
May  7 12:29:22 marvin kernel: [   70.409402] ata2.00: status: { DRDY }
May  7 12:29:22 marvin kernel: [   70.409426] ata2: hard resetting link
May  7 12:29:22 marvin kernel: [   70.723340] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
May  7 12:29:22 marvin kernel: [   70.723562] ata2.00: supports DRM functions and may not be fully accessible
May  7 12:29:22 marvin kernel: [   70.725647] ata2.00: supports DRM functions and may not be fully accessible
May  7 12:29:22 marvin kernel: [   70.727418] ata2.00: configured for UDMA/133
May  7 12:29:22 marvin kernel: [   70.727430] ata2: EH complete
May  7 12:29:22 marvin kernel: [   70.727498] ata2.00: Enabling discard_zeroes_data

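Native Command Queuing (NCQ) is an extension of the Serial ATA protocol that allows drives to internally optimize the order in which received read and write commands are executed.

Edit /etc/default/grub (sudo -H gedit /etc/default/grub, or a terminal editor such as sudo nano /etc/default/grub on a headless machine) and change the following line to include this extra parameter. Then run sudo update-grub to write the changes to disk. Reboot. Monitor for hangs, and watch /var/log/syslog or dmesg for continued error messages.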

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash libata.force=noncq"
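
After rebooting, it may be worth confirming that the parameter actually reached the kernel. A minimal sketch: `has_kernel_param` is a hypothetical helper, and its optional second argument exists only so the check can be tried on a sample string instead of `/proc/cmdline`.

```shell
#!/bin/sh
# Succeed if the given parameter appears as a whole word in a kernel
# command line (defaults to the running kernel's /proc/cmdline).
has_kernel_param() {
    param=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical usage:
# has_kernel_param libata.force=noncq && echo "NCQ is disabled for this boot"
```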

postconf/postfix

Run grep -i postfix /var/log/syslog* to look for hard postfix errors...

May  7 12:28:21 marvin ifup[787]: postconf: fatal: open /etc/postfix/main.cf: No such file or directory

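veth

There is a lot of veth* traffic in syslog. I don't know whether that is normal. I'm not familiar with veth devices, but I believe they are related to Docker containers.

Update #2:

Looking through today's syslog, I noticed the following...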

May 12 01:25:50 marvin kernel: [387411.971440] CPU5: Core temperature above threshold, cpu clock throttled (total events = 51751)
May 12 01:25:50 marvin kernel: [387411.971441] CPU1: Core temperature above threshold, cpu clock throttled (total events = 51751)
May 12 01:25:50 marvin kernel: [387411.971442] CPU4: Package temperature above threshold, cpu clock throttled (total events = 51751)
May 12 01:25:50 marvin kernel: [387411.971443] CPU6: Package temperature above threshold, cpu clock throttled (total events = 51751)
May 12 01:25:50 marvin kernel: [387411.971445] CPU3: Package temperature above threshold, cpu clock throttled (total events = 51751)
May 12 01:25:50 marvin kernel: [387411.971445] CPU7: Package temperature above threshold, cpu clock throttled (total events = 51751)
May 12 01:25:50 marvin kernel: [387411.971446] CPU0: Package temperature above threshold, cpu clock throttled (total events = 51751)
May 12 01:25:50 marvin kernel: [387411.971447] CPU2: Package temperature above threshold, cpu clock throttled (total events = 51750)
May 12 01:25:50 marvin kernel: [387411.971447] CPU1: Package temperature above threshold, cpu clock throttled (total events = 51751)
May 12 01:25:50 marvin kernel: [387411.971448] CPU5: Package temperature above threshold, cpu clock throttled (total events = 51751)
May 12 01:25:50 marvin kernel: [387411.973408] CPU5: Core temperature/speed normal
May 12 01:25:50 marvin kernel: [387411.973409] CPU1: Core temperature/speed normal
May 12 01:25:50 marvin kernel: [387411.973409] CPU3: Package temperature/speed normal
May 12 01:25:50 marvin kernel: [387411.973410] CPU7: Package temperature/speed normal
May 12 01:25:50 marvin kernel: [387411.973411] CPU4: Package temperature/speed normal
May 12 01:25:50 marvin kernel: [387411.973411] CPU0: Package temperature/speed normal
May 12 01:25:50 marvin kernel: [387411.973412] CPU5: Package temperature/speed normal
May 12 01:25:50 marvin kernel: [387411.973413] CPU1: Package temperature/speed normal
May 12 01:25:50 marvin kernel: [387411.973450] CPU6: Package temperature/speed normal
May 12 01:25:50 marvin kernel: [387411.973451] CPU2: Package temperature/speed normal

Check your fans.


And before the reboot...

May 12 10:19:59 marvin blkmapd[310]: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory

And during the reboot...

May 12 10:19:59 marvin kernel: [    2.720842] systemd[1]: /etc/systemd/system/docker.service.d/override.conf:2: Unknown lvalue 'After' in section 'Service'

Check override.conf for syntax errors.
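
That message means an After= line sits under [Service]; ordering and dependency keys such as After= belong in the [Unit] section. The sketch below flags such lines in a drop-in. `misplaced_unit_keys` is a hypothetical helper, and it checks only four common keys.

```shell
#!/bin/sh
# List [Unit]-only ordering/dependency keys that appear under [Service]
# in a systemd unit file or drop-in, as "file:line: text".
misplaced_unit_keys() {
    awk '
        /^\[/ { section = $1; next }
        section == "[Service]" && /^(After|Before|Requires|Wants)=/ {
            print FILENAME ":" FNR ": " $0
        }
    ' "$1"
}

# Hypothetical usage:
# misplaced_unit_keys /etc/systemd/system/docker.service.d/override.conf
```

Any line it prints can be moved under a [Unit] header in the same file, followed by sudo systemctl daemon-reload.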


After the reboot...

May 12 10:19:59 marvin thermald[858]: sysfs read failed constraint_0_max_power_uw

In a terminal, run ps auxc | grep -i therm and show me the output.


May 12 10:20:00 marvin nfsdcltrack[1019]: Failed to init database: -13
May 12 10:20:00 marvin systemd[1]: Started OpenBSD Secure Shell server.
May 12 10:20:00 marvin kernel: [    4.139130] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
May 12 10:20:00 marvin kernel: [    4.139744] NFSD: starting 90-second grace period (net f00000a9)
May 12 10:20:00 marvin dbus-daemon[883]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'

Disk errors... check the SATA cable on ata2.00... download Samsung Magician from Samsung's website and check the firmware of your Samsung SSD...

May 12 10:21:01 marvin kernel: [   64.800038] ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
May 12 10:21:01 marvin kernel: [   64.800044] ata2.00: irq_stat 0x08000000, interface fatal error
May 12 10:21:01 marvin kernel: [   64.800047] ata2: SError: { UnrecovData Handshk }
May 12 10:21:01 marvin kernel: [   64.800050] ata2.00: failed command: WRITE DMA EXT
May 12 10:21:01 marvin kernel: [   64.800054] ata2.00: cmd 35/00:70:d0:55:61/00:01:0c:00:00/e0 tag 16 dma 188416 out
May 12 10:21:01 marvin kernel: [   64.800054]          res 50/00:00:cf:55:61/00:00:0c:00:00/e0 Emask 0x10 (ATA bus error)
May 12 10:21:01 marvin kernel: [   64.800059] ata2.00: status: { DRDY }
May 12 10:21:01 marvin kernel: [   64.800062] ata2: hard resetting link
May 12 10:21:01 marvin kernel: [   65.024521] eth0: renamed from veth43e132c
May 12 10:21:01 marvin NetworkManager[894]: <info>  [1589304061.8663] devices removed (path: /sys/devices/virtual/net/veth43e132c, iface: veth43e132c)
May 12 10:21:01 marvin NetworkManager[894]: <info>  [1589304061.8667] device (vethfcd015c): carrier: link connected
May 12 10:21:01 marvin kernel: [   65.048579] IPv6: ADDRCONF(NETDEV_CHANGE): vethfcd015c: link becomes ready
May 12 10:21:01 marvin kernel: [   65.048596] br-7eb901c13937: port 4(vethfcd015c) entered blocking state
May 12 10:21:01 marvin kernel: [   65.048597] br-7eb901c13937: port 4(vethfcd015c) entered forwarding state
May 12 10:21:01 marvin kernel: [   65.117033] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
May 12 10:21:01 marvin kernel: [   65.119121] ata2.00: supports DRM functions and may not be fully accessible
May 12 10:21:01 marvin kernel: [   65.120914] ata2.00: supports DRM functions and may not be fully accessible
May 12 10:21:01 marvin kernel: [   65.122321] ata2.00: configured for UDMA/133
May 12 10:21:01 marvin kernel: [   65.122340] ata2: EH complete
May 12 10:21:01 marvin kernel: [   65.135296] ata2.00: Enabling discard_zeroes_data

May 12 10:21:42 marvin kernel: [  105.460013] ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
May 12 10:21:42 marvin kernel: [  105.460018] ata2.00: irq_stat 0x08000000, interface fatal error
May 12 10:21:42 marvin kernel: [  105.460021] ata2: SError: { UnrecovData Handshk }
May 12 10:21:42 marvin kernel: [  105.460024] ata2.00: failed command: WRITE DMA EXT
May 12 10:21:42 marvin kernel: [  105.460028] ata2.00: cmd 35/00:00:38:36:31/00:0a:28:00:00/e0 tag 19 dma 1310720 out
May 12 10:21:42 marvin kernel: [  105.460028]          res 50/00:00:e7:15:d9/00:00:26:00:00/e0 Emask 0x10 (ATA bus error)
May 12 10:21:42 marvin kernel: [  105.460032] ata2.00: status: { DRDY }
May 12 10:21:42 marvin kernel: [  105.460036] ata2: hard resetting link
May 12 10:21:42 marvin kernel: [  105.774643] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
May 12 10:21:42 marvin kernel: [  105.776825] ata2.00: supports DRM functions and may not be fully accessible
May 12 10:21:42 marvin kernel: [  105.778751] ata2.00: supports DRM functions and may not be fully accessible
May 12 10:21:42 marvin kernel: [  105.780215] ata2.00: configured for UDMA/133
May 12 10:21:42 marvin kernel: [  105.780225] ata2: EH complete
May 12 10:21:42 marvin kernel: [  105.791444] ata2.00: Enabling discard_zeroes_data

May 12 10:21:50 marvin kernel: [  113.896069] ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
May 12 10:21:50 marvin kernel: [  113.896083] ata2.00: irq_stat 0x08000000, interface fatal error
May 12 10:21:50 marvin kernel: [  113.896093] ata2: SError: { UnrecovData Handshk }
May 12 10:21:50 marvin kernel: [  113.896102] ata2.00: failed command: WRITE DMA EXT
May 12 10:21:50 marvin kernel: [  113.896115] ata2.00: cmd 35/00:00:00:f2:36/00:06:28:00:00/e0 tag 14 dma 786432 out
May 12 10:21:50 marvin kernel: [  113.896115]          res 50/00:00:ff:f1:36/00:00:28:00:00/e0 Emask 0x10 (ATA bus error)
May 12 10:21:50 marvin kernel: [  113.896128] ata2.00: status: { DRDY }
May 12 10:21:50 marvin kernel: [  113.896137] ata2: hard resetting link
May 12 10:21:51 marvin kernel: [  114.210830] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
May 12 10:21:51 marvin kernel: [  114.213040] ata2.00: supports DRM functions and may not be fully accessible
May 12 10:21:51 marvin kernel: [  114.215030] ata2.00: supports DRM functions and may not be fully accessible
May 12 10:21:51 marvin kernel: [  114.216557] ata2.00: configured for UDMA/133
May 12 10:21:51 marvin kernel: [  114.216575] ata2: EH complete
May 12 10:21:51 marvin kernel: [  114.222011] ata2.00: Enabling discard_zeroes_data
May 12 10:21:51 marvin kernel: [  114.296031] ata2: limiting SATA link speed to 3.0 Gbps
May 12 10:21:51 marvin kernel: [  114.296034] ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
May 12 10:21:51 marvin kernel: [  114.296039] ata2.00: irq_stat 0x08000000, interface fatal error
May 12 10:21:51 marvin kernel: [  114.296042] ata2: SError: { UnrecovData Handshk }
May 12 10:21:51 marvin kernel: [  114.296045] ata2.00: failed command: WRITE DMA EXT
May 12 10:21:51 marvin kernel: [  114.296049] ata2.00: cmd 35/00:88:48:68:d8/00:09:26:00:00/e0 tag 23 dma 1249280 out
May 12 10:21:51 marvin kernel: [  114.296049]          res 50/00:00:47:68:d8/00:00:26:00:00/e0 Emask 0x10 (ATA bus error)
May 12 10:21:51 marvin kernel: [  114.296054] ata2.00: status: { DRDY }
May 12 10:21:51 marvin kernel: [  114.296057] ata2: hard resetting link
May 12 10:21:51 marvin kernel: [  114.611184] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
May 12 10:21:51 marvin kernel: [  114.613520] ata2.00: supports DRM functions and may not be fully accessible
May 12 10:21:51 marvin kernel: [  114.615475] ata2.00: supports DRM functions and may not be fully accessible
May 12 10:21:51 marvin kernel: [  114.617036] ata2.00: configured for UDMA/133
May 12 10:21:51 marvin kernel: [  114.617057] ata2: EH complete
May 12 10:21:51 marvin kernel: [  114.633135] ata2.00: Enabling discard_zeroes_data
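
To track whether these resets keep happening, the events can be counted per port. A small sketch, assuming kernel log lines in the format shown above; `ata_resets_by_port` is a hypothetical helper name.

```shell
#!/bin/sh
# Count "hard resetting link" events per ATA port in kernel log lines
# read from stdin; prints "<port> <count>" sorted by port name.
ata_resets_by_port() {
    awk '/ata[0-9]+: hard resetting link/ {
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^ata[0-9]+:$/) { sub(/:$/, "", $i); count[$i]++ }
         }
         END { for (p in count) print p, count[p] }' | sort
}

# Hypothetical usage:
# cat /var/log/syslog /var/log/syslog.1 | ata_resets_by_port
```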

Update #3

The SATA data/power cables have been replaced; time will tell whether this is the final fix.
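
One way to judge the cable swap is to watch SMART attribute 199, the interface CRC error count; a raw value that keeps rising is commonly associated with cable or connector problems. A sketch, assuming smartmontools is installed and that the SATA SSD is /dev/sda as in the lshw output; `crc_error_count` is a hypothetical helper.

```shell
#!/bin/sh
# Print the raw value of SMART attribute 199 (interface CRC error count)
# from `smartctl -A` output read on stdin.
crc_error_count() {
    awk '$1 == 199 { print $10; exit }'
}

# Hypothetical usage: note the value now and compare it a few days later.
# sudo smartctl -A /dev/sda | crc_error_count
```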
