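Sometimes I run into strange problems when booting my computer (running Debian), so I ran the "dmesg" command. Its output contains a lot of errors. However, when I run the extended SMART test on my hard disks (with the command "smartctl -t long /dev/sda"), the result says that my disks have no errors.
What could be the cause of these errors?
Here are the errors: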
(...)
[ 505.918537] ata3.00: exception Emask 0x50 SAct 0x400 SErr 0x280900 action 0x6 frozen
[ 505.918549] ata3.00: irq_stat 0x08000000, interface fatal error
[ 505.918558] ata3: SError: { UnrecovData HostInt 10B8B BadCRC }
[ 505.918566] ata3.00: failed command: READ FPDMA QUEUED
[ 505.918579] ata3.00: cmd 60/40:50:20:5b:60/00:00:0b:00:00/40 tag 10 ncq 32768 in
res 40/00:54:20:5b:60/00:00:0b:00:00/40 Emask 0x50 (ATA bus error)
[ 505.918586] ata3.00: status: { DRDY }
[ 505.918595] ata3: hard resetting link
[ 506.410055] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 506.422648] ata3.00: configured for UDMA/133
[ 506.422679] ata3: EH complete
[ 1633.123880] md: bind<sdb3>
[ 1633.187966] RAID1 conf printout:
[ 1633.187977] --- wd:1 rd:2
[ 1633.187984] disk 0, wo:0, o:1, dev:sda3
[ 1633.187989] disk 1, wo:1, o:1, dev:sdb3
[ 1633.188866] md: recovery of RAID array md0
[ 1633.188871] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 1633.188875] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[ 1633.188890] md: using 128k window, over a total of 1943618560k.
[ 1634.167341] ata3.00: exception Emask 0x50 SAct 0x7f80 SErr 0x280900 action 0x6 frozen
[ 1634.167353] ata3.00: irq_stat 0x08000000, interface fatal error
[ 1634.167361] ata3: SError: { UnrecovData HostInt 10B8B BadCRC }
[ 1634.167369] ata3.00: failed command: READ FPDMA QUEUED
[ 1634.167382] ata3.00: cmd 60/00:38:00:00:6f/02:00:01:00:00/40 tag 7 ncq 262144 in
res 40/00:6c:00:0c:6f/00:00:01:00:00/40 Emask 0x50 (ATA bus error)
[ 1634.167389] ata3.00: status: { DRDY }
[ 1634.167395] ata3.00: failed command: READ FPDMA QUEUED
[ 1634.167407] ata3.00: cmd 60/00:40:00:02:6f/02:00:01:00:00/40 tag 8 ncq 262144 in
res 40/00:6c:00:0c:6f/00:00:01:00:00/40 Emask 0x50 (ATA bus error)
[ 1634.167413] ata3.00: status: { DRDY }
[ 1634.167418] ata3.00: failed command: READ FPDMA QUEUED
[ 1634.167429] ata3.00: cmd 60/00:48:00:04:6f/02:00:01:00:00/40 tag 9 ncq 262144 in
res 40/00:6c:00:0c:6f/00:00:01:00:00/40 Emask 0x50 (ATA bus error)
[ 1634.167435] ata3.00: status: { DRDY }
[ 1634.167439] ata3.00: failed command: READ FPDMA QUEUED
[ 1634.167451] ata3.00: cmd 60/00:50:00:06:6f/02:00:01:00:00/40 tag 10 ncq 262144 in
res 40/00:6c:00:0c:6f/00:00:01:00:00/40 Emask 0x50 (ATA bus error)
[ 1634.167457] ata3.00: status: { DRDY }
[ 1634.167462] ata3.00: failed command: READ FPDMA QUEUED
[ 1634.167473] ata3.00: cmd 60/00:58:00:08:6f/02:00:01:00:00/40 tag 11 ncq 262144 in
res 40/00:6c:00:0c:6f/00:00:01:00:00/40 Emask 0x50 (ATA bus error)
[ 1634.167479] ata3.00: status: { DRDY }
[ 1634.167484] ata3.00: failed command: READ FPDMA QUEUED
[ 1634.167495] ata3.00: cmd 60/00:60:00:0a:6f/02:00:01:00:00/40 tag 12 ncq 262144 in
res 40/00:6c:00:0c:6f/00:00:01:00:00/40 Emask 0x50 (ATA bus error)
[ 1634.167500] ata3.00: status: { DRDY }
[ 1634.167505] ata3.00: failed command: READ FPDMA QUEUED
[ 1634.167516] ata3.00: cmd 60/80:68:00:0c:6f/00:00:01:00:00/40 tag 13 ncq 65536 in
res 40/00:6c:00:0c:6f/00:00:01:00:00/40 Emask 0x50 (ATA bus error)
[ 1634.167522] ata3.00: status: { DRDY }
[ 1634.167527] ata3.00: failed command: READ FPDMA QUEUED
[ 1634.167538] ata3.00: cmd 60/00:70:80:0c:6f/02:00:01:00:00/40 tag 14 ncq 262144 in
res 40/00:6c:00:0c:6f/00:00:01:00:00/40 Emask 0x50 (ATA bus error)
[ 1634.167544] ata3.00: status: { DRDY }
[ 1634.167553] ata3: hard resetting link
[ 1634.658816] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1634.672645] ata3.00: configured for UDMA/133
[ 1634.672696] ata3: EH complete
[ 1637.687898] ata3.00: exception Emask 0x50 SAct 0x3ff000 SErr 0x280900 action 0x6 frozen
[ 1637.687910] ata3.00: irq_stat 0x08000000, interface fatal error
[ 1637.687918] ata3: SError: { UnrecovData HostInt 10B8B BadCRC }
[ 1637.687926] ata3.00: failed command: READ FPDMA QUEUED
[ 1637.687940] ata3.00: cmd 60/00:60:80:a7:af/02:00:02:00:00/40 tag 12 ncq 262144 in
res 40/00:ac:00:b4:af/00:00:02:00:00/40 Emask 0x50 (ATA bus error)
[ 1637.687947] ata3.00: status: { DRDY }
[ 1637.687953] ata3.00: failed command: READ FPDMA QUEUED
[ 1637.687965] ata3.00: cmd 60/00:68:80:a9:af/02:00:02:00:00/40 tag 13 ncq 262144 in
res 40/00:ac:00:b4:af/00:00:02:00:00/40 Emask 0x50 (ATA bus error)
[ 1637.687971] ata3.00: status: { DRDY }
[ 1637.687976] ata3.00: failed command: READ FPDMA QUEUED
[ 1637.687987] ata3.00: cmd 60/80:70:80:ab:af/01:00:02:00:00/40 tag 14 ncq 196608 in
res 40/00:ac:00:b4:af/00:00:02:00:00/40 Emask 0x50 (ATA bus error)
[ 1637.687993] ata3.00: status: { DRDY }
[ 1637.687998] ata3.00: failed command: READ FPDMA QUEUED
[ 1637.688009] ata3.00: cmd 60/00:78:00:ad:af/02:00:02:00:00/40 tag 15 ncq 262144 in
res 40/00:ac:00:b4:af/00:00:02:00:00/40 Emask 0x50 (ATA bus error)
[ 1637.688015] ata3.00: status: { DRDY }
[ 1637.688020] ata3.00: failed command: READ FPDMA QUEUED
[ 1637.688031] ata3.00: cmd 60/80:80:00:af:af/00:00:02:00:00/40 tag 16 ncq 65536 in
res 40/00:ac:00:b4:af/00:00:02:00:00/40 Emask 0x50 (ATA bus error)
[ 1637.688037] ata3.00: status: { DRDY }
[ 1637.688042] ata3.00: failed command: READ FPDMA QUEUED
[ 1637.688053] ata3.00: cmd 60/00:88:80:af:af/01:00:02:00:00/40 tag 17 ncq 131072 in
res 40/00:ac:00:b4:af/00:00:02:00:00/40 Emask 0x50 (ATA bus error)
[ 1637.688059] ata3.00: status: { DRDY }
[ 1637.688064] ata3.00: failed command: READ FPDMA QUEUED
[ 1637.688075] ata3.00: cmd 60/80:90:80:b0:af/00:00:02:00:00/40 tag 18 ncq 65536 in
res 40/00:ac:00:b4:af/00:00:02:00:00/40 Emask 0x50 (ATA bus error)
[ 1637.688081] ata3.00: status: { DRDY }
[ 1637.688085] ata3.00: failed command: READ FPDMA QUEUED
[ 1637.688096] ata3.00: cmd 60/00:98:00:b1:af/02:00:02:00:00/40 tag 19 ncq 262144 in
res 40/00:ac:00:b4:af/00:00:02:00:00/40 Emask 0x50 (ATA bus error)
[ 1637.688102] ata3.00: status: { DRDY }
[ 1637.688107] ata3.00: failed command: READ FPDMA QUEUED
[ 1637.688118] ata3.00: cmd 60/00:a0:00:b3:af/01:00:02:00:00/40 tag 20 ncq 131072 in
res 40/00:ac:00:b4:af/00:00:02:00:00/40 Emask 0x50 (ATA bus error)
[ 1637.688124] ata3.00: status: { DRDY }
[ 1637.688129] ata3.00: failed command: READ FPDMA QUEUED
[ 1637.688140] ata3.00: cmd 60/00:a8:00:b4:af/01:00:02:00:00/40 tag 21 ncq 131072 in
res 40/00:ac:00:b4:af/00:00:02:00:00/40 Emask 0x50 (ATA bus error)
[ 1637.688146] ata3.00: status: { DRDY }
[ 1637.688154] ata3: hard resetting link
[ 1638.179398] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1638.192977] ata3.00: configured for UDMA/133
[ 1638.193029] ata3: EH complete
[ 1640.259492] md: export_rdev(sdb1)
[ 1640.326109] md: bind<sdb1>
[ 1640.346712] RAID1 conf printout:
[ 1640.346724] --- wd:1 rd:2
[ 1640.346731] disk 0, wo:0, o:1, dev:sda1
[ 1640.346736] disk 1, wo:1, o:1, dev:sdb1
[ 1640.346893] md: delaying recovery of md1 until md0 has finished (they share one or more physical units)
[ 1657.987964] ata3.00: exception Emask 0x50 SAct 0x40000 SErr 0x280900 action 0x6 frozen
[ 1657.987975] ata3.00: irq_stat 0x08000000, interface fatal error
[ 1657.987984] ata3: SError: { UnrecovData HostInt 10B8B BadCRC }
[ 1657.987992] ata3.00: failed command: READ FPDMA QUEUED
[ 1657.988006] ata3.00: cmd 60/00:90:00:30:2e/03:00:09:00:00/40 tag 18 ncq 393216 in
res 40/00:94:00:30:2e/00:00:09:00:00/40 Emask 0x50 (ATA bus error)
[ 1657.988013] ata3.00: status: { DRDY }
[ 1657.988022] ata3: hard resetting link
[ 1658.479548] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1658.493107] ata3.00: configured for UDMA/133
[ 1658.493147] ata3: EH complete
[ 1670.547791] ata3: limiting SATA link speed to 1.5 Gbps
[ 1670.547805] ata3.00: exception Emask 0x50 SAct 0x7f SErr 0x280900 action 0x6 frozen
[ 1670.547812] ata3.00: irq_stat 0x08000000, interface fatal error
[ 1670.547820] ata3: SError: { UnrecovData HostInt 10B8B BadCRC }
[ 1670.547826] ata3.00: failed command: READ FPDMA QUEUED
[ 1670.547839] ata3.00: cmd 60/80:00:00:1f:2e/01:00:0c:00:00/40 tag 0 ncq 196608 in
res 40/00:2c:00:26:2e/00:00:0c:00:00/40 Emask 0x50 (ATA bus error)
[ 1670.547846] ata3.00: status: { DRDY }
[ 1670.547852] ata3.00: failed command: READ FPDMA QUEUED
[ 1670.547863] ata3.00: cmd 60/80:08:80:20:2e/00:00:0c:00:00/40 tag 1 ncq 65536 in
res 40/00:2c:00:26:2e/00:00:0c:00:00/40 Emask 0x50 (ATA bus error)
[ 1670.547869] ata3.00: status: { DRDY }
[ 1670.547875] ata3.00: failed command: READ FPDMA QUEUED
[ 1670.547886] ata3.00: cmd 60/00:10:00:21:2e/02:00:0c:00:00/40 tag 2 ncq 262144 in
res 40/00:2c:00:26:2e/00:00:0c:00:00/40 Emask 0x50 (ATA bus error)
[ 1670.547892] ata3.00: status: { DRDY }
[ 1670.547896] ata3.00: failed command: READ FPDMA QUEUED
[ 1670.547907] ata3.00: cmd 60/00:18:00:23:2e/02:00:0c:00:00/40 tag 3 ncq 262144 in
res 40/00:2c:00:26:2e/00:00:0c:00:00/40 Emask 0x50 (ATA bus error)
[ 1670.547913] ata3.00: status: { DRDY }
[ 1670.547918] ata3.00: failed command: READ FPDMA QUEUED
[ 1670.547929] ata3.00: cmd 60/00:20:00:25:2e/01:00:0c:00:00/40 tag 4 ncq 131072 in
res 40/00:2c:00:26:2e/00:00:0c:00:00/40 Emask 0x50 (ATA bus error)
[ 1670.547935] ata3.00: status: { DRDY }
[ 1670.547940] ata3.00: failed command: READ FPDMA QUEUED
[ 1670.547951] ata3.00: cmd 60/00:28:00:26:2e/02:00:0c:00:00/40 tag 5 ncq 262144 in
res 40/00:2c:00:26:2e/00:00:0c:00:00/40 Emask 0x50 (ATA bus error)
[ 1670.547957] ata3.00: status: { DRDY }
[ 1670.547961] ata3.00: failed command: READ FPDMA QUEUED
[ 1670.547972] ata3.00: cmd 60/00:30:00:28:2e/02:00:0c:00:00/40 tag 6 ncq 262144 in
res 40/00:2c:00:26:2e/00:00:0c:00:00/40 Emask 0x50 (ATA bus error)
[ 1670.547978] ata3.00: status: { DRDY }
[ 1670.547987] ata3: hard resetting link
[ 1671.039264] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 1671.053386] ata3.00: configured for UDMA/133
[ 1671.053444] ata3: EH complete
[ 2422.512002] md: md0: recovery done.
[ 2422.547344] md: recovery of RAID array md1
[ 2422.547355] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 2422.547360] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[ 2422.547378] md: using 128k window, over a total of 4877312k.
[ 2422.668465] RAID1 conf printout:
[ 2422.668474] --- wd:2 rd:2
[ 2422.668480] disk 0, wo:0, o:1, dev:sda3
[ 2422.668486] disk 1, wo:0, o:1, dev:sdb3
[ 2469.990451] md: md1: recovery done.
[ 2470.049986] RAID1 conf printout:
[ 2470.049997] --- wd:2 rd:2
[ 2470.050003] disk 0, wo:0, o:1, dev:sda1
[ 2470.050009] disk 1, wo:0, o:1, dev:sdb1
[ 3304.445149] PM: Hibernation mode set to 'platform'
[ 3304.782375] PM: Syncing filesystems ... done.
[ 3307.028591] Freezing user space processes ... (elapsed 0.001 seconds) done.
(...)
Answer 1
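First, keep in mind that SMART saying your drive is healthy does not necessarily mean that the drive is healthy. SMART reports are an aid, not an absolute truth.
If you are interested in what to do rather than why, feel free to scroll down to the last few paragraphs; the text in between explains why I think what I propose is the correct course of action, and how to derive that from what you posted.
With that said, let's look at what one of those errors is telling us.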
[ 1670.547805] ata3.00: exception Emask 0x50 SAct 0x7f SErr 0x280900 action 0x6 frozen
[ 1670.547812] ata3.00: irq_stat 0x08000000, interface fatal error
[ 1670.547820] ata3: SError: { UnrecovData HostInt 10B8B BadCRC }
[ 1670.547826] ata3.00: failed command: READ FPDMA QUEUED
[ 1670.547839] ata3.00: cmd 60/80:00:00:1f:2e/01:00:0c:00:00/40 tag 0 ncq 196608 in
res 40/00:2c:00:26:2e/00:00:0c:00:00/40 Emask 0x50 (ATA bus error)
[ 1670.547846] ata3.00: status: { DRDY }
[ 1670.547852] ata3.00: failed command: READ FPDMA QUEUED
(I hope I picked out the lines that belong together, but you got a whole bunch of these, so it should be fine either way.)
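The Linux ata Wiki has a page explaining how to read these errors. In particular:
- A status value of DRDY means "Device ready. Normally 1, when all is OK". Seeing a status value of DRDY is perfectly normal and expected.
- SError has multiple component values, of which you are seeing (in this particular snippet):
  - UnrecovData: "Data integrity error occurred, interface did not recover"
  - HostInt: "Host bus adapter internal error"
  - 10B8B: "10b to 8b decoding error occurred"
  - BadCRC: "Link layer CRC error occurred"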
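To see how the flag names relate to the raw SErr value on the "exception" line, here is a minimal Python sketch that decodes the bitmask. The bit positions are an assumption based on my recollection of the SATA SError register layout as used in the Linux libata headers, so verify them against your kernel source before relying on them; the function name decode_serr is made up for this example.

```python
# Bit positions are an assumption: they follow the SATA SError register
# layout as I recall it from the Linux libata headers. Verify before use.
SERR_BITS = [
    (0, "RecovData"),
    (1, "RecovComm"),
    (8, "UnrecovData"),
    (9, "Persist"),
    (10, "Proto"),
    (11, "HostInt"),
    (16, "PHYRdyChg"),
    (17, "PHYInt"),
    (18, "CommWake"),
    (19, "10B8B"),
    (20, "Dispar"),
    (21, "BadCRC"),
    (22, "Handshk"),
    (23, "LinkSeq"),
    (24, "TrStaTrns"),
    (25, "UnrecFIS"),
    (26, "DevExch"),
]


def decode_serr(value):
    """Return the names of the flags set in an SErr value, lowest bit first."""
    return [name for bit, name in SERR_BITS if value & (1 << bit)]


# SErr value taken from the log lines in the question.
print(decode_serr(0x280900))
# prints ['UnrecovData', 'HostInt', '10B8B', 'BadCRC']
```

With that table, 0x280900 breaks down into bits 8, 11, 19 and 21, which matches the flag list the kernel printed on the SError line.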
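8b/10b encoding (the "10B8B" in the log), which encodes 8 bits as 10 bits to aid both signal synchronization and error detection, is used on the physical cabling, not necessarily inside the drive itself. The drive most likely uses other forms of FEC or ECC coding, and an error there would normally show up as some form of I/O error, likely with an error value of UNC ("uncorrectable error - often due to bad sectors on the disk") and with "media error" ("software detected a media error") in parentheses at the end of the res line. The latter is not what you are seeing, so while we can't completely rule it out, it seems unlikely.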
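If the log is long, it can help to tally what the kernel puts in parentheses at the end of each res line rather than eyeballing it. The following is a rough Python sketch, not a full libata log parser: the regular expression only assumes the "res ... Emask 0x.. (description)" shape seen in the question, and tally_res_errors is a name I made up.

```python
import re
from collections import Counter

# Matches the trailing "Emask 0x.. (description)" part of a libata "res" line.
RES_RE = re.compile(r"\bres\s+\S+\s+Emask\s+0x[0-9a-fA-F]+\s+\(([^)]+)\)")


def tally_res_errors(log_text):
    """Count how often each parenthesised error description occurs."""
    return Counter(match.group(1) for match in RES_RE.finditer(log_text))


# Two res lines copied from the question's dmesg output.
sample = """\
res 40/00:2c:00:26:2e/00:00:0c:00:00/40 Emask 0x50 (ATA bus error)
res 40/00:94:00:30:2e/00:00:09:00:00/40 Emask 0x50 (ATA bus error)
"""

for description, count in tally_res_errors(sample).items():
    print(f"{description}: {count}")
```

If every entry comes back as "ATA bus error" and none as "media error", that is consistent with a problem on the link rather than on the platters.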
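In practice, a "link layer" error points at the physical cabling and circuit board traces between the drive's own controller and the disk drive interface chip (likely part of the southbridge on your computer's motherboard, but it could also be on an off-board HBA).
A Host Bus Adapter, or HBA, is the circuitry that connects to your storage devices. It is colloquially known as a "disk controller", a term which is a bit of a misnomer on modern systems. The most visible part of the HBA is generally its connection ports, these days most commonly either SATA or some SAS form factor.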
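The UnrecovData and HostInt flags basically tell us that "something just went horribly wrong, and there was no way to recover or no attempt at recovery was made". The opposite would likely be RecovData, which indicates that a "data integrity error occurred, but the interface recovered". (As an aside, I would probably have called it HBAInt instead of HostInt, since the "host" refers to the HBA, not the whole system.)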
The combination of 10B8B and BadCRC, which both point at the physical link layer, makes me suspect a cabling issue.
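This suspicion is also supported by the fact that the SMART self-tests, which run completely internally to the drive except for status reporting, found no errors that the manufacturer considers serious enough to report in the results. If the drive were having problems storing or reading data, the long SMART self-test in particular should have reported that.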
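TL;DR:
The first thing I would do is therefore to simply unplug and re-plug the SATA cable at both ends; it may be slightly loose, causing it to intermittently lose electrical contact. See if that resolves the problem. It might even be worth doing this for all the SATA cabling in your computer, not just for the affected disk. If you are using an off-board HBA, I would also remove and re-seat that card, mainly because it is an easy thing to try while you are already messing around with the cabling.
Failing that, try throwing away and replacing the SATA cable, preferably with a high-quality one. A high-quality cable will be slightly more expensive, but I find that it is usually well worth the small extra cost if it helps avoid headaches like this. Nobody likes seeing their storage spewing errors!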
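To judge whether re-seating or replacing the cable actually helped, you can compare how often the kernel had to reset the link before and after the change. Below is a small Python sketch that counts "hard resetting link" events per port and lists any link speed downgrades in a saved dmesg dump. The function name link_health and the idea of saving the output to a file first are my own assumptions; the message wording is taken from the log in the question.

```python
import re
from collections import Counter

# Message formats as they appear in the question's dmesg output.
RESET_RE = re.compile(r"\b(ata\d+): hard resetting link")
SLOWDOWN_RE = re.compile(r"\b(ata\d+): limiting SATA link speed to ([\d.]+ Gbps)")


def link_health(log_text):
    """Return (resets per port, list of (port, new speed) downgrades)."""
    resets = Counter(RESET_RE.findall(log_text))
    slowdowns = SLOWDOWN_RE.findall(log_text)
    return resets, slowdowns


# Lines copied from the question's log.
sample = """\
[ 1658.493147] ata3: EH complete
[ 1670.547791] ata3: limiting SATA link speed to 1.5 Gbps
[ 1670.547987] ata3: hard resetting link
[ 1671.039264] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
"""

resets, slowdowns = link_health(sample)
print(dict(resets))
print(slowdowns)
```

Run it on a dump taken before the change and another taken after a comparable amount of disk activity (an md resync, for example). If the reset count for the affected port drops to zero, the cable or its seating was the likely culprit.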
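Answer 2
In my case, I noticed that I had plugged my two disks into two different disk controllers: the first one was PCI-Express, the second one plain PCI. Once I plugged both RAID disks into the same controller, the BadCRC errors went away.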