Hard drive failures on two home servers

2024-6-15

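I have two old computers at home, both currently running Ubuntu Server 12.04 LTS, and both seem to suffer from hard drive failures from time to time. The problem is the same on both machines: the system goes read-only, sometimes crashes completely, and I have to reboot to get it back to normal. On reboot I get an error message saying the hard drive has a fault and offering to "fix" the problem. It goes through the "fix" step, boots the system, runs fine for a few days, and then bang, it crashes again.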

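It is worth mentioning that I have reinstalled Ubuntu 12.04 LTS twice on both systems, and it did not help. I don't think this is related to the HDDs, because it happens on both machines. One is my old PC and the other is my former laptop (let me know if you need the specs). The first is 32-bit and the second is 64-bit, and I installed the correct Ubuntu architecture on each. I am still quite new to Linux; I have searched Google and the rest of the web and could not find anything that helps me solve the problem.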

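Here are some entries from the dmesg log that look relevant to me (both machines show the same messages; let me know if you need the full dmesg log from either machine):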

[    2.239578] ata3.00: ATA-7: Hitachi HDT725025VLA380, V5DOA58A, max UDMA/133
[    2.245936] ata3.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[    2.253166] ata3.00: configured for UDMA/133
[    2.272299] sd 3:0:0:0: [sda] 488397168 512-byte logical blocks: (250 GB/232 GiB)
[    2.285467] sd 3:0:0:0: Attached scsi generic sg0 type 0
[    2.285537] sd 3:0:0:0: [sda] Write Protect is off
[    2.285541] sd 3:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    2.285576] sd 3:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[    3.241596] EXT4-fs (sda1): INFO: recovery required on readonly filesystem
[    3.248461] EXT4-fs (sda1): write access will be enabled during recovery
[    3.725449] EXT4-fs (sda1): recovery complete
[    3.752546] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[    5.794220] init: ureadahead main process (287) terminated with status 5
[    7.288860] Adding 2094076k swap on /dev/sda5.  Priority:-1 extents:1 across:2094076k FS
[    9.660321] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro

Edit: I found 2 lines in boot.log that may also be relevant:

fsck from util-linux 2.20.1
/dev/sda1: clean, 110596/15138816 files, 1810722/60525568 blocks

Edit: here is the output of smartctl -A /dev/sda

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   001   016    Pre-fail  Always   In_the_past 1
  2 Throughput_Performance  0x0005   158   100   050    Pre-fail  Offline      -       211
  3 Spin_Up_Time            0x0007   123   100   024    Pre-fail  Always       -       295 (Average 314)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       2448
  5 Reallocated_Sector_Ct   0x0033   001   001   005    Pre-fail  Always   FAILING_NOW 509 (0, 382)
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   132   100   020    Pre-fail  Offline      -       33
  9 Power_On_Hours          0x0012   096   096   000    Old_age   Always       -       28704
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1164
192 Power-Off_Retract_Count 0x0032   098   098   000    Old_age   Always       -       2655
193 Load_Cycle_Count        0x0012   098   098   000    Old_age   Always       -       2655
194 Temperature_Celsius     0x0002   142   122   000    Old_age   Always       -       42 (Min/Max 13/49)
196 Reallocated_Event_Count 0x0032   087   087   000    Old_age   Always       -       390
197 Current_Pending_Sector  0x0022   021   021   000    Old_age   Always       -       1517
198 Offline_Uncorrectable   0x0008   079   079   000    Old_age   Offline      -       531
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
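
For readers unfamiliar with this table: smartctl marks an attribute FAILING_NOW when its normalized VALUE is at or below THRESH, and In_the_past when only WORST has been at or below THRESH. The Python sketch below parses a few rows copied from the output above and lists the attributes that look bad. The function names and the choice to also watch the raw counts of attributes 5, 196, 197 and 198 are my own assumptions (a common rule of thumb, not something smartctl itself enforces).

```python
# A few rows copied verbatim from the smartctl -A output above.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   001   016    Pre-fail  Always   In_the_past 1
  5 Reallocated_Sector_Ct   0x0033   001   001   005    Pre-fail  Always   FAILING_NOW 509 (0, 382)
  9 Power_On_Hours          0x0012   096   096   000    Old_age   Always       -       28704
197 Current_Pending_Sector  0x0022   021   021   000    Old_age   Always       -       1517
198 Offline_Uncorrectable   0x0008   079   079   000    Old_age   Offline      -       531
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
"""

# Assumption: sector-related attributes whose raw count should normally stay at 0.
WATCHED_RAW_IDS = {5, 196, 197, 198}


def parse_smart_attributes(text):
    """Turn the attribute rows of `smartctl -A` output into a list of dicts."""
    rows = []
    for line in text.splitlines():
        # Split into at most 10 columns; the last one (RAW_VALUE) may contain spaces.
        parts = line.split(None, 9)
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # header or unrelated line
        rows.append({
            "id": int(parts[0]),
            "name": parts[1],
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "when_failed": parts[8],
            "raw": parts[9].strip(),
        })
    return rows


def find_problems(rows):
    """Return (attribute name, reason) pairs for rows that look unhealthy."""
    problems = []
    for row in rows:
        raw_count = int(row["raw"].split()[0])
        if row["thresh"] > 0 and row["value"] <= row["thresh"]:
            problems.append((row["name"], "normalized value at or below threshold"))
        elif row["when_failed"] != "-":
            problems.append((row["name"], "failed: " + row["when_failed"]))
        elif row["id"] in WATCHED_RAW_IDS and raw_count > 0:
            problems.append((row["name"], "non-zero raw count %d" % raw_count))
    return problems


if __name__ == "__main__":
    for name, reason in find_problems(parse_smart_attributes(SAMPLE)):
        print(name + ": " + reason)
```

On the rows above this flags Raw_Read_Error_Rate, Reallocated_Sector_Ct, Current_Pending_Sector and Offline_Uncorrectable, which matches the FAILING_NOW marker smartctl already printed for attribute 5.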

Answer 1

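File systems often behave this way when they hit disk-related hardware errors. The usual culprit is bad blocks, although the reallocated_sector_count could have some other cause. Either way, I would avoid using fdisk and try ddrescuing your drive to a new drive; fairly good instructions can be found here: http://www.forensicswiki.org/wiki/Ddrescue. Needless to say, make sure you have the right device names. If you look up the disk UUIDs in /dev/disk/by-uuid before you try to copy data to the new drive, you can be confident that you won't accidentally overwrite your data with a blank disk because the OS reordered the drives (which does happen sometimes).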
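
The UUID check can be sketched in Python as follows. This is only an illustration under a few assumptions: the function names are mine, and it relies on the entries in /dev/disk/by-uuid being symlinks to the device nodes, which is how udev normally sets them up. Note that these are file system UUIDs, so a blank disk with no file system will not appear there at all.

```python
import os


def map_uuids_to_devices(by_uuid_dir="/dev/disk/by-uuid"):
    """Return {uuid: resolved device path} for every entry in by_uuid_dir."""
    mapping = {}
    if not os.path.isdir(by_uuid_dir):
        return mapping  # directory missing, e.g. not a Linux system
    for name in sorted(os.listdir(by_uuid_dir)):
        # Each entry is a symlink such as ../../sda1; resolve it to /dev/sda1.
        mapping[name] = os.path.realpath(os.path.join(by_uuid_dir, name))
    return mapping


def resolve_uuid(uuid, by_uuid_dir="/dev/disk/by-uuid"):
    """Return the device path for one UUID, or raise KeyError if it is absent."""
    devices = map_uuids_to_devices(by_uuid_dir)
    if uuid not in devices:
        raise KeyError("no file system with UUID %s is currently attached" % uuid)
    return devices[uuid]


if __name__ == "__main__":
    for uuid, device in map_uuids_to_devices().items():
        print(uuid + " -> " + device)
```

Write down the UUID of the failing drive's file system while you are sure which device it is, then resolve it again right before starting the copy; if the path has changed, the drives were reordered and the source device name needs to be updated.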
