Fixing a partition table with the wrong sector size due to a USB adapter, and possible block errors

2024-11-9

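I have a 4TB WD disk (sector size 512e/4096) that was originally formatted as GPT/ext4 through a usb-sata enclosure/adapter attached to a Raspberry Pi 3. Apparently the load was too much for the Rpi3, and the inconsistent voltage on the usb port eventually burned out the enclosure. The controller in this enclosure is one of those that "translate" the sector size of larger disks to 4096 for the operating system, while the disk itself presents emulated 512-byte (512e) sectors to the enclosure. As a result, accessing the disk without that enclosure does not find the partitions correctly.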
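To make the mismatch concrete, here is a minimal Python sketch of the arithmetic. It assumes the enclosure simply groups eight 512-byte sectors into one 4096-byte sector, which is an assumption about the controller and not something confirmed for this model. The function names are hypothetical.

```python
# Sketch: where structures written with 4096-byte logical sectors end up
# when the same disk is read with 512-byte logical sectors.
# Assumption: the enclosure uses a plain 8:1 grouping of sectors.

ENCLOSURE_SECTOR = 4096  # logical sector size presented by the enclosure
NATIVE_SECTOR = 512      # logical sector size presented by the bare disk (512e)


def lba_to_byte_offset(lba: int, sector_size: int) -> int:
    """Byte offset of a logical block address for a given sector size."""
    return lba * sector_size


def translate_lba(lba: int, from_size: int, to_size: int) -> int:
    """Re-express an LBA in a different sector size (must divide evenly)."""
    offset = lba_to_byte_offset(lba, from_size)
    if offset % to_size:
        raise ValueError("offset is not aligned to the target sector size")
    return offset // to_size


# The GPT header lives at LBA 1. Written through the enclosure it sits at
# byte 4096, but a reader using 512-byte sectors looks for it at byte 512.
header_written_at = lba_to_byte_offset(1, ENCLOSURE_SECTOR)
header_expected_at = lba_to_byte_offset(1, NATIVE_SECTOR)
print(header_written_at, header_expected_at)

# A partition starting at 4096-byte sector 256 starts at 512-byte sector 2048.
print(translate_lba(256, ENCLOSURE_SECTOR, NATIVE_SECTOR))
```

Under that assumption the data is intact on the platters, but every LBA stored in the partition table is off by a factor of eight when read without the enclosure.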

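I could access the data with Testdisk after setting the correct sector size of 4096, but before writing a new partition table I decided to check the disk again in a new enclosure of exactly the same model as the one used when the filesystem was created. Apparently the voltage fluctuation, or the enclosure suddenly losing power, corrupted the filesystem on the disk. I am not sure what to conclude from the information gathered so far, or which steps to take to recover the filesystem without losing data. Here is what I tested: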

Testdisk, with the geometry corrected to a sector size of 4096 and the partition table type set to GPT, finds the partition as follows:

Disk /dev/sdc - 4000 GB / 3726 GiB - CHS 476930 64 32
     Partition               Start        End    Size in sectors
   P Linux filesys. data          256  976754638  976754383

With the sector-translating enclosure, fdisk -l shows an incorrect and unreadable partition table, as follows:

GPT PMBR size mismatch (976754643 != 976754378) will be corrected by write.
Disk /dev/sdc: 3,64 TiB, 4000785936384 bytes, 976754379 sectors
Disk model: Storage Device  
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device     Boot Start       End   Sectors  Size Id Type
/dev/sdc1           1 976754378 976754378  3,6T ee GPT

dumpe2fs confirms the detection problem:

dumpe2fs: Bad magic number in super-block while trying to open /dev/sdc
Couldn't find valid filesystem superblock.
/dev/sdc contains `DOS/MBR boot sector; partition 1 : ID=0xee, start-CHS (0x0,0,2), end-CHS (0x3ff,255,63), startsector 1, 976754643 sectors, extended partition table (last)' data
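Note that dumpe2fs was pointed at /dev/sdc, the whole disk, whose first sector holds the protective MBR, so a bad magic number at that location is expected. A minimal, read-only Python sketch for checking whether an ext2/3/4 superblock exists at the partition offset could look like the following. The helper name is hypothetical, and the partition start is taken from the Testdisk output (sector 256 with 4096-byte sectors).

```python
import struct

SUPERBLOCK_OFFSET = 1024   # the superblock starts 1024 bytes into the filesystem
MAGIC_OFFSET = 0x38        # position of the s_magic field inside the superblock
EXT_MAGIC = 0xEF53         # magic number shared by ext2, ext3 and ext4


def has_ext_superblock(path: str, fs_start: int = 0) -> bool:
    """Return True if the ext magic number is found at fs_start + 1024 + 0x38."""
    with open(path, "rb") as dev:
        dev.seek(fs_start + SUPERBLOCK_OFFSET + MAGIC_OFFSET)
        raw = dev.read(2)
    return len(raw) == 2 and struct.unpack("<H", raw)[0] == EXT_MAGIC


# Partition start reported by Testdisk: sector 256 with 4096-byte sectors.
PARTITION_START = 256 * 4096  # 1048576 bytes
```

Calling has_ext_superblock("/dev/sdc", PARTITION_START) as root only reads two bytes and writes nothing. A True result would suggest the primary superblock is still where Testdisk says the partition begins.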

In addition, hdparm -N /dev/sdc reports an HPA error:

/dev/sdc:  max sectors   = 7814035055/1(7814037168?), HPA setting seems invalid (buggy kernel device driver?)

Running gdisk gives the following:

Warning! Disk size is smaller than the main header indicates! Loading
secondary header from the last sector of the disk! You should use 'v' to
verify disk integrity, and perhaps options on the experts' menu to repair
the disk.
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.

Warning! One or more CRCs don't match. You should repair the disk!
Main header: OK
Backup header: ERROR
Main partition table: OK
Backup partition table: ERROR

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: damaged

****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
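To see which values gdisk is comparing here, the main GPT header can be decoded directly. Below is a minimal Python sketch, assuming the standard 92-byte header layout from the UEFI specification. The function names are hypothetical, and the sector size has to be supplied by the caller because it is exactly the quantity in doubt.

```python
import struct
import zlib

# Field layout of the 92-byte GPT header (little-endian).
GPT_HEADER = struct.Struct("<8sIIIIQQQQ16sQIII")


def parse_gpt_header(sector: bytes) -> dict:
    """Decode a GPT header from the raw bytes of the sector that holds it."""
    (sig, _revision, size, crc, _reserved, my_lba, alt_lba,
     first_usable, last_usable, _guid, _entries_lba,
     _num_entries, _entry_size, _entries_crc) = GPT_HEADER.unpack_from(sector)
    if sig != b"EFI PART":
        raise ValueError("no GPT signature in this sector")
    # The header CRC is computed with the CRC field itself set to zero.
    zeroed = sector[:16] + b"\x00\x00\x00\x00" + sector[20:size]
    return {
        "my_lba": my_lba,
        "backup_lba": alt_lba,
        "first_usable": first_usable,
        "last_usable": last_usable,
        "crc_ok": (zlib.crc32(zeroed) & 0xFFFFFFFF) == crc,
    }


def read_gpt_header(path: str, sector_size: int) -> dict:
    """Read LBA 1 of a device or image, assuming the given sector size."""
    with open(path, "rb") as dev:
        dev.seek(sector_size)  # LBA 1 starts one sector into the device
        return parse_gpt_header(dev.read(sector_size))
```

For example, read_gpt_header("/dev/sdc", 4096) run as root would only read the header. If backup_lba comes back larger than the last sector the enclosure exposes, that would be consistent with gdisk failing to find a valid backup header at the end of the disk.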

So I checked the disk's SMART information. The first thing I noticed was this error in dmesg:

23:51:48 2023] sd 3:0:0:0: [sdc] Optimal transfer size 33553920 bytes not a multiple of preferred minimum block size (4096 bytes)
23:53:49 2023] Buffer I/O error on dev sdc, logical block 0, async page read
23:53:49 2023] sd 3:0:0:0: [sdc] tag#21 device offline or changed
23:53:49 2023] I/O error, dev sdc, sector 1 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2

Throughout the test, this repeats only for logical blocks 0 to 7 and sectors 0 to 7. Apart from that, the test itself reports the disk as healthy:

SMART Status not supported: Incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
...
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   171   167   021    Pre-fail  Always       -       6441
  4 Start_Stop_Count        0x0032   093   093   000    Old_age   Always       -       7730
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       7018
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       54
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       39
193 Load_Cycle_Count        0x0032   193   193   000    Old_age   Always       -       21893
194 Temperature_Celsius     0x0022   117   109   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing

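I tried to fix the superblock error with fsck -b and with fsck.ext4 -p -b 32768 -B 4096 /dev/sdc, using the available backup superblocks (listed with mke2fs -n), but every attempt failed because no valid ext4 information could be read: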

/sbin/e2fsck: Bad magic number in super-block while trying to open /dev/sdc

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:...
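For reference, the backup superblock locations that mke2fs -n lists can be reproduced with a short calculation. The Python sketch below assumes the sparse_super layout with 4096-byte blocks and the default 32768 blocks per group, and it takes the partition size from Testdisk as the block count. None of these assumptions has been confirmed against the actual filesystem, and the function name is hypothetical.

```python
def backup_superblocks(total_blocks: int, blocks_per_group: int = 32768) -> list[int]:
    """Block numbers of backup superblocks under the sparse_super layout.

    Assumes 4096-byte blocks (so the first data block is 0). Backups live
    in group 1 and in groups whose number is a power of 3, 5 or 7.
    """
    groups = -(-total_blocks // blocks_per_group)  # ceiling division
    backup_groups = {1}
    for base in (3, 5, 7):
        g = base
        while g < groups:
            backup_groups.add(g)
            g *= base
    return [g * blocks_per_group for g in sorted(backup_groups) if g < groups]


# Partition size reported by Testdisk, taken here as the block count.
locations = backup_superblocks(976754383)
print(locations[:5])  # [32768, 98304, 163840, 229376, 294912]
```

These block numbers are relative to the start of the filesystem, not the start of the disk, so they only line up when e2fsck is given a device or offset that begins where the partition begins.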

Going back to gdisk, I tried the verify command and got the following:

Command (? for help): v

Caution: The CRC for the backup partition table is invalid. This table may
be corrupt. This program will automatically create a new backup partition
table when you save your partitions.

Problem: The secondary header's self-pointer indicates that it doesn't reside
at the end of the disk. If you've added a disk to a RAID array, use the 'e'
option on the experts' menu to adjust the secondary header's and partition
table's locations.

Problem: Disk is too small to hold all the data!
(Disk size is 976754379 sectors, needs to be 976754644 sectors.)
The 'e' option on the experts' menu may fix this problem.

Warning: There is a gap between the main partition table (ending sector 5)
and the first usable sector (256). This is helpful in some exotic configurations,
but is unusual. The util-linux fdisk program often creates disks like this.
Using 'j' on the experts' menu can adjust this gap.

Problem: GPT claims the disk is larger than it is! (Claimed last usable
sector is 976754638, but backup header is at
976754643 and disk size is 976754379 sectors.
The 'e' option on the experts' menu will probably fix this problem

Problem: partition 1 is too big for the disk.

Partition(s) in the protective MBR are too big for the disk! Creating a
fresh protective or hybrid MBR is recommended.

Caution: Partition 1 doesn't end on a 256-sector boundary. This may
result in problems with some disk encryption tools.

Identified 6 problems!

I then tried the e option on the experts' menu and ran verify (v) again:

Expert command (? for help): e
Relocating backup data structures to the end of the disk

Expert command (? for help): v

Caution: The CRC for the backup partition table is invalid. This table may
be corrupt. This program will automatically create a new backup partition
table when you save your partitions.

Warning: There is a gap between the main partition table (ending sector 5)
and the first usable sector (256). This is helpful in some exotic configurations,
but is unusual. The util-linux fdisk program often creates disks like this.
Using 'j' on the experts' menu can adjust this gap.

Problem: partition 1 is too big for the disk.

Warning! Secondary partition table overlaps the last partition by
515 blocks!
You will need to delete this partition or resize it in another utility.

Caution: Partition 1 doesn't end on a 256-sector boundary. This may
result in problems with some disk encryption tools.

Identified 3 problems!

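I did not write the changes to the disk, because the second verification suggests that doing so would introduce further problems. I also have no spare disk to back up 4TB, so my only hope is to repair the filesystem without reformatting.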

So, what do you think is going on here? Is there any way to fix this? Thanks!

Edit:

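I just noticed that the logical block/sector errors in dmesg may be related to the dock hub I used to run smartctl. Leaving the hub powered on with no disk attached produces similar messages, so that may well be the cause. Even so, given the HPA error and the errors reported by gdisk, the state of the disk is still unclear to me. How can I repair it without damaging the data on the disk beyond recovery?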

Edit 1:

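Apparently, some models of USB enclosure "hijack" some of the trailing bytes of the disk for their own use. Removing the disk from the enclosure exposes these "hidden" bytes to the operating system again. My guess is that, with GPT in particular, the partition table then appears wrong, ill-sized, or corrupt. The information comes from here: https://www.reddit.com/r/DataHoarder/comments/ejqdeh/psa_do_not_swap_your_nonempty_drives_into/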
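The numbers reported earlier can be used to estimate how much of the disk is hidden. This is a rough Python sketch of that arithmetic, using only figures from the fdisk, gdisk, Testdisk and hdparm output above. The hdparm value is included for comparison only, since hdparm itself flags the HPA setting as possibly invalid.

```python
SECTOR = 4096

disk_now = 976754379       # sectors visible through the new enclosure (fdisk/gdisk)
gpt_expects = 976754644    # sectors gdisk says the GPT needs
partition_end = 976754638  # last sector of partition 1 (Testdisk)

# Sectors, and bytes, that the GPT expects but the enclosure does not show.
missing = gpt_expects - disk_now
print(missing, missing * SECTOR)

# Sectors of partition 1 that lie beyond the last visible sector.
beyond_end = partition_end - (disk_now - 1)
print(beyond_end)

# hdparm counts 512-byte sectors; convert its native maximum for comparison.
native_512 = 7814037168
print(native_512 // 8)
```

If the hdparm native figure is correct, the bare disk would be large enough to hold the whole partition and the backup GPT, which fits the idea that the current enclosure is hiding roughly 1 MiB at the end of the disk.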
