XFS: can't read superblock

I am getting the following error:

[root@mediaserv ~]# mount /dev/mapper/media1 /media
mount: /media: can't read superblock on /dev/mapper/media1.

This is Fedora 33. I have a RAID5 of eight 8TB WD Red drives on an Adaptec 7805Q RAID controller, which shows up as /dev/sdc. It holds a single GPT partition, /dev/sdc1, encrypted with LUKSv2 and containing an XFS filesystem.

[root@mediaserv ~]# lsblk /dev/sdc
NAME       MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sdc          8:32   1 50.9T  0 disk
└─sdc1       8:33   1 50.9T  0 part
  └─media1 253:0    0 50.9T  0 crypt
[root@mediaserv ~]#

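The RAID ended up in degraded mode. Quite possibly I bumped the cable on the first drive while installing a new fan. In any case, after boot it ran degraded for a few hours before I noticed. I shut it down, booted a rescue image into single-user mode, and left it running to rebuild the array. That took about 14 hours.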

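After the reboot, the system prompted me for the LUKS passphrase for the partition, but then it just sat there. I left it for about 8 hours, not sure whether something was being repaired in the background.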

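I booted into rescue again, commented out the filesystem in /etc/crypttab and /etc/fstab, and was then able to log in to the system without the /media filesystem mounted.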

I can run cryptsetup luksOpen /dev/sdc1 media1 successfully; the partition appears to decrypt without errors.

When I run the mount command (above), I get the following in /var/log/messages:

Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#340 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#340 Sense Key : Hardware Error [current]
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#340 Add. Sense: Internal target failure
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#340 CDB: Read(16) 88 00 00 00 00 00 00 00 11 00 00 00 00 01 00 00
Jan  5 10:23:00 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 34816 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#341 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#341 Sense Key : Hardware Error [current]
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#341 Add. Sense: Internal target failure
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#341 CDB: Read(16) 88 00 00 00 00 00 00 00 11 00 00 00 00 01 00 00
Jan  5 10:23:00 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 34816 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan  5 10:23:00 mediaserv kernel: Buffer I/O error on dev dm-0, logical block 0, async page read
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#342 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#342 Sense Key : Hardware Error [current]
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#342 Add. Sense: Internal target failure
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#342 CDB: Read(16) 88 00 00 00 00 00 00 00 11 00 00 00 00 01 00 00
Jan  5 10:23:00 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 34816 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan  5 10:23:00 mediaserv kernel: EXT4-fs (dm-0): unable to read superblock
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#343 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#343 Sense Key : Hardware Error [current]
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#343 Add. Sense: Internal target failure
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#343 CDB: Read(16) 88 00 00 00 00 00 00 00 11 00 00 00 00 01 00 00
Jan  5 10:23:00 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 34816 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan  5 10:23:00 mediaserv kernel: EXT4-fs (dm-0): unable to read superblock
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#344 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#344 Sense Key : Hardware Error [current]
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#344 Add. Sense: Internal target failure
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#344 CDB: Read(16) 88 00 00 00 00 00 00 00 11 00 00 00 00 01 00 00
Jan  5 10:23:00 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 34816 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan  5 10:23:00 mediaserv kernel: EXT4-fs (dm-0): unable to read superblock
Jan  5 10:23:00 mediaserv kernel: ISOFS: unsupported/invalid hardware sector size 4096
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#345 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#345 Sense Key : Hardware Error [current]
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#345 Add. Sense: Internal target failure
Jan  5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#345 CDB: Read(16) 88 00 00 00 00 00 00 00 11 00 00 00 00 01 00 00
Jan  5 10:23:00 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 34816 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan  5 10:23:00 mediaserv kernel: FAT-fs (dm-0): unable to read boot sector

I have tried running xfs_repair, but have not yet tried the -L option.

[root@mediaserv ~]# xfs_repair /dev/mapper/media1
Phase 1 - find and verify superblock...
superblock read failed, offset 0, size 524288, ag 0, rval -1

fatal error -- Remote I/O error

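I'm not sure where to go from here, and I'm worried that I might run the wrong command and cause more damage. Any help would be greatly appreciated.

Thanks!

-Mike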

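Edit:

After further investigation, I don't think this is a superblock problem; I think the error appeared because I did not specify the filesystem type in the mount command. Running it again properly, I get: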

[root@mediaserv ~]# mount -t xfs /dev/mapper/media1 /media
mount: /media: mount(2) system call failed: Remote I/O error.

This puts the following in my /var/log/messages:

Jan  5 12:15:43 mediaserv kernel: sd 12:0:0:0: [sdc] tag#838 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Jan  5 12:15:43 mediaserv kernel: sd 12:0:0:0: [sdc] tag#838 Sense Key : Hardware Error [current]
Jan  5 12:15:43 mediaserv kernel: sd 12:0:0:0: [sdc] tag#838 Add. Sense: Internal target failure
Jan  5 12:15:43 mediaserv kernel: sd 12:0:0:0: [sdc] tag#838 CDB: Read(16) 88 00 00 00 00 00 00 00 11 00 00 00 00 01 00 00
Jan  5 12:15:43 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 34816 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class 0
Jan  5 12:15:43 mediaserv kernel: XFS (dm-0): SB validate failed with error -121.

I'm not sure how to interpret this. Is there a problem with the data starting at sector 34816?
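One way to read these numbers, as a rough sketch: the CDB bytes in the log can be decoded by hand to see which block the controller refused to read. This assumes the standard Read(16) layout (byte 0 is the opcode 0x88, bytes 2-9 are the big-endian LBA), that the logical device uses 4096-byte blocks, and that the kernel reports sectors in 512-byte units. The helper name cdb_to_sector is made up for illustration.

```shell
#!/bin/sh
# Decode the LBA from a Read(16) CDB (as printed in the kernel log) and
# convert it to the 512-byte sector number that blk_update_request reports.
# Usage: cdb_to_sector <block size in bytes> <16 CDB bytes in hex>
cdb_to_sector() {
    block_size=$1
    shift            # drop the block size argument
    shift 2          # skip CDB byte 0 (opcode) and byte 1 (flags)
    lba=0
    n=0
    while [ "$n" -lt 8 ]; do
        # bytes 2-9 of the CDB hold the LBA, most significant byte first
        lba=$(( lba * 256 + 0x$1 ))
        shift
        n=$(( n + 1 ))
    done
    echo $(( lba * block_size / 512 ))
}

# CDB from the failed mount attempt; assumes 4096-byte logical blocks
cdb_to_sector 4096 88 00 00 00 00 00 00 00 11 00 00 00 00 01 00 00
```

This prints 34816, matching the sector in the log. Sector 34816 is 17 MiB into the disk (34816 x 512 bytes). If the partition starts at the usual 1 MiB and the LUKS2 header takes the default 16 MiB (neither of which I have verified on this system), that is the very first block of the decrypted device, which would agree with the "Buffer I/O error on dev dm-0, logical block 0" line. Separately, error -121 is EREMOTEIO on Linux, the same "Remote I/O error" that mount and xfs_repair report.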

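Edit #2:

About the health of the RAID array: as I mentioned, it did go into degraded mode because a drive dropped out. During the RAID rebuild I took it out of service and ran in single-user mode. Below is the output of the Adaptec tool after the rebuild (trimmed for brevity):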

arcconf getconfig 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
   Controller Status                        : Optimal
   Controller Mode                          : RAID (Expose RAW)
   Controller Model                         : Adaptec ASR7805Q
   Performance Mode                         : Big Block Bypass
   --------------------------------------------------------
   RAID Properties
   --------------------------------------------------------
   Logical devices/Failed/Degraded          : 1/0/0
   Copyback                                 : Disabled
   Automatic Failover                       : Enabled
   Background consistency check             : Disabled
   Background consistency check period      : 0
----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
Logical Device number 0
   Logical Device name                      : media
   Block Size of member drives              : 4K Bytes
   RAID level                               : 5
   Status of Logical Device                 : Optimal
   Size                                     : 53387257 MB
   Parity space                             : 7626751 MB
   Stripe-unit size                         : 1024 KB
   Interface Type                           : Serial ATA
   Device Type                              : HDD
   Read-cache setting                       : Enabled
   Read-cache status                        : On
   Write-cache setting                      : On when protected by battery/ZMM
   Write-cache status                       : On
   maxCache read cache setting              : Enabled
   maxCache read cache status               : Off
   maxCache write cache setting             : Disabled
   maxCache write cache status              : Off
   Partitioned                              : Yes
   Protected by Hot-Spare                   : No
   Bootable                                 : Yes
   Failed stripes                           : Yes
   Power settings                           : Disabled
----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
      Device #0
         Device is a Hard drive
         State                              : Online
         Block Size                         : 4K Bytes
      Device #1
         Device is a Hard drive
         State                              : Online
         Block Size                         : 4K Bytes
      Device #2
         Device is a Hard drive
         State                              : Online
         Block Size                         : 4K Bytes
      Device #3
         Device is a Hard drive
         State                              : Online
         Block Size                         : 4K Bytes
      Device #4
         Device is a Hard drive
         State                              : Online
         Block Size                         : 4K Bytes
      Device #5
         Device is a Hard drive
         State                              : Online
         Block Size                         : 4K Bytes
      Device #6
         Device is a Hard drive
         State                              : Online
         Block Size                         : 4K Bytes
      Device #7
         Device is a Hard drive
         State                              : Online
         Block Size                         : 4K Bytes
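As a quick sanity check on the figures above, the reported sizes can be compared with what an 8-drive RAID5 should look like and with the 50.9T that lsblk shows. This is only a sketch: it assumes arcconf's "MB" means MiB (2^20 bytes), and the variable names are mine.

```shell
#!/bin/sh
# Values copied from the arcconf output above.
size_mb=53387257      # "Size" of the logical device
parity_mb=7626751     # "Parity space"
drives=8              # member drives in the RAID5

# In RAID5 one drive's worth of space goes to parity, so the usable size
# divided by (drives - 1) should equal the parity space.
per_drive_mb=$(( size_mb / (drives - 1) ))
echo "per-drive share: ${per_drive_mb} MB (reported parity: ${parity_mb} MB)"

# Usable size in tenths of a TiB, assuming "MB" here means MiB.
tib_tenths=$(( size_mb * 10 / 1024 / 1024 ))
echo "usable size: $(( tib_tenths / 10 )).$(( tib_tenths % 10 )) TiB"
```

Both numbers line up (7626751 MB per drive, 50.9 TiB usable), so the controller's view of the array geometry at least agrees with what the kernel sees.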

Here is the SMART status of each drive in the array:

[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,0" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,1" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,2" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,3" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,4" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,5" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,6" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,7" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
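The eight commands above differ only in the last field of the -d argument, so they can be written as a loop. A minimal sketch follows; the function name smart_health_all and the DRY_RUN switch are my own additions, and by default it only prints the commands rather than running them.

```shell
#!/bin/sh
# Query the SMART health line of every physical drive behind the controller.
# Usage: smart_health_all <block device> <number of drives>
# DRY_RUN defaults to 1 (print the commands only); set DRY_RUN=0 to run them.
smart_health_all() {
    dev=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "smartctl -a -d aacraid,0,0,$i $dev"
        else
            echo "== drive $i =="
            smartctl -a -d "aacraid,0,0,$i" "$dev" | grep health
        fi
        i=$(( i + 1 ))
    done
}

smart_health_all /dev/sdc 8
```

Note that grepping for "health" keeps only the overall self-assessment line; the per-attribute counters in the full smartctl -a output are dropped.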

However, just a few hours ago, while going through the logs more closely, I found the following:

Jan  4 08:25:25 mediaserv kernel: sd 12:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=9s
Jan  4 08:25:25 mediaserv kernel: sd 12:0:0:0: [sdc] tag#0 Sense Key : Hardware Error [current]
Jan  4 08:25:25 mediaserv kernel: sd 12:0:0:0: [sdc] tag#0 Add. Sense: Internal target failure
Jan  4 08:25:25 mediaserv kernel: sd 12:0:0:0: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 01 60 2f 5c bf 00 00 00 20 00 00
Jan  4 08:25:25 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 47269471736 op 0x0:(READ) flags 0x80700 phys_seg 5 prio class 0

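Five of the above blocks appear in sequence, and they keep recurring in the logs. At the moment the machine lost the filesystem, the following appeared as well: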

Jan  4 08:26:32 mediaserv kernel: aacraid: Host adapter abort request.#012aacraid: Outstanding commands on (12,0,0,0):
Jan  4 08:26:32 mediaserv kernel: aacraid: Host adapter abort request.#012aacraid: Outstanding commands on (12,0,0,0):
Jan  4 08:26:32 mediaserv kernel: aacraid: Host adapter abort request.#012aacraid: Outstanding commands on (12,0,0,0):
Jan  4 08:26:55 mediaserv kernel: aacraid: Host adapter abort request.#012aacraid: Outstanding commands on (12,0,0,0):
Jan  4 08:26:55 mediaserv kernel: aacraid: Host bus reset request. SCSI hang ?
Jan  4 08:26:55 mediaserv kernel: aacraid 0000:02:00.0: outstanding cmd: midlevel-0
Jan  4 08:26:55 mediaserv kernel: aacraid 0000:02:00.0: outstanding cmd: lowlevel-0
Jan  4 08:26:55 mediaserv kernel: aacraid 0000:02:00.0: outstanding cmd: error handler-0
Jan  4 08:26:55 mediaserv kernel: aacraid 0000:02:00.0: outstanding cmd: firmware-56
Jan  4 08:26:55 mediaserv kernel: aacraid 0000:02:00.0: outstanding cmd: kernel-0
Jan  4 08:26:55 mediaserv kernel: aacraid 0000:02:00.0: Controller reset type is 3
Jan  4 08:26:55 mediaserv kernel: aacraid 0000:02:00.0: Issuing IOP reset
Jan  4 08:27:30 mediaserv kernel: aacraid 0000:02:00.0: IOP reset succeeded
Jan  4 08:27:30 mediaserv kernel: aacraid: Comm Interface type2 enabled
Jan  4 08:27:56 mediaserv kernel: aacraid 0000:02:00.0: Scheduling bus rescan

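It is worth noting that the array went into degraded mode and the above happened 10 hours and 15 minutes later, so the array problem and the XFS filesystem problem are hours apart. Although the array and the drives now report as healthy, I am still getting the "FAILED Result" blocks shown above.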
