I am getting the following error:
[root@mediaserv ~]# mount /dev/mapper/media1 /media
mount: /media: can't read superblock on /dev/mapper/media1.
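This is Fedora 33. I have a RAID5 of eight 8TB WD Red drives on an Adaptec 7805Q RAID controller, which shows up as /dev/sdc. It has a GPT with a single partition, /dev/sdc1, which is encrypted with LUKSv2 and holds an XFS filesystem.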
[root@mediaserv ~]# lsblk /dev/sdc
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdc 8:32 1 50.9T 0 disk
└─sdc1 8:33 1 50.9T 0 part
└─media1 253:0 0 50.9T 0 crypt
[root@mediaserv ~]#
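The RAID ended up in degraded mode. Most likely I bumped the cable on the first drive while installing a new fan. In any case, after booting it ran in degraded mode for a few hours before I noticed. I shut it down, booted from a rescue image into single-user mode, and let it run to rebuild the array. That took about 14 hours.
After rebooting, the system prompted me for the partition's LUKS passphrase, but then it just hung there. I let it sit for about 8 hours, unsure whether something was being repaired in the background.
I booted from rescue again. After commenting out the filesystem in /etc/crypttab and /etc/fstab, I was able to log in to the system without the /media filesystem mounted.
I was able to run cryptsetup luksOpen /dev/sdc1 media1 successfully; the partition appears to decrypt without errors.
When I run the mount command (above), I get the following in /var/log/messages: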
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#340 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#340 Sense Key : Hardware Error [current]
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#340 Add. Sense: Internal target failure
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#340 CDB: Read(16) 88 00 00 00 00 00 00 00 11 00 00 00 00 01 00 00
Jan 5 10:23:00 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 34816 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#341 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#341 Sense Key : Hardware Error [current]
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#341 Add. Sense: Internal target failure
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#341 CDB: Read(16) 88 00 00 00 00 00 00 00 11 00 00 00 00 01 00 00
Jan 5 10:23:00 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 34816 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan 5 10:23:00 mediaserv kernel: Buffer I/O error on dev dm-0, logical block 0, async page read
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#342 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#342 Sense Key : Hardware Error [current]
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#342 Add. Sense: Internal target failure
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#342 CDB: Read(16) 88 00 00 00 00 00 00 00 11 00 00 00 00 01 00 00
Jan 5 10:23:00 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 34816 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan 5 10:23:00 mediaserv kernel: EXT4-fs (dm-0): unable to read superblock
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#343 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#343 Sense Key : Hardware Error [current]
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#343 Add. Sense: Internal target failure
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#343 CDB: Read(16) 88 00 00 00 00 00 00 00 11 00 00 00 00 01 00 00
Jan 5 10:23:00 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 34816 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan 5 10:23:00 mediaserv kernel: EXT4-fs (dm-0): unable to read superblock
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#344 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#344 Sense Key : Hardware Error [current]
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#344 Add. Sense: Internal target failure
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#344 CDB: Read(16) 88 00 00 00 00 00 00 00 11 00 00 00 00 01 00 00
Jan 5 10:23:00 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 34816 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan 5 10:23:00 mediaserv kernel: EXT4-fs (dm-0): unable to read superblock
Jan 5 10:23:00 mediaserv kernel: ISOFS: unsupported/invalid hardware sector size 4096
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#345 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#345 Sense Key : Hardware Error [current]
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#345 Add. Sense: Internal target failure
Jan 5 10:23:00 mediaserv kernel: sd 12:0:0:0: [sdc] tag#345 CDB: Read(16) 88 00 00 00 00 00 00 00 11 00 00 00 00 01 00 00
Jan 5 10:23:00 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 34816 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan 5 10:23:00 mediaserv kernel: FAT-fs (dm-0): unable to read boot sector
I have tried running xfs_repair, but I have not yet tried the -L option.
[root@mediaserv ~]# xfs_repair /dev/mapper/media1
Phase 1 - find and verify superblock...
superblock read failed, offset 0, size 524288, ag 0, rval -1
fatal error -- Remote I/O error
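Before reaching for anything that writes to the device, a read-only probe can confirm whether the first blocks of the mapping are readable at all. The following is only a sketch: `probe_read` is a hypothetical helper name, and it does nothing more than read a few blocks with dd and discard them.

```shell
#!/usr/bin/env bash
# Read-only probe: try to read the first blocks of a device (or file) and
# report whether the read succeeded. Nothing is written to the device.
# probe_read is a hypothetical helper, not part of any package.
probe_read() {
    local dev=$1          # device or file to read from
    local count=${2:-1}   # number of blocks to read
    local bs=${3:-4096}   # block size in bytes (4K matches the array's block size)
    if dd if="$dev" of=/dev/null bs="$bs" count="$count" 2>/dev/null; then
        echo "read ok: $dev"
    else
        echo "read FAILED: $dev"
        return 1
    fi
}

# Example (not run here):
#   probe_read /dev/mapper/media1
```

For the filesystem itself, xfs_repair -n runs in no-modify mode and only reports what it would do, so it is a safer first step than -L, which zeroes the log.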
I'm not sure where to go next, and I'm worried that I might run the wrong command and do more damage. Any help would be greatly appreciated.
Thanks!
-Mike
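Edit:
After further investigation, I don't think this is a superblock problem; I think the error appeared because I did not specify the filesystem type in the mount command. After re-running it correctly, I get: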
[root@mediaserv ~]# mount -t xfs /dev/mapper/media1 /media
mount: /media: mount(2) system call failed: Remote I/O error.
This puts the following in my /var/log/messages:
Jan 5 12:15:43 mediaserv kernel: sd 12:0:0:0: [sdc] tag#838 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
Jan 5 12:15:43 mediaserv kernel: sd 12:0:0:0: [sdc] tag#838 Sense Key : Hardware Error [current]
Jan 5 12:15:43 mediaserv kernel: sd 12:0:0:0: [sdc] tag#838 Add. Sense: Internal target failure
Jan 5 12:15:43 mediaserv kernel: sd 12:0:0:0: [sdc] tag#838 CDB: Read(16) 88 00 00 00 00 00 00 00 11 00 00 00 00 01 00 00
Jan 5 12:15:43 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 34816 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class 0
Jan 5 12:15:43 mediaserv kernel: XFS (dm-0): SB validate failed with error -121.
I'm not sure how to interpret this. Is there a problem with the data starting at sector 34816?
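One possible reading of sector 34816, as a rough sketch: the kernel reports positions on sdc in 512-byte sectors, and the earlier log shows the same failing read surfacing as "logical block 0" on dm-0. If the partition starts at sector 2048 and the LUKS2 data offset is 32768 sectors (16 MiB), then sector 34816 is exactly the first sector of the decrypted device, where the XFS superblock lives. Both numbers are common defaults and are assumptions here, not values taken from this system; `to_dm_sector` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Map an absolute 512-byte sector on the RAID device to a sector inside the
# decrypted dm-crypt mapping.
# ASSUMPTIONS (not confirmed in the post): partition start = 2048 sectors,
# LUKS2 data offset = 32768 sectors (16 MiB). Pass real values to override.
to_dm_sector() {
    local disk_sector=$1
    local part_start=${2:-2048}
    local luks_offset=${3:-32768}
    echo $((disk_sector - part_start - luks_offset))
}

to_dm_sector 34816   # prints 0 under the assumed defaults
```

The real partition start should be readable from /sys/block/sdc/sdc1/start, and the data offset (in bytes) from the data segment section of cryptsetup luksDump /dev/sdc1.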
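Edit #2:
Regarding the health of the RAID array: as I mentioned, it did go into degraded mode because a drive dropped out. I took it out of service and into single-user mode for the duration of the RAID rebuild. Here is the output of the Adaptec tool after the rebuild (I have trimmed it for brevity):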
arcconf getconfig 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
Controller Status : Optimal
Controller Mode : RAID (Expose RAW)
Controller Model : Adaptec ASR7805Q
Performance Mode : Big Block Bypass
--------------------------------------------------------
RAID Properties
--------------------------------------------------------
Logical devices/Failed/Degraded : 1/0/0
Copyback : Disabled
Automatic Failover : Enabled
Background consistency check : Disabled
Background consistency check period : 0
----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
Logical Device number 0
Logical Device name : media
Block Size of member drives : 4K Bytes
RAID level : 5
Status of Logical Device : Optimal
Size : 53387257 MB
Parity space : 7626751 MB
Stripe-unit size : 1024 KB
Interface Type : Serial ATA
Device Type : HDD
Read-cache setting : Enabled
Read-cache status : On
Write-cache setting : On when protected by battery/ZMM
Write-cache status : On
maxCache read cache setting : Enabled
maxCache read cache status : Off
maxCache write cache setting : Disabled
maxCache write cache status : Off
Partitioned : Yes
Protected by Hot-Spare : No
Bootable : Yes
Failed stripes : Yes
Power settings : Disabled
----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
Device #0
Device is a Hard drive
State : Online
Block Size : 4K Bytes
Device #1
Device is a Hard drive
State : Online
Block Size : 4K Bytes
Device #2
Device is a Hard drive
State : Online
Block Size : 4K Bytes
Device #3
Device is a Hard drive
State : Online
Block Size : 4K Bytes
Device #4
Device is a Hard drive
State : Online
Block Size : 4K Bytes
Device #5
Device is a Hard drive
State : Online
Block Size : 4K Bytes
Device #6
Device is a Hard drive
State : Online
Block Size : 4K Bytes
Device #7
Device is a Hard drive
State : Online
Block Size : 4K Bytes
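The output above reports the logical device as Optimal but also shows "Failed stripes : Yes", which is easy to miss in a long dump. A small sketch for pulling individual fields out of the arcconf text follows; `arcconf_field` is a hypothetical helper, and it assumes the "name : value" layout shown above.

```shell
#!/usr/bin/env bash
# Extract the value of one "name : value" field from arcconf output on stdin.
# arcconf_field is a hypothetical helper; it assumes the value contains no ':'.
arcconf_field() {
    awk -F':' -v key="$1" '
        {
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim padding around the name
        }
        name == key {
            val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", val)    # trim padding around the value
            print val
        }'
}

# Example (not run here; assumes the LD subcommand of getconfig is available):
#   arcconf getconfig 1 ld | arcconf_field 'Failed stripes'
#   arcconf getconfig 1 ld | arcconf_field 'Status of Logical Device'
```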
Here is the SMART status of each drive in the array:
[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,0" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,1" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,2" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,3" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,4" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,5" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,6" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
[root@mediaserv ~]# smartctl -a -d "aacraid,0,0,7" /dev/sdc | grep health
SMART overall-health self-assessment test result: PASSED
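The eight commands above can be collapsed into a loop, which also makes it easier to look past the overall PASSED verdict at individual attributes. This is a sketch only: `smart_all` is a hypothetical helper, and the SMARTCTL variable exists so the commands can be previewed (for example with SMARTCTL=echo) without touching the drives.

```shell
#!/usr/bin/env bash
# Run smartctl against every member drive behind the aacraid controller.
# smart_all is a hypothetical helper; SMARTCTL can be overridden for a dry run.
SMARTCTL=${SMARTCTL:-smartctl}

smart_all() {
    local ndrives=$1   # number of member drives (8 in this array)
    local dev=$2       # block device exposed by the controller
    shift 2            # remaining arguments are passed through to smartctl
    local i
    for ((i = 0; i < ndrives; i++)); do
        echo "== aacraid,0,0,$i =="
        "$SMARTCTL" "$@" -d "aacraid,0,0,$i" "$dev"
    done
}

# Examples (not run here):
#   smart_all 8 /dev/sdc -H
#   smart_all 8 /dev/sdc -A | grep -E '^==|Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
```

The attribute names in the second example are the ones smartctl typically prints for SATA drives; whether these particular drives report all of them is an assumption.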
However, just a few hours ago, while looking through the logs more closely, I found the following:
Jan 4 08:25:25 mediaserv kernel: sd 12:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=9s
Jan 4 08:25:25 mediaserv kernel: sd 12:0:0:0: [sdc] tag#0 Sense Key : Hardware Error [current]
Jan 4 08:25:25 mediaserv kernel: sd 12:0:0:0: [sdc] tag#0 Add. Sense: Internal target failure
Jan 4 08:25:25 mediaserv kernel: sd 12:0:0:0: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 01 60 2f 5c bf 00 00 00 20 00 00
Jan 4 08:25:25 mediaserv kernel: blk_update_request: critical target error, dev sdc, sector 47269471736 op 0x0:(READ) flags 0x80700 phys_seg 5 prio class 0
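Five of the above blocks appear in sequence and keep recurring in the log, and the following appears at the same time the machine lost the filesystem: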
Jan 4 08:26:32 mediaserv kernel: aacraid: Host adapter abort request.#012aacraid: Outstanding commands on (12,0,0,0):
Jan 4 08:26:32 mediaserv kernel: aacraid: Host adapter abort request.#012aacraid: Outstanding commands on (12,0,0,0):
Jan 4 08:26:32 mediaserv kernel: aacraid: Host adapter abort request.#012aacraid: Outstanding commands on (12,0,0,0):
Jan 4 08:26:55 mediaserv kernel: aacraid: Host adapter abort request.#012aacraid: Outstanding commands on (12,0,0,0):
Jan 4 08:26:55 mediaserv kernel: aacraid: Host bus reset request. SCSI hang ?
Jan 4 08:26:55 mediaserv kernel: aacraid 0000:02:00.0: outstanding cmd: midlevel-0
Jan 4 08:26:55 mediaserv kernel: aacraid 0000:02:00.0: outstanding cmd: lowlevel-0
Jan 4 08:26:55 mediaserv kernel: aacraid 0000:02:00.0: outstanding cmd: error handler-0
Jan 4 08:26:55 mediaserv kernel: aacraid 0000:02:00.0: outstanding cmd: firmware-56
Jan 4 08:26:55 mediaserv kernel: aacraid 0000:02:00.0: outstanding cmd: kernel-0
Jan 4 08:26:55 mediaserv kernel: aacraid 0000:02:00.0: Controller reset type is 3
Jan 4 08:26:55 mediaserv kernel: aacraid 0000:02:00.0: Issuing IOP reset
Jan 4 08:27:30 mediaserv kernel: aacraid 0000:02:00.0: IOP reset succeeded
Jan 4 08:27:30 mediaserv kernel: aacraid: Comm Interface type2 enabled
Jan 4 08:27:56 mediaserv kernel: aacraid 0000:02:00.0: Scheduling bus rescan
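It is worth noting that the array went into degraded mode, and the above happened 10 hours and 15 minutes later. So the array problem and the XFS filesystem problem were hours apart. Although the array and the drives now report healthy, I still get the "FAILED Result" blocks shown above.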
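For reference, the Read(16) CDBs in the log excerpts above can be decoded by hand to check that they agree with the sector numbers the kernel prints. In a Read(16) command, byte 0 is the opcode (0x88), bytes 2 to 9 are the big-endian logical block address, and bytes 10 to 13 are the transfer length in logical blocks. The sketch below assumes 4K logical blocks, as the arcconf output suggests; `decode_read16` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Decode a SCSI Read(16) CDB as printed in the kernel log.
# decode_read16 is a hypothetical helper. ASSUMPTION: 4096-byte logical blocks.
decode_read16() {
    local -a cdb
    read -r -a cdb <<< "$1"          # split the hex string into 16 bytes
    local block_size=${2:-4096}
    # Only handle a 16-byte CDB whose opcode is Read(16).
    [[ ${#cdb[@]} -eq 16 && ${cdb[0]} == 88 ]] || return 1
    local lba_hex len_hex
    lba_hex=$(printf '%s' "${cdb[@]:2:8}")    # bytes 2-9: LBA, big-endian
    len_hex=$(printf '%s' "${cdb[@]:10:4}")   # bytes 10-13: length in blocks
    local lba=$((16#$lba_hex))
    local len=$((16#$len_hex))
    # The kernel's "sector" is always in 512-byte units.
    echo "lba=$lba blocks=$len sector512=$((lba * block_size / 512)) bytes=$((len * block_size))"
}

decode_read16 "88 00 00 00 00 00 00 00 11 00 00 00 00 01 00 00"
# lba=4352 blocks=1 sector512=34816 bytes=4096
decode_read16 "88 00 00 00 00 01 60 2f 5c bf 00 00 00 20 00 00"
# lba=5908683967 blocks=32 sector512=47269471736 bytes=131072
```

Both results match the sector values in the corresponding blk_update_request lines, so the failing reads are a single 4K block at sector 34816 and a 128 KiB read at sector 47269471736.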