Ubuntu MDADM RAID drive failure?


I have a QNAP running Ubuntu with its drives in an MD software RAID, but after a reboot the server went into emergency mode (I got it out of that by removing the RAID from fstab).

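(The LEDs on the front don't indicate any error on the drives, but I think those were controlled by the software of the original QNAP installation, and I'm now running Ubuntu 16.04.)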

Since then I've tried to mount md0 manually, without success:

server@server:~$ sudo mount -t ext4 /dev/md0 /home/media
mount: wrong fs type, bad option, bad superblock on /dev/md0,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.
server@server:~$ dmesg | tail
[   42.878727] igb 0000:0c:00.0 enp12s0: igb: enp12s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   42.878998] IPv6: ADDRCONF(NETDEV_CHANGE): enp12s0: link becomes ready
[   45.695936] pcieport 0000:03:00.0: System wakeup enabled by ACPI
[   45.698592] pcieport 0000:03:00.0: System wakeup enabled by ACPI
[   50.744434] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[   50.753617] NFSD: starting 90-second grace period (net ffffffff81ef5e80)
[  397.457988] EXT4-fs (md0): unable to read superblock
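The EXT4 "unable to read superblock" error is most likely a symptom rather than the cause. The `mdadm --detail` output further down shows md0 is inactive, so the kernel has no assembled array to read a filesystem from. Below is a minimal sketch that checks the array state in /proc/mdstat before attempting the mount. `md_state` is a hypothetical helper name, and the parsing assumes the usual `md0 : <state> ...` line format of /proc/mdstat.

```shell
# Print the state field ("active", "inactive", ...) of one md array, reading
# /proc/mdstat-formatted text on stdin; print "absent" if the array isn't listed.
md_state() {
  awk -v n="$1" '
    $1 == n && $2 == ":" { print $3; found = 1; exit }
    END { if (!found) print "absent" }'
}

# Only try the mount once the array is really running
# (/home/media is the mount point used in the question).
if [ "$(md_state md0 2>/dev/null < /proc/mdstat)" = "active" ]; then
  echo "md0 is active; you can try: sudo mount -t ext4 /dev/md0 /home/media"
else
  echo "md0 is not active; fix the array before mounting" >&2
fi
```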

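Next I checked the drives, and fdisk doesn't show any relevant errors. /dev/sde is my boot SSD. It was imaged with an image smaller than the drive itself, which causes the GPT PMBR size mismatch warning, but that isn't relevant here.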

sudo fdisk -l
Disk /dev/sda: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sdb: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sdc: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sdd: 3.7 TiB, 4000787030016 bytes, 7814037168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


GPT PMBR size mismatch (125045423 != 250069679) will be corrected by w(rite).
Disk /dev/sde: 119.2 GiB, 128035676160 bytes, 250069680 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 812B4B47-96F5-4815-9A7A-39420846C178

Device         Start       End  Sectors  Size Type
/dev/sde1       2048   1050623  1048576  512M EFI System
/dev/sde2    1050624  59643903 58593280   28G Linux filesystem
/dev/sde3  116881408 125044735  8163328  3.9G Linux swap
/dev/sde4   59643904 116881407 57237504 27.3G Linux filesystem

Partition table entries are not in disk order.

So I checked mdadm, which tells me the RAID is "inactive":

server@server:~$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
     Raid Level : raid0
  Total Devices : 4
    Persistence : Superblock is persistent

          State : inactive

           Name : lavie-server:0  (local to host lavie-server)
           UUID : 6d7fc4d9:6ca640d1:14235985:d87224f7
         Events : 256957

    Number   Major   Minor   RaidDevice

       -       8        0        -        /dev/sda
       -       8       16        -        /dev/sdb
       -       8       32        -        /dev/sdc
       -       8       48        -        /dev/sdd

I then examined each drive. On three of the drives the Array State shows one member missing, but on the fourth drive all 4 members appear:

sudo mdadm --examine /dev/sda
/dev/sda:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 6d7fc4d9:6ca640d1:14235985:d87224f7
           Name : lavie-server:0  (local to host lavie-server)
  Creation Time : Wed May 10 12:13:27 2017
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
     Array Size : 11720661504 (11177.69 GiB 12001.96 GB)
  Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=688 sectors
          State : active
    Device UUID : d17b6e14:6cfa14ec:d39da457:eb30892e

Internal Bitmap : 8 sectors from superblock
    Update Time : Fri Dec 28 14:35:42 2018
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : f137feca - correct
         Events : 256957

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AA.A ('A' == active, '.' == missing, 'R' == replacing)

(next)

sudo mdadm --examine /dev/sdb
/dev/sdb:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 6d7fc4d9:6ca640d1:14235985:d87224f7
           Name : lavie-server:0  (local to host lavie-server)
  Creation Time : Wed May 10 12:13:27 2017
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
     Array Size : 11720661504 (11177.69 GiB 12001.96 GB)
  Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=688 sectors
          State : active
    Device UUID : 7111c8f9:b25a4240:7c06be59:ef2a90b5

Internal Bitmap : 8 sectors from superblock
    Update Time : Fri Dec 28 14:35:42 2018
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : b3df575f - correct
         Events : 256957

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AA.A ('A' == active, '.' == missing, 'R' == replacing)

(next)

sudo mdadm --examine /dev/sdc
/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 6d7fc4d9:6ca640d1:14235985:d87224f7
           Name : lavie-server:0  (local to host lavie-server)
  Creation Time : Wed May 10 12:13:27 2017
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
     Array Size : 11720661504 (11177.69 GiB 12001.96 GB)
  Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=688 sectors
          State : clean
    Device UUID : 44b103a3:825be8ea:3d05c937:5f9dfa12

Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Dec 24 13:49:34 2018
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 432ce5b5 - correct
         Events : 47370

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

(next)

sudo mdadm --examine /dev/sdd
/dev/sdd:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 6d7fc4d9:6ca640d1:14235985:d87224f7
           Name : lavie-server:0  (local to host lavie-server)
  Creation Time : Wed May 10 12:13:27 2017
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
     Array Size : 11720661504 (11177.69 GiB 12001.96 GB)
  Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=688 sectors
          State : active
    Device UUID : 64a2131e:910e477d:1e3f89c3:fe1fc2e7

Internal Bitmap : 8 sectors from superblock
    Update Time : Fri Dec 28 14:35:42 2018
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : b12cc9e4 - correct
         Events : 256957

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AA.A ('A' == active, '.' == missing, 'R' == replacing)

So I suspect something is wrong with /dev/sdc, because its State shows "clean" rather than "active".

Output from lshw:

sudo lshw -class disk -class storage
  *-usb:1                 
       description: Mass storage device
       product: AS2115
       vendor: ASMedia
       physical id: 4
       bus info: usb@1:4
       logical name: scsi8
       version: 0.01
       serial: 00000000000000000000
       capabilities: usb-2.10 scsi emulated scsi-host
       configuration: driver=usb-storage speed=480Mbit/s
     *-disk
          description: SCSI Disk
          product: 2115
          vendor: ASMT
          physical id: 0.0.0
          bus info: scsi@8:0.0.0
          logical name: /dev/sde
          version: 0
          serial: 00000000000000000000
          size: 119GiB (128GB)
          capabilities: gpt-1.00 partitioned partitioned:gpt
          configuration: ansiversion=6 guid=812b4b47-96f5-4815-9a7a-39420846c178 logicalsectorsize=512 sectorsize=512
  *-storage
       description: SATA controller
       product: Marvell Technology Group Ltd.
       vendor: Marvell Technology Group Ltd.
       physical id: 0
       bus info: pci@0000:01:00.0
       version: 11
       width: 32 bits
       clock: 33MHz
       capabilities: storage pm msi pciexpress ahci_1.0 bus_master cap_list rom
       configuration: driver=ahci latency=0
       resources: irq:272 ioport:d050(size=8) ioport:d040(size=4) ioport:d030(size=8) ioport:d020(size=4) ioport:d000(size=32) memory:90b10000-90b107ff memory:90b00000-90b0ffff
  *-storage
       description: SATA controller
       product: Marvell Technology Group Ltd.
       vendor: Marvell Technology Group Ltd.
       physical id: 0
       bus info: pci@0000:02:00.0
       version: 11
       width: 32 bits
       clock: 33MHz
       capabilities: storage pm msi pciexpress ahci_1.0 bus_master cap_list rom
       configuration: driver=ahci latency=0
       resources: irq:278 ioport:c050(size=8) ioport:c040(size=4) ioport:c030(size=8) ioport:c020(size=4) ioport:c000(size=32) memory:90a10000-90a107ff memory:90a00000-90a0ffff
  *-scsi:0
       physical id: 1
       logical name: scsi0
       capabilities: emulated
     *-disk
          description: ATA Disk
          product: WDC WD40EFRX-68W
          vendor: Western Digital
          physical id: 0.0.0
          bus info: scsi@0:0.0.0
          logical name: /dev/sda
          version: 0A82
          serial: WD-WCC4E1SSPZV8
          size: 3726GiB (4TB)
          capabilities: removable
          configuration: ansiversion=5 logicalsectorsize=512 sectorsize=4096
        *-medium
             physical id: 0
             logical name: /dev/sda
             size: 3726GiB (4TB)
  *-scsi:1
       physical id: 2
       logical name: scsi3
       capabilities: emulated
     *-disk
          description: ATA Disk
          product: WDC WD40EFRX-68W
          vendor: Western Digital
          physical id: 0.0.0
          bus info: scsi@3:0.0.0
          logical name: /dev/sdb
          version: 0A82
          serial: WD-WCC4E3HS69CC
          size: 3726GiB (4TB)
          capabilities: removable
          configuration: ansiversion=5 logicalsectorsize=512 sectorsize=4096
        *-medium
             physical id: 0
             logical name: /dev/sdb
             size: 3726GiB (4TB)
  *-scsi:2
       physical id: 3
       logical name: scsi4
       capabilities: emulated
     *-disk
          description: ATA Disk
          product: WDC WD40EFRX-68W
          vendor: Western Digital
          physical id: 0.0.0
          bus info: scsi@4:0.0.0
          logical name: /dev/sdc
          version: 0A82
          serial: WD-WCC4E3VNJ5R2
          size: 3726GiB (4TB)
          capabilities: removable
          configuration: ansiversion=5 logicalsectorsize=512 sectorsize=4096
        *-medium
             physical id: 0
             logical name: /dev/sdc
             size: 3726GiB (4TB)
  *-scsi:3
       physical id: 4
       logical name: scsi7
       capabilities: emulated
     *-disk
          description: ATA Disk
          product: WDC WD40EFRX-68W
          vendor: Western Digital
          physical id: 0.0.0
          bus info: scsi@7:0.0.0
          logical name: /dev/sdd
          version: 0A82
          serial: WD-WCC4E1TJ37PK
          size: 3726GiB (4TB)
          capabilities: removable
          configuration: ansiversion=5 logicalsectorsize=512 sectorsize=4096
        *-medium
             physical id: 0
             logical name: /dev/sdd
             size: 3726GiB (4TB)

I've put the drive status at https://pastebin.com/bstnDcHe because it no longer fits in this post (over 30,000 characters).

I also tried the following, which I found in another post:

server@server:~$ sudo mdadm --stop /dev/md0
mdadm: stopped /dev/md0
server@server:~$ sudo mdadm --assemble --scan --verbose
mdadm: looking for devices for /dev/md0
mdadm: no recogniseable superblock on /dev/sde4
mdadm: no recogniseable superblock on /dev/sde3
mdadm: no recogniseable superblock on /dev/sde2
mdadm: Cannot assemble mbr metadata on /dev/sde1
mdadm: Cannot assemble mbr metadata on /dev/sde
mdadm: /dev/sdd is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdc is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdb is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sda is identified as a member of /dev/md0, slot 0.
mdadm: added /dev/sdb to /dev/md0 as 1
mdadm: added /dev/sdc to /dev/md0 as 2 (possibly out of date)
mdadm: added /dev/sdd to /dev/md0 as 3
mdadm: added /dev/sda to /dev/md0 as 0
mdadm: /dev/md0 assembled from 3 drives - not enough to start the array while not clean - consider --force.

It didn't work, and it gave me no indication that a drive has failed.

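So I'm not sure whether a drive is failing and should be replaced, or whether I can go ahead with "--force". That option sounds like it could damage something if the drive does turn out to be bad.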
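One way to judge whether a drive is failing, independently of mdadm, is to look at its SMART data. The sketch below assumes smartmontools is installed. `smartctl -H` gives the drive's overall verdict. The hypothetical helper `smart_warnings` scans `smartctl -A` output for non-zero raw values of attributes 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable), which are common early signs of a failing disk. Zeros on all three do not prove the disk is healthy.

```shell
# Flag attributes 5, 197 and 198 in `smartctl -A` output read on stdin.
# Column 10 of each attribute row is RAW_VALUE; non-zero values are marked WARN.
smart_warnings() {
  awk '$1 == 5 || $1 == 197 || $1 == 198 {
         status = ($10 + 0 > 0) ? "WARN" : "ok"
         printf "%s %s raw=%s\n", status, $2, $10
       }'
}

# Usage on the suspect member (needs root and smartmontools):
#   sudo smartctl -H /dev/sdc
#   sudo smartctl -A /dev/sdc | smart_warnings
```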

Answer 1

/dev/sdc does not appear to have failed (it is in a clean state), but it is badly out of sync:

Events : 47370

compared with the other 3 disks, which are all in sync with each other:

Events : 256957
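The Events counter, together with Update Time, shows the problem. /dev/sdc stopped receiving updates on Dec 24, while the other three members kept going until Dec 28. Below is a minimal sketch that puts the counters side by side and flags any member behind the highest count. `summarize_events` is a hypothetical helper name, and it assumes the `/dev/sdX:` header and `Events :` / `State :` lines shown in the `mdadm --examine` output above.

```shell
# Summarise concatenated `mdadm --examine` output (stdin): one line per member
# with its Events counter and superblock State. Members whose Events counter is
# below the highest one seen (i.e. they missed updates) are marked OUT-OF-DATE.
summarize_events() {
  awk '
    BEGIN { max = 0; n = 0 }
    /^\/dev\/[^ ]*:$/ { dev = substr($1, 1, length($1) - 1); order[++n] = dev }
    /^ *Events :/     { ev[dev] = $3 + 0; if (ev[dev] > max) max = ev[dev] }
    /^ *State :/      { st[dev] = $3 }
    END {
      for (i = 1; i <= n; i++) {
        d = order[i]
        printf "%s events=%d state=%s %s\n", d, ev[d], st[d], (ev[d] < max ? "OUT-OF-DATE" : "ok")
      }
    }'
}

# Usage (as root):
#   mdadm --examine /dev/sda /dev/sdb /dev/sdc /dev/sdd | summarize_events
```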

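--force should help. To be safe and proceed in a more controlled way, though, I would fail /dev/sdc, restart the array with only the 3 healthy disks, and then add /dev/sdc back (it will resync).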

The commands:

sudo mdadm --manage /dev/md0 --fail /dev/sdc
sudo mdadm --manage /dev/md0 --remove /dev/sdc
sudo mdadm --assemble --scan --verbose
sudo mdadm --manage /dev/md0 --add /dev/sdc
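The question's own `--assemble --scan` attempt stopped with "not enough to start the array while not clean - consider --force". A degraded restart from the three in-sync members may therefore still need `--force`. Below is a hedged sketch of the answer's plan under that assumption:

1. Stop md0.
2. Assemble it explicitly from /dev/sda, /dev/sdb and /dev/sdd with `--force`.
3. Add /dev/sdc back.
4. Watch the rebuild in /proc/mdstat.

The `DRY_RUN` switch and the function names are hypothetical conveniences for this sketch. Check the member list against your own `mdadm --examine` output before running it for real.

```shell
# Print commands instead of running them when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# force_assemble_degraded MD STALE GOOD...: start MD from the in-sync members
# only, then add the stale member back so md rebuilds it.
force_assemble_degraded() {
  local md=$1 stale=$2
  shift 2
  run mdadm --stop "$md"
  # --force is what the earlier --assemble --scan output suggested
  run mdadm --assemble --force "$md" "$@"
  run mdadm --manage "$md" --add "$stale"
}

# Extract rebuild progress (e.g. "recovery=0.4%") from /proc/mdstat text on stdin.
rebuild_progress() {
  awk 'match($0, /(recovery|resync) *= *[0-9.]+%/) {
         s = substr($0, RSTART, RLENGTH)
         gsub(/ /, "", s)
         print s
       }'
}

# Usage (as root), with the device roles from the question:
#   force_assemble_degraded /dev/md0 /dev/sdc /dev/sda /dev/sdb /dev/sdd
#   rebuild_progress < /proc/mdstat
```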
