Why do all drives drop out of the RAID when 1 drive fails? LSI9260-8i/IBM M5014

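I have an interesting problem that I hope you can help me with. I have an IBM 5014 (equivalent to an LSI 9260-8i) running two RAID10 virtual drives. The first is 4 x WD RE4, 2TB each, for a total capacity of 4TB; let's call it VD1. The other is 4 x WD RE4-GP, 2TB each, for a total capacity of 4TB; let's call it VD0. In case it matters, the card runs in a Norco case with 3 fans (one over each set of 4 drives and one over the IBM card), a Gigabyte motherboard, and 16GB RAM. There is also an IBM5015 running 4 x 256GB SSDs in RAID10. I virtualize with ESXi5.5 and a range of VMs. The 5014 card runs in passthrough mode to a WHS2011 host, while the 5015 holds the VMs themselves.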

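VD0 runs fine without any problems. It is my main document store.

However, VD1, which holds all my videos, periodically drops one drive, which puts it into a degraded state, and then almost immediately (usually with the exact same timestamp, but sometimes with a 1-second delay) drops the remaining drives, taking it offline.

The controller itself has been running fine for nearly 6 months, so while this could be controller-related, I would expect a controller fault to cause problems on both virtual drives rather than just one.

My challenge is that the drives do not consistently drop in the same order (at least according to the logs), so I cannot tell which drive is causing the problem. I have attached an excerpt of the log below. As you can see, it removes the drives and then re-adds them.

Any suggestions on how to work out which drive is faulty would be very welcome. I can't believe they have all failed together, nor that the MSM log itself contains so little information.

Thanks all!

Doug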

    ID = 248
    SEQUENCE NUMBER = 382617
    TIME = 07-07-2015 08:14:46
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   8

    ID = 112
    SEQUENCE NUMBER = 382616
    TIME = 07-07-2015 08:14:46
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:1

    ID = 248
    SEQUENCE NUMBER = 382615
    TIME = 07-07-2015 08:14:45
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   13

    ID = 112
    SEQUENCE NUMBER = 382614
    TIME = 07-07-2015 08:14:45
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:3

    ID = 248
    SEQUENCE NUMBER = 382613
    TIME = 07-07-2015 08:14:44
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   9

    ID = 112
    SEQUENCE NUMBER = 382612
    TIME = 07-07-2015 08:14:44
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:0

    ID = 248
    SEQUENCE NUMBER = 382611
    TIME = 07-07-2015 08:14:44
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   14

    ID = 112
    SEQUENCE NUMBER = 382610
    TIME = 07-07-2015 08:14:44
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:2

    ID = 247
    SEQUENCE NUMBER = 382609
    TIME = 07-07-2015 07:53:09
    LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   14

    ID = 91
    SEQUENCE NUMBER = 382608
    TIME = 07-07-2015 07:53:09
    LOCALIZED MESSAGE = Controller ID:  0   PD inserted:       -:-:2

    ID = 247
    SEQUENCE NUMBER = 382607
    TIME = 07-07-2015 07:53:09
    LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   9

    ID = 91
    SEQUENCE NUMBER = 382606
    TIME = 07-07-2015 07:53:09
    LOCALIZED MESSAGE = Controller ID:  0   PD inserted:       -:-:0

    ID = 247
    SEQUENCE NUMBER = 382605
    TIME = 07-07-2015 07:53:09
    LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   8

    ID = 91
    SEQUENCE NUMBER = 382604
    TIME = 07-07-2015 07:53:09
    LOCALIZED MESSAGE = Controller ID:  0   PD inserted:       -:-:1

    ID = 247
    SEQUENCE NUMBER = 382603
    TIME = 07-07-2015 07:53:04
    LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   13

    ID = 91
    SEQUENCE NUMBER = 382602
    TIME = 07-07-2015 07:53:04
    LOCALIZED MESSAGE = Controller ID:  0   PD inserted:       -:-:3

    ID = 248
    SEQUENCE NUMBER = 382601
    TIME = 07-07-2015 07:52:44
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   9

    ID = 112
    SEQUENCE NUMBER = 382600
    TIME = 07-07-2015 07:52:44
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:0

    ID = 248
    SEQUENCE NUMBER = 382599
    TIME = 07-07-2015 07:52:42
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   13

    ID = 112
    SEQUENCE NUMBER = 382598
    TIME = 07-07-2015 07:52:42
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:3

    ID = 248
    SEQUENCE NUMBER = 382597
    TIME = 07-07-2015 07:52:41
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   8

    ID = 112
    SEQUENCE NUMBER = 382596
    TIME = 07-07-2015 07:52:41
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:1

    ID = 248
    SEQUENCE NUMBER = 382595
    TIME = 07-07-2015 07:52:40
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   14

    ID = 112
    SEQUENCE NUMBER = 382594
    TIME = 07-07-2015 07:52:40
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:2

    ID = 145
    SEQUENCE NUMBER = 382593
    TIME = 07-07-2015 07:10:59
    LOCALIZED MESSAGE = Controller ID:  0   Battery temperature is high

    ID = 149
    SEQUENCE NUMBER = 382592
    TIME = 07-07-2015 06:56:54
    LOCALIZED MESSAGE = Controller ID:  0   Battery temperature is normal

    ID = 247
    SEQUENCE NUMBER = 382591
    TIME = 07-07-2015 04:08:56
    LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   14

    ID = 91
    SEQUENCE NUMBER = 382590
    TIME = 07-07-2015 04:08:56
    LOCALIZED MESSAGE = Controller ID:  0   PD inserted:       -:-:2

    ID = 247
    SEQUENCE NUMBER = 382589
    TIME = 07-07-2015 04:08:56
    LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   9

    ID = 91
    SEQUENCE NUMBER = 382588
    TIME = 07-07-2015 04:08:56
    LOCALIZED MESSAGE = Controller ID:  0   PD inserted:       -:-:0

    ID = 247
    SEQUENCE NUMBER = 382587
    TIME = 07-07-2015 04:08:55
    LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   8

    ID = 91
    SEQUENCE NUMBER = 382586
    TIME = 07-07-2015 04:08:55
    LOCALIZED MESSAGE = Controller ID:  0   PD inserted:       -:-:1

    ID = 248
    SEQUENCE NUMBER = 382585
    TIME = 07-07-2015 04:08:49
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   8

    ID = 112
    SEQUENCE NUMBER = 382584
    TIME = 07-07-2015 04:08:49
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:1

    ID = 248
    SEQUENCE NUMBER = 382583
    TIME = 07-07-2015 04:08:47
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   9

    ID = 112
    SEQUENCE NUMBER = 382582
    TIME = 07-07-2015 04:08:47
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:0

    ID = 248
    SEQUENCE NUMBER = 382581
    TIME = 07-07-2015 04:08:47
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   14

    ID = 112
    SEQUENCE NUMBER = 382580
    TIME = 07-07-2015 04:08:47
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:2

    ID = 247
    SEQUENCE NUMBER = 382579
    TIME = 07-07-2015 03:24:32
    LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   14

    ID = 91
    SEQUENCE NUMBER = 382578
    TIME = 07-07-2015 03:24:32
    LOCALIZED MESSAGE = Controller ID:  0   PD inserted:       -:-:2

    ID = 247
    SEQUENCE NUMBER = 382577
    TIME = 07-07-2015 03:24:32
    LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   13

    ID = 91
    SEQUENCE NUMBER = 382576
    TIME = 07-07-2015 03:24:32
    LOCALIZED MESSAGE = Controller ID:  0   PD inserted:       -:-:3

    ID = 247
    SEQUENCE NUMBER = 382575
    TIME = 07-07-2015 03:24:32
    LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   8

    ID = 91
    SEQUENCE NUMBER = 382574
    TIME = 07-07-2015 03:24:32
    LOCALIZED MESSAGE = Controller ID:  0   PD inserted:       -:-:1

    ID = 247
    SEQUENCE NUMBER = 382573
    TIME = 07-07-2015 03:24:27
    LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   9

    ID = 91
    SEQUENCE NUMBER = 382572
    TIME = 07-07-2015 03:24:27
    LOCALIZED MESSAGE = Controller ID:  0   PD inserted:       -:-:0

    ID = 248
    SEQUENCE NUMBER = 382571
    TIME = 07-07-2015 03:23:36
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   9

    ID = 112
    SEQUENCE NUMBER = 382570
    TIME = 07-07-2015 03:23:36
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:0

    ID = 248
    SEQUENCE NUMBER = 382569
    TIME = 07-07-2015 03:23:36
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   14

    ID = 112
    SEQUENCE NUMBER = 382568
    TIME = 07-07-2015 03:23:36
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:2

    ID = 248
    SEQUENCE NUMBER = 382567
    TIME = 07-07-2015 03:23:36
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   8

    ID = 112
    SEQUENCE NUMBER = 382566
    TIME = 07-07-2015 03:23:36
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:1

    ID = 248
    SEQUENCE NUMBER = 382565
    TIME = 07-07-2015 03:23:36
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   13

    ID = 112
    SEQUENCE NUMBER = 382564
    TIME = 07-07-2015 03:23:36
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:3

    ID = 139
    SEQUENCE NUMBER = 382435
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   Deleted VD:       1

    ID = 114
    SEQUENCE NUMBER = 382434
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   -:-:0  Previous   =   Failed      Current   =   Unconfigured Bad

    ID = 114
    SEQUENCE NUMBER = 382433
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   -:-:2  Previous   =   Failed      Current   =   Unconfigured Bad

    ID = 114
    SEQUENCE NUMBER = 382432
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   -:-:1  Previous   =   Failed      Current   =   Unconfigured Bad

    ID = 114
    SEQUENCE NUMBER = 382431
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   -:-:3  Previous   =   Failed      Current   =   Unconfigured Bad

    ID = 114
    SEQUENCE NUMBER = 382430
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   -:-:0  Previous   =   Online      Current   =   Failed

    ID = 248
    SEQUENCE NUMBER = 382429
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   9

    ID = 112
    SEQUENCE NUMBER = 382428
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:0

    ID = 252
    SEQUENCE NUMBER = 382427
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0  VD is now OFFLINE   VD       1

    ID = 81
    SEQUENCE NUMBER = 382426
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   State change on VD:   1      Previous   =   Degraded  Current   =       Offline

    ID = 114
    SEQUENCE NUMBER = 382425
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   -:-:2  Previous   =   Online      Current   =   Failed

    ID = 248
    SEQUENCE NUMBER = 382424
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   14

    ID = 112
    SEQUENCE NUMBER = 382423
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:2

    ID = 114
    SEQUENCE NUMBER = 382422
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   -:-:1  Previous   =   Online      Current   =   Failed

    ID = 248
    SEQUENCE NUMBER = 382421
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   8

    ID = 112
    SEQUENCE NUMBER = 382420
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:1

    ID = 251
    SEQUENCE NUMBER = 382419
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0  VD is now DEGRADED   VD       1

    ID = 81
    SEQUENCE NUMBER = 382418
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   State change on VD:   1      Previous   =   Optimal  Current   =       Degraded

    ID = 114
    SEQUENCE NUMBER = 382417
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   -:-:3  Previous   =   Online      Current   =   Failed

    ID = 248
    SEQUENCE NUMBER = 382416
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   13

    ID = 112
    SEQUENCE NUMBER = 382415
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:3
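
Since the drives do not drop in a consistent order, one way to look for a pattern is to tally which device ID is logged first in each drop-out. The following is only a rough sketch in Python, not an MSM feature: the function names, the 10-second grouping window, and the day-month-year date reading are all assumptions made for illustration, and `SAMPLE` is a handful of entries copied from the excerpt above.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Matches one log entry in the layout shown in the excerpt (indentation is ignored).
ENTRY_RE = re.compile(
    r"ID = (?P<id>\d+)\s+"
    r"SEQUENCE NUMBER = (?P<seq>\d+)\s+"
    r"TIME = (?P<time>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\s+"
    r"LOCALIZED MESSAGE = (?P<msg>[^\n]*)"
)
REMOVED_RE = re.compile(r"Device removed\s+Device Type:\s+Disk\s+Device Id:\s+(\d+)")


def parse_removals(log_text):
    """Return (sequence, time, device_id) for every 'Device removed' entry, oldest first."""
    removals = []
    for entry in ENTRY_RE.finditer(log_text):
        hit = REMOVED_RE.search(entry.group("msg"))
        if hit:
            # Assumption: dates are day-month-year. Swap %d and %m if they are not.
            when = datetime.strptime(entry.group("time"), "%d-%m-%Y %H:%M:%S")
            removals.append((int(entry.group("seq")), when, int(hit.group(1))))
    return sorted(removals)  # sequence numbers increase with time


def group_incidents(removals, max_gap=timedelta(seconds=10)):
    """Group removals whose timestamps are within max_gap of the previous one."""
    incidents = []
    for item in removals:
        if incidents and item[1] - incidents[-1][-1][1] <= max_gap:
            incidents[-1].append(item)
        else:
            incidents.append([item])
    return incidents


def first_drop_counts(incidents):
    """Count how often each device ID is the first one removed in an incident."""
    return Counter(incident[0][2] for incident in incidents)


SAMPLE = """
    ID = 248
    SEQUENCE NUMBER = 382581
    TIME = 07-07-2015 04:08:47
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   14

    ID = 248
    SEQUENCE NUMBER = 382567
    TIME = 07-07-2015 03:23:36
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   8

    ID = 112
    SEQUENCE NUMBER = 382566
    TIME = 07-07-2015 03:23:36
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:1

    ID = 248
    SEQUENCE NUMBER = 382565
    TIME = 07-07-2015 03:23:36
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   13
"""

if __name__ == "__main__":
    for device_id, count in first_drop_counts(group_incidents(parse_removals(SAMPLE))).items():
        print(f"Device Id {device_id}: first to drop in {count} incident(s)")
```

To use it on the full log, replace `SAMPLE` with the exported MSM log text. The tally is only a hint: when several removals share the same timestamp, the logged order may not reflect which drive actually triggered the drop-out.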

Answer 1

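Sorry, I have not run into this exact situation, but we use LSI cards and have had problems resolved by firmware updates before. Please check that you have the latest firmware for your device.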
