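We recently purchased a new Dell R340XL server and installed CentOS 8 on it. We have an SSD on the BOSS controller for booting (as sda) and 4 HDDs on the PERC H330 configured as RAID 5 (as sdb). It ran fine for a few days, but two days ago the RAID volume stopped showing up. We called Dell, and they helped us upgrade the firmware on several devices, but we still can't see the RAID volume. Any help would be greatly appreciated.
When I run lspci, the device shows up: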
02:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3008 [Fury] (rev 02)
Subsystem: Dell PERC H330 Adapter
...
Kernel driver in use: megaraid_sas
Kernel modules: megaraid_sas
I downloaded MegaCli and can get all sorts of information from it. Running MegaCli64 -LdGetNum -aAll tells me the following:
Number of Virtual Drives Configured on Adapter 0: 1
If we reboot the server over and over, once in a while the RAID array does show up as a block device, but it appears as sda, which makes the boot drive sdb.
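Because the sda/sdb names can swap between boots, it may help to identify the disks by their persistent names instead. The following is a minimal sketch, assuming a udev-based system that populates /dev/disk/by-id (CentOS 8 should); the function name map_stable_names is only illustrative.

```python
import os


def map_stable_names(by_id_dir="/dev/disk/by-id"):
    """Return {stable_id: kernel_device_path} for every symlink in by_id_dir.

    An empty dict is returned if the directory does not exist.
    """
    mapping = {}
    if not os.path.isdir(by_id_dir):
        return mapping
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        # Each entry is a symlink that points at the current kernel name,
        # for example ../../sda, which may differ from boot to boot.
        if os.path.islink(link):
            mapping[name] = os.path.realpath(link)
    return mapping


if __name__ == "__main__":
    for stable_id, device in map_stable_names().items():
        print(f"{stable_id} -> {device}")
```

Comparing the output across several reboots would show which physical device is behind sda each time, and whether the RAID volume is present at all.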
I looked through dmesg and the system logs. I'm not sure what to look for, but I did see the following:
[ 1.296976] megasas: 07.707.51.00-rc1
[ 1.301092] megaraid_sas 0000:02:00.0: FW now in Ready state
[ 1.301095] megaraid_sas 0000:02:00.0: 63 bit DMA mask and 32 bit consistent mask
[ 1.301363] megaraid_sas 0000:02:00.0: firmware supports msix : (96)
[ 1.301364] megaraid_sas 0000:02:00.0: current msix/online cpus : (12/12)
[ 1.301365] megaraid_sas 0000:02:00.0: RDPQ mode : (disabled)
[ 1.301366] megaraid_sas 0000:02:00.0: Current firmware supports maximum commands: 928 LDIO threshold: 237
[ 1.301477] megaraid_sas 0000:02:00.0: Configured max firmware commands: 927
[ 1.303185] megaraid_sas 0000:02:00.0: FW supports sync cache : No
...
[ 1.680004] megaraid_sas 0000:02:00.0: FW provided supportMaxExtLDs: 0 max_lds: 32
[ 1.680005] megaraid_sas 0000:02:00.0: controller type : iMR(0MB)
[ 1.680005] megaraid_sas 0000:02:00.0: Online Controller Reset(OCR) : Enabled
[ 1.680006] megaraid_sas 0000:02:00.0: Secure JBOD support : No
[ 1.680006] megaraid_sas 0000:02:00.0: NVMe passthru support : No
[ 1.680007] megaraid_sas 0000:02:00.0: FW provided TM TaskAbort/Reset timeout : 0 secs/0 secs
[ 1.702120] megaraid_sas 0000:02:00.0: INIT adapter done
[ 1.702121] megaraid_sas 0000:02:00.0: Jbod map is not supported megasas_setup_jbod_map 5371
[ 1.728949] megaraid_sas 0000:02:00.0: pci id : (0x1000)/(0x005f)/(0x1028)/(0x1f44)
[ 1.728950] megaraid_sas 0000:02:00.0: unevenspan support : yes
[ 1.728950] megaraid_sas 0000:02:00.0: firmware crash dump : no
[ 1.728951] megaraid_sas 0000:02:00.0: jbod sync map : no
[ 1.729017] scsi host0: Avago SAS based MegaRAID driver
[ 1.730804] scsi 11:0:0:0: Processor Marvell Console 1.01 PQ: 0 ANSI: 5
[ 1.732057] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
[ 1.732076] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
[ 1.732094] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
[ 1.732112] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
[ 1.732131] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
[ 1.732149] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
[ 1.732167] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
[ 1.732185] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
[ 1.732206] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
[ 1.732224] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
[ 1.732242] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
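The allocation failures look worrying, but I didn't find much when I searched for them. One person hit this while trying to allocate more than 200 logical devices, but we are definitely not doing that here.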
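To compare boots where the volume appears against boots where it does not, it can be useful to count these failures and note when they occur. This is a rough sketch that assumes the default dmesg line format shown above ("[ seconds] message"); summarize_dmesg and its parameters are names made up for illustration.

```python
import re

# Matches a dmesg line of the form "[    1.732057] message text".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def summarize_dmesg(text, needle="scsi_alloc_sdev"):
    """Count lines whose message contains `needle`; report first/last timestamps."""
    stamps = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and needle in match.group(2):
            stamps.append(float(match.group(1)))
    if not stamps:
        return {"count": 0, "first": None, "last": None}
    return {"count": len(stamps), "first": min(stamps), "last": max(stamps)}


# Three lines copied from the log above, used as sample input.
sample = """\
[ 1.729017] scsi host0: Avago SAS based MegaRAID driver
[ 1.732057] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
[ 1.732076] scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured
"""
print(summarize_dmesg(sample))
```

Feeding it the saved dmesg output from a good boot and a bad boot would show whether the failures only occur when the RAID volume is missing.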
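Thanks in advance.
Answer 1
Double-check whether it is an H330 or an HBA330; if it really is an H330, check whether someone accidentally put it into HBA/pass-through mode. Hopefully it is something that simple, but I suspect there is more to it.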