CentOS 7 DL120 G9 with H240 - monitoring RAID issues

I have just set up a new server with a Smart HBA H240 card and installed hpssaducli, which detects the controller and lets me generate a report.

My problem is how to detect a RAID failure and send an alert.

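The report generated by hpssaducli contains a large amount of information that is hard to sift through, and since no array is currently reported as failed, I am not sure what to look for when a drive does fail.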

Details

root@server [~]# lsmod | grep hp
hpwdt                  14242  0
hpilo                  17381  0
shpchp                 37032  0
hpsa                   94958  3

root@server [~]# rpm -qa | grep hpsa
kmod-hpsa-3.4.12-110.rhel7u1.x86_64

root@server [~]# uname -a
Linux server.hostname 3.10.0-229.14.1.el7.x86_64 #1 SMP Tue Sep 15 15:05:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

root@server [~]# hpssaducli
HP Smart Storage Diagnostics 2.10.14.0
Usage: hpssaducli [ -adu | -ssd | -val ] [ command-specific options ]
...
...

Diagnosable devices:
Smart HBA H240 in Slot 2

Output of hpssacli

root@server [~]# hpssacli ctrl all show config detail

Smart HBA H240 in Slot 2 (RAID Mode)
   Bus Interface: PCI
   Slot: 2
   Serial Number: XXXXXXXXX
   Cache Serial Number: XXXXXXXXX
   Controller Status: OK
   Hardware Revision: B
   Firmware Version: 1.34
   Rebuild Priority: High
   Surface Scan Delay: 3 secs
   Surface Scan Mode: Idle
   Parallel Surface Scan Supported: No
   Queue Depth: Automatic
   Monitor and Performance Delay: 60  min
   Elevator Sort: Enabled
   Degraded Performance Optimization: Disabled
   Inconsistency Repair Policy: Disabled
   Wait for Cache Room: Disabled
   Surface Analysis Inconsistency Notification: Disabled
   Post Prompt Timeout: 15 secs
   Cache Board Present: False
   Drive Write Cache: Disabled
   Controller Memory Size: 256 MB
   SATA NCQ Supported: True
   Spare Activation Mode: Activate on physical drive failure (default)
   Controller Temperature (C): 72
   Cache Module Temperature (C): 36
   Number of Ports: 2 Internal only
   Encryption: Disabled
   Express Local Encryption: False
   Driver Name: hpsa
   Driver Version: 3.4.12
   Driver Supports HP SSD Smart Path: True
   PCI Address (Domain:Bus:Device.Function): 0000:0A:00.0
   Negotiated PCIe Data Rate: PCIe 3.0 x8 (7880 MB/s)
   Controller Mode: RAID Mode
   Controller Mode Reboot: Not Required
   Latency Scheduler Setting: Disabled
   Current Power Mode: MaxPerformance
   Host Serial Number: CZ250305FS
   Sanitize Erase Supported: False
   Primary Boot Volume: None
   Secondary Boot Volume: None


   Port Name: 2I
         Port ID: 0
         Port Connection Number: 0
         SAS Address: 500143803366B9C0
         Port Location: Internal
         Managed Cable Connected: False

   Port Name: 1I
         Port ID: 1
         Port Connection Number: 1
         SAS Address: 500143803366B9C4
         Port Location: Internal
         Managed Cable Connected: False

   Internal Drive Cage at Port 1I, Box 1, OK
      Power Supply Status: Not Redundant
      Drive Bays: 4
      Port: 1I
      Box: 1
      Location: Internal

   Physical Drives
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, Solid State SATA, 500 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, Solid State SATA, 500 GB, OK)
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, Solid State SATA, 500 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, Solid State SATA, 500 GB, OK)
      None attached


   Internal Drive Cage at Port 2I, Box 0, OK
      Power Supply Status: Not Redundant
      Drive Bays: 4
      Port: 2I
      Box: 0
      Location: Internal

   Physical Drives
      None attached
      None attached

   Array: A
      Interface Type: Solid State SATA
      Unused Space: 0  MB (0.0%)
      Used Space: 1.8 TB (100.0%)
      Status: OK
      Array Type: Data
      HP SSD Smart Path: enable



      Logical Drive: 1
         Size: 931.5 GB
         Fault Tolerance: 1+0
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 512 KB
         Status: Ready for Rebuild
         Caching:  Disabled
         Unique Identifier: XXXXXXXXX
         Disk Name: /dev/sda
         Mount Points: /boot/efi 200 MB Partition Number 2, /boot 500 MB Partition Number 3
         OS Status: LOCKED
         Logical Drive Label: 026ACA51PDNNK0ARH7Q0B9471B
         Mirror Group 1:
      Smart HBA H240 in Slot 2
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, Solid State SATA, 500 GB, OK)
      Smart HBA H240 in Slot 2
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, Solid State SATA, 500 GB, OK)
         Mirror Group 2:
      Smart HBA H240 in Slot 2
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, Solid State SATA, 500 GB, OK)
      Smart HBA H240 in Slot 2
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, Solid State SATA, 500 GB, OK)
         Drive Type: Data
         LD Acceleration Method: HP SSD Smart Path

      physicaldrive 1I:1:1
         Port: 1I
         Box: 1
         Bay: 1
         Status: OK
         Drive Type: Data Drive
         Interface Type: Solid State SATA
         Size: 500 GB
         Drive exposed to OS: False
         Native Block Size: 512
         Firmware Revision: EMT01B6Q
         Serial Number: XXXXXXXXX
         Model: ATA     Samsung SSD 850
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 27
         Maximum Temperature (C): 70
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
         Sanitize Erase Supported: False

      physicaldrive 1I:1:2
         Port: 1I
         Box: 1
         Bay: 2
         Status: OK
         Drive Type: Data Drive
         Interface Type: Solid State SATA
         Size: 500 GB
         Drive exposed to OS: False
         Native Block Size: 512
         Firmware Revision: EMT01B6Q
         Serial Number: XXXXXXXXX
         Model: ATA     Samsung SSD 850
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 27
         Maximum Temperature (C): 70
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False

      physicaldrive 1I:1:3
         Port: 1I
         Box: 1
         Bay: 3
         Status: OK
         Drive Type: Data Drive
         Interface Type: Solid State SATA
         Size: 500 GB
         Drive exposed to OS: False
         Native Block Size: 512
         Firmware Revision: EMT01B6Q
         Serial Number: XXXXXXXXX
         Model: ATA     Samsung SSD 850
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 28
         Maximum Temperature (C): 70
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False

      physicaldrive 1I:1:4
         Port: 1I
         Box: 1
         Bay: 4
         Status: OK
         Drive Type: Data Drive
         Interface Type: Solid State SATA
         Size: 500 GB
         Drive exposed to OS: False
         Native Block Size: 512
         Firmware Revision: EMT01B6Q
         Serial Number: XXXXXXXXX
         Model: ATA     Samsung SSD 850
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 28
         Maximum Temperature (C): 70
         SSD Smart Trip Wearout: Not Supported
         PHY Count: 1
         PHY Transfer Rate: 6.0Gbps
         Drive Authentication Status: OK
         Carrier Application Version: 11
         Carrier Bootloader Version: 6
         Sanitize Erase Supported: False

Answer 1

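I'd rather not close this as a duplicate, but you should install the HP management agents to provide server health information. They are available via yum or from the support site for the ProLiant DL120 Gen9 and RHEL 7.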

See: HP ProLiant DL380e Gen8 server - SPP usage, some ideas...

At a minimum, you can use the hpssacli tool to get the actual RAID controller information whenever you need it.
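As a rough illustration of what to look for in that output, here is a minimal Python sketch that scans the text of `hpssacli ctrl all show config detail` for `Status` and `Controller Status` lines whose value is not `OK`. The function names, the choice of keys, and the rule that only `OK` counts as healthy are assumptions made for this example, not anything defined by HP's tools. In the output posted in the question, Logical Drive 1 reports `Status: Ready for Rebuild` while every other status line is `OK`, so a check like this would flag it.

```python
import re
import subprocess

# Keys treated as health indicators. Other "... Status" lines in the report
# (Power Supply Status, OS Status, Drive Authentication Status) are ignored.
HEALTH_KEYS = {"Status", "Controller Status"}
HEALTHY_VALUES = {"OK"}  # assumption: anything else deserves a look

# Lines that start a new section; used only to label each finding.
SECTION = re.compile(
    r"^(?:.+ in Slot \d+.*|Array: \S+|Logical Drive: \d+|physicaldrive \S+)$"
)


def find_problems(report):
    """Return (section, key, value) for every health line that is not OK."""
    problems = []
    section = ""
    for raw in report.splitlines():
        line = raw.strip()
        if SECTION.match(line):
            section = line
            continue
        key, sep, value = line.partition(": ")
        if sep and key in HEALTH_KEYS and value.strip() not in HEALTHY_VALUES:
            problems.append((section, key, value.strip()))
    return problems


def run_report():
    """Run the command shown in the question and return its text output."""
    cmd = ["hpssacli", "ctrl", "all", "show", "config", "detail"]
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout


def main():
    """Print one line per finding; return 1 if there are any, else 0."""
    problems = find_problems(run_report())
    for section, key, value in problems:
        print("{}: {} = {}".format(section, key, value))
    return 1 if problems else 0
```

To use it as a scheduled check, one option is to end the file with `raise SystemExit(main())` and run it from cron as root; cron mails any output to the address in `MAILTO`, so a clean run stays silent and a non-OK status produces a message.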

But understand that once you include the other utilities, the server can also send email, SNMP traps, and log health events.
