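I just set up a new server with a Smart HBA H240 card and installed hpssaducli, which detects the controller and lets me generate reports.
The problem I have is how to detect a RAID failure and send an alert.
The reports generated by hpssaducli contain a lot of information that is hard to sift through, and since no array has failed so far, I'm not sure what to look for when a drive does fail.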
Details
root@server [~]# lsmod | grep hp
hpwdt 14242 0
hpilo 17381 0
shpchp 37032 0
hpsa 94958 3
root@server [~]# rpm -qa | grep hpsa
kmod-hpsa-3.4.12-110.rhel7u1.x86_64
root@server [~]# uname -a
Linux server.hostname 3.10.0-229.14.1.el7.x86_64 #1 SMP Tue Sep 15 15:05:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
root@server [~]# hpssaducli
HP Smart Storage Diagnostics 2.10.14.0
Usage: hpssaducli [ -adu | -ssd | -val ] [ command-specific options ]
...
...
Diagnosable devices:
Smart HBA H240 in Slot 2
Output from hpssacli:
root@server [~]# hpssacli ctrl all show config detail
Smart HBA H240 in Slot 2 (RAID Mode)
Bus Interface: PCI
Slot: 2
Serial Number: XXXXXXXXX
Cache Serial Number: XXXXXXXXX
Controller Status: OK
Hardware Revision: B
Firmware Version: 1.34
Rebuild Priority: High
Surface Scan Delay: 3 secs
Surface Scan Mode: Idle
Parallel Surface Scan Supported: No
Queue Depth: Automatic
Monitor and Performance Delay: 60 min
Elevator Sort: Enabled
Degraded Performance Optimization: Disabled
Inconsistency Repair Policy: Disabled
Wait for Cache Room: Disabled
Surface Analysis Inconsistency Notification: Disabled
Post Prompt Timeout: 15 secs
Cache Board Present: False
Drive Write Cache: Disabled
Controller Memory Size: 256 MB
SATA NCQ Supported: True
Spare Activation Mode: Activate on physical drive failure (default)
Controller Temperature (C): 72
Cache Module Temperature (C): 36
Number of Ports: 2 Internal only
Encryption: Disabled
Express Local Encryption: False
Driver Name: hpsa
Driver Version: 3.4.12
Driver Supports HP SSD Smart Path: True
PCI Address (Domain:Bus:Device.Function): 0000:0A:00.0
Negotiated PCIe Data Rate: PCIe 3.0 x8 (7880 MB/s)
Controller Mode: RAID Mode
Controller Mode Reboot: Not Required
Latency Scheduler Setting: Disabled
Current Power Mode: MaxPerformance
Host Serial Number: CZ250305FS
Sanitize Erase Supported: False
Primary Boot Volume: None
Secondary Boot Volume: None
Port Name: 2I
Port ID: 0
Port Connection Number: 0
SAS Address: 500143803366B9C0
Port Location: Internal
Managed Cable Connected: False
Port Name: 1I
Port ID: 1
Port Connection Number: 1
SAS Address: 500143803366B9C4
Port Location: Internal
Managed Cable Connected: False
Internal Drive Cage at Port 1I, Box 1, OK
Power Supply Status: Not Redundant
Drive Bays: 4
Port: 1I
Box: 1
Location: Internal
Physical Drives
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, Solid State SATA, 500 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, Solid State SATA, 500 GB, OK)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, Solid State SATA, 500 GB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, Solid State SATA, 500 GB, OK)
None attached
Internal Drive Cage at Port 2I, Box 0, OK
Power Supply Status: Not Redundant
Drive Bays: 4
Port: 2I
Box: 0
Location: Internal
Physical Drives
None attached
None attached
Array: A
Interface Type: Solid State SATA
Unused Space: 0 MB (0.0%)
Used Space: 1.8 TB (100.0%)
Status: OK
Array Type: Data
HP SSD Smart Path: enable
Logical Drive: 1
Size: 931.5 GB
Fault Tolerance: 1+0
Heads: 255
Sectors Per Track: 32
Cylinders: 65535
Strip Size: 256 KB
Full Stripe Size: 512 KB
Status: Ready for Rebuild
Caching: Disabled
Unique Identifier: XXXXXXXXX
Disk Name: /dev/sda
Mount Points: /boot/efi 200 MB Partition Number 2, /boot 500 MB Partition Number 3
OS Status: LOCKED
Logical Drive Label: 026ACA51PDNNK0ARH7Q0B9471B
Mirror Group 1:
Smart HBA H240 in Slot 2
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, Solid State SATA, 500 GB, OK)
Smart HBA H240 in Slot 2
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, Solid State SATA, 500 GB, OK)
Mirror Group 2:
Smart HBA H240 in Slot 2
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, Solid State SATA, 500 GB, OK)
Smart HBA H240 in Slot 2
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, Solid State SATA, 500 GB, OK)
Drive Type: Data
LD Acceleration Method: HP SSD Smart Path
physicaldrive 1I:1:1
Port: 1I
Box: 1
Bay: 1
Status: OK
Drive Type: Data Drive
Interface Type: Solid State SATA
Size: 500 GB
Drive exposed to OS: False
Native Block Size: 512
Firmware Revision: EMT01B6Q
Serial Number: XXXXXXXXX
Model: ATA Samsung SSD 850
SATA NCQ Capable: True
SATA NCQ Enabled: True
Current Temperature (C): 27
Maximum Temperature (C): 70
SSD Smart Trip Wearout: Not Supported
PHY Count: 1
PHY Transfer Rate: 6.0Gbps
Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
Sanitize Erase Supported: False
physicaldrive 1I:1:2
Port: 1I
Box: 1
Bay: 2
Status: OK
Drive Type: Data Drive
Interface Type: Solid State SATA
Size: 500 GB
Drive exposed to OS: False
Native Block Size: 512
Firmware Revision: EMT01B6Q
Serial Number: XXXXXXXXX
Model: ATA Samsung SSD 850
SATA NCQ Capable: True
SATA NCQ Enabled: True
Current Temperature (C): 27
Maximum Temperature (C): 70
SSD Smart Trip Wearout: Not Supported
PHY Count: 1
PHY Transfer Rate: 6.0Gbps
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: False
physicaldrive 1I:1:3
Port: 1I
Box: 1
Bay: 3
Status: OK
Drive Type: Data Drive
Interface Type: Solid State SATA
Size: 500 GB
Drive exposed to OS: False
Native Block Size: 512
Firmware Revision: EMT01B6Q
Serial Number: XXXXXXXXX
Model: ATA Samsung SSD 850
SATA NCQ Capable: True
SATA NCQ Enabled: True
Current Temperature (C): 28
Maximum Temperature (C): 70
SSD Smart Trip Wearout: Not Supported
PHY Count: 1
PHY Transfer Rate: 6.0Gbps
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: False
physicaldrive 1I:1:4
Port: 1I
Box: 1
Bay: 4
Status: OK
Drive Type: Data Drive
Interface Type: Solid State SATA
Size: 500 GB
Drive exposed to OS: False
Native Block Size: 512
Firmware Revision: EMT01B6Q
Serial Number: XXXXXXXXX
Model: ATA Samsung SSD 850
SATA NCQ Capable: True
SATA NCQ Enabled: True
Current Temperature (C): 28
Maximum Temperature (C): 70
SSD Smart Trip Wearout: Not Supported
PHY Count: 1
PHY Transfer Rate: 6.0Gbps
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: False
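Judging by the output above, health state shows up in three places: "Controller Status:", the bare "Status:" lines under the array, the logical drive and each physical drive, and the last field of each "physicaldrive (...)" summary line. The sketch below assumes "OK" is the only healthy value and flags everything else, which on this server includes the "Ready for Rebuild" status of logical drive 1. raid_problems is a hypothetical helper, not part of hpssacli; fields such as "Power Supply Status" or "OS Status" are deliberately skipped because their normal values are not "OK".

```shell
#!/bin/sh
# Hypothetical helper: read hpssacli "show config" or "show config detail"
# output on stdin and print every health field whose value is not "OK".
# Assumption: "OK" is the only healthy value; anything else is reported.
# Exit status: 0 if nothing was flagged, 1 if at least one line was printed.
raid_problems() {
  awk '
    {
      # Work on a copy with leading/trailing whitespace (and CR) removed.
      line = $0
      sub(/^[ \t]+/, "", line)
      sub(/[ \t\r]+$/, "", line)
    }
    # Bare "Status:" lines (array, logical drive, physical drive) and
    # "Controller Status:". Other fields ending in Status (Power Supply,
    # OS, Drive Authentication) are intentionally not matched.
    line ~ /^(Controller )?Status: / {
      val = line
      sub(/^[^:]*: */, "", val)
      if (val != "OK") { print line; bad = 1 }
      next
    }
    # Summary lines such as
    #   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, Solid State SATA, 500 GB, OK)
    #   logicaldrive 1 (931.5 GB, RAID 1+0, OK)
    # where the last comma-separated field inside the parentheses is the state.
    line ~ /^(physical|logical)drive .*\)$/ {
      val = line
      sub(/\)$/, "", val)
      sub(/.*, */, "", val)
      if (val != "OK") { print line; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

Usage would be hpssacli ctrl all show config detail | raid_problems. Against the output above it should print only the "Status: Ready for Rebuild" line and return 1, which a cron job or monitoring check can act on.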
Answer 1
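I don't want to close this as a duplicate, but you should install the HP Management Agents to provide server health information. They are available via yum or from the support site for the ProLiant DL120 Gen9 and RHEL7.
See: HP ProLiant DL380e Gen8 server - SPP use, for some ideas...
At a minimum, you can use the hpssacli utility to get the actual RAID controller information as needed.
But understand that once you add the other utilities, the server is also able to send email alerts and SNMP traps and to log health events.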
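If you take the minimal hpssacli route rather than the full HP agents, alerting can be approximated with a cron job. This is a minimal sketch, assuming hpssacli is in PATH, a working mail command (mailx on RHEL 7) with local mail delivery is configured, and the non-detail "ctrl all show config" listing prints one "logicaldrive"/"physicaldrive" summary line per drive ending in its state, as in the detail output in the question. The names check_and_alert, degraded_lines, HPSSACLI and ALERT_TO are hypothetical, not part of any HP tool.

```shell
#!/bin/sh
# Hypothetical cron job: run hpssacli and mail the report if any
# logicaldrive/physicaldrive summary line is not in the OK state.
# Assumptions: hpssacli is in PATH, "mail" (mailx) works, and ALERT_TO
# is a real address. None of this is shipped by HP.

HPSSACLI=${HPSSACLI:-hpssacli}
ALERT_TO=${ALERT_TO:-root}

# Print drive summary lines whose trailing state is not "OK".
degraded_lines() {
  grep -E '^[[:space:]]*(physical|logical)drive .*\)[[:space:]]*$' |
    grep -Ev ', OK\)[[:space:]]*$'
}

# Returns 0 if healthy, 1 if a degraded drive was mailed, 2 if hpssacli failed.
check_and_alert() {
  host=$(uname -n)
  report=$("$HPSSACLI" ctrl all show config 2>&1) || {
    printf '%s\n' "$HPSSACLI exited with an error on $host" |
      mail -s "RAID check failed on $host" "$ALERT_TO"
    return 2
  }
  # "|| true" so an empty match does not look like a failure under set -e.
  problems=$(printf '%s\n' "$report" | degraded_lines || true)
  if [ -n "$problems" ]; then
    # Put the offending lines first, then the full report for context.
    printf '%s\n\n%s\n' "$problems" "$report" |
      mail -s "RAID degraded on $host" "$ALERT_TO"
    return 1
  fi
  return 0
}
```

To schedule it, you could append a final check_and_alert line to the file and add an /etc/crontab entry such as */15 * * * * root ALERT_TO=admin@example.com /usr/local/sbin/raid-check.sh (the path and address are placeholders). It only catches drive states that hpssacli reports as something other than OK, so it complements rather than replaces the email and SNMP alerting of the HP agents.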