Uncorrectable medium errors on 3 drives of an LSI-9260-16i RAID5 system

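We have a storage machine (file server) built on an LSI MegaRAID 9260-16i RAID controller, with 15 drives configured as RAID5 plus one offline drive held as a replacement. Each drive is a 4 TB SATA3 disk. Recently we found that 3 online drives in the RAID5 logged the same Patrol Read error event: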

Code: 0x0000005f
Class: 3
Locale: 0x02
Event Description: Patrol Read found an uncorrectable medium error on PD 1b(e0xf5/s15) at e0377d38
Event Data:
Device ID: 27
Enclosure Index: 245
Slot Number: 15
LBA: 3761732920

Code: 0x0000005f
Class: 3
Locale: 0x02
Event Description: Patrol Read found an uncorrectable medium error on PD 11(e0xf5/s5) at e0377d38
Event Data:
Device ID: 17
Enclosure Index: 245
Slot Number: 5
LBA: 3761732920

Code: 0x0000005f
Class: 3
Locale: 0x02
Event Description: Patrol Read found an uncorrectable medium error on PD 15(e0xf5/s12) at e0377d38
Event Data:
Device ID: 21
Enclosure Index: 245
Slot Number: 12
LBA: 3761732920

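Strangely, the uncorrectable medium error occurs at the same address, e0377d38, on all three drives: (e0xf5/s15), (e0xf5/s5), and (e0xf5/s12). This causes the RAID card to start background initialization automatically, over and over again. Judging from the event log, the RAID card appears to try to repair the error but always fails, so I have aborted the background initialization for now.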
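As a quick cross-check of the log entries above, the following Python sketch parses the three event descriptions and confirms that the hexadecimal fields (PD ID, enclosure, address) match the decimal values in the Event Data, and that all three events point at the same LBA. The names `parse_event` and `group_by_lba` are illustrative helpers, not part of any LSI tool. The byte offset and strip index assume 512-byte sectors and that array data starts at LBA 0 of each drive, which may not hold on this controller.

```python
import re
from collections import defaultdict

# Event descriptions copied verbatim from the event log above.
EVENTS = [
    "Patrol Read found an uncorrectable medium error on PD 1b(e0xf5/s15) at e0377d38",
    "Patrol Read found an uncorrectable medium error on PD 11(e0xf5/s5) at e0377d38",
    "Patrol Read found an uncorrectable medium error on PD 15(e0xf5/s12) at e0377d38",
]

# PD ID, enclosure and address are hexadecimal; the slot number is decimal.
PATTERN = re.compile(
    r"on PD (?P<pd>[0-9a-f]+)\(e0x(?P<encl>[0-9a-f]+)/s(?P<slot>\d+)\)"
    r" at (?P<lba>[0-9a-f]+)"
)

SECTOR_BYTES = 512                          # assumption: 512-byte logical sectors
STRIP_SECTORS = 64 * 1024 // SECTOR_BYTES   # 64 KB strip size, as reported by -LDInfo


def parse_event(description):
    """Extract device ID, enclosure, slot and LBA from one event description."""
    match = PATTERN.search(description)
    if match is None:
        raise ValueError(f"unrecognized event description: {description!r}")
    return {
        "device_id": int(match["pd"], 16),
        "enclosure": int(match["encl"], 16),
        "slot": int(match["slot"]),
        "lba": int(match["lba"], 16),
    }


def group_by_lba(descriptions):
    """Map each reported LBA to the list of slots that reported it."""
    groups = defaultdict(list)
    for description in descriptions:
        event = parse_event(description)
        groups[event["lba"]].append(event["slot"])
    return dict(groups)


if __name__ == "__main__":
    for lba, slots in group_by_lba(EVENTS).items():
        # Hypothetical mapping: assumes data begins at LBA 0 on each drive.
        strip_index, offset = divmod(lba, STRIP_SECTORS)
        print(f"LBA {lba} (0x{lba:x}) reported by slots {slots}")
        print(f"  byte offset on drive: {lba * SECTOR_BYTES}")
        print(f"  strip index {strip_index}, sector {offset} within the strip")
```

Grouping by LBA makes it easy to see whether new events from other drives land on the same address or on different ones.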

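We are not sure why this happened. We recently replaced two of the three drives above, (e0xf5/s5) and (e0xf5/s12), because the original drives failed about three weeks ago. The problem seems to have appeared only after the drives were replaced.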

Can anyone suggest how to fix this? What are the consequences if it cannot be repaired?

Finally, I attach the -AdpAllInfo and -LDInfo output, and the -PDList entries for these three drives, for further reference. Thank you very much for your help.

泰和长荣

-AdpAllInfo

Adapter #0

==============================================================================
                    Versions
                ================
Product Name    : LSI MegaRAID SAS 9260-16i
Serial No       : SV21116679
FW Package Build: 12.9.0-0038

                    Mfg. Data
                ================
Mfg. Date       : 03/17/12
Rework Date     : 00/00/00
Revision No     : 20C
Battery FRU     : N/A

                Image Versions in Flash:
                ================
BIOS Version       : 3.18.00_4.09.05.00_0x0416A000
FW Version         : 2.90.03-0933
Preboot CLI Version: 04.04-010:#%00008
WebBIOS Version    : 6.0-18-e_13-Rel
NVDATA Version     : 2.06.03-0010
Boot Block Version : 2.02.00.00-0000
BOOT Version       : 01.250.04.219

                Pending Images in Flash
                ================
None

                PCI Info
                ================
Controller Id   : 0000
Vendor Id       : 1000
Device Id       : 0079
SubVendorId     : 1000
SubDeviceId     : 9276

Host Interface  : PCIE

Number of Frontend Port: 0 
Device Interface  : PCIE

Number of Backend Port: 8 
Port  :  Address
0        500062b20037c5ff 
1        0000000000000000 
2        0000000000000000 
3        0000000000000000 
4        0000000000000000 
5        0000000000000000 
6        0000000000000000 
7        0000000000000000 

                HW Configuration
                ================
SAS Address      : 500062b20037c5c0
BBU              : Absent
Alarm            : Present
NVRAM            : Present
Serial Debugger  : Present
Memory           : Present
Flash            : Present
Memory Size      : 512MB
TPM              : Absent
On board Expander: Present
Upgrade Key      : Absent
Temperature sensor for ROC    : Absent
Temperature sensor for controller    : Absent

On board Expander FW version : 25.05.04.00

                Settings
                ================
Current Time                     : 17:15:57 3/3, 2021
Predictive Fail Poll Interval    : 300sec
Interrupt Throttle Active Count  : 16
Interrupt Throttle Completion    : 50us
Rebuild Rate                     : 30%
PR Rate                          : 30%
BGI Rate                         : 30%
Check Consistency Rate           : 30%
Reconstruction Rate              : 30%
Cache Flush Interval             : 4s
Max Drives to Spinup at One Time : 24
Delay Among Spinup Groups        : 2s
Physical Drive Coercion Mode     : Disabled
Cluster Mode                     : Disabled
Alarm                            : Disabled
Auto Rebuild                     : Enabled
Battery Warning                  : Disabled
Ecc Bucket Size                  : 15
Ecc Bucket Leak Rate             : 1440 Minutes
Restore HotSpare on Insertion    : Disabled
Expose Enclosure Devices         : Enabled
Maintain PD Fail History         : Enabled
Host Request Reordering          : Enabled
Auto Detect BackPlane Enabled    : SGPIO/i2c SEP
Load Balance Mode                : Auto
Use FDE Only                     : No
Security Key Assigned            : No
Security Key Failed              : No
Security Key Not Backedup        : No
Default LD PowerSave Policy      : Controller Defined
Maximum number of direct attached drives to spin up in 1 min : 0 
Auto Enhanced Import             : No
Any Offline VD Cache Preserved   : No
Allow Boot with Preserved Cache  : No
Disable Online Controller Reset  : No
PFK in NVRAM                     : No
Use disk activity for locate     : No
POST delay                       : 90 seconds

                Capabilities
                ================
RAID Level Supported             : RAID0, RAID1, RAID5, RAID6, RAID00, RAID10, RAID50, RAID60, PRL 11, PRL 11 with spanning, SRL 3 supported, PRL11-RLQ0 DDF layout with no span, PRL11-RLQ0 DDF layout with span
Supported Drives                 : SAS, SATA

Allowed Mixing:

Mix in Enclosure Allowed
Mix of SAS/SATA of HDD type in VD Allowed

                Status
                ================
ECC Bucket Count                 : 0

                Limitations
                ================
Max Arms Per VD          : 32 
Max Spans Per VD         : 8 
Max Arrays               : 128 
Max Number of VDs        : 64 
Max Parallel Commands    : 1008 
Max SGE Count            : 60 
Max Data Transfer Size   : 8192 sectors 
Max Strips PerIO         : 42 
Max LD per array         : 16 
Min Strip Size           : 8 KB
Max Strip Size           : 1.0 MB
Max Configurable CacheCade Size: 0 GB
Current Size of CacheCade      : 0 GB
Current Size of FW Cache       : 431 MB

                Device Present
                ================
Virtual Drives    : 1 
  Degraded        : 0 
  Offline         : 0 
Physical Devices  : 17 
  Disks           : 16 
  Critical Disks  : 0 
  Failed Disks    : 0 

                Supported Adapter Operations
                ================
Rebuild Rate                    : Yes
CC Rate                         : Yes
BGI Rate                        : Yes
Reconstruct Rate                : Yes
Patrol Read Rate                : Yes
Alarm Control                   : Yes
Cluster Support                 : No
BBU                             : Yes
Spanning                        : Yes
Dedicated Hot Spare             : Yes
Revertible Hot Spares           : Yes
Foreign Config Import           : Yes
Self Diagnostic                 : Yes
Allow Mixed Redundancy on Array : No
Global Hot Spares               : Yes
Deny SCSI Passthrough           : No
Deny SMP Passthrough            : No
Deny STP Passthrough            : No
Support Security                : No
Snapshot Enabled                : No
Support the OCE without adding drives : Yes
Support PFK                     : No
Support PI                      : No
Support Boot Time PFK Change    : No
Disable Online PFK Change       : No
Support Shield State            : No
Block SSD Write Disk Cache Change: No

                Supported VD Operations
                ================
Read Policy          : Yes
Write Policy         : Yes
IO Policy            : Yes
Access Policy        : Yes
Disk Cache Policy    : Yes
Reconstruction       : Yes
Deny Locate          : No
Deny CC              : No
Allow Ctrl Encryption: No
Enable LDBBM         : No
Support Breakmirror  : No
Power Savings        : No

                Supported PD Operations
                ================
Force Online                            : Yes
Force Offline                           : Yes
Force Rebuild                           : Yes
Deny Force Failed                       : No
Deny Force Good/Bad                     : No
Deny Missing Replace                    : No
Deny Clear                              : No
Deny Locate                             : No
Support Temperature                     : No
Disable Copyback                        : No
Enable JBOD                             : No
Enable Copyback on SMART                : No
Enable Copyback to SSD on SMART Error   : Yes
Enable SSD Patrol Read                  : No
PR Correct Unconfigured Areas           : Yes
Enable Spin Down of UnConfigured Drives : Yes
Disable Spin Down of hot spares         : Yes
Spin Down time                          : 30 
T10 Power State                         : No
                Error Counters
                ================
Memory Correctable Errors   : 0 
Memory Uncorrectable Errors : 0 

                Cluster Information
                ================
Cluster Permitted     : No
Cluster Active        : No

                Default Settings
                ================
Phy Polarity                     : 0 
Phy PolaritySplit                : 0 
Background Rate                  : 30 
Strip Size                       : 64kB
Flush Time                       : 4 seconds
Write Policy                     : WB
Read Policy                      : Adaptive
Cache When BBU Bad               : Disabled
Cached IO                        : No
SMART Mode                       : Mode 6
Alarm Disable                    : Yes
Coercion Mode                    : None
ZCR Config                       : Unknown
Dirty LED Shows Drive Activity   : No
BIOS Continue on Error           : No
Spin Down Mode                   : None
Allowed Device Type              : SAS/SATA Mix
Allow Mix in Enclosure           : Yes
Allow HDD SAS/SATA Mix in VD     : Yes
Allow SSD SAS/SATA Mix in VD     : No
Allow HDD/SSD Mix in VD          : No
Allow SATA in Cluster            : No
Max Chained Enclosures           : 16 
Disable Ctrl-R                   : Yes
Enable Web BIOS                  : Yes
Direct PD Mapping                : No
BIOS Enumerate VDs               : Yes
Restore Hot Spare on Insertion   : No
Expose Enclosure Devices         : Yes
Maintain PD Fail History         : Yes
Disable Puncturing               : No
Zero Based Enclosure Enumeration : No
PreBoot CLI Enabled              : Yes
LED Show Drive Activity          : Yes
Cluster Disable                  : Yes
SAS Disable                      : No
Auto Detect BackPlane Enable     : SGPIO/i2c SEP
Use FDE Only                     : No
Enable Led Header                : No
Delay during POST                : 0 
EnableCrashDump                  : No
Disable Online Controller Reset  : No
EnableLDBBM                      : No
Un-Certified Hard Disk Drives    : Allow
Treat Single span R1E as R10     : No
Max LD per array                 : 16
Power Saving option              : All power saving options are enabled
Default spin down time in minutes: 30 
Enable JBOD                      : No
TTY Log In Flash                 : No
Auto Enhanced Import             : No
BreakMirror RAID Support         : No
Disable Join Mirror              : No
Enable Shield State              : No
Time taken to detect CME         : 60s

Exit Code: 0x00

-LDInfo

Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-5, Secondary-0, RAID Level Qualifier-3
Size                : 50.934 TB
Parity Size         : 3.637 TB
State               : Optimal
Strip Size          : 64 KB
Number Of Drives    : 15
Span Depth          : 1
Default Cache Policy: WriteThrough, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAdaptive, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Is VD Cached: No


Number of Dedicated Hot Spares: 1
    0 : EnclId - 245 SlotId - 15 

Exit Code: 0x00
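The sizes reported above can be sanity-checked with a little arithmetic. This sketch assumes 512-byte sectors and that the "TB" printed by the tool is a binary unit (2^40 bytes). It takes the coerced size of 0x1d1b00000 sectors from the -PDList output. The function names are illustrative only.

```python
SECTOR_BYTES = 512   # assumption: 512-byte logical sectors
TIB = 2 ** 40        # assumption: the tool's "TB" is 2**40 bytes


def sectors_to_tib(sectors):
    """Convert a sector count to binary terabytes."""
    return sectors * SECTOR_BYTES / TIB


def raid5_sizes(n_drives, coerced_sectors):
    """Return (usable, parity) capacity for a single-span RAID5.

    RAID5 spends the equivalent of one drive on parity, so the usable
    capacity is (n_drives - 1) times the per-drive coerced size.
    """
    per_drive = sectors_to_tib(coerced_sectors)
    return (n_drives - 1) * per_drive, per_drive


if __name__ == "__main__":
    usable, parity = raid5_sizes(15, 0x1D1B00000)
    print(f"usable: {usable:.3f}  parity: {parity:.3f}")
```

Under these assumptions the result agrees with the reported Size of 50.934 TB and Parity Size of 3.637 TB to within display rounding, which suggests the virtual drive spans all 15 member drives as expected.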

-PDList

Enclosure Device ID: 245
Slot Number: 5
Drive's postion: DiskGroup: 0, Span: 0, Arm: 5
Enclosure position: 0
Device Id: 17
WWN: 5000CCA25DCF7521
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Non Coerced Size: 3.637 TB [0x1d1b0beb0 Sectors]
Coerced Size: 3.637 TB [0x1d1b00000 Sectors]
Firmware state: Online, Spun Up
Device Firmware Level: 1M02
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x500062b20037c5cd
Connected Port Number: 0(path0) 
Inquiry Data: K4H3047B            WDC WD4002FYYZ-01B7CB0                  01.01M02
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive Temperature : N/A
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Drive's write cache : Disabled
Drive's NCQ setting : Disabled
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Drive has flagged a S.M.A.R.T alert : No

Enclosure Device ID: 245
Slot Number: 12
Drive's postion: DiskGroup: 0, Span: 0, Arm: 12
Enclosure position: 0
Device Id: 21
WWN: 5000CCA244CAFCD1
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Non Coerced Size: 3.637 TB [0x1d1b0beb0 Sectors]
Coerced Size: 3.637 TB [0x1d1b00000 Sectors]
Firmware state: Online, Spun Up
Device Firmware Level: T907
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x500062b20037c5d8
Connected Port Number: 0(path0) 
Inquiry Data: N8GT59EY            HGST HUS726040ALE610                    APGNT907
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive Temperature : N/A
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Drive's write cache : Disabled
Drive's NCQ setting : Disabled
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Drive has flagged a S.M.A.R.T alert : No

Enclosure Device ID: 245
Slot Number: 15
Enclosure position: 0
Device Id: 27
WWN: 5000CCA25DCE7105
Sequence Number: 5
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Hotspare Information: 
Type: Dedicated, is revertible
Array #: 0

Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Non Coerced Size: 3.637 TB [0x1d1b0beb0 Sectors]
Coerced Size: 3.637 TB [0x1d1b00000 Sectors]
Firmware state: Hotspare, Spun Up
Device Firmware Level: 1M02
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x500062b20037c5db
Connected Port Number: 0(path0) 
Inquiry Data: K4H0SV7B            WDC WD4002FYYZ-01B7CB0                  01.01M02
Hotspare Information: 
Type: Dedicated, is revertible
Array #: 0

FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive Temperature : N/A
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Drive's write cache : Disabled
Drive's NCQ setting : Disabled
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Drive has flagged a S.M.A.R.T alert : No


Hotspare Information: 
Type: Dedicated, is revertible
Array #: 0
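To compare the three drives at a glance, a listing like the one above can be reduced to a few fields per drive. The sketch below is a minimal parser for the `Key: Value` layout shown here, not an official tool. `SAMPLE` is an abbreviated copy of the listing above, and `parse_pd_list` and `summarize` are hypothetical helper names.

```python
# Abbreviated copy of the -PDList output above (only the fields used below).
SAMPLE = """\
Enclosure Device ID: 245
Slot Number: 5
Device Id: 17
Media Error Count: 0
Firmware state: Online, Spun Up

Enclosure Device ID: 245
Slot Number: 12
Device Id: 21
Media Error Count: 0
Firmware state: Online, Spun Up

Enclosure Device ID: 245
Slot Number: 15
Device Id: 27
Media Error Count: 0
Firmware state: Hotspare, Spun Up
"""


def parse_pd_list(text):
    """Split the listing into one dict per drive.

    A new record starts at every "Enclosure Device ID" line. When a key
    repeats within a record, only its first value is kept.
    """
    drives = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value" form
        key, value = key.strip(), value.strip()
        if key == "Enclosure Device ID":
            current = {}
            drives.append(current)
        if current is not None:
            current.setdefault(key, value)
    return drives


def summarize(drives):
    """Return (slot, device id, firmware state, media error count) per drive."""
    return [
        (
            int(d["Slot Number"]),
            int(d["Device Id"]),
            d["Firmware state"],
            int(d["Media Error Count"]),
        )
        for d in drives
    ]


if __name__ == "__main__":
    for slot, dev_id, state, media_errors in summarize(parse_pd_list(SAMPLE)):
        print(f"slot {slot:2d}  dev {dev_id:2d}  {state:<18}  media errors: {media_errors}")
```

On the listing above, this shows that all three drives report a Media Error Count of 0 and that slot 15 is in the Hotspare state while slots 5 and 12 are Online.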
