Server: Ubuntu Lucid
RAID controller: Adaptec 3805
8 disks in RAID6 on HP ProLiant DL180 G5 hardware
My kern.log shows an error on sdb, as follows:
[2740390.344436] sd 4:0:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[2740390.344439] sd 4:0:1:0: [sdb] Sense Key : Hardware Error [current]
[2740390.344442] sd 4:0:1:0: [sdb] Add. Sense: Internal target failure
[2740390.344447] sd 4:0:1:0: [sdb] CDB: Read(10): 28 00 33 dd dc 00 00 00 08 00
[2740390.344454] end_request: I/O error, dev sdb, sector 870177792
[2774094.573841] sd 4:0:1:0: [sdb] Unhandled sense code
[2774094.573847] sd 4:0:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[2774094.573851] sd 4:0:1:0: [sdb] Sense Key : Hardware Error [current]
[2774094.573856] sd 4:0:1:0: [sdb] Add. Sense: Internal target failure
[2774094.573862] sd 4:0:1:0: [sdb] CDB: Read(16): 88 00 00 00 00 01 33 dd ef e8 00 00 01 00 00 00
[2774094.573873] end_request: I/O error, dev sdb, sector 5165150184
[2774094.615437] sd 4:0:1:0: [sdb] Unhandled sense code
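The sector on each end_request line is the LBA carried in the failing READ command's CDB. Below is a minimal sketch that decodes the two CDBs above. It assumes the standard SCSI block-command layouts for READ(10) (opcode 0x28) and READ(16) (opcode 0x88). The helper name `decode_read_cdb` is hypothetical.

```python
from typing import Tuple


def decode_read_cdb(cdb_hex: str) -> Tuple[str, int, int]:
    """Return (command, LBA, transfer length in blocks) for a READ(10)/READ(16) CDB."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex() ignores the spaces in the kernel's hex dump
    opcode = cdb[0]
    if opcode == 0x28:  # READ(10): 4-byte LBA in bytes 2-5, 2-byte length in bytes 7-8
        return "READ(10)", int.from_bytes(cdb[2:6], "big"), int.from_bytes(cdb[7:9], "big")
    if opcode == 0x88:  # READ(16): 8-byte LBA in bytes 2-9, 4-byte length in bytes 10-13
        return "READ(16)", int.from_bytes(cdb[2:10], "big"), int.from_bytes(cdb[10:14], "big")
    raise ValueError(f"unsupported opcode 0x{opcode:02x}")


print(decode_read_cdb("28 00 33 dd dc 00 00 00 08 00"))
# -> ('READ(10)', 870177792, 8)
print(decode_read_cdb("88 00 00 00 00 01 33 dd ef e8 00 00 01 00 00 00"))
# -> ('READ(16)', 5165150184, 256)
```

Both LBAs match the sectors in the end_request lines. The second LBA is larger than 4294967295 (2^32 - 1). READ(10) only has a 32-bit LBA field, which is presumably why the kernel used READ(16) for that request.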
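arcconf reports every disk as Online, but it also reports Failed stripes: Yes.
How can I tell which disk in the 8-disk RAID6 array is the bad one?
Edit (May 2, 2012): added the following: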
/usr/local/sbin/arcconf getconfig 1 AL
Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
Controller Status : Optimal
Channel description : SAS/SATA
Controller Model : Adaptec 3805
Controller Serial Number : 0C18115C3BB
Temperature : 0 C/ 32 F (Normal)
Installed memory : 128 MB
Copyback : Disabled
Background consistency check : Disabled
Automatic Failover : Enabled
Global task priority : High
Stayawake period : Disabled
Spinup limit internal drives : 0
Spinup limit external drives : 0
Defunct disk drive count : 0
Logical devices/Failed/Degraded : 2/0/0
NCQ status : Enabled
--------------------------------------------------------
Controller Version Information
--------------------------------------------------------
BIOS : 5.2-0 (17342)
Firmware : 5.2-0 (17342)
Driver : 1.1-5 (2461)
Boot Flash : 5.2-0 (17342)
--------------------------------------------------------
Controller Battery Information
--------------------------------------------------------
Status : Optimal
Over temperature : No
Capacity remaining : 99 percent
Time remaining (at current draw) : 3 days, 1 hours, 11 minutes
----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
Logical device number 0
Logical device name : boot
RAID level : 1
Status of logical device : Optimal
Size : 476150 MB
Read-cache mode : Enabled
Write-cache mode : Enabled (write-back)
Write-cache setting : Enabled (write-back)
Partitioned : Yes
Protected by Hot-Spare : No
Bootable : Yes
Failed stripes : No
Power settings : Disabled
--------------------------------------------------------
Logical device segment information
--------------------------------------------------------
Segment 0 : Present (0,7) Z2AD1A3H
Segment 1 : Present (0,3) Z2AD1834
Logical device number 1
Logical device name : data
RAID level : 6 Reed-Solomon
Status of logical device : Optimal
Size : 2858990 MB
Stripe-unit size : 128 KB
Read-cache mode : Enabled
Write-cache mode : Enabled (write-back)
Write-cache setting : Enabled (write-back)
Partitioned : Yes
Protected by Hot-Spare : No
Bootable : No
Failed stripes : Yes
Power settings : Disabled
--------------------------------------------------------
Logical device segment information
--------------------------------------------------------
Segment 0 : Present (0,0) 6VPEFSZ0
Segment 1 : Present (0,1) 5VPA5934
Segment 2 : Present (0,2) 5VPA7132
Segment 3 : Present (0,4) 5VPAJ8EJ
Segment 4 : Present (0,5) 5VPA6NAZ
Segment 5 : Present (0,6) 5VPAJM8Q
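The "data" logical device above lists six segments and a 128 KB stripe-unit size. With those numbers you can at least place a failing sdb sector within the RAID 6 stripe geometry. This is a minimal sketch under stated assumptions:

- the end_request sector is an LBA on the logical drive sdb, in 512-byte sectors;
- the controller lays data units out sequentially, with two units per row used for P/Q parity.

Which physical member holds a given unit depends on the controller's parity rotation, which is not documented here. The sketch therefore does not name a disk. `locate()` and the constant names are hypothetical.

```python
# Values read from the arcconf output for logical device "data";
# the constant names and the locate() helper are hypothetical.
SECTOR_BYTES = 512                 # fdisk reports 512-byte logical sectors for sdb
STRIPE_UNIT_BYTES = 128 * 1024     # "Stripe-unit size : 128 KB"
MEMBERS = 6                        # six segments listed for "data"
DATA_UNITS_PER_ROW = MEMBERS - 2   # RAID 6 spends two units per row on P and Q


def locate(lba: int) -> dict:
    """Map a logical-drive LBA to its stripe unit and stripe row.

    Assumes data units are laid out sequentially across rows. The physical
    member holding a unit (parity rotation) is controller-specific and is
    deliberately not modeled.
    """
    unit, byte_in_unit = divmod(lba * SECTOR_BYTES, STRIPE_UNIT_BYTES)
    row, data_unit_in_row = divmod(unit, DATA_UNITS_PER_ROW)
    return {
        "stripe_unit": unit,
        "sector_in_unit": byte_in_unit // SECTOR_BYTES,
        "stripe_row": row,
        "data_unit_in_row": data_unit_in_row,
    }


for sector in (870177792, 5165150184):  # sectors from the end_request lines
    print(sector, locate(sector))
```

The two failures land in different stripe rows (849783 and 5044091), far apart on the logical drive. Turning either row into a specific physical disk would still require Adaptec's parity-rotation scheme.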
----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
Device #0
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SATA 3.0 Gb/s
Reported Channel,Device(T:L) : 0,0(0:0)
Reported Location : Connector 0, Device 0
Vendor : ST375052
Model : 5AS
Firmware : JC4B
Serial number : 6VPEFSZ0
Size : 715404 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
NCQ status : Enabled
Device #1
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SATA 3.0 Gb/s
Reported Channel,Device(T:L) : 0,1(1:0)
Reported Location : Connector 0, Device 1
Vendor : ST375052
Model : 5AS
Firmware : JC4B
Serial number : 5VPA5934
Size : 715404 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
NCQ status : Enabled
Device #2
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SATA 3.0 Gb/s
Reported Channel,Device(T:L) : 0,2(2:0)
Reported Location : Connector 0, Device 2
Vendor : ST375052
Model : 5AS
Firmware : JC4B
Serial number : 5VPA7132
Size : 715404 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
NCQ status : Enabled
Device #3
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SATA 3.0 Gb/s
Reported Channel,Device(T:L) : 0,3(3:0)
Reported Location : Connector 0, Device 3
Vendor : ST500DM0
Model : 02-1BD142
Firmware : KC44
Serial number : Z2AD1834
Size : 476940 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
NCQ status : Enabled
Device #4
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SATA 3.0 Gb/s
Reported Channel,Device(T:L) : 0,4(4:0)
Reported Location : Connector 1, Device 0
Vendor : ST375052
Model : 5AS
Firmware : JC4B
Serial number : 5VPAJ8EJ
Size : 715404 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
NCQ status : Enabled
Device #5
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SATA 3.0 Gb/s
Reported Channel,Device(T:L) : 0,5(5:0)
Reported Location : Connector 1, Device 1
Vendor : ST375052
Model : 5AS
Firmware : JC4B
Serial number : 5VPA6NAZ
Size : 715404 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
NCQ status : Enabled
Device #6
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SATA 3.0 Gb/s
Reported Channel,Device(T:L) : 0,6(6:0)
Reported Location : Connector 1, Device 2
Vendor : ST375052
Model : 5AS
Firmware : JC4B
Serial number : 5VPAJM8Q
Size : 715404 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
NCQ status : Enabled
Device #7
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SATA 3.0 Gb/s
Reported Channel,Device(T:L) : 0,7(7:0)
Reported Location : Connector 1, Device 3
Vendor : ST500DM0
Model : 02-1BD142
Firmware : KC44
Serial number : Z2AD1A3H
Size : 476940 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
NCQ status : Enabled
Command completed successfully.
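To see which physical slots back each logical device, you can join the "Segment N : Present (c,d) serial" lines to the "Reported Location" / "Serial number" pairs in the physical device section. The sketch below does this for saved getconfig output. It assumes the text layout shown above, and all function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# "Segment 0 : Present (0,7) Z2AD1A3H" -> channel 0, device 7, serial Z2AD1A3H
SEGMENT_RE = re.compile(r"Segment \d+\s*:\s*\w+\s*\((\d+),(\d+)\)\s*(\S+)")


def array_members(text: str) -> Dict[str, List[Tuple[int, int, str]]]:
    """Map each logical device name to its (channel, device, serial) segments."""
    members: Dict[str, List[Tuple[int, int, str]]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Logical device name"):
            current = line.split(":", 1)[1].strip()
            members[current] = []
        elif current is not None:
            match = SEGMENT_RE.match(line)
            if match:
                members[current].append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return members


def physical_locations(text: str) -> Dict[str, str]:
    """Map each drive's serial number to its "Reported Location"."""
    locations: Dict[str, str] = {}
    location = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Reported Location":
            location = value
        elif key == "Serial number" and location is not None:
            # Within each "Device #" block the location line precedes the serial.
            locations[value] = location
            location = None
    return locations


def member_locations(text: str, logical_name: str) -> List[str]:
    """Physical slots of the drives backing one logical device."""
    locations = physical_locations(text)
    return [locations.get(serial, "unknown") for _, _, serial in array_members(text)[logical_name]]
```

Run it against the full output above, saved for example with `/usr/local/sbin/arcconf getconfig 1 AL > getconfig.txt`. `member_locations(text, "data")` should then list Connector 0 Devices 0-2 and Connector 1 Devices 0-2 (the six ST375052 drives). The two ST500DM0 drives at Device 3 on each connector form the RAID 1 "boot" volume. So according to this output, the RAID 6 has six members rather than eight.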
Update: added the partition information below:
**fdisk -l**
Disk /dev/sda: 499.3 GB, 499289948160 bytes
255 heads, 63 sectors/track, 60701 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0002ab26
Device Boot Start End Blocks Id System
/dev/sda1 * 1 59952 481562624 83 Linux
/dev/sda2 59953 60702 6022145 5 Extended
/dev/sda5 59953 60702 6022144 82 Linux swap / Solaris
WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util fdisk doesn't support GPT. Use GNU Parted.
Disk /dev/sdb: 2997.9 GB, 2997878784000 bytes
255 heads, 63 sectors/track, 364471 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Device Boot Start End Blocks Id System
/dev/sdb1 1 267350 2147483647+ ee GPT
**df -h**
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 453G 112G 319G 26% /
none 1000M 224K 1000M 1% /dev
none 1005M 0 1005M 0% /dev/shm
none 1005M 664K 1004M 1% /var/run
none 1005M 4.0K 1005M 1% /var/lock
none 1005M 0 1005M 0% /lib/init/rw
/dev/sdb1 2.7T 1.5T 1.1T 58% /media/raid1
/dev/sdb1 2.7T 1.5T 1.1T 58% /media/usbhd-sdb1
/dev/sda1 453G 112G 319G 26% /media/usbhd-sda1
**fstab**
# /etc/fstab: static file system information.
#
# Use 'blkid -o value -s UUID' to print the universally unique identifier
# for a device; this may be used with UUID= as a more robust way to name
# devices that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>
proc /proc proc nodev,noexec,nosuid 0 0
# / was on /dev/sda1 during installation
UUID=12dd3c31-6dba-4c26-ba81-88a76510bffd / ext4 errors=remount-ro 0 1
# swap was on /dev/sda5 during installation
UUID=81618042-ec4e-45e9-947f-9198d29651d3 none swap sw 0 0
UUID=a7832728-5bf9-45c4-8a29-2824b4f2c250 /media/raid1 ext4 errors=remount-ro,noatime 0 1
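Answer 1
If I'm not mistaken, these errors are telling you that you have hit errors the RAID controller has not corrected. The RAID controller is supposed to hide this kind of error from you. I don't think you are dealing with a simple disk failure; I think something more serious is going on.
Answer 2
Assuming the "boot" volume in your RAID setup shows up as sda and "data" as sdb, the system is telling you the following: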
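[2740390.344436] sd 4:0:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
The SCSI subsystem passed a command to the low-level driver (for your Adaptec card) without error (hostbyte=DID_OK), and the card answered with an error (DRIVER_SENSE set, meaning sense data came back).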
[2740390.344439] sd 4:0:1:0: [sdb] Sense Key : Hardware Error [current]
This is the error type (see the SCSI driver info).
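[2740390.344442] sd 4:0:1:0: [sdb] Add. Sense: Internal target failure
This is additional information reported by the driver. As far as I know, it means "nothing specific" / "no idea what went wrong".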
[2740390.344454] end_request: I/O error, dev sdb, sector 870177792
The error has made it all the way up to the block layer.
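As the other answer says, this is not a single-disk failure but a failure of the whole RAID. You should check your data carefully and consider replacing the RAID subsystem, or at least the controller.
And you should always (!) enable "background consistency check" / "patrol read" / "verify" on your RAID controller to catch silent corruption. Otherwise that corruption can kill your RAID during a rebuild.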
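The arcconf output in the question shows "Background consistency check : Disabled" for the controller and "Failed stripes : Yes" for the "data" logical device. Below is a minimal sketch that flags both conditions from saved getconfig output, for example as a periodic check. It only reads the text and does not enable anything; check Adaptec's arcconf reference for the command that turns the consistency check on. `health_flags()` is a hypothetical helper.

```python
from typing import List


def health_flags(getconfig_text: str) -> List[str]:
    """Return warnings for a disabled consistency check and for failed stripes."""
    warnings: List[str] = []
    device = None
    for line in getconfig_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Background consistency check" and value == "Disabled":
            warnings.append("controller: background consistency check is disabled")
        elif key == "Logical device name":
            device = value  # later "Failed stripes" lines belong to this device
        elif key == "Failed stripes" and value == "Yes":
            warnings.append(f"logical device {device}: failed stripes reported")
    return warnings
```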
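Do you see any filesystem errors? Is /dev/sdb partitioned and mounted?
Answer 3
This may sound silly, but have you looked at the front of the server to see which drive has its fault LED lit? (Assuming the drives have LEDs.)
You could also install the Storage Manager software: http://www.adaptec.com/en-us/downloads/storage_manager/sm/productid=sas-3805&dn=adaptec+raid+3805.html
Answer 4
If you can reboot the server, boot it from the SmartStart DVD. If I remember correctly, you can get to the ACU from there for a graphical view of the RAID volumes.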