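I am having a problem with an Adaptec 5805 RAID card
http://www.adaptec.com/en-us/support/raid/sas_raid/sas-5805/
(two SAS disks in the array) on a Gigabyte GA-H67A-D3H-B3 motherboard
http://www.gigabyte.com/products/product-page.aspx?pid=3866#sp
running CentOS 6 as a web server.
The short story: when I boot the server, the RAID card runs at full speed, with transfer rates over 250Mb/s. In less than 60 minutes I get an IRQ error, IRQ 16 is disabled, and from then on the card transfers no more than 2.5Mb/s (but it still works). I need to fix this so that the card always runs at full speed.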
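The long story:
1] The motherboard has no PCIe x8 slot for the RAID card. I tried the x16 slot, but with the card in that slot it is not detected at all and the system boots without it. So I used the x4 slot and, to my surprise, the card works fine. Except for the IRQ...
2] Two SATA disks are connected to the motherboard, each as master on its own channel:
Samsung HD502HJ, Samsung HD103UJ
There is also an additional network card in the first plain PCI slot (in the picture at the link above, it is the rightmost white PCI slot, next to the "DUAL BOOT" label on the board).
The RAID card is in the PCIe x4 slot (next to the three white PCI slots).
Nothing else is in use: no USB devices or anything else, only the two SATA disks, the two network ports (onboard and add-in card), and the RAID card with two SAS disks attached.
3] The system is, as I said, CentOS 6: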
uname -a
Linux 2.6.32-71.29.1.el6.x86_64 #1 SMP Mon Jun 27 19:49:27 BST 2011 x86_64 x86_64 x86_64 GNU/Linux
The CPU is
Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
lspci -v
00:00.0 Host bridge: Intel Corporation Sandy Bridge DRAM Controller (rev 09)
Flags: bus master, fast devsel, latency 0
Capabilities: [e0] Vendor Specific Information <?>
00:02.0 VGA compatible controller: Intel Corporation Sandy Bridge Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
Subsystem: Giga-byte Technology Device d000
Flags: bus master, fast devsel, latency 0, IRQ 10
Memory at fb400000 (64-bit, non-prefetchable) [size=4M]
Memory at e0000000 (64-bit, prefetchable) [size=256M]
I/O ports at ff00 [size=64]
Expansion ROM at <unassigned> [disabled]
Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
Capabilities: [d0] Power Management version 2
Capabilities: [a4] PCI Advanced Features
00:16.0 Communication controller: Intel Corporation Cougar Point HECI Controller #1 (rev 04)
Subsystem: Giga-byte Technology Device 1c3a
Flags: bus master, fast devsel, latency 0, IRQ 10
Memory at fbfff000 (64-bit, non-prefetchable) [size=16]
Capabilities: [50] Power Management version 3
Capabilities: [8c] MSI: Enable- Count=1/1 Maskable- 64bit+
00:1a.0 USB Controller: Intel Corporation Cougar Point USB Enhanced Host Controller #2 (rev 05) (prog-if 20 [EHCI])
Subsystem: Giga-byte Technology Device 5006
Flags: bus master, medium devsel, latency 0, IRQ 18
Memory at fbffe000 (32-bit, non-prefetchable) [size=1K]
Capabilities: [50] Power Management version 2
Capabilities: [58] Debug port: BAR=1 offset=00a0
Capabilities: [98] PCI Advanced Features
Kernel driver in use: ehci_hcd
00:1c.0 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
Memory behind bridge: fb800000-fbbfffff
Prefetchable memory behind bridge: 00000000dc000000-00000000dc0fffff
Capabilities: [40] Express Root Port (Slot+), MSI 00
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [90] Subsystem: Giga-byte Technology Device 5001
Capabilities: [a0] Power Management version 2
Kernel driver in use: pcieport
Kernel modules: shpchp
00:1c.5 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 6 (rev b5) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
I/O behind bridge: 0000d000-0000dfff
Prefetchable memory behind bridge: 00000000fbd00000-00000000fbdfffff
Capabilities: [40] Express Root Port (Slot+), MSI 00
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [90] Subsystem: Giga-byte Technology Device 5001
Capabilities: [a0] Power Management version 2
Kernel driver in use: pcieport
Kernel modules: shpchp
00:1c.6 PCI bridge: Intel Corporation 82801 PCI Bridge (rev b5) (prog-if 01 [Subtractive decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=03, subordinate=04, sec-latency=0
I/O behind bridge: 0000e000-0000efff
Memory behind bridge: fbc00000-fbcfffff
Prefetchable memory behind bridge: 00000000dc100000-00000000dc1fffff
Capabilities: [40] Express Root Port (Slot+), MSI 00
Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
Capabilities: [90] Subsystem: Giga-byte Technology Device 5001
Capabilities: [a0] Power Management version 2
00:1c.7 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 8 (rev b5) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=05, subordinate=05, sec-latency=0
Memory behind bridge: fbe00000-fbefffff
Capabilities: [40] Express Root Port (Slot+), MSI 00
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [90] Subsystem: Giga-byte Technology Device 5001
Capabilities: [a0] Power Management version 2
Kernel driver in use: pcieport
Kernel modules: shpchp
00:1d.0 USB Controller: Intel Corporation Cougar Point USB Enhanced Host Controller #1 (rev 05) (prog-if 20 [EHCI])
Subsystem: Giga-byte Technology Device 5006
Flags: bus master, medium devsel, latency 0, IRQ 23
Memory at fbffd000 (32-bit, non-prefetchable) [size=1K]
Capabilities: [50] Power Management version 2
Capabilities: [58] Debug port: BAR=1 offset=00a0
Capabilities: [98] PCI Advanced Features
Kernel driver in use: ehci_hcd
00:1f.0 ISA bridge: Intel Corporation Cougar Point LPC Controller (rev 05)
Subsystem: Giga-byte Technology Device 5001
Flags: bus master, medium devsel, latency 0
Capabilities: [e0] Vendor Specific Information <?>
Kernel modules: iTCO_wdt
00:1f.2 IDE interface: Intel Corporation Cougar Point 4 port SATA IDE Controller (rev 05) (prog-if 8f [Master SecP SecO PriP PriO])
Subsystem: Giga-byte Technology Device b002
Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 19
I/O ports at fe00 [size=8]
I/O ports at fd00 [size=4]
I/O ports at fc00 [size=8]
I/O ports at fb00 [size=4]
I/O ports at fa00 [size=16]
I/O ports at f900 [size=16]
Capabilities: [70] Power Management version 3
Capabilities: [b0] PCI Advanced Features
Kernel driver in use: ata_piix
Kernel modules: ata_generic, pata_acpi, ata_piix
00:1f.3 SMBus: Intel Corporation Cougar Point SMBus Controller (rev 05)
Subsystem: Giga-byte Technology Device 5001
Flags: medium devsel, IRQ 18
Memory at fbffc000 (64-bit, non-prefetchable) [size=256]
I/O ports at 0500 [size=32]
Kernel driver in use: i801_smbus
Kernel modules: i2c-i801
00:1f.5 IDE interface: Intel Corporation Cougar Point 2 port SATA IDE Controller (rev 05) (prog-if 85 [Master SecO PriO])
Subsystem: Giga-byte Technology Device b002
Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 19
I/O ports at f700 [size=8]
I/O ports at f600 [size=4]
I/O ports at f500 [size=8]
I/O ports at f400 [size=4]
I/O ports at f300 [size=16]
I/O ports at f200 [size=16]
Capabilities: [70] Power Management version 3
Capabilities: [b0] PCI Advanced Features
Kernel driver in use: ata_piix
Kernel modules: ata_generic, pata_acpi, ata_piix
01:00.0 RAID bus controller: Adaptec AAC-RAID (rev 09)
Subsystem: Adaptec ASR5805
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at fb800000 (64-bit, non-prefetchable) [size=2M]
[virtual] Expansion ROM at dc000000 [disabled] [size=512K]
Capabilities: [98] Power Management version 2
Capabilities: [a0] MSI: Enable- Count=1/2 Maskable- 64bit+
Capabilities: [d0] Express Endpoint, MSI 00
Capabilities: [90] Vital Product Data
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: aacraid
Kernel modules: aacraid
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)
Subsystem: Giga-byte Technology GA-EP45-DS5 Motherboard
Flags: bus master, fast devsel, latency 0, IRQ 32
I/O ports at de00 [size=256]
Memory at fbdff000 (64-bit, prefetchable) [size=4K]
Memory at fbdf8000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 01
Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
Capabilities: [d0] Vital Product Data
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel <?>
Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00
Kernel driver in use: r8169
Kernel modules: r8169
03:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892 (rev 30) (prog-if 01 [Subtractive decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=03, secondary=04, subordinate=04, sec-latency=32
I/O behind bridge: 0000e000-0000efff
Memory behind bridge: fbc00000-fbcfffff
Prefetchable memory behind bridge: 00000000dc100000-00000000dc1fffff
Capabilities: [90] Power Management version 2
Capabilities: [a0] Subsystem: Giga-byte Technology Device 5000
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
Subsystem: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet
Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 18
I/O ports at ee00 [size=256]
Memory at fbcff000 (32-bit, non-prefetchable) [size=256]
[virtual] Expansion ROM at dc100000 [disabled] [size=64K]
Capabilities: [dc] Power Management version 2
Kernel driver in use: r8169
Kernel modules: r8169
05:00.0 USB Controller: Device 1b6f:7023 (rev 01) (prog-if 30)
Subsystem: Device 1b6f:7023
Flags: bus master, fast devsel, latency 0, IRQ 11
Memory at fbef8000 (64-bit, non-prefetchable) [size=32K]
Capabilities: [50] Power Management version 3
Capabilities: [70] MSI: Enable- Count=1/4 Maskable+ 64bit+
Capabilities: [a0] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [190] Device Serial Number 01-01-01-01-01-01-01-01
lspci -vv
01:00.0 RAID bus controller: Adaptec AAC-RAID (rev 09)
Subsystem: Adaptec ASR5805
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 4 bytes
Interrupt: pin A routed to IRQ 16
Region 0: Memory at fb800000 (64-bit, non-prefetchable) [size=2M]
[virtual] Expansion ROM at dc000000 [disabled] [size=512K]
Capabilities: [98] Power Management version 2
Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [a0] MSI: Enable- Count=1/2 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [d0] Express (v1) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 <1us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM L0s, Latency L0 <128ns, L1 unlimited
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities: [90] Vital Product Data
Unknown small resource type 00, will not decode more.
Capabilities: [100] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Kernel driver in use: aacraid
Kernel modules: aacraid
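For anyone who wants to sanity-check the x4 slot from the output above, the LnkCap and LnkSta lines can be compared with a few lines of Python. This is only a sketch: the names `parse_link` and `lane_bandwidth_mb` are made up for this post, and the per-lane figure assumes the 8b/10b encoding used by 2.5 GT/s and 5.0 GT/s links. It suggests the card negotiated x4 out of a possible x8, which should still leave roughly 1000 MB/s (megabytes) per direction, so the narrower link on its own does not look like the cause of the slowdown.

```python
import re

# Two lines copied from the `lspci -vv` output above.
LSPCI_LINES = """\
LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM L0s, Latency L0 <128ns, L1 unlimited
LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
"""

LINK_RE = re.compile(
    r"^\s*(LnkCap|LnkSta):.*?Speed ([\d.]+)GT/s, Width x(\d+)", re.MULTILINE
)


def parse_link(text):
    """Map 'LnkCap'/'LnkSta' to a (speed in GT/s, lane count) tuple."""
    return {
        m.group(1): (float(m.group(2)), int(m.group(3)))
        for m in LINK_RE.finditer(text)
    }


def lane_bandwidth_mb(speed_gt):
    """Usable MB/s per lane and direction, valid for 8b/10b links only."""
    if speed_gt not in (2.5, 5.0):
        raise ValueError("only 2.5 and 5.0 GT/s links use 8b/10b encoding")
    # 8b/10b puts 10 bits on the wire for every payload byte.
    return speed_gt * 1000 / 10


if __name__ == "__main__":
    link = parse_link(LSPCI_LINES)
    cap_speed, cap_width = link["LnkCap"]
    sta_speed, sta_width = link["LnkSta"]
    print(f"capable of x{cap_width}, negotiated x{sta_width}")
    total = lane_bandwidth_mb(sta_speed) * sta_width
    print(f"approx. {total:.0f} MB/s per direction")
```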
cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
0: 128 0 0 0 0 0 0 0 IO-APIC-edge timer
1: 105 0 606 4366 0 0 0 0 IO-APIC-edge i8042
8: 1 0 0 0 0 0 0 0 IO-APIC-edge rtc0
9: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi acpi
16: 1381 0 197881 730 0 0 0 9 IO-APIC-fasteoi aacraid
18: 1695 0 0 0 13372 60347990 0 0 IO-APIC-fasteoi ehci_hcd:usb1, eth1
19: 4637 0 14949 6352494 0 0 0 106473 IO-APIC-fasteoi ata_piix, ata_piix
23: 33 0 27 12 0 0 0 0 IO-APIC-fasteoi ehci_hcd:usb2
24: 291 0 0 0 0 0 0 0 HPET_MSI-edge hpet2
25: 0 0 0 0 0 0 0 0 HPET_MSI-edge hpet3
26: 0 0 0 0 0 0 0 0 HPET_MSI-edge hpet4
27: 0 0 0 0 0 0 0 0 HPET_MSI-edge hpet5
28: 0 0 0 0 0 0 0 0 HPET_MSI-edge hpet6
32: 1275 0 0 0 0 1905 21317086 0 PCI-MSI-edge eth0
NMI: 1873 10150 1974 1672 702 3046 1825 780 Non-maskable interrupts
LOC: 17501877 13611350 13868117 3612581 1520650 1850972 8633075 1486682 Local timer interrupts
SPU: 0 0 0 0 0 0 0 0 Spurious interrupts
PMI: 0 0 0 0 0 0 0 0 Performance monitoring interrupts
PND: 0 0 0 0 0 0 0 0 Performance pending work
RES: 5238 34250 12858 4299 1555 4833 5663 2485 Rescheduling interrupts
CAL: 334 302 429 414 421 464 465 468 Function call interrupts
TLB: 7863 154723 12147 11152 14099 33766 42580 11065 TLB shootdowns
TRM: 0 0 0 0 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts
MCE: 0 0 0 0 0 0 0 0 Machine check exceptions
MCP: 293 293 293 293 293 293 293 293 Machine check polls
ERR: 7
MIS: 0
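To check which IRQ lines are shared without reading the table by eye, a small parser is enough. The sketch below is illustrative only: `parse_interrupts` and `shared_irqs` are hypothetical helper names, and it assumes the column layout shown above (IRQ number, one counter per CPU, the interrupt chip name, then a comma-separated handler list), which may differ on other kernels. The sample embeds a few rows from the output above; on the live machine the text could be read from /proc/interrupts instead.

```python
# A few rows copied from the /proc/interrupts output above.
SAMPLE = """\
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
16: 1381 0 197881 730 0 0 0 9 IO-APIC-fasteoi aacraid
18: 1695 0 0 0 13372 60347990 0 0 IO-APIC-fasteoi ehci_hcd:usb1, eth1
19: 4637 0 14949 6352494 0 0 0 106473 IO-APIC-fasteoi ata_piix, ata_piix
32: 1275 0 0 0 0 1905 21317086 0 PCI-MSI-edge eth0
ERR: 7
"""


def parse_interrupts(text):
    """Return {irq: (total_count, chip, [handlers])} for numbered IRQs."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row lists one name per CPU
    table = {}
    for line in lines[1:]:
        parts = line.split()
        label = parts[0].rstrip(":")
        # Skip summary rows such as NMI, LOC, ERR and malformed rows.
        if not label.isdigit() or len(parts) < ncpu + 2:
            continue
        counts = [int(x) for x in parts[1:1 + ncpu]]
        chip = parts[1 + ncpu]
        rest = " ".join(parts[2 + ncpu:])
        handlers = [h.strip() for h in rest.split(",") if h.strip()]
        table[int(label)] = (sum(counts), chip, handlers)
    return table


def shared_irqs(table):
    """Return only the IRQs that have more than one registered handler."""
    return {irq: h for irq, (_, _, h) in table.items() if len(h) > 1}


if __name__ == "__main__":
    table = parse_interrupts(SAMPLE)
    for irq, handlers in sorted(shared_irqs(table).items()):
        print(f"IRQ {irq} is shared by: {', '.join(handlers)}")
    total, chip, handlers = table[16]
    print(f"IRQ 16: {total} interrupts, {chip}, handlers={handlers}")
```

On the rows above this reports IRQ 18 and IRQ 19 as shared, while IRQ 16 has aacraid as its only registered handler.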
The module in use is the kmod-aacraid kernel module for CentOS 6 from ELRepo:
Installed Packages
Name : kmod-aacraid
Arch : x86_64
Version : 1.1.7
Release : 1.el6.elrepo
Size : 340 k
Repo : installed
From repo : elrepo
Summary : aacraid kernel module(s)
URL : http://www.adaptec.com/
License : GPLv2
Description: This package provides the aacraid kernel module(s) built
: for the Linux kernel using the x86_64 family of processors.
And the error in the log:
Dec 15 14:02:33 kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
Dec 15 14:02:33 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-71.29.1.el6.x86_64 #1
Dec 15 14:02:33 kernel: Call Trace:
Dec 15 14:02:33 kernel: <IRQ> [<ffffffff810da96b>] __report_bad_irq+0x2b/0xa0
Dec 15 14:02:33 kernel: [<ffffffff810dab6c>] note_interrupt+0x18c/0x1d0
Dec 15 14:02:33 kernel: [<ffffffff810db255>] handle_fasteoi_irq+0xc5/0xf0
Dec 15 14:02:33 kernel: [<ffffffff81015fb9>] handle_irq+0x49/0xa0
Dec 15 14:02:33 kernel: [<ffffffff814d093c>] do_IRQ+0x6c/0xf0
Dec 15 14:02:33 kernel: [<ffffffff81013ad3>] ret_from_intr+0x0/0x11
Dec 15 14:02:33 kernel: <EOI> [<ffffffff812da962>] ? acpi_idle_enter_c1+0xa3/0xc1
Dec 15 14:02:33 kernel: [<ffffffff812da941>] ? acpi_idle_enter_c1+0x82/0xc1
Dec 15 14:02:33 kernel: [<ffffffff813df687>] cpuidle_idle_call+0xa7/0x140
Dec 15 14:02:33 kernel: [<ffffffff81011e96>] cpu_idle+0xb6/0x110
Dec 15 14:02:33 kernel: [<ffffffff814c27d8>] start_secondary+0x1fc/0x23f
Dec 15 14:02:33 kernel: handlers:
Dec 15 14:02:33 kernel: [<ffffffffa002a590>] (aac_rx_intr_message+0x0/0xc0 [aacraid])
Dec 15 14:02:33 kernel: Disabling IRQ #16
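Since the card keeps working after the IRQ is disabled, the event is easy to miss, so it may be worth scanning the logs for it. The following is a minimal sketch, not a finished tool: `summarize` is a hypothetical helper, the regular expressions are written against the exact message format shown above, and the sample is abbreviated from that trace. In practice the text would come from the system log file rather than from an embedded string.

```python
import re

# Abbreviated from the kernel log excerpt above.
SAMPLE_LOG = """\
Dec 15 14:02:33 kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
Dec 15 14:02:33 kernel: Call Trace:
Dec 15 14:02:33 kernel: <IRQ> [<ffffffff810da96b>] __report_bad_irq+0x2b/0xa0
Dec 15 14:02:33 kernel: handlers:
Dec 15 14:02:33 kernel: [<ffffffffa002a590>] (aac_rx_intr_message+0x0/0xc0 [aacraid])
Dec 15 14:02:33 kernel: Disabling IRQ #16
"""

NOBODY_RE = re.compile(r"irq (\d+): nobody cared")
HANDLER_RE = re.compile(r"\((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]\)")
DISABLE_RE = re.compile(r"Disabling IRQ #(\d+)")


def summarize(log_text):
    """Collect each 'nobody cared' event with its handlers and outcome."""
    events = []
    current = None
    in_handlers = False
    for line in log_text.splitlines():
        m = NOBODY_RE.search(line)
        if m:
            current = {"irq": int(m.group(1)), "handlers": [], "disabled": False}
            events.append(current)
            in_handlers = False
            continue
        if current is None:
            continue
        if line.rstrip().endswith("handlers:"):
            in_handlers = True
            continue
        m = DISABLE_RE.search(line)
        if m and int(m.group(1)) == current["irq"]:
            current["disabled"] = True
            in_handlers = False
            continue
        if in_handlers:
            m = HANDLER_RE.search(line)
            if m:
                # (function name, module name) of a registered handler
                current["handlers"].append((m.group(1), m.group(2)))
    return events


if __name__ == "__main__":
    for event in summarize(SAMPLE_LOG):
        state = "disabled" if event["disabled"] else "still enabled"
        mods = ", ".join(mod for _, mod in event["handlers"]) or "none"
        print(f"IRQ {event['irq']} {state}; handler modules: {mods}")
```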
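I don't see any conflict on IRQ 16, and the suggested irqpoll option changes nothing. I don't need USB, so I could disable it, but this is a production system, so I want to know where the problem lies before I start fiddling with the BIOS or anything else (I also need to keep downtime to a minimum).
Can anyone help me diagnose the problem here?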