Adaptec 5805 IRQ 16 error on CentOS 6

I have a problem with an Adaptec 5805 RAID card

http://www.adaptec.com/en-us/support/raid/sas_raid/sas-5805/

(two SAS disks in the array) on a Gigabyte GA-H67A-D3H-B3 motherboard

http://www.gigabyte.com/products/product-page.aspx?pid=3866#sp

running CentOS 6 as a web server.

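In short: when I boot the server, the RAID card runs at full speed, with transfer rates above 250Mb/s. Within 60 minutes I get an IRQ error, IRQ 16 is disabled, and from then on the card transfers no more than 2.5Mb/s (but it still works). I need to fix this so that the card always runs at full speed.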

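The long story:

1] The motherboard has no PCIe x8 slot for the RAID card. I tried the x16 slot, but when the card is in that slot it is not detected at all and the system boots without it. So I used the x4 slot, and (to my surprise) the card works fine there. Except for the IRQ...

2] Two SATA disks are connected to the motherboard, each as master on its own channel:

Samsung HD502HJ, Samsung HD103UJ

There is also an additional network card in the first ordinary PCI slot (in the picture at the link above, it is the rightmost white PCI slot, next to the "DUAL BOOT" label on the board).

The RAID card sits in the PCIe x4 slot (next to the three white PCI slots).

Nothing else is in use: no USB devices or anything else, only the two SATA disks, two network ports (onboard and add-in card), and the RAID card with two SAS disks attached.

3] The system is, as I said, CentOS 6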

uname -a

Linux 2.6.32-71.29.1.el6.x86_64 #1 SMP Mon Jun 27 19:49:27 BST 2011 x86_64 x86_64 x86_64 GNU/Linux

The CPU is

Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz

lspci -v

00:00.0 Host bridge: Intel Corporation Sandy Bridge DRAM Controller (rev 09)
    Flags: bus master, fast devsel, latency 0
    Capabilities: [e0] Vendor Specific Information <?>

00:02.0 VGA compatible controller: Intel Corporation Sandy Bridge Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
    Subsystem: Giga-byte Technology Device d000
    Flags: bus master, fast devsel, latency 0, IRQ 10
    Memory at fb400000 (64-bit, non-prefetchable) [size=4M]
    Memory at e0000000 (64-bit, prefetchable) [size=256M]
    I/O ports at ff00 [size=64]
    Expansion ROM at <unassigned> [disabled]
    Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
    Capabilities: [d0] Power Management version 2
    Capabilities: [a4] PCI Advanced Features

00:16.0 Communication controller: Intel Corporation Cougar Point HECI Controller #1 (rev 04)
    Subsystem: Giga-byte Technology Device 1c3a
    Flags: bus master, fast devsel, latency 0, IRQ 10
    Memory at fbfff000 (64-bit, non-prefetchable) [size=16]
    Capabilities: [50] Power Management version 3
    Capabilities: [8c] MSI: Enable- Count=1/1 Maskable- 64bit+

00:1a.0 USB Controller: Intel Corporation Cougar Point USB Enhanced Host Controller #2 (rev 05) (prog-if 20 [EHCI])
    Subsystem: Giga-byte Technology Device 5006
    Flags: bus master, medium devsel, latency 0, IRQ 18
    Memory at fbffe000 (32-bit, non-prefetchable) [size=1K]
    Capabilities: [50] Power Management version 2
    Capabilities: [58] Debug port: BAR=1 offset=00a0
    Capabilities: [98] PCI Advanced Features
    Kernel driver in use: ehci_hcd

00:1c.0 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 1 (rev b5) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
    Memory behind bridge: fb800000-fbbfffff
    Prefetchable memory behind bridge: 00000000dc000000-00000000dc0fffff
    Capabilities: [40] Express Root Port (Slot+), MSI 00
    Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
    Capabilities: [90] Subsystem: Giga-byte Technology Device 5001
    Capabilities: [a0] Power Management version 2
    Kernel driver in use: pcieport
    Kernel modules: shpchp

00:1c.5 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 6 (rev b5) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
    I/O behind bridge: 0000d000-0000dfff
    Prefetchable memory behind bridge: 00000000fbd00000-00000000fbdfffff
    Capabilities: [40] Express Root Port (Slot+), MSI 00
    Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
    Capabilities: [90] Subsystem: Giga-byte Technology Device 5001
    Capabilities: [a0] Power Management version 2
    Kernel driver in use: pcieport
    Kernel modules: shpchp

00:1c.6 PCI bridge: Intel Corporation 82801 PCI Bridge (rev b5) (prog-if 01 [Subtractive decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=00, secondary=03, subordinate=04, sec-latency=0
    I/O behind bridge: 0000e000-0000efff
    Memory behind bridge: fbc00000-fbcfffff
    Prefetchable memory behind bridge: 00000000dc100000-00000000dc1fffff
    Capabilities: [40] Express Root Port (Slot+), MSI 00
    Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
    Capabilities: [90] Subsystem: Giga-byte Technology Device 5001
    Capabilities: [a0] Power Management version 2

00:1c.7 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 8 (rev b5) (prog-if 00 [Normal decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=00, secondary=05, subordinate=05, sec-latency=0
    Memory behind bridge: fbe00000-fbefffff
    Capabilities: [40] Express Root Port (Slot+), MSI 00
    Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
    Capabilities: [90] Subsystem: Giga-byte Technology Device 5001
    Capabilities: [a0] Power Management version 2
    Kernel driver in use: pcieport
    Kernel modules: shpchp

00:1d.0 USB Controller: Intel Corporation Cougar Point USB Enhanced Host Controller #1 (rev 05) (prog-if 20 [EHCI])
    Subsystem: Giga-byte Technology Device 5006
    Flags: bus master, medium devsel, latency 0, IRQ 23
    Memory at fbffd000 (32-bit, non-prefetchable) [size=1K]
    Capabilities: [50] Power Management version 2
    Capabilities: [58] Debug port: BAR=1 offset=00a0
    Capabilities: [98] PCI Advanced Features
    Kernel driver in use: ehci_hcd

00:1f.0 ISA bridge: Intel Corporation Cougar Point LPC Controller (rev 05)
    Subsystem: Giga-byte Technology Device 5001
    Flags: bus master, medium devsel, latency 0
    Capabilities: [e0] Vendor Specific Information <?>
    Kernel modules: iTCO_wdt

00:1f.2 IDE interface: Intel Corporation Cougar Point 4 port SATA IDE Controller (rev 05) (prog-if 8f [Master SecP SecO PriP PriO])
    Subsystem: Giga-byte Technology Device b002
    Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 19
    I/O ports at fe00 [size=8]
    I/O ports at fd00 [size=4]
    I/O ports at fc00 [size=8]
    I/O ports at fb00 [size=4]
    I/O ports at fa00 [size=16]
    I/O ports at f900 [size=16]
    Capabilities: [70] Power Management version 3
    Capabilities: [b0] PCI Advanced Features
    Kernel driver in use: ata_piix
    Kernel modules: ata_generic, pata_acpi, ata_piix

00:1f.3 SMBus: Intel Corporation Cougar Point SMBus Controller (rev 05)
    Subsystem: Giga-byte Technology Device 5001
    Flags: medium devsel, IRQ 18
    Memory at fbffc000 (64-bit, non-prefetchable) [size=256]
    I/O ports at 0500 [size=32]
    Kernel driver in use: i801_smbus
    Kernel modules: i2c-i801

00:1f.5 IDE interface: Intel Corporation Cougar Point 2 port SATA IDE Controller (rev 05) (prog-if 85 [Master SecO PriO])
    Subsystem: Giga-byte Technology Device b002
    Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 19
    I/O ports at f700 [size=8]
    I/O ports at f600 [size=4]
    I/O ports at f500 [size=8]
    I/O ports at f400 [size=4]
    I/O ports at f300 [size=16]
    I/O ports at f200 [size=16]
    Capabilities: [70] Power Management version 3
    Capabilities: [b0] PCI Advanced Features
    Kernel driver in use: ata_piix
    Kernel modules: ata_generic, pata_acpi, ata_piix

01:00.0 RAID bus controller: Adaptec AAC-RAID (rev 09)
    Subsystem: Adaptec ASR5805
    Flags: bus master, fast devsel, latency 0, IRQ 16
    Memory at fb800000 (64-bit, non-prefetchable) [size=2M]
    [virtual] Expansion ROM at dc000000 [disabled] [size=512K]
    Capabilities: [98] Power Management version 2
    Capabilities: [a0] MSI: Enable- Count=1/2 Maskable- 64bit+
    Capabilities: [d0] Express Endpoint, MSI 00
    Capabilities: [90] Vital Product Data
    Capabilities: [100] Advanced Error Reporting
    Kernel driver in use: aacraid
    Kernel modules: aacraid

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)
    Subsystem: Giga-byte Technology GA-EP45-DS5 Motherboard
    Flags: bus master, fast devsel, latency 0, IRQ 32
    I/O ports at de00 [size=256]
    Memory at fbdff000 (64-bit, prefetchable) [size=4K]
    Memory at fbdf8000 (64-bit, prefetchable) [size=16K]
    Capabilities: [40] Power Management version 3
    Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Capabilities: [70] Express Endpoint, MSI 01
    Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
    Capabilities: [d0] Vital Product Data
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [140] Virtual Channel <?>
    Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00
    Kernel driver in use: r8169
    Kernel modules: r8169

03:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892 (rev 30) (prog-if 01 [Subtractive decode])
    Flags: bus master, fast devsel, latency 0
    Bus: primary=03, secondary=04, subordinate=04, sec-latency=32
    I/O behind bridge: 0000e000-0000efff
    Memory behind bridge: fbc00000-fbcfffff
    Prefetchable memory behind bridge: 00000000dc100000-00000000dc1fffff
    Capabilities: [90] Power Management version 2
    Capabilities: [a0] Subsystem: Giga-byte Technology Device 5000

04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
    Subsystem: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet
    Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 18
    I/O ports at ee00 [size=256]
    Memory at fbcff000 (32-bit, non-prefetchable) [size=256]
    [virtual] Expansion ROM at dc100000 [disabled] [size=64K]
    Capabilities: [dc] Power Management version 2
    Kernel driver in use: r8169
    Kernel modules: r8169

05:00.0 USB Controller: Device 1b6f:7023 (rev 01) (prog-if 30)
    Subsystem: Device 1b6f:7023
    Flags: bus master, fast devsel, latency 0, IRQ 11
    Memory at fbef8000 (64-bit, non-prefetchable) [size=32K]
    Capabilities: [50] Power Management version 3
    Capabilities: [70] MSI: Enable- Count=1/4 Maskable+ 64bit+
    Capabilities: [a0] Express Endpoint, MSI 00
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [190] Device Serial Number 01-01-01-01-01-01-01-01

lspci -vv

01:00.0 RAID bus controller: Adaptec AAC-RAID (rev 09)
Subsystem: Adaptec ASR5805
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 4 bytes
Interrupt: pin A routed to IRQ 16
Region 0: Memory at fb800000 (64-bit, non-prefetchable) [size=2M]
[virtual] Expansion ROM at dc000000 [disabled] [size=512K]
Capabilities: [98] Power Management version 2
    Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
    Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [a0] MSI: Enable- Count=1/2 Maskable- 64bit+
    Address: 0000000000000000  Data: 0000
Capabilities: [d0] Express (v1) Endpoint, MSI 00
    DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 <1us
        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
    DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
        MaxPayload 128 bytes, MaxReadReq 512 bytes
    DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
    LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM L0s, Latency L0 <128ns, L1 unlimited
        ClockPM- Surprise- LLActRep- BwNot-
    LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
    LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities: [90] Vital Product Data
    Unknown small resource type 00, will not decode more.
Capabilities: [100] Advanced Error Reporting
    UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
    UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
    UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
    CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
    CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
    AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Kernel driver in use: aacraid
Kernel modules: aacraid
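
To put the slot question from point 1] in numbers, here is a minimal Python sketch that pulls the advertised (LnkCap) and negotiated (LnkSta) link parameters out of `lspci -vv` text. The function name `link_info` and the per-lane table are my own illustration, not part of any tool. The table assumes 8b/10b encoding, which applies to 2.5GT/s and 5GT/s links only, and gives a theoretical per-direction figure in megabytes per second before protocol overhead.

```python
import re

# Theoretical payload rate per lane in MB/s for 8b/10b-encoded links
# (assumption: only PCIe 1.x and 2.x speeds are covered here).
MB_PER_S_PER_LANE = {"2.5": 250, "5": 500}

def link_info(lspci_vv_text):
    """Compare advertised and negotiated PCIe link width for one device."""
    pattern = r"{}:.*?Speed ([\d.]+)GT/s, Width x(\d+)"
    cap = re.search(pattern.format("LnkCap"), lspci_vv_text)
    sta = re.search(pattern.format("LnkSta"), lspci_vv_text)
    if cap is None or sta is None:
        return None  # no PCIe link lines found in the text
    sta_speed, sta_width = sta.group(1), int(sta.group(2))
    cap_width = int(cap.group(2))
    per_lane = MB_PER_S_PER_LANE.get(sta_speed)
    return {
        "cap_width": cap_width,
        "sta_width": sta_width,
        "downgraded": sta_width < cap_width,
        # None when the speed is outside the 8b/10b table above
        "sta_mb_per_s": per_lane * sta_width if per_lane is not None else None,
    }
```

For the listing above this reports an x8-capable card running at x4 and 2.5GT/s, which is a theoretical 1000 MB/s per direction, so the narrower slot by itself would not be expected to cap the card at the rates seen after the IRQ is disabled.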

cat /proc/interrupts

       CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
0:        128          0          0          0          0          0          0          0   IO-APIC-edge      timer
1:        105          0        606       4366          0          0          0          0   IO-APIC-edge      i8042
8:          1          0          0          0          0          0          0          0   IO-APIC-edge      rtc0
9:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   acpi
16:       1381          0     197881        730          0          0          0          9   IO-APIC-fasteoi   aacraid
18:       1695          0          0          0      13372   60347990          0          0   IO-APIC-fasteoi   ehci_hcd:usb1, eth1
19:       4637          0      14949    6352494          0          0          0     106473   IO-APIC-fasteoi   ata_piix, ata_piix
23:         33          0         27         12          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb2
24:        291          0          0          0          0          0          0          0  HPET_MSI-edge      hpet2
25:          0          0          0          0          0          0          0          0  HPET_MSI-edge      hpet3
26:          0          0          0          0          0          0          0          0  HPET_MSI-edge      hpet4
27:          0          0          0          0          0          0          0          0  HPET_MSI-edge      hpet5
28:          0          0          0          0          0          0          0          0  HPET_MSI-edge      hpet6
32:       1275          0          0          0          0       1905   21317086          0   PCI-MSI-edge      eth0
NMI:       1873      10150       1974       1672        702       3046       1825        780   Non-maskable interrupts
LOC:   17501877   13611350   13868117    3612581    1520650    1850972    8633075    1486682   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0          0          0          0          0   Performance monitoring interrupts
PND:          0          0          0          0          0          0          0          0   Performance pending work
RES:       5238      34250      12858       4299       1555       4833       5663       2485   Rescheduling interrupts
CAL:        334        302        429        414        421        464        465        468   Function call interrupts
TLB:       7863     154723      12147      11152      14099      33766      42580      11065   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:        293        293        293        293        293        293        293        293   Machine check polls
ERR:          7
MIS:          0
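
As a cross-check on whether any IRQ line is shared, the table can be parsed mechanically. This is a rough Python sketch; `parse_interrupts` and `shared_irqs` are hypothetical helper names, and the parsing assumes the column layout shown above (one count per CPU, then the interrupt type, then a comma-separated handler list), which may differ on other kernel versions.

```python
def parse_interrupts(text):
    """Return {irq_number: (total_count, [handler names])} for numbered IRQs."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip summary rows such as NMI, LOC, ERR
        fields = rest.split()
        counts = [int(x) for x in fields[:ncpu]]
        # fields[ncpu] is the type column (e.g. IO-APIC-fasteoi);
        # everything after it is the handler list.
        names = " ".join(fields[ncpu + 1:]).split(",")
        handlers = [n.strip() for n in names if n.strip()]
        result[int(label)] = (sum(counts), handlers)
    return result

def shared_irqs(parsed):
    """Return only the IRQ lines that have more than one registered handler."""
    return {irq: h for irq, (_, h) in parsed.items() if len(h) > 1}
```

Applied to the output above, `shared_irqs` returns IRQ 18 (ehci_hcd:usb1 and eth1) and IRQ 19 (the two ata_piix controllers), while IRQ 16 has aacraid as its only handler.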

The module in use is the kmod-aacraid kernel module for CentOS 6 from elrepo:

Installed Packages
Name       : kmod-aacraid
Arch       : x86_64
Version    : 1.1.7
Release    : 1.el6.elrepo
Size       : 340 k
Repo       : installed
From repo  : elrepo
Summary    : aacraid kernel module(s)
URL        : http://www.adaptec.com/
License    : GPLv2
Description: This package provides the aacraid kernel module(s) built
       : for the Linux kernel using the x86_64 family of processors.

And this is the error in the log:

Dec 15 14:02:33 kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
Dec 15 14:02:33 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-71.29.1.el6.x86_64 #1
Dec 15 14:02:33 kernel: Call Trace:
Dec 15 14:02:33 kernel: <IRQ>  [<ffffffff810da96b>] __report_bad_irq+0x2b/0xa0
Dec 15 14:02:33 kernel: [<ffffffff810dab6c>] note_interrupt+0x18c/0x1d0
Dec 15 14:02:33 kernel: [<ffffffff810db255>] handle_fasteoi_irq+0xc5/0xf0
Dec 15 14:02:33 kernel: [<ffffffff81015fb9>] handle_irq+0x49/0xa0
Dec 15 14:02:33 kernel: [<ffffffff814d093c>] do_IRQ+0x6c/0xf0
Dec 15 14:02:33 kernel: [<ffffffff81013ad3>] ret_from_intr+0x0/0x11
Dec 15 14:02:33 kernel: <EOI>  [<ffffffff812da962>] ? acpi_idle_enter_c1+0xa3/0xc1
Dec 15 14:02:33 kernel: [<ffffffff812da941>] ? acpi_idle_enter_c1+0x82/0xc1
Dec 15 14:02:33 kernel: [<ffffffff813df687>] cpuidle_idle_call+0xa7/0x140
Dec 15 14:02:33 kernel: [<ffffffff81011e96>] cpu_idle+0xb6/0x110
Dec 15 14:02:33 kernel: [<ffffffff814c27d8>] start_secondary+0x1fc/0x23f
Dec 15 14:02:33 kernel: handlers:
Dec 15 14:02:33 kernel: [<ffffffffa002a590>] (aac_rx_intr_message+0x0/0xc0 [aacraid])
Dec 15 14:02:33 kernel: Disabling IRQ #16
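
Since the error shows up some time after boot, it can help to pull these events out of the log automatically. The following Python sketch is only an illustration: `summarize_bad_irq` is a hypothetical name, and the regular expressions assume the message format shown in the excerpt above.

```python
import re

NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
# Matches handler lines such as "(aac_rx_intr_message+0x0/0xc0 [aacraid])"
HANDLER = re.compile(r"\((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]\)")
DISABLED = re.compile(r"Disabling IRQ #(\d+)")

def summarize_bad_irq(log_text):
    """Collect each 'nobody cared' event with its handlers and disable status."""
    events = []
    current = None
    for line in log_text.splitlines():
        m = NOBODY_CARED.search(line)
        if m:
            current = {"irq": int(m.group(1)), "handlers": [], "disabled": False}
            events.append(current)
            continue
        if current is None:
            continue  # ignore lines outside a 'nobody cared' report
        m = HANDLER.search(line)
        if m:
            current["handlers"].append((m.group(1), m.group(2)))
        m = DISABLED.search(line)
        if m and int(m.group(1)) == current["irq"]:
            current["disabled"] = True
            current = None  # report is complete
    return events
```

For the excerpt above this yields one event: IRQ 16, a single handler `aac_rx_intr_message` from the `aacraid` module, and the line disabled afterwards.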

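I don't see any conflict on IRQ 16, and the suggested irqpoll option changes nothing. I don't need USB, so I could disable it, but this is a production system, so I'd like to know where the problem lies before I start fiddling with the BIOS or anything else (I also need to keep downtime to a minimum).

Can anyone help me diagnose the problem here?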
