IRQ problems with 2.6.32/2.6.39 kernels on Debian Squeeze x86_64

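I recently built a new computer, so all the hardware is quite new. Since then I've been having IRQ problems running Debian 6.0. Every so often, usually after about an hour of uptime, I hear a beep and the following message appears in dmesg: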

[ 3537.762795] irq 16: nobody cared (try booting with the "irqpoll" option)
[ 3537.762797] Pid: 0, comm: swapper Tainted: P        W  O 2.6.39-2-amd64 #1
[ 3537.762798] Call Trace:
[ 3537.762799]  <IRQ>  [<ffffffff810924d4>] ? __report_bad_irq+0x3a/0xa2
[ 3537.762803]  [<ffffffff810926a4>] ? note_interrupt+0x168/0x1da
[ 3537.762805]  [<ffffffff81090dd4>] ? handle_irq_event_percpu+0x171/0x18f
[ 3537.762807]  [<ffffffff8100e0e2>] ? read_tsc+0x5/0x16
[ 3537.762809]  [<ffffffff8106b8a2>] ? update_ts_time_stats+0x32/0x6b
[ 3537.762810]  [<ffffffff81090e26>] ? handle_irq_event+0x34/0x52
[ 3537.762812]  [<ffffffff81063fb7>] ? sched_clock_idle_wakeup_event+0x12/0x1c
[ 3537.762813]  [<ffffffff81092df2>] ? handle_fasteoi_irq+0x82/0xa4
[ 3537.762815]  [<ffffffff8100aadb>] ? handle_irq+0x1a/0x23
[ 3537.762816]  [<ffffffff8100a384>] ? do_IRQ+0x45/0xaa
[ 3537.762818]  [<ffffffff81332c93>] ? common_interrupt+0x13/0x13
[ 3537.762818]  <EOI>  [<ffffffff81332c8e>] ? common_interrupt+0xe/0x13
[ 3537.762821]  [<ffffffff81026800>] ? native_safe_halt+0x2/0x3
[ 3537.762829]  [<ffffffffa016ed58>] ? acpi_idle_do_entry+0x39/0x62 [processor]
[ 3537.762831]  [<ffffffffa016edde>] ? acpi_idle_enter_c1+0x5d/0xad [processor]
[ 3537.762834]  [<ffffffff81261033>] ? cpuidle_idle_call+0x11f/0x1cc
[ 3537.762835]  [<ffffffff81008dd2>] ? cpu_idle+0xab/0xe1
[ 3537.762837]  [<ffffffff8169fc60>] ? start_kernel+0x3e0/0x3eb
[ 3537.762838]  [<ffffffff8169f3c8>] ? x86_64_start_kernel+0x102/0x10f
[ 3537.762839] handlers:
[ 3537.762840] [<ffffffffa0358d5a>] (rtl8169_interrupt+0x0/0x2d7 [r8169])
[ 3537.762842] [<ffffffffa08ff2ca>] (nv_kern_isr+0x0/0x54 [nvidia])
[ 3537.762902] Disabling IRQ #16
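The "handlers:" part of the report lists every driver registered on the line the kernel disabled. Here those are r8169 (the Realtek NIC) and nvidia. A minimal sketch, assuming the report format shown above where each handler is printed as `(function+offset/size [module])`, for extracting the IRQ number and the handler modules from a dmesg excerpt. `REPORT` and `parse_nobody_cared` are hypothetical names for illustration, not part of any kernel tool, and handlers built into the kernel (without a `[module]` suffix) would not be matched.

```python
import re

# Excerpt of the report quoted above, with the call trace trimmed out.
REPORT = """\
[ 3537.762795] irq 16: nobody cared (try booting with the "irqpoll" option)
[ 3537.762839] handlers:
[ 3537.762840] [<ffffffffa0358d5a>] (rtl8169_interrupt+0x0/0x2d7 [r8169])
[ 3537.762842] [<ffffffffa08ff2ca>] (nv_kern_isr+0x0/0x54 [nvidia])
[ 3537.762902] Disabling IRQ #16
"""

# "irq N: nobody cared" names the interrupt line the kernel gave up on.
IRQ_RE = re.compile(r"irq (\d+): nobody cared")
# Each handler line looks like "(function+offset/size [module])".
HANDLER_RE = re.compile(r"\((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]\)")


def parse_nobody_cared(log):
    """Return (irq, [(handler, module), ...]), or None if no report is found."""
    irq_match = IRQ_RE.search(log)
    if irq_match is None:
        return None
    # Only look at text after "handlers:" so call-trace entries are ignored.
    _, _, handler_part = log[irq_match.end():].partition("handlers:")
    return int(irq_match.group(1)), HANDLER_RE.findall(handler_part)


irq, handlers = parse_nobody_cared(REPORT)
print(f"IRQ {irq} was disabled; handlers registered on it:")
for func, module in handlers:
    print(f"  {func} (module {module})")
```

For the excerpt above this reports IRQ 16 with `rtl8169_interrupt` from `r8169` and `nv_kern_isr` from `nvidia`, i.e. the NIC and the graphics driver share the line.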

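After that, Xorg either uses a lot of CPU or becomes unstable (sometimes hanging the whole system). When I restart Xorg, everything is back to normal, and the problem does not reappear until the next reboot.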

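I tried upgrading the kernel from the stock 2.6.32 to 2.6.39 from the unstable repository, but it didn't help. Booting with the irqpoll option only seems to lengthen the time before the problem first occurs.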

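I'm using the latest NVIDIA driver and the Realtek firmware from the firmware-realtek package. I have two GTX 560 Ti cards running in SLI. Disabling SLI or removing one card entirely doesn't fix the problem.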

The output of uname -a is:

Linux whitestar 2.6.39-2-amd64 #1 SMP Wed Jun 8 11:01:04 UTC 2011 x86_64 GNU/Linux

The output of lspci is:

00:00.0 Host bridge: Intel Corporation Sandy Bridge DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Sandy Bridge PCI Express Root Port (rev 09)
00:01.1 PCI bridge: Intel Corporation Sandy Bridge PCI Express Root Port (rev 09)
00:16.0 Communication controller: Intel Corporation Cougar Point HECI Controller #1 (rev 04)
00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network Connection (rev 05)
00:1a.0 USB Controller: Intel Corporation Cougar Point USB Enhanced Host Controller #2 (rev 05)
00:1b.0 Audio device: Intel Corporation Cougar Point High Definition Audio Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 1 (rev b5)
00:1c.1 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 2 (rev b5)
00:1c.2 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 3 (rev b5)
00:1c.4 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 5 (rev b5)
00:1c.6 PCI bridge: Intel Corporation 82801 PCI Bridge (rev b5)
00:1d.0 USB Controller: Intel Corporation Cougar Point USB Enhanced Host Controller #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation Cougar Point LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation Cougar Point 6 port SATA AHCI Controller (rev 05)
00:1f.3 SMBus: Intel Corporation Cougar Point SMBus Controller (rev 05)
01:00.0 VGA compatible controller: nVidia Corporation Device 1200 (rev a1)
01:00.1 Audio device: nVidia Corporation Device 0e0c (rev a1)
02:00.0 VGA compatible controller: nVidia Corporation Device 1200 (rev a1)
02:00.1 Audio device: nVidia Corporation Device 0e0c (rev a1)
04:00.0 USB Controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04)
06:00.0 USB Controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04)
07:00.0 PCI bridge: Device 1b21:1080 (rev 01)
08:02.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8110SC/8169SC Gigabit Ethernet (rev 10)
08:03.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller (rev c0)

Contents of /proc/interrupts:

CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
0:         77          0          0          0          0          0          0          0   IO-APIC-edge      timer
1:          2          0          0          0          0          0          0          0   IO-APIC-edge      i8042
8:          1          0          0          0          0          0          0          0   IO-APIC-edge      rtc0
9:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   acpi
12:          4          0          0          0          0          0          0          0   IO-APIC-edge      i8042
16:     699083          0          0          0          0          0          0          0   IO-APIC-fasteoi   nvidia, eth0
17:      87810          0          0          0          0          0          0          0   IO-APIC-fasteoi   firewire_ohci, hda_intel, nvidia
18:        242          0          0          0          0          0          0          0   IO-APIC-fasteoi   hda_intel
23:      85925          0          0          0          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb5, ehci_hcd:usb6
40:          0          0          0          0          0          0          0          0   PCI-MSI-edge      PCIe PME
41:          0          0          0          0          0          0          0          0   PCI-MSI-edge      PCIe PME
42:          0          0          0          0          0          0          0          0   PCI-MSI-edge      PCIe PME
43:          0          0          0          0          0          0          0          0   PCI-MSI-edge      PCIe PME
44:          0          0          0          0          0          0          0          0   PCI-MSI-edge      PCIe PME
45:          0          0          0          0          0          0          0          0   PCI-MSI-edge      PCIe PME
46:      79853          0          0          0          0          0          0          0   PCI-MSI-edge      ahci
48:          1          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
49:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
50:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
51:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
52:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
53:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
54:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
55:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
56:          1          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
57:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
58:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
59:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
60:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
61:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
62:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
63:          0          0          0          0          0          0          0          0   PCI-MSI-edge      xhci_hcd
64:     173506          0          0          0          0          0          0          0   PCI-MSI-edge      hda_intel
NMI:        482         89         25         13        277         24         11         10   Non-maskable interrupts
LOC:     783857     194752     114133      70577     372438     179065     117179     162016   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:        482         89         25         13        277         24         11         10   Performance monitoring interrupts
IWI:          0          0          0          0          0          0          0          0   IRQ work interrupts
RES:     131917      46750       7432       3291     150003       9576       3435       3067   Rescheduling interrupts
CAL:       2759       6563       7150       6997       5387       7140       7269       6678   Function call interrupts
TLB:       4396       2038       1336        492       5434       1896       1121        606   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:         37         37         37         37         37         37         37         37   Machine check polls
ERR:          0
MIS:          0
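In this listing IRQ 16 is shared by nvidia and eth0, the same two handlers named in the "nobody cared" report, and IRQ 17 is shared by firewire_ohci, hda_intel and nvidia. A rough sketch for spotting such shared legacy lines automatically, assuming the 2.6.3x layout shown above (an "N:" label, one counter per CPU, a single-token chip name such as IO-APIC-fasteoi, then comma-separated device names). `SAMPLE` and `shared_irqs` are hypothetical names; later kernels may format the chip column differently, in which case the parser would need adjusting.

```python
# A subset of the rows from the /proc/interrupts output above.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:         77          0          0          0          0          0          0          0   IO-APIC-edge      timer
 16:     699083          0          0          0          0          0          0          0   IO-APIC-fasteoi   nvidia, eth0
 17:      87810          0          0          0          0          0          0          0   IO-APIC-fasteoi   firewire_ohci, hda_intel, nvidia
 18:        242          0          0          0          0          0          0          0   IO-APIC-fasteoi   hda_intel
 23:      85925          0          0          0          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb5, ehci_hcd:usb6
 46:      79853          0          0          0          0          0          0          0   PCI-MSI-edge      ahci
NMI:        482         89         25         13        277         24         11         10   Non-maskable interrupts
ERR:          0
"""


def shared_irqs(text):
    """Map IRQ number -> (total count, [device names]) for lines with >1 device."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 ... CPUn
    shared = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")  # split only at the first colon
        label = label.strip()
        if not label.isdigit():
            continue  # skip NMI, LOC, ERR and other summary rows
        fields = rest.split()
        if len(fields) <= ncpu + 1:
            continue  # counters and chip name only, no device names
        counts = [int(f) for f in fields[:ncpu]]
        # Re-join the device column and split on commas ("nvidia, eth0").
        devices = [d.strip() for d in " ".join(fields[ncpu + 1:]).split(",")]
        devices = [d for d in devices if d]
        if len(devices) > 1:
            shared[int(label)] = (sum(counts), devices)
    return shared


for irq, (total, devices) in shared_irqs(SAMPLE).items():
    print(f"IRQ {irq}: {total} interrupts, shared by {', '.join(devices)}")
```

On a live system the same function can be fed `open("/proc/interrupts").read()`. Devices using MSI (such as ahci and the second hda_intel entry) get their own vectors and therefore never show up as shared here.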

Last but not least, the following lines usually show up in dmesg after booting:

[   18.367094] hda-intel: IRQ timing workaround is activated for card #1. Suggest a bigger bdl_pos_adj.
[   18.458859] hda-intel: IRQ timing workaround is activated for card #2. Suggest a bigger bdl_pos_adj.

I'm not sure whether this is related or a symptom of a bigger problem, so I'm posting it just in case.

I really don't know what other information might be relevant here. If in doubt, feel free to ask in the comments.

Answer 1

It looks like I've finally found a fix for this problem.

You need to add the pci=routeirq boot option to the kernel command line. According to the documentation, it does the following:

Do IRQ routing for all PCI devices. This is normally done in pci_enable_device(), so this option is a temporary workaround for broken drivers that don't call it.

So it looks like the NVIDIA Xorg driver is the culprit. I should probably file a bug report.
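To check that an option such as pci=routeirq actually reached the running kernel, you can inspect /proc/cmdline. A simplified sketch (it does not handle quoted values containing spaces); `boot_options`, `uses_pci_routeirq` and the example command line are hypothetical. Assuming the usual Debian Squeeze setup with GRUB 2, the option is normally made persistent by appending it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and running update-grub.

```python
def boot_options(cmdline):
    """Split a kernel command line into bare flags and key=value options.

    Values are split on commas, since pci= accepts several options at once
    (e.g. pci=nomsi,routeirq). Quoted values with spaces are not handled.
    """
    flags, options = set(), {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")  # split at the first "=" only
        if sep:
            options.setdefault(key, []).extend(value.split(","))
        else:
            flags.add(key)
    return flags, options


def uses_pci_routeirq(cmdline):
    """True if routeirq appears among the pci= options."""
    _, options = boot_options(cmdline)
    return "routeirq" in options.get("pci", [])


# Hypothetical command line; on a real system use open("/proc/cmdline").read().
EXAMPLE = "BOOT_IMAGE=/boot/vmlinuz-2.6.39-2-amd64 root=/dev/sda1 ro quiet pci=routeirq"

flags, options = boot_options(EXAMPLE)
print("pci=routeirq active:", uses_pci_routeirq(EXAMPLE))
print("irqpoll active:", "irqpoll" in flags)
print("acpi=off active:", "off" in options.get("acpi", []))
```

The same helpers work for the irqpoll, noapic and acpi=off parameters mentioned in the other answers.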

Answer 2

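Just a hunch... go into your BIOS and disable anything related to a "SERR" feature for the graphics card. If possible, you could also try updating to a newer kernel.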

Answer 3

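I have exactly the same problem running Debian 6.0. I've tried many kernels (2.6.32, 2.6.38, 2.6.39) and many kernel parameters: "irqpoll" and "noapic" made no difference, but "acpi=off" sometimes kept the system usable for a few days. So you could try running with "acpi=off".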

My motherboard is an ASUS P8H67-M EVO. Are you also using an ASUS board with a Sandy Bridge chipset? If so, try updating the BIOS as well; that may fix the problem.

Answer 4

Same problem here, running Sandy Bridge...

ASUS P67 Sabertooth, i7 2600K @ 3.4 GHz, nVidia EVGA GTX 570, Debian Squeeze 2.6.39-bpo.2-amd64

I get the kernel message disabling IRQ 17, the line shared by firewire_ohci and hda_intel.
