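I recently built a new computer, so all of the hardware is very new. Since then I have been running into IRQ problems under Debian 6.0. Sometimes, usually after about an hour of uptime, I hear a beep and the following shows up in dmesg: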
[ 3537.762795] irq 16: nobody cared (try booting with the "irqpoll" option)
[ 3537.762797] Pid: 0, comm: swapper Tainted: P W O 2.6.39-2-amd64 #1
[ 3537.762798] Call Trace:
[ 3537.762799] <IRQ> [<ffffffff810924d4>] ? __report_bad_irq+0x3a/0xa2
[ 3537.762803] [<ffffffff810926a4>] ? note_interrupt+0x168/0x1da
[ 3537.762805] [<ffffffff81090dd4>] ? handle_irq_event_percpu+0x171/0x18f
[ 3537.762807] [<ffffffff8100e0e2>] ? read_tsc+0x5/0x16
[ 3537.762809] [<ffffffff8106b8a2>] ? update_ts_time_stats+0x32/0x6b
[ 3537.762810] [<ffffffff81090e26>] ? handle_irq_event+0x34/0x52
[ 3537.762812] [<ffffffff81063fb7>] ? sched_clock_idle_wakeup_event+0x12/0x1c
[ 3537.762813] [<ffffffff81092df2>] ? handle_fasteoi_irq+0x82/0xa4
[ 3537.762815] [<ffffffff8100aadb>] ? handle_irq+0x1a/0x23
[ 3537.762816] [<ffffffff8100a384>] ? do_IRQ+0x45/0xaa
[ 3537.762818] [<ffffffff81332c93>] ? common_interrupt+0x13/0x13
[ 3537.762818] <EOI> [<ffffffff81332c8e>] ? common_interrupt+0xe/0x13
[ 3537.762821] [<ffffffff81026800>] ? native_safe_halt+0x2/0x3
[ 3537.762829] [<ffffffffa016ed58>] ? acpi_idle_do_entry+0x39/0x62 [processor]
[ 3537.762831] [<ffffffffa016edde>] ? acpi_idle_enter_c1+0x5d/0xad [processor]
[ 3537.762834] [<ffffffff81261033>] ? cpuidle_idle_call+0x11f/0x1cc
[ 3537.762835] [<ffffffff81008dd2>] ? cpu_idle+0xab/0xe1
[ 3537.762837] [<ffffffff8169fc60>] ? start_kernel+0x3e0/0x3eb
[ 3537.762838] [<ffffffff8169f3c8>] ? x86_64_start_kernel+0x102/0x10f
[ 3537.762839] handlers:
[ 3537.762840] [<ffffffffa0358d5a>] (rtl8169_interrupt+0x0/0x2d7 [r8169])
[ 3537.762842] [<ffffffffa08ff2ca>] (nv_kern_isr+0x0/0x54 [nvidia])
[ 3537.762902] Disabling IRQ #16
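After that, Xorg either uses a lot of CPU or becomes unstable (sometimes it hangs the system completely). Restarting Xorg brings everything back to normal until the problem occurs again.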
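I tried upgrading the kernel from the stock 2.6.32 to 2.6.39 from the unstable repository, but it did not help. Booting with the irqpoll option only seems to lengthen the initial period before the problem appears.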
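I am using the latest NVIDIA driver and the Realtek firmware from the firmware-realtek package. I have two GTX 560 Ti cards running in SLI. Neither disabling SLI nor removing one of the cards fixes the problem.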
Output of uname -a:
Linux whitestar 2.6.39-2-amd64 #1 SMP Wed Jun 8 11:01:04 UTC 2011 x86_64 GNU/Linux
Output of lspci:
00:00.0 Host bridge: Intel Corporation Sandy Bridge DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Sandy Bridge PCI Express Root Port (rev 09)
00:01.1 PCI bridge: Intel Corporation Sandy Bridge PCI Express Root Port (rev 09)
00:16.0 Communication controller: Intel Corporation Cougar Point HECI Controller #1 (rev 04)
00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network Connection (rev 05)
00:1a.0 USB Controller: Intel Corporation Cougar Point USB Enhanced Host Controller #2 (rev 05)
00:1b.0 Audio device: Intel Corporation Cougar Point High Definition Audio Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 1 (rev b5)
00:1c.1 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 2 (rev b5)
00:1c.2 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 3 (rev b5)
00:1c.4 PCI bridge: Intel Corporation Cougar Point PCI Express Root Port 5 (rev b5)
00:1c.6 PCI bridge: Intel Corporation 82801 PCI Bridge (rev b5)
00:1d.0 USB Controller: Intel Corporation Cougar Point USB Enhanced Host Controller #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation Cougar Point LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation Cougar Point 6 port SATA AHCI Controller (rev 05)
00:1f.3 SMBus: Intel Corporation Cougar Point SMBus Controller (rev 05)
01:00.0 VGA compatible controller: nVidia Corporation Device 1200 (rev a1)
01:00.1 Audio device: nVidia Corporation Device 0e0c (rev a1)
02:00.0 VGA compatible controller: nVidia Corporation Device 1200 (rev a1)
02:00.1 Audio device: nVidia Corporation Device 0e0c (rev a1)
04:00.0 USB Controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04)
06:00.0 USB Controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04)
07:00.0 PCI bridge: Device 1b21:1080 (rev 01)
08:02.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8110SC/8169SC Gigabit Ethernet (rev 10)
08:03.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller (rev c0)
Contents of /proc/interrupts:
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
0: 77 0 0 0 0 0 0 0 IO-APIC-edge timer
1: 2 0 0 0 0 0 0 0 IO-APIC-edge i8042
8: 1 0 0 0 0 0 0 0 IO-APIC-edge rtc0
9: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi acpi
12: 4 0 0 0 0 0 0 0 IO-APIC-edge i8042
16: 699083 0 0 0 0 0 0 0 IO-APIC-fasteoi nvidia, eth0
17: 87810 0 0 0 0 0 0 0 IO-APIC-fasteoi firewire_ohci, hda_intel, nvidia
18: 242 0 0 0 0 0 0 0 IO-APIC-fasteoi hda_intel
23: 85925 0 0 0 0 0 0 0 IO-APIC-fasteoi ehci_hcd:usb5, ehci_hcd:usb6
40: 0 0 0 0 0 0 0 0 PCI-MSI-edge PCIe PME
41: 0 0 0 0 0 0 0 0 PCI-MSI-edge PCIe PME
42: 0 0 0 0 0 0 0 0 PCI-MSI-edge PCIe PME
43: 0 0 0 0 0 0 0 0 PCI-MSI-edge PCIe PME
44: 0 0 0 0 0 0 0 0 PCI-MSI-edge PCIe PME
45: 0 0 0 0 0 0 0 0 PCI-MSI-edge PCIe PME
46: 79853 0 0 0 0 0 0 0 PCI-MSI-edge ahci
48: 1 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd
49: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd
50: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd
51: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd
52: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd
53: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd
54: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd
55: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd
56: 1 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd
57: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd
58: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd
59: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd
60: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd
61: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd
62: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd
63: 0 0 0 0 0 0 0 0 PCI-MSI-edge xhci_hcd
64: 173506 0 0 0 0 0 0 0 PCI-MSI-edge hda_intel
NMI: 482 89 25 13 277 24 11 10 Non-maskable interrupts
LOC: 783857 194752 114133 70577 372438 179065 117179 162016 Local timer interrupts
SPU: 0 0 0 0 0 0 0 0 Spurious interrupts
PMI: 482 89 25 13 277 24 11 10 Performance monitoring interrupts
IWI: 0 0 0 0 0 0 0 0 IRQ work interrupts
RES: 131917 46750 7432 3291 150003 9576 3435 3067 Rescheduling interrupts
CAL: 2759 6563 7150 6997 5387 7140 7269 6678 Function call interrupts
TLB: 4396 2038 1336 492 5434 1896 1121 606 TLB shootdowns
TRM: 0 0 0 0 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts
MCE: 0 0 0 0 0 0 0 0 Machine check exceptions
MCP: 37 37 37 37 37 37 37 37 Machine check polls
ERR: 0
MIS: 0
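The table above shows IRQ 16 shared between nvidia and eth0, and IRQ 17 shared between three handlers. As a rough sketch (not part of the original post), the shared lines can be picked out of /proc/interrupts with a few lines of Python. The function name shared_irqs and the abbreviated two-CPU SAMPLE are made up for illustration, and the parsing assumes the column layout shown above (label, one counter per CPU, controller type, then a comma-separated handler list); other kernel versions may lay the columns out differently.

```python
def shared_irqs(text):
    """Return {irq_number: [handler, ...]} for IRQ lines with more than one handler.

    Assumes the layout shown above: label, one counter per CPU,
    controller type, then a comma-separated list of handlers.
    """
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields:
            continue
        label = fields[0].rstrip(":")
        if not label.isdigit():
            continue  # NMI, LOC, ERR, ... are not device IRQ lines
        # Skip the label, the per-CPU counters, and the controller type.
        handlers = " ".join(fields[1 + ncpu + 1:]).split(",")
        handlers = [h.strip() for h in handlers if h.strip()]
        if len(handlers) > 1:
            shared[int(label)] = handlers
    return shared


# Hypothetical excerpt of the table above, cut down to two CPU columns.
SAMPLE = """\
CPU0 CPU1
16: 699083 0 IO-APIC-fasteoi nvidia, eth0
17: 87810 0 IO-APIC-fasteoi firewire_ohci, hda_intel, nvidia
18: 242 0 IO-APIC-fasteoi hda_intel
46: 79853 0 PCI-MSI-edge ahci
NMI: 482 89 Non-maskable interrupts
ERR: 0
"""

print(shared_irqs(SAMPLE))
```

On a live system the same function could be fed the text read from /proc/interrupts instead of SAMPLE.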
Last but not least, the following lines usually appear in dmesg after boot:
[ 18.367094] hda-intel: IRQ timing workaround is activated for card #1. Suggest a bigger bdl_pos_adj.
[ 18.458859] hda-intel: IRQ timing workaround is activated for card #2. Suggest a bigger bdl_pos_adj.
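I am not sure whether this is related or a symptom of a bigger problem, so I am posting it just in case.
I really do not know what other information is relevant here. If in doubt, feel free to ask in the comments.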
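Answer 1
It looks like I have finally found a solution to this problem.
The pci=routeirq boot option needs to be added to the kernel command line. According to the documentation, it does the following:
Do IRQ routing for all PCI devices. This is normally done in pci_enable_device(), so this option is a temporary workaround for broken drivers that don't call it.
It seems the NVIDIA Xorg driver is the culprit. I should probably file a bug report.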
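The answer does not say how the option was added. One common route, assuming the machine boots through GRUB 2 with its settings in /etc/default/grub (an assumption, not something stated above), is to append the option to the GRUB_CMDLINE_LINUX_DEFAULT line and then run update-grub. A minimal Python sketch of that text edit follows; the helper name add_kernel_param and the sample file contents are hypothetical.

```python
import re


def add_kernel_param(grub_text, param, key="GRUB_CMDLINE_LINUX_DEFAULT"):
    """Return grub_text with param appended to the quoted value of key.

    The text is returned unchanged if param is already present.
    Raises ValueError if no line of the form key="..." is found.
    """
    pattern = re.compile(
        r'^(%s=")([^"]*)(")[ \t]*$' % re.escape(key), re.MULTILINE
    )

    def repl(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    new_text, count = pattern.subn(repl, grub_text)
    if count == 0:
        raise ValueError("no %s line found" % key)
    return new_text


# Hypothetical contents of /etc/default/grub, for illustration only.
sample = (
    'GRUB_DEFAULT=0\n'
    'GRUB_CMDLINE_LINUX_DEFAULT="quiet"\n'
    'GRUB_CMDLINE_LINUX=""\n'
)
print(add_kernel_param(sample, "pci=routeirq"))
```

After a reboot, /proc/cmdline shows the parameters the running kernel was actually started with, which is a quick way to confirm the option took effect.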
Answer 2
Just a hunch... go into your BIOS and disable anything related to the graphics "SERR" feature. If possible, you could also try updating to a newer kernel.
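Answer 3
I had exactly the same problem running Debian 6.0. I tried many kernels (2.6.32, 2.6.38, 2.6.39) and many kernel parameters ("irqpoll" and "noapic" made no difference, but "acpi=off" sometimes kept the system usable for a few days). So you could try running with "acpi=off".
My motherboard is an ASUS P8H67-M EVO. Are you also using an ASUS board with a Sandy Bridge chipset? If so, try updating the BIOS as well; that may solve the problem.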
Answer 4
Same problem here, running Sandy Bridge ...
ASUS P-67 Sabertooth, i7 2600k @ 3.4, nVidia EVGA GTX 570, Debian Squeeze 2.6.39-bpo.2-amd64
I get the kernel error message about disabling IRQ 17, which is used by firewire_ohci and hda_intel.