Why are interrupts not spread across all cores?
Cpu0 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi,100.0%si, 0.0%st
Cpu1 : 25.2%us, 32.6%sy, 0.0%ni, 12.6%id, 26.2%wa, 0.0%hi, 3.3%si, 0.0%st
Cpu2 : 29.0%us, 15.0%sy, 0.0%ni, 29.3%id, 26.7%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 16.0%us, 21.7%sy, 0.0%ni, 34.3%id, 27.7%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu4 : 26.0%us, 14.3%sy, 0.0%ni, 33.7%id, 25.7%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu5 : 15.0%us, 15.0%sy, 0.0%ni, 44.2%id, 25.2%wa, 0.0%hi, 0.7%si, 0.0%st
Cpu6 : 13.0%us, 13.3%sy, 0.0%ni, 42.2%id, 31.2%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu7 : 9.7%us, 11.0%sy, 0.0%ni, 56.3%id, 23.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu8 : 13.0%us, 12.6%sy, 0.0%ni, 49.2%id, 25.2%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu9 : 9.6%us, 7.3%sy, 0.0%ni, 69.1%id, 13.6%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu10 : 8.9%us, 7.9%sy, 0.0%ni, 54.8%id, 28.1%wa, 0.0%hi, 0.3%si, 0.0%st
For no apparent reason my server started to perform badly. After checking top, I noticed that a single core is handling 100% of the interrupts.
cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11
0: 213 0 0 0 0 0 0 0 0 0 0 0 IO-APIC-edge timer
8: 1 0 0 0 0 0 0 0 0 0 0 0 IO-APIC-edge rtc0
9: 1 0 0 0 0 0 0 0 0 0 0 0 IO-APIC-fasteoi acpi
16: 557 0 0 0 0 0 0 0 0 0 0 0 IO-APIC-fasteoi ehci_hcd:usb2, uhci_hcd:usb6, uhci_hcd:usb7, uhci_hcd:usb8
17: 4373632 89953 0 0 0 10737111 0 0 0 0 0 22943776 IO-APIC-fasteoi firewire_ohci
19: 48 0 0 0 0 0 0 0 0 0 0 0 IO-APIC-fasteoi ehci_hcd:usb1, uhci_hcd:usb3, uhci_hcd:usb4, uhci_hcd:usb5
24: 378 0 0 0 0 0 0 0 0 0 0 0 IO-APIC-fasteoi nouveau
34: 232 0 0 0 0 0 0 0 0 0 0 0 IO-APIC-fasteoi hda_intel
64: 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge aerdrv, PCIe PME
65: 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge aerdrv, PCIe PME
66: 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge aerdrv, PCIe PME
67: 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge PCIe PME, pciehp
68: 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge PCIe PME, pciehp
69: 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge PCIe PME, pciehp
70: 27356052 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge mpt2sas0
71: 360910 0 0 0 10388 366203 0 660341 0 0 0 1011704 PCI-MSI-edge ahci
72: 7 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth0
73: 3223115 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth0-TxRx-0
74: 6 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth1
75: 3573711 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth1-TxRx-0
76: 6 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth2
77: 3548069 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth2-TxRx-0
78: 6 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth3
79: 3290681 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth3-TxRx-0
80: 6 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth4
81: 3319709 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth4-TxRx-0
82: 7 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth5
83: 3294914 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge eth5-TxRx-0
84: 223 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge hda_intel
85: 4 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge ioat-msix
86: 4 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge ioat-msix
87: 4 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge ioat-msix
88: 4 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge ioat-msix
89: 4 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge ioat-msix
90: 4 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge ioat-msix
91: 4 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge ioat-msix
92: 4 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI-edge ioat-msix
NMI: 20083 11292 9555 10288 8470 9085 7319 7726 6190 6286 5305 5966 Non-maskable interrupts
LOC: 12625312 12863741 12757467 12819307 12735818 12636631 12594014 12340042 12351248 11896407 11976946 11309230 Local timer interrupts
SPU: 0 0 0 0 0 0 0 0 0 0 0 0 Spurious interrupts
PMI: 20083 11292 9555 10288 8470 9085 7319 7726 6190 6286 5305 5966 Performance monitoring interrupts
IWI: 0 0 0 0 0 0 0 0 0 0 0 0 IRQ work interrupts
RES: 2102300 11881309 11859706 12689803 11274676 10461216 9626798 8188722 7976358 6329291 6344685 4528014 Rescheduling interrupts
CAL: 732819 20016455 15519 15361 17958 23935 23377 43079 40287 108860 70814 257653 Function call interrupts
TLB: 7589 72270 46673 99284 46373 121129 43286 101506 34109 78720 28570 70600 TLB shootdowns
TRM: 0 0 0 0 0 0 0 0 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 0 0 0 0 0 0 0 0 Threshold APIC interrupts
MCE: 0 0 0 0 0 0 0 0 0 0 0 0 Machine check exceptions
MCP: 44 44 44 44 44 44 44 44 44 44 44 44 Machine check polls
ERR: 0
MIS: 0
cat /proc/interrupts shows that the interrupts of all network cards are handled by CPU0, and I guess that is where the problem lies.
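Reading the per-CPU columns by eye gets tedious with this many IRQs. A minimal sketch of a summary, assuming the usual /proc/interrupts layout (a header row with one CPUn column per core, then one row per IRQ); the function name irq_summary is made up for illustration:

```shell
#!/bin/sh
# Summarize, for each numbered IRQ, which CPU handled most of its interrupts.
# Reads /proc/interrupts-formatted text on stdin.
irq_summary() {
    awk '
    NR == 1 { ncpu = NF; next }          # header row: one field per CPU column
    $1 ~ /^[0-9]+:$/ {                   # numbered device IRQs only (skips NMI, LOC, ERR, ...)
        total = 0; max = 0; maxcpu = 0
        for (i = 2; i <= ncpu + 1; i++) {
            total += $i
            if ($i + 0 > max) { max = $i + 0; maxcpu = i - 2 }
        }
        if (total == 0) next             # IRQ never fired, nothing to report
        irq = $1; sub(/:$/, "", irq)
        # $NF is only the last word of the device column (enough for eth0-TxRx-0, mpt2sas0)
        printf "irq=%s busiest=CPU%d share=%d%% name=%s\n", irq, maxcpu, int(100 * max / total), $NF
    }'
}

# Usage on a live system:
#   irq_summary < /proc/interrupts
```

For the listing above, every ethN-TxRx-0 row and the mpt2sas0 row would come out with CPU0 as the busiest CPU at a 100% share, which matches the suspicion about CPU0.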
The network is configured with BONDING_OPTS="mode=4 miimon=100 xmit_hash_policy=layer3+4"
What I have tried:
- running irqbalance
- rebooting
Answer 1
Check the interrupt's CPU affinity with the following command:
cat /proc/irq/70/smp_affinity
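I chose 70 because it is associated with the storage card driver, mpt2sas0. You may want to repeat the check for all the other drivers as well, especially the network cards if you handle a lot of traffic.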
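Repeating the check for every IRQ can be done in a loop. This is only a sketch: the function name show_irq_affinity is made up, and the optional directory argument exists purely so the function can be pointed at a test tree; on a real system it defaults to /proc/irq.

```shell
#!/bin/sh
# Print "<irq> <mask>" for every numbered IRQ directory under a /proc/irq-style tree.
# The optional first argument (default: /proc/irq) is a convenience for testing,
# not part of any kernel interface.
show_irq_affinity() {
    irq_root=${1:-/proc/irq}
    for dir in "$irq_root"/[0-9]*; do
        # Skip entries without a readable smp_affinity file
        [ -r "$dir/smp_affinity" ] || continue
        printf '%s %s\n' "${dir##*/}" "$(cat "$dir/smp_affinity")"
    done
}

# Usage on a live system:
#   show_irq_affinity
```

The IRQ numbers in the first column can then be matched against /proc/interrupts, for example 73, 75, 77, 79, 81 and 83 for the ethN-TxRx-0 queues in the question.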
You want that setting to report the value fff, because that means all the CPUs are allowed to handle that interrupt.
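The affinity value is a hexadecimal bitmask in which bit n stands for CPU n, so the value for "all CPUs" depends on how many cores the machine has. A small sketch of the arithmetic (the helper names all_cpus_mask and single_cpu_mask are made up, and this simple form only covers machines whose mask fits in one shell integer; larger systems print the mask in comma-separated groups, which is not handled here):

```shell
#!/bin/sh
# Hex affinity mask with the low N bits set, i.e. "any of CPU0..CPU(N-1)".
all_cpus_mask() {
    printf '%x\n' $(( (1 << $1) - 1 ))
}

# Hex affinity mask with only the bit for one CPU index set.
single_cpu_mask() {
    printf '%x\n' $(( 1 << $1 ))
}

# Usage:
#   all_cpus_mask 12      # 12 CPUs, as on the server in the question: prints fff
#   single_cpu_mask 5     # only CPU5: prints 20
# As root, pinning IRQ 75 (eth1-TxRx-0 in the listing) to CPU1 would look like:
#   echo "$(single_cpu_mask 1)" > /proc/irq/75/smp_affinity
```

The kernel may show the same value padded with leading zeros. Also keep in mind that a running irqbalance may rewrite values you set by hand.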
You can use the Red Hat documentation as a reference.
Answer 2
You may need to use irqbalance.