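On a server that is heavily used for file downloads, the machine stops responding to ssh, http, and ping requests every few hours. It works normally again after a reboot.
The provider's technicians suspect a network fault. How can I investigate and fix this problem?
Below are the most recent entries from the dmesg log. The server has been rebooted twice in the past 24 hours.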
[ 7.266682] ioatdma 0000:00:16.0: setting latency timer to 64
[ 7.266726] alloc irq_desc for 65 on node -1
[ 7.266728] alloc kstat_irqs on node -1
[ 7.266731] alloc irq_2_iommu on node -1
[ 7.266736] ioatdma 0000:00:16.0: irq 65 for MSI/MSI-X
[ 7.266879] ioatdma 0000:00:16.1: enabling device (0000 -> 0002)
[ 7.266882] alloc irq_desc for 44 on node -1
[ 7.266883] alloc kstat_irqs on node -1
[ 7.266886] alloc irq_2_iommu on node -1
[ 7.266891] ioatdma 0000:00:16.1: PCI INT B -> GSI 44 (level, low) -> IRQ 44
[ 7.266902] ioatdma 0000:00:16.1: setting latency timer to 64
[ 7.266936] alloc irq_desc for 66 on node -1
[ 7.266938] alloc kstat_irqs on node -1
[ 7.266940] alloc irq_2_iommu on node -1
[ 7.266944] ioatdma 0000:00:16.1: irq 66 for MSI/MSI-X
[ 7.267097] ioatdma 0000:00:16.2: enabling device (0000 -> 0002)
[ 7.267101] alloc irq_desc for 45 on node -1
[ 7.267103] alloc kstat_irqs on node -1
[ 7.267107] alloc irq_2_iommu on node -1
[ 7.267113] ioatdma 0000:00:16.2: PCI INT C -> GSI 45 (level, low) -> IRQ 45
[ 7.267126] ioatdma 0000:00:16.2: setting latency timer to 64
[ 7.267162] alloc irq_desc for 67 on node -1
[ 7.267163] alloc kstat_irqs on node -1
[ 7.267165] alloc irq_2_iommu on node -1
[ 7.267170] ioatdma 0000:00:16.2: irq 67 for MSI/MSI-X
[ 7.267307] ioatdma 0000:00:16.3: enabling device (0000 -> 0002)
[ 7.267312] alloc irq_desc for 46 on node -1
[ 7.267314] alloc kstat_irqs on node -1
[ 7.267317] alloc irq_2_iommu on node -1
[ 7.267324] ioatdma 0000:00:16.3: PCI INT D -> GSI 46 (level, low) -> IRQ 46
[ 7.267339] ioatdma 0000:00:16.3: setting latency timer to 64
[ 7.267383] alloc irq_desc for 68 on node -1
[ 7.267386] alloc kstat_irqs on node -1
[ 7.267389] alloc irq_2_iommu on node -1
[ 7.267395] ioatdma 0000:00:16.3: irq 68 for MSI/MSI-X
[ 7.267527] ioatdma 0000:00:16.4: enabling device (0000 -> 0002)
[ 7.267531] ioatdma 0000:00:16.4: PCI INT A -> GSI 43 (level, low) -> IRQ 43
[ 7.267543] ioatdma 0000:00:16.4: setting latency timer to 64
[ 7.267587] alloc irq_desc for 69 on node -1
[ 7.267589] alloc kstat_irqs on node -1
[ 7.267593] alloc irq_2_iommu on node -1
[ 7.267599] ioatdma 0000:00:16.4: irq 69 for MSI/MSI-X
[ 7.267743] ioatdma 0000:00:16.5: enabling device (0000 -> 0002)
[ 7.267746] ioatdma 0000:00:16.5: PCI INT B -> GSI 44 (level, low) -> IRQ 44
[ 7.267759] ioatdma 0000:00:16.5: setting latency timer to 64
[ 7.267794] alloc irq_desc for 70 on node -1
[ 7.267796] alloc kstat_irqs on node -1
[ 7.267798] alloc irq_2_iommu on node -1
[ 7.267803] ioatdma 0000:00:16.5: irq 70 for MSI/MSI-X
[ 7.267950] ioatdma 0000:00:16.6: enabling device (0000 -> 0002)
[ 7.267955] ioatdma 0000:00:16.6: PCI INT C -> GSI 45 (level, low) -> IRQ 45
[ 7.267970] ioatdma 0000:00:16.6: setting latency timer to 64
[ 7.268012] alloc irq_desc for 71 on node -1
[ 7.268013] alloc kstat_irqs on node -1
[ 7.268016] alloc irq_2_iommu on node -1
[ 7.268021] ioatdma 0000:00:16.6: irq 71 for MSI/MSI-X
[ 7.268152] ioatdma 0000:00:16.7: enabling device (0000 -> 0002)
[ 7.268157] ioatdma 0000:00:16.7: PCI INT D -> GSI 46 (level, low) -> IRQ 46
[ 7.268173] ioatdma 0000:00:16.7: setting latency timer to 64
[ 7.268217] alloc irq_desc for 72 on node -1
[ 7.268219] alloc kstat_irqs on node -1
[ 7.268222] alloc irq_2_iommu on node -1
[ 7.268228] ioatdma 0000:00:16.7: irq 72 for MSI/MSI-X
[ 7.273295] i801_smbus 0000:00:1f.3: PCI INT C -> GSI 18 (level, low) -> IRQ 18
[ 7.277431] Monitor-Mwait will be used to enter C-1 state
[ 7.277533] Monitor-Mwait will be used to enter C-2 state
[ 7.278051] Monitor-Mwait will be used to enter C-3 state
[ 7.278131] processor LNXCPU:00: registered as cooling_device0
[ 7.278197] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input7
[ 7.278226] ACPI: Power Button [PWRF]
[ 7.278892] processor LNXCPU:01: registered as cooling_device1
[ 7.279463] processor LNXCPU:02: registered as cooling_device2
[ 7.280028] processor LNXCPU:03: registered as cooling_device3
[ 7.280564] processor LNXCPU:04: registered as cooling_device4
[ 7.283535] processor LNXCPU:05: registered as cooling_device5
[ 7.284159] processor LNXCPU:06: registered as cooling_device6
[ 7.284768] processor LNXCPU:07: registered as cooling_device7
[ 7.285364] processor LNXCPU:08: registered as cooling_device8
[ 7.285879] processor LNXCPU:09: registered as cooling_device9
[ 7.286595] processor LNXCPU:0a: registered as cooling_device10
[ 7.287125] processor LNXCPU:0b: registered as cooling_device11
[ 7.287720] processor LNXCPU:0c: registered as cooling_device12
[ 7.288295] processor LNXCPU:0d: registered as cooling_device13
[ 7.288825] processor LNXCPU:0e: registered as cooling_device14
[ 7.289485] processor LNXCPU:0f: registered as cooling_device15
[ 7.290069] processor LNXCPU:10: registered as cooling_device16
[ 7.290675] processor LNXCPU:11: registered as cooling_device17
[ 7.296242] Error: Driver 'pcspkr' is already registered, aborting...
[ 7.299964] processor LNXCPU:12: registered as cooling_device18
[ 7.300702] processor LNXCPU:13: registered as cooling_device19
[ 7.301409] processor LNXCPU:14: registered as cooling_device20
[ 7.302091] processor LNXCPU:15: registered as cooling_device21
[ 7.302741] processor LNXCPU:16: registered as cooling_device22
[ 7.303410] processor LNXCPU:17: registered as cooling_device23
[ 7.447430] Adding 8787960k swap on /dev/md1. Priority:-1 extents:1 across:8787960k
[ 7.502237] loop: module loaded
[ 7.660050] EXT4-fs (sdd1): mounted filesystem with ordered data mode
[ 7.668827] EXT4-fs (sda3): mounted filesystem with ordered data mode
[ 7.669375] EXT4-fs (sdc): Unrecognized mount option "0" or missing value
[ 7.824669] ADDRCONF(NETDEV_UP): eth0: link is not ready
Answer 1
It may be worth using the ethtool statistics to check the network device for NIC and driver errors:
ethtool -S "ethX"
Just replace ethX with the name of your NIC.
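The statistics output can be long, so here is a rough sketch of how it could be narrowed down to the counters that look like errors and are non-zero. The function name nonzero_errors, the NIC environment variable, and the list of keywords (err, drop, miss, fifo, crc, collision) are my own assumptions; counter names differ from driver to driver, so adjust the pattern to whatever your NIC reports.

```shell
#!/bin/sh
# Hypothetical helper: reads `ethtool -S` output on stdin and prints only
# counters whose name looks error-related and whose value is above zero.
# The keyword list is an assumption; driver counter names vary.
nonzero_errors() {
    awk -F': *' '
        tolower($1) ~ /err|drop|miss|fifo|crc|collision/ && $2 + 0 > 0 {
            sub(/^[ \t]+/, "", $1)   # strip the leading indentation
            print $1 ": " $2
        }'
}

# Only runs when NIC is set (for example NIC=eth0) and ethtool is installed.
if [ -n "${NIC:-}" ] && command -v ethtool >/dev/null 2>&1; then
    ethtool -S "$NIC" | nonzero_errors
fi
```

Running it a few times between hangs (for example NIC=eth0 sh script.sh) and comparing the output shows whether any of these counters keep climbing, which would point toward the NIC, cable, or driver rather than the application.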
You can also run the network adapter's self-test with the -t option, although this may interrupt connectivity.
Sorry, this should have been a comment, but I am not allowed to comment yet.