My server has a problem. The following was written to the error log /var/log/messages:
Aug 15 10:22:46 s00000000 kernel: ------------[ cut here ]------------
Aug 15 10:22:46 s00000000 kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Tainted: G W ---------------- )
Aug 15 10:22:46 s00000000 kernel: Hardware name: X9SCL/X9SCM
Aug 15 10:22:46 s00000000 kernel: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Aug 15 10:22:46 s00000000 kernel: Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ext4 jbd2 serio_raw i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support e1000e ext3 jbd mbcache raid1 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Aug 15 10:22:46 s00000000 kernel: Pid: 0, comm: swapper Tainted: G W ---------------- 2.6.32-220.2.1.el6.x86_64 #1
Aug 15 10:22:46 s00000000 kernel: Call Trace:
Aug 15 10:22:46 s00000000 kernel: <IRQ> [<ffffffff81069997>] ? warn_slowpath_common+0x87/0xc0
Aug 15 10:22:46 s00000000 kernel: [<ffffffff81069a86>] ? warn_slowpath_fmt+0x46/0x50
Aug 15 10:22:46 s00000000 kernel: [<ffffffff8144a5fd>] ? dev_watchdog+0x26d/0x280
Aug 15 10:22:46 s00000000 kernel: [<ffffffff8107c4a8>] ? add_timer_on+0xa8/0x120
Aug 15 10:22:46 s00000000 kernel: [<ffffffff8144a390>] ? dev_watchdog+0x0/0x280
Aug 15 10:22:46 s00000000 kernel: [<ffffffff8107c777>] ? run_timer_softirq+0x197/0x340
Aug 15 10:22:46 s00000000 kernel: [<ffffffff810a0990>] ? tick_sched_timer+0x0/0xc0
Aug 15 10:22:46 s00000000 kernel: [<ffffffff8102ad2d>] ? lapic_next_event+0x1d/0x30
Aug 15 10:22:46 s00000000 kernel: [<ffffffff81071f81>] ? __do_softirq+0xc1/0x1d0
Aug 15 10:22:46 s00000000 kernel: [<ffffffff81095590>] ? hrtimer_interrupt+0x140/0x250
Aug 15 10:22:46 s00000000 kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
Aug 15 10:22:46 s00000000 kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
Aug 15 10:22:46 s00000000 kernel: [<ffffffff81071d65>] ? irq_exit+0x85/0x90
Aug 15 10:22:46 s00000000 kernel: [<ffffffff814f4ec0>] ? smp_apic_timer_interrupt+0x70/0x9b
Aug 15 10:22:46 s00000000 kernel: [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
Aug 15 10:22:46 s00000000 kernel: <EOI> [<ffffffff812c4ade>] ? intel_idle+0xde/0x170
Aug 15 10:22:46 s00000000 kernel: [<ffffffff812c4ac1>] ? intel_idle+0xc1/0x170
Aug 15 10:22:46 s00000000 kernel: [<ffffffff81097a8d>] ? sched_clock_cpu+0xcd/0x110
Aug 15 10:22:46 s00000000 kernel: [<ffffffff813f9ff7>] ? cpuidle_idle_call+0xa7/0x140
Aug 15 10:22:46 s00000000 kernel: [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110
Aug 15 10:22:46 s00000000 kernel: [<ffffffff814e5fbb>] ? start_secondary+0x202/0x245
Aug 15 10:22:46 s00000000 kernel: ---[ end trace 7b02a6494611efa0 ]---
Aug 15 10:22:46 s00000000 kernel: e1000e 0000:02:00.0: eth0: Reset adapter
Aug 15 10:22:46 s00000000 kernel: e1000e 0000:02:00.0: eth0: Error reading PHY register
Aug 15 10:22:46 s00000000 kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
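For some reason, CentOS wants to reset the network adapter, then fails to read a PHY register, and after that the network is completely down. I then have to reboot the server. Of course I searched Google; I found several reports of the same problem, but no solution.
The problem occurs very irregularly: before today, the last occurrence was two weeks ago.
Does anyone have an idea?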
Answer 1
The NETDEV WATCHDOG message means that the NIC stopped transmitting the data handed to it. Why the NIC stopped responding is not yet known.
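Since the problem shows up irregularly, it may help to pull every NETDEV WATCHDOG line out of the log and see when, and on which interface, the timeouts happened. The following is only a minimal sketch: the names `WatchdogEvent`, `find_watchdog_events` and `scan_log` are made up for illustration, and the pattern assumes the syslog line layout shown in the question.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple


class WatchdogEvent(NamedTuple):
    """One transmit-queue timeout reported by the kernel (hypothetical helper type)."""
    timestamp: str
    interface: str
    driver: str
    queue: int


# Assumes lines shaped like the one in the question:
# "Aug 15 10:22:46 <host> kernel: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_watchdog_events(lines: Iterable[str]) -> Iterator[WatchdogEvent]:
    """Yield one event per NETDEV WATCHDOG line; all other lines are skipped."""
    for line in lines:
        match = WATCHDOG_RE.match(line)
        if match:
            yield WatchdogEvent(
                match.group("ts"),
                match.group("iface"),
                match.group("driver"),
                int(match.group("queue")),
            )


def scan_log(path: str = "/var/log/messages") -> List[WatchdogEvent]:
    """Read a syslog file (reading it usually requires root) and return all events."""
    with open(path, errors="replace") as log:
        return list(find_watchdog_events(log))
```

Calling `scan_log()` as root and printing the result gives a list of timestamps that can be compared against load, backups, or other periodic activity on the server.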
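There is no fix yet, but Red Hat is currently working on this problem internally.
Your best option is to check whether the CentOS project already has a thread or bug for this and report it there; the CentOS team can then collect the relevant information and pass it to Red Hat for further troubleshooting.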
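When filing such a report, it usually helps to attach the kernel version and the driver details of the affected interface. The sketch below is one possible way to gather them, not an official procedure: the function names `diagnostic_commands` and `collect` are hypothetical, the choice of commands is an assumption about what a maintainer might ask for, and `ethtool` must be installed for its entries to return anything useful.

```python
import subprocess
from typing import Dict, List


def diagnostic_commands(interface: str = "eth0") -> List[List[str]]:
    """Commands whose output is commonly attached to a NIC bug report (assumed set)."""
    return [
        ["uname", "-r"],               # running kernel version
        ["ethtool", "-i", interface],  # driver name, driver version, firmware version
        ["ethtool", interface],        # link speed, duplex, autonegotiation
        ["ethtool", "-S", interface],  # per-interface statistics counters
    ]


def collect(interface: str = "eth0") -> Dict[str, str]:
    """Run each command and return its output keyed by the command line."""
    report = {}
    for cmd in diagnostic_commands(interface):
        key = " ".join(cmd)
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
            # Keep stderr when stdout is empty so failures stay visible in the report.
            report[key] = result.stdout or result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Tool missing, not executable, or hung: record that instead of failing.
            report[key] = "unavailable: {}".format(exc)
    return report
```

Printing the items of `collect("eth0")` right after an incident, together with the log excerpt from the question, gives a compact attachment for the bug tracker.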