I have several Linux servers (SunFire X4270) running CentOS 5.3 (kernel-2.6.18-128.1.16.el5) with Qlogic FC-8 QLE2562 HBAs... I am having a lot of problems with these new servers. One of them logs the following messages every second:
qla2xxx 0000:2f:00.0: Passthru CT request failed to login management server
qla2xxx 0000:2f:00.0: Passthru CT failed
qla2xxx 0000:2f:00.1: Passthru CT request failed to login management server
qla2xxx 0000:2f:00.1: Passthru CT failed
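In addition, several of my servers show the following problem (see below). I tried several CentOS 5.3 kernel versions, 2.6.18-128.el5 and 2.6.18-128.1.16.el5 (the latest), and also the latest driver from Qlogic (with embedded firmware version 4.06 for the QLE2562), without success. Strangely, I have another server with the same hardware/software configuration that runs fine (stable...). Sun support (which is available for these servers) has not been able to solve the problem yet... Any ideas?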
qla2xxx_eh_abort(8): aborting sp ffff81037d86ebc0 from RISC. pid=952 sp->state=7 q->q_flag=2
qla2xxx 0000:2f:00.1: Mailbox command timeout occurred. Issuing ISP abort.
NMI Watchdog detected LOCKUP on CPU 13
CPU 13
Modules linked in: autofs4 sunrpc ipv6 xfrm_nalgo crypto_api cpufreq_ondemand acpi_cpufreq freq_table dm_mirror dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev qla2xxx(U) qla2xxx_conf(U) igb i2c_i801 intermodule(U) i2c_core sg pcspkr dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ahci libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 2982, comm: scsi_eh_8 Tainted: G 2.6.18-128.el5 #1
RIP: 0010:[<ffffffff8000c6f2>] [<ffffffff8000c6f2>] __delay+0x8/0x10
RSP: 0018:ffff81067dc7db88 EFLAGS: 00000097
RAX: 00000000ecd06b41 RBX: 000000000018c42b RCX: 00000000ecd05808
RDX: 0000000000000324 RSI: 0000000000000046 RDI: 0000000000003689
RBP: ffffc20000034000 R08: 0000000000000002 R09: ffff81067dc7db54
R10: 0000000000000001 R11: ffffffff80213fbd R12: ffff81037e84c4f8
R13: 0000000000000246 R14: 0000000000000001 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff81067fc46140(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006bb424 CR3: 000000067d035000 CR4: 00000000000006e0
Process scsi_eh_8 (pid: 2982, threadinfo ffff81067dc7c000, task ffff81010c6ec040)
Stack: ffffffff8827f743 ffff81037e84c4f8 ffff81067dc7dc90 ffff81060000dc20
ffff81037fa461c8 ffff81037e84c4f8 ffff81067dc7dc90 0000000000000100
ffffffff88285488 ffff81037fa461c8 ffff81037e84c4f8 ffff81067dc7dc90
Call Trace:
[<ffffffff8827f743>] :qla2xxx:qla2x00_reset_chip+0x157/0x47e
[<ffffffff88285488>] :qla2xxx:qla2x00_abort_isp+0x6c/0x70b
[<ffffffff88286dfd>] :qla2xxx:qla2x00_mailbox_command+0x48e/0x553
[<ffffffff88286960>] :qla2xxx:qla2x00_mbx_sem_timeout+0x0/0xf
[<ffffffff882886f5>] :qla2xxx:qla2x00_issue_iocb_timeout+0x5f/0xc0
[<ffffffff88288fd0>] :qla2xxx:qla24xx_abort_command+0xf9/0x1a5
[<ffffffff88289099>] :qla2xxx:qla2x00_abort_command+0x1d/0x124
[<ffffffff80064c08>] _spin_unlock_irqrestore+0x8/0x9
[<ffffffff8827f1e6>] :qla2xxx:qla2xxx_eh_abort+0x9f8/0xba0
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff8807919f>] :scsi_mod:scsi_error_handler+0x290/0x4ac
[<ffffffff88078f0f>] :scsi_mod:scsi_error_handler+0x0/0x4ac
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032360>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032262>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11
Code: 29 c8 48 39 f8 72 f5 c3 41 54 83 3d ad d8 3c 00 00 49 89 f4
Kernel panic - not syncing: nmi watchdog
BUG: warning at kernel/panic.c:137/panic() (Tainted: G )
Call Trace:
<NMI> [<ffffffff8008efff>] panic+0x1da/0x1eb
[<ffffffff8006ba21>] _show_stack+0xdb/0xea
[<ffffffff8006bb14>] show_registers+0xe4/0x100
[<ffffffff8006537d>] die_nmi+0x66/0xa3
[<ffffffff80065ac3>] nmi_watchdog_tick+0x157/0x1d3
[<ffffffff800656e1>] default_do_nmi+0x81/0x225
[<ffffffff8006594e>] do_nmi+0x43/0x61
[<ffffffff80064fa7>] nmi+0x7f/0x88
[<ffffffff80213fbd>] pci_mmcfg_read+0x0/0x92
[<ffffffff8000c6f2>] __delay+0x8/0x10
<<EOE>> [<ffffffff8827f743>] :qla2xxx:qla2x00_reset_chip+0x157/0x47e
[<ffffffff88285488>] :qla2xxx:qla2x00_abort_isp+0x6c/0x70b
[<ffffffff88286dfd>] :qla2xxx:qla2x00_mailbox_command+0x48e/0x553
[<ffffffff88286960>] :qla2xxx:qla2x00_mbx_sem_timeout+0x0/0xf
[<ffffffff882886f5>] :qla2xxx:qla2x00_issue_iocb_timeout+0x5f/0xc0
[<ffffffff88288fd0>] :qla2xxx:qla24xx_abort_command+0xf9/0x1a5
[<ffffffff88289099>] :qla2xxx:qla2x00_abort_command+0x1d/0x124
[<ffffffff80064c08>] _spin_unlock_irqrestore+0x8/0x9
[<ffffffff8827f1e6>] :qla2xxx:qla2xxx_eh_abort+0x9f8/0xba0
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff8807919f>] :scsi_mod:scsi_error_handler+0x290/0x4ac
[<ffffffff88078f0f>] :scsi_mod:scsi_error_handler+0x0/0x4ac
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032360>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032262>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11
BUG: warning at drivers/input/serio/i8042.c:846/i8042_panic_blink() (Tainted: G )
Call Trace:
<NMI> [<ffffffff801fa015>] i8042_panic_blink+0x112/0x2a5
[<ffffffff8008efa5>] panic+0x180/0x1eb
[<ffffffff8006ba21>] _show_stack+0xdb/0xea
[<ffffffff8006bb14>] show_registers+0xe4/0x100
[<ffffffff8006537d>] die_nmi+0x66/0xa3
[<ffffffff80065ac3>] nmi_watchdog_tick+0x157/0x1d3
[<ffffffff800656e1>] default_do_nmi+0x81/0x225
[<ffffffff8006594e>] do_nmi+0x43/0x61
[<ffffffff80064fa7>] nmi+0x7f/0x88
[<ffffffff80213fbd>] pci_mmcfg_read+0x0/0x92
[<ffffffff8000c6f2>] __delay+0x8/0x10
<<EOE>> [<ffffffff8827f743>] :qla2xxx:qla2x00_reset_chip+0x157/0x47e
[<ffffffff88285488>] :qla2xxx:qla2x00_abort_isp+0x6c/0x70b
[<ffffffff88286dfd>] :qla2xxx:qla2x00_mailbox_command+0x48e/0x553
[<ffffffff88286960>] :qla2xxx:qla2x00_mbx_sem_timeout+0x0/0xf
[<ffffffff882886f5>] :qla2xxx:qla2x00_issue_iocb_timeout+0x5f/0xc0
[<ffffffff88288fd0>] :qla2xxx:qla24xx_abort_command+0xf9/0x1a5
[<ffffffff88289099>] :qla2xxx:qla2x00_abort_command+0x1d/0x124
[<ffffffff80064c08>] _spin_unlock_irqrestore+0x8/0x9
[<ffffffff8827f1e6>] :qla2xxx:qla2xxx_eh_abort+0x9f8/0xba0
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff8807919f>] :scsi_mod:scsi_error_handler+0x290/0x4ac
[<ffffffff88078f0f>] :scsi_mod:scsi_error_handler+0x0/0x4ac
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032360>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032262>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11
BUG: warning at drivers/input/serio/i8042.c:849/i8042_panic_blink() (Tainted: G )
Call Trace:
<NMI> [<ffffffff801fa0fe>] i8042_panic_blink+0x1fb/0x2a5
[<ffffffff8008efa5>] panic+0x180/0x1eb
[<ffffffff8006ba21>] _show_stack+0xdb/0xea
[<ffffffff8006bb14>] show_registers+0xe4/0x100
[<ffffffff8006537d>] die_nmi+0x66/0xa3
[<ffffffff80065ac3>] nmi_watchdog_tick+0x157/0x1d3
[<ffffffff800656e1>] default_do_nmi+0x81/0x225
[<ffffffff8006594e>] do_nmi+0x43/0x61
[<ffffffff80064fa7>] nmi+0x7f/0x88
[<ffffffff80213fbd>] pci_mmcfg_read+0x0/0x92
[<ffffffff8000c6f2>] __delay+0x8/0x10
<<EOE>> [<ffffffff8827f743>] :qla2xxx:qla2x00_reset_chip+0x157/0x47e
[<ffffffff88285488>] :qla2xxx:qla2x00_abort_isp+0x6c/0x70b
[<ffffffff88286dfd>] :qla2xxx:qla2x00_mailbox_command+0x48e/0x553
[<ffffffff88286960>] :qla2xxx:qla2x00_mbx_sem_timeout+0x0/0xf
[<ffffffff882886f5>] :qla2xxx:qla2x00_issue_iocb_timeout+0x5f/0xc0
[<ffffffff88288fd0>] :qla2xxx:qla24xx_abort_command+0xf9/0x1a5
[<ffffffff88289099>] :qla2xxx:qla2x00_abort_command+0x1d/0x124
[<ffffffff80064c08>] _spin_unlock_irqrestore+0x8/0x9
[<ffffffff8827f1e6>] :qla2xxx:qla2xxx_eh_abort+0x9f8/0xba0
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff8807919f>] :scsi_mod:scsi_error_handler+0x290/0x4ac
[<ffffffff88078f0f>] :scsi_mod:scsi_error_handler+0x0/0x4ac
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032360>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032262>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11
BUG: warning at drivers/input/serio/i8042.c:851/i8042_panic_blink() (Tainted: G )
Call Trace:
<NMI> [<ffffffff801fa17b>] i8042_panic_blink+0x278/0x2a5
[<ffffffff8008efa5>] panic+0x180/0x1eb
[<ffffffff8006ba21>] _show_stack+0xdb/0xea
[<ffffffff8006bb14>] show_registers+0xe4/0x100
[<ffffffff8006537d>] die_nmi+0x66/0xa3
[<ffffffff80065ac3>] nmi_watchdog_tick+0x157/0x1d3
[<ffffffff800656e1>] default_do_nmi+0x81/0x225
[<ffffffff8006594e>] do_nmi+0x43/0x61
[<ffffffff80064fa7>] nmi+0x7f/0x88
[<ffffffff80213fbd>] pci_mmcfg_read+0x0/0x92
[<ffffffff8000c6f2>] __delay+0x8/0x10
<<EOE>> [<ffffffff8827f743>] :qla2xxx:qla2x00_reset_chip+0x157/0x47e
[<ffffffff88285488>] :qla2xxx:qla2x00_abort_isp+0x6c/0x70b
[<ffffffff88286dfd>] :qla2xxx:qla2x00_mailbox_command+0x48e/0x553
[<ffffffff88286960>] :qla2xxx:qla2x00_mbx_sem_timeout+0x0/0xf
[<ffffffff882886f5>] :qla2xxx:qla2x00_issue_iocb_timeout+0x5f/0xc0
[<ffffffff88288fd0>] :qla2xxx:qla24xx_abort_command+0xf9/0x1a5
[<ffffffff88289099>] :qla2xxx:qla2x00_abort_command+0x1d/0x124
[<ffffffff80064c08>] _spin_unlock_irqrestore+0x8/0x9
[<ffffffff8827f1e6>] :qla2xxx:qla2xxx_eh_abort+0x9f8/0xba0
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff8807919f>] :scsi_mod:scsi_error_handler+0x290/0x4ac
[<ffffffff88078f0f>] :scsi_mod:scsi_error_handler+0x0/0x4ac
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032360>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032262>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11
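Answer 1
If qla2xxx 0000:2f:00.0: Passthru CT request failed to login management server appears on only one server, it may be a hardware problem with the card. Have you tried putting this card in another server?
For the server that runs well, I would try the same test: move the card from server A to server B and see whether server B becomes stable or whether server A stays stable.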
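To compare servers before and after swapping a card, it can help to tally the driver messages per PCI function. Here is a minimal sketch in Python, assuming a recent Python 3 (CentOS 5 itself ships a much older interpreter, so this may need to run on another machine against a saved copy of the kernel log). The function name `count_qla_messages` and the regular expression are illustrative assumptions based on the log lines quoted in the question, not part of any QLogic tool:

```python
import re
from collections import Counter

# Matches lines such as:
#   qla2xxx 0000:2f:00.0: Passthru CT failed
# The pattern is an assumption derived from the messages in the question.
QLA_LINE = re.compile(
    r"qla2xxx\s+([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]):\s*(.+)"
)

def count_qla_messages(lines):
    """Return a Counter keyed by (pci_address, message)."""
    counts = Counter()
    for line in lines:
        match = QLA_LINE.search(line)
        if match:
            counts[(match.group(1), match.group(2).strip())] += 1
    return counts
```

Passing it the lines of a saved log from each server (for example `count_qla_messages(open("dmesg.txt"))`, where the file name is hypothetical) gives one count per HBA port and message. If the counts follow the card when it is moved, that points at the card. If they stay with the chassis, that points at the server.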
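Answer 2
Thanks radius. It seems that Passthru CT request failed is a hardware problem (not fully verified yet). The other, bigger problem is related to the PCIe Active Riser cards we have (Sun X4270 hardware): these cards contain a PCIe switch that conflicts with the QLE2562 (the problem has been verified/reproduced by Sun support level 2)... If you hit this problem on Sun hardware, try putting the HBA in a non-switched PCIe slot (slots 0 and 3 on the X4270, because Riser 0 is not an active riser; it sits on the 16x slot). Sun is working on a fix for its machines so that users can put the HBA in any slot.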
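One way to check from the running system whether an HBA sits behind a PCIe switch is to look at its resolved sysfs path, which lists every bridge above the device. The sketch below is only a heuristic and rests on assumptions: that `/sys/bus/pci/devices/<address>` is a symlink into `/sys/devices/...` on the kernel in use, and that a device directly on a root port shows one bridge while a switch adds an upstream and a downstream port. The function names are hypothetical.

```python
import os
import re

# A PCI address as it appears in sysfs, e.g. 0000:2f:00.0
PCI_ADDR = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")

def bridges_above(sysfs_path):
    """List the PCI bridges between the root complex and the device.

    sysfs_path is the resolved device path, for example
    /sys/devices/pci0000:00/0000:00:03.0/0000:2f:00.0
    """
    parts = [p for p in sysfs_path.split("/") if PCI_ADDR.match(p)]
    return parts[:-1]  # everything except the device itself

def looks_switched(sysfs_path):
    """Heuristic: a root port alone is one bridge; a switch adds an
    upstream and a downstream port, giving three or more in total."""
    return len(bridges_above(sysfs_path)) >= 3

def resolve(pci_address):
    """Resolve /sys/bus/pci/devices/<address> to its real path."""
    return os.path.realpath(os.path.join("/sys/bus/pci/devices", pci_address))
```

On the affected server, `looks_switched(resolve("0000:2f:00.0"))` would then give a first hint. The output of `lspci -t` shows the same topology as a tree and is worth checking alongside it, since the bridge count alone cannot tell a switch from other bridge chains.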
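Answer 3
qla2xxx_eh_abort(8): aborting sp. This problem is entirely related to the HBA card installed in the Sun Blade server. We ran into it ourselves recently, on 16 December 2012. Replace the HBA card and the problem should be resolved completely.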