kernel: BUG: soft lockup in _raw_spin_unlock_irqrestore

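The system hung and was rebooted; soft lockup messages kept appearing in the log. There is no vmcore, because vmcore capture was not enabled before the reboot. Kernel: 3.10.0-327.el7.x86_64.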

If anyone has run into a similar problem before, do you know what the cause is? Thanks.

Nov 14 06:25:07 localhost kernel: BUG: soft lockup - CPU#3 stuck for 37s! [xfsaild/dm-0:487]
Nov 14 06:25:07 localhost kernel: Modules linked in: fuse btrfs zlib_deflate raid6_pq xor vfat msdos fat ext4 mbcache jbd2 binfmt_misc ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter vmw_vsock_vmci_transport vsock coretemp crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ppdev vmw_balloon pcspkr sg parport_pc parport shpchp i2c_piix4 vmw_vmci ip_tables xfs libcrc32c sr_mod cdrom ata_generic pata_acpi sd_mod crc_t10dif crct10dif_generic serio_raw crct10dif_pclmul
Nov 14 06:25:07 localhost kernel: crct10dif_common vmwgfx crc32c_intel drm_kms_helper ttm drm ata_piix vmxnet3 libata i2c_core vmw_pvscsi floppy dm_mirror dm_region_hash dm_log dm_mod
Nov 14 06:25:07 localhost kernel: CPU: 3 PID: 487 Comm: xfsaild/dm-0 Tainted: G             L ------------   3.10.0-327.el7.x86_64 #1
Nov 14 06:25:07 localhost kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
Nov 14 06:25:07 localhost kernel: task: ffff880fe4ac9700 ti: ffff880fe33f4000 task.ti: ffff880fe33f4000
Nov 14 06:25:07 localhost kernel: RIP: 0010:[<ffffffff8163ca4b>]  [<ffffffff8163ca4b>] _raw_spin_unlock_irqrestore+0x1b/0x40
Nov 14 06:25:07 localhost kernel: RSP: 0018:ffff880fe33f7b68  EFLAGS: 00000282
Nov 14 06:25:07 localhost kernel: RAX: 0000000000000000 RBX: ffff880fe33f7b30 RCX: 0000000000000200
Nov 14 06:25:07 localhost kernel: RDX: ffffc90006060000 RSI: 0000000000000282 RDI: 0000000000000282
Nov 14 06:25:07 localhost kernel: RBP: ffff880fe33f7b70 R08: 0000000000000000 R09: ffff8805e761ec00
Nov 14 06:25:07 localhost kernel: R10: ffff880fe471a000 R11: ffff880fe88db800 R12: ffff880b28fc9f00
Nov 14 06:25:07 localhost kernel: R13: 0000000000000020 R14: ffffffff8141e59f R15: ffff880fe33f7ae0
Nov 14 06:25:07 localhost kernel: FS:  0000000000000000(0000) GS:ffff88103fcc0000(0000) knlGS:0000000000000000
Nov 14 06:25:07 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 14 06:25:07 localhost kernel: CR2: 00007ff539c0c810 CR3: 0000000fe7528000 CR4: 00000000001407e0
Nov 14 06:25:07 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 14 06:25:07 localhost kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 14 06:25:07 localhost kernel: Stack:
Nov 14 06:25:07 localhost kernel: 0000000000000000 ffff880fe33f7be0 ffffffffa00554f7 ffff8805e761ec00
Nov 14 06:25:07 localhost kernel: ffff880fe471a000 ffffffff8141e0c0 0000000000000002 ffff880fe88dc754
Nov 14 06:25:07 localhost kernel: ffff880fe31e6a00 0000000000000282 ffff880b28fc9f80 0000000000000000
Nov 14 06:25:07 localhost kernel: Call Trace:
Nov 14 06:25:07 localhost kernel: [<ffffffffa00554f7>] pvscsi_queue+0x3b7/0x5c0 [vmw_pvscsi]
Nov 14 06:25:07 localhost kernel: [<ffffffff8141e0c0>] ? scsi_kmap_atomic_sg+0x190/0x190
Nov 14 06:25:07 localhost kernel: [<ffffffff81417b1a>] scsi_dispatch_cmd+0xaa/0x230
Nov 14 06:25:07 localhost kernel: [<ffffffff81420aa1>] scsi_request_fn+0x501/0x770
Nov 14 06:25:07 localhost kernel: [<ffffffff812c73e3>] __blk_run_queue+0x33/0x40
Nov 14 06:25:07 localhost kernel: [<ffffffff812c749a>] queue_unplugged+0x2a/0xa0
Nov 14 06:25:07 localhost kernel: [<ffffffff812cbcc5>] blk_flush_plug_list+0x185/0x230
Nov 14 06:25:07 localhost kernel: [<ffffffff812cc124>] blk_finish_plug+0x14/0x40
Nov 14 06:25:07 localhost kernel: [<ffffffffa0222a79>] __xfs_buf_delwri_submit+0x1e9/0x250 [xfs]
Nov 14 06:25:07 localhost kernel: [<ffffffffa022367f>] ? xfs_buf_delwri_submit_nowait+0x2f/0x50 [xfs]
Nov 14 06:25:07 localhost kernel: [<ffffffffa024e470>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Nov 14 06:25:07 localhost kernel: [<ffffffffa022367f>] xfs_buf_delwri_submit_nowait+0x2f/0x50 [xfs]
Nov 14 06:25:07 localhost kernel: [<ffffffffa024e6b0>] xfsaild+0x240/0x5e0 [xfs]
Nov 14 06:25:07 localhost kernel: [<ffffffffa024e470>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
Nov 14 06:25:07 localhost kernel: [<ffffffff810a5aef>] kthread+0xcf/0xe0
Nov 14 06:25:07 localhost kernel: [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
Nov 14 06:25:07 localhost kernel: [<ffffffff81645858>] ret_from_fork+0x58/0x90
Nov 14 06:25:07 localhost kernel: [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
Nov 14 06:25:07 localhost kernel: Code: 08 e8 aa 72 a4 ff 5d c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 f3 0f 1f 44 00 00 66 83 07 02 48 89 df 57 9d <0f> 1f 44 00 00 5b 5d c3 0f 1f 44 00 00 8b 37 f0 66 83 07 02 f6

Answer 1

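The soft lockup appears to be related to disk I/O request handling. On a physical system, I would check the SMART data and any other available disk health information to rule out a hardware problem.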

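However, this appears to be a VMware virtual machine, so the first thing to check is the statistics of the virtualization host: is the host or its storage overloaded by all the VMs running on it? That can cause long delays in responses to I/O requests. If such a delay lasts longer than 30 seconds, you will start getting these soft lockup notifications, even though the root cause may simply be that there is not enough CPU capacity or storage I/O bandwidth to satisfy the demands of all the VMs on the host.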
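To compare the lockups against load figures from the virtualization host, it can help to first pull out when each soft lockup was reported, on which CPU, and in which task. Below is a minimal sketch that assumes the log lines have the same shape as the `BUG: soft lockup` line in the question. The function names `parse_soft_lockups` and `summarize` are made up for this illustration and are not part of any tool.

```python
import re
from collections import Counter

# Matches lines shaped like the one in the question, e.g.
#   Nov 14 06:25:07 localhost kernel: BUG: soft lockup - CPU#3 stuck for 37s! [xfsaild/dm-0:487]
# The leading timestamp is captured loosely as "everything before the hostname".
SOFT_LOCKUP = re.compile(
    r"^(?P<when>\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+kernel: "
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(lines):
    """Return one dict per soft lockup report found in an iterable of log lines."""
    events = []
    for line in lines:
        m = SOFT_LOCKUP.search(line)
        if m:
            events.append({
                "when": m.group("when"),
                "cpu": int(m.group("cpu")),
                "secs": int(m.group("secs")),
                "comm": m.group("comm"),
                "pid": int(m.group("pid")),
            })
    return events


def summarize(events):
    """Count reports per task name, to see which tasks get stuck most often."""
    return Counter(e["comm"] for e in events)


if __name__ == "__main__":
    # Demo on the line from the question; for real use, pass an open log file
    # instead (the path depends on the distribution and is an assumption here).
    sample = [
        "Nov 14 06:25:07 localhost kernel: BUG: soft lockup - CPU#3 "
        "stuck for 37s! [xfsaild/dm-0:487]",
    ]
    for event in parse_soft_lockups(sample):
        print(event["when"], "CPU", event["cpu"], event["comm"], f'{event["secs"]}s')
```

If the timestamps line up with periods of high storage latency or CPU contention on the host, that supports the overload explanation. If they do not, the problem is more likely inside the guest.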
