Kernel: BUG: unable to handle page fault for address

One of our devices froze today and displayed the following kernel message:

[79648.067306] BUG: unable to handle page fault for address: 0000000004000034
[79648.067315] #PF: supervisor read access in kernel mode
[79648.067318] #PF: error_code(0x0000) - not-present page

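Judging from the call trace (see below), the error appears to have been caused by the graphics driver (i915). A kernel update would presumably fix it, but I am interested in the background of the problem, so I have 3 questions: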

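  1. What exactly do these 3 lines mean, or where can I find a description of these errors?
  2. If I enable a hardware watchdog, will it reboot the system when this error occurs?
  3. Could this error be caused by a hardware (memory) fault?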

System: 5.4.0-91-generic, Ubuntu 20.04.1 LTS

Full dump of the kernel ring buffer (dmesg):

[79648.067306] BUG: unable to handle page fault for address: 0000000004000034
[79648.067315] #PF: supervisor read access in kernel mode
[79648.067318] #PF: error_code(0x0000) - not-present page
[79648.067322] PGD 0 P4D 0
[79648.067328] Oops: 0000 [#1] SMP PTI
[79648.067335] CPU: 3 PID: 668 Comm: Xorg Not tainted 5.4.0-91-generic #102-Ubuntu
[79648.067338] Hardware name: Shuttle Inc. DH310S/DH310S, BIOS 1.06 03/23/2020
[79648.067349] RIP: 0010:find_get_entry+0x7a/0x170
[79648.067355] Code: b8 48 c7 45 d0 03 00 00 00 e8 d2 ff 85 00 49 89 c4 48 3d 02 04 00 00 74 e4 48 3d 06 04 00 00 74 dc 48 85 c0 74 3d a8 01 75 39 <8b> 40 34 85 c0 74 cc 8d 50 01 f0 41 0f b1 54 24 34 75 f0 48 8b 45
[79648.067359] RSP: 0018:ffffb80a8093f728 EFLAGS: 00010246
[79648.067364] RAX: 0000000004000000 RBX: 00000000000004a6 RCX: 0000000000000000
[79648.067367] RDX: 0000000000000026 RSI: ffff9a369e5ff6c0 RDI: ffffb80a8093f728
[79648.067370] RBP: ffffb80a8093f770 R08: 00000000001120d2 R09: 0000000000000000
[79648.067373] R10: ffff9a3714c8eaa0 R11: 0000000000003c64 R12: 0000000004000000
[79648.067376] R13: 00000000000004a6 R14: 0000000000000001 R15: ffff9a371bf261c0
[79648.067381] FS:  00007f5b0d819a40(0000) GS:ffff9a372ed80000(0000) knlGS:0000000000000000
[79648.067384] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[79648.067387] CR2: 0000000004000034 CR3: 000000025bf12003 CR4: 00000000003606e0
[79648.067390] Call Trace:
[79648.067401]  find_lock_entry+0x1f/0xe0
[79648.067408]  shmem_getpage_gfp+0xef/0x940
[79648.067417]  ? __kmalloc+0x194/0x290
[79648.067424]  shmem_read_mapping_page_gfp+0x44/0x80
[79648.067520]  shmem_get_pages+0x250/0x650 [i915]
[79648.067530]  ? __update_load_avg_se+0x23b/0x320
[79648.067538]  ? update_load_avg+0x7c/0x670
[79648.067619]  ____i915_gem_object_get_pages+0x22/0x40 [i915]
[79648.067692]  __i915_gem_object_get_pages+0x5b/0x70 [i915]
[79648.067774]  __i915_vma_do_pin+0x3ee/0x470 [i915]
[79648.067845]  eb_lookup_vmas+0x68a/0xb70 [i915]
[79648.067930]  ? eb_pin_engine+0x255/0x410 [i915]
[79648.067990]  i915_gem_do_execbuffer+0x38f/0xc20 [i915]
[79648.067997]  ? security_file_alloc+0x29/0x90
[79648.068004]  ? _cond_resched+0x19/0x30
[79648.068010]  ? apparmor_file_alloc_security+0x3e/0x160
[79648.068016]  ? __radix_tree_replace+0x6d/0x120
[79648.068020]  ? radix_tree_iter_tag_clear+0x12/0x20
[79648.068027]  ? kmem_cache_alloc_trace+0x177/0x240
[79648.068035]  ? __pm_runtime_resume+0x60/0x80
[79648.068040]  ? recalibrate_cpu_khz+0x10/0x10
[79648.068044]  ? ktime_get_mono_fast_ns+0x4e/0xa0
[79648.068048]  ? __kmalloc_node+0x213/0x330
[79648.068107]  i915_gem_execbuffer2_ioctl+0x1eb/0x3d0 [i915]
[79648.068112]  ? radix_tree_lookup+0xd/0x10
[79648.068167]  ? i915_gem_execbuffer_ioctl+0x2d0/0x2d0 [i915]
[79648.068196]  drm_ioctl_kernel+0xae/0xf0 [drm]
[79648.068218]  drm_ioctl+0x24a/0x3f0 [drm]
[79648.068278]  ? i915_gem_execbuffer_ioctl+0x2d0/0x2d0 [i915]
[79648.068288]  do_vfs_ioctl+0x407/0x670
[79648.068293]  ? fput+0x13/0x20
[79648.068299]  ? __sys_recvmsg+0x88/0xa0
[79648.068305]  ksys_ioctl+0x67/0x90
[79648.068311]  __x64_sys_ioctl+0x1a/0x20
[79648.068317]  do_syscall_64+0x57/0x190
[79648.068323]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[79648.068327] RIP: 0033:0x7f5b0db7937b
[79648.068332] Code: 0f 1e fa 48 8b 05 15 3b 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e5 3a 0d 00 f7 d8 64 89 01 48
[79648.068335] RSP: 002b:00007fff24ca5d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[79648.068339] RAX: ffffffffffffffda RBX: 000055eaa18c2290 RCX: 00007f5b0db7937b
[79648.068342] RDX: 00007fff24ca5db0 RSI: 0000000040406469 RDI: 000000000000000c
[79648.068345] RBP: 00007f5b0ba31000 R08: 0000000000000002 R09: 0000000000000001
[79648.068347] R10: 00007f5b0d4156a0 R11: 0000000000000246 R12: 00007fff24ca5db0
[79648.068350] R13: 000000000000000c R14: 000000000000001a R15: 0000000000000068
[79648.068354] Modules linked in: wdat_wdt nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi intel_rapl_msr snd_seq_midi_event intel_rapl_common snd_rawmidi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_seq kvm rtsx_pci_ms rapl snd_seq_device intel_cstate memstick snd_timer mei_me mei snd soundcore mac_hid acpi_pad sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear i915 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd i2c_algo_bit rtsx_pci_sdmmc glue_helper drm_kms_helper syscopyarea sysfillrect sysimgblt i2c_i801 fb_sys_fops r8169 rtsx_pci drm realtek ahci libahci video
[79648.068413] CR2: 0000000004000034
[79648.068418] ---[ end trace 447ad409d057183e ]---
[79648.068425] RIP: 0010:find_get_entry+0x7a/0x170
[79648.068429] Code: b8 48 c7 45 d0 03 00 00 00 e8 d2 ff 85 00 49 89 c4 48 3d 02 04 00 00 74 e4 48 3d 06 04 00 00 74 dc 48 85 c0 74 3d a8 01 75 39 <8b> 40 34 85 c0 74 cc 8d 50 01 f0 41 0f b1 54 24 34 75 f0 48 8b 45
[79648.068432] RSP: 0018:ffffb80a8093f728 EFLAGS: 00010246
[79648.068435] RAX: 0000000004000000 RBX: 00000000000004a6 RCX: 0000000000000000
[79648.068438] RDX: 0000000000000026 RSI: ffff9a369e5ff6c0 RDI: ffffb80a8093f728
[79648.068441] RBP: ffffb80a8093f770 R08: 00000000001120d2 R09: 0000000000000000
[79648.068443] R10: ffff9a3714c8eaa0 R11: 0000000000003c64 R12: 0000000004000000
[79648.068446] R13: 00000000000004a6 R14: 0000000000000001 R15: ffff9a371bf261c0
[79648.068449] FS:  00007f5b0d819a40(0000) GS:ffff9a372ed80000(0000) knlGS:0000000000000000
[79648.068452] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[79648.068455] CR2: 0000000004000034 CR3: 000000025bf12003 CR4: 00000000003606e0

Answer 1

[79648.067306] BUG: unable to handle page fault for address: 0000000004000034
[79648.067315] #PF: supervisor read access in kernel mode
[79648.067318] #PF: error_code(0x0000) - not-present page

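These errors indicate that kernel code tried to dereference an invalid pointer. The code tried to access the virtual memory address 0x0000000004000034, but found that it does not map to any real memory page (the page could not be faulted in).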
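As a rough illustration of where that address comes from, the register dump and the `Code:` line in the question can be combined. The marked bytes `<8b> 40 34` decode to a 32-bit load from `RAX + 0x34`, and RAX holds 0x4000000, so the sum should match the faulting address in CR2. The sketch below only redoes that arithmetic; the constant and function names are made up for illustration, and the kernel-half check assumes 4-level paging.

```python
# Values copied from the oops in the question.
RAX = 0x0000000004000000   # register used as the base pointer
CR2 = 0x0000000004000034   # faulting address reported by the CPU
DISP = 0x34                # displacement in "<8b> 40 34", i.e. mov 0x34(%rax),%eax


def is_kernel_half(addr: int) -> bool:
    """True if addr is in the upper (kernel) half of the x86-64
    canonical address space, assuming 4-level paging."""
    return addr >= 0xFFFF800000000000


fault_address = RAX + DISP

print(hex(fault_address))        # address the load actually touched
print(fault_address == CR2)      # does it match the reported fault?
print(is_kernel_half(RAX))       # is RAX a plausible kernel pointer?
print(bin(RAX).count("1"))       # number of bits set in RAX
```

Other pointers in the same dump (for example RSI, ffff9a369e5ff6c0) sit in the kernel half, while RAX does not, which is why dereferencing it faults. RAX also has only a single bit set. That pattern is consistent with a flipped bit in an otherwise NULL value, but it is equally consistent with a stale or corrupted entry produced by software, so on its own it does not settle whether hardware is at fault.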

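The second and third lines add the following context: 1) the code was running in kernel mode (supervisor mode); 2) the access was a read; 3) the problem was a missing page rather than an incompatible page protection (for example, writing to a read-only page).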
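The three points above correspond to individual bits of the x86 page-fault error code. A minimal sketch of that decoding follows, assuming the standard bit layout from the Intel architecture manuals (bit 0 present, bit 1 write, bit 2 user, bit 3 reserved bit, bit 4 instruction fetch). The function name and the wording of the returned strings are illustrative and are not the kernel's own implementation.

```python
# Bit layout of the x86 page-fault error code (architectural definition).
PF_PRESENT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1     # 0: read access,      1: write access
PF_USER = 1 << 2      # 0: supervisor mode,  1: user mode
PF_RSVD = 1 << 3      # 1: reserved bit set in a page-table entry
PF_INSTR = 1 << 4     # 1: fault happened on an instruction fetch


def decode_pf_error_code(code: int) -> dict:
    """Turn a raw error code into a human-readable description."""
    if code & PF_INSTR:
        access = "instruction fetch"
    elif code & PF_WRITE:
        access = "write"
    else:
        access = "read"
    return {
        "privilege": "user" if code & PF_USER else "supervisor",
        "access": access,
        "cause": "protection violation" if code & PF_PRESENT else "not-present page",
        "reserved_bit": bool(code & PF_RSVD),
    }


# error_code(0x0000) from the oops in the question
print(decode_pf_error_code(0x0000))
```

With every bit clear, as in this oops, the decoding gives a supervisor-mode read of a not-present page, which matches the two `#PF:` lines.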

This is probably a bug in the kernel or driver code.

Answer 2

I recently ran into exactly the same problem, and I think I have gotten past it:

  • I completely disabled power management for Intel wireless. kbl_dmc_ver1_04.bin is no longer in /lib/firmware.

  • Downgraded to iwlwifi-QuZ-a0-hr-b0-48.ucode from iwlwifi-QuZ-a0-hr-b0-50.ucode.

  • And, more importantly: sudo iwconfig wlan0 power off

No more freezes or RAM violations.

PS: My first thought was faulty RAM or an oversized VRAM allocation, but neither turned out to be the cause.
