One of our devices froze today and showed the following kernel messages:
[79648.067306] BUG: unable to handle page fault for address: 0000000004000034
[79648.067315] #PF: supervisor read access in kernel mode
[79648.067318] #PF: error_code(0x0000) - not-present page
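Judging from the call trace (see below), the error appears to be caused by the graphics driver (i915). Presumably a kernel update would fix it, but I am interested in the background of this problem, so I have 3 questions:
- What exactly do these 3 lines mean, or where can I find a description of these errors?
- If I enable the hardware watchdog, will the system reboot when this error occurs?
- Could this error be caused by a hardware (memory) fault?
System: 5.4.0-91-generic, Ubuntu 20.04.1 LTS
Full dump of the kernel ring buffer (dmesg):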
[79648.067306] BUG: unable to handle page fault for address: 0000000004000034
[79648.067315] #PF: supervisor read access in kernel mode
[79648.067318] #PF: error_code(0x0000) - not-present page
[79648.067322] PGD 0 P4D 0
[79648.067328] Oops: 0000 [#1] SMP PTI
[79648.067335] CPU: 3 PID: 668 Comm: Xorg Not tainted 5.4.0-91-generic #102-Ubuntu
[79648.067338] Hardware name: Shuttle Inc. DH310S/DH310S, BIOS 1.06 03/23/2020
[79648.067349] RIP: 0010:find_get_entry+0x7a/0x170
[79648.067355] Code: b8 48 c7 45 d0 03 00 00 00 e8 d2 ff 85 00 49 89 c4 48 3d 02 04 00 00 74 e4 48 3d 06 04 00 00 74 dc 48 85 c0 74 3d a8 01 75 39 <8b> 40 34 85 c0 74 cc 8d 50 01 f0 41 0f b1 54 24 34 75 f0 48 8b 45
[79648.067359] RSP: 0018:ffffb80a8093f728 EFLAGS: 00010246
[79648.067364] RAX: 0000000004000000 RBX: 00000000000004a6 RCX: 0000000000000000
[79648.067367] RDX: 0000000000000026 RSI: ffff9a369e5ff6c0 RDI: ffffb80a8093f728
[79648.067370] RBP: ffffb80a8093f770 R08: 00000000001120d2 R09: 0000000000000000
[79648.067373] R10: ffff9a3714c8eaa0 R11: 0000000000003c64 R12: 0000000004000000
[79648.067376] R13: 00000000000004a6 R14: 0000000000000001 R15: ffff9a371bf261c0
[79648.067381] FS: 00007f5b0d819a40(0000) GS:ffff9a372ed80000(0000) knlGS:0000000000000000
[79648.067384] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[79648.067387] CR2: 0000000004000034 CR3: 000000025bf12003 CR4: 00000000003606e0
[79648.067390] Call Trace:
[79648.067401] find_lock_entry+0x1f/0xe0
[79648.067408] shmem_getpage_gfp+0xef/0x940
[79648.067417] ? __kmalloc+0x194/0x290
[79648.067424] shmem_read_mapping_page_gfp+0x44/0x80
[79648.067520] shmem_get_pages+0x250/0x650 [i915]
[79648.067530] ? __update_load_avg_se+0x23b/0x320
[79648.067538] ? update_load_avg+0x7c/0x670
[79648.067619] ____i915_gem_object_get_pages+0x22/0x40 [i915]
[79648.067692] __i915_gem_object_get_pages+0x5b/0x70 [i915]
[79648.067774] __i915_vma_do_pin+0x3ee/0x470 [i915]
[79648.067845] eb_lookup_vmas+0x68a/0xb70 [i915]
[79648.067930] ? eb_pin_engine+0x255/0x410 [i915]
[79648.067990] i915_gem_do_execbuffer+0x38f/0xc20 [i915]
[79648.067997] ? security_file_alloc+0x29/0x90
[79648.068004] ? _cond_resched+0x19/0x30
[79648.068010] ? apparmor_file_alloc_security+0x3e/0x160
[79648.068016] ? __radix_tree_replace+0x6d/0x120
[79648.068020] ? radix_tree_iter_tag_clear+0x12/0x20
[79648.068027] ? kmem_cache_alloc_trace+0x177/0x240
[79648.068035] ? __pm_runtime_resume+0x60/0x80
[79648.068040] ? recalibrate_cpu_khz+0x10/0x10
[79648.068044] ? ktime_get_mono_fast_ns+0x4e/0xa0
[79648.068048] ? __kmalloc_node+0x213/0x330
[79648.068107] i915_gem_execbuffer2_ioctl+0x1eb/0x3d0 [i915]
[79648.068112] ? radix_tree_lookup+0xd/0x10
[79648.068167] ? i915_gem_execbuffer_ioctl+0x2d0/0x2d0 [i915]
[79648.068196] drm_ioctl_kernel+0xae/0xf0 [drm]
[79648.068218] drm_ioctl+0x24a/0x3f0 [drm]
[79648.068278] ? i915_gem_execbuffer_ioctl+0x2d0/0x2d0 [i915]
[79648.068288] do_vfs_ioctl+0x407/0x670
[79648.068293] ? fput+0x13/0x20
[79648.068299] ? __sys_recvmsg+0x88/0xa0
[79648.068305] ksys_ioctl+0x67/0x90
[79648.068311] __x64_sys_ioctl+0x1a/0x20
[79648.068317] do_syscall_64+0x57/0x190
[79648.068323] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[79648.068327] RIP: 0033:0x7f5b0db7937b
[79648.068332] Code: 0f 1e fa 48 8b 05 15 3b 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e5 3a 0d 00 f7 d8 64 89 01 48
[79648.068335] RSP: 002b:00007fff24ca5d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[79648.068339] RAX: ffffffffffffffda RBX: 000055eaa18c2290 RCX: 00007f5b0db7937b
[79648.068342] RDX: 00007fff24ca5db0 RSI: 0000000040406469 RDI: 000000000000000c
[79648.068345] RBP: 00007f5b0ba31000 R08: 0000000000000002 R09: 0000000000000001
[79648.068347] R10: 00007f5b0d4156a0 R11: 0000000000000246 R12: 00007fff24ca5db0
[79648.068350] R13: 000000000000000c R14: 000000000000001a R15: 0000000000000068
[79648.068354] Modules linked in: wdat_wdt nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi intel_rapl_msr snd_seq_midi_event intel_rapl_common snd_rawmidi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_seq kvm rtsx_pci_ms rapl snd_seq_device intel_cstate memstick snd_timer mei_me mei snd soundcore mac_hid acpi_pad sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear i915 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd i2c_algo_bit rtsx_pci_sdmmc glue_helper drm_kms_helper syscopyarea sysfillrect sysimgblt i2c_i801 fb_sys_fops r8169 rtsx_pci drm realtek ahci libahci video
[79648.068413] CR2: 0000000004000034
[79648.068418] ---[ end trace 447ad409d057183e ]---
[79648.068425] RIP: 0010:find_get_entry+0x7a/0x170
[79648.068429] Code: b8 48 c7 45 d0 03 00 00 00 e8 d2 ff 85 00 49 89 c4 48 3d 02 04 00 00 74 e4 48 3d 06 04 00 00 74 dc 48 85 c0 74 3d a8 01 75 39 <8b> 40 34 85 c0 74 cc 8d 50 01 f0 41 0f b1 54 24 34 75 f0 48 8b 45
[79648.068432] RSP: 0018:ffffb80a8093f728 EFLAGS: 00010246
[79648.068435] RAX: 0000000004000000 RBX: 00000000000004a6 RCX: 0000000000000000
[79648.068438] RDX: 0000000000000026 RSI: ffff9a369e5ff6c0 RDI: ffffb80a8093f728
[79648.068441] RBP: ffffb80a8093f770 R08: 00000000001120d2 R09: 0000000000000000
[79648.068443] R10: ffff9a3714c8eaa0 R11: 0000000000003c64 R12: 0000000004000000
[79648.068446] R13: 00000000000004a6 R14: 0000000000000001 R15: ffff9a371bf261c0
[79648.068449] FS: 00007f5b0d819a40(0000) GS:ffff9a372ed80000(0000) knlGS:0000000000000000
[79648.068452] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[79648.068455] CR2: 0000000004000034 CR3: 000000025bf12003 CR4: 00000000003606e0
Answer 1
[79648.067306] BUG: unable to handle page fault for address: 0000000004000034
[79648.067315] #PF: supervisor read access in kernel mode
[79648.067318] #PF: error_code(0x0000) - not-present page
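These messages mean that kernel code tried to dereference an invalid pointer. The code tried to access the virtual memory address 0x0000000004000034 but found that it does not map to any real memory page, so the page fault could not be handled.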
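The register dump in the question shows where the bad address came from. The bytes marked `<8b> 40 34` in the `Code:` line decode to `mov eax, [rax+0x34]`, and RAX holds 0x0000000004000000. The following is a minimal sketch that redoes that arithmetic. The helper `load_effective_address` is a hypothetical name and handles only this one instruction form, so it is not a general disassembler.

```python
def load_effective_address(insn: bytes, rax: int) -> int:
    """Effective address for the single form `8b /r` with mod=01, rm=rax.

    Hypothetical helper for illustration; it rejects anything else.
    """
    opcode, modrm, disp8 = insn
    mod, rm = modrm >> 6, modrm & 0b111
    if opcode != 0x8B or mod != 0b01 or rm != 0b000:
        raise ValueError("only 'mov r32, [rax+disp8]' is handled in this sketch")
    # disp8 is a signed 8-bit displacement
    disp = disp8 - 0x100 if disp8 & 0x80 else disp8
    return (rax + disp) & 0xFFFFFFFFFFFFFFFF


# Values copied from the oops in the question
RAX = 0x0000000004000000
CR2 = 0x0000000004000034                  # faulting address reported by the CPU
FAULTING_INSN = bytes.fromhex("8b4034")   # bytes at RIP: <8b> 40 34

addr = load_effective_address(FAULTING_INSN, RAX)
print(hex(addr), addr == CR2)             # 0x4000034 True
print(bin(RAX).count("1"))                # 1: RAX has exactly one bit set
```

So the kernel treated the value in RAX as a pointer and read a field at offset 0x34 from it. RAX has only a single bit set, which is not a plausible kernel pointer, but that observation alone does not say whether the value was corrupted by software or by faulty memory.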
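The second and third lines add the following context: 1) the code was running in kernel (supervisor) mode; 2) the access was a read; 3) the problem was a missing page, not a page-protection violation (such as writing to a read-only page).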
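The three properties above come straight from the low bits of the x86 page-fault error code. As a rough sketch, the decoder below reproduces that reading for `error_code(0x0000)`. The function name `decode_pf_error_code` and the wording of the labels are my own; the bit positions follow the architectural definition (bit 0 present/protection, bit 1 write, bit 2 user, bit 3 reserved-bit, bit 4 instruction fetch, bit 5 protection key).

```python
# (bit, meaning when set, meaning when clear or None if nothing to report)
PF_BITS = [
    (0, "protection violation", "not-present page"),
    (1, "write access", "read access"),
    (2, "user mode", "supervisor (kernel) mode"),
    (3, "reserved bit set in page table", None),
    (4, "instruction fetch", None),
    (5, "protection-key violation", None),
]


def decode_pf_error_code(code: int) -> list:
    """Return human-readable descriptions for an x86 #PF error code."""
    out = []
    for bit, if_set, if_clear in PF_BITS:
        if code & (1 << bit):
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out


print(decode_pf_error_code(0x0000))
# ['not-present page', 'read access', 'supervisor (kernel) mode']
```

A write to a read-only page from kernel mode would instead show up as error code 0x0003, which this sketch decodes as a protection violation on a write access.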
This is most likely a bug in kernel or driver code.
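Answer 2
I recently ran into exactly the same problem, and I think I have got past it:
I completely disabled power management for the Intel wireless card. There is no longer a kbl_dmc_ver1_04.bin in /lib/firmware. I downgraded from iwlwifi-QuZ-a0-hr-b0-50.ucode to iwlwifi-QuZ-a0-hr-b0-48.ucode. And, more importantly:
sudo iwconfig wlan0 power off
No more freezes or RAM violations.
PS: My first thought was faulty RAM or an oversized VRAM allocation, but neither turned out to be the case.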