A T440 running Linux may be hanging because of a hardware problem

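I run Debian 9 with Linux 4.9.0-16-amd64 on a Lenovo T440s. It was stable until recently, but it has started hanging several times a day. There have been no recent upgrades, so I suspect the hangs may be caused by hardware.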

There are errors in /var/log/syslog, such as the following one (which did not immediately lead to a hang):

Jul  4 12:46:39 dumaty kernel: [ 2345.071294] ------------[ cut here ]------------
Jul  4 12:46:39 dumaty kernel: [ 2345.071314] WARNING: CPU: 2 PID: 366 at /build/linux-hrcSIZ/linux-4.9.272/drivers/net/wireless/intel/iwlwifi/mvm/rs.c:1212 iwl_mvm_rs_tx_status+0x159/0x1950 [iwlmvm]
Jul  4 12:46:39 dumaty kernel: [ 2345.071315] Modules linked in: ctr ccm binfmt_misc rfcomm fuse cmac bnep iTCO_wdt iTCO_vendor_support intel_rapl arc4 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_hdmi kvm iwlmvm irqbypass intel_cstate mac80211 joydev evdev intel_uncore pcspkr intel_rapl_perf snd_hda_codec_realtek rtsx_pci_ms serio_raw iwlwifi sg hid_multitouch snd_hda_codec_generic memstick uvcvideo lpc_ich cfg80211 videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core btusb cdc_mbim btrtl cdc_wdm snd_hda_intel videodev btbcm shpchp btintel snd_hda_codec i915 media cdc_ncm cdc_acm snd_hda_core bluetooth usbnet drm_kms_helper mii snd_hwdep drm mei_me snd_pcm snd_timer mei i2c_algo_bit thinkpad_acpi wmi nvram snd soundcore ac rfkill battery video button parport_pc ppdev lp parport ip_tables x_tables
Jul  4 12:46:39 dumaty kernel: [ 2345.071369]  autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb mbcache algif_skcipher af_alg usbhid hid dm_crypt dm_mod sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel rtsx_pci_sdmmc mmc_core aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse ahci libahci i2c_i801 i2c_smbus libata xhci_pci scsi_mod ehci_pci xhci_hcd ehci_hcd rtsx_pci e1000e mfd_core ptp usbcore pps_core usb_common thermal
Jul  4 12:46:39 dumaty kernel: [ 2345.071401] CPU: 2 PID: 366 Comm: irq/47-iwlwifi Not tainted 4.9.0-16-amd64 #1 Debian 4.9.272-1
Jul  4 12:46:39 dumaty kernel: [ 2345.071402] Hardware name: LENOVO 20ARA0YL00/20ARA0YL00, BIOS GJET77WW (2.27 ) 05/20/2014
Jul  4 12:46:39 dumaty kernel: [ 2345.071403]  0000000000000000 ffffffffae213377 0000000000000000 0000000000000000
Jul  4 12:46:39 dumaty kernel: [ 2345.071406]  ffffffffadc7aa2b ffff9cf649fb0900 0000000000000005 ffff9cf64a4c1568
Jul  4 12:46:39 dumaty kernel: [ 2345.071409]  00000000ffffffea 000000000d9afcfb ffff9cf580809a28 ffffffffc0b479e9
Jul  4 12:46:39 dumaty kernel: [ 2345.071411] Call Trace:
Jul  4 12:46:39 dumaty kernel: [ 2345.071417]  [<ffffffffae213377>] ? dump_stack+0x66/0x81
Jul  4 12:46:39 dumaty kernel: [ 2345.071421]  [<ffffffffadc7aa2b>] ? __warn+0xcb/0xf0
Jul  4 12:46:39 dumaty kernel: [ 2345.071429]  [<ffffffffc0b479e9>] ? iwl_mvm_rs_tx_status+0x159/0x1950 [iwlmvm]
Jul  4 12:46:39 dumaty kernel: [ 2345.071432]  [<ffffffffadcb768e>] ? find_busiest_group+0x3e/0x4d0
Jul  4 12:46:39 dumaty kernel: [ 2345.071436]  [<ffffffffadce86b4>] ? lock_timer_base+0x74/0x90
Jul  4 12:46:39 dumaty kernel: [ 2345.071453]  [<ffffffffc0d68162>] ? ieee80211_tx_status+0x3b2/0x8b0 [mac80211]
Jul  4 12:46:39 dumaty kernel: [ 2345.071459]  [<ffffffffc0b3b8d6>] ? iwl_mvm_rx_tx_cmd+0x296/0x770 [iwlmvm]
Jul  4 12:46:39 dumaty kernel: [ 2345.071462]  [<ffffffffae2224a5>] ? __switch_to_asm+0x35/0x70
Jul  4 12:46:39 dumaty kernel: [ 2345.071468]  [<ffffffffc0cd3832>] ? iwl_pcie_rx_handle+0x2d2/0x840 [iwlwifi]
Jul  4 12:46:39 dumaty kernel: [ 2345.071473]  [<ffffffffc0cd4e51>] ? iwl_pcie_irq_handler+0x181/0x730 [iwlwifi]
Jul  4 12:46:39 dumaty kernel: [ 2345.071475]  [<ffffffffadcd7190>] ? irq_finalize_oneshot.part.36+0xf0/0xf0
Jul  4 12:46:39 dumaty kernel: [ 2345.071477]  [<ffffffffadcd71b1>] ? irq_thread_fn+0x21/0x60
Jul  4 12:46:39 dumaty kernel: [ 2345.071479]  [<ffffffffadcd79b6>] ? irq_thread+0x136/0x1c0
Jul  4 12:46:39 dumaty kernel: [ 2345.071481]  [<ffffffffae21d4d1>] ? __schedule+0x241/0x6f0
Jul  4 12:46:39 dumaty kernel: [ 2345.071483]  [<ffffffffadcbdb0f>] ? __wake_up_common+0x4f/0x90
Jul  4 12:46:39 dumaty kernel: [ 2345.071485]  [<ffffffffadcd7280>] ? irq_forced_thread_fn+0x90/0x90
Jul  4 12:46:39 dumaty kernel: [ 2345.071487]  [<ffffffffadcd7880>] ? irq_thread_check_affinity+0xd0/0xd0
Jul  4 12:46:39 dumaty kernel: [ 2345.071490]  [<ffffffffadc9af29>] ? kthread+0xd9/0xf0
Jul  4 12:46:39 dumaty kernel: [ 2345.071493]  [<ffffffffae2224b1>] ? __switch_to_asm+0x41/0x70
Jul  4 12:46:39 dumaty kernel: [ 2345.071496]  [<ffffffffadc9ae50>] ? kthread_park+0x60/0x60
Jul  4 12:46:39 dumaty kernel: [ 2345.071498]  [<ffffffffae222537>] ? ret_from_fork+0x57/0x70
Jul  4 12:46:39 dumaty kernel: [ 2345.071499] ---[ end trace e62295838fbe3e4e ]---

Later, another error occurred. I recall having seen other swap_free errors before.

Jul  4 15:11:21 dumaty kernel: [11027.163548] swap_free: Unused swap file entry 3ffff8c9d3f8a
Jul  4 15:11:21 dumaty kernel: [11027.163554] BUG: Bad page map in process CompositorTileW  pte:e6c580ea2a pmd:24ca96067
Jul  4 15:11:21 dumaty kernel: [11027.163557] addr:000055f7a8fc0000 vm_flags:08100073 anon_vma:ffff9cf5bb2a9e10 mapping:          (null) index:55f7a8fc0
Jul  4 15:11:21 dumaty kernel: [11027.163559] file:          (null) fault:          (null) mmap:          (null) readpage:          (null)
Jul  4 15:11:21 dumaty kernel: [11027.163563] CPU: 3 PID: 6137 Comm: CompositorTileW Tainted: G        W       4.9.0-16-amd64 #1 Debian 4.9.272-1
Jul  4 15:11:21 dumaty kernel: [11027.163564] Hardware name: LENOVO 20ARA0YL00/20ARA0YL00, BIOS GJET77WW (2.27 ) 05/20/2014
Jul  4 15:11:21 dumaty kernel: [11027.163565]  0000000000000000 ffffffffae213377 000055f7a8fc0000 ffff9cf55af3b0c8
Jul  4 15:11:21 dumaty kernel: [11027.163568]  ffffffffaddb7c31 000055f7a9034000 0000000000000000 0000000000000000
Jul  4 15:11:21 dumaty kernel: [11027.163571]  000055f7a8fc0000 ffff9cf58ca96e00 000000e6c580ea2a ffffb7c4839a3c38
Jul  4 15:11:21 dumaty kernel: [11027.163573] Call Trace:
Jul  4 15:11:21 dumaty kernel: [11027.163580]  [<ffffffffae213377>] ? dump_stack+0x66/0x81
Jul  4 15:11:21 dumaty kernel: [11027.163582]  [<ffffffffaddb7c31>] ? print_bad_pte+0x1d1/0x2a0
Jul  4 15:11:21 dumaty kernel: [11027.163584]  [<ffffffffaddba434>] ? unmap_page_range+0x5d4/0x9d0
Jul  4 15:11:21 dumaty kernel: [11027.163586]  [<ffffffffaddbabfc>] ? unmap_vmas+0x4c/0xa0
Jul  4 15:11:21 dumaty kernel: [11027.163589]  [<ffffffffaddc3b9f>] ? exit_mmap+0x8f/0x140
Jul  4 15:11:21 dumaty kernel: [11027.163593]  [<ffffffffadc77604>] ? mmput+0x54/0x100
Jul  4 15:11:21 dumaty kernel: [11027.163594]  [<ffffffffadc7f1be>] ? do_exit+0x27e/0xb60
Jul  4 15:11:21 dumaty kernel: [11027.163596]  [<ffffffffadc7fb1a>] ? do_group_exit+0x3a/0xa0
Jul  4 15:11:21 dumaty kernel: [11027.163599]  [<ffffffffadc8abe1>] ? get_signal+0x161/0x850
Jul  4 15:11:21 dumaty kernel: [11027.163602]  [<ffffffffadcfea0f>] ? do_futex+0x14f/0xba0
Jul  4 15:11:21 dumaty kernel: [11027.163605]  [<ffffffffadc26486>] ? do_signal+0x36/0x690
Jul  4 15:11:21 dumaty kernel: [11027.163607]  [<ffffffffadd2d5a4>] ? __seccomp_filter+0x74/0x270
Jul  4 15:11:21 dumaty kernel: [11027.163610]  [<ffffffffadcff4df>] ? SyS_futex+0x7f/0x160
Jul  4 15:11:21 dumaty kernel: [11027.163613]  [<ffffffffadc03721>] ? exit_to_usermode_loop+0x71/0xb0
Jul  4 15:11:21 dumaty kernel: [11027.163615]  [<ffffffffadc03bd9>] ? do_syscall_64+0xe9/0x100
Jul  4 15:11:21 dumaty kernel: [11027.163619]  [<ffffffffae22238e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
Jul  4 15:11:21 dumaty kernel: [11027.163620] Disabling lock debugging due to kernel taint
Jul  4 15:11:21 dumaty kernel: [11027.165144] BUG: Bad rss-counter state mm:ffff9cf5bb398000 idx:2 val:-1

And later still:

Jul  4 16:03:38 dumaty kernel: [14164.368364] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
Jul  4 16:03:38 dumaty kernel: [14164.368412] IP: [<ffffffffadf61e1f>] swiotlb_unmap_sg_attrs+0x1f/0x50
Jul  4 16:03:38 dumaty kernel: [14164.368447] PGD 0 
Jul  4 16:03:38 dumaty kernel: [14164.368457] 
Jul  4 16:03:38 dumaty kernel: [14164.368467] Oops: 0000 [#2] SMP
Jul  4 16:03:38 dumaty kernel: [14164.368483] Modules linked in: ctr ccm binfmt_misc rfcomm fuse cmac bnep iTCO_wdt iTCO_vendor_support intel_rapl arc4 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_hdmi kvm iwlmvm irqbypass intel_cstate mac80211 joydev evdev intel_uncore pcspkr intel_rapl_perf snd_hda_codec_realtek rtsx_pci_ms serio_raw iwlwifi sg hid_multitouch snd_hda_codec_generic memstick uvcvideo lpc_ich cfg80211 videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core btusb cdc_mbim btrtl cdc_wdm snd_hda_intel videodev btbcm shpchp btintel snd_hda_codec i915 media cdc_ncm cdc_acm snd_hda_core bluetooth usbnet drm_kms_helper mii snd_hwdep drm mei_me snd_pcm snd_timer mei i2c_algo_bit thinkpad_acpi wmi nvram snd soundcore ac rfkill battery video button parport_pc ppdev lp parport ip_tables x_tables
Jul  4 16:03:38 dumaty kernel: [14164.368930]  autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb mbcache algif_skcipher af_alg usbhid hid dm_crypt dm_mod sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel rtsx_pci_sdmmc mmc_core aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse ahci libahci i2c_i801 i2c_smbus libata xhci_pci scsi_mod ehci_pci xhci_hcd ehci_hcd rtsx_pci e1000e mfd_core ptp usbcore pps_core usb_common thermal
Jul  4 16:03:38 dumaty kernel: [14164.369092] CPU: 2 PID: 1819 Comm: chrome Tainted: G    B D W       4.9.0-16-amd64 #1 Debian 4.9.272-1
Jul  4 16:03:38 dumaty kernel: [14164.369126] Hardware name: LENOVO 20ARA0YL00/20ARA0YL00, BIOS GJET77WW (2.27 ) 05/20/2014
Jul  4 16:03:38 dumaty kernel: [14164.369158] task: ffff9cf5e8136100 task.stack: ffffb7c482578000
Jul  4 16:03:38 dumaty kernel: [14164.369191] RIP: 0010:[<ffffffffadf61e1f>]  [<ffffffffadf61e1f>] swiotlb_unmap_sg_attrs+0x1f/0x50
Jul  4 16:03:38 dumaty kernel: [14164.369230] RSP: 0018:ffffb7c48257bc70  EFLAGS: 00010212
Jul  4 16:03:38 dumaty kernel: [14164.369257] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Jul  4 16:03:38 dumaty kernel: [14164.369294] RDX: 0000000000001000 RSI: 0000000080eed000 RDI: ffff9cf36fd09400
Jul  4 16:03:38 dumaty kernel: [14164.369328] RBP: 0000000000000021 R08: 0000000000000000 R09: 000000000000ffff
Jul  4 16:03:38 dumaty kernel: [14164.369357] R10: ffff9cf62fd12a20 R11: ffff9cf5e1bbf738 R12: 0000000000000000
Jul  4 16:03:38 dumaty kernel: [14164.370826] R13: 0000000000000040 R14: ffff9cf64f8a40a0 R15: ffff9cf64b600000
Jul  4 16:03:38 dumaty kernel: [14164.372260] FS:  00007fac918be000(0000) GS:ffff9cf65e280000(0000) knlGS:0000000000000000
Jul  4 16:03:38 dumaty kernel: [14164.373705] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul  4 16:03:38 dumaty kernel: [14164.375100] CR2: 0000000000000018 CR3: 00000002a82f2000 CR4: 0000000000160670
Jul  4 16:03:38 dumaty kernel: [14164.376543] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul  4 16:03:38 dumaty kernel: [14164.377938] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jul  4 16:03:38 dumaty kernel: [14164.379216] Stack:
Jul  4 16:03:38 dumaty kernel: [14164.380610]  ffff9cf649f5d300 0000000000000000 ffffffffc09d6da0 ffff9cf649f5d300
Jul  4 16:03:38 dumaty kernel: [14164.382141]  ffffffffc06c73b8 ffffffffc094b02e ffff9cf649f5d300 0000000000000000
Jul  4 16:03:38 dumaty kernel: [14164.383372]  ffffffffc09d6da0 ffff9cf64b600000 ffffffffc06c73b8 ffff9cf64b600000
Jul  4 16:03:38 dumaty kernel: [14164.384541] Call Trace:
Jul  4 16:03:38 dumaty kernel: [14164.385717]  [<ffffffffc094b02e>] ? i915_gem_object_put_pages_gtt+0x3e/0x260 [i915]
Jul  4 16:03:38 dumaty kernel: [14164.386885]  [<ffffffffc09490e2>] ? i915_gem_object_put_pages+0x72/0xf0 [i915]
Jul  4 16:03:38 dumaty kernel: [14164.388043]  [<ffffffffc094de9c>] ? i915_gem_free_object+0xcc/0x280 [i915]
Jul  4 16:03:38 dumaty kernel: [14164.389419]  [<ffffffffc06a48c6>] ? drm_gem_object_unreference_unlocked+0x76/0x80 [drm]
Jul  4 16:03:38 dumaty kernel: [14164.391121]  [<ffffffffc06a49e1>] ? drm_gem_object_release_handle+0x51/0x90 [drm]
Jul  4 16:03:38 dumaty kernel: [14164.393000]  [<ffffffffc06a4a79>] ? drm_gem_handle_delete+0x59/0x80 [drm]
Jul  4 16:03:38 dumaty kernel: [14164.394899]  [<ffffffffc06a5c2a>] ? drm_ioctl+0x1fa/0x470 [drm]
Jul  4 16:03:38 dumaty kernel: [14164.396774]  [<ffffffffc06a5150>] ? drm_gem_handle_create+0x40/0x40 [drm]
Jul  4 16:03:38 dumaty kernel: [14164.398721]  [<ffffffffade2a5b6>] ? current_time+0x36/0x70
Jul  4 16:03:38 dumaty kernel: [14164.400573]  [<ffffffffadda43ec>] ? shmem_truncate_range+0x1c/0x40
Jul  4 16:03:38 dumaty kernel: [14164.402625]  [<ffffffffadd2d5a4>] ? __seccomp_filter+0x74/0x270
Jul  4 16:03:38 dumaty kernel: [14164.404488]  [<ffffffffade220e2>] ? do_vfs_ioctl+0xa2/0x620
Jul  4 16:03:38 dumaty kernel: [14164.406324]  [<ffffffffadc03337>] ? syscall_trace_enter+0x117/0x2c0
Jul  4 16:03:38 dumaty kernel: [14164.408169]  [<ffffffffade226d4>] ? SyS_ioctl+0x74/0x80
Jul  4 16:03:38 dumaty kernel: [14164.410000]  [<ffffffffadc03b7d>] ? do_syscall_64+0x8d/0x100
Jul  4 16:03:38 dumaty kernel: [14164.411822]  [<ffffffffae22238e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
Jul  4 16:03:38 dumaty kernel: [14164.413687] Code: 40 00 66 2e 0f 1f 84 00 00 00 00 00 83 f9 03 74 48 41 56 41 55 49 89 fe 41 54 55 31 ed 85 d2 53 41 89 d5 48 89 f3 41 89 cc 7e 25 <8b> 53 18 48 8b 73 10 44 89 e1 4c 89 f7 83 c5 01 e8 9c ff ff ff 
Jul  4 16:03:38 dumaty kernel: [14164.415765] RIP  [<ffffffffadf61e1f>] swiotlb_unmap_sg_attrs+0x1f/0x50
Jul  4 16:03:38 dumaty kernel: [14164.417692]  RSP <ffffb7c48257bc70>
Jul  4 16:03:38 dumaty kernel: [14164.419631] CR2: 0000000000000018
Jul  4 16:03:38 dumaty kernel: [14164.421605] ---[ end trace e62295838fbe3e50 ]---
Jul  4 16:04:00 dumaty kernel: [14186.946643] GpuWatchdog[1835]: segfault at 0 ip 00005564adf60a02 sp 00007fac7f8656f0 error 6 in chrome[5564a96c6000+7bf3000]
Jul  4 16:04:52 dumaty kernel: [14238.504806] BUG: unable to handle kernel paging request at 000000030ea51897
Jul  4 16:04:52 dumaty kernel: [14238.507190] IP: [<ffffffffadc98962>] __task_pid_nr_ns+0x42/0x90
Jul  4 16:04:52 dumaty kernel: [14238.509452] PGD 0 
Jul  4 16:04:52 dumaty kernel: [14238.509464] 
Jul  4 16:04:52 dumaty kernel: [14238.511711] Oops: 0000 [#3] SMP
Jul  4 16:04:52 dumaty kernel: [14238.513959] Modules linked in: ctr ccm binfmt_misc rfcomm fuse cmac bnep iTCO_wdt iTCO_vendor_support intel_rapl arc4 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_hdmi kvm iwlmvm irqbypass intel_cstate mac80211 joydev evdev intel_uncore pcspkr intel_rapl_perf snd_hda_codec_realtek rtsx_pci_ms serio_raw iwlwifi sg hid_multitouch snd_hda_codec_generic memstick uvcvideo lpc_ich cfg80211 videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core btusb cdc_mbim btrtl cdc_wdm snd_hda_intel videodev btbcm shpchp btintel snd_hda_codec i915 media cdc_ncm cdc_acm snd_hda_core bluetooth usbnet drm_kms_helper mii snd_hwdep drm mei_me snd_pcm snd_timer mei i2c_algo_bit thinkpad_acpi wmi nvram snd soundcore ac rfkill battery video button parport_pc ppdev lp parport ip_tables x_tables
Jul  4 16:04:52 dumaty kernel: [14238.521475]  autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb mbcache algif_skcipher af_alg usbhid hid dm_crypt dm_mod sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel rtsx_pci_sdmmc mmc_core aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse ahci libahci i2c_i801 i2c_smbus libata xhci_pci scsi_mod ehci_pci xhci_hcd ehci_hcd rtsx_pci e1000e mfd_core ptp usbcore pps_core usb_common thermal
Jul  4 16:04:52 dumaty kernel: [14238.529141] CPU: 2 PID: 8675 Comm: top Tainted: G    B D W       4.9.0-16-amd64 #1 Debian 4.9.272-1
Jul  4 16:04:52 dumaty kernel: [14238.531734] Hardware name: LENOVO 20ARA0YL00/20ARA0YL00, BIOS GJET77WW (2.27 ) 05/20/2014
Jul  4 16:04:52 dumaty kernel: [14238.534341] task: ffff9cf60fe45100 task.stack: ffffb7c488158000
Jul  4 16:04:52 dumaty kernel: [14238.537059] RIP: 0010:[<ffffffffadc98962>]  [<ffffffffadc98962>] __task_pid_nr_ns+0x42/0x90
Jul  4 16:04:52 dumaty kernel: [14238.539736] RSP: 0018:ffffb7c48815bd78  EFLAGS: 00010286
Jul  4 16:04:52 dumaty kernel: [14238.542405] RAX: 0000000000000508 RBX: ffff9cf64a93de40 RCX: ffff9cf64d816e00
Jul  4 16:04:52 dumaty kernel: [14238.545097] RDX: 000000030ea51067 RSI: 0000000000000004 RDI: ffff9cf6354d1588
Jul  4 16:04:52 dumaty kernel: [14238.547772] RBP: ffff9cf64d816e00 R08: 000000000000044c R09: 0000000000000000
Jul  4 16:04:52 dumaty kernel: [14238.550439] R10: 0000000000000007 R11: ffff9cf64aa452a6 R12: ffff9cf6354d1080
Jul  4 16:04:52 dumaty kernel: [14238.553111] R13: ffffffffae61bb79 R14: 0000000000000066 R15: ffff9cf64da0c840
Jul  4 16:04:52 dumaty kernel: [14238.555792] FS:  00007fa93fea2280(0000) GS:ffff9cf65e280000(0000) knlGS:0000000000000000
Jul  4 16:04:52 dumaty kernel: [14238.558422] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul  4 16:04:52 dumaty kernel: [14238.560971] CR2: 000000030ea51897 CR3: 000000030b9c2000 CR4: 0000000000160670
Jul  4 16:04:52 dumaty kernel: [14238.563467] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul  4 16:04:52 dumaty kernel: [14238.565880] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jul  4 16:04:52 dumaty kernel: [14238.568210] Stack:
Jul  4 16:04:52 dumaty kernel: [14238.570452]  ffffffffade856ff ffffffffae83eee0 ffffffffae845d20 ffff9cf635772800
Jul  4 16:04:52 dumaty kernel: [14238.572715]  00000000000003ff 000000000000044c 0000000000000040 ffffb7c48815bed8
Jul  4 16:04:52 dumaty kernel: [14238.574970]  ffffb7c48815beec 0000000000000001 0000000000000000 0000000000000000
Jul  4 16:04:52 dumaty kernel: [14238.577206] Call Trace:
Jul  4 16:04:52 dumaty kernel: [14238.579412]  [<ffffffffade856ff>] ? proc_pid_status+0x46f/0x9f0
Jul  4 16:04:52 dumaty kernel: [14238.581621]  [<ffffffffaddea428>] ? __kmalloc+0x188/0x580
Jul  4 16:04:52 dumaty kernel: [14238.583821]  [<ffffffffade7ff51>] ? proc_single_show+0x51/0x80
Jul  4 16:04:52 dumaty kernel: [14238.586022]  [<ffffffffade34326>] ? seq_read+0x106/0x400
Jul  4 16:04:52 dumaty kernel: [14238.588217]  [<ffffffffade0d6e1>] ? vfs_read+0x91/0x130
Jul  4 16:04:52 dumaty kernel: [14238.590405]  [<ffffffffade0ebfa>] ? SyS_read+0x5a/0xd0
Jul  4 16:04:52 dumaty kernel: [14238.592578]  [<ffffffffadc03b7d>] ? do_syscall_64+0x8d/0x100
Jul  4 16:04:52 dumaty kernel: [14238.594739]  [<ffffffffae22238e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
Jul  4 16:04:52 dumaty kernel: [14238.596894] Code: 08 05 00 00 74 1a 83 fe 04 74 0e 89 f6 48 8d 04 76 48 8d 04 c5 08 05 00 00 48 8b bf d0 04 00 00 48 01 c7 48 8b 0f 48 85 c9 74 1f <8b> b2 30 08 00 00 31 c0 3b 71 04 77 0d 48 c1 e6 05 48 01 f1 48 
Jul  4 16:04:52 dumaty kernel: [14238.599315] RIP  [<ffffffffadc98962>] __task_pid_nr_ns+0x42/0x90
Jul  4 16:04:52 dumaty kernel: [14238.601602]  RSP <ffffb7c48815bd78>
Jul  4 16:04:52 dumaty kernel: [14238.603887] CR2: 000000030ea51897
Jul  4 16:04:52 dumaty kernel: [14238.606195] ---[ end trace e62295838fbe3e51 ]---

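I have run memtest, memtest86+ and stress-ng (cpu, hdd and vm stressors) for hours without triggering a failure. Turning off swap made things stable for almost a day. Because of that, and because smartctl -t short never seemed to complete at all, I ordered a replacement SSD. The failures shown above occurred shortly afterwards. I believe all the crashes happened while watching YouTube (though the machine is not used for much else). glxgears did not trigger a failure.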

Any ideas what might be causing this, and how to diagnose it?
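
One way to get an overview before digging into individual traces is to condense the log into a list of WARNING/BUG/Oops events plus the taint state printed with each trace. The following is a minimal Python sketch under stated assumptions, not a definitive tool: the name `summarize` and both regular expressions are illustrative, the sample lines are copied from the excerpts above, and the taint descriptions follow the meanings given in the kernel's tainted-kernels documentation.

```python
import re

# Meanings of the taint letters that appear in the traces above, following
# the kernel's tainted-kernels documentation.
TAINT_MEANINGS = {
    "G": "only GPL-compatible modules loaded (not a fault)",
    "B": "bad page referenced or unexpected page flags",
    "D": "kernel has died recently (an Oops or BUG occurred)",
    "W": "kernel has issued a warning",
}

# Matches e.g. "kernel: [11027.163554] BUG: Bad page map in process ..."
EVENT_RE = re.compile(r"kernel: \[\s*([\d.]+)\] (WARNING|BUG|Oops): (.*)")
# Matches e.g. "Comm: chrome Tainted: G    B D W       4.9.0-16-amd64"
TAINT_RE = re.compile(r"Comm: (\S+) (Not tainted|Tainted: [A-Z ]+)")


def summarize(lines):
    """Return (events, taints) extracted from an iterable of syslog lines."""
    events, taints = [], []
    for line in lines:
        event = EVENT_RE.search(line)
        if event:
            uptime, kind, detail = event.groups()
            events.append((float(uptime), kind, detail.strip()))
        taint = TAINT_RE.search(line)
        if taint:
            comm, state = taint.groups()
            # "Tainted: G B D W" -> ["G", "B", "D", "W"]; untainted -> []
            flags = [] if state == "Not tainted" else state.split()[1:]
            taints.append((comm, flags))
    return events, taints


# Sample input: lines copied verbatim from the log excerpts above.
SAMPLE = [
    "Jul  4 12:46:39 dumaty kernel: [ 2345.071401] CPU: 2 PID: 366 Comm: irq/47-iwlwifi Not tainted 4.9.0-16-amd64 #1 Debian 4.9.272-1",
    "Jul  4 15:11:21 dumaty kernel: [11027.163554] BUG: Bad page map in process CompositorTileW  pte:e6c580ea2a pmd:24ca96067",
    "Jul  4 15:11:21 dumaty kernel: [11027.163563] CPU: 3 PID: 6137 Comm: CompositorTileW Tainted: G        W       4.9.0-16-amd64 #1 Debian 4.9.272-1",
    "Jul  4 16:03:38 dumaty kernel: [14164.368364] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018",
    "Jul  4 16:03:38 dumaty kernel: [14164.369092] CPU: 2 PID: 1819 Comm: chrome Tainted: G    B D W       4.9.0-16-amd64 #1 Debian 4.9.272-1",
]

if __name__ == "__main__":
    events, taints = summarize(SAMPLE)
    for uptime, kind, detail in events:
        print(f"{uptime:>12.3f}s  {kind}: {detail[:60]}")
    for comm, flags in taints:
        meaning = "; ".join(TAINT_MEANINGS.get(f, "?") for f in flags)
        print(f"{comm}: {meaning or 'not tainted'}")
```

To run it on the real log, pass an open file instead of `SAMPLE`, for example `summarize(open("/var/log/syslog", errors="replace"))`. In the excerpts above the taint state goes from "Not tainted" to W and then to B D W, so the later oopses happened in a kernel that had already reported a bad page and may be follow-on damage rather than independent faults.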

Answer 1

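You don't say whether you have an HDD or an SSD, but rather than trying the short disk self-test, you can view a summary of the accumulated SMART error counts with the following command: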

sudo smartctl -A /dev/sda

A non-zero "raw value" for Reallocated_Sector_Ct is particularly worrying...

It is apparently perfectly normal for many attributes to be labelled "Old_age" or even "Pre-fail".
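
If you want to check this automatically, the attribute table can be parsed with a few lines of Python. This is only a sketch under assumptions: it expects the usual ten-column ATA attribute table printed by `smartctl -A`, the helper names are made up for illustration, and the sample table is invented to exercise the parser (it is not output from the machine in the question). Watching Current_Pending_Sector and Offline_Uncorrectable in addition to Reallocated_Sector_Ct is also an assumption added here, not part of the advice above.

```python
# Attributes whose non-zero raw value deserves a closer look.
# Reallocated_Sector_Ct comes from the answer above; the other two are an
# assumption added for illustration.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def parse_smart_attributes(text):
    """Map attribute name -> {'type': ..., 'raw': ...} from `smartctl -A` text."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit():
            rows[fields[1]] = {"type": fields[6], "raw": " ".join(fields[9:])}
    return rows


def worrying(rows):
    """Return the watched attributes whose raw value is a non-zero integer."""
    flagged = []
    for name in sorted(WATCHED.intersection(rows)):
        first = rows[name]["raw"].split()[0]
        if first.isdigit() and int(first) > 0:
            flagged.append(name)
    return flagged


# INVENTED sample table, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21345
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    rows = parse_smart_attributes(SAMPLE)
    print("attributes needing attention:", worrying(rows))
```

Note that the TYPE column ("Pre-fail" or "Old_age") only classifies the attribute; it is the raw value, not the label, that the check looks at.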

Answer 2

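It turned out to be a problem in the i915 driver under memory pressure, and upgrading to Linux 4.19.0 fixed it. I don't know why it had not happened in the months before, which is what led me to dismiss the idea of a software problem entirely.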
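
In hindsight, the call traces already hinted at this. A rough way to see which drivers are involved is to count the module names shown in square brackets at the end of the trace frames. The snippet below is a hedged sketch, not a proper analysis tool: the function name, the regular expression and the "(core kernel)" label are illustrative assumptions, and the sample lines are copied from the Oops in the question. Frames prefixed with "?" are ones the kernel's unwinder is not certain about, so the tally is a hint rather than proof.

```python
import re
from collections import Counter

# Matches a trace frame such as
#   "[<ffffffffc094b02e>] ? i915_gem_object_put_pages_gtt+0x3e/0x260 [i915]"
# The trailing "[module]" is absent for functions built into the kernel image.
FRAME_RE = re.compile(r"\] \? ([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")


def modules_in_traces(lines):
    """Count how many trace frames belong to each kernel module."""
    counts = Counter()
    for line in lines:
        frame = FRAME_RE.search(line)
        if frame:
            # "(core kernel)" is a label chosen here for frames without a module.
            counts[frame.group(2) or "(core kernel)"] += 1
    return counts


# Sample input: frames copied verbatim from the Oops in the question.
SAMPLE = [
    "Jul  4 16:03:38 dumaty kernel: [14164.385717]  [<ffffffffc094b02e>] ? i915_gem_object_put_pages_gtt+0x3e/0x260 [i915]",
    "Jul  4 16:03:38 dumaty kernel: [14164.386885]  [<ffffffffc09490e2>] ? i915_gem_object_put_pages+0x72/0xf0 [i915]",
    "Jul  4 16:03:38 dumaty kernel: [14164.389419]  [<ffffffffc06a48c6>] ? drm_gem_object_unreference_unlocked+0x76/0x80 [drm]",
    "Jul  4 16:03:38 dumaty kernel: [14164.398721]  [<ffffffffade2a5b6>] ? current_time+0x36/0x70",
]

if __name__ == "__main__":
    for module, count in modules_in_traces(SAMPLE).most_common():
        print(f"{module}: {count} frame(s)")
```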

Answer 3

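I have just faced the same or a very similar problem: a Thinkpad T470 with an i5-7200U, running Debian 9 with kernel 4.9.0 for years. At least last year's 4.9.0-14 and 4.9.0-15 were fine (apart from occasionally failing to suspend on the first attempt). Recently I upgraded the RAM, which meant opening the case (where anything could have gone wrong), and upgraded to 4.9.0-16. After that, various kernel errors appeared several times a day, either killing an application or freezing the whole system. Suspending the system also became almost impossible. The errors were, for example: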

Oct 15 10:19:48 tp470 kernel: [ 3214.743652] BUG: Bad page map in process chromium-browse  pte:ac00000000000000 pmd:19dedf067
Oct 15 10:19:48 tp470 kernel: [ 3214.743799] BUG: Bad page map in process chromium-browse  pte:1ac9625ced pmd:19dedf067
Oct 15 12:00:51 tp470 kernel: [ 9277.955817] BUG: unable to handle kernel NULL pointer dereference at 0000000000000017
Oct 15 12:12:22 tp470 kernel: [ 9968.541038] BUG: unable to handle kernel NULL pointer dereference at 0000000000000007
Oct 15 12:26:32 tp470 kernel: [10818.769010] BUG: unable to handle kernel NULL pointer dereference at 0000000000000017
Oct 15 13:16:42 tp470 kernel: [ 2294.294541] BUG: unable to handle kernel paging request at ffff9d1ac9625ced
Oct 15 13:34:42 tp470 kernel: [ 3374.077080] BUG: Bad page map in process Privileged Cont  pte:8000001ac9625ced pmd:1d2f4e067
Oct 15 13:34:42 tp470 kernel: [ 3374.087870] BUG: Bad page cache in process firefox-esr  pfn:1a94c9
Oct 15 16:17:04 tp470 kernel: [13115.567010] BUG: unable to handle kernel paging request at ffff9d1ac9625d35

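My first thought was the same: broken hardware. I tested both RAM modules and so on. Then I tried Debian 11 from a USB stick and, voilà, no problems at all. I have now gone back to 4.9.0-14, and that kernel shows no problems either. So it seems that version 4.9.0-16-amd64 is internally broken for (at least) some Intel CPUs...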
