Kernel errors crash the system while gaming - nvidia driver

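Hopefully this is the right place. I started getting random system crashes while playing Dota 2 on Linux Mint 17.3 with the nvidia driver. It happens a few minutes into a game: the screen freezes, the last second of audio loops, the system becomes unresponsive, and a hard reset is required.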

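At the time I assumed it was a hardware failure, since the machine was fairly old and I was planning to upgrade anyway. So I ended up with a new motherboard, CPU and RAM (Gigabyte GA-H170N-WIFI, Intel Core i5 6500, Corsair Vengeance LPX CMK16GX4M2A2133C13 16GB (2x8GB) DDR4) and kept the same GPU (GTX 750 Ti). I still got the same crash. I then tried the following, none of which helped: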

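  • Different GPU (GTX 950) - same issue
  • Different power supply - same issue
  • Different SSD with a fresh OS install - same issue

At this point I considered hardware failure ruled out, since it was effectively a brand-new system.

  • Upgraded to Linux Mint 18 beta
  • Tried 3 different nvidia driver versions (on both GPUs I tried)
  • Tried upgrading the kernel
  • Tried upgrading the BIOS
  • Enabled vsync
  • Dota 2 specific: -gl_disable_buffer_storage in the launch options
  • Memory test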

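Eventually I found that if I tail the kernel log over ssh from another machine, I can see the error (it is not in the log after a reboot, however). This is what I got: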

Jun 15 23:15:02 lucas-desktop kernel: [ 1218.461993] BUG: unable to handle kernel paging request at ffffaa0001f4c120
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.461996] IP: [<ffffffff81194cc8>] free_pcppages_bulk+0x368/0x480
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462000] PGD 0 
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462001] Oops: 0002 [#1] SMP 
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462003] Modules linked in: rfcomm bnep binfmt_misc nls_iso8859_1 snd_hda_codec_hdmi arc4 nvidia_uvm(POE) snd_hda_codec_realtek snd_hda_codec_generic i915_bpo nvidia_drm(POE) nvidia_modeset(POE) intel_ips drm_kms_helper drm fb_sys_fops syscopyarea sysfillrect sysimgblt nvidia(POE) snd_hda_intel snd_hda_codec intel_rapl snd_hda_core iwlmvm snd_hwdep x86_pkg_temp_thermal intel_powerclamp coretemp mac80211 snd_pcm kvm_intel kvm joydev irqbypass input_leds snd_seq_midi snd_seq_midi_event crct10dif_pclmul btusb snd_rawmidi btrtl crc32_pclmul snd_seq aesni_intel snd_seq_device snd_timer aes_x86_64 iwlwifi lrw gf128mul glue_helper ablk_helper cryptd snd cfg80211 soundcore serio_raw mei_me shpchp mei hci_uart btbcm btqca btintel bluetooth wmi intel_lpss_acpi intel_lpss tpm_infineon mac_hid acpi_pad acpi_als kfifo_buf industrialio parport_pc ppdev lp parport autofs4 btrfs xor raid6_pq dm_mirror dm_region_hash dm_log hid_generic hid_roccat_savu hid_roccat hid_roccat_common usbhid psmouse igb e1000e dca ptp pps_core ahci i2c_algo_bit libahci video pinctrl_sunrisepoint i2c_hid pinctrl_intel hid fjes [last unloaded: cpuid]
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462031] CPU: 1 PID: 2601 Comm: steamwebhelper Tainted: P           OE   4.4.0-24-generic #43-Ubuntu
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462033] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H170N-WIFI-CF, BIOS F4c 01/13/2016
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462034] task: ffff8800861ebb00 ti: ffff8803e869c000 task.ti: ffff8803e869c000
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462035] RIP: 0010:[<ffffffff81194cc8>]  [<ffffffff81194cc8>] free_pcppages_bulk+0x368/0x480
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462037] RSP: 0018:ffff8803e869fa90  EFLAGS: 00210002
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462037] RAX: ffffea0001ef9600 RBX: 0000000000000002 RCX: ffffaa0001f4c120
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462038] RDX: 0000000000000658 RSI: ffffea00019b7120 RDI: dead000000000100
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462039] RBP: ffff8803e869fb00 R08: ffffea0001ef9700 R09: 0000000000200286
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462040] R10: 0000000000000009 R11: 000000000000000a R12: 000000000000065c
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462040] R13: 0000000000000001 R14: ffff8804767f97c0 R15: 0000000000000001
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462041] FS:  0000000000000000(0000) GS:ffff880476480000(0000) knlGS:0000000000000000
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462042] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462043] CR2: ffffaa0001f4c120 CR3: 0000000002e0a000 CR4: 00000000003406e0
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462044] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462044] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462045] Stack:
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462046]  ffff8804767f9d40 ffff88047649a6e0 0000001063927020 ffff88047649a700
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462047]  ffff88047649a6f0 ffffea0001ef9740 0000000100000010 0000000000000198
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462048]  0000000000000000 ffff8804767f97c0 ffff88047649a6e0 000000000000001f
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462050] Call Trace:
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462052]  [<ffffffff8119510c>] free_hot_cold_page+0x18c/0x1c0
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462054]  [<ffffffff81195188>] free_hot_cold_page_list+0x48/0xb0
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462055]  [<ffffffff8119dbdb>] release_pages+0xdb/0x2b0
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462058]  [<ffffffff811d2e1d>] free_pages_and_swap_cache+0x7d/0x90
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462060]  [<ffffffff811bbbe6>] tlb_flush_mmu_free+0x36/0x60
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462062]  [<ffffffff811bdddc>] unmap_page_range+0x58c/0x7a0
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462063]  [<ffffffff811be06d>] unmap_single_vma+0x7d/0xe0
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462065]  [<ffffffff811beb31>] unmap_vmas+0x51/0xa0
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462066]  [<ffffffff811c8027>] exit_mmap+0xa7/0x170
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462068]  [<ffffffff8107df77>] mmput+0x57/0x130
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462070]  [<ffffffff81083cad>] do_exit+0x27d/0xae0
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462072]  [<ffffffff810ac950>] ? wake_up_state+0x10/0x20
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462074]  [<ffffffff8108df0e>] ? signal_wake_up_state+0x1e/0x30
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462075]  [<ffffffff81084593>] do_group_exit+0x43/0xb0
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462076]  [<ffffffff81084614>] SyS_exit_group+0x14/0x20
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462078]  [<ffffffff81003dcc>] do_fast_syscall_32+0x9c/0x170
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462080]  [<ffffffff818281b2>] sysenter_flags_fixed+0x8/0x12
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462081] Code: 00 00 49 89 48 20 49 89 40 28 48 89 10 e9 d8 fe ff ff 48 8b 70 20 48 8b 48 28 48 bf 00 01 00 00 00 00 ad de 4c 21 e2 48 89 4e 08 <48> 89 31 89 d9 48 8d 34 49 48 89 78 20 48 bf 00 02 00 00 00 00 
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462097] RIP  [<ffffffff81194cc8>] free_pcppages_bulk+0x368/0x480
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462098]  RSP <ffff8803e869fa90>
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462099] CR2: ffffaa0001f4c120
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462100] ---[ end trace c04c45f3b1d98e2f ]---
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.462101] Fixing recursive fault but reboot is needed!
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477219] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477221] IP: [<ffffffffc0d2261a>] _nv011492rm+0x3a/0xd0 [nvidia]
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477306] PGD 0 
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477307] Oops: 0002 [#2] SMP 
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477309] Modules linked in: rfcomm bnep binfmt_misc nls_iso8859_1 snd_hda_codec_hdmi arc4 nvidia_uvm(POE) snd_hda_codec_realtek snd_hda_codec_generic i915_bpo nvidia_drm(POE) nvidia_modeset(POE) intel_ips drm_kms_helper drm fb_sys_fops syscopyarea sysfillrect sysimgblt nvidia(POE) snd_hda_intel snd_hda_codec intel_rapl snd_hda_core iwlmvm snd_hwdep x86_pkg_temp_thermal intel_powerclamp coretemp mac80211 snd_pcm kvm_intel kvm joydev irqbypass input_leds snd_seq_midi snd_seq_midi_event crct10dif_pclmul btusb snd_rawmidi btrtl crc32_pclmul snd_seq aesni_intel snd_seq_device snd_timer aes_x86_64 iwlwifi lrw gf128mul glue_helper ablk_helper cryptd snd cfg80211 soundcore serio_raw mei_me shpchp mei hci_uart btbcm btqca btintel bluetooth wmi intel_lpss_acpi intel_lpss tpm_infineon mac_hid acpi_pad acpi_als kfifo_buf industrialio parport_pc ppdev lp parport autofs4 btrfs xor raid6_pq dm_mirror dm_region_hash dm_log hid_generic hid_roccat_savu hid_roccat hid_roccat_common usbhid psmouse igb e1000e dca ptp pps_core ahci i2c_algo_bit libahci video pinctrl_sunrisepoint i2c_hid pinctrl_intel hid fjes [last unloaded: cpuid]
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477343] CPU: 0 PID: 1084 Comm: Xorg Tainted: P      D    OE   4.4.0-24-generic #43-Ubuntu
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477344] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H170N-WIFI-CF, BIOS F4c 01/13/2016
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477345] task: ffff880460038ec0 ti: ffff88045d93c000 task.ti: ffff88045d93c000
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477346] RIP: 0010:[<ffffffffc0d2261a>]  [<ffffffffc0d2261a>] _nv011492rm+0x3a/0xd0 [nvidia]
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477418] RSP: 0018:ffff88045d93fba0  EFLAGS: 00010206
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477419] RAX: 00000000007b0000 RBX: 0000000000000001 RCX: 0000000000000020
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477420] RDX: 0000000000000001 RSI: ffff88045dec5c24 RDI: 0000000000000001
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477421] RBP: ffff88045dec5c20 R08: ffffffffc11524f0 R09: 0000000000000000
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477421] R10: ffff880461fe8c00 R11: ffffffffc0d4a0b0 R12: ffff88045ce0a808
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477422] R13: ffff88045ce0a810 R14: ffff88045cd14008 R15: ffff880460f87008
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477423] FS:  00007fe2d7a66a00(0000) GS:ffff880476400000(0000) knlGS:0000000000000000
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477424] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477425] CR2: 0000000000000020 CR3: 000000046190d000 CR4: 00000000003406f0
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477425] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477426] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477427] Stack:
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477427]  ffffffff90ffc789 ffff880462a7c008 ffffffffc0d6d8fa 0000000000000000
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477429]  0000000000000000 0000000000000000 ffffffffc0a43461 ffff88045d97ac08
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477430]  0000000000000000 0000000000000000 ffff8804602d3408 ffff8803ccfba548
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477432] Call Trace:
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477497]  [<ffffffffc0d6d8fa>] ? _nv012701rm+0x2a/0x90 [nvidia]
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477582]  [<ffffffffc0a43461>] ? _nv007552rm+0x101/0x1090 [nvidia]
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477665]  [<ffffffffc0a3dc81>] ? _nv007551rm+0x101/0x250 [nvidia]
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477748]  [<ffffffffc0a3dc0b>] ? _nv007551rm+0x8b/0x250 [nvidia]
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477816]  [<ffffffffc0d2d82e>] ? _nv000897rm+0x1c7e/0x1e40 [nvidia]
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477883]  [<ffffffffc0d2cef9>] ? _nv000897rm+0x1349/0x1e40 [nvidia]
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.477950]  [<ffffffffc0d2db19>] ? _nv000845rm+0x129/0x4a0 [nvidia]
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.478016]  [<ffffffffc0d2dada>] ? _nv000845rm+0xea/0x4a0 [nvidia]
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.478082]  [<ffffffffc0d353c7>] ? _nv003162rm+0x1797/0x3280 [nvidia]
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.478145]  [<ffffffffc0da2240>] ? _nv000839rm+0x250/0x800 [nvidia]
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.478208]  [<ffffffffc0dac133>] ? rm_ioctl+0x73/0x100 [nvidia]
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.478215]  [<ffffffffc1f55400>] ? _nv000320kms+0x50/0x70 [nvidia_modeset]
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.478249]  [<ffffffffc083cca4>] ? nvidia_ioctl+0x144/0x4b0 [nvidia]
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.478285]  [<ffffffffc083b080>] ? nvidia_frontend_compat_ioctl+0x40/0x50 [nvidia]
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.478321]  [<ffffffffc083b09e>] ? nvidia_frontend_unlocked_ioctl+0xe/0x10 [nvidia]
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.478324]  [<ffffffff8122062f>] ? do_vfs_ioctl+0x29f/0x490
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.478326]  [<ffffffff8103b32d>] ? fpu__restore_sig+0x4d/0x60
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.478327]  [<ffffffff81220899>] ? SyS_ioctl+0x79/0x90
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.478330]  [<ffffffff81825bf2>] ? entry_SYSCALL_64_fastpath+0x16/0x71
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.478330] Code: c7 45 04 00 00 00 00 48 8d 75 04 89 df e8 8f f0 ff ff 48 85 c0 74 02 8b b8 90 1d 00 00 85 ff 74 e6 48 85 c0 74 13 68 89 c7 ff 90 <18> 01 00 20 89 c2 48 83 c5 08 89 90 5b c3 c7 45 04 00 00 00 00 
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.478346] RIP  [<ffffffffc0d2261a>] _nv011492rm+0x3a/0xd0 [nvidia]
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.478414]  RSP <ffff88045d93fba0>
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.478414] CR2: 0000000000000020
Jun 15 23:15:02 lucas-desktop kernel: [ 1218.478416] ---[ end trace c04c45f3b1d98e30 ]---
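
To make the two reports in the log above easier to compare, here is a minimal sketch (not part of the original troubleshooting) that pulls out the `BUG:` message, the faulting function and module from the `IP:` line, and the process from the `Comm:` field. The name `summarize_oopses` and the regular expressions are illustrative assumptions based only on the line format shown above.

```python
import re

# Patterns for the oops fields of interest, based on the log format above.
BUG_RE = re.compile(r"BUG: (.+)$")
# Matches "IP: [<addr>] func+0x../0x.. [module]" but not the "RIP:" lines.
IP_RE = re.compile(
    r"\sIP: \[<[0-9a-f]+>\] (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)
COMM_RE = re.compile(r"PID: (\d+) Comm: (\S+)")


def summarize_oopses(lines):
    """Return one dict per 'BUG:' report found in the given log lines."""
    reports = []
    for raw in lines:
        line = raw.rstrip()
        match = BUG_RE.search(line)
        if match:
            # A new report starts at every "BUG:" line.
            reports.append(
                {"bug": match.group(1), "function": None,
                 "module": None, "pid": None, "comm": None}
            )
            continue
        if not reports:
            continue  # ignore anything before the first report
        current = reports[-1]
        match = IP_RE.search(line)
        if match and current["function"] is None:
            current["function"] = match.group(1)
            # No "[module]" suffix means the symbol is in the core kernel.
            current["module"] = match.group(2) or "core kernel"
            continue
        match = COMM_RE.search(line)
        if match and current["comm"] is None:
            current["pid"] = int(match.group(1))
            current["comm"] = match.group(2)
    return reports
```

Given the log above, this yields two entries: the first faults in `free_pcppages_bulk` (core kernel) under `steamwebhelper`, the second in `_nv011492rm` inside the `nvidia` module under `Xorg`.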

I think the problem here is the "unable to handle kernel paging request" part, but at this point it is beyond my level of expertise.
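
One way to look more closely at the paging request is to compare the faulting address (`CR2`, also in `RCX`) with the other pointers in the same register dump. The sketch below is only an observation aid, not a diagnosis. It assumes the default x86_64 layout for a 4.4 kernel, in which the `struct page` array ("virtual memory map") lives at `ffffea0000000000`-`ffffeaffffffffff`. The helper names are hypothetical.

```python
# Values copied from the first oops in the log above.
CR2 = 0xFFFFAA0001F4C120  # faulting address, also held in RCX
PAGE_POINTERS = {
    "RAX": 0xFFFFEA0001EF9600,
    "RSI": 0xFFFFEA00019B7120,
    "R08": 0xFFFFEA0001EF9700,
}

# Assumption: default x86_64 memory layout for a 4.4 kernel, where the
# "virtual memory map" (array of struct page) occupies this range.
VMEMMAP_START = 0xFFFFEA0000000000
VMEMMAP_END = 0xFFFFEAFFFFFFFFFF


def in_vmemmap(addr):
    """True if addr falls inside the assumed struct page array range."""
    return VMEMMAP_START <= addr <= VMEMMAP_END


def single_bit_repairs(addr):
    """Bit positions whose flip would move addr into the vmemmap range."""
    return [bit for bit in range(64) if in_vmemmap(addr ^ (1 << bit))]


def describe():
    """Build a short text report comparing CR2 with the other pointers."""
    lines = []
    for name, value in PAGE_POINTERS.items():
        lines.append(f"{name} {value:016x} in vmemmap: {in_vmemmap(value)}")
    lines.append(f"CR2 {CR2:016x} in vmemmap: {in_vmemmap(CR2)}")
    for bit in single_bit_repairs(CR2):
        lines.append(f"flipping bit {bit} gives {CR2 ^ (1 << bit):016x}")
    return "\n".join(lines)
```

Under that assumption, `RAX`, `RSI` and `R08` all fall inside the range while `CR2` does not, and the only single-bit flip that moves `ffffaa0001f4c120` into the range is bit 46 (`aa` versus `ea` in the third byte). That would be consistent with a corrupted page-list pointer, but it does not show whether the corruption came from software or hardware.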

Currently, my system is:

  • Gigabyte GA-H170N-WIFI
  • Intel Core i5 6500
  • Corsair Vengeance LPX CMK16GX4M2A2133C13 16GB (2x8GB) DDR4
  • GTX 750 Ti
  • Linux Mint 18 beta
  • nvidia 367 driver

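I have no idea what is causing this, since I have ruled out hardware problems and it still happens across different OS and kernel versions. I can't find anyone else who has run into this same set of problems.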
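
Because the oops is missing from the on-disk log after a reboot, comparing crashes across driver and kernel versions means saving each trace on the second machine as it streams in. The following is a rough sketch of that, assuming key-based ssh access and that the kernel log lives at `/var/log/kern.log` on the crashing machine (the path, the host name and both function names are assumptions for illustration).

```python
import os
import subprocess


def persist_lines(lines, out_path):
    """Append each line to out_path, flushing to disk after every line."""
    count = 0
    with open(out_path, "a", encoding="utf-8") as out:
        for line in lines:
            out.write(line if line.endswith("\n") else line + "\n")
            # Flush immediately so nothing is lost if the stream stops.
            out.flush()
            os.fsync(out.fileno())
            count += 1
    return count


def follow_remote_kernel_log(host, out_path, log_path="/var/log/kern.log"):
    """Run on the second machine: stream the remote kernel log to a file.

    Assumption: `host` accepts non-interactive ssh logins and the remote
    user may read `log_path`.
    """
    proc = subprocess.Popen(
        ["ssh", host, "tail", "-F", log_path],
        stdout=subprocess.PIPE,
        text=True,
        errors="replace",
    )
    try:
        # Blocks until the ssh connection drops, e.g. when the system hangs.
        return persist_lines(proc.stdout, out_path)
    finally:
        proc.terminate()
```

A call such as `follow_remote_kernel_log("lucas-desktop", "crash-367.log")`, with one output file per driver or kernel version, would keep each trace for later comparison.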

Any suggestions on what I could try next would be very helpful. Thanks in advance.
