Troubleshooting a GPU BSOD on Windows 10

Description

I have an RTX 3090 Suprim X. I mine Ethereum on it, and the core temperatures are relatively low: the GPU peaks at 52 °C and the hot spot at 62-63 °C.

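The problem is on the backplate side. Now that summer and the heat wave have arrived, the Memory Junction temperature on the back of the card sits at a constant 98-100 °C and reaches 102 °C at most (at the hottest time of day; this is not an occasional spike that lasts only a few seconds).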

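The GPU runs at a constant 100 °C VRAM temperature (Memory Junction). Since summer started and the weather turned hot, my PC has crashed twice this month: once on June 9 (Windows Update was having trouble that day) and again today. The cause? The same driver both times: dxgkrnl.sys.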

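Is my GPU damaged, or in the process of being damaged? Or is this just a driver/software problem? On the hottest days here, the outdoor temperature in the sun can reach 35-40 °C, which makes the indoor ambient temperature high as well.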

This PC is used only for 24/7 mining. I won't be using it for anything else for now.

Information collected

BlueScreenView reports that the crash was caused by the driver dxgkrnl.sys. Event Viewer shows the shutdown as Kernel-Power Event ID 41 with BugCheckCode 0x116 (0x00000116), VIDEO_TDR_FAILURE.

Opening the memory dump created after this event in WinDbg (the Windows debugger) shows the following.

0x116 VIDEO_TDR_FAILURE data

                                                                             
                         Bugcheck Analysis                                    
                                                                            


VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.
Arguments:
Arg1: ffffe1814ca0a460, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff8072a1751bc, The pointer into responsible device driver module (e.g. owner tag).
Arg3: ffffffffc000009a, Optional error code (NTSTATUS) of the last failed operation.
Arg4: 0000000000000004, Optional internal context dependent data.

Debugging Details:
------------------

Unable to load image \SystemRoot\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_a494df49ba2f9f36\nvlddmkm.sys, Win32 error 0n2
*** WARNING: Unable to verify timestamp for nvlddmkm.sys
*** WARNING: Unable to verify checksum for win32k.sys

KEY_VALUES_STRING: 1

    Key  : Analysis.CPU.mSec
    Value: 3827

    Key  : Analysis.DebugAnalysisManager
    Value: Create

    Key  : Analysis.Elapsed.mSec
    Value: 37803

    Key  : Analysis.Init.CPU.mSec
    Value: 374

    Key  : Analysis.Init.Elapsed.mSec
    Value: 36504

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 90

    Key  : WER.OS.Branch
    Value: vb_release

    Key  : WER.OS.Timestamp
    Value: 2019-12-06T14:06:00Z

    Key  : WER.OS.Version
    Value: 10.0.19041.1


BUGCHECK_CODE:  116

BUGCHECK_P1: ffffe1814ca0a460

BUGCHECK_P2: fffff8072a1751bc

BUGCHECK_P3: ffffffffc000009a

BUGCHECK_P4: 4

VIDEO_TDR_CONTEXT: dt dxgkrnl!_TDR_RECOVERY_CONTEXT ffffe1814ca0a460
Symbol dxgkrnl!_TDR_RECOVERY_CONTEXT not found.

PROCESS_OBJECT: 0000000000000004

BLACKBOXBSD: 1 (!blackboxbsd)


BLACKBOXNTFS: 1 (!blackboxntfs)


BLACKBOXPNP: 1 (!blackboxpnp)


BLACKBOXWINLOGON: 1

CUSTOMER_CRASH_COUNT:  1

PROCESS_NAME:  System

STACK_TEXT:  
ffffde84`f740f9d8 fffff807`242f1cce     : 00000000`00000116 ffffe181`4ca0a460 fffff807`2a1751bc ffffffff`c000009a : nt!KeBugCheckEx
ffffde84`f740f9e0 fffff807`242a24f4     : fffff807`2a1751bc ffffe181`5821b010 00000000`00002000 ffffe181`5821b0d0 : dxgkrnl!TdrBugcheckOnTimeout+0xfe
ffffde84`f740fa20 fffff807`2429b02f     : ffffe181`58218000 00000000`01000000 00000000`00000002 00000000`00000002 : dxgkrnl!ADAPTER_RENDER::Reset+0x174
ffffde84`f740fa50 fffff807`242f13f5     : 00000000`00000100 ffffe181`58218a58 00000000`460c13e0 fffff807`15d57f3c : dxgkrnl!DXGADAPTER::Reset+0x4df
ffffde84`f740fad0 fffff807`242f1567     : fffff807`16725440 ffffe181`5f62d460 00000000`00000000 00000000`00000500 : dxgkrnl!TdrResetFromTimeout+0x15
ffffde84`f740fb00 fffff807`15d41225     : ffffe181`716f4640 fffff807`242f1540 ffffe181`4477aa20 fffff807`00000000 : dxgkrnl!TdrResetFromTimeoutWorkItem+0x27
ffffde84`f740fb30 fffff807`15cf53b5     : ffffe181`716f4640 00000000`00000080 ffffe181`44670040 00000000`00000000 : nt!ExpWorkerThread+0x105
ffffde84`f740fbd0 fffff807`15dfe278     : ffffa300`5d863180 ffffe181`716f4640 fffff807`15cf5360 00000000`00000000 : nt!PspSystemThreadStartup+0x55
ffffde84`f740fc20 00000000`00000000     : ffffde84`f7410000 ffffde84`f7409000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x28


SYMBOL_NAME:  nvlddmkm+d851bc

MODULE_NAME: nvlddmkm

IMAGE_NAME:  nvlddmkm.sys

STACK_COMMAND:  .thread ; .cxr ; kb

FAILURE_BUCKET_ID:  0x116_IMAGE_nvlddmkm.sys

OS_VERSION:  10.0.19041.1

BUILDLAB_STR:  vb_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {c89bfe8c-ed39-f658-ef27-f2898997fdbd}

Followup:     MachineOwner
---------
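
For anyone who wants to compare several dumps, here is a minimal Python sketch that pulls the key fields out of analysis text like the output above and decodes Arg3 as an NTSTATUS value. The function names `parse_bugcheck` and `ntstatus_name` and the one-entry lookup table are my own illustrative assumptions, not part of WinDbg. The table lists only the status seen in this dump, 0xC000009A (STATUS_INSUFFICIENT_RESOURCES).

```python
import re

# Tiny lookup table (assumption: extend it from Microsoft's NTSTATUS
# reference as needed). Only the value seen in this dump is listed.
NTSTATUS_NAMES = {0xC000009A: "STATUS_INSUFFICIENT_RESOURCES"}


def parse_bugcheck(text):
    """Extract bugcheck code, parameters and failure bucket from WinDbg text."""
    fields = {}
    m = re.search(r"^BUGCHECK_CODE:\s*([0-9a-fA-F]+)", text, re.MULTILINE)
    if m:
        fields["code"] = int(m.group(1), 16)
    # BUGCHECK_P1..P4 are printed as bare hex without a 0x prefix.
    for n, value in re.findall(
        r"^BUGCHECK_P([1-4]):\s*([0-9a-fA-F]+)", text, re.MULTILINE
    ):
        fields[f"p{n}"] = int(value, 16)
    m = re.search(r"^FAILURE_BUCKET_ID:\s*(\S+)", text, re.MULTILINE)
    if m:
        fields["bucket"] = m.group(1)
    return fields


def ntstatus_name(param):
    """Map a sign-extended 64-bit bugcheck parameter to an NTSTATUS name."""
    status = param & 0xFFFFFFFF  # NTSTATUS is 32 bits; drop the sign extension
    return NTSTATUS_NAMES.get(status, f"unknown (0x{status:08X})")


# Lines copied from the dump above.
sample = """\
BUGCHECK_CODE:  116
BUGCHECK_P1: ffffe1814ca0a460
BUGCHECK_P2: fffff8072a1751bc
BUGCHECK_P3: ffffffffc000009a
BUGCHECK_P4: 4
FAILURE_BUCKET_ID:  0x116_IMAGE_nvlddmkm.sys
"""

info = parse_bugcheck(sample)
print(hex(info["code"]), info["bucket"], ntstatus_name(info["p3"]))
```

The decoded status only names the last operation that failed during the reset attempt. On its own it does not tell a hardware fault apart from a driver problem.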

PART 2
0: kd> lmvm nvlddmkm
Browse full module list
start             end                 module name
fffff807`293f0000 fffff807`2b97c000   nvlddmkm T (no symbols)           
    Loaded symbol image file: nvlddmkm.sys
    Image path: \SystemRoot\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_a494df49ba2f9f36\nvlddmkm.sys
    Image name: nvlddmkm.sys
    Browse all global symbols  functions  data
    Timestamp:        Sat Apr 24 00:12:01 2021 (60833821)
    CheckSum:         024FD385
    ImageSize:        0258C000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
    Information from resource tables:
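
As a sanity check on the numbers above, this short Python sketch confirms that Arg2 falls inside the nvlddmkm module range reported by lmvm, that the offset matches SYMBOL_NAME (nvlddmkm+d851bc), and what the raw driver timestamp decodes to in UTC. The variable names are mine; the values are copied from the output in this post.

```python
from datetime import datetime, timezone

module_start = 0xFFFFF807293F0000  # "start" column from lmvm nvlddmkm
module_end = 0xFFFFF8072B97C000    # "end" column from lmvm nvlddmkm
arg2 = 0xFFFFF8072A1751BC          # bugcheck Arg2 (pointer into the driver)

# Arg2 should lie inside the module's address range.
inside = module_start <= arg2 < module_end
print(inside)  # True

# Offset of the faulting address from the module base.
offset = arg2 - module_start
print(f"nvlddmkm+{offset:x}")  # nvlddmkm+d851bc

# end - start should equal the ImageSize field.
image_size = module_end - module_start
print(f"{image_size:08X}")  # 0258C000

# The value in parentheses after "Timestamp:" is seconds since the Unix epoch.
build_time = datetime.fromtimestamp(0x60833821, tz=timezone.utc)
print(build_time.isoformat())  # 2021-04-23T21:12:01+00:00
```

The UTC result is three hours behind the "Sat Apr 24 00:12:01 2021" shown by WinDbg, which is consistent with WinDbg printing the time in the local zone of the machine that ran the analysis (an assumption on my part). Either way, the driver build dates from late April 2021.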
