nvidia-driver-510 on Ubuntu 20.04 does not recognize the RTX 2060 installed alongside an RTX 3060


After installing nvidia-driver-510 on Ubuntu 20.04, the RTX 2060 is not recognized alongside the RTX 3060 (I have 2 NVIDIA GPUs and 1 Intel GPU).

This is a new system that I set up after replacing the hard drive. On the previous system, nvidia-driver-510 worked well with both the 2060 and the 3060.

$ lspci|grep -i vga
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
01:00.0 VGA compatible controller: NVIDIA Corporation Device 2504 (rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation Device 1f03 (rev a1)
$ uname -a
Linux CMPLTRTOK-U20 5.15.0-72-generic #79~20.04.1-Ubuntu SMP Thu Apr 20 22:12:07 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ apt list linux-headers-$(uname -r)
Listing... Done
linux-headers-5.15.0-72-generic/focal-updates,focal-security,now 5.15.0-72.79~20.04.1 amd64 [installed,automatic]
$ apt list nvidia-driver-*
Listing... Done
nvidia-driver-390/focal-updates,focal-security 390.157-0ubuntu0.20.04.1 amd64
nvidia-driver-390/focal-updates,focal-security 390.157-0ubuntu0.20.04.1 i386
nvidia-driver-418-server/focal-updates,focal-security 418.226.00-0ubuntu0.20.04.2 amd64
nvidia-driver-418/focal 430.50-0ubuntu3 amd64
nvidia-driver-430/focal-updates,focal-security 440.100-0ubuntu0.20.04.1 amd64
nvidia-driver-435/focal-updates 455.45.01-0ubuntu0.20.04.1 amd64
nvidia-driver-440-server/focal-updates,focal-security 450.236.01-0ubuntu0.20.04.1 amd64
nvidia-driver-440/focal-updates,focal-security 450.119.03-0ubuntu0.20.04.1 amd64
nvidia-driver-450-server/focal-updates,focal-security 450.236.01-0ubuntu0.20.04.1 amd64
nvidia-driver-450/focal-updates,focal-security 460.91.03-0ubuntu0.20.04.1 amd64
nvidia-driver-455/focal-updates,focal-security 460.91.03-0ubuntu0.20.04.1 amd64
nvidia-driver-460-server/focal-updates,focal-security 470.182.03-0ubuntu0.20.04.1 amd64
nvidia-driver-460/focal-updates,focal-security 470.182.03-0ubuntu0.20.04.1 amd64
nvidia-driver-465/focal-updates,focal-security 470.182.03-0ubuntu0.20.04.1 amd64
nvidia-driver-470-server/focal-updates,focal-security 470.182.03-0ubuntu0.20.04.1 amd64
nvidia-driver-470/focal-updates,focal-security 470.182.03-0ubuntu0.20.04.1 amd64
nvidia-driver-495/focal-updates,focal-security 510.108.03-0ubuntu0.20.04.1 amd64
nvidia-driver-510-server/focal-updates,focal-security 515.105.01-0ubuntu0.20.04.1 amd64
nvidia-driver-510/focal-updates,focal-security,now 510.108.03-0ubuntu0.20.04.1 amd64 [installed]
nvidia-driver-515-open/focal-updates,focal-security 515.105.01-0ubuntu0.20.04.1 amd64
nvidia-driver-515-server/focal-updates,focal-security 515.105.01-0ubuntu0.20.04.1 amd64
nvidia-driver-515/focal-updates,focal-security 515.105.01-0ubuntu0.20.04.1 amd64
nvidia-driver-520-open/focal-updates,focal-security 525.116.04-0ubuntu0.20.04.1 amd64
nvidia-driver-520/focal-updates,focal-security 525.116.04-0ubuntu0.20.04.1 amd64
nvidia-driver-525-open/focal-updates,focal-security 525.116.04-0ubuntu0.20.04.1 amd64
nvidia-driver-525-server/focal-updates,focal-security 525.105.17-0ubuntu0.20.04.1 amd64
nvidia-driver-525/focal-updates,focal-security 525.116.04-0ubuntu0.20.04.1 amd64
nvidia-driver-530-open/focal-updates,focal-security 530.41.03-0ubuntu0.20.04.2 amd64
nvidia-driver-530/focal-updates,focal-security 530.41.03-0ubuntu0.20.04.2 amd64

$ apt list libnvidia*-510*
Listing... Done
libnvidia-cfg1-510-server/focal-updates,focal-security 515.105.01-0ubuntu0.20.04.1 amd64
libnvidia-cfg1-510/focal-updates,focal-security,now 510.108.03-0ubuntu0.20.04.1 amd64 [installed,automatic]
libnvidia-common-510-server/focal-updates,focal-updates,focal-security,focal-security 515.105.01-0ubuntu0.20.04.1 all
libnvidia-common-510/focal-updates,focal-updates,focal-security,focal-security,now 510.108.03-0ubuntu0.20.04.1 all [installed,automatic]
libnvidia-compute-510-server/focal-updates,focal-security 515.105.01-0ubuntu0.20.04.1 amd64
libnvidia-compute-510-server/focal-updates,focal-security 515.105.01-0ubuntu0.20.04.1 i386
libnvidia-compute-510/focal-updates,focal-security,now 510.108.03-0ubuntu0.20.04.1 amd64 [installed,automatic]
libnvidia-compute-510/focal-updates,focal-security,now 510.108.03-0ubuntu0.20.04.1 i386 [installed,automatic]
libnvidia-decode-510-server/focal-updates,focal-security 515.105.01-0ubuntu0.20.04.1 amd64
libnvidia-decode-510-server/focal-updates,focal-security 515.105.01-0ubuntu0.20.04.1 i386
libnvidia-decode-510/focal-updates,focal-security,now 510.108.03-0ubuntu0.20.04.1 amd64 [installed,automatic]
libnvidia-decode-510/focal-updates,focal-security,now 510.108.03-0ubuntu0.20.04.1 i386 [installed,automatic]
libnvidia-encode-510-server/focal-updates,focal-security 515.105.01-0ubuntu0.20.04.1 amd64
libnvidia-encode-510-server/focal-updates,focal-security 515.105.01-0ubuntu0.20.04.1 i386
libnvidia-encode-510/focal-updates,focal-security,now 510.108.03-0ubuntu0.20.04.1 amd64 [installed,automatic]
libnvidia-encode-510/focal-updates,focal-security,now 510.108.03-0ubuntu0.20.04.1 i386 [installed,automatic]
libnvidia-extra-510-server/focal-updates,focal-security 515.105.01-0ubuntu0.20.04.1 amd64
libnvidia-extra-510-server/focal-updates,focal-security 515.105.01-0ubuntu0.20.04.1 i386
libnvidia-extra-510/focal-updates,focal-security,now 510.108.03-0ubuntu0.20.04.1 amd64 [installed,automatic]
libnvidia-extra-510/focal-updates,focal-security 510.108.03-0ubuntu0.20.04.1 i386
libnvidia-fbc1-510-server/focal-updates,focal-security 515.105.01-0ubuntu0.20.04.1 amd64
libnvidia-fbc1-510-server/focal-updates,focal-security 515.105.01-0ubuntu0.20.04.1 i386
libnvidia-fbc1-510/focal-updates,focal-security,now 510.108.03-0ubuntu0.20.04.1 amd64 [installed,automatic]
libnvidia-fbc1-510/focal-updates,focal-security,now 510.108.03-0ubuntu0.20.04.1 i386 [installed,automatic]
libnvidia-gl-510-server/focal-updates,focal-security 515.105.01-0ubuntu0.20.04.1 amd64
libnvidia-gl-510-server/focal-updates,focal-security 515.105.01-0ubuntu0.20.04.1 i386
libnvidia-gl-510/focal-updates,focal-security,now 510.108.03-0ubuntu0.20.04.1 amd64 [installed,automatic]
libnvidia-gl-510/focal-updates,focal-security,now 510.108.03-0ubuntu0.20.04.1 i386 [installed,automatic]
libnvidia-nscq-510/focal-updates,focal-security 515.105.01-0ubuntu0.20.04.1 amd64
$ nvidia-smi -q

==============NVSMI LOG==============

Timestamp                                 : Thu Jun 22 17:05:49 2023
Driver Version                            : 510.108.03
CUDA Version                              : 11.6

Attached GPUs                             : 1
GPU 00000000:01:00.0
    Product Name                          : NVIDIA GeForce RTX 3060
    Product Brand                         : GeForce
    Product Architecture                  : Ampere
    Display Mode                          : Disabled
    Display Active                        : Disabled
    Persistence Mode                      : Disabled
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : N/A
    GPU UUID                              : GPU-5f52334c-826a-7900-978b-7fcb937de6ea
    Minor Number                          : 0
    VBIOS Version                         : 94.06.2F.00.9A
    MultiGPU Board                        : No
    Board ID                              : 0x100
    GPU Part Number                       : N/A
    Module ID                             : 0
    Inforom Version
        Image Version                     : G001.0000.03.03
        OEM Object                        : 2.0
        ECC Object                        : N/A
        Power Management Object           : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GSP Firmware Version                  : N/A
    GPU Virtualization Mode
        Virtualization Mode               : None
        Host VGPU Mode                    : N/A
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x01
        Device                            : 0x00
        Domain                            : 0x0000
        Device Id                         : 0x250410DE
        Bus Id                            : 00000000:01:00.0
        Sub System Id                     : 0x397D1462
        GPU Link Info
            PCIe Generation
                Max                       : 3
                Current                   : 1
            Link Width
                Max                       : 16x
                Current                   : 16x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 0 KB/s
        Rx Throughput                     : 0 KB/s
    Fan Speed                             : 0 %
    Performance State                     : P8
    Clocks Throttle Reasons
        Idle                              : Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Not Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    FB Memory Usage
        Total                             : 12288 MiB
        Reserved                          : 235 MiB
        Used                              : 3 MiB
        Free                              : 12049 MiB
    BAR1 Memory Usage
        Total                             : 256 MiB
        Used                              : 3 MiB
        Free                              : 253 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 0 %
        Memory                            : 0 %
        Encoder                           : 0 %
        Decoder                           : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    Ecc Mode
        Current                           : N/A
        Pending                           : N/A
    ECC Errors
        Volatile
            SRAM Correctable              : N/A
            SRAM Uncorrectable            : N/A
            DRAM Correctable              : N/A
            DRAM Uncorrectable            : N/A
        Aggregate
            SRAM Correctable              : N/A
            SRAM Uncorrectable            : N/A
            DRAM Correctable              : N/A
            DRAM Uncorrectable            : N/A
    Retired Pages
        Single Bit ECC                    : N/A
        Double Bit ECC                    : N/A
        Pending Page Blacklist            : N/A
    Remapped Rows                         : N/A
    Temperature
        GPU Current Temp                  : 43 C
        GPU Shutdown Temp                 : 98 C
        GPU Slowdown Temp                 : 95 C
        GPU Max Operating Temp            : 93 C
        GPU Target Temperature            : 83 C
        Memory Current Temp               : N/A
        Memory Max Operating Temp         : N/A
    Power Readings
        Power Management                  : Supported
        Power Draw                        : 15.30 W
        Power Limit                       : 170.00 W
        Default Power Limit               : 170.00 W
        Enforced Power Limit              : 170.00 W
        Min Power Limit                   : 100.00 W
        Max Power Limit                   : 170.00 W
    Clocks
        Graphics                          : 210 MHz
        SM                                : 210 MHz
        Memory                            : 405 MHz
        Video                             : 555 MHz
    Applications Clocks
        Graphics                          : N/A
        Memory                            : N/A
    Default Applications Clocks
        Graphics                          : N/A
        Memory                            : N/A
    Max Clocks
        Graphics                          : 2130 MHz
        SM                                : 2130 MHz
        Memory                            : 7501 MHz
        Video                             : 1950 MHz
    Max Customer Boost Clocks
        Graphics                          : N/A
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Voltage
        Graphics                          : 662.500 mV
    Processes
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 1151
            Type                          : G
            Name                          : /usr/bin/gnome-shell
            Used GPU Memory               : 2 MiB
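The mismatch in the output above can be checked mechanically. lspci lists NVIDIA devices at 01:00.0 and 02:00.0, but nvidia-smi reports "Attached GPUs : 1" (only 00000000:01:00.0). The following is a minimal Python sketch of that comparison. It assumes that the loaded nvidia kernel module exposes one directory per registered GPU under /proc/driver/nvidia/gpus, named by full PCI address such as 0000:01:00.0. It also assumes PCI domain 0000, because plain lspci omits the domain. The function names are hypothetical helpers, not part of any NVIDIA tool.

```python
import os
import re

# Matches NVIDIA display devices in plain `lspci` output, e.g.
# "02:00.0 VGA compatible controller: NVIDIA Corporation Device 1f03 (rev a1)".
# Some boards report the class as "3D controller" instead.
NVIDIA_LINE = re.compile(
    r"^([0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s+"
    r"(?:VGA compatible controller|3D controller):\s+NVIDIA",
    re.MULTILINE,
)


def lspci_nvidia_ids(lspci_text):
    """Return NVIDIA bus IDs from lspci output, prefixed with PCI domain 0000."""
    # Assumption: plain lspci omits the domain, which is 0000 on typical desktops.
    return {"0000:" + m.group(1).lower() for m in NVIDIA_LINE.finditer(lspci_text)}


def driver_ids(proc_dir="/proc/driver/nvidia/gpus"):
    """Return bus IDs the nvidia kernel module registered (empty if not loaded)."""
    if not os.path.isdir(proc_dir):
        return set()
    return {name.lower() for name in os.listdir(proc_dir)}


def missing_gpus(lspci_text, registered):
    """NVIDIA devices that lspci sees but the driver has not registered."""
    return sorted(lspci_nvidia_ids(lspci_text) - set(registered))
```

To use it, pass `subprocess.run(["lspci"], capture_output=True, text=True).stdout` together with `driver_ids()`. With the lspci output shown above and only 01:00.0 registered, the function would flag 0000:02:00.0, which is the RTX 2060.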

Answer 1

I don't know why, but after I posted this question, it started working after two reboots.

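Here are the details. I'm writing them down in the hope that they give others a clue, and that an expert can explain the root cause. (I include some details about installing nvtop. I didn't think they mattered before, but now I suspect they do.)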

Details:

(1) I have an Ubuntu 20.04 machine with an RTX 3060 (PCIe 3.0 x16) and Intel integrated graphics. Later, I connected an RTX 2060 to the system over PCIe 2.0 x4.

(2) About a week ago, I installed nvidia-driver-510 via apt and rebooted immediately afterwards.

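(3) Yesterday I installed nvtop so that I could monitor the GPUs before validating them. I fetched the nvtop source from git and built it. Based on the errors from nvtop's cmake build, I installed the following packages via apt: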

libsystemd-dev

libdrm-dev

libgtest-dev

libudev-dev

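(4) I validated the GPUs by running PyTorch (torch-1.10.1+cu113-p38-linux) with DistributedDataParallel, but PyTorch could not find any GPU. At that point, nvtop found only the Intel GPU and the RTX 3060, and nvidia-smi found only the RTX 3060. However, lspci found the Intel GPU and both NVIDIA GPUs. I then posted this question, shut the machine down, and went to bed.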

(5) (First reboot.) After I got up, I turned the system back on. nvtop now found the Intel GPU and both NVIDIA GPUs, but nvidia-smi failed with the following error:

$ nvidia-smi -q
Unable to determine the device handle for GPU 0000:02:00.0: Unknown Error

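(6) I ran the PyTorch test again as before, and it reported a CUDA error. The process hung: I could not abort it, and even "kill -9" would not kill it. It kept one CPU core at 100%. Even after I ran "sudo poweroff", the system stayed on, so I had to power it off physically.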

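(7) (Second reboot.) This was the second reboot since I installed nvtop. nvtop, nvidia-smi, and the PyTorch code all worked normally. I verified this by running the PyTorch code while monitoring with nvtop. It works!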
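In step (1), the two cards sit on very different links: PCIe 3.0 x16 for the RTX 3060 and PCIe 2.0 x4 for the RTX 2060. The answer does not say whether the negotiated link was ever checked. As a hedged sketch, nvidia-smi can report the link for each GPU through its --query-gpu fields pcie.link.gen.current, pcie.link.gen.max, pcie.link.width.current and pcie.link.width.max. The parser below only assumes the CSV shape produced by --format=csv,noheader, and the helper names are hypothetical. It can only report GPUs the driver already sees, so a card missing from nvidia-smi (as the RTX 2060 was in step (4)) will not appear at all.

```python
import csv
import io
import subprocess

# Field names accepted by `nvidia-smi --query-gpu`; the order is our own choice.
FIELDS = [
    "pci.bus_id",
    "name",
    "pcie.link.gen.current",
    "pcie.link.gen.max",
    "pcie.link.width.current",
    "pcie.link.width.max",
]


def parse_link_csv(text):
    """Turn `--format=csv,noheader` output into one dict per GPU."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(FIELDS, row)) for row in reader if row]


def query_links():
    """Run nvidia-smi and return the parsed link info (raises if nvidia-smi fails)."""
    cmd = ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_link_csv(result.stdout)
```

Idle GPUs usually drop to a lower link generation. The RTX 3060 in the question shows Current 1 with Max 3 while in state P8. So compare the maximum generation and the width, or query again while the card is under load.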
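Steps (4) and (7) use PyTorch to check which GPUs are usable. Below is a minimal sketch of that check. It relies only on torch.cuda.is_available, torch.cuda.device_count and torch.cuda.get_device_name, and returns an empty list when torch is not installed or CUDA is unavailable. It also sets CUDA_DEVICE_ORDER=PCI_BUS_ID so that device indices follow PCI bus order, as nvidia-smi does; CUDA's default order is fastest-first. That variable only takes effect if it is set before CUDA is initialized in the process. cuda_inventory is a hypothetical helper name, not the author's test code.

```python
import os


def cuda_inventory():
    """Return [(index, name), ...] for the CUDA devices PyTorch can use.

    Returns an empty list if torch is missing or reports no usable CUDA device.
    """
    # Number devices by PCI bus ID instead of CUDA's default "fastest first";
    # this only has an effect if CUDA has not been initialized yet.
    os.environ.setdefault("CUDA_DEVICE_ORDER", "PCI_BUS_ID")
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [(i, torch.cuda.get_device_name(i)) for i in range(torch.cuda.device_count())]


if __name__ == "__main__":
    # CUDA_VISIBLE_DEVICES, if set, hides the unlisted devices from this process.
    print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
    for index, name in cuda_inventory():
        print(index, name)
```

In the situation described in step (4), this check would have returned an empty list, since PyTorch found no GPU at all.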
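Step (5) shows the message "Unable to determine the device handle for GPU 0000:02:00.0: Unknown Error". The answer does not include the kernel log, which is where the NVIDIA driver usually explains why it gave up on a GPU, for example in "NVRM: Xid ..." events or "RmInitAdapter failed" lines. The following is a hedged sketch that scans dmesg text for those two patterns. The regular expression assumes the common "NVRM: Xid (PCI:<address>): <code>, ..." layout. The sample lines in the test are illustrative and were not taken from the author's machine.

```python
import re

# Assumed layout of driver Xid messages in the kernel log, e.g.
# "NVRM: Xid (PCI:0000:02:00): 79, pid=1234, GPU has fallen off the bus."
XID_LINE = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+)")


def xid_events(kernel_log):
    """Return (pci_address, xid_code) for each Xid message in the log text."""
    return [(m.group(1).lower(), int(m.group(2))) for m in XID_LINE.finditer(kernel_log)]


def init_failures(kernel_log):
    """Return log lines where the driver reports it failed to initialize a GPU."""
    return [line for line in kernel_log.splitlines() if "RmInitAdapter failed" in line]
```

The log text can come from "sudo dmesg" or "journalctl -k". Reading the kernel ring buffer may require root, depending on kernel.dmesg_restrict. NVIDIA's Xid documentation describes what each code means.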
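Step (6) describes a process that survived "kill -9" while using 100% of a core. SIGKILL cannot be caught or blocked, so a process that ignores it is typically still executing inside the kernel, for example in a call into the GPU driver. Such a process usually does not act on the signal until that call returns. One hedged way to see what state it is stuck in is the state letter in /proc/<pid>/stat. "D" means uninterruptible sleep, and "R" means running, which would fit the 100% CPU reading. The sketch below only parses that file and says nothing about why the driver call hung. The helper names are hypothetical.

```python
def proc_state(stat_text):
    """Return the one-letter state field from the contents of /proc/<pid>/stat."""
    # Field 2 is the command name in parentheses and may itself contain spaces
    # or ")", so split after the LAST ")" instead of on whitespace.
    after_comm = stat_text[stat_text.rindex(")") + 1:]
    return after_comm.split()[0]


def read_state(pid):
    """Read the state of a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return proc_state(f.read())
```

Call read_state with the PID of the hung Python process. As root, /proc/<pid>/stack, where the kernel provides it, shows the kernel call chain the task is currently in.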
