Unable to install nvidia-cuda-toolkit: unmet dependencies error on Ubuntu 22.04

The following packages have unmet dependencies:
 libcuinj64-11.5 : Depends: libnvidia-compute-495 (>= 495) but it is not going to be installed or
                            libnvidia-compute-495-server (>= 495) but it is not installable or
                            libcuda.so.1 (>= 495) or
                            libcuda-11.5-1
 libnvidia-ml-dev : Depends: libnvidia-compute-495 (>= 495) but it is not going to be installed or
                             libnvidia-compute-495-server (>= 495) but it is not installable or
                             libnvidia-ml.so.1 (>= 495)
 nvidia-cuda-dev : Depends: libnvidia-compute-495 (>= 495) but it is not going to be installed or
                            libnvidia-compute-495-server (>= 495) but it is not installable or
                            libcuda.so.1 (>= 495) or
                            libcuda-11.5-1
                   Recommends: libnvcuvid1 but it is not installable
E: Unable to correct problems, you have held broken packages.

Answer 1

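I ran into this after upgrading to nvidia-driver-535 from the NVIDIA repository. I don't know whether it was necessary, but in the process I also first got rid of i386 with dpkg --remove-architecture i386. So the procedure was: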

WARNING! Use your brain!

If your system happens to be i386, the following will kill it.

apt-get --allow-remove-essential  purge ".*:i386"
dpkg --remove-architecture i386
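
The warning above can be turned into a guard. A minimal sketch, not part of the original answer: it refuses to continue when the native dpkg architecture is i386, and otherwise lists the foreign i386 packages for review before the purge. The function names are hypothetical; dpkg --print-architecture and dpkg --get-selections are standard dpkg options.

```shell
# Hypothetical helper: succeed only if the native architecture is NOT i386.
# The architecture can be passed in (e.g. for testing); by default dpkg is asked.
i386_removal_is_safe() {
    local native_arch=${1:-$(dpkg --print-architecture)}
    [ "$native_arch" != "i386" ]
}

# Hypothetical helper: read `dpkg --get-selections` output on stdin and print
# every package name carrying the :i386 suffix, whatever its selection state.
list_i386_packages() {
    awk '$1 ~ /:i386$/ { print $1 }'
}

# Intended use (as root), before running the two commands above:
#   i386_removal_is_safe || { echo "native arch is i386, aborting" >&2; exit 1; }
#   dpkg --get-selections | list_i386_packages
```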

Then disable the NVIDIA repositories (YMMV):

cd /etc/apt/sources.list.d
mv cuda-ubuntu2204-x86_64.list cuda-ubuntu2204-x86_64.list.disabled
apt update
cat cuda-ubuntu2204-x86_64.list.disabled
deb [signed-by=/usr/share/keyrings/cuda-archive-keyring.gpg] http://HTTPS///developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /
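
The rename can be made idempotent for every CUDA list file in the directory. A sketch under the assumption that the NVIDIA repository is configured in one or more cuda*.list files, as in the listing above; the function name and its directory argument are hypothetical, and apt update still has to be run afterwards.

```shell
# Hypothetical helper: rename every cuda*.list file in a sources directory
# to *.list.disabled so apt ignores it. Safe to run more than once.
disable_cuda_lists() {
    local dir=${1:-/etc/apt/sources.list.d}
    local f
    for f in "$dir"/cuda*.list; do
        [ -e "$f" ] || continue      # the glob matched nothing
        mv -- "$f" "$f.disabled"
        echo "disabled: $f"
    done
    return 0
}

# Intended use (as root):
#   disable_cuda_lists && apt update
```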

The http://HTTPS/// prefix is there because I use apt-cacher-ng a lot. This way apt-cacher-ng can access the repository via HTTPS.
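
The http://HTTPS/// form lets the client send a plain-HTTP request that apt-cacher-ng then fetches upstream over HTTPS. A sketch of converting an https:// source line into that form, assuming apt is already configured to use apt-cacher-ng as its proxy; the function name is hypothetical.

```shell
# Hypothetical helper: rewrite https:// URLs in apt source lines (stdin)
# into apt-cacher-ng's http://HTTPS/// form (stdout).
to_acng_https() {
    sed 's#https://#http://HTTPS///#g'
}

# Intended use: preview the rewritten file before replacing the original, e.g.
#   to_acng_https < cuda-ubuntu2204-x86_64.list
```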

Then purge all traces of the 535 driver and other NVIDIA packages:

WARNING! Use your brain!

Purge only what you don't need. Purging also removes leftover configuration, which may contain valuable information.
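
Because a purge deletes configuration files, one way to keep a copy is to ask dpkg which conffiles the packages own before purging them. A sketch, not from the original answer: the Conffiles field of dpkg-query --showformat is standard, while the helper name and the archive path are hypothetical.

```shell
# Hypothetical helper: read `dpkg-query -W -f='${Conffiles}\n' PKG...` output
# on stdin and print only the file paths (the first field of each entry).
conffile_paths() {
    awk '$1 ~ /^\// { print $1 }'
}

# Intended use before purging, e.g. for every package whose name contains nvidia:
#   pkgs=$(dpkg --get-selections | awk '$1 ~ /nvidia/ { print $1 }')
#   dpkg-query -W -f='${Conffiles}\n' $pkgs | conffile_paths \
#       | tar --ignore-failed-read -czf /root/nvidia-conffiles.tgz -T -
```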

Note that I did this over ssh from another computer, because it may also shut down X11 on the machine:

dpkg --get-selections | grep 535
dpkg --get-selections | grep nvidia

Then purge whatever these list:

apt purge WHATEVER..
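
The two grep calls can be combined into a single list that is reviewed by hand before anything is purged. A sketch: it reads dpkg --get-selections output and prints the package names that contain 535 or nvidia. The function name and the /tmp/purge.txt file are hypothetical, and the function itself removes nothing.

```shell
# Hypothetical helper: print package names (first column of
# `dpkg --get-selections`) that contain "535" or "nvidia", de-duplicated.
nvidia_purge_candidates() {
    awk '$1 ~ /535|nvidia/ { print $1 }' | LC_ALL=C sort -u
}

# Intended use: review first, then purge only what you really want gone.
#   dpkg --get-selections | nvidia_purge_candidates > /tmp/purge.txt
#   # edit /tmp/purge.txt and delete anything you want to keep, then:
#   apt purge $(cat /tmp/purge.txt)
```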

Then I cleaned up the leftovers that are no longer in the repositories:

apt-show-versions | grep -v '/jammy' 

For example:

apt purge libxnvctrl0
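
The apt-show-versions filter can also produce bare package names that can be pasted into apt purge. A sketch, assuming the package name is the first whitespace-separated field of each apt-show-versions line; the function name is hypothetical, and the names keep their :arch suffix, which apt accepts.

```shell
# Hypothetical helper: from `apt-show-versions` output on stdin, keep the same
# lines `grep -v '/jammy'` keeps and print only the first field (package:arch).
non_jammy_packages() {
    grep -v '/jammy' | awk '{ print $1 }'
}

# Intended use: review the names, then purge the ones that are really leftovers.
#   apt-show-versions | non_jammy_packages
```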

Then I reinstalled everything from the Ubuntu repositories:

apt install nvidia-driver-535 nvidia-cuda-toolkit nvidia-settings

Note that you may also need to install xserver-xorg-video-nvidia-535. I didn't need it; the machine is a headless GPU server for PyTorch.
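
A cautious variant of the reinstall, not part of the original answer: apt-get -s (simulate) runs the dependency resolver without changing anything and exits non-zero on the same "unmet dependencies" error, so the real install only happens once resolution succeeds. The wrapper name and the APT_GET override variable are hypothetical.

```shell
# Hypothetical wrapper: simulate the install first (apt-get -s) and only run the
# real install if the simulated resolution succeeds. APT_GET can override the
# command name, e.g. to point at a stub when testing.
install_if_resolvable() {
    local apt_get=${APT_GET:-apt-get}
    if ! "$apt_get" -s install "$@" >/dev/null; then
        echo "apt-get cannot resolve: $*" >&2
        return 1
    fi
    "$apt_get" install "$@"
}

# Intended use (as root):
#   install_if_resolvable nvidia-driver-535 nvidia-cuda-toolkit nvidia-settings
```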

Then reboot, and voilà:

# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
# nvidia-smi 
Mon Sep 11 20:20:20 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05              Driver Version: 535.86.05    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:01:00.0 Off |                  Off |
|  0%   40C    P8               6W / 450W |      3MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
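
The "CUDA Version: 12.2" in the nvidia-smi header is the newest CUDA version the driver supports, not the installed toolkit, which is why it differs from the 11.5 reported by nvcc. A sketch for printing both numbers side by side; the function names are hypothetical, while nvidia-smi --query-gpu=driver_version --format=csv,noheader is a standard nvidia-smi query.

```shell
# Hypothetical helper: extract the toolkit release (e.g. "11.5") from
# `nvcc -V` output on stdin.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Hypothetical helper: print toolkit and driver versions (needs nvcc and nvidia-smi).
show_cuda_versions() {
    echo "toolkit (nvcc): $(nvcc -V | nvcc_release)"
    echo "driver:         $(nvidia-smi --query-gpu=driver_version --format=csv,noheader)"
}
```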

Final notes:

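My system is highly automated: snap, dkms, initrd and so on are all handled by configuration management scripts, so I don't have to think about them myself. You may need extra steps to update the initrd, rebuild the dkms modules, or reboot, but I believe apt handles these correctly. (I can't be sure, though.)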
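
If you want to check the dkms side by hand, one option is to confirm that an nvidia module is reported as installed for the running kernel. A sketch, assuming the driver is built through DKMS (Ubuntu also ships prebuilt module packages, in which case dkms status may list nothing for nvidia); the function name is hypothetical, and it accepts both the older "nvidia, VERSION, KERNEL" and the newer "nvidia/VERSION, KERNEL" status formats.

```shell
# Hypothetical helper: succeed if `dkms status` output (stdin) reports an
# nvidia module as installed for the given kernel (default: running kernel).
nvidia_dkms_ok() {
    local kernel=${1:-$(uname -r)}
    grep -E '^nvidia[,/]' | grep -F ", $kernel," | grep -q ': installed'
}

# Intended use after the reinstall (as root):
#   dkms status | nvidia_dkms_ok || echo "nvidia module not built for $(uname -r)"
#   # if modules had to be rebuilt by hand, refresh the initramfs:
#   update-initramfs -u
```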
