Does PREEMPT_DYNAMIC cause Ubuntu to freeze?

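My Ubuntu system freezes: it stops responding to the mouse and keyboard, and the desktop keeps showing a stale time. This happens once or twice a week. I have not found the real culprit yet, but the logs contain the following error messages: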

  • Kernel log
    $ dmesg -T
    [Fri May 19 13:50:42 2023] ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20220331/psparse-529)
    [Fri May 19 13:50:42 2023] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20220331/dsfield-184)
    [Fri May 19 13:50:42 2023] ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20220331/dswload2-477)
    [Fri May 19 13:50:42 2023] No Local Variables are initialized for Method [_DSM]
    [Fri May 19 13:50:42 2023] Initialized Arguments for Method [_DSM]:  (4 arguments defined for method invocation)
    [Fri May 19 13:50:42 2023]   Arg0:   00000000ed5b63f3 <Obj>           Buffer(16) 75 0B A5 D4 C7 65 F7 46
    [Fri May 19 13:50:42 2023]   Arg1:   000000009f563d37 <Obj>           Integer 0000000000000102
    [Fri May 19 13:50:42 2023]   Arg2:   00000000aa83d0d2 <Obj>           Integer 0000000000000010
    [Fri May 19 13:50:42 2023]   Arg3:   00000000e5707ac8 <Obj>           Buffer(4) 00 00 50 4F
  • System log
    $ grep -i "error" /var/log/syslog
    May 19 13:50:44 asus gnome-session[1597]: gnome-session-binary[1597]: GLib-GIO-CRITICAL: g_bus_get_sync: assertion 'error == NULL || *error == NULL' failed
    May 19 13:50:44 asus gnome-session-binary[1597]: GLib-GIO-CRITICAL: g_bus_get_sync: assertion 'error == NULL || *error == NULL' failed
    May 19 13:50:44 asus gsd-sharing[1665]: Failed to StopUnit service: GDBus.Error:org.freedesktop.DBus.Error.Spawn.ChildExited: Process org.freedesktop.systemd1 exited with status 1
    May 19 13:50:44 asus gnome-shell[1612]: Error looking up permission: GDBus.Error:org.freedesktop.portal.Error.NotFound: No entry for geolocation
    May 19 13:50:44 asus gnome-shell[1612]: JS ERROR: Failed to initialize fprintd service: Gio.IOErrorEnum: GDBus.Error:net.reactivated.Fprint.Error.NoSuchDevice: No devices available#012asyncCallback@resource:///org/gnome/gjs/modules/core/overrides/Gio.js:114:23
    May 19 13:50:45 asus gsd-sharing[1665]: Failed to StopUnit service: GDBus.Error:org.freedesktop.DBus.Error.Spawn.ChildExited: Process org.freedesktop.systemd1 exited with status 1
    May 19 13:50:45 asus hpfax: [1895]: error: Failed to create /var/spool/cups/tmp/.hplip
    May 19 13:50:50 asus /usr/libexec/gdm-x-session[2094]: #011(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
    May 19 13:50:52 asus gnome-session[2244]: gnome-session-binary[2244]: GnomeDesktop-WARNING: Could not create transient scope for PID 2379: GDBus.Error:org.freedesktop.DBus.Error.UnixProcessIdUnknown: Process with ID 2379 does not exist.
    May 19 13:50:52 asus gnome-session-binary[2244]: GnomeDesktop-WARNING: Could not create transient scope for PID 2379: GDBus.Error:org.freedesktop.DBus.Error.UnixProcessIdUnknown: Process with ID 2379 does not exist.
    May 19 13:50:52 asus gsd-color[1668]: failed to set xrandr-LG Electronics-LG HDR 4K-76144 color transform matrix: failed to load file: Error opening file /home/ravi/.local/share/icc/edid-8f7ad153e8611fa231dc1672517ef7af.icc: Permission denied
    May 19 13:50:52 asus gnome-shell[2263]: Error looking up permission: GDBus.Error:org.freedesktop.portal.Error.NotFound: No entry for geolocation
    May 19 13:50:52 asus gsd-color[1668]: failed to set xrandr-LG Electronics-LG HDR 4K-76144 color transform matrix: failed to load file: Error opening file /home/ravi/.local/share/icc/edid-8f7ad153e8611fa231dc1672517ef7af.icc: Permission denied
    May 19 13:50:53 asus gsd-sharing[1665]: Error releasing name org.gnome.SettingsDaemon.Sharing: The connection is closed
    May 19 13:50:53 asus gsd-print-notif[1671]: Error releasing name org.gnome.SettingsDaemon.PrintNotifications: The connection is closed
    May 19 13:50:53 asus gsd-rfkill[1673]: Error releasing name org.gnome.SettingsDaemon.Rfkill: The connection is closed
    May 19 13:50:53 asus gsd-screensaver[1682]: Error releasing name org.freedesktop.ScreenSaver: The connection is closed
    May 19 13:50:53 asus gsd-sound[1687]: Error releasing name org.gnome.SettingsDaemon.Sound: The connection is closed
    May 19 13:50:53 asus gsd-housekeepin[1696]: Error releasing name org.gnome.SettingsDaemon.Housekeeping: The connection is closed
    May 19 13:51:02 asus snap-store[2514]: not handling error failed for action refresh: E: Failed to fetch  #012E: Failed to fetch  #012E: Failed to fetch  #012E: Failed to fetch  #012E: Failed to fetch file:/var/cuda-repo-ubuntu2204-11-7-local/InRelease  #012E: Failed to fetch file:/var/cudnn-local-repo-ubuntu2204-  #012E: Failed to fetch  #012E: Failed to fetch  #012E: Failed to fetch  #012E: Some index files failed to download. They have been ignored, or old ones used instead.
    May 19 13:51:06 asus pulseaudio[1929]: GetManagedObjects() failed: org.freedesktop.DBus.Error.TimedOut: Failed to activate service 'org.bluez': timed out (service_start_timeout=25000ms)
    May 19 13:52:08 asus google-chrome.desktop[5432]: [5426:5490:0519/] Failed to create entity annotator: Failure to open metadata table of slice 'global', status = 'UNKNOWN: Failure to open metadata random access file with open sourced levedb '/home/ravi/.config/google-chrome/Profile 1/optimization_guide_prediction_model_downloads/fefbdd2f-f3f8-4e65-8d03-3142006f4408/global-entities_metadata', status = 'IO error: /home/ravi/.config/google-chrome/Profile 1/optimization_guide_prediction_model_downloads/fefbdd2f-f3f8-4e65-8d03-3142006f4408/global-entities_metadata: Invalid argument''. Failed to initialize SenseLevelDbModel while creating EntityAnnotator:
    May 19 13:55:18 asus google-chrome.desktop[5432]: [libprotobuf ERROR ../../third_party/protobuf/src/google/protobuf/] Can't parse message of type "safe_browsing.ClientDownloadRequest" because it is missing required fields: (cannot determine missing fields for lite message)
    May 19 14:22:21 asus google-chrome.desktop[5432]: [5426:5426:0519/] Every renderer should have at least one task provided by a primary task provider. If a "Renderer" fallback task is shown, it is a bug. If you have repro steps, please file a new bug and tag it as a dependency of
    May 19 14:32:11 asus google-chrome.desktop[5432]: [5475:5475:0519/] SharedImageManager::ProduceSkia: Trying to Produce a Skia representation from a non-existent mailbox.
    May 19 14:50:41 asus systemd[1]: Starting Process error reports when automatic reporting is enabled...
    May 19 14:50:41 asus whoopsie-upload-all[10487]: ERROR: whoopsie.path is not enabled
    May 19 14:50:41 asus systemd[1]: Failed to start Process error reports when automatic reporting is enabled.
    May 19 15:06:57 asus vlc[11117]: Qt: Session management error: Could not open network socket
    May 19 15:07:05 asus org.gnome.Nautilus[11117]: [00007f4bd8001f30] glconv_vaapi_x11 gl error: vaCreateSurfaces: attribute not supported
    May 19 15:07:05 asus org.gnome.Nautilus[11117]: [00007f4be004b840] main video output error: video output creation failed
    May 19 15:07:05 asus org.gnome.Nautilus[11117]: [00007f4becc21ff0] main decoder error: failed to create video output
    May 19 15:19:48 asus google-chrome.desktop[5432]: [5476:5482:0519/] handshake failed; returned -1, SSL error code 1, net_error -101
    May 19 15:27:10 asus google-chrome.desktop[5432]: Fontconfig error: Cannot load default config file: No such file: (null)
    May 19 15:51:18 asus gnome-shell[2263]: JS ERROR: Failed to initialize fprintd service: Gio.IOErrorEnum: GDBus.Error:net.reactivated.Fprint.Error.NoSuchDevice: No devices available#012asyncCallback@resource:///org/gnome/gjs/modules/core/overrides/Gio.js:114:23
    May 19 16:25:03 asus tracker-miner-f[1996]: SQLite error: database disk image is malformed (errno: Bad file descriptor)
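
Most of the matches above come from unrelated programs (Chrome, VLC, GNOME daemons), so it may help to see which programs log errors most often before reading individual lines. The following is only a sketch: `count_by_program` is a hypothetical helper name, and it assumes the traditional syslog layout shown above, where the fifth field is the program tag.

```shell
#!/bin/sh
# Hypothetical helper (not a standard tool): count syslog lines per program.
# Assumes the traditional syslog layout shown above, where field 5 is
# "program[pid]:" or "program:".
count_by_program() {
    awk '{
        tag = $5
        sub(/\[[0-9]+\]:?$/, "", tag)   # drop a trailing "[pid]:" if present
        sub(/:$/, "", tag)              # otherwise drop a trailing ":"
        n[tag]++
    }
    END { for (t in n) print n[t], t }' | sort -rn
}

# Summarize the same grep as above, busiest programs first.
if [ -r /var/log/syslog ]; then
    grep -i "error" /var/log/syslog | count_by_program | head -n 10
fi
```

A high count does not by itself point to the cause of a freeze; the summary only makes it easier to set aside noisy but harmless sources.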

According to a Google search, the AE_ALREADY_EXISTS error is reported when the graphics driver is too old. However, I am using a recent driver on Ubuntu. See below:

  • NVIDIA utility
    $ nvidia-smi 
    Fri May 19 16:25:15 2023       
    | NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
    |  0%   50C    P3    41W / 350W |    959MiB / 24576MiB |      4%      Default |
    |                               |                      |                  N/A |
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |    0   N/A  N/A      2094      G   /usr/lib/xorg/Xorg                508MiB |
    |    0   N/A  N/A      2263      G   /usr/bin/gnome-shell               61MiB |
    |    0   N/A  N/A      5035      G   ...RendererForSitePerProcess       99MiB |
    |    0   N/A  N/A      5475      G   ...025141959637500949,262144      286MiB |
  • CUDA compiler
    $ nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2022 NVIDIA Corporation
    Built on Wed_Jun__8_16:49:14_PDT_2022
    Cuda compilation tools, release 11.7, V11.7.99
    Build cuda_11.7.r11.7/compiler.31442593_0
  • GPU name
    $ nvidia-smi --query-gpu=name --format=csv,noheader
    NVIDIA GeForce RTX 3090

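Surprisingly, the OS is using PREEMPT_DYNAMIC rather than SMP. I have not customized the kernel in any way, so I expected it to use the Ubuntu default, which I assumed to be SMP. See below: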

  • Kernel information
    $ uname -a
    Linux asus 5.19.0-41-generic #42~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 18 17:40:00 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
  • OS information
    $ lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description:    Ubuntu 22.04.2 LTS
    Release:    22.04
    Codename:   jammy
  • GCC information
    $ gcc --version
    gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
    Copyright (C) 2021 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
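
Note that the `uname -a` output above lists both SMP and PREEMPT_DYNAMIC in the version string, so the two tags can appear together. A minimal sketch for checking which tags the running kernel reports is shown here. The helper names `smp_tag` and `preempt_tag` are hypothetical, and the path `/boot/config-$(uname -r)` is an assumption about where the distribution installs the kernel build configuration.

```shell
#!/bin/sh
# Hypothetical helpers: inspect a kernel version string such as `uname -v`.

# Print "yes" if the string carries the SMP tag, "no" otherwise.
smp_tag() {
    case "$1" in
        *SMP*) echo "yes" ;;
        *)     echo "no" ;;
    esac
}

# Print the preemption tag found in the string, or "none".
preempt_tag() {
    case "$1" in
        *PREEMPT_DYNAMIC*) echo "PREEMPT_DYNAMIC" ;;
        *PREEMPT_RT*)      echo "PREEMPT_RT" ;;
        *PREEMPT*)         echo "PREEMPT" ;;
        *)                 echo "none" ;;
    esac
}

version="$(uname -v)"
echo "SMP tag: $(smp_tag "$version")"
echo "Preemption tag: $(preempt_tag "$version")"

# Assumption: the build configuration is installed under /boot.
config="/boot/config-$(uname -r)"
if [ -r "$config" ]; then
    grep -E '^CONFIG_PREEMPT' "$config" || true
fi
```

This only reports how the kernel was built and tagged; it does not show whether the preemption setting is related to the freezes.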

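I am looking for suggestions on how to stop my Ubuntu system from freezing. In particular, I would like to know whether the PREEMPT_DYNAMIC setting is the cause of this problem.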
