16.04: suspend/resume fails with NVMe disks

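I have been fighting this for more than a week and have read about 100 articles, but I still cannot solve it.

Kubuntu 16.04 on a ThinkPad P50 with two NVMe disks in RAID1 and an NVIDIA M1000M (using nouveau). Because grub cannot be installed on NVMe disks that are part of a RAID (there is a bug in how it computes the offsets), I boot from a small physical partition on the first disk and then use another, encrypted RAID1 partition for /home and the other partitions.

My problem is that suspend does not work. When I try pm-suspend, or echo mem > /sys/power/state, or systemctl suspend

the machine suspends (unfortunately not always, but always via /sys/power/state). When I try to resume, I even get the GUI back, but it hangs. I can switch to a console with CTRL+ALT+F1, log in, and see a RAID1 failure and a flood of interrupts that pushes the load average to 10+. A closer look at syslog shows that every component resumes from suspend except the main disks, for which I get the following errors: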

Nov 19 11:08:47 arrakis kernel: [  159.002849] thermal thermal_zone2: failed to read out thermal zone (-5)
Nov 19 11:09:16 arrakis kernel: [  188.023836] nvme nvme0: I/O 135 QID 2 timeout, aborting
Nov 19 11:09:16 arrakis kernel: [  188.024027] nvme nvme0: Abort status: 0x0
Nov 19 11:09:16 arrakis kernel: [  188.055867] nvme nvme1: I/O 66 QID 1 timeout, aborting
Nov 19 11:09:16 arrakis kernel: [  188.057419] nvme nvme1: Abort status: 0x0
Nov 19 11:09:46 arrakis mdadm[978]: Fail event detected on md device /dev/md4, component device /dev/nvme0n1p4
Nov 19 11:09:46 arrakis kernel: [  218.041194] nvme nvme0: I/O 135 QID 2 timeout, reset controller
Nov 19 11:09:46 arrakis kernel: [  218.041564] nvme nvme0: completing aborted command with status: fffffffc
Nov 19 11:09:46 arrakis kernel: [  218.041569] blk_update_request: I/O error, dev nvme0n1, sector 123734032
Nov 19 11:09:46 arrakis kernel: [  218.041594] md: super_written gets error=-5
Nov 19 11:09:46 arrakis kernel: [  218.041599] md/raid1:md4: Disk failure on nvme0n1p4, disabling device.
Nov 19 11:09:46 arrakis kernel: [  218.041599] md/raid1:md4: Operation continuing on 1 devices.
Nov 19 11:09:47 arrakis kernel: [  219.065321] nvme nvme1: I/O 66 QID 1 timeout, reset controller
Nov 19 11:09:47 arrakis kernel: [  219.065864] nvme nvme1: completing aborted command with status: fffffffc
Nov 19 11:09:47 arrakis kernel: [  219.065869] blk_update_request: I/O error, dev nvme1n1, sector 123734032
Nov 19 11:09:47 arrakis kernel: [  219.065894] md: super_written gets error=-5
Nov 19 11:09:47 arrakis kernel: [  219.195902] nvme nvme1: async event result 00010000

cat /proc/mdstat also confirms that one disk has been removed from the RAID1 (_U).
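For repeated suspend tests, the check for a degraded array can be scripted instead of reading /proc/mdstat by eye. The following is only a sketch: the function name degraded_arrays is made up for illustration, and it assumes the usual /proc/mdstat layout, in which each array's status line ends in a field such as [UU] (healthy) or [_U] (one member missing).

```shell
# degraded_arrays [FILE]
# Print the name of every md array whose member map contains "_",
# i.e. at least one member is missing. FILE defaults to /proc/mdstat.
degraded_arrays() {
    awk '
        # Array header line, e.g. "md4 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1 }
        # Status line: its last field is the member map, e.g. "[_U]"
        /\[[U_]+\]/   { if ($NF ~ /_/) print name }
    ' "${1:-/proc/mdstat}"
}
```

Calling degraded_arrays right after a resume prints nothing if all arrays are intact, and names such as md4 otherwise.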

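Kernels tried: 4.8.2, 4.8, 4.6, and the default (4.4). Kernel options tried: nomodeset (which actually does not work together with nouveau), noapic, nolapic (removed again, the machine cannot boot with it), and acpi_osi=Linux (no effect either).

The problem stays the same. The BIOS has been upgraded to the latest version.

Should I assume that NVMe disks are not yet well supported on Linux? Under Apple macOS, suspend works fine with an NVMe disk, but I have only one disk there, so maybe this is related to mdadm/RAID?

Please help.

Thanks, Michal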


As requested, I am adding logs.

I ran pm-suspend at 9:24:41. After that the screen went blank, and only a hard reset helped.

pm-suspend.log:

sob, 19 lis 2016, 10:45:11 CET: performing suspend
Initial commandline parameters: 
nie, 20 lis 2016, 09:24:41 CET: Running hooks for suspend.
Running hook /usr/lib/pm-utils/sleep.d/000kernel-change suspend suspend:
/usr/lib/pm-utils/sleep.d/000kernel-change suspend suspend: success.

Running hook /usr/lib/pm-utils/sleep.d/000record-status suspend suspend:
/usr/lib/pm-utils/sleep.d/000record-status suspend suspend: success.

Running hook /usr/lib/pm-utils/sleep.d/00logging suspend suspend:
Linux arrakis 4.8.2-040802-generic #201610161339 SMP Sun Oct 16 17:41:46 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Module                  Size  Used by
ctr                    16384  6
ccm                    20480  3
vmnet                  53248  13
fuse                   98304  3
vmw_vsock_vmci_transport    28672  0
vsock                  36864  1 vmw_vsock_vmci_transport
vmw_vmci               69632  1 vmw_vsock_vmci_transport
nls_utf8               16384  0
vmmon                  86016  0
cifs                  675840  0
dns_resolver           16384  1 cifs
fscache                61440  1 cifs
ipt_MASQUERADE         16384  7
nf_nat_masquerade_ipv4    16384  1 ipt_MASQUERADE
iptable_nat            16384  1
nf_conntrack_ipv4      20480  1
nf_defrag_ipv4         16384  1 nf_conntrack_ipv4
nf_nat_ipv4            16384  1 iptable_nat
nf_nat                 28672  2 nf_nat_masquerade_ipv4,nf_nat_ipv4
nf_conntrack          114688  4 nf_conntrack_ipv4,nf_nat_masquerade_ipv4,nf_nat_ipv4,nf_nat
iptable_filter         16384  0
ip_tables              24576  2 iptable_filter,iptable_nat
x_tables               36864  3 ip_tables,iptable_filter,ipt_MASQUERADE
tun                    28672  2
binfmt_misc            20480  1
dm_crypt               24576  1
algif_skcipher         20480  0
af_alg                 16384  1 algif_skcipher
arc4                   16384  2
dm_mod                114688  3 dm_crypt
intel_rapl             20480  0
x86_pkg_temp_thermal    16384  0
intel_powerclamp       16384  0
coretemp               16384  0
kvm_intel             192512  0
kvm                   593920  1 kvm_intel
irqbypass              16384  1 kvm
crct10dif_pclmul       16384  0
iwlmvm                241664  0
crc32_pclmul           16384  0
mac80211              663552  1 iwlmvm
snd_hda_codec_realtek    86016  1
snd_hda_codec_generic    69632  1 snd_hda_codec_realtek
ghash_clmulni_intel    16384  0
uvcvideo               90112  0
videobuf2_vmalloc      16384  1 uvcvideo
videobuf2_memops       16384  1 videobuf2_vmalloc
videobuf2_v4l2         24576  1 uvcvideo
snd_seq_midi           16384  0
aesni_intel           167936  9
snd_seq_midi_event     16384  1 snd_seq_midi
videobuf2_core         40960  2 uvcvideo,videobuf2_v4l2
snd_rawmidi            32768  1 snd_seq_midi
snd_hda_intel          36864  3
aes_x86_64             20480  1 aesni_intel
lrw                    16384  1 aesni_intel
iwlwifi               147456  1 iwlmvm
gf128mul               16384  1 lrw
snd_hda_codec         135168  3 snd_hda_intel,snd_hda_codec_generic,snd_hda_codec_realtek
glue_helper            16384  1 aesni_intel
ablk_helper            16384  1 aesni_intel
videodev              180224  3 uvcvideo,videobuf2_core,videobuf2_v4l2
cryptd                 24576  4 ablk_helper,ghash_clmulni_intel,aesni_intel
efi_pstore             16384  0
rtsx_pci_ms            20480  0
snd_hda_core           81920  4 snd_hda_intel,snd_hda_codec,snd_hda_codec_generic,snd_hda_codec_realtek
nls_iso8859_1          16384  1
joydev                 20480  0
media                  40960  2 uvcvideo,videodev
thinkpad_acpi          86016  1
intel_cstate           20480  0
intel_rapl_perf        16384  0
serio_raw              16384  0
efivars                20480  1 efi_pstore
nvram                  16384  1 thinkpad_acpi
snd_hwdep              16384  1 snd_hda_codec
memstick               20480  1 rtsx_pci_ms
snd_seq                65536  2 snd_seq_midi_event,snd_seq_midi
cfg80211              589824  3 iwlmvm,iwlwifi,mac80211
snd_pcm               110592  3 snd_hda_intel,snd_hda_codec,snd_hda_core
snd_seq_device         16384  3 snd_seq,snd_rawmidi,snd_seq_midi
snd_timer              32768  2 snd_seq,snd_pcm
mei_me                 36864  0
rfkill                 24576  6 thinkpad_acpi,cfg80211
snd                    86016  17 snd_hda_intel,snd_hwdep,snd_seq,snd_hda_codec,snd_timer,thinkpad_acpi,snd_rawmidi,snd_hda_codec_generic,snd_seq_device,snd_hda_codec_realtek,snd_pcm
mei                   102400  1 mei_me
shpchp                 36864  0
battery                16384  0
ac                     16384  0
soundcore              16384  1 snd
tpm_crb                16384  0
evdev                  24576  25
parport_pc             28672  0
ppdev                  20480  0
lp                     20480  0
parport                49152  3 lp,parport_pc,ppdev
efivarfs               16384  1
autofs4                40960  2
ext4                  589824  2
crc16                  16384  1 ext4
jbd2                  110592  1 ext4
fscrypto               28672  1 ext4
mbcache                16384  3 ext4
raid10                 49152  0
raid456               110592  0
async_raid6_recov      20480  1 raid456
async_memcpy           16384  2 raid456,async_raid6_recov
async_pq               16384  2 raid456,async_raid6_recov
async_xor              16384  3 async_pq,raid456,async_raid6_recov
async_tx               16384  5 async_xor,async_pq,raid456,async_memcpy,async_raid6_recov
xor                    24576  1 async_xor
raid6_pq              102400  3 async_pq,raid456,async_raid6_recov
libcrc32c              16384  1 raid456
crc32c_generic         16384  0
raid0                  20480  0
multipath              16384  0
linear                 16384  0
hid_generic            16384  0
usbhid                 53248  0
hid                   118784  3 hid_generic,usbhid
raid1                  36864  1
md_mod                131072  7 raid1,raid10,multipath,linear,raid0,raid456
rtsx_pci_sdmmc         24576  0
mmc_core              147456  1 rtsx_pci_sdmmc
nouveau              1544192  6
mxm_wmi                16384  1 nouveau
i2c_algo_bit           16384  1 nouveau
ttm                    98304  1 nouveau
drm_kms_helper        167936  1 nouveau
syscopyarea            16384  1 drm_kms_helper
crc32c_intel           24576  1
e1000e                245760  0
sysfillrect            16384  1 drm_kms_helper
psmouse               131072  0
sysimgblt              16384  1 drm_kms_helper
fb_sys_fops            16384  1 drm_kms_helper
ptp                    20480  1 e1000e
pps_core               16384  1 ptp
drm                   368640  15 nouveau,ttm,drm_kms_helper
nvme                   28672  5
rtsx_pci               57344  2 rtsx_pci_sdmmc,rtsx_pci_ms
ahci                   36864  0
nvme_core              53248  8 nvme
libahci                32768  1 ahci
thermal                20480  0
wmi                    16384  2 mxm_wmi,nouveau
video                  40960  2 thinkpad_acpi,nouveau
fjes                   28672  0
button                 16384  1 nouveau
              total        used        free      shared  buff/cache   available
Mem:       49367596      625024    47741464       17636     1001108    48157740
Swap:       8388604           0     8388604
/usr/lib/pm-utils/sleep.d/00logging suspend suspend: success.

Running hook /usr/lib/pm-utils/sleep.d/00powersave suspend suspend:
/usr/lib/pm-utils/sleep.d/00powersave suspend suspend: success.

Running hook /etc/pm/sleep.d/10_grub-common suspend suspend:
/etc/pm/sleep.d/10_grub-common suspend suspend: success.

Running hook /etc/pm/sleep.d/10_unattended-upgrades-hibernate suspend suspend:
/etc/pm/sleep.d/10_unattended-upgrades-hibernate suspend suspend: success.

Running hook /usr/lib/pm-utils/sleep.d/40inputattach suspend suspend:
/usr/lib/pm-utils/sleep.d/40inputattach suspend suspend: success.

Running hook /usr/lib/pm-utils/sleep.d/50unload_alx suspend suspend:
/usr/lib/pm-utils/sleep.d/50unload_alx suspend suspend: success.

Running hook /usr/lib/pm-utils/sleep.d/60_wpa_supplicant suspend suspend:
Selected interface 'p2p-dev-wlp4s0'
OK
/usr/lib/pm-utils/sleep.d/60_wpa_supplicant suspend suspend: success.

Running hook /usr/lib/pm-utils/sleep.d/75modules suspend suspend:
/usr/lib/pm-utils/sleep.d/75modules suspend suspend: not applicable.

Running hook /usr/lib/pm-utils/sleep.d/90clock suspend suspend:
/usr/lib/pm-utils/sleep.d/90clock suspend suspend: not applicable.

Running hook /usr/lib/pm-utils/sleep.d/94cpufreq suspend suspend:
/usr/lib/pm-utils/sleep.d/94cpufreq suspend suspend: success.

Running hook /usr/lib/pm-utils/sleep.d/95anacron suspend suspend:
/usr/lib/pm-utils/sleep.d/95anacron suspend suspend: success.

Running hook /usr/lib/pm-utils/sleep.d/95hdparm-apm suspend suspend:
/usr/lib/pm-utils/sleep.d/95hdparm-apm suspend suspend: not applicable.

Running hook /usr/lib/pm-utils/sleep.d/95led suspend suspend:
/usr/lib/pm-utils/sleep.d/95led suspend suspend: success.

Running hook /usr/lib/pm-utils/sleep.d/98video-quirk-db-handler suspend suspend:
Kernel modesetting video driver detected, not using quirks.
/usr/lib/pm-utils/sleep.d/98video-quirk-db-handler suspend suspend: success.

Running hook /usr/lib/pm-utils/sleep.d/99video suspend suspend:
kernel.acpi_video_flags = 0
/usr/lib/pm-utils/sleep.d/99video suspend suspend: success.

Running hook /etc/pm/sleep.d/novatel_3g_suspend suspend suspend:
/etc/pm/sleep.d/novatel_3g_suspend suspend suspend: success.

nie, 20 lis 2016, 09:24:42 CET: performing suspend

/var/log/syslog:

Nov 20 09:18:22 arrakis systemd[1]: Started CUPS Scheduler.
Nov 20 09:22:47 arrakis wpa_supplicant[1087]: wlp4s0: WPA: Group rekeying completed with 4e:5e:0c:70:fc:24 [GTK=CCMP]
Nov 20 09:24:23 arrakis systemd[1]: Started CUPS Scheduler.
Nov 20 09:24:23 arrakis org.kde.KScreen[1904]: kscreen: Primary output changed from KScreen::Output(Id: 67 , Name: "eDP-1" ) ( "eDP-1" ) to KScreen::Output(Id: 67 , Name: "eDP-1" ) ( "eDP-1" )
Nov 20 09:24:24 arrakis org.kde.KScreen[1904]: message repeated 15 times: [ kscreen: Primary output changed from KScreen::Output(Id: 67 , Name: "eDP-1" ) ( "eDP-1" ) to KScreen::Output(Id: 67 , Name: "eDP-1" ) ( "eDP-1" )]
Nov 20 09:24:41 arrakis systemd[1]: Started Run anacron jobs.
Nov 20 09:24:41 arrakis anacron[4221]: Anacron 2.3 started on 2016-11-20
Nov 20 09:24:41 arrakis anacron[4221]: Normal exit (0 jobs run)
Nov 20 09:24:41 arrakis systemd[1]: Stopped Run anacron jobs.
Nov 20 09:25:26 arrakis rsyslogd: [origin software="rsyslogd" swVersion="8.16.0" x-pid="915" x-info="http://www.rsyslog.com"] start
Nov 20 09:25:26 arrakis rsyslogd-2222: command 'KLogPermitNonKernelFacility' is currently not permitted - did you already set it via a RainerScript command (v6+ config)? [v8.16.0 try http://www.rsyslog.com/e/2222 ]

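At 9:25:26 we already see the messages from after the hard reboot, so there is nothing interesting in syslog. This time there were no NVMe disk errors in the log. They only appear when I suspend via /sys/power/state (which then almost succeeds, since I get back to the system and the GUI, but the disks do not recover).

There is nothing odd in the X.org log either. It is identical to the log from a subsequent clean reboot.

What is going wrong? Why does pm-suspend kill my laptop, while echo mem > /sys/power/state almost works (except for the disks)?

Thanks,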

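Answer 1

With GRUB

As mentioned in this answer, this is a known bug. Edit the file /etc/default/grub and add the parameter acpiphp.disable=1 to the GRUB_CMDLINE_LINUX_DEFAULT variable to turn off ACPI hotplug. However, the author of that answer notes that this does not necessarily disable hotplug.

After saving the file, update your grub bootloader.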

$ sudo update-grub
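The edit to /etc/default/grub can also be scripted. The function below is a minimal sketch, not part of any GRUB tooling: the name add_kernel_param is hypothetical, it assumes GNU sed (as shipped with Ubuntu), and it assumes the variable is written on a single line with a double-quoted value.

```shell
# add_kernel_param FILE PARAM
# Append PARAM to the GRUB_CMDLINE_LINUX_DEFAULT="..." line in FILE,
# unless that line already contains it.
add_kernel_param() {
    file=$1
    param=$2
    # Already present on the target line? Then leave the file untouched.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$file"
}

# Example (as root, followed by update-grub):
#   add_kernel_param /etc/default/grub acpiphp.disable=1
```

It is safest to try this on a copy of the file first. Parameters containing / or & would need escaping before being passed to sed.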

Without GRUB (UEFI only)

If you do not use grub but have a UEFI system, you can use efibootmgr to modify an existing boot entry. First, list your boot entries:

$ sudo efibootmgr 
BootCurrent: 0002
Timeout: 1 seconds
BootOrder: 0002,0024,0025,0026,0027,0016,0019
Boot0002* Ubuntu
Boot0016  UEFI OS
Boot0019  TS120GSSD
Boot0025* WDC WDS120G
Boot0026* debian
Boot0027* Force MP600

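Now you need to update the Ubuntu boot entry with the parameter "nvme_core.default_ps_max_latency_us=0". Setting this value to 0 disables Autonomous Power State Transitions (APST) in the nvme driver, so the drive no longer drops into its deeper low-power states on its own. This is a common workaround for NVMe drives that misbehave around suspend. The trade-off is somewhat higher idle power consumption.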

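To make the change, enter the following command. Note that the efibootmgr man page describes -@ (--append-binary-args) as taking a file name, with - meaning standard input, so on some versions the parameter may have to be supplied through a file or a pipe rather than as a literal string: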

$ sudo efibootmgr -b 0002 -@ "nvme_core.default_ps_max_latency_us=0"

Then reboot the machine:

$ sudo reboot

You should now be able to suspend the machine without errors.
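After rebooting, it is worth confirming that the parameter actually reached the kernel before testing suspend again. A minimal sketch, assuming the running kernel's command line is readable from /proc/cmdline. The function name has_kernel_param is invented for this example.

```shell
# has_kernel_param PARAM [FILE]
# Succeed if PARAM appears as a whole word on the kernel command line.
# FILE defaults to /proc/cmdline.
has_kernel_param() {
    param=$1
    cmdline_file=${2:-/proc/cmdline}
    # Put each word on its own line, then look for an exact fixed-string match.
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}

# Example:
#   has_kernel_param nvme_core.default_ps_max_latency_us=0 && echo "parameter active"
```

On kernels where nvme_core exposes this parameter, the effective value can also be read from /sys/module/nvme_core/parameters/default_ps_max_latency_us.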
