I regularly get the following errors on Ubuntu Server 18.10:
00:00:30 systemd[1]: Starting Discard unused blocks...
00:00:30 systemd[1]: Starting Rotate log files...
00:00:30 systemd[1]: Started Rotate log files.
00:01:01 kernel: ata7.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x6 frozen
00:01:01 kernel: ata7.00: failed command: SEND FPDMA QUEUED
00:01:01 kernel: ata7.00: cmd 64/01:80:00:00:00/00:00:00:00:00/a0 tag 16 ncq dma 512 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
00:01:01 kernel: ata7.00: status: { DRDY }
00:01:01 kernel: ata7: hard resetting link
00:01:01 kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
00:01:01 kernel: ata7.00: configured for UDMA/133
00:01:01 kernel: ata7.00: device reported invalid CHS sector 0
00:01:01 kernel: ata7: EH complete
00:01:32 kernel: ata7.00: exception Emask 0x0 SAct 0x40000 SErr 0x0 action 0x6 frozen
00:01:32 kernel: ata7.00: failed command: SEND FPDMA QUEUED
00:01:32 kernel: ata7.00: cmd 64/01:90:00:00:00/00:00:00:00:00/a0 tag 18 ncq dma 512 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
00:01:32 kernel: ata7.00: status: { DRDY }
00:01:32 kernel: ata7: hard resetting link
00:01:32 kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
00:01:32 kernel: ata7.00: configured for UDMA/133
00:01:32 kernel: ata7.00: device reported invalid CHS sector 0
00:01:32 kernel: ata7: EH complete
00:02:04 kernel: ata7.00: exception Emask 0x0 SAct 0x20 SErr 0x0 action 0x6 frozen
00:02:04 kernel: ata7.00: failed command: SEND FPDMA QUEUED
00:02:04 kernel: ata7.00: cmd 64/01:28:00:00:00/00:00:00:00:00/a0 tag 5 ncq dma 512 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
00:02:04 kernel: ata7.00: status: { DRDY }
00:02:04 kernel: ata7: hard resetting link
00:02:05 kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
00:02:05 kernel: ata7.00: configured for UDMA/133
00:02:05 kernel: ata7.00: device reported invalid CHS sector 0
00:02:05 kernel: ata7: EH complete
00:02:37 kernel: INFO: task fstrim:29514 blocked for more than 120 seconds.
00:02:37 kernel: Tainted: P O 4.18.0-17-generic #18-Ubuntu
00:02:37 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
00:02:37 kernel: fstrim D 0 29514 1 0x00000000
00:02:37 kernel: Call Trace:
00:02:37 kernel: __schedule+0x29e/0x840
00:02:37 kernel: schedule+0x2c/0x80
00:02:37 kernel: schedule_timeout+0x258/0x360
00:02:37 kernel: io_schedule_timeout+0x1e/0x50
00:02:37 kernel: wait_for_completion_io+0xb7/0x140
00:02:37 kernel: ? wake_up_q+0x80/0x80
00:02:37 kernel: submit_bio_wait+0x61/0x90
00:02:37 kernel: blkdev_issue_discard+0x7a/0xd0
00:02:37 kernel: ext4_trim_fs+0x5a9/0x8b0
00:02:37 kernel: ? security_file_open+0x86/0x90
00:02:37 kernel: ext4_ioctl+0xd81/0x14a0
00:02:37 kernel: ? _copy_to_user+0x2b/0x40
00:02:37 kernel: ? cp_new_stat+0x152/0x180
00:02:37 kernel: do_vfs_ioctl+0xa8/0x620
00:02:37 kernel: ? __do_sys_newfstat+0x5f/0x70
00:02:37 kernel: ksys_ioctl+0x67/0x90
00:02:37 kernel: __x64_sys_ioctl+0x1a/0x20
00:02:37 kernel: do_syscall_64+0x5a/0x110
00:02:37 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
00:02:37 kernel: RIP: 0033:0x7faba5a9e3c7
00:02:37 kernel: Code: Bad RIP value.
00:02:37 kernel: RSP: 002b:00007ffec09ede88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
00:02:37 kernel: RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007faba5a9e3c7
00:02:37 kernel: RDX: 00007ffec09ede90 RSI: 00000000c0185879 RDI: 0000000000000004
00:02:37 kernel: RBP: 0000000000000004 R08: 0000000000000001 R09: 0000000000000000
00:02:37 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000561d21106dd0
00:02:37 kernel: R13: 00007faba5663ff8 R14: 00007ffec09edfc8 R15: 0000561d21106dd0
00:02:37 kernel: ata7.00: NCQ disabled due to excessive errors
00:02:37 kernel: ata7.00: exception Emask 0x0 SAct 0x1000000 SErr 0x0 action 0x6 frozen
00:02:37 kernel: ata7.00: failed command: SEND FPDMA QUEUED
00:02:37 kernel: ata7.00: cmd 64/01:c0:00:00:00/00:00:00:00:00/a0 tag 24 ncq dma 512 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
00:02:37 kernel: ata7.00: status: { DRDY }
00:02:37 kernel: ata7: hard resetting link
00:02:38 kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
00:02:38 kernel: ata7.00: configured for UDMA/133
00:02:38 kernel: ata7.00: device reported invalid CHS sector 0
00:02:38 kernel: ata7: EH complete
00:03:02 fstrim[29514]: /home/caillou/downloads: 891.5 GiB (957190782976 bytes) trimmed
00:03:02 fstrim[29514]: /: 212.4 GiB (228063428608 bytes) trimmed
00:03:02 systemd[1]: Started Discard unused blocks.
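Unfortunately, I don't understand what it is trying to tell me.
- What is ata7.00?
- What does failed command: SEND FPDMA QUEUED mean? What is FPDMA?
- What does device reported invalid CHS sector 0 mean?
I suspect this is related to one of the drives, but I don't know how to debug it or how to fix it.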
Here is the output of lsblk:
sda 8:0 0 7.3T 0 disk
|-sda1 8:1 0 2G 0 part
`-sda2 8:2 0 7.3T 0 part
sdb 8:16 0 7.3T 0 disk
|-sdb1 8:17 0 2G 0 part
`-sdb2 8:18 0 7.3T 0 part
sdc 8:32 0 7.3T 0 disk
|-sdc1 8:33 0 2G 0 part
`-sdc2 8:34 0 7.3T 0 part
sdd 8:48 0 7.3T 0 disk
|-sdd1 8:49 0 2G 0 part
`-sdd2 8:50 0 7.3T 0 part
sde 8:64 0 3.7T 0 disk
|-sde1 8:65 0 2G 0 part
`-sde2 8:66 0 3.7T 0 part
sdf 8:80 0 7.3T 0 disk
|-sdf1 8:81 0 2G 0 part
`-sdf2 8:82 0 7.3T 0 part
sdg 8:96 0 931.5G 0 disk
`-sdg1 8:97 0 931.5G 0 part /home/caillou/downloads
nvme0n1 259:0 0 232.9G 0 disk
nvme1n1 259:1 0 232.9G 0 disk
|-nvme1n1p1 259:2 0 512M 0 part /boot/efi
`-nvme1n1p2 259:3 0 232.4G 0 part /
sdg is an mSATA SSD connected via a PCIe card. sda-sdf are SATA HDDs with ZFS.
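Drive details:
- sda: WD Red 8T, motherboard SATA connector.
- sdb: WD Red 8T, motherboard SATA connector.
- sdc: WD Red 8T, motherboard SATA connector.
- sdd: WD Red 8T, motherboard SATA connector.
- sde: WD Red 4T, motherboard SATA connector.
- sdf: WD Red 8T, motherboard SATA connector.
- sdg: Samsung mSATA 1T, taken out of a Samsung Portable SSD T5, connected via a PCIe card.
- nvme0n1 and nvme1n1: Samsung 970 EVO, connected to the motherboard's M.2 connectors.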
The system shows no other signs of trouble. Apart from these errors in the log, everything seems to work as expected.
Answer 1
Note: it is best to make a backup first.
You need to check/upgrade the firmware on the Samsung SSDs, sdg and nvme*.
Go to Samsung's download page and download their Samsung Magician software tool to help upgrade the firmware. Other software updates are available there too.
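Before looking for an update, it can help to know which firmware revision each drive currently runs. The following is only a sketch, not part of the original answer: `fw_rev` is a hypothetical helper name, and it assumes the kernel exposes the revision in sysfs (`device/rev` for SATA disks, which may be a shortened form of the full string, and `firmware_rev` for NVMe controllers such as nvme0).

```shell
# Print the firmware revision the kernel recorded for a drive.
# $1: device name, e.g. sdg (SATA disk) or nvme0 (NVMe controller)
# $2: sysfs root (defaults to /sys; overridable for testing)
fw_rev() {
    name=$1
    sys=${2:-/sys}
    for f in "$sys/block/$name/device/rev" "$sys/class/nvme/$name/firmware_rev"; do
        if [ -r "$f" ]; then
            printf '%s: %s\n' "$name" "$(cat "$f")"
            return 0
        fi
    done
    echo "$name: no firmware revision found" >&2
    return 1
}

# Device names taken from the lsblk output in the question.
for d in sdg nvme0 nvme1; do
    fw_rev "$d" || true
done
```

The printed revisions can then be compared with what Samsung lists for the corresponding models.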
Also check for a firmware upgrade for the PCIe card that sdg is attached to.
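The log only names the port (ata7), not the disk, so it may be worth confirming that ata7 really is sdg before updating anything. A minimal sketch, assuming the usual sysfs layout in which each /sys/block entry is a symlink whose target path contains the ATA port name; `ata_to_disk` is a hypothetical helper name, not an existing tool.

```shell
# Print the block device(s) whose sysfs path goes through the given ATA port.
# $1: ATA port as it appears in the kernel log, without the ".00" suffix (e.g. ata7)
# $2: sysfs block directory (defaults to /sys/block; overridable for testing)
ata_to_disk() {
    port=$1
    sys_block=${2:-/sys/block}
    for dev in "$sys_block"/*; do
        [ -e "$dev" ] || continue
        # Resolve the symlink to the full device path under /sys/devices.
        target=$(readlink -f "$dev")
        case "$target" in
            */"$port"/*) basename "$dev" ;;
        esac
    done
}

ata_to_disk ata7
```

If this prints sdg, the errors come from the mSATA SSD on the PCIe card, which is also the device behind /home/caillou/downloads in the fstrim output.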
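Check the motherboard BIOS version with sudo dmidecode -s bios-version. Then visit the manufacturer's website and look for a newer BIOS. If there is one, download and install it.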
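As an alternative to dmidecode, the BIOS details can usually be read without root from the DMI files the kernel exposes in sysfs. This is a sketch under that assumption; `bios_info` is a hypothetical helper name, and fields missing on a given system are simply skipped.

```shell
# Print BIOS vendor, version and date from the DMI data exposed in sysfs.
# $1: DMI id directory (defaults to /sys/class/dmi/id; overridable for testing)
bios_info() {
    dmi=${1:-/sys/class/dmi/id}
    for field in bios_vendor bios_version bios_date; do
        if [ -r "$dmi/$field" ]; then
            printf '%s: %s\n' "$field" "$(cat "$dmi/$field")"
        fi
    done
}

bios_info
```

The version and date make it easier to tell whether the BIOS offered on the manufacturer's site is actually newer.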
Note: later, if there are still problems, we will discuss the ncq patch.