我一直遇到系统进入只读状态的问题。我可以执行其他问题中描述的修复(例如只读文件系统错误),这样可以修复问题,大约一天。但是每天启动时,问题都会再次出现。
我的问题不是如何修复它,而是有没有什么方法可以诊断出哪个特定的软件可能导致了这个问题?
编辑:
在下面添加 syslog 输出,通过对这些错误消息进行一些搜索,我的理解是没有什么特别的事情发生;它只是反复无法找到好的块(即快要坏掉的驱动器)??
Jun 26 10:03:53 mal NetworkManager[1298]: <info> [1593180233.9376] manager: NetworkManager state is now CONNECTED_GLOBAL
Jun 26 10:04:03 mal systemd[1]: NetworkManager-dispatcher.service: Succeeded.
Jun 26 10:04:24 mal anacron[1290]: Job `cron.daily' terminated
Jun 26 10:04:24 mal anacron[1290]: Normal exit (1 job run)
Jun 26 10:04:24 mal systemd[1]: anacron.service: Succeeded.
Jun 26 10:05:01 mal CRON[7965]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jun 26 10:07:00 mal systemd-resolved[1282]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Jun 26 10:07:07 mal kernel: [ 529.845356] ata1.00: READ LOG DMA EXT failed, trying PIO
Jun 26 10:07:07 mal kernel: [ 529.846653] ata1.00: exception Emask 0x0 SAct 0x1000000 SErr 0x0 action 0x0
Jun 26 10:07:07 mal kernel: [ 529.846656] ata1.00: irq_stat 0x40000008
Jun 26 10:07:07 mal kernel: [ 529.846659] ata1.00: failed command: READ FPDMA QUEUED
Jun 26 10:07:07 mal kernel: [ 529.846663] ata1.00: cmd 60/08:c0:e8:47:91/00:00:14:00:00/40 tag 24 ncq dma 4096 in
Jun 26 10:07:07 mal kernel: [ 529.846663] res 41/40:00:ec:47:91/00:00:14:00:00/00 Emask 0x409 (media error) <F>
Jun 26 10:07:07 mal kernel: [ 529.846665] ata1.00: status: { DRDY ERR }
Jun 26 10:07:07 mal kernel: [ 529.846666] ata1.00: error: { UNC }
Jun 26 10:07:07 mal kernel: [ 529.852714] ata1.00: configured for UDMA/133
Jun 26 10:07:07 mal kernel: [ 529.852746] sd 0:0:0:0: [sda] tag#24 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 26 10:07:07 mal kernel: [ 529.852749] sd 0:0:0:0: [sda] tag#24 Sense Key : Medium Error [current]
Jun 26 10:07:07 mal kernel: [ 529.852752] sd 0:0:0:0: [sda] tag#24 Add. Sense: Unrecovered read error - auto reallocate failed
Jun 26 10:07:07 mal kernel: [ 529.852755] sd 0:0:0:0: [sda] tag#24 CDB: Read(10) 28 00 14 91 47 e8 00 00 08 00
Jun 26 10:07:07 mal kernel: [ 529.852758] blk_update_request: I/O error, dev sda, sector 345065452 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
Jun 26 10:07:07 mal kernel: [ 529.852779] ata1: EH complete
Jun 26 10:07:07 mal kernel: [ 529.852805] EXT4-fs warning (device sda2): htree_dirblock_to_tree:997: inode #10763504: lblock 0: comm duplicity: error -5 reading directory block
Jun 26 10:07:07 mal kernel: [ 529.950532] ata1.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x0
Jun 26 10:07:07 mal kernel: [ 529.950534] ata1.00: irq_stat 0x40000008
Jun 26 10:07:07 mal kernel: [ 529.950536] ata1.00: failed command: READ FPDMA QUEUED
Jun 26 10:07:07 mal kernel: [ 529.950538] ata1.00: cmd 60/08:00:e8:47:91/00:00:14:00:00/40 tag 0 ncq dma 4096 in
Jun 26 10:07:07 mal kernel: [ 529.950538] res 41/40:00:ec:47:91/00:00:14:00:00/00 Emask 0x409 (media error) <F>
Jun 26 10:07:07 mal kernel: [ 529.950539] ata1.00: status: { DRDY ERR }
Jun 26 10:07:07 mal kernel: [ 529.950540] ata1.00: error: { UNC }
Jun 26 10:07:07 mal kernel: [ 529.956788] ata1.00: configured for UDMA/133
Jun 26 10:07:07 mal kernel: [ 529.956808] sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 26 10:07:07 mal kernel: [ 529.956810] sd 0:0:0:0: [sda] tag#0 Sense Key : Medium Error [current]
Jun 26 10:07:07 mal kernel: [ 529.956811] sd 0:0:0:0: [sda] tag#0 Add. Sense: Unrecovered read error - auto reallocate failed
Jun 26 10:07:07 mal kernel: [ 529.956813] sd 0:0:0:0: [sda] tag#0 CDB: Read(10) 28 00 14 91 47 e8 00 00 08 00
Jun 26 10:07:07 mal kernel: [ 529.956814] blk_update_request: I/O error, dev sda, sector 345065452 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
Jun 26 10:07:07 mal kernel: [ 529.956838] ata1: EH complete
Jun 26 10:07:07 mal kernel: [ 529.956840] EXT4-fs error (device sda2): __ext4_find_entry:1531: inode #10763504: comm deja-dup: reading directory lblock 0
6月29日更新
附注:自从我从桌面拔下 Kindle 以来,这个错误已经 3 天没有发生过了。我不知道这是否与此有关;但只是想在这里提一下,让那些更了解情况的人知道。
从实时 CD 启动:检查驱动器的输出
此外,smartctl 输出。我似乎无法真正-t long
完成测试,但短测试似乎没问题。
ubuntu@ubuntu:~$ sudo smartctl -a /dev/sda2
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-26-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Marvell based SanDisk SSDs
Device Model: SanDisk SSD PLUS 1000GB
Serial Number: 190532802370
LU WWN Device Id: 5 001b44 8b92df610
Firmware Version: UH5100RL
User Capacity: 1,000,207,286,272 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Jun 29 23:21:35 2020 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x15) SMART execute Offline immediate.
No Auto Offline data collection support.
Abort Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 182) minutes.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0032 100 100 000 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 3921
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 353
165 Total_Write/Erase_Count 0x0032 100 100 000 Old_age Always - 1045
166 Min_W/E_Cycle 0x0032 100 100 --- Old_age Always - 4
167 Min_Bad_Block/Die 0x0032 100 100 --- Old_age Always - 0
168 Maximum_Erase_Cycle 0x0032 100 100 --- Old_age Always - 14
169 Total_Bad_Block 0x0032 100 100 --- Old_age Always - 1688
170 Unknown_Attribute 0x0032 100 100 --- Old_age Always - 0
171 Program_Fail_Count 0x0032 100 100 000 Old_age Always - 0
172 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 0
173 Avg_Write/Erase_Count 0x0032 100 100 000 Old_age Always - 4
174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 17
184 End-to-End_Error 0x0032 100 100 --- Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 34
188 Command_Timeout 0x0032 100 100 --- Old_age Always - 0
194 Temperature_Celsius 0x0022 065 055 000 Old_age Always - 35 (Min/Max 17/55)
199 SATA_CRC_Error 0x0032 100 100 --- Old_age Always - 0
230 Perc_Write/Erase_Count 0x0032 100 100 000 Old_age Always - 610 80 610
232 Perc_Avail_Resrvd_Space 0x0033 100 100 005 Pre-fail Always - 100
233 Total_NAND_Writes_GiB 0x0032 100 100 --- Old_age Always - 4201
234 Perc_Write/Erase_Ct_BC 0x0032 100 100 000 Old_age Always - 16347
241 Total_Writes_GiB 0x0030 100 100 000 Old_age Offline - 6893
242 Total_Reads_GiB 0x0030 100 100 000 Old_age Offline - 3558
244 Thermal_Throttle 0x0032 000 100 --- Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 3921 -
# 2 Extended offline Fatal or unknown error 90% 3920 0
# 3 Short offline Completed without error 00% 3919 -
# 4 Extended offline Fatal or unknown error 90% 3889 0
# 5 Extended offline Self-test routine in progress 90% 3888 -
# 6 Extended offline Self-test routine in progress 90% 3888 -
# 7 Extended offline Self-test routine in progress 90% 3888 -
# 8 Extended offline Self-test routine in progress 90% 3888 -
# 9 Extended offline Self-test routine in progress 90% 3888 -
#10 Extended offline Self-test routine in progress 90% 3888 -
#11 Extended offline Self-test routine in progress 90% 3888 -
#12 Extended offline Self-test routine in progress 90% 3888 -
#13 Extended offline Self-test routine in progress 90% 3888 -
#14 Extended offline Self-test routine in progress 90% 3888 -
#15 Extended offline Self-test routine in progress 90% 3888 -
#16 Extended offline Self-test routine in progress 90% 3888 -
#17 Extended offline Self-test routine in progress 90% 3888 -
#18 Extended offline Self-test routine in progress 90% 3888 -
#19 Extended offline Self-test routine in progress 90% 3888 -
#20 Extended offline Self-test routine in progress 90% 3888 -
#21 Extended offline Self-test routine in progress 90% 3888 -
Selective Self-tests/Logging not supported
答案1
文件系统检查
- 以“试用 Ubuntu”模式启动 Ubuntu Live DVD/USB
terminal
按Ctrl+ Alt+打开窗口T- 类型
sudo fdisk -l
- 识别“Linux 文件系统”的 /dev/sdXX 设备名称
- 输入
sudo fsck -f /dev/sda2
,替换sdXX
为您之前找到的数字 fsck
如果有错误则重复命令- 类型
reboot
全国资格考试
您遇到了 NCQ 错误。扇区 345065452 上也可能存在错误。
Jun 26 10:07:07 mal kernel: [ 529.950536] ata1.00: failed command: READ FPDMA QUEUED
Jun 26 10:07:07 mal kernel: [ 529.950538] ata1.00: cmd 60/08:00:e8:47:91/00:00:14:00:00/40 tag 0 ncq dma 4096 in
Jun 26 10:07:07 mal kernel: [ 529.950538] res 41/40:00:ec:47:91/00:00:14:00:00/00 Emask 0x409 (media error) <F>
Jun 26 10:07:07 mal kernel: [ 529.950539] ata1.00: status: { DRDY ERR }
Jun 26 10:07:07 mal kernel: [ 529.950540] ata1.00: error: { UNC }
Jun 26 10:07:07 mal kernel: [ 529.852758] blk_update_request: I/O error, dev sda, sector 345065452 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
本机命令队列 (NCQ) 是串行 ATA 协议的扩展,允许硬盘驱动器内部优化接收的读写命令的执行顺序。
这样做可以修复 NCQ 错误...
编辑sudo -H gedit /etc/default/grub
并更改以下行以包含此额外参数。然后执行sudo update-grub
将更改写入磁盘。重新启动。监视器挂起,并观察/var/log/syslog
或dmesg
是否继续出现错误消息。
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash libata.force=noncq"
坏阻塞
如果NCQ 补丁之后错误仍然存在,然后......
注意:不要中止坏块扫描!
注意:不要对 SSD 造成坏块
注意:请先备份您的重要文件!
注意:这将花费很多小时
注意:您可能面临硬盘故障
在“尝试 Ubuntu”模式下启动 Ubuntu Live DVD/USB。
在terminal
...
sudo fdisk -l
# 识别所有“Linux 文件系统”分区
sudo e2fsck -fcky /dev/sdXX
# 只读测试
或者
sudo e2fsck -fccky /dev/sda2
# 非破坏性读写测试(受到推崇的)
-k 很重要,因为它会保存之前的坏块表,并将任何新的坏块添加到该表中。如果没有 -k,您将丢失所有之前的坏块信息。
-fccky 参数...
-f Force checking even if the file system seems clean.
-c This option causes e2fsck to use badblocks(8) program to do
a read-only scan of the device in order to find any bad blocks.
If any bad blocks are found, they are added to the bad block
inode to prevent them from being allocated to a file or direc‐
tory. If this option is specified twice, then the bad block scan
will be done using a non-destructive read-write test.
-k When combined with the -c option, any existing bad blocks in the
bad blocks list are preserved, and any new bad blocks found by
running badblocks(8) will be added to the existing bad blocks
list.
-y Assume an answer of `yes' to all questions; allows e2fsck to be
used non-interactively. This option may not be specified at the
same time as the -n or -p options.