After resuming from suspend to RAM, my Arch Linux system freezes and stays unresponsive. However, I managed to extract the following kernel log:
kernel: sas: Enter sas_scsi_recover_host busy: 2 failed: 2
kernel: sas: trying to find task 0xfff880008b5b680
kernel: sas: sas_scsi_find_task: aborting task 0xfff880008b5b680
kernel: sas: sas_scsi_find_task: task 0xfff880008b5b680 is aborted
kernel: sas: sas_eh_handle_sas_errors: task 0xfff880008b5b680 is aborted
kernel: sas: trying to find task 0xffff8804606ccb40
kernel: sas: sas_scsi_find_task: aborting task 0xffff8804606ccb40
kernel: sas: sas_scsi_find_task: task 0xffff8804606ccb40 is aborted
kernel: sas: sas_eh_handle_sas_errors: task 0xffff8804606ccb40 is aborted
kernel: sas: ata7: end_device-0:0: cmd error handler
kernel: sas: ata8: end_device-0:1: cmd error handler
kernel: sas: ata7: end_device-0:0: dev error handler
kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
kernel: ata7.00: failed command: READ VERIFY SECTOR(S)
kernel: ata7.00: cmd 40/00:01:00:00:00/00:00:00:00:00/e0 tag 11
res 40/00:48:a0:79:88/00:00:07:00:00/40 Emask 0x4 (timeout)
kernel: ata7.00: status { DRDY }
kernel: ata7: hard resetting link
kernel: sas: ata8: end_device-0:1: dev error handler
kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
kernel: ata8.00: failed command: READ VERIFY SECTOR(S)
kernel: ata8.00: cmd 40/00:01:00:00:00/00:00:00:00:00/e0 tag 11
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
kernel: ata8.00: status { DRDY }
kernel: ata8: hard resetting link
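To narrow a log like the one above down to the affected devices, a small filter can help. This is only a sketch: the function name `failed_ata_ports` is made up for illustration, and it assumes the log lines have the same shape as the ones shown here.

```shell
# Hypothetical helper: read a kernel log on stdin and print each
# ataN.MM device that reported a "frozen" exception, once per device.
failed_ata_ports() {
    grep -E 'ata[0-9]+\.[0-9]+: exception .* frozen' \
        | grep -oE 'ata[0-9]+\.[0-9]+' \
        | sort -u
}
```

On a systemd system with a persistent journal, something like `journalctl -k -b -1 | failed_ata_ports` should list the devices that timed out during the previous boot.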
The storage device in question is:
$ lspci
06:00.0 SCSI storage controller: OCZ Technology Group, Inc. Device 1021 (rev 02)
$ lsblk -St
NAME HCTL TYPE VENDOR MODEL REV TRAN NAME ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA WSAME
sdc 0:0:0:0 disk ATA OCZ-REVODRIVE3 2.25 sas sdc 0 512 0 512 512 0 cfq 128 128 0B
sdd 0:0:1:0 disk ATA OCZ-REVODRIVE3 2.25 sas sdd 0 512 0 512 512 0 cfq 128 128 0B
$ lsblk -f
sdc
└─sdc1 linux_raid_member home:0 208937dc-2904-e71c-435a-9928671e07a3
└─md0 ext4 revodrive ffe9d38f-87f2-44e1-ae26-f36c910af3c5 /home
sdd
└─sdd1 linux_raid_member home:0 208937dc-2904-e71c-435a-9928671e07a3
└─md0 ext4 revodrive ffe9d38f-87f2-44e1-ae26-f36c910af3c5 /home
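None of the suspend debugging modes in /sys/power/pm_test (freezer, devices, platform, processors, core) freezes the system or produces these error messages. The error occurs only on a real suspend to RAM, after the system has been suspended for some time, with test mode disabled:
# echo none > /sys/power/pm_test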
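For anyone who wants to repeat the test-mode runs, here is a minimal sketch of the procedure. It assumes that, with a pm_test level set, writing `mem` to /sys/power/state runs the test and resumes after a short delay. The names `PM_TEST`, `PM_STATE`, and `run_pm_tests` are made up for illustration, and the function must be run as root to touch the real sysfs files.

```shell
# Paths default to the real sysfs files; override them for a dry run.
PM_TEST="${PM_TEST:-/sys/power/pm_test}"
PM_STATE="${PM_STATE:-/sys/power/state}"

# Hypothetical helper: run one test suspend per pm_test level,
# then switch test mode off again.
run_pm_tests() {
    for mode in freezer devices platform processors core; do
        echo "testing pm_test mode: $mode"
        echo "$mode" > "$PM_TEST" || return 1
        # In test mode this should return by itself after the test.
        echo mem > "$PM_STATE" || return 1
    done
    echo none > "$PM_TEST"
}
```

After calling `run_pm_tests`, checking `dmesg | grep -E 'sas:|ata[0-9]+'` shows whether any level reproduced the errors.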
What does this error mean, and what can I do to fix it?
Edit: The problem is not related to a filesystem or disk failure:
# e2fsck -cyv /dev/md0 | tee fsck.log
revodrive: Updating bad block inode.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
revodrive: ***** FILE SYSTEM WAS MODIFIED *****
241444 inodes used (1.65%, out of 14647296)
536 non-contiguous files (0.2%)
364 non-contiguous directories (0.2%)
# of inodes with ind/dind/tind blocks: 0/0/0
Extent depth histogram: 241103/117/1
22526883 blocks used (38.46%, out of 58576896)
0 bad blocks
11 large files
219077 regular files
22022 directories
0 character device files
0 block device files
0 fifos
0 links
335 symbolic links (214 fast symbolic links)
1 socket
------------
241435 files
Answer 1
I had a strange bug just like this, specific to WD Caviar Black drives:
See the kernel bug report here:
https://bugzilla.kernel.org/show_bug.cgi?id=91921
When I have time, I need to git bisect it, because it seems to be caused by some commit in kernel 3.13. It works in kernel 3.12 and below.
Try this:
echo 0 > /sys/power/pm_async
Then try s2ram.
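The same workaround as a small sketch that also prints the previous setting, so it can be restored later. The names `PM_ASYNC` and `disable_pm_async` are made up for illustration, and it needs root to write the real sysfs file.

```shell
# Path defaults to the real sysfs file; override it for a dry run.
PM_ASYNC="${PM_ASYNC:-/sys/power/pm_async}"

# Hypothetical helper: report the current pm_async value, then
# disable asynchronous device suspend/resume by writing 0.
disable_pm_async() {
    old=$(cat "$PM_ASYNC") || return 1
    echo "pm_async was: $old"
    echo 0 > "$PM_ASYNC" || return 1
    echo "pm_async now: $(cat "$PM_ASYNC")"
}
```

Usage would be `disable_pm_async && s2ram` (or `systemctl suspend` if s2ram is not installed). The setting does not survive a reboot, so it has to be applied again after each boot while testing.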