Kernel ATA exception after resuming from suspend to RAM

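After resuming from suspend to RAM, my Arch Linux system freezes and stays permanently unresponsive. However, I managed to extract the following kernel log: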

kernel: sas: Enter sas_scsi_recover_host busy: 2 failed: 2
kernel: sas: trying to find task 0xfff880008b5b680
kernel: sas: sas_scsi_find_task: aborting task 0xfff880008b5b680
kernel: sas: sas_scsi_find_task: task 0xfff880008b5b680 is aborted
kernel: sas: sas_eh_handle_sas_errors: task 0xfff880008b5b680 is aborted
kernel: sas: trying to find task 0xffff8804606ccb40
kernel: sas: sas_scsi_find_task: aborting task 0xffff8804606ccb40
kernel: sas: sas_scsi_find_task: task 0xffff8804606ccb40 is aborted
kernel: sas: sas_eh_handle_sas_errors: task 0xffff8804606ccb40 is aborted
kernel: sas: ata7: end_device-0:0: cmd error handler
kernel: sas: ata8: end_device-0:1: cmd error handler
kernel: sas: ata7: end_device-0:0: dev error handler
kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
kernel: ata7.00: failed command: READ VERIFY SECTOR(S)
kernel: ata7.00: cmd 40/00:01:00:00:00/00:00:00:00:00/e0 tag 11
                 res 40/00:48:a0:79:88/00:00:07:00:00/40 Emask 0x4 (timeout)
kernel: ata7.00: status { DRDY }
kernel: ata7: hard resetting link
kernel: sas: ata8: end_device-0:1: dev error handler
kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
kernel: ata8.00: failed command: READ VERIFY SECTOR(S)
kernel: ata8.00: cmd 40/00:01:00:00:00/00:00:00:00:00/e0 tag 11
                 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
kernel: ata8.00: status { DRDY }
kernel: ata8: hard resetting link

The storage device in question is:

$ lspci
06:00.0 SCSI storage controller: OCZ Technology Group, Inc. Device 1021 (rev 02)

$ lsblk -St
NAME HCTL       TYPE VENDOR   MODEL             REV TRAN   NAME ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE  RA WSAME
sdc  0:0:0:0    disk ATA      OCZ-REVODRIVE3   2.25 sas    sdc          0    512      0     512     512    0 cfq       128 128    0B
sdd  0:0:1:0    disk ATA      OCZ-REVODRIVE3   2.25 sas    sdd          0    512      0     512     512    0 cfq       128 128    0B

$ lsblk -f
sdc                                                                       
└─sdc1  linux_raid_member home:0     208937dc-2904-e71c-435a-9928671e07a3 
  └─md0 ext4              revodrive  ffe9d38f-87f2-44e1-ae26-f36c910af3c5 /home
sdd                                                                       
└─sdd1  linux_raid_member home:0     208937dc-2904-e71c-435a-9928671e07a3 
  └─md0 ext4              revodrive  ffe9d38f-87f2-44e1-ae26-f36c910af3c5 /home

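None of the suspend debug modes in /sys/power/pm_test (freezer, devices, platform, processors and core) freezes the system or produces these error messages. The errors only occur when testing is disabled with

# echo none > /sys/power/pm_test

and the system is then actually suspended to RAM for some time.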
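The pm_test procedure described above can be scripted as a loop over the debug levels. This is a minimal sketch, assuming bash, root privileges and a kernel built with CONFIG_PM_DEBUG (which is what provides /sys/power/pm_test). The function name pm_test_cycle and the POWER_DIR override are hypothetical, added only so the loop can be tried against a scratch directory instead of the real /sys/power.

```shell
#!/usr/bin/env bash
# Sketch: run each pm_test debug level in turn, then restore "none".
# Assumes root and CONFIG_PM_DEBUG. POWER_DIR is a hypothetical override
# (defaults to /sys/power) so the function can be exercised on plain files.

pm_test_cycle() {
    local power_dir="${POWER_DIR:-/sys/power}"
    local level

    # Without CONFIG_PM_DEBUG (or without root) this file is missing or read-only.
    if [ ! -w "$power_dir/pm_test" ]; then
        echo "pm_test not available or not writable in $power_dir" >&2
        return 1
    fi

    for level in freezer devices platform processors core; do
        echo "pm_test level: $level" >&2
        echo "$level" > "$power_dir/pm_test" || return 1
        # With a test level set, the kernel runs the suspend sequence only up
        # to that stage, waits a few seconds and then resumes by itself.
        echo mem > "$power_dir/state" || return 1
    done

    # Back to a real suspend to RAM, the only case that triggers the errors.
    echo none > "$power_dir/pm_test"
}
```

On the affected machine it would be sourced and called as root (pm_test_cycle); every level should come back on its own, which matches the observation that only the untested suspend fails.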

What does this error mean, and what can I do to fix it?

Edit: the problem is not related to the file system or to a failing disk:

# e2fsck -cyv /dev/md0 | tee fsck.log
revodrive: Updating bad block inode.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

revodrive: ***** FILE SYSTEM WAS MODIFIED *****

      241444 inodes used (1.65%, out of 14647296)
         536 non-contiguous files (0.2%)
         364 non-contiguous directories (0.2%)
             # of inodes with ind/dind/tind blocks: 0/0/0
             Extent depth histogram: 241103/117/1
    22526883 blocks used (38.46%, out of 58576896)
           0 bad blocks
          11 large files

      219077 regular files
       22022 directories
           0 character device files
           0 block device files
           0 fifos
           0 links
         335 symbolic links (214 fast symbolic links)
           1 socket
------------
      241435 files

Answer 1

I have a similar strange bug, which seems to be specific to WD Caviar Black drives.

See the kernel bug report here:

https://bugzilla.kernel.org/show_bug.cgi?id=91921

I still need to git bisect it when I find the time, because it appears to have been introduced by a commit in kernel 3.13. Kernel 3.12 and earlier work fine.

Try this:

echo 0 > /sys/power/pm_async

and then try s2ram.
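The workaround can be wrapped in a small function. This is a sketch assuming bash and root privileges; suspend_with_sync_pm and the POWER_DIR override are hypothetical names. Writing 0 to pm_async makes the kernel suspend and resume devices one at a time instead of in parallel, and the last step here is a plain suspend to RAM through /sys/power/state rather than the s2ram tool the answer mentions.

```shell
#!/usr/bin/env bash
# Sketch of the pm_async workaround. Assumes root. POWER_DIR is a
# hypothetical override (defaults to /sys/power) for testing on plain files.

suspend_with_sync_pm() {
    local power_dir="${POWER_DIR:-/sys/power}"

    # 0 = suspend/resume devices sequentially instead of asynchronously.
    echo 0 > "$power_dir/pm_async" || return 1

    # Make sure the value was accepted before going to sleep.
    if [ "$(cat "$power_dir/pm_async")" != "0" ]; then
        echo "pm_async is still enabled" >&2
        return 1
    fi

    # Plain suspend to RAM via sysfs (the answer uses s2ram for this step).
    echo mem > "$power_dir/state"
}
```

The pm_async value is not persistent: it goes back to its default of 1 after a reboot, so the write has to be repeated on every boot while testing.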
