Ubuntu boot time: READ FPDMA errors

Over the past week, my Ubuntu boot time has increased from about 50 seconds to 5 minutes.

I checked dmesg, and the following messages appear many times in the log. How can I fix this?

[  101.186141] ata1.00: exception Emask 0x0 SAct 0x70 SErr 0x0 action 0x0
[  101.186148] ata1.00: irq_stat 0x40000008
[  101.186153] ata1.00: failed command: READ FPDMA QUEUED
[  101.186167] ata1.00: cmd 60/08:20:f0:ab:18/00:00:64:00:00/40 tag 4 ncq 4096 in
                        res 41/40:00:f0:ab:18/00:00:64:00:00/40 Emask 0x409 (media error) <F>
[  101.186168] ata1.00: status: { DRDY ERR }
[  101.186169] ata1.00: error: { UNC }


                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  180.868343] ata1.00: status: { DRDY }
[  180.868344] ata1.00: failed command: READ FPDMA QUEUED
[  180.868346] ata1.00: cmd 60/f8:78:00:66:70/00:00:74:00:00/40 tag 15 ncq 126976 in
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  180.868347] ata1.00: status: { DRDY }
[  180.868348] ata1.00: failed command: READ FPDMA QUEUED
[  180.868351] ata1.00: cmd 60/08:80:80:a8:98/00:00:64:00:00/40 tag 16 ncq 4096 in
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
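
For anyone reading these lines, the cmd/res fields can be decoded to see which sector the drive failed to read. The following is a minimal sketch, assuming the usual libata byte layout described in the comment (that layout is an assumption here, not something stated in the post); the function name decode_taskfile is hypothetical.

```python
import re

# Assumed libata layout (an assumption, not stated in the post):
#   cmd OPCODE/FEAT:NSECT:LBAL:LBAM:LBAH/HOB_FEAT:HOB_NSECT:HOB_LBAL:HOB_LBAM:HOB_LBAH/DEV
#   res STATUS/ERROR:NSECT:LBAL:LBAM:LBAH/(same high-order bytes)/DEV
FIVE_BYTES = r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
TASKFILE = re.compile(
    r"\b(cmd|res)\s+([0-9a-f]{2})/" + FIVE_BYTES + "/" + FIVE_BYTES + r"/([0-9a-f]{2})"
)


def decode_taskfile(line):
    """Decode one 'cmd ...' or 'res ...' line from a libata error report."""
    match = TASKFILE.search(line)
    if match is None:
        raise ValueError("no cmd/res taskfile in line: %r" % line)

    kind = match.group(1)
    first = int(match.group(2), 16)
    feat, nsect, lbal, lbam, lbah = (int(b, 16) for b in match.group(3).split(":"))
    hob_feat, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in match.group(4).split(":")
    )

    # 48-bit LBA: three high-order bytes followed by three low-order bytes.
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)

    info = {"kind": kind, "lba": lba}
    if kind == "cmd":
        info["opcode"] = first
        if first in (0x60, 0x61):  # READ / WRITE FPDMA QUEUED (NCQ)
            # For NCQ commands the sector count travels in the feature bytes
            # and the tag in the upper five bits of the count byte.
            info["sectors"] = (hob_feat << 8) | feat
            info["tag"] = nsect >> 3
    else:
        info["status"] = first
        info["error"] = feat
    return info


if __name__ == "__main__":
    cmd = "ata1.00: cmd 60/08:20:f0:ab:18/00:00:64:00:00/40 tag 4 ncq 4096 in"
    res = "res 41/40:00:f0:ab:18/00:00:64:00:00/40 Emask 0x409 (media error) <F>"
    print(decode_taskfile(cmd))
    print(decode_taskfile(res))
```

Under that assumed layout, the first failed command decodes to 8 sectors (8 x 512 bytes = 4096, matching "ncq 4096") with tag 4, and the media error is reported at LBA 0x6418abf0 (1679338480). The status byte 0x41 and error byte 0x40 are consistent with the "{ DRDY ERR }" and "{ UNC }" lines that follow.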

smartmontools output

smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-53-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Samsung SpinPoint M8 (AF)
Device Model:     ST1000LM024 HN-M101MBB
Serial Number:    S2Y9J9AD713451
LU WWN Device Id: 5 0004cf 20aa6a576
Firmware Version: 2AR20002
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Thu Dec  8 17:38:07 2016 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:        (13380) seconds.
Offline data collection
capabilities:            (0x51) SMART execute Offline immediate.
                    No Auto Offline data collection support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 223) minutes.
SCT capabilities:          (0x003d) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       1260
  2 Throughput_Performance  0x0027   252   252   000    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0023   089   082   025    Pre-fail  Always       -       3454
  4 Start_Stop_Count        0x0032   068   068   000    Old_age   Always       -       32547
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002f   252   252   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   252   252   015    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       6311
 10 Spin_Retry_Count        0x0033   252   252   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       328
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       3027
181 Program_Fail_Cnt_Total  0x0022   100   100   000    Old_age   Always       -       5811877
183 Runtime_Bad_Block       0x0032   252   252   010    Old_age   Always       -       0
184 End-to-End_Error        0x0033   252   252   048    Pre-fail  Always       -       0
186 Unknown_Attribute       0x0032   252   252   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       728
188 Command_Timeout         0x0032   252   252   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0002   062   049   040    Old_age   Always       -       38 (Min/Max 19/51)
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       209
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   088   088   000    Old_age   Always       -       124383
194 Temperature_Celsius     0x0002   062   049   000    Old_age   Always       -       38 (Min/Max 19/51)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       9894

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1209         -
# 2  Short offline       Aborted by host               90%      1209         -

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Completed [00% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Answer 1

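It looks like that drive is dying. I bet it is also making odd grinding noises. (Those come from the heads moving back and forth as the drive keeps retrying reads of data that it can only return with uncorrectable errors.)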

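Copy your personal files off that drive as soon as possible, then replace it. It could fail catastrophically soon, after which it will not work at all.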

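If the drive supports SMART and you have smartmontools installed, you can run sudo smartctl -a /dev/sda and report the results here. Depending on the results, you could run a long offline test (which may take an hour or two, depending on the disk size) to find out exactly what state the drive is in, but in my experience, I would bet it is dying. The only question is how quickly it will fail completely.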
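
When reading a smartctl report, a few attributes are worth checking first. The sketch below is one way to pull them out of the attribute table, assuming the column layout shown in the report above; the watch list and the "raw value above zero" rule are assumptions chosen for illustration, not smartmontools defaults, and the function names are hypothetical.

```python
# Attribute IDs often associated with failing media. This watch list is an
# assumption for illustration, not something smartmontools defines.
WATCHED_IDS = (5, 187, 197, 198)


def parse_attributes(report_text):
    """Return {id: (name, raw_value)} from a smartctl attribute table."""
    rows = {}
    for line in report_text.splitlines():
        parts = line.split()
        # Attribute rows look like:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE...
        if len(parts) < 10 or not parts[0].isdigit() or not parts[2].startswith("0x"):
            continue
        try:
            raw = int(parts[9])
        except ValueError:
            continue  # skip raw values that are not plain integers
        rows[int(parts[0])] = (parts[1], raw)
    return rows


def flag_media_problems(rows):
    """List watched attributes whose raw value is above zero."""
    return [
        (attr_id, rows[attr_id][0], rows[attr_id][1])
        for attr_id in WATCHED_IDS
        if attr_id in rows and rows[attr_id][1] > 0
    ]


if __name__ == "__main__":
    sample = """\
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       728
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
"""
    for attr_id, name, raw in flag_media_problems(parse_attributes(sample)):
        print(attr_id, name, raw)
```

Run against the report posted in the question, this flags Reported_Uncorrect (raw 728) and Current_Pending_Sector (raw 2), which fits the UNC media errors in the dmesg output even though the overall self-assessment says PASSED.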
