e2fsck 运行需要很长时间

2024-5-31 • tag-icon

我正在我的一个磁盘分区 (ext4) 上运行 e2fsck，但似乎需要很长时间。它现在已经运行了将近 10 个小时左右，仍然处于 42%。分区的大小约为 800Gigs，总磁盘大小（分区所在的磁盘）约为 1TB。

运行 iostat 显示以下输出：

iostat -xzhcd  /dev/sdc 2 5
Linux 3.13.0-37-generic (divick-desktop)    Monday 03 April 2017    _x86_64_    (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.97    0.00    0.41   50.22    0.00   46.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc
                 49.12     0.00    6.87    0.00   223.95     0.02    65.20     1.01  147.22  145.40 4611.03 143.47  98.57

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.25    0.00    9.63   71.67    0.00   14.45

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc
                  0.00     0.00    1.50    0.00     6.00     0.00     8.00     1.00  592.00  592.00    0.00 665.33  99.80

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.71    0.00    6.63   59.34    0.00   31.33

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc
                  0.00     0.00    1.50    0.00     6.00     0.00     8.00     1.00  592.00  592.00    0.00 666.67 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.76    0.00    9.25   56.94    0.00   30.06

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc
                  0.00     0.00    3.50    0.00    14.00     0.00     8.00     1.00  508.00  508.00    0.00 285.71 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.39    0.00    7.63   73.73    0.00   15.25

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc
                  0.00     0.00    1.50    0.00     6.00     0.00     8.00     1.00  593.33  593.33    0.00 666.67 100.00

为什么 r_await 时间这么长（~0.5 毫秒）？这是磁盘故障的信号还是其他原因？

解释在磁盘上运行智能测试的结果似乎有点令人困惑。我在智能测试输出中看到以下几行：

SMART 整体健康自我评估测试结果：通过

但查看详细的输出我看到：

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   192   192   051    Pre-fail  Always       -       13824
  3 Spin_Up_Time            0x0027   119   111   021    Pre-fail  Always       -       7008
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       515
  5 Reallocated_Sector_Ct   0x0033   165   165   140    Pre-fail  Always       -       671
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       10561
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       511
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       182
193 Load_Cycle_Count        0x0032   128   128   000    Old_age   Always       -       218580
194 Temperature_Celsius     0x0022   101   080   000    Old_age   Always       -       46
196 Reallocated_Event_Count 0x0032   018   018   000    Old_age   Always       -       182
197 Current_Pending_Sector  0x0032   198   197   000    Old_age   Always       -       480
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       35
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       210

我不清楚磁盘是否真的出现故障。

答案1

列出的 SMART 输出似乎表明驱动器即将报废。特别是：

197 Current_Pending_Sector  0x0032   198   197   000    Old_age   Always       -       480
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       35

当这两个属性中的一个或两个的“RAW_VALUE”非零时，我建议立即更换驱动器。

答案2

从 SMART 输出的 13824 Raw_Read_Error_Rate 可以看出，驱动器的读取请求失败，这可能导致 sar 输出中的 r_await 和 iowait 值过高。驱动器的读取请求可能花费了很长时间，然后在超时后失败/中止。我还会检查 dmesg 输出中的驱动程序/设备错误以进一步确认。

答案3

首先，您应该检查问题是否由e2fsck以下原因引起。您可以通过运行命令来执行此top操作。

这是的手册页top。

答案1

答案2

答案3

相关内容