System is very slow unless a Smartmontools self-test is running


I am fairly new to Ubuntu. I have a Lenovo L430 with an i5-3210M CPU @ 2.50GHz × 4 and 4 GB RAM, dual-booting with Windows 7. The hard disk is a 500 GB Toshiba MK5061GSY.

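Every 10 to 15 seconds the HDD light comes on for 2-3 seconds and everything freezes, including keyboard input, window switching, and so on. [I had the same problem under Windows 7; in fact, that is what made me try Ubuntu, to see whether it was a Windows problem. Evidently it is not.]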

The freezes coincide with the 99.99% I/O values in the iotop output below:

$ sudo iotop -qtoqq
11:45:03   282 be/3 root        0.00 B/s   27.44 K/s  0.00 % 99.99 % [jbd2/sda5-8]
11:45:03  2175 be/4 simone      0.00 B/s    3.92 K/s  0.00 %  0.12 % firefox
11:45:04  2234 be/4 simone      0.00 B/s   11.65 K/s  0.00 %  0.00 % gnome-terminal
11:45:09   282 be/3 root        0.00 B/s    0.00 B/s  0.00 % 99.99 % [jbd2/sda5-8]
11:45:11   282 be/3 root        0.00 B/s   35.01 K/s  0.00 % 99.99 % [jbd2/sda5-8]
11:45:11  2234 be/4 simone      0.00 B/s    7.78 K/s  0.00 % 99.99 % gnome-terminal
11:45:11   563 be/4 syslog      0.00 B/s    3.89 K/s  0.00 %  0.00 % rsyslogd -c5
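
To make logs like this easier to compare between runs, the stalled samples can be tallied per process. This is only a minimal sketch: it assumes the iotop output was saved to a file (the name iotop.log is made up for illustration), and iotop_stalls is a hypothetical helper name, not part of iotop. It relies on the column layout shown above, where the IO percentage is the 11th whitespace-separated field and the command name is the 13th.

```shell
# Count, per command, how many iotop samples show an IO value at or
# above a threshold (default: 90 percent).
# Reads "iotop -qtoqq" style lines on stdin.
iotop_stalls() {
    awk -v min="${1:-90}" '
        # $11 is the IO percentage, $12 the literal "%", $13 the command.
        NF >= 13 && $12 == "%" && $11 + 0 >= min { hits[$13]++ }
        END { for (cmd in hits) printf "%d %s\n", hits[cmd], cmd }
    ' | sort -rn
}

# Example (hypothetical file name):
#   sudo iotop -qtoqq > iotop.log     # stop with Ctrl-C after a while
#   iotop_stalls 90 < iotop.log
```

For the seven lines quoted above, this reports three stalled samples for [jbd2/sda5-8] and one for gnome-terminal.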

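I tried checking the disk with Smartmontools. The results, as far as I can tell, do not point to a likely source of the problem, but then I started the "long" test: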

$ sudo smartctl -t long /dev/sda 
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.8.0-34-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

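To my surprise, the problem disappeared while the test was running! In less than a minute the 99.99% values were gone from the iotop output and the HDD light only flashed briefly. No more freezes, and I could finally work on the computer!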

$ sudo iotop -qtoqq
11:46:42   282 be/3 root        0.00 B/s    0.00 B/s  0.00 % 99.99 % [jbd2/sda5-8]
11:46:44   282 be/3 root        0.00 B/s    0.00 B/s  0.00 %  3.84 % [jbd2/sda5-8]
11:46:45   282 be/3 root        0.00 B/s   15.63 K/s  0.00 % 99.99 % [jbd2/sda5-8]
11:46:49  2175 be/4 simone      0.00 B/s    3.91 K/s  0.00 %  5.76 % firefox
11:46:49  2200 be/4 simone      0.00 B/s  109.47 K/s  0.00 %  0.00 % firefox
11:46:49  2220 be/4 simone      0.00 B/s   62.56 K/s  0.00 %  0.00 % firefox
11:46:55   282 be/3 root        0.00 B/s    0.00 B/s  0.00 % 99.99 % [jbd2/sda5-8]
11:47:00   282 be/3 root        0.00 B/s    0.00 B/s  0.00 % 99.99 % [jbd2/sda5-8]
11:47:00  2234 be/4 simone      0.00 B/s    7.79 K/s  0.00 %  0.26 % gnome-terminal
11:47:28   282 be/3 root        0.00 B/s   43.06 K/s  0.00 % 99.99 % [jbd2/sda5-8]
11:47:28   563 be/4 syslog      0.00 B/s    3.91 K/s  0.00 %  0.00 % rsyslogd -c5
11:47:28  2175 be/4 simone      0.00 B/s    7.83 K/s  0.00 %  0.00 % firefox
11:47:29  2234 be/4 simone      0.00 B/s    7.76 K/s  0.00 %  0.00 % gnome-terminal
11:47:31   282 be/3 root        0.00 B/s    7.81 K/s  0.00 %  2.68 % [jbd2/sda5-8]
11:47:31  2175 be/4 simone      0.00 B/s    3.91 K/s  0.00 %  2.18 % firefox
11:47:32  2220 be/4 simone      0.00 B/s   62.18 K/s  0.00 %  0.00 % firefox
11:47:36   924 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.77 % [flush-8:0]
11:47:37   282 be/3 root        0.00 B/s   31.24 K/s  0.00 %  4.88 % [jbd2/sda5-8]
11:47:37  2234 be/4 simone      0.00 B/s    7.81 K/s  0.00 %  0.00 % gnome-terminal
11:47:41   282 be/3 root        0.00 B/s    0.00 B/s  0.00 %  1.43 % [jbd2/sda5-8]
11:47:42   282 be/3 root        0.00 B/s    0.00 B/s  0.00 %  7.86 % [jbd2/sda5-8]
11:47:42  2234 be/4 simone      0.00 B/s    7.80 K/s  0.00 %  0.00 % gnome-terminal
11:47:46   282 be/3 root        0.00 B/s    3.90 K/s  0.00 %  4.88 % [jbd2/sda5-8]
11:47:47  2234 be/4 simone      0.00 B/s    7.77 K/s  0.00 %  0.00 % gnome-terminal
11:47:53   282 be/3 root        0.00 B/s    3.89 K/s  0.00 %  3.40 % [jbd2/sda5-8]

Unfortunately, shortly after the test finished (after about 2 hours), the problem came back :(

Creating a task that starts the test at boot and then again every 2 hours is surely not the best option. Any suggestions? Thanks in advance!

Note that I also tried changing the power management behaviour, but observed no change:

sudo hdparm -B128 /dev/sda   # which was the default value
sudo hdparm -B1 /dev/sda
sudo hdparm -B254 /dev/sda
sudo hdparm -B254 /dev/sda

I also have not observed any effect of the hard disk temperature, which is usually around 47 °C.

The test results are as follows:

$ sudo smartctl -l selftest /dev/sda 
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.8.0-34-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Extended offline    Aborted by host               90%      1859         -
# 2  Extended offline    Completed without error       00%      1849         -
# 3  Extended offline    Completed without error       00%      1834         -
# 4  Vendor (0x50)       Aborted by host               90%      1831         -
# 5  Short offline       Completed without error       00%      1831         -
# 6  Extended offline    Aborted by host               10%      1830         -
# 7  Extended offline    Completed without error       00%      1803         -
# 8  Extended offline    Completed without error       00%      1693         -
# 9  Short offline       Completed without error       00%      1690         -
#10  Short offline       Completed without error       00%      1636         -
#11  Vendor (0x50)       Completed without error       00%       929         -
#12  Short offline       Completed without error       00%       928         -
#13  Vendor (0x50)       Completed without error       00%       792         -
#14  Short offline       Completed without error       00%       792         -
#15  Vendor (0x50)       Completed without error       00%       791         -
#16  Short offline       Completed without error       00%       791         -
#17  Short offline       Aborted by host               90%       790         -
#18  Vendor (0x50)       Completed without error       00%       134         -
#19  Short offline       Completed without error       00%       134         -
#20  Short offline       Aborted by host               80%        28         -
#21  Vendor (0x50)       Completed without error       00%         0         -

The output of sudo smartctl -d ata -a /dev/sda is:

smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.8.0-34-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     TOSHIBA MK5061GSY
Serial Number:    X2KCY12FF
LU WWN Device Id: 5 000039 45d302ace
Firmware Version: MC102E
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Jan  2 10:12:51 2014 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 121) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       2379
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       1147
  5 Reallocated_Sector_Ct   0x0033   027   027   010    Pre-fail  Always       -       1497
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       1896
 10 Spin_Retry_Count        0x0033   122   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1041
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       5
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       23
193 Load_Cycle_Count        0x0032   098   098   000    Old_age   Always       -       23192
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       48 (Min/Max 10/57)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       304
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       128
222 Loaded_Hours            0x0032   096   096   000    Old_age   Always       -       1737
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       297
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0
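
In case it helps anyone reading the attribute table, the sector-related counters can be pulled out of it with a small filter. This is only a sketch under assumptions: smart_sector_flags is a hypothetical helper name, and the choice of IDs 5, 196, 197 and 198 as the ones worth listing is mine, not something smartctl prescribes. It expects the normal one-attribute-per-line layout that smartctl -A prints, with the raw value in the 10th column.

```shell
# List sector-related SMART attributes (IDs 5, 196, 197, 198) whose
# raw value is nonzero. Reads "smartctl -A" style output on stdin.
smart_sector_flags() {
    awk '
        # Attribute rows start with a numeric ID; header lines do not.
        $1 ~ /^[0-9]+$/ &&
        ($1 == 5 || $1 == 196 || $1 == 197 || $1 == 198) &&
        $10 + 0 > 0 {
            printf "%s raw=%s value=%s thresh=%s\n", $2, $10, $4, $6
        }
    '
}

# Example:
#   sudo smartctl -A /dev/sda | smart_sector_flags
```

On the values posted above, this lists Reallocated_Sector_Ct (raw 1497, normalized value 027 against a threshold of 010) and Reallocated_Event_Count (raw 304); the pending and offline-uncorrectable counters are zero and are not printed.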

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               10%      1887         -
# 2  Extended offline    Aborted by host               90%      1864         -
# 3  Extended offline    Interrupted (host reset)      70%      1863         -
# 4  Extended offline    Aborted by host               50%      1862         -
# 5  Extended offline    Aborted by host               50%      1861         -
# 6  Extended offline    Aborted by host               90%      1860         -
# 7  Extended offline    Aborted by host               90%      1859         -
# 8  Extended offline    Completed without error       00%      1849         -
# 9  Extended offline    Completed without error       00%      1834         -
#10  Vendor (0x50)       Aborted by host               90%      1831         -
#11  Short offline       Completed without error       00%      1831         -
#12  Extended offline    Aborted by host               10%      1830         -
#13  Extended offline    Completed without error       00%      1803         -
#14  Extended offline    Completed without error       00%      1693         -
#15  Short offline       Completed without error       00%      1690         -
#16  Short offline       Completed without error       00%      1636         -
#17  Vendor (0x50)       Completed without error       00%       929         -
#18  Short offline       Completed without error       00%       928         -
#19  Vendor (0x50)       Completed without error       00%       792         -
#20  Short offline       Completed without error       00%       792         -
#21  Vendor (0x50)       Completed without error       00%       791         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Maybe these tests reveal something useful? Thanks!
