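I have a storage server running CentOS 7 with 4 HDDs in a raidz1 pool (the ZFS counterpart of RAID 5). I access the files on it through Samba.
When I copy files from it or watch a movie, Samba reads sometimes stall for about 10 seconds and then resume. I don't know exactly how often this happens, but it is roughly once every 5-10 minutes. It could also be once per X MB of data read...
The same machine ran fine on CentOS 6 with software RAID.
Nothing shows up in /var/log/messages while I am accessing the files.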
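To tell whether the stalls described above follow elapsed time or the amount of data read, the reads can be timed directly. The following is only a minimal sketch under my own assumptions: the function name `find_stalls`, the 1 MiB chunk size, the 1-second threshold, and the example path are all hypothetical and not taken from any tool.

```python
import sys
import time


def find_stalls(read_chunk, chunk_size=1 << 20, threshold=1.0, clock=time.monotonic):
    """Read until EOF and return (elapsed_s, offset_bytes, duration_s) per slow read.

    read_chunk: callable taking a byte count and returning bytes (b"" at EOF).
    threshold:  a single read taking at least this many seconds counts as a stall.
    clock:      injectable time source, mainly so the function can be tested.
    """
    stalls = []
    offset = 0
    start = clock()
    while True:
        before = clock()
        data = read_chunk(chunk_size)
        after = clock()
        if after - before >= threshold:
            # Record when the stall began, where in the file, and how long it lasted.
            stalls.append((before - start, offset, after - before))
        if not data:
            break
        offset += len(data)
    return stalls


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 find_stalls.py /mnt/storage/movie.mkv
    # buffering=0 so each call maps to one read on the underlying file.
    with open(sys.argv[1], "rb", buffering=0) as f:
        for elapsed, offset, duration in find_stalls(f.read):
            print(f"t={elapsed:8.1f}s  offset={offset / 2**20:10.1f} MiB  "
                  f"stalled {duration:.1f}s")
```

Running it once against the file on the mounted share and once on the server against the same file in the pool might help separate Samba from ZFS as the source of the stall. A file that was read recently may be served from cache, so a large file that has not been touched for a while is probably the better test subject.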
ZFS status (I already ran a scrub manually earlier and it found no errors):
# zpool status -v
  pool: backup
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 2h45m with 0 errors on Sun Nov 12 08:15:09 2017
config:

        NAME        STATE     READ WRITE CKSUM
        backup      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda6    ONLINE       0     0     0
            sdb6    ONLINE       0     0     0
        cache
          sde3      ONLINE       0     0     0

errors: No known data errors

  pool: storage
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sda5    ONLINE       0     0     0
            sdb5    ONLINE       0     0     0
            sdc5    ONLINE       0     0     0
            sdd5    ONLINE       0     0     0
        cache
          sde2      ONLINE       0     0     0

errors: No known data errors
In smbd.log I only see this (I have since disabled wide links):
[2018/02/13 10:08:59.532259, 0] ../source3/param/loadparm.c:4485(widelinks_warning)
Share 'storage' has wide links and unix extensions enabled. These parameters are incompatible. Wide links will be disabled for this share.
I also noticed these messages in dmesg (they may be unrelated to the stalls):
[ 1247.680530] perf: interrupt took too long (2513 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[ 1551.176577] perf: interrupt took too long (3145 > 3141), lowering kernel.perf_event_max_sample_rate to 63000
[ 5137.178646] perf: interrupt took too long (3970 > 3931), lowering kernel.perf_event_max_sample_rate to 50000
[ 5231.736533] perf: interrupt took too long (4969 > 4962), lowering kernel.perf_event_max_sample_rate to 40000
[ 5824.261569] perf: interrupt took too long (6215 > 6211), lowering kernel.perf_event_max_sample_rate to 32000
[ 7051.322619] perf: interrupt took too long (7783 > 7768), lowering kernel.perf_event_max_sample_rate to 25000
Update
Here is some smartctl information:
/dev/sda
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 528 -
/dev/sdb
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 39047 -
/dev/sdc
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 19547 -
/dev/sdd
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 22117 -
# 2 Short offline Completed without error 00% 4022 -
The overall health assessment reads PASSED for all drives.
There are no entries in the error logs.
Here are the SMART attributes:
/dev/sda
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 253 253 021 Pre-fail Always - 8975
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 74
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 528
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 74
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 8
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 65
194 Temperature_Celsius 0x0022 121 090 000 Old_age Always - 31
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
/dev/sdb
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 3
3 Spin_Up_Time 0x0027 253 253 021 Pre-fail Always - 8441
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 149
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 047 047 000 Old_age Always - 39048
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 147
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 63
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 86
194 Temperature_Celsius 0x0022 120 100 000 Old_age Always - 32
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
/dev/sdc
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 171 171 021 Pre-fail Always - 4416
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 130
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 074 074 000 Old_age Always - 19549
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 130
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 47
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 97
194 Temperature_Celsius 0x0022 117 101 000 Old_age Always - 30
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 3
/dev/sdd
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 172 172 021 Pre-fail Always - 4366
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 149
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 070 070 000 Old_age Always - 22120
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 149
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 61
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 147
194 Temperature_Celsius 0x0022 114 099 000 Old_age Always - 33
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0