我的服务器运行 CentOS 时遇到问题。由于备份脚本错误未能删除旧数据,系统分区“md3”(raid1)已满。结果我的CentOS系统就变得无法访问了。
当我尝试重新启动服务器时,系统无法挂载与“md3”分区关联的/sysroot。该过程最终超时,导致系统无法启动。
我还尝试在救援模式下访问“md3”来执行一些清理,但安装命令无法安装分区。该进程无限期挂起,而不显示任何错误。 (我尝试仅以读取模式挂载并且详细,没有显示)。
对于其他上下文,我检查了 RAID1 和硬盘状态,一切正常(通过托管支持进行验证)。
root@rescue-customer-eu (/////) ~ # free --mega
total used free shared buff/cache available
Mem: 33266 2256 30356 15 653 30101
Swap: 0 0 0
root@rescue-customer-eu (/////) ~ # dmidecode -q -t memory | grep Size
Size: No Module Installed
Size: 16384 MB
Size: No Module Installed
Size: 16384 MB
root@rescue-customer-eu (/////) ~ # fdisk -l
Disk /dev/ram0: 50 MiB, 52428800 bytes, 102400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram1: 50 MiB, 52428800 bytes, 102400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram2: 50 MiB, 52428800 bytes, 102400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram3: 50 MiB, 52428800 bytes, 102400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram4: 50 MiB, 52428800 bytes, 102400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram5: 50 MiB, 52428800 bytes, 102400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram6: 50 MiB, 52428800 bytes, 102400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram7: 50 MiB, 52428800 bytes, 102400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram8: 50 MiB, 52428800 bytes, 102400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram9: 50 MiB, 52428800 bytes, 102400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram10: 50 MiB, 52428800 bytes, 102400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram11: 50 MiB, 52428800 bytes, 102400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram12: 50 MiB, 52428800 bytes, 102400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram13: 50 MiB, 52428800 bytes, 102400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram14: 50 MiB, 52428800 bytes, 102400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram15: 50 MiB, 52428800 bytes, 102400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sda: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: HGST HUS724020AL
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 5F909BBA-6950-429A-BB7E-BBBCC2C704EA
Device Start End Sectors Size Type
/dev/sda1 2048 1048575 1046528 511M EFI System
/dev/sda2 1048576 2095103 1046528 511M Linux RAID
/dev/sda3 2095104 3907026943 3904931840 1.8T Linux RAID
/dev/sda5 3907026944 3907028991 2048 1M Linux filesystem
Disk /dev/sdb: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: HGST HUS724020AL
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 71084CB3-27E5-4CBF-A406-98371DDCFB44
Device Start End Sectors Size Type
/dev/sdb1 2048 1048575 1046528 511M EFI System
/dev/sdb2 1048576 2095103 1046528 511M Linux RAID
/dev/sdb3 2095104 3907029134 3904934031 1.8T Linux RAID
Disk /dev/md3: 1.8 TiB, 1999325036544 bytes, 3904931712 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/md2: 511 MiB, 535756800 bytes, 1046400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
root@rescue-customer-eu (/////) ~ # smartctl -a /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-6.1.38-mod-std] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Hitachi/HGST Ultrastar 7K4000
Device Model: HGST HUS724020ALA640
Serial Number: PN1134P6H1DGLN
LU WWN Device Id: 5 000cca 22dcebab0
Firmware Version: MF6OABY0
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Sep 11 07:08:28 2023 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 24) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 319) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 136 136 054 Pre-fail Offline - 82
3 Spin_Up_Time 0x0007 214 214 024 Pre-fail Always - 274 (Average 308)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 264
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 142 142 020 Pre-fail Offline - 25
9 Power_On_Hours 0x0012 092 092 000 Old_age Always - 58834
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 264
192 Power-Off_Retract_Count 0x0032 099 099 000 Old_age Always - 1257
193 Load_Cycle_Count 0x0012 099 099 000 Old_age Always - 1257
194 Temperature_Celsius 0x0002 171 171 000 Old_age Always - 35 (Min/Max 15/75)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 33143 -
# 2 Short offline Completed without error 00% 33142 -
# 3 Short offline Completed without error 00% 33141 -
# 4 Short offline Completed without error 00% 33141 -
# 5 Short offline Completed without error 00% 33133 -
# 6 Short offline Completed without error 00% 33133 -
# 7 Short offline Completed without error 00% 33115 -
# 8 Short offline Completed without error 00% 33099 -
# 9 Short offline Completed without error 00% 33091 -
#10 Short offline Completed without error 00% 30865 -
#11 Short offline Completed without error 00% 30847 -
#12 Short offline Completed without error 00% 30811 -
#13 Short offline Completed without error 00% 30803 -
#14 Short offline Completed without error 00% 30803 -
#15 Short offline Completed without error 00% 30794 -
#16 Short offline Completed without error 00% 30794 -
#17 Short offline Completed without error 00% 26924 -
#18 Short offline Completed without error 00% 26921 -
#19 Short offline Completed without error 00% 26921 -
#20 Short offline Completed without error 00% 26888 -
#21 Short offline Completed without error 00% 26880 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
root@rescue-customer-eu (/////) ~ # smartctl -a /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-6.1.38-mod-std] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Hitachi/HGST Ultrastar 7K4000
Device Model: HGST HUS724020ALA640
Serial Number: PN2134P6HEY5NP
LU WWN Device Id: 5 000cca 22dd46de0
Firmware Version: MF6OABY0
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Sep 11 07:09:41 2023 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 28) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 326) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 137 137 054 Pre-fail Offline - 79
3 Spin_Up_Time 0x0007 215 215 024 Pre-fail Always - 274 (Average 306)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 120
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 140 140 020 Pre-fail Offline - 26
9 Power_On_Hours 0x0012 090 090 000 Old_age Always - 70019
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 120
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 1114
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 1114
194 Temperature_Celsius 0x0002 166 166 000 Old_age Always - 36 (Min/Max 14/52)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 44327 -
# 2 Short offline Completed without error 00% 44325 -
# 3 Short offline Completed without error 00% 44325 -
# 4 Short offline Completed without error 00% 44324 -
# 5 Short offline Completed without error 00% 44316 -
# 6 Short offline Completed without error 00% 44316 -
# 7 Short offline Completed without error 00% 44298 -
# 8 Short offline Completed without error 00% 44282 -
# 9 Short offline Completed without error 00% 44274 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
root@rescue-customer-eu (/////) ~ # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md2 : active raid1 sda2[0] sdb2[1]
523200 blocks [2/2] [UU]
md3 : active raid1 sda3[0] sdb3[1]
1952465856 blocks [2/2] [UU]
unused devices: <none>
root@rescue-customer-eu (/////) ~ # mdadm -D /dev/md3
/dev/md3:
Version : 0.90
Creation Time : Fri Oct 16 14:46:33 2020
Raid Level : raid1
Array Size : 1952465856 (1862.02 GiB 1999.33 GB)
Used Dev Size : 1952465856 (1862.02 GiB 1999.33 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 3
Persistence : Superblock is persistent
Update Time : Sun Sep 10 12:37:01 2023
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Consistency Policy : resync
UUID : e9ffe126:3edcd095:a4d2adc2:26fd5302
Events : 0.681359
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
1 8 19 1 active sync /dev/sdb3
有人对如何解决这个问题有任何建议吗?
非常感谢您的建议和帮助。
答案1
快速更新,经过几次测试,我跑了:
xfs_repair -L /dev/md3
这旨在删除损坏的 XFS 日志。显然,要小心:总是有必要进行备份,因为这可能会导致损坏或额外的数据丢失。
谢谢。