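Yesterday the storage on our server (Ubuntu 18.04) reached 100% usage, and one of our file systems was switched to read-only mode. The mount entry now reads:
/dev/md3 / ext4 ro,relatime,errors=remount-ro,data=ordered 0 0
I tried several solutions from other answers on serverfault, but none of them seems to fit my case.
For example, I ran the following command:
sudo mount -o remount,rw /dev/md3 /
but it fails with this message:
mount: /: cannot remount /dev/md3 read-write, is write-protected.
How can I solve this and make the file system writable again?
Thanks!
Update with debugging information: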
mdadm --detail /dev/md3
/dev/md3:
Version : 0.90
Creation Time : Fri Nov 10 10:07:34 2017
Raid Level : raid1
Array Size : 20478912 (19.53 GiB 20.97 GB)
Used Dev Size : 20478912 (19.53 GiB 20.97 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 3
Persistence : Superblock is persistent
Update Time : Sat Sep 18 09:15:35 2021
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Consistency Policy : unknown
UUID : 4b632ac4:ae1a7c2b:a4d2adc2:26fd5302
Events : 0.861
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
1 8 19 1 active sync /dev/sdb3
Using dmesg:
dmesg | grep "md3"
[67448453.830094] EXT4-fs error (device md3): ext4_remount:4840: Abort forced by user
Running tune2fs:
tune2fs -l /dev/md3
tune2fs 1.44.1 (24-Mar-2018)
Filesystem volume name: /
Last mounted on: /
Filesystem UUID: d1a985c4-8c5e-4034-93e0-629b8e65f161
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean with errors
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 1281120
Block count: 5119728
Reserved block count: 255986
Free blocks: 445848
Free inodes: 1001361
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 1022
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8160
Inode blocks per group: 510
Flex block group size: 16
Filesystem created: Fri Nov 10 10:07:39 2017
Last mount time: Tue Jul 30 17:51:41 2019
Last write time: Thu Sep 16 20:06:05 2021
Mount count: 7
Maximum mount count: -1
Last checked: Fri Nov 10 10:07:39 2017
Check interval: 0 (<none>)
Lifetime writes: 4013 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
First orphan inode: 663035
Default directory hash: half_md4
Directory Hash Seed: ae316af1-086d-470f-af27-0c10ca25f3c8
Journal backup: inode blocks
FS Error count: 8
First error time: Thu Sep 16 20:06:04 2021
First error function: ext4_lookup
First error line #: 1607
First error inode #: 930317
First error block #: 0
Last error time: Sat Sep 18 09:15:35 2021
Last error function: ext4_remount
Last error line #: 4840
Last error inode #: 685456
Last error block #: 0
Debugging information from e2fsck -n /dev/md3:
e2fsck -n /dev/md3
e2fsck 1.44.1 (24-Mar-2018)
Warning: skipping journal recovery because doing a read-only filesystem check.
/ contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found. Fix? no
Inode 101 was part of the orphaned inode list. IGNORED.
Inode 117 was part of the orphaned inode list. IGNORED.
Inode 292 was part of the orphaned inode list. IGNORED.
Inode 460 was part of the orphaned inode list. IGNORED.
Inode 465 was part of the orphaned inode list. IGNORED.
Inode 471 was part of the orphaned inode list. IGNORED.
Inode 487 was part of the orphaned inode list. IGNORED.
Inode 529 was part of the orphaned inode list. IGNORED.
Inode 562 was part of the orphaned inode list. IGNORED.
Inode 564 was part of the orphaned inode list. IGNORED.
Inode 707 was part of the orphaned inode list. IGNORED.
Inode 723 was part of the orphaned inode list. IGNORED.
Inode 918 was part of the orphaned inode list. IGNORED.
...
Deleted inode 402614 has zero dtime. Fix? no
...
Inode 783370, end of extent exceeds allowed value
(logical block 1024, physical block 3068928, len 76)
Clear? no
Inode 783370, i_blocks is 8784, should be 8200. Fix? no
Inode 783470, end of extent exceeds allowed value
(logical block 2708, physical block 1322783, len 193)
Clear? no
Inode 783470, i_blocks is 23200, should be 21672. Fix? no
Inode 1047956 was part of the orphaned inode list. IGNORED.
Pass 2: Checking directory structure
Entry 'tmp' in /tmp/systemd-private-bb09aae54cab4e12844e5844d11ca5eb-certbot.service-VSBnVY (685456) has deleted/unused inode 685457. Clear? no
Entry '1159_key-certbot.pem' in /etc/letsencrypt/keys (930317) has deleted/unused inode 920168. Clear? no
Entry '1159_key-certbot.pem' in /etc/letsencrypt/keys (930317) has an incorrect filetype (was 1, should be 0).
Fix? no
Entry '1110_csr-certbot.pem' in /etc/letsencrypt/csr (930318) has deleted/unused inode 920176. Clear? no
Entry '1110_csr-certbot.pem' in /etc/letsencrypt/csr (930318) has an incorrect filetype (was 1, should be 0).
Fix? no
Entry '1106_key-certbot.pem' in /etc/letsencrypt/keys (930317) has deleted/unused inode 920166. Clear? no
Entry '1106_key-certbot.pem' in /etc/letsencrypt/keys (930317) has an incorrect filetype (was 1, should be 0).
Fix? no
Entry '1109_key-certbot.pem' in /etc/letsencrypt/keys (930317) has deleted/unused inode 920173. Clear? no
Entry '1109_key-certbot.pem' in /etc/letsencrypt/keys (930317) has an incorrect filetype (was 1, should be 0).
Fix? no
Entry '1146_csr-certbot.pem' in /etc/letsencrypt/csr (930318) has deleted/unused inode 920172. Clear? no
Entry '1146_csr-certbot.pem' in /etc/letsencrypt/csr (930318) has an incorrect filetype (was 1, should be 0).
Fix? no
...
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Inode 685456 ref count is 3, should be 2. Fix? no
Pass 5: Checking group summary information
Block bitmap differences: -34565 -(53721--53734) -(59721--59761) -(59981--59983) -(61106--61184) -(61540--61544) -(70964--71007) -(71274--71313) -(84938--84989) -(85084--85107) -(85592--85599) -(116400--116408) -(116423--116436) -(128700--128703) -(128708--128721) -(138904--138914) -(165045--165150) -(169691--169713) -(169717--169742) -(464896--471464) -(471552--471989) -(472928--472947) -(499200--499612) -(501408--501434) -(503808--504070) -(513024--513301) -(513408--513491) -(589477--589480) -(711431--711441) -(747968--748030) -(838733--838740) -(838755--838758) -(838772--838783) -(838791--838800) -(838805--838816) -(838824--838835) -(848384--848972) -(875840--875880) -(1032187--1033031) -(1083840--1083878) -(1120110--1120132) -(1322783--1322975) -(1631196--1631251) -(1635150--1635169) -(1635360--1635391) -(1635571--1635575) -(1635848--1635855) -(1635996--1636001) -1648860 -1648880 -(1715533--1715536) -(1740800--1741311) -(1746432--1746573) -(1750528--1750729) -(1867776--1867880) -(1870717--1871294) -(1880576--1880791) -(1888256--1888258) -1888260 -(1888272--1888273) -(1888275--1888767) -(2226402--2226405) -(2235495--2235719) -(2266304--2266332) -(2301560--2301629) -(2528723--2528753) -(2589088--2589117) -(2597312--2597374) -(2597696--2597757) -(2614784--2615295) -(2619392--2619458) -(2619904--2620297) -2636181 -(2671360--2671491) -(2687328--2687350) -(3068928--3069003) -(3196998--3197002) -(3228728--3228738) -(3236697--3236703) -(3252961--3252970) -(3264276--3264277) -(3264287--3264298) -(3285164--3285170) -(3299518--3299524) -(3399680--3400062) -(3441024--3441129) -(3574080--3574142) -(3601664--3601795) -(3659648--3659724) -(3660672--3660755) -(3704233--3704234) -(3704237--3704242) -3707626 -3708898 -3709310 -3709356 -3709398 -3709984 -(3751694--3751696) -(3751707--3751711) -(3751767--3751768) -(3751774--3751775) -(3751800--3751814) -(3771264--3771343) -(3830025--3830040) -(3860480--3867203) -(3867616--3867644) -(3868160--3868618) -(3869696--3870139) 
-(4045457--4045483) -(4087936--4088023) -(4088032--4088055) -(4088320--4088780) -(4088960--4089064) -(4089088--4089126) -(4091136--4091324) -(4091392--4092119) -(4092928--4094514) -(4094976--4095854) -(4097088--4097120) -(4097536--4097816) -(4109312--4110157) -(4250368--4250378) -(4278497--4278513) -(4296960--4297014) -(4325486--4325616) -(4325632--4325707) -(4326688--4327074) -(4328826--4328961) -(4329202--4329314) -(4329600--4329666) -(4329764--4329804) -(4332027--4332178) -(4332406--4332476) -(4333568--4333942) -(4334372--4334454) -(4334564--4335227) -(4621153--4621176) -(4669781--4670170) -(4696470--4696548) -(4697074--4697429) -(4697662--4697711) -(4726778--4727894) -(5055921--5056185) -(5056648--5056667) -(5106412--5106620) -(5106668--5107034)
Fix? no
Free blocks count wrong for group #76 (3374, counted=3375).
Fix? no
Free blocks count wrong (445848, counted=445849).
Fix? no
Inode bitmap differences: -101 -117 -292 -460 -465 -471 -487 -529 -562 -564 -707 -723 -918 -(1837--1838) -2041 -2714 -3593 -3654 -3659 -3894 -3976 -4336 -4425 -5193 -5244 -5252 -5930 -5951 -5967 -(7066--7069) -7431 -8492 -8651 -9298 -9583 -9592 -14261 -14270 -18093 -19214 -21301 -(27843--27844) -27847 -27849 -(27853--27856) -(27868--27869) -(27872--27873) -27875 -27879 -27883 -27885 -(27889--27890) -27892 -162842 -391708 -391741 -391759 -391763 -(391800--391802) -(391804--391805) -(391812--391814) -(391831--391833) -391870 -391873 -391878 -391900 -391902 -(391910--391911) -391915 -391919 -391927 -391956 -392493 -392719 -393759 -393795 -395132 -395134 -395161 -395165 -395221 -395234 -395267 -395289 -(395312--395313) -395315 -395325 -395336 -395387 -395630 -396550 -396589 -(396699--396700) -402594 -(402596--402598) -402601 -(402604--402606) -402608 -(402611--402614) -407918 -413872 -413874 -413881 -413885 -413897 -413900 -413908 -421042 -421202 -421226 -426391 -652905 -(652931--652935) -663035 -685457 -920162 -(920164--920176) -1047956
Fix? no
Directories count wrong for group #84 (17, counted=16).
Fix? no
Free inodes count wrong for group #96 (80, counted=82).
Fix? no
Free inodes count wrong for group #112 (486, counted=487).
Fix? no
Free inodes count wrong (1001361, counted=1001364).
Fix? no
/: ********** WARNING: Filesystem still has errors **********
/: 279759/1281120 files (0.7% non-contiguous), 4673880/5119728 blocks
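Answer 1
It was file system corruption that switched the file system to read-only mode, not the disk filling up. The kernel did exactly what the mount option errors=remount-ro tells it to do.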
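To tell a damaged file system apart from one that is merely full, one option is to read the state recorded in the superblock. The following is a minimal sketch, not a definitive tool: `fs_state` is a hypothetical helper name, and it assumes the `tune2fs -l` output format shown in the question.

```shell
# Print the value of the "Filesystem state" line from `tune2fs -l`
# output supplied on stdin. fs_state is a hypothetical helper name.
fs_state() {
    sed -n 's/^Filesystem state:[[:space:]]*//p'
}

# Typical use on the affected machine (run as root):
#   tune2fs -l /dev/md3 | fs_state
#   df -h /        # shows how full the file system is, for comparison
```

A state such as "clean with errors" points to recorded file system errors, which fits the errors=remount-ro behaviour described above.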
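Back up important data and configuration and copy them off the machine. Prepare a recovery plan in case some files needed for booting turn out to be damaged. If possible, move important services to another machine. There will be some downtime.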
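Because the root file system is read-only, a backup cannot be staged on it first. One possible approach, sketched here under assumptions, is to stream a tar archive straight to another machine. `stream_backup` is a hypothetical helper, and `user@backup-host` and the target path in the usage comment are placeholders, not values from the question.

```shell
# Write a gzip-compressed tar archive of the given paths to stdout,
# so nothing has to be written to the local (read-only) disk.
# stream_backup is a hypothetical helper name.
stream_backup() {
    tar -czf - "$@"
}

# Example use (host name and target path are placeholders):
#   stream_backup /etc /home | ssh user@backup-host 'cat > root-backup.tar.gz'
```

Files that sit in damaged parts of the file system may fail to read, so it is worth checking tar's error output and the resulting archive before relying on it.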
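I notice that this system is rarely rebooted (the mount count is only 7 since 2017, and the last mount was in 2019). So I suggest setting the maximum mount count to 1, so that the file system is checked at every boot: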
tune2fs -c 1 /dev/md3
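Before rebooting, the new setting can be confirmed from the superblock. The sketch below assumes the `tune2fs -l` field names shown in the question and that a mount-count check is triggered once the mount count reaches a positive maximum; `check_due` is a hypothetical helper name.

```shell
# Read `tune2fs -l` output on stdin and report whether the mount-count
# rule would force a check at the next boot.
# check_due is a hypothetical helper name.
check_due() {
    awk -F': *' '
        /^Mount count:/         { count = $2 + 0 }
        /^Maximum mount count:/ { max = $2 + 0 }
        END {
            # A maximum of -1 (or 0) disables the mount-count check.
            if (max > 0 && count >= max) print "due"
            else print "not due"
        }'
}

# Typical use (run as root):
#   tune2fs -l /dev/md3 | check_due
```

With the values from the question (mount count 7, maximum -1) this reports "not due". After `tune2fs -c 1` it should report "due".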
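Then reboot. During boot, the init scripts should check and repair the file system. However, the corruption may be severe enough to require manual interaction, so make sure someone is near the server and ready to help. Also, if the corruption has affected something important, you may run into strange problems afterwards.
In the worst case you will have to reinstall the system. But don't forget to set the maximum mount count to 1 again afterwards.
Why do file systems become corrupted? It just happens. Blocks are held in memory, and corruption can happen there, for example from cosmic rays. This is rare, but it does happen. Disks are not ideal either and cannot detect every error. They have a non-zero bit error rate (look up the actual value in the device data sheet), so the chance of reading corrupted data is very low but not zero. If this hits a metadata block, the problem can snowball: a file system driver guided by wrong information may make wrong assumptions and corrupt the file system further. That is why it is important to check the file system from time to time.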
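Between full checks, the error counter in the superblock can be watched so that trouble is noticed before the file system goes read-only. This is only a sketch: `fs_error_count` is a hypothetical helper, and treating a missing "FS Error count" line as zero is an assumption about the `tune2fs -l` output.

```shell
# Print the "FS Error count" value from `tune2fs -l` output on stdin.
# Prints 0 when the line is absent (assumption: no line means no errors).
# fs_error_count is a hypothetical helper name.
fs_error_count() {
    awk -F': *' '/^FS Error count:/ { n = $2 + 0 } END { print n + 0 }'
}

# Example use from a periodic job (run as root):
#   errors=$(tune2fs -l /dev/md3 | fs_error_count)
#   [ "$errors" -eq 0 ] || echo "md3 reports $errors file system errors"
```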