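Linux software RAID 1 locking into read-only mode
Setup:
CentOS 5.2, 2x 320 GB SATA drives, RAID 1.
- /dev/md0 (/dev/sda1 + /dev/sdb1) is /boot
- /dev/md1 (/dev/sda2 + /dev/sdb2) is an LVM physical volume holding the /, /data and swap logical volumes
All filesystems except swap are ext3.
We have hit a problem on several systems where the failure of a single drive causes the root filesystem to lock into read-only mode, which obviously causes problems.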
[root@myserver /]# mount | grep Root
/dev/mapper/VolGroup00-LogVolRoot on / type ext3 (rw)
[root@myserver /]# touch /foo
touch: cannot touch `/foo': Read-only file system
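The mount output above still reports (rw) even though writes fail. On CentOS 5, mount with no arguments prints /etc/mtab. That file is maintained by userspace and is not updated when ext3 remounts the root read-only on its own, so it is likely stale. /proc/mounts shows the kernel's view instead. The following is a minimal Python sketch that reads the effective mode from /proc/mounts-format text. The function name mount_mode and the sample text are illustrative assumptions, not output from the affected server.

```python
from typing import Optional


def mount_mode(proc_mounts: str, mount_point: str = "/") -> Optional[str]:
    """Return "ro" or "rw" for mount_point as the kernel reports it.

    Each /proc/mounts line is: device mountpoint fstype options dump pass.
    The last matching line wins, because an early "rootfs" entry for "/"
    can be listed before the real root filesystem.
    Octal escapes (e.g. a space inside a mount point) are not decoded here.
    """
    mode = None
    for line in proc_mounts.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        options = fields[3].split(",")
        mode = "ro" if "ro" in options else "rw"
    return mode


# Hypothetical /proc/mounts excerpt for a root LV that ext3 has remounted read-only.
SAMPLE = """rootfs / rootfs rw 0 0
/dev/mapper/VolGroup00-LogVolRoot / ext3 ro,data=ordered 0 0
/dev/md0 /boot ext3 rw,data=ordered 0 0
"""

if __name__ == "__main__":
    print(mount_mode(SAMPLE))           # ro
    print(mount_mode(SAMPLE, "/boot"))  # rw
    # On a live system: mount_mode(open("/proc/mounts").read())
```

Running cat /proc/mounts by hand gives the same information. The sketch simply makes the check scriptable.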
I can see that one of the partitions in the array has failed:
[root@myserver /]# mdadm --detail /dev/md1
/dev/md1:
[...]
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
[...]
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 18 1 active sync /dev/sdb2
2 8 2 - faulty spare /dev/sda2
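Because this has happened on several machines, it may help to detect the degraded state with a script. Below is a minimal Python sketch that pulls the array state flags and the member table out of text shaped like the mdadm --detail output above. It assumes that layout; spacing and wording can differ between mdadm versions. The name parse_mdadm_detail is hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Device-table row: Number Major Minor RaidDevice State... [device]
ROW = re.compile(r"^\d+\s+\d+\s+\d+\s+(\S+)\s+(.*?)\s*(/dev/\S+)?$")
# Array-level line, e.g. "State : clean, degraded"
STATE = re.compile(r"^State\s*:\s*(.+)$")


def parse_mdadm_detail(text: str) -> Tuple[List[str], List[Tuple[str, Optional[str]]]]:
    """Return (array state flags, [(member state, device or None), ...])."""
    flags: List[str] = []
    members: List[Tuple[str, Optional[str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        m = STATE.match(line)
        if m:
            flags = [f.strip() for f in m.group(1).split(",")]
            continue
        m = ROW.match(line)
        if m:
            # group(2) is the member state text, group(3) the device (absent for "removed")
            members.append((m.group(2), m.group(3)))
    return flags, members


# The relevant part of the /dev/md1 output shown above.
DETAIL = """\
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 18 1 active sync /dev/sdb2
2 8 2 - faulty spare /dev/sda2
"""

if __name__ == "__main__":
    flags, members = parse_mdadm_detail(DETAIL)
    print("degraded" in flags)                              # True
    print([d for s, d in members if "faulty" in s])         # ['/dev/sda2']
```

For ongoing alerts, mdadm's own --monitor mode (with MAILADDR set in /etc/mdadm.conf) is the more usual tool. A parser like this is mainly useful for ad hoc checks.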
Remounting read/write fails:
[root@myserver /]# mount -n -o remount /
mount: block device /dev/VolGroup00/LogVolRoot is write-protected, mounting read-only
The LVM tools give an error unless --ignorelockingfailure is used (because they cannot write to /var), but they show the volume group as read/write:
[root@myserver /]# lvm vgdisplay
Locking type 1 initialisation failed.
[root@myserver /]# lvm pvdisplay --ignorelockingfailure
--- Physical volume ---
PV Name /dev/md1
VG Name VolGroup00
PV Size 279.36 GB / not usable 15.56 MB
Allocatable yes (but full)
[...]
[root@myserver /]# lvm vgdisplay --ignorelockingfailure
--- Volume group ---
VG Name VolGroup00
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 4
VG Access read/write
VG Status resizable
[...]
[root@myserver /]# lvm lvdisplay /dev/VolGroup00/LogVolRoot --ignorelockingfailure
--- Logical volume ---
LV Name /dev/VolGroup00/LogVolRoot
VG Name VolGroup00
LV UUID PGoY0f-rXqj-xH4v-WMbw-jy6I-nE04-yZD3Gx
LV Write Access read/write
[...]
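In this case /boot (a separate RAID metadevice) and /data (a different logical volume in the same volume group) are still writable. From previous incidents I know that a reboot brings the system back with a read/write root filesystem and a properly degraded RAID array.
So I have two questions:
1) When this happens, how can I get the root filesystem back to read/write without rebooting the system?
2) What needs to change to stop this filesystem lockup? When a single disk in a RAID 1 fails, we don't want the filesystem to lock; we want the system to keep running until we can replace the bad disk.
Edit: I can see the following in the dmesg output. Does this indicate that /dev/sda failed, and that a separate failure on /dev/sdb then caused the filesystem to be set read-only?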
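For the "until we can replace the bad disk" part of question 2, the usual end state is to drop the failed member and later add its replacement. The sketch below only builds the mdadm commands and does not run them, so it does not by itself prevent the read-only lockup. It assumes the new disk reappears as /dev/sda with a partition layout matching /dev/sdb, which must be created separately and is not shown. The helper name replace_member_plan is hypothetical.

```python
import shlex
from typing import List


def replace_member_plan(array: str, failed: str, replacement: str) -> List[List[str]]:
    """Build, but do not run, the usual mdadm steps for swapping a RAID 1 member."""
    return [
        # Make sure md treats the member as failed (here it already shows "faulty spare").
        ["mdadm", "--manage", array, "--fail", failed],
        # Detach the failed member so the disk can be pulled.
        ["mdadm", "--manage", array, "--remove", failed],
        # After the new disk is installed and partitioned, add the partition;
        # md then resyncs it from the surviving mirror.
        ["mdadm", "--manage", array, "--add", replacement],
    ]


if __name__ == "__main__":
    for argv in replace_member_plan("/dev/md1", "/dev/sda2", "/dev/sda2"):
        print(shlex.join(argv))
```

The same steps would apply to /dev/md0 using /dev/sda1, since /boot is mirrored across the same two disks.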
sda: Current [descriptor]: sense key: Aborted Command
Add. Sense: Recorded entity not found
Descriptor sense data with sense descriptors (in hex):
72 0b 14 00 00 00 00 0c 00 0a 80 00 00 00 00 00
00 03 ce 85
end_request: I/O error, dev sda, sector 249477
raid1: Disk failure on sda2, disabling device.
Operation continuing on 1 devices
ata1: EH complete
SCSI device sda: 586072368 512-byte hdwr sectors (300069 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
RAID1 conf printout:
--- wd:1 rd:2
disk 0, wo:1, o:0, dev:sda2
disk 1, wo:0, o:1, dev:sdb2
RAID1 conf printout:
--- wd:1 rd:2
disk 1, wo:0, o:1, dev:sdb2
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: irq_stat 0x40000001
ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
res 51/04:00:34:cf:f3/00:00:00:f3:40/a3 Emask 0x1 (device error)
ata2.00: status: { DRDY ERR }
ata2.00: error: { ABRT }
ata2.00: configured for UDMA/133
ata2: EH complete
sdb: Current [descriptor]: sense key: Aborted Command
Add. Sense: Recorded entity not found
Descriptor sense data with sense descriptors (in hex):
72 0b 14 00 00 00 00 0c 00 0a 80 00 00 00 00 00
01 e3 5e 2d
end_request: I/O error, dev sdb, sector 31677997
Buffer I/O error on device dm-0, logical block 3933596
lost page write due to I/O error on dm-0
ata2: EH complete
SCSI device sdb: 586072368 512-byte hdwr sectors (300069 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
ata2.00: irq_stat 0x40000008
ata2.00: cmd 61/38:00:f5:d6:03/00:00:00:00:00/40 tag 0 ncq 28672 out
res 41/10:00:f5:d6:03/00:00:00:00:00/40 Emask 0x481 (invalid argument) <F>
ata2.00: status: { DRDY ERR }
ata2.00: error: { IDNF }
ata2.00: configured for UDMA/133
sd 1:0:0:0: SCSI error: return code = 0x08000002
sdb: Current [descriptor]: sense key: Aborted Command
Add. Sense: Recorded entity not found
Descriptor sense data with sense descriptors (in hex):
72 0b 14 00 00 00 00 0c 00 0a 80 00 00 00 00 00
00 03 d6 f5
end_request: I/O error, dev sdb, sector 251637
ata2: EH complete
SCSI device sdb: 586072368 512-byte hdwr sectors (300069 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
Aborting journal on device dm-0.
journal commit I/O error
ext3_abort called.
EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
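The question in the Edit can be checked by pulling the relevant kernel messages out in order. Below is a minimal Python sketch. Its patterns match only the message formats that appear in this particular log, and the name timeline is hypothetical.

```python
import re
from typing import List, Tuple

# Only the message formats seen in this log; other kernels word them differently.
PATTERNS = [
    ("io_error", re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")),
    ("md_drop", re.compile(r"raid1: Disk failure on (\w+), disabling device")),
    ("journal_abort", re.compile(r"Aborting journal on device (\S+)\.")),
    ("remount_ro", re.compile(r"Remounting filesystem read-only")),
]


def timeline(dmesg: str) -> List[Tuple[str, Tuple[str, ...]]]:
    """Return the events of interest, in log order, with their captured fields."""
    events = []
    for line in dmesg.splitlines():
        for name, pattern in PATTERNS:
            m = pattern.search(line)
            if m:
                events.append((name, m.groups()))
                break
    return events


# Lines taken from the dmesg excerpt above.
LOG = """\
end_request: I/O error, dev sda, sector 249477
raid1: Disk failure on sda2, disabling device.
end_request: I/O error, dev sdb, sector 31677997
Buffer I/O error on device dm-0, logical block 3933596
end_request: I/O error, dev sdb, sector 251637
Aborting journal on device dm-0.
Remounting filesystem read-only
"""

if __name__ == "__main__":
    for event in timeline(LOG):
        print(event)
```

On this log the order is as follows:

- an I/O error on sda;
- sda2 dropped from md1;
- further I/O errors on sdb while md1 was already running on sdb2 alone;
- the journal abort on dm-0 (presumably the root LV) and the read-only remount.

That appears to support the reading in the Edit. The first failure left md1 without redundancy, so the later sdb errors had no mirror to fall back on and reached the filesystem directly.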
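Answer 1
Your dmesg output should tell you why it is signalling a failure up to the PV; that shouldn't be happening. As for making the system writable again, kicking the VG and LV to read-only and then back to read-write should work in memory, but the real fix is to stop md from needlessly bothering LVM.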