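The title pretty much sums it up. Basically, I have a box partitioned with a regular /boot partition, with the rest of the drive filled by an LVM physical volume. Inside LVM, I have one volume group containing a root partition, a /home partition, and a swap partition.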
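When LVM creates the device nodes in /dev/mapper, it creates the swap and home nodes just fine. However, it usually hangs when trying to create the root device node. This happens both from a live CD (pvscan; vgscan; vgchange -ay is what I used, IIRC) and from the initial ramdisk, which leaves the box unable to boot. I also tried from the initrd recovery shell (lvm pvscan; lvm vgscan; lvm vgchange -ay is what I used, IIRC), and it fails in the same way.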
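Sometimes vgchange -ay actually creates the root device node (after a long delay) but never exits, leaving me to kill it manually. When that happens, I try to mount the device, but the mount always hangs indefinitely. Note that while either of these commands is running, the console prints a bunch of messages about a failed command "READ DMA" or something similar.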
I have run smartctl -a /dev/sda several times. Each time it reports quite a few errors about bad blocks (IIRC), but in the end it says the drive is healthy.
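I have put up a pastebin of the dmesg output from the affected machine. The log comes from booting an Arch Linux live CD and then running pvscan; vgscan; vgchange -ay. This time vgchange -ay hung forever, and I eventually killed it. Here is the end of dmesg, for posterity (so that I am not relying on Pastebin):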
[ 46.332920] end_request: I/O error, dev fd0, sector 0
[ 58.503496] end_request: I/O error, dev fd0, sector 0
[167992.304649] EXT4-fs (sda1): recovery complete
[167992.304660] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[168092.874016] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[168163.318923] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[168459.839738] end_request: I/O error, dev fd0, sector 0
[168472.010337] end_request: I/O error, dev fd0, sector 0
[168614.642035] bio: create slab <bio-2> at 2
[168630.045526] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[168630.045649] ata1.00: BMDMA stat 0x65
[168630.045710] ata1.00: failed command: READ DMA
[168630.045787] ata1.00: cmd c8/00:08:00:10:10/00:00:00:00:00/e6 tag 0 dma 4096 in
res 51/40:08:00:10:10/00:00:00:00:00/e6 Emask 0x9 (media error)
[168630.046006] ata1.00: status: { DRDY ERR }
[168630.046071] ata1.00: error: { UNC }
[168630.066286] ata1.00: configured for UDMA/100
[168630.079493] ata1.01: configured for UDMA/66
[168630.079514] sd 0:0:0:0: [sda] Unhandled sense code
[168630.079517] sd 0:0:0:0: [sda]
[168630.079520] Result: hostbyte=0x00 driverbyte=0x08
[168630.079523] sd 0:0:0:0: [sda]
[168630.079525] Sense Key : 0x3 [current] [descriptor]
[168630.079530] Descriptor sense data with sense descriptors (in hex):
[168630.079532] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[168630.079544] 06 10 10 00
[168630.079549] sd 0:0:0:0: [sda]
[168630.079551] ASC=0x11 ASCQ=0x4
[168630.079554] sd 0:0:0:0: [sda] CDB:
[168630.079556] cdb[0]=0x28: 28 00 06 10 10 00 00 00 08 00
[168630.079567] end_request: I/O error, dev sda, sector 101715968
[168630.079665] Buffer I/O error on device dm-3, logical block 0
[168630.079775] ata1: EH complete
[168634.564062] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[168634.564165] ata1.00: BMDMA stat 0x64
[168634.564225] ata1.00: failed command: READ DMA
[168634.564301] ata1.00: cmd c8/00:08:80:0f:10/00:00:00:00:00/e6 tag 0 dma 4096 in
res 51/10:00:83:0f:10/00:00:00:00:00/e6 Emask 0x81 (invalid argument)
[168634.564527] ata1.00: status: { DRDY ERR }
[168634.564592] ata1.00: error: { IDNF }
[168634.584336] ata1.00: configured for UDMA/100
[168634.597559] ata1.01: configured for UDMA/66
[168634.597578] ata1: EH complete
[168639.087353] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[168639.087462] ata1.00: BMDMA stat 0x64
[168639.087521] ata1.00: failed command: READ DMA
[168639.087596] ata1.00: cmd c8/00:08:80:0f:10/00:00:00:00:00/e6 tag 0 dma 4096 in
res 51/10:00:83:0f:10/00:00:00:00:00/e6 Emask 0x81 (invalid argument)
[168639.087822] ata1.00: status: { DRDY ERR }
[168639.087886] ata1.00: error: { IDNF }
[168639.105791] ata1.00: configured for UDMA/100
[168639.118999] ata1.01: configured for UDMA/66
[168639.119017] ata1: EH complete
[168645.896986] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[168645.897095] ata1.00: BMDMA stat 0x64
[168645.897155] ata1.00: failed command: READ DMA
[168645.900373] ata1.00: cmd c8/00:08:80:0f:10/00:00:00:00:00/e6 tag 0 dma 4096 in
res 51/40:00:83:0f:10/00:00:00:00:00/e6 Emask 0x9 (media error)
[168645.906936] ata1.00: status: { DRDY ERR }
[168645.910263] ata1.00: error: { UNC }
[168645.931315] ata1.00: configured for UDMA/100
[168645.944504] ata1.01: configured for UDMA/66
[168645.944525] sd 0:0:0:0: [sda] Unhandled sense code
[168645.944529] sd 0:0:0:0: [sda]
[168645.944531] Result: hostbyte=0x00 driverbyte=0x08
[168645.944534] sd 0:0:0:0: [sda]
[168645.944537] Sense Key : 0x3 [current] [descriptor]
[168645.944541] Descriptor sense data with sense descriptors (in hex):
[168645.944543] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[168645.944554] 06 10 0f 83
[168645.944559] sd 0:0:0:0: [sda]
[168645.944561] ASC=0x11 ASCQ=0x4
[168645.944564] sd 0:0:0:0: [sda] CDB:
[168645.944566] cdb[0]=0x28: 28 00 06 10 0f 80 00 00 08 00
[168645.944578] end_request: I/O error, dev sda, sector 101715843
[168645.947946] Buffer I/O error on device dm-2, logical block 10485744
[168645.951439] ata1: EH complete
[168650.445911] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[168650.449275] ata1.00: BMDMA stat 0x65
[168650.452579] ata1.00: failed command: READ DMA
[168650.455873] ata1.00: cmd c8/00:08:00:10:10/00:00:00:00:00/e6 tag 0 dma 4096 in
res 51/40:08:00:10:10/00:00:00:00:00/e6 Emask 0x9 (media error)
[168650.462537] ata1.00: status: { DRDY ERR }
[168650.465714] ata1.00: error: { UNC }
[168650.486063] ata1.00: configured for UDMA/100
[168650.499326] ata1.01: configured for UDMA/66
[168650.499344] sd 0:0:0:0: [sda] Unhandled sense code
[168650.499348] sd 0:0:0:0: [sda]
[168650.499350] Result: hostbyte=0x00 driverbyte=0x08
[168650.499353] sd 0:0:0:0: [sda]
[168650.499355] Sense Key : 0x3 [current] [descriptor]
[168650.499360] Descriptor sense data with sense descriptors (in hex):
[168650.499362] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[168650.499373] 06 10 10 00
[168650.499378] sd 0:0:0:0: [sda]
[168650.499380] ASC=0x11 ASCQ=0x4
[168650.499383] sd 0:0:0:0: [sda] CDB:
[168650.499385] cdb[0]=0x28: 28 00 06 10 10 00 00 00 08 00
[168650.499396] end_request: I/O error, dev sda, sector 101715968
[168650.502757] Buffer I/O error on device dm-3, logical block 0
[168650.506189] ata1: EH complete
[168798.816025] usb 9-2: new high-speed USB device number 2 using ehci-pci
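This is only the end of the log, where the errors begin, because I hit the post length limit. For the whole thing, see the pastebin.
Sorry for the lack of specifics, but I am not in front of the affected box right now.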
Answer 1
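From the extra information you provided, it sounds like your drive is failing (bad blocks). You can try to work around them if you like, but I would seriously consider replacing the drive.
If you do want to work around the problem, you basically have to find the LVM physical extents that sit on top of the bad blocks and allocate those extents to a logical volume that must never be used.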
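Mapping a failing sector to a physical extent is plain integer arithmetic. The following is a minimal Python 3 sketch, not a definitive tool. It assumes the sector number is relative to the start of the PV (not the whole disk) and is counted in 512-byte sectors. The function name and the numbers in the example call are hypothetical; the real data-area offset and extent size have to be read from the output of pvs -o+pe_start and pvdisplay on your system.

```python
def sector_to_pe(sector, pe_start, pe_size):
    """Map a PV-relative sector to (physical extent index, offset inside it).

    sector   -- failing sector, counted from the start of the PV
    pe_start -- first sector of the PV data area (assumed known)
    pe_size  -- extent size in sectors (assumed known)
    """
    if sector < pe_start:
        # Sectors before pe_start hold LVM metadata, not extents.
        raise ValueError("sector lies in the PV metadata area")
    return divmod(sector - pe_start, pe_size)


# Hypothetical values: data area starts at sector 2048, 4 MiB extents (8192 sectors).
pe, offset = sector_to_pe(101000000, pe_start=2048, pe_size=8192)
print("physical extent", pe, "offset", offset)
```

The extent index is what you would then compare against the extent ranges that pvdisplay -m lists for each logical volume.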
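There is actually a fairly recent email thread on this topic on the linux-lvm mailing list (I read the whole thread, and it contains a lot of information):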
https://www.redhat.com/archives/linux-lvm/2012-November/msg00033.html
In this particular message, someone appears to have written a Python script to help with the task:
https://www.redhat.com/archives/linux-lvm/2012-November/msg00038.html
After helping people in this situation (the internet at least works properly), I have used the attached script to help find the affected LVs and files.
#!/usr/bin/python
# Identify partition, LV, file containing a sector
# Copyright (C) 2010,2012 Stuart D. Gathman
# Shared under GNU Public License v2 or later
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
import sys
from subprocess import Popen,PIPE

ID_LVM = 0x8e
ID_LINUX = 0x83
ID_EXT = 0x05
ID_RAID = 0xfd

def idtoname(id):
    if id == ID_LVM: return "Linux LVM"
    if id == ID_LINUX: return "Linux Filesystem"
    if id == ID_EXT: return "Extended Partition"
    if id == ID_RAID: return "Software RAID"
    return hex(id)

class Segment(object):
    __slots__ = ('pe1st','pelst','lvpath','le1st','lelst')
    def __init__(self,pe1st,pelst):
        self.pe1st = pe1st;
        self.pelst = pelst;
    def __str__(self):
        return "Seg:%d-%d:%s:%d-%d" % (
            self.pe1st,self.pelst,self.lvpath,self.le1st,self.lelst)

def cmdoutput(cmd):
    p = Popen(cmd, shell=True, stdout=PIPE)
    try:
        for ln in p.stdout:
            yield ln
    finally:
        p.stdout.close()
        p.wait()

def icheck(fs,blk):
    "Return inum from block number, or 0 if free space."
    for ln in cmdoutput("debugfs -R 'icheck %d' '%s' 2>/dev/null"%(blk,fs)):
        b,i = ln.strip().split(None,1)
        if not b[0].isdigit(): continue
        if int(b) == blk:
            if i.startswith('<'):
                return 0
            return int(i)
    raise ValueError('%s: invalid block: %d'%(fs,blk))

def ncheck(fs,inum):
    "Return filename from inode number, or None if not linked."
    for ln in cmdoutput("debugfs -R 'ncheck %d' '%s' 2>/dev/null"%(inum,fs)):
        i,n = ln.strip().split(None,1)
        if not i[0].isdigit(): continue
        if int(i) == inum:
            return n
    return None

def blkid(fs):
    "Return dictionary of block device attributes"
    d = {}
    for ln in cmdoutput("blkid -o export '%s'"%fs):
        k,v = ln.strip().split('=',1)
        d[k] = v
    return d

def getpvmap(pv):
    pe_start = 192 * 2
    pe_size = None
    seg = None
    segs = []
    for ln in cmdoutput("pvdisplay --units k -m %s"%pv):
        a = ln.strip().split()
        if not a: continue
        if a[0] == 'Physical' and a[4].endswith(':'):
            pe1st = int(a[2])
            pelst = int(a[4][:-1])
            seg = Segment(pe1st,pelst)
        elif seg and a[0] == 'Logical':
            if a[1] == 'volume':
                seg.lvpath = a[2]
            elif a[1] == 'extents':
                seg.le1st = int(a[2])
                seg.lelst = int(a[4])
                segs.append(seg)
        elif a[0] == 'PE' and a[1] == 'Size':
            if a[2] == "(KByte)":
                pe_size = int(a[3]) * 2
            elif a[3] == 'KiB':
                pe_size = int(float(a[2])) * 2
    if segs:
        for ln in cmdoutput("pvs --units k -o+pe_start %s"%pv):
            a = ln.split()
            if a[0] == pv:
                lst = a[-1]
                if lst.lower().endswith('k'):
                    pe_start = int(float(lst[:-1]))*2
                    return pe_start,pe_size,segs
    return None

def findlv(pv,sect):
    res = getpvmap(pv)
    if not res: return None
    pe_start,pe_size,m = res
    if sect < pe_start:
        raise Exception("Bad sector in PV metadata area")
    pe = int((sect - pe_start)/pe_size)
    pebeg = pe * pe_size + pe_start
    peoff = sect - pebeg
    for s in m:
        if s.pe1st <= pe <= s.pelst:
            le = s.le1st + pe - s.pe1st
            return s.lvpath,le * pe_size + peoff

def getmdmap():
    with open('/proc/mdstat','rt') as fp:
        m = []
        for ln in fp:
            if ln.startswith('md'):
                a = ln.split(':')
                raid = a[0].strip()
                devs = []
                a = a[1].split()
                for d in a[2:]:
                    devs.append(d.split('[')[0])
                m.append((raid,devs))
        return m

def parse_sfdisk(s):
    for ln in s:
        try:
            part,desc = ln.split(':')
            if part.startswith('/dev/'):
                d = {}
                for p in desc.split(','):
                    name,val = p.split('=')
                    name = name.strip()
                    if name.lower() == 'id':
                        d[name] = int(val,16)
                    else:
                        d[name] = int(val)
                yield part.strip(),d
        except ValueError:
            continue

def findpart(wd,lba):
    s = cmdoutput("sfdisk -d %s"%wd)
    parts = [ (part,d['start'],d['size'],d['Id']) for part,d in parse_sfdisk(s) ]
    for part,start,sz,Id in parts:
        if Id == ID_EXT: continue
        if start <= lba < start + sz:
            return part,lba - start,Id
    return None

if __name__ == '__main__':
    wd = sys.argv[1]
    lba = int(sys.argv[2])
    print wd,lba,"Whole Disk"
    res = findpart(wd,lba)
    if not res:
        print "LBA is outside any partition"
        sys.exit(1)
    part,sect,Id = res
    print part,sect,idtoname(Id)
    if Id == ID_LVM:
        bd,sect = findlv(part,sect)
        # FIXME: problems if LV is snapshot
    elif Id == ID_LINUX:
        bd = part
    else:
        if Id == ID_RAID:
            for md,devs in getmdmap():
                for dev in devs:
                    if part == "/dev/"+dev:
                        part = "/dev/"+md
                        break
                else: continue
                break
        res = findlv(part,sect)
        if res:
            print "PV =",part
            bd,sect = res
        else:
            bd = part
    blksiz = 4096
    blk = int(sect * 512 / blksiz)
    p = blkid(bd)
    try:
        t = p['TYPE']
    except:
        print bd,p
        raise
    print "fs=%s block=%d %s"%(bd,blk,t)
    if t.startswith('ext'):
        inum = icheck(bd,blk)
        if inum:
            fn = ncheck(bd,inum)
            print "file=%s inum=%d"%(fn,inum)
        else:
            print "<free space>"
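
The script takes the whole-disk device as its first argument and an LBA as its second, so you first need the failing sector numbers. They can be read from the "end_request: I/O error" lines in the dmesg output quoted in the question. Here is a small Python 3 sketch (separate from the script above, which uses Python 2 print statements) that collects them. The helper name bad_sectors is my own, and the sample lines are copied from the question's log.

```python
import re

# Matches kernel lines such as:
#   end_request: I/O error, dev sda, sector 101715968
ERR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def bad_sectors(log_text, dev="sda"):
    """Return the sorted, de-duplicated sectors the kernel reported for dev."""
    return sorted({int(m.group(2))
                   for m in ERR.finditer(log_text)
                   if m.group(1) == dev})


sample = """\
[168630.079567] end_request: I/O error, dev sda, sector 101715968
[168645.944578] end_request: I/O error, dev sda, sector 101715843
[168650.499396] end_request: I/O error, dev sda, sector 101715968
[168459.839738] end_request: I/O error, dev fd0, sector 0
"""

print(bad_sectors(sample))  # [101715843, 101715968]
```

Each number can then be passed to the script together with /dev/sda. The fd0 errors are filtered out because they come from the floppy drive, not from the failing disk.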