TrueCrypt is breaking my mdadm volume

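I'm setting up a new file server with the same storage configuration as an existing server, but the process fails and I can't figure out why. My goal is to create a TrueCrypt volume on top of a RAID 10 volume. However, when I run truecrypt -c, it breaks the RAID volume. The same procedure worked on my previous server, so I'm not sure what is going on.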

My procedure:

# create a data partition on each disk (/dev/sdb, /dev/sdc, /dev/sdd, /dev/sde):
fdisk /dev/sdX
# at the fdisk prompts: n (new), p (primary), 1, first sector 4096, last sector 2930273071, t (type) da, w (write)

# combine data partitions into raid10 array:
mdadm --create /dev/md0 -v --raid-devices=4 --chunk=512 --level=raid10 --layout=f2 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

# create a truecrypt volume on the new array /dev/md0:
truecrypt -c /dev/md0

Shortly after truecrypt starts, mdadm reports a component failure on one or more disks:

$ cat /proc/mdstat; echo; mdadm --misc --detail /dev/md0 

Personalities : [raid10]
md0 : active raid10 sdd1[4] sdc1[1] sde1[3] sdb1[0](F)
      2930006016 blocks super 1.2 512K chunks 2 far-copies [4/3] [_UUU]

unused devices: <none>

/dev/md0:
        Version : 1.2
  Creation Time : Fri Sep 21 14:27:31 2012
     Raid Level : raid10
     Array Size : 2930006016 (2794.27 GiB 3000.33 GB)
  Used Dev Size : 1465003008 (1397.14 GiB 1500.16 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Tue Sep 25 10:54:06 2012
          State : active, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0

         Layout : far=2
     Chunk Size : 512K

           Name : emma:0  (local to host emma)
           UUID : 21c2f9b7:923dacab:805375f8:96a2959b
         Events : 33268

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       33        1      active sync   /dev/sdc1
       4       8       49        2      active sync   /dev/sdd1
       3       8       65        3      active sync   /dev/sde1

       0       8       17        -      faulty spare   /dev/sdb1
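The sizes in the mdadm --detail output are consistent with the far=2 layout: every chunk is stored twice, so the usable size should be the per-member size times the member count, divided by two. A quick sketch in bash arithmetic, with the values copied from the output above (mdadm reports them in 1 KiB blocks):

```bash
#!/usr/bin/env bash
# Sanity-check the raid10 far=2 capacity reported by mdadm --detail.
# Values are copied from the output above; mdadm reports them in KiB.
used_dev_kib=1465003008   # "Used Dev Size"
raid_devices=4            # "Raid Devices"
copies=2                  # "Layout : far=2" -> two copies of every chunk

array_kib=$(( used_dev_kib * raid_devices / copies ))
echo "expected Array Size: ${array_kib} KiB"
echo "expected size in bytes: $(( array_kib * 1024 ))"

# GiB with two (truncated) decimals, using integer arithmetic only.
gib_x100=$(( array_kib * 100 / 1024 / 1024 ))
printf 'approx. %d.%02d GiB\n' $(( gib_x100 / 100 )) $(( gib_x100 % 100 ))
```

This prints 2930006016 KiB, 3000326160384 bytes and approx. 2794.27 GiB, matching both the "Array Size" line and the byte count that fdisk -l reports for /dev/md0 further down.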

dmesg shows these error messages:

[326876.652057] ata3.00: status: { DRDY ERR }
[326876.652801] ata3.00: error: { ABRT }
[326876.653543] ata3.00: failed command: WRITE FPDMA QUEUED
[326876.654301] ata3.00: cmd 61/80:f0:80:f6:58/00:00:57:00:00/40 tag 30 ncq 65536 out
[326876.654301]          res 41/04:00:00:00:00/04:00:00:00:00/00 Emask 0x1 (device error)
[326876.655812] ata3.00: status: { DRDY ERR }
[326876.656563] ata3.00: error: { ABRT }
[326876.657326] ata3: hard resetting link
[326876.657328] ata3: nv: skipping hardreset on occupied port
[326877.124117] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[326877.138346] ata3.00: configured for UDMA/133
[326877.138397] sd 2:0:0:0: [sdb]
[326877.138399] Result: hostbyte=0x00 driverbyte=0x08
[326877.138402] sd 2:0:0:0: [sdb]
[326877.138404] Sense Key : 0xb [current] [descriptor]
[326877.138408] Descriptor sense data with sense descriptors (in hex):
[326877.138411]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[326877.138423]         00 00 00 00
[326877.138428] sd 2:0:0:0: [sdb]
[326877.138430] ASC=0x0 ASCQ=0x0
[326877.138434] sd 2:0:0:0: [sdb] CDB:
[326877.138435] cdb[0]=0x2a: 2a 00 57 58 f0 80 00 00 80 00
[326877.138446] end_request: I/O error, dev sdb, sector 1465446528
[326877.138844] sd 2:0:0:0: [sdb]
[326877.138846] Result: hostbyte=0x00 driverbyte=0x08
[326877.138847] sd 2:0:0:0: [sdb]
[326877.138849] Sense Key : 0xb [current] [descriptor]
[326877.138851] Descriptor sense data with sense descriptors (in hex):
[326877.138852]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[326877.138860]         00 00 00 00
[326877.138864] sd 2:0:0:0: [sdb]
[326877.138865] ASC=0x0 ASCQ=0x0
[326877.138867] sd 2:0:0:0: [sdb] CDB:
[326877.138868] cdb[0]=0x2a: 2a 00 57 58 f1 00 00 00 80 00
[326877.138875] end_request: I/O error, dev sdb, sector 1465446656
[326877.139208] sd 2:0:0:0: [sdb]
[326877.139210] Result: hostbyte=0x00 driverbyte=0x08
[326877.139212] sd 2:0:0:0: [sdb]
[326877.139213] Sense Key : 0xb [current] [descriptor]
[326877.139215] Descriptor sense data with sense descriptors (in hex):
[326877.139217]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[326877.139224]         00 00 00 00
...
[326877.155726] sd 2:0:0:0: [sdb]
[326877.155727] ASC=0x0 ASCQ=0x0
[326877.155729] sd 2:0:0:0: [sdb] CDB:
[326877.155730] cdb[0]=0x2a: 2a 00 57 58 f6 80 00 00 80 00
[326877.155736] end_request: I/O error, dev sdb, sector 1465448064
[326877.155987] ata3: EH complete
[326877.281684] md/raid10:md0: Disk failure on sdb1, disabling device.
[326877.281684] md/raid10:md0: Operation continuing on 3 devices.
[326877.801033] RAID10 conf printout:
[326877.801038]  --- wd:3 rd:4
[326877.801040]  disk 0, wo:1, o:0, dev:sdb1
[326877.801042]  disk 1, wo:0, o:1, dev:sdc1
[326877.801044]  disk 2, wo:0, o:1, dev:sdd1
[326877.801046]  disk 3, wo:0, o:1, dev:sde1
[326877.801071] RAID10 conf printout:
[326877.801074]  --- wd:3 rd:4
[326877.801076]  disk 1, wo:0, o:1, dev:sdc1
[326877.801078]  disk 2, wo:0, o:1, dev:sdd1
[326877.801079]  disk 3, wo:0, o:1, dev:sde1
[326899.233166] ata4: EH in SWNCQ mode,QC:qc_active 0x7 sactive 0x7
[326899.233384] ata4: SWNCQ:qc_active 0x1 defer_bits 0x6 last_issue_tag 0x0
[326899.233384]   dhfis 0x1 dmafis 0x0 sdbfis 0x0
[326899.233643] ata4: ATA_REG 0x41 ERR_REG 0x4
[326899.233775] ata4: tag : dhfis dmafis sdbfis sactive
[326899.234078] ata4: tag 0x0: 1 0 0 1
[326899.234458] ata4.00: exception Emask 0x1 SAct 0x7 SErr 0x0 action 0x6 frozen
[326899.234843] ata4.00: Ata error. fis:0x41
[326899.235230] ata4.00: failed command: WRITE FPDMA QUEUED
[326899.235617] ata4.00: cmd 61/80:00:80:e0:5b/00:00:57:00:00/40 tag 0 ncq 65536 out
[326899.235617]          res 41/04:00:00:00:00/04:00:00:00:00/00 Emask 0x1 (device error)
[326899.236423] ata4.00: status: { DRDY ERR }
[326899.236818] ata4.00: error: { ABRT }
[326899.237200] ata4.00: failed command: WRITE FPDMA QUEUED
[326899.237609] ata4.00: cmd 61/80:08:00:e1:5b/00:00:57:00:00/40 tag 1 ncq 65536 out
[326899.237609]          res 41/04:00:00:00:00/04:00:00:00:00/00 Emask 0x1 (device error)
[326899.238428] ata4.00: status: { DRDY ERR }
[326899.238865] ata4.00: error: { ABRT }
[326899.239288] ata4.00: failed command: WRITE FPDMA QUEUED
[326899.239730] ata4.00: cmd 61/80:10:80:e1:5b/00:00:57:00:00/40 tag 2 ncq 65536 out
[326899.239730]          res 41/04:00:00:00:00/04:00:00:00:00/00 Emask 0x1 (device error)
[326899.240682] ata4.00: status: { DRDY ERR }
[326899.241162] ata4.00: error: { ABRT }
[326899.241653] ata4: hard resetting link
[326899.241654] ata4: nv: skipping hardreset on occupied port
[326899.760685] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[326899.774644] ata4.00: configured for UDMA/133
[326899.774695] sd 3:0:0:0: [sdc]
[326899.774698] Result: hostbyte=0x00 driverbyte=0x08
[326899.774700] sd 3:0:0:0: [sdc]
[326899.774702] Sense Key : 0xb [current] [descriptor]
[326899.774707] Descriptor sense data with sense descriptors (in hex):
[326899.774709]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[326899.774721]         00 00 00 00
[326899.774727] sd 3:0:0:0: [sdc]
[326899.774728] ASC=0x0 ASCQ=0x0
[326899.774732] sd 3:0:0:0: [sdc] CDB:
[326899.774734] cdb[0]=0x2a: 2a 00 57 5b e0 80 00 00 80 00
[326899.774744] end_request: I/O error, dev sdc, sector 1465639040
[326899.775097] sd 3:0:0:0: [sdc]
[326899.775098] Result: hostbyte=0x00 driverbyte=0x08
[326899.775100] sd 3:0:0:0: [sdc]
[326899.775102] Sense Key : 0xb [current] [descriptor]
[326899.775104] Descriptor sense data with sense descriptors (in hex):
[326899.775105]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[326899.775113]         00 00 00 00
[326899.775117] sd 3:0:0:0: [sdc]
[326899.775118] ASC=0x0 ASCQ=0x0
[326899.775120] sd 3:0:0:0: [sdc] CDB:
[326899.775121] cdb[0]=0x2a: 2a 00 57 5b e1 00 00 00 80 00
[326899.775128] end_request: I/O error, dev sdc, sector 1465639168
[326899.775404] sd 3:0:0:0: [sdc]
[326899.775405] Result: hostbyte=0x00 driverbyte=0x08
[326899.775407] sd 3:0:0:0: [sdc]
[326899.775408] Sense Key : 0xb [current] [descriptor]
[326899.775410] Descriptor sense data with sense descriptors (in hex):
[326899.775412]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[326899.775420]         00 00 00 00
[326899.775423] sd 3:0:0:0: [sdc]
[326899.775424] ASC=0x0 ASCQ=0x0
[326899.775427] sd 3:0:0:0: [sdc] CDB:
[326899.775428] cdb[0]=0x2a: 2a 00 57 5b e1 80 00 00 80 00
[326899.775434] end_request: I/O error, dev sdc, sector 1465639296
[326899.775691] ata4: EH complete
[326899.830768] Buffer I/O error on device md0p1, logical block 1474688
[326899.830965] lost page write due to I/O error on md0p1
[326899.831257] Buffer I/O error on device md0p1, logical block 1474689
[326899.831419] lost page write due to I/O error on md0p1
[326899.831424] Buffer I/O error on device md0p1, logical block 1474690
[326899.831585] lost page write due to I/O error on md0p1
[326899.831589] Buffer I/O error on device md0p1, logical block 1474691
[326899.831751] lost page write due to I/O error on md0p1
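Each failed write appears twice in this log: once as the SCSI CDB (cdb[0]=0x2a, i.e. WRITE(10)) and once as the sector in the following end_request line. In a 10-byte CDB, bytes 2 to 5 hold the big-endian LBA and bytes 7 to 8 the transfer length in logical blocks (512 bytes on these disks). A small bash sketch to decode one; decode_cdb10 is a hypothetical helper name, not an existing tool:

```bash
#!/usr/bin/env bash
# Decode a 10-byte SCSI READ/WRITE CDB as printed by the kernel, e.g.
#   cdb[0]=0x2a: 2a 00 57 58 f0 80 00 00 80 00
# Byte 0 = opcode, bytes 2-5 = LBA (big-endian), bytes 7-8 = transfer length.
decode_cdb10() {
    local -a b=("$@")
    if (( ${#b[@]} != 10 )); then
        echo "expected 10 CDB bytes, got ${#b[@]}" >&2
        return 1
    fi
    # Concatenate the hex bytes and let bash evaluate them in base 16.
    local lba=$(( 16#${b[2]}${b[3]}${b[4]}${b[5]} ))
    local len=$(( 16#${b[7]}${b[8]} ))
    echo "opcode=0x${b[0]} lba=${lba} sectors=${len} bytes=$(( len * 512 ))"
}

# First failed write on sdb
decode_cdb10 2a 00 57 58 f0 80 00 00 80 00
```

This prints opcode=0x2a lba=1465446528 sectors=128 bytes=65536, which matches both the end_request sector on sdb and the "ncq 65536" size in the libata line. The first sdc CDB (2a 00 57 5b e0 80 00 00 80 00) decodes the same way to 1465639040. Both LBAs are relative to the whole disk, not to the sdX1 partition.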

This is not a real disk failure, because a) smartd doesn't find any problems with the disks, and b) I can create a full-size TrueCrypt volume on each disk individually.

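I also tried creating a partition on /dev/md0 (with both type 83/Linux and type da/Non-FS data) and then creating the TrueCrypt volume on the partition /dev/md0p1 (that is where the dmesg output above comes from), but that doesn't work either.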

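My assumption is that TrueCrypt is somehow corrupting metadata that mdadm depends on. The strange thing is that this procedure worked fine before. What is going on here?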

[root@emma]# uname -a
Linux emma 3.5.4-1-ARCH #1 SMP PREEMPT Sat Sep 15 08:12:04 CEST 2012 x86_64 GNU/Linux
[root@emma]# mdadm --version
mdadm - v3.2.5 - 18th May 2012
[root@emma]# truecrypt --version
TrueCrypt 7.1a
[root@emma]# fdisk -l

Disk /dev/sda: 160.0 GB, 160040803840 bytes
255 heads, 63 sectors/track, 19457 cylinders, total 312579695 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xe256e256

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *          63      208844      104391   83  Linux
/dev/sda2          208845      738989      265072+  82  Linux swap / Solaris
/dev/sda3          738990    62187614    30724312+  83  Linux
/dev/sda4        62187615   312579694   125196040   83  Linux

Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
16 heads, 62 sectors/track, 2953908 cylinders, total 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0xcbb904fc

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            4096  2930273071  1465134488   da  Non-FS data

Disk /dev/sdc: 1500.3 GB, 1500301910016 bytes
16 heads, 62 sectors/track, 2953908 cylinders, total 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x6978c214

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1            4096  2930273071  1465134488   da  Non-FS data

Disk /dev/sdd: 1500.3 GB, 1500301910016 bytes
16 heads, 62 sectors/track, 2953908 cylinders, total 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x8dd1e314

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1            4096  2930273071  1465134488   da  Non-FS data

Disk /dev/sde: 1500.3 GB, 1500301910016 bytes
16 heads, 62 sectors/track, 2953908 cylinders, total 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x70b7ece7

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1            4096  2930273071  1465134488   da  Non-FS data

Disk /dev/md0: 3000.3 GB, 3000326160384 bytes
2 heads, 3 sectors/track, 976668672 cylinders, total 5860012032 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 524288 bytes / 2097152 bytes
Disk identifier: 0xce6d6f88

    Device Boot      Start         End      Blocks   Id  System
/dev/md0p1            4096  4294967294  2147481599+  83  Linux
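The member partition geometry above can be cross-checked the same way: the Blocks column is in 1 KiB units, so for 512-byte sectors it equals (End - Start + 1) / 2, and a start at sector 4096 is a 2 MiB offset, which is a multiple of the 4096-byte physical sector size that /dev/sdb reports. A bash sketch using the sdb1 values:

```bash
#!/usr/bin/env bash
# Cross-check the member partition geometry shown by fdisk -l.
start=4096         # "Start" of /dev/sdb1, in 512-byte logical sectors
end=2930273071     # "End" of /dev/sdb1 (inclusive)
logical=512        # logical sector size reported by fdisk
physical=4096      # sdb reports 4096-byte physical sectors; sdc/sdd/sde report 512

sectors=$(( end - start + 1 ))
echo "partition sectors: ${sectors}"
echo "fdisk Blocks (1 KiB): $(( sectors * logical / 1024 ))"
echo "start offset: $(( start * logical )) bytes"

if (( (start * logical) % physical == 0 )); then
    echo "start is aligned to ${physical}-byte physical sectors"
else
    echo "start is NOT aligned to ${physical}-byte physical sectors"
fi
```

This reproduces the 1465134488 blocks shown for each sdX1 and confirms a 2097152-byte (2 MiB), 4 KiB-aligned start, so partition misalignment on the 4K-sector drive does not appear to be a factor.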

Edit: when this procedure worked, it was probably with TrueCrypt 6. I'll try version 6 to see what happens and will update with the results...

Answer 1

These error messages really do read like disk errors, not metadata corruption. They come from libata, not mdraid.
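The libata lines can be read directly: in "res 41/04:...", the first two fields are the ATA status and error registers returned for the failed command. Status 0x41 is DRDY (0x40) plus ERR (0x01), and error 0x04 is ABRT, meaning the command was aborted; libata classifies this as Emask 0x1 (device error), and the SCSI layer then reports sense key 0xb, ABORTED COMMAND. A minimal bash decoder covering only a few register bits (decode_ata_res is a hypothetical helper, not part of any tool):

```bash
#!/usr/bin/env bash
# Decode the ATA status and error registers from a libata "res" line, e.g.
#   res 41/04:00:00:00:00/04:00:00:00:00/00 Emask 0x1 (device error)
# Arguments: status byte and error byte, in hex without the 0x prefix.
# Only a subset of the register bits is decoded here.
decode_ata_res() {
    local status=$(( 16#$1 )) error=$(( 16#$2 ))
    local -a st=() er=()
    # Status register bits
    (( status & 0x80 )) && st+=(BSY)
    (( status & 0x40 )) && st+=(DRDY)
    (( status & 0x20 )) && st+=(DF)
    (( status & 0x08 )) && st+=(DRQ)
    (( status & 0x01 )) && st+=(ERR)
    # Error register bits
    (( error & 0x80 )) && er+=(ICRC)
    (( error & 0x40 )) && er+=(UNC)
    (( error & 0x10 )) && er+=(IDNF)
    (( error & 0x04 )) && er+=(ABRT)
    echo "status: { ${st[*]} } error: { ${er[*]} }"
}

decode_ata_res 41 04
```

This prints status: { DRDY ERR } error: { ABRT }, the same decoding that libata itself logs on the lines after each failed WRITE FPDMA QUEUED.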

The disks themselves may be fine, though. The cause could be, for example, a SATA driver bug, a defective SATA controller, a bad connector, or a bad cable.

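You may only see the problem when creating the volume on the mdraid array because the I/O pattern is different there. But I'm quite confident that even if you get something else to work, it won't be stable, because what you actually have is an unstable driver or unstable hardware.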

By the way: what do smartctl -x and smartctl -a say? Does it show SATA error counters?
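One place to look for such counters is the SATA Phy Event Counters log, which smartctl -x includes and smartctl -l sataphy prints on its own. Non-zero counts such as ICRC errors or register FISes sent due to a COMRESET are often read as signs of link trouble (cable, connector, controller) rather than media faults. The sketch below filters that table down to its non-zero rows. It assumes the ID / Size / Value / Description column layout that smartctl prints, and nonzero_phy_counters is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Print only the non-zero rows of smartctl's SATA Phy Event Counters table.
# Assumed input layout (from `smartctl -l sataphy` or `smartctl -x`):
#   ID      Size     Value  Description
#   0x0001  2            0  Command failed due to ICRC error
nonzero_phy_counters() {
    awk '
        # Counter rows start with a hex ID such as 0x0001; the header row is skipped.
        $1 ~ /^0x[0-9a-fA-F]+$/ {
            v = $3
            sub(/[+]$/, "", v)   # strip a trailing + (shown when a counter has reached its maximum)
            if (v ~ /^[0-9]+$/ && v + 0 > 0) {
                desc = $4
                for (i = 5; i <= NF; i++) desc = desc " " $i
                print v, desc
            }
        }'
}

# Usage (needs root and smartmontools), checking every array member:
# for d in /dev/sdb /dev/sdc /dev/sdd /dev/sde; do
#     echo "== $d"
#     smartctl -l sataphy "$d" | nonzero_phy_counters
# done
```

Comparing sdb and sdc (the members that dropped out) against sdd and sde would show whether the link-level counters differ between the disks that failed and the ones that did not.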
