Did I lose my RAID again?

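A bit of history: two years ago I was thrilled to discover that mdadm is powerful enough to reshape arrays, so you can start with a small array and grow it as needed. I bought 3x1 TB drives and built a RAID-5. For a year everything was fine.

Then I bought 2 more drives and tried to reshape the 5 drives into a RAID-6, but because of a problem with the superblock version everything was lost. I had to rebuild from scratch, and the 2 TB of data was gone.

Yesterday I bought 2 more drives, and this time I had everything in place: a properly built array and a UPS. I disabled the write-intent bitmap, added the 2 new drives as spares, and ran the command to grow the array to 7 disks.

It started working, but ridiculously slowly, at about 100 KB/s. After it had processed the first 37 MB at that amazing speed, one of the old drives failed. I shut the PC down properly and disconnected the failed drive. After booting, it appeared to have recreated the write-intent bitmap, because it was still in the mdadm configuration, so I removed it from the config and rebooted again.

Now what I see is that all mdadm processes are deadlocked and doing nothing.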

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1937 root      20   0 12992  608  444 D    0  0.1   0:00.00 mdadm
 2283 root      20   0 12992  852  704 D    0  0.1   0:00.01 mdadm
 2287 root      20   0     0    0    0 D    0  0.0   0:00.01 md0_reshape
 2288 root      18  -2 12992  820  676 D    0  0.1   0:00.01 mdadm

This is what I see in mdstat:

$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid6 sdb1[1] sdg1[4] sdf1[7] sde1[6] sdd1[0] sdc1[5]
      2929683456 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [7/6] [UU_UUUU]
      [>....................]  reshape =  0.0% (37888/976561152) finish=567604147.2min speed=0K/sec

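I have tried mdadm 2.6.7, 3.1.4 and 3.2, all to no avail. Have I lost my data again? Any suggestions on how to fix this?

The OS is Ubuntu Server 10.04.2.

P.S. Needless to say, the data is inaccessible: I can't mount /dev/md0 to save the most valuable data.

You can see my disappointment: the very thing I was so excited about has failed twice, costing me 5 TB of data.

Update: it looks like there is some useful information in kern.log: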

21:38:48 ...: [  166.522055] raid5: reshape will continue
21:38:48 ...: [  166.522085] raid5: device sdb1 operational as raid disk 1
21:38:48 ...: [  166.522091] raid5: device sdg1 operational as raid disk 4
21:38:48 ...: [  166.522097] raid5: device sdf1 operational as raid disk 5
21:38:48 ...: [  166.522102] raid5: device sde1 operational as raid disk 6
21:38:48 ...: [  166.522107] raid5: device sdd1 operational as raid disk 0
21:38:48 ...: [  166.522111] raid5: device sdc1 operational as raid disk 3
21:38:48 ...: [  166.523942] raid5: allocated 7438kB for md0
21:38:48 ...: [  166.524041] 1: w=1 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0
21:38:48 ...: [  166.524050] 4: w=2 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0
21:38:48 ...: [  166.524056] 5: w=3 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0
21:38:48 ...: [  166.524062] 6: w=4 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0
21:38:48 ...: [  166.524068] 0: w=5 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0
21:38:48 ...: [  166.524073] 3: w=6 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0
21:38:48 ...: [  166.524079] raid5: raid level 6 set md0 active with 6 out of 7 devices, algorithm 2
21:38:48 ...: [  166.524519] RAID5 conf printout:
21:38:48 ...: [  166.524523]  --- rd:7 wd:6
21:38:48 ...: [  166.524528]  disk 0, o:1, dev:sdd1
21:38:48 ...: [  166.524532]  disk 1, o:1, dev:sdb1
21:38:48 ...: [  166.524537]  disk 3, o:1, dev:sdc1
21:38:48 ...: [  166.524541]  disk 4, o:1, dev:sdg1
21:38:48 ...: [  166.524545]  disk 5, o:1, dev:sdf1
21:38:48 ...: [  166.524550]  disk 6, o:1, dev:sde1
21:38:48 ...: [  166.524553] ...ok start reshape thread
21:38:48 ...: [  166.524727] md0: detected capacity change from 0 to 2999995858944
21:38:48 ...: [  166.524735] md: reshape of RAID array md0
21:38:48 ...: [  166.524740] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
21:38:48 ...: [  166.524745] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
21:38:48 ...: [  166.524756] md: using 128k window, over a total of 976561152 blocks.
21:39:05 ...: [  166.525013]  md0:
21:42:04 ...: [  362.520063] INFO: task mdadm:1937 blocked for more than 120 seconds.
21:42:04 ...: [  362.520068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:42:04 ...: [  362.520073] mdadm         D 00000000ffffffff     0  1937      1 0x00000000
21:42:04 ...: [  362.520083]  ffff88002ef4f5d8 0000000000000082 0000000000015bc0 0000000000015bc0
21:42:04 ...: [  362.520092]  ffff88002eb5b198 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5ade0
21:42:04 ...: [  362.520100]  0000000000015bc0 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5b198
21:42:04 ...: [  362.520107] Call Trace:
21:42:04 ...: [  362.520133]  [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456]
21:42:04 ...: [  362.520148]  [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20
21:42:04 ...: [  362.520159]  [<ffffffffa0228413>] make_request+0x243/0x4b0 [raid456]
21:42:04 ...: [  362.520169]  [<ffffffffa0221a90>] ? release_stripe+0x50/0x70 [raid456]
21:42:04 ...: [  362.520179]  [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40
21:42:04 ...: [  362.520188]  [<ffffffff81414df0>] md_make_request+0xc0/0x130
21:42:04 ...: [  362.520194]  [<ffffffff81414df0>] ? md_make_request+0xc0/0x130
21:42:04 ...: [  362.520205]  [<ffffffff8129f8c1>] generic_make_request+0x1b1/0x4f0
21:42:04 ...: [  362.520214]  [<ffffffff810f6515>] ? mempool_alloc_slab+0x15/0x20
21:42:04 ...: [  362.520222]  [<ffffffff8116c2ec>] ? alloc_buffer_head+0x1c/0x60
21:42:04 ...: [  362.520230]  [<ffffffff8129fc80>] submit_bio+0x80/0x110
21:42:04 ...: [  362.520236]  [<ffffffff8116c849>] submit_bh+0xf9/0x140
21:42:04 ...: [  362.520244]  [<ffffffff8116f124>] block_read_full_page+0x274/0x3b0
21:42:04 ...: [  362.520251]  [<ffffffff81172c90>] ? blkdev_get_block+0x0/0x70
21:42:04 ...: [  362.520258]  [<ffffffff8110d875>] ? __inc_zone_page_state+0x35/0x40
21:42:04 ...: [  362.520265]  [<ffffffff810f46d8>] ? add_to_page_cache_locked+0xe8/0x160
21:42:04 ...: [  362.520272]  [<ffffffff81173d78>] blkdev_readpage+0x18/0x20
21:42:04 ...: [  362.520279]  [<ffffffff810f484b>] __read_cache_page+0x7b/0xe0
21:42:04 ...: [  362.520285]  [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20
21:42:04 ...: [  362.520290]  [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20
21:42:04 ...: [  362.520297]  [<ffffffff810f57dc>] do_read_cache_page+0x3c/0x120
21:42:04 ...: [  362.520304]  [<ffffffff810f5909>] read_cache_page_async+0x19/0x20
21:42:04 ...: [  362.520310]  [<ffffffff810f591e>] read_cache_page+0xe/0x20
21:42:04 ...: [  362.520317]  [<ffffffff811a6cb0>] read_dev_sector+0x30/0xa0
21:42:04 ...: [  362.520324]  [<ffffffff811a7fcd>] amiga_partition+0x6d/0x460
21:42:04 ...: [  362.520331]  [<ffffffff811a7938>] check_partition+0x138/0x190
21:42:04 ...: [  362.520338]  [<ffffffff811a7a7a>] rescan_partitions+0xea/0x2f0
21:42:04 ...: [  362.520344]  [<ffffffff811744c7>] __blkdev_get+0x267/0x3d0
21:42:04 ...: [  362.520350]  [<ffffffff81174650>] ? blkdev_open+0x0/0xc0
21:42:04 ...: [  362.520356]  [<ffffffff81174640>] blkdev_get+0x10/0x20
21:42:04 ...: [  362.520362]  [<ffffffff811746c1>] blkdev_open+0x71/0xc0
21:42:04 ...: [  362.520369]  [<ffffffff811419f3>] __dentry_open+0x113/0x370
21:42:04 ...: [  362.520377]  [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30
21:42:04 ...: [  362.520385]  [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0
21:42:04 ...: [  362.520391]  [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70
21:42:04 ...: [  362.520398]  [<ffffffff8115207a>] do_filp_open+0x2da/0xba0
21:42:04 ...: [  362.520406]  [<ffffffff811134a8>] ? unmap_vmas+0x178/0x310
21:42:04 ...: [  362.520414]  [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150
21:42:04 ...: [  362.520421]  [<ffffffff81141769>] do_sys_open+0x69/0x170
21:42:04 ...: [  362.520428]  [<ffffffff811418b0>] sys_open+0x20/0x30
21:42:04 ...: [  362.520437]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
21:42:04 ...: [  362.520446] INFO: task mdadm:2283 blocked for more than 120 seconds.
21:42:04 ...: [  362.520450] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:42:04 ...: [  362.520454] mdadm         D 0000000000000000     0  2283   2212 0x00000000
21:42:04 ...: [  362.520462]  ffff88002cca7d98 0000000000000086 0000000000015bc0 0000000000015bc0
21:42:04 ...: [  362.520470]  ffff88002ededf78 ffff88002cca7fd8 0000000000015bc0 ffff88002ededbc0
21:42:04 ...: [  362.520478]  0000000000015bc0 ffff88002cca7fd8 0000000000015bc0 ffff88002ededf78
21:42:04 ...: [  362.520485] Call Trace:
21:42:04 ...: [  362.520495]  [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180
21:42:04 ...: [  362.520502]  [<ffffffff8154397b>] mutex_lock+0x2b/0x50
21:42:04 ...: [  362.520508]  [<ffffffff8117404d>] __blkdev_put+0x3d/0x190
21:42:04 ...: [  362.520514]  [<ffffffff811741b0>] blkdev_put+0x10/0x20
21:42:04 ...: [  362.520520]  [<ffffffff811741f3>] blkdev_close+0x33/0x60
21:42:04 ...: [  362.520527]  [<ffffffff81145375>] __fput+0xf5/0x210
21:42:04 ...: [  362.520534]  [<ffffffff811454b5>] fput+0x25/0x30
21:42:04 ...: [  362.520540]  [<ffffffff811415ad>] filp_close+0x5d/0x90
21:42:04 ...: [  362.520546]  [<ffffffff81141697>] sys_close+0xb7/0x120
21:42:04 ...: [  362.520553]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
21:42:04 ...: [  362.520559] INFO: task md0_reshape:2287 blocked for more than 120 seconds.
21:42:04 ...: [  362.520563] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:42:04 ...: [  362.520567] md0_reshape   D ffff88003aee96f0     0  2287      2 0x00000000
21:42:04 ...: [  362.520575]  ffff88003cf05a70 0000000000000046 0000000000015bc0 0000000000015bc0
21:42:04 ...: [  362.520582]  ffff88003aee9aa8 ffff88003cf05fd8 0000000000015bc0 ffff88003aee96f0
21:42:04 ...: [  362.520590]  0000000000015bc0 ffff88003cf05fd8 0000000000015bc0 ffff88003aee9aa8
21:42:04 ...: [  362.520597] Call Trace:
21:42:04 ...: [  362.520608]  [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456]
21:42:04 ...: [  362.520616]  [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20
21:42:04 ...: [  362.520626]  [<ffffffffa0226f80>] reshape_request+0x4c0/0x9a0 [raid456]
21:42:04 ...: [  362.520634]  [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40
21:42:04 ...: [  362.520644]  [<ffffffffa022777a>] sync_request+0x31a/0x3a0 [raid456]
21:42:04 ...: [  362.520651]  [<ffffffff81052713>] ? __wake_up+0x53/0x70
21:42:04 ...: [  362.520658]  [<ffffffff814156b1>] md_do_sync+0x621/0xbb0
21:42:04 ...: [  362.520668]  [<ffffffff810387b9>] ? default_spin_lock_flags+0x9/0x10
21:42:04 ...: [  362.520675]  [<ffffffff8141640c>] md_thread+0x5c/0x130
21:42:04 ...: [  362.520681]  [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40
21:42:04 ...: [  362.520688]  [<ffffffff814163b0>] ? md_thread+0x0/0x130
21:42:04 ...: [  362.520694]  [<ffffffff81084416>] kthread+0x96/0xa0
21:42:04 ...: [  362.520701]  [<ffffffff810131ea>] child_rip+0xa/0x20
21:42:04 ...: [  362.520707]  [<ffffffff81084380>] ? kthread+0x0/0xa0
21:42:04 ...: [  362.520713]  [<ffffffff810131e0>] ? child_rip+0x0/0x20
21:42:04 ...: [  362.520718] INFO: task mdadm:2288 blocked for more than 120 seconds.
21:42:04 ...: [  362.520721] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:42:04 ...: [  362.520725] mdadm         D 0000000000000000     0  2288      1 0x00000000
21:42:04 ...: [  362.520733]  ffff88002cca9c18 0000000000000086 0000000000015bc0 0000000000015bc0
21:42:04 ...: [  362.520741]  ffff88003aee83b8 ffff88002cca9fd8 0000000000015bc0 ffff88003aee8000
21:42:04 ...: [  362.520748]  0000000000015bc0 ffff88002cca9fd8 0000000000015bc0 ffff88003aee83b8
21:42:04 ...: [  362.520755] Call Trace:
21:42:04 ...: [  362.520763]  [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180
21:42:04 ...: [  362.520771]  [<ffffffff812a6d50>] ? exact_match+0x0/0x10
21:42:04 ...: [  362.520777]  [<ffffffff8154397b>] mutex_lock+0x2b/0x50
21:42:04 ...: [  362.520783]  [<ffffffff811742c8>] __blkdev_get+0x68/0x3d0
21:42:04 ...: [  362.520790]  [<ffffffff81174650>] ? blkdev_open+0x0/0xc0
21:42:04 ...: [  362.520795]  [<ffffffff81174640>] blkdev_get+0x10/0x20
21:42:04 ...: [  362.520801]  [<ffffffff811746c1>] blkdev_open+0x71/0xc0
21:42:04 ...: [  362.520808]  [<ffffffff811419f3>] __dentry_open+0x113/0x370
21:42:04 ...: [  362.520815]  [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30
21:42:04 ...: [  362.520821]  [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0
21:42:04 ...: [  362.520828]  [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70
21:42:04 ...: [  362.520834]  [<ffffffff8115207a>] do_filp_open+0x2da/0xba0
21:42:04 ...: [  362.520841]  [<ffffffff810ff0e1>] ? lru_cache_add_lru+0x21/0x40
21:42:04 ...: [  362.520848]  [<ffffffff8111109c>] ? do_anonymous_page+0x11c/0x330
21:42:04 ...: [  362.520855]  [<ffffffff81115d5f>] ? handle_mm_fault+0x31f/0x3c0
21:42:04 ...: [  362.520862]  [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150
21:42:04 ...: [  362.520868]  [<ffffffff81141769>] do_sys_open+0x69/0x170
21:42:04 ...: [  362.520874]  [<ffffffff811418b0>] sys_open+0x20/0x30
21:42:04 ...: [  362.520882]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
21:44:04 ...: [  482.520065] INFO: task mdadm:1937 blocked for more than 120 seconds.
21:44:04 ...: [  482.520071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:44:04 ...: [  482.520077] mdadm         D 00000000ffffffff     0  1937      1 0x00000000
21:44:04 ...: [  482.520087]  ffff88002ef4f5d8 0000000000000082 0000000000015bc0 0000000000015bc0
21:44:04 ...: [  482.520096]  ffff88002eb5b198 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5ade0
21:44:04 ...: [  482.520104]  0000000000015bc0 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5b198
21:44:04 ...: [  482.520112] Call Trace:
21:44:04 ...: [  482.520139]  [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456]
21:44:04 ...: [  482.520154]  [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20
21:44:04 ...: [  482.520165]  [<ffffffffa0228413>] make_request+0x243/0x4b0 [raid456]
21:44:04 ...: [  482.520175]  [<ffffffffa0221a90>] ? release_stripe+0x50/0x70 [raid456]
21:44:04 ...: [  482.520185]  [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40
21:44:04 ...: [  482.520194]  [<ffffffff81414df0>] md_make_request+0xc0/0x130
21:44:04 ...: [  482.520201]  [<ffffffff81414df0>] ? md_make_request+0xc0/0x130
21:44:04 ...: [  482.520212]  [<ffffffff8129f8c1>] generic_make_request+0x1b1/0x4f0
21:44:04 ...: [  482.520221]  [<ffffffff810f6515>] ? mempool_alloc_slab+0x15/0x20
21:44:04 ...: [  482.520229]  [<ffffffff8116c2ec>] ? alloc_buffer_head+0x1c/0x60
21:44:04 ...: [  482.520237]  [<ffffffff8129fc80>] submit_bio+0x80/0x110
21:44:04 ...: [  482.520244]  [<ffffffff8116c849>] submit_bh+0xf9/0x140
21:44:04 ...: [  482.520252]  [<ffffffff8116f124>] block_read_full_page+0x274/0x3b0
21:44:04 ...: [  482.520258]  [<ffffffff81172c90>] ? blkdev_get_block+0x0/0x70
21:44:04 ...: [  482.520266]  [<ffffffff8110d875>] ? __inc_zone_page_state+0x35/0x40
21:44:04 ...: [  482.520273]  [<ffffffff810f46d8>] ? add_to_page_cache_locked+0xe8/0x160
21:44:04 ...: [  482.520280]  [<ffffffff81173d78>] blkdev_readpage+0x18/0x20
21:44:04 ...: [  482.520286]  [<ffffffff810f484b>] __read_cache_page+0x7b/0xe0
21:44:04 ...: [  482.520293]  [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20
21:44:04 ...: [  482.520299]  [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20
21:44:04 ...: [  482.520306]  [<ffffffff810f57dc>] do_read_cache_page+0x3c/0x120
21:44:04 ...: [  482.520313]  [<ffffffff810f5909>] read_cache_page_async+0x19/0x20
21:44:04 ...: [  482.520319]  [<ffffffff810f591e>] read_cache_page+0xe/0x20
21:44:04 ...: [  482.520327]  [<ffffffff811a6cb0>] read_dev_sector+0x30/0xa0
21:44:04 ...: [  482.520334]  [<ffffffff811a7fcd>] amiga_partition+0x6d/0x460
21:44:04 ...: [  482.520341]  [<ffffffff811a7938>] check_partition+0x138/0x190
21:44:04 ...: [  482.520348]  [<ffffffff811a7a7a>] rescan_partitions+0xea/0x2f0
21:44:04 ...: [  482.520355]  [<ffffffff811744c7>] __blkdev_get+0x267/0x3d0
21:44:04 ...: [  482.520361]  [<ffffffff81174650>] ? blkdev_open+0x0/0xc0
21:44:04 ...: [  482.520367]  [<ffffffff81174640>] blkdev_get+0x10/0x20
21:44:04 ...: [  482.520373]  [<ffffffff811746c1>] blkdev_open+0x71/0xc0
21:44:04 ...: [  482.520380]  [<ffffffff811419f3>] __dentry_open+0x113/0x370
21:44:04 ...: [  482.520388]  [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30
21:44:04 ...: [  482.520396]  [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0
21:44:04 ...: [  482.520403]  [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70
21:44:04 ...: [  482.520410]  [<ffffffff8115207a>] do_filp_open+0x2da/0xba0
21:44:04 ...: [  482.520417]  [<ffffffff811134a8>] ? unmap_vmas+0x178/0x310
21:44:04 ...: [  482.520426]  [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150
21:44:04 ...: [  482.520432]  [<ffffffff81141769>] do_sys_open+0x69/0x170
21:44:04 ...: [  482.520438]  [<ffffffff811418b0>] sys_open+0x20/0x30
21:44:04 ...: [  482.520447]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
21:44:04 ...: [  482.520458] INFO: task mdadm:2283 blocked for more than 120 seconds.
21:44:04 ...: [  482.520462] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:44:04 ...: [  482.520467] mdadm         D 0000000000000000     0  2283   2212 0x00000000
21:44:04 ...: [  482.520475]  ffff88002cca7d98 0000000000000086 0000000000015bc0 0000000000015bc0
21:44:04 ...: [  482.520483]  ffff88002ededf78 ffff88002cca7fd8 0000000000015bc0 ffff88002ededbc0
21:44:04 ...: [  482.520490]  0000000000015bc0 ffff88002cca7fd8 0000000000015bc0 ffff88002ededf78
21:44:04 ...: [  482.520498] Call Trace:
21:44:04 ...: [  482.520508]  [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180
21:44:04 ...: [  482.520515]  [<ffffffff8154397b>] mutex_lock+0x2b/0x50
21:44:04 ...: [  482.520521]  [<ffffffff8117404d>] __blkdev_put+0x3d/0x190
21:44:04 ...: [  482.520527]  [<ffffffff811741b0>] blkdev_put+0x10/0x20
21:44:04 ...: [  482.520533]  [<ffffffff811741f3>] blkdev_close+0x33/0x60
21:44:04 ...: [  482.520541]  [<ffffffff81145375>] __fput+0xf5/0x210
21:44:04 ...: [  482.520547]  [<ffffffff811454b5>] fput+0x25/0x30
21:44:04 ...: [  482.520554]  [<ffffffff811415ad>] filp_close+0x5d/0x90
21:44:04 ...: [  482.520560]  [<ffffffff81141697>] sys_close+0xb7/0x120
21:44:04 ...: [  482.520568]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
21:44:04 ...: [  482.520574] INFO: task md0_reshape:2287 blocked for more than 120 seconds.
21:44:04 ...: [  482.520578] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:44:04 ...: [  482.520582] md0_reshape   D ffff88003aee96f0     0  2287      2 0x00000000
21:44:04 ...: [  482.520590]  ffff88003cf05a70 0000000000000046 0000000000015bc0 0000000000015bc0
21:44:04 ...: [  482.520597]  ffff88003aee9aa8 ffff88003cf05fd8 0000000000015bc0 ffff88003aee96f0
21:44:04 ...: [  482.520605]  0000000000015bc0 ffff88003cf05fd8 0000000000015bc0 ffff88003aee9aa8
21:44:04 ...: [  482.520612] Call Trace:
21:44:04 ...: [  482.520623]  [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456]
21:44:04 ...: [  482.520633]  [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20
21:44:04 ...: [  482.520643]  [<ffffffffa0226f80>] reshape_request+0x4c0/0x9a0 [raid456]
21:44:04 ...: [  482.520651]  [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40
21:44:04 ...: [  482.520661]  [<ffffffffa022777a>] sync_request+0x31a/0x3a0 [raid456]
21:44:04 ...: [  482.520668]  [<ffffffff81052713>] ? __wake_up+0x53/0x70
21:44:04 ...: [  482.520675]  [<ffffffff814156b1>] md_do_sync+0x621/0xbb0
21:44:04 ...: [  482.520685]  [<ffffffff810387b9>] ? default_spin_lock_flags+0x9/0x10
21:44:04 ...: [  482.520692]  [<ffffffff8141640c>] md_thread+0x5c/0x130
21:44:04 ...: [  482.520699]  [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40
21:44:04 ...: [  482.520705]  [<ffffffff814163b0>] ? md_thread+0x0/0x130
21:44:04 ...: [  482.520711]  [<ffffffff81084416>] kthread+0x96/0xa0
21:44:04 ...: [  482.520718]  [<ffffffff810131ea>] child_rip+0xa/0x20
21:44:04 ...: [  482.520725]  [<ffffffff81084380>] ? kthread+0x0/0xa0
21:44:04 ...: [  482.520730]  [<ffffffff810131e0>] ? child_rip+0x0/0x20
21:44:04 ...: [  482.520735] INFO: task mdadm:2288 blocked for more than 120 seconds.
21:44:04 ...: [  482.520739] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:44:04 ...: [  482.520743] mdadm         D 0000000000000000     0  2288      1 0x00000000
21:44:04 ...: [  482.520751]  ffff88002cca9c18 0000000000000086 0000000000015bc0 0000000000015bc0
21:44:04 ...: [  482.520759]  ffff88003aee83b8 ffff88002cca9fd8 0000000000015bc0 ffff88003aee8000
21:44:04 ...: [  482.520767]  0000000000015bc0 ffff88002cca9fd8 0000000000015bc0 ffff88003aee83b8
21:44:04 ...: [  482.520774] Call Trace:
21:44:04 ...: [  482.520782]  [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180
21:44:04 ...: [  482.520790]  [<ffffffff812a6d50>] ? exact_match+0x0/0x10
21:44:04 ...: [  482.520797]  [<ffffffff8154397b>] mutex_lock+0x2b/0x50
21:44:04 ...: [  482.520804]  [<ffffffff811742c8>] __blkdev_get+0x68/0x3d0
21:44:04 ...: [  482.520810]  [<ffffffff81174650>] ? blkdev_open+0x0/0xc0
21:44:04 ...: [  482.520816]  [<ffffffff81174640>] blkdev_get+0x10/0x20
21:44:04 ...: [  482.520822]  [<ffffffff811746c1>] blkdev_open+0x71/0xc0
21:44:04 ...: [  482.520829]  [<ffffffff811419f3>] __dentry_open+0x113/0x370
21:44:04 ...: [  482.520837]  [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30
21:44:04 ...: [  482.520843]  [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0
21:44:04 ...: [  482.520850]  [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70
21:44:04 ...: [  482.520857]  [<ffffffff8115207a>] do_filp_open+0x2da/0xba0
21:44:04 ...: [  482.520864]  [<ffffffff810ff0e1>] ? lru_cache_add_lru+0x21/0x40
21:44:04 ...: [  482.520871]  [<ffffffff8111109c>] ? do_anonymous_page+0x11c/0x330
21:44:04 ...: [  482.520878]  [<ffffffff81115d5f>] ? handle_mm_fault+0x31f/0x3c0
21:44:04 ...: [  482.520885]  [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150
21:44:04 ...: [  482.520891]  [<ffffffff81141769>] do_sys_open+0x69/0x170
21:44:04 ...: [  482.520897]  [<ffffffff811418b0>] sys_open+0x20/0x30
21:44:04 ...: [  482.520905]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
21:46:04 ...: [  602.520053] INFO: task mdadm:1937 blocked for more than 120 seconds.
21:46:04 ...: [  602.520059] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:46:04 ...: [  602.520065] mdadm         D 00000000ffffffff     0  1937      1 0x00000000
21:46:04 ...: [  602.520075]  ffff88002ef4f5d8 0000000000000082 0000000000015bc0 0000000000015bc0
21:46:04 ...: [  602.520084]  ffff88002eb5b198 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5ade0
21:46:04 ...: [  602.520091]  0000000000015bc0 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5b198
21:46:04 ...: [  602.520099] Call Trace:
21:46:04 ...: [  602.520127]  [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456]
21:46:04 ...: [  602.520142]  [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20
21:46:04 ...: [  602.520153]  [<ffffffffa0228413>] make_request+0x243/0x4b0 [raid456]
21:46:04 ...: [  602.520162]  [<ffffffffa0221a90>] ? release_stripe+0x50/0x70 [raid456]
21:46:04 ...: [  602.520171]  [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40
21:46:04 ...: [  602.520180]  [<ffffffff81414df0>] md_make_request+0xc0/0x130
21:46:04 ...: [  602.520187]  [<ffffffff81414df0>] ? md_make_request+0xc0/0x130
21:46:04 ...: [  602.520197]  [<ffffffff8129f8c1>] generic_make_request+0x1b1/0x4f0
21:46:04 ...: [  602.520206]  [<ffffffff810f6515>] ? mempool_alloc_slab+0x15/0x20
21:46:04 ...: [  602.520215]  [<ffffffff8116c2ec>] ? alloc_buffer_head+0x1c/0x60
21:46:04 ...: [  602.520222]  [<ffffffff8129fc80>] submit_bio+0x80/0x110
21:46:04 ...: [  602.520229]  [<ffffffff8116c849>] submit_bh+0xf9/0x140
21:46:04 ...: [  602.520237]  [<ffffffff8116f124>] block_read_full_page+0x274/0x3b0
21:46:04 ...: [  602.520244]  [<ffffffff81172c90>] ? blkdev_get_block+0x0/0x70
21:46:04 ...: [  602.520252]  [<ffffffff8110d875>] ? __inc_zone_page_state+0x35/0x40
21:46:04 ...: [  602.520259]  [<ffffffff810f46d8>] ? add_to_page_cache_locked+0xe8/0x160
21:46:04 ...: [  602.520266]  [<ffffffff81173d78>] blkdev_readpage+0x18/0x20
21:46:04 ...: [  602.520273]  [<ffffffff810f484b>] __read_cache_page+0x7b/0xe0
21:46:04 ...: [  602.520279]  [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20
21:46:04 ...: [  602.520285]  [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20
21:46:04 ...: [  602.520292]  [<ffffffff810f57dc>] do_read_cache_page+0x3c/0x120
21:46:04 ...: [  602.520300]  [<ffffffff810f5909>] read_cache_page_async+0x19/0x20
21:46:04 ...: [  602.520306]  [<ffffffff810f591e>] read_cache_page+0xe/0x20
21:46:04 ...: [  602.520314]  [<ffffffff811a6cb0>] read_dev_sector+0x30/0xa0
21:46:04 ...: [  602.520321]  [<ffffffff811a7fcd>] amiga_partition+0x6d/0x460
21:46:04 ...: [  602.520328]  [<ffffffff811a7938>] check_partition+0x138/0x190
21:46:04 ...: [  602.520335]  [<ffffffff811a7a7a>] rescan_partitions+0xea/0x2f0
21:46:04 ...: [  602.520342]  [<ffffffff811744c7>] __blkdev_get+0x267/0x3d0
21:46:04 ...: [  602.520348]  [<ffffffff81174650>] ? blkdev_open+0x0/0xc0
21:46:04 ...: [  602.520354]  [<ffffffff81174640>] blkdev_get+0x10/0x20
21:46:04 ...: [  602.520359]  [<ffffffff811746c1>] blkdev_open+0x71/0xc0
21:46:04 ...: [  602.520367]  [<ffffffff811419f3>] __dentry_open+0x113/0x370
21:46:04 ...: [  602.520375]  [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30
21:46:04 ...: [  602.520383]  [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0
21:46:04 ...: [  602.520390]  [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70
21:46:04 ...: [  602.520397]  [<ffffffff8115207a>] do_filp_open+0x2da/0xba0
21:46:04 ...: [  602.520404]  [<ffffffff811134a8>] ? unmap_vmas+0x178/0x310
21:46:04 ...: [  602.520413]  [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150
21:46:04 ...: [  602.520419]  [<ffffffff81141769>] do_sys_open+0x69/0x170
21:46:04 ...: [  602.520425]  [<ffffffff811418b0>] sys_open+0x20/0x30
21:46:04 ...: [  602.520434]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
21:46:04 ...: [  602.520443] INFO: task mdadm:2283 blocked for more than 120 seconds.
21:46:04 ...: [  602.520447] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:46:04 ...: [  602.520451] mdadm         D 0000000000000000     0  2283   2212 0x00000000
21:46:04 ...: [  602.520460]  ffff88002cca7d98 0000000000000086 0000000000015bc0 0000000000015bc0
21:46:04 ...: [  602.520468]  ffff88002ededf78 ffff88002cca7fd8 0000000000015bc0 ffff88002ededbc0
21:46:04 ...: [  602.520475]  0000000000015bc0 ffff88002cca7fd8 0000000000015bc0 ffff88002ededf78
21:46:04 ...: [  602.520483] Call Trace:
21:46:04 ...: [  602.520492]  [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180
21:46:04 ...: [  602.520500]  [<ffffffff8154397b>] mutex_lock+0x2b/0x50
21:46:04 ...: [  602.520506]  [<ffffffff8117404d>] __blkdev_put+0x3d/0x190
21:46:04 ...: [  602.520512]  [<ffffffff811741b0>] blkdev_put+0x10/0x20
21:46:04 ...: [  602.520518]  [<ffffffff811741f3>] blkdev_close+0x33/0x60
21:46:04 ...: [  602.520526]  [<ffffffff81145375>] __fput+0xf5/0x210
21:46:04 ...: [  602.520533]  [<ffffffff811454b5>] fput+0x25/0x30
21:46:04 ...: [  602.520539]  [<ffffffff811415ad>] filp_close+0x5d/0x90
21:46:04 ...: [  602.520545]  [<ffffffff81141697>] sys_close+0xb7/0x120
21:46:04 ...: [  602.520552]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b

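Answer 1

I contacted Neil Brown (the developer), and he immediately suggested increasing stripe_cache_size to at least 2048. This is similar to an earlier problem of mine, where I could not make that setting permanent.

So after I set it to 8192, the reshape continued and the problem was solved. God bless Neil Brown :-)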
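As a rough sketch of the fix described above: on Linux md RAID 4/5/6 arrays the stripe cache is exposed through sysfs as /sys/block/<md device>/md/stripe_cache_size. The helper names below (stripe_cache_mem_kib, set_stripe_cache) and the optional sysfs-root argument are illustrative assumptions, not part of mdadm. The memory estimate follows the formula given in the kernel md documentation (page size x number of disks x stripe_cache_size) and assumes 4 KiB pages.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names, not part of mdadm).

# Estimate stripe-cache memory in KiB: page_size * nr_disks * stripe_cache_size.
stripe_cache_mem_kib() {
    page_size="$1"; nr_disks="$2"; cache_size="$3"
    echo $(( page_size * nr_disks * cache_size / 1024 ))
}

# Write a new stripe_cache_size for an md array via sysfs.
# $1 = md device name (e.g. md0), $2 = new size,
# $3 = sysfs root (assumption: defaults to /sys; overridable for dry runs).
set_stripe_cache() {
    attr="${3:-/sys}/block/$1/md/stripe_cache_size"
    if [ ! -w "$attr" ]; then
        echo "cannot write $attr (not root, or not a RAID 4/5/6 array?)" >&2
        return 1
    fi
    echo "old stripe_cache_size: $(cat "$attr")"
    echo "$2" > "$attr"
    echo "new stripe_cache_size: $(cat "$attr")"
}

# Example (as root):
#   stripe_cache_mem_kib 4096 7 8192   # memory cost for a 7-disk array, in KiB
#   set_stripe_cache md0 8192
```

For a 7-disk array with 4 KiB pages, a value of 8192 works out to roughly 224 MiB of RAM. The value is a runtime setting and is not stored in the array metadata, so it has to be reapplied after every reboot (for example from a boot script).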

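Answer 2

Sometimes a reshape gets stuck at speed=0K/sec because the backup file could not be created or was lost during the process.

In that case, the solution was provided by Neil Brown in an email reply:

You should be able to simply stop the array and re-assemble it with a different backup file and the magic flag "--invalid-backup" (requires mdadm 3.2 or later).

The backup file is only really needed in the event of a crash. Because you will be stopping the array cleanly, there is nothing to restore when re-assembling, so --invalid-backup (which means "there is nothing in the backup file, but that is fine") is quite safe.

Neil Brown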


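For a RAID5 at /dev/md0 with 7 disks, mounted at /mnt/data, the procedure from his answer is:

All of the following commands must be run as root or equivalent.

Find everything that has files open on the array: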

lsof /mnt/data

Close them, or stop any services that may be interacting with the array.
Typically:

systemctl stop <SERVICE_NAME>

or

service <SERVICE_NAME> stop

Unmount, stop, then reassemble:

umount /mnt/data
mdadm --stop /dev/md0
mdadm --assemble --invalid-backup --backup-file=/root/mdadm0.bak /dev/md0 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1

Depending on the previous configuration, the device may be remounted automatically after the assemble command runs. If not, mount it with:

mount /dev/md0 /mnt/data

It is then safe to restart any services or connections that were using it.
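To confirm that the reshape is actually moving again after reassembly, one option is to read the speed= field that /proc/mdstat reports for the reshape. This is only a minimal sketch: the function names are hypothetical, and the parsing assumes the mdstat line format shown in the question (speed=<n>K/sec).

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names) for reading reshape progress.

# Print the reshape speed in K/sec for every reshape line found in an
# mdstat-format file. $1 = file to read (defaults to /proc/mdstat).
reshape_speed() {
    mdstat="${1:-/proc/mdstat}"
    # Matches lines such as:
    #   [>....]  reshape =  0.0% (37888/976561152) finish=...min speed=0K/sec
    sed -n 's/.*reshape *=.*speed=\([0-9][0-9]*\)K\/sec.*/\1/p' "$mdstat"
}

# Succeed (exit 0) only if a reshape is listed and its speed is 0.
reshape_stalled() {
    speed=$(reshape_speed "$1" | head -n 1)
    [ -n "$speed" ] && [ "$speed" -eq 0 ]
}

# Example:
#   if reshape_stalled; then echo "reshape is not making progress"; fi
```

A single reading of 0 right after assembly is not conclusive, so it is worth checking a few times over a minute or two before concluding that the reshape is stuck.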
