昨天我的数据中心出现了电源问题,我的 nobreakers 在 30 分钟后就失效了,这是我迄今为止见过的最糟糕的情况之一。我正在运行一个 freeNas 服务器,使用 raidz1-0。开机后,我注意到一个严重警报:
卷 Raid (ZFS) 状态为“降级”:一个或多个设备出现错误,导致数据损坏。应用程序可能会受到影响。
所以我检查了磁盘状态,结果比我想象的要严重,运行“zpool 状态 -v“
我收到以下消息:
pool: Raid
state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://illumos.org/msg/ZFS-8000-8A
scan: scrub in progress since Sun Feb 11 19:47:09 2018
14.0T scanned out of 18.1T at 155M/s, 7h48m to go
8K repaired, 77.14% done
config:
NAME STATE READ WRITE CKSUM
Raid DEGRADED 0 0 75.1K
raidz1-0 DEGRADED 0 0 150K
gptid/d5a65a3d-4eac-11e6-aebb-b083fed00972 DEGRADED 0 0 0 too many errors (repairing)
gptid/d642db6c-4eac-11e6-aebb-b083fed00972 DEGRADED 0 0 0 too many errors (repairing)
gptid/d6d69c95-4eac-11e6-aebb-b083fed00972 DEGRADED 0 0 0 too many errors (repairing)
gptid/d7860535-4eac-11e6-aebb-b083fed00972 DEGRADED 0 0 0 too many errors
gptid/d82ec964-4eac-11e6-aebb-b083fed00972 DEGRADED 0 0 0 too many errors
gptid/aec9036c-4f4b-11e6-a2f2-b083fed00972 DEGRADED 0 0 0 too many errors
gptid/d97ceea1-4eac-11e6-aebb-b083fed00972 DEGRADED 0 0 9 too many errors (repairing)
gptid/da14eaee-4eac-11e6-aebb-b083fed00972 DEGRADED 0 0 0 too many errors (repairing)
gptid/dabd3055-4eac-11e6-aebb-b083fed00972 DEGRADED 0 0 0 too many errors (repairing)
gptid/db58a590-4eac-11e6-aebb-b083fed00972 DEGRADED 0 0 0 too many errors (repairing)
我的整个磁盘阵列都已降级,但闪烁的 LED 显示“正常”。现在我正在尝试清理,也许这不会起作用。我很恐慌,因为有两个 ISCSI 卷,包含 6 个 VM 服务器。我将这些 iscsi 磁盘安装在 Linux 机器上,以便从那里移动我的服务器文件,但在运行 cp 和 rsync 时出现 I/O 错误。
有人遇到过类似的事情吗?有什么办法吗?
我的服务器设置是:Dell PowerEdge R720 存储服务器 10x HD Dell 4TB 15k RPM 65GB RAM Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
任何建议都将受到赞赏。