I have a server (Ubuntu 22.04 LTS) running a software RAID 1 with mdadm, built from two WD Red disks. After booting the server, the Windows clients couldn't find the Samba shares. I logged into the server and ran:
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 931.5G 0 disk
├─sda1 8:1 0 1G 0 part /boot/efi
└─sda2 8:2 0 930.5G 0 part /
sdb 8:16 0 5.5T 0 disk
sdc 8:32 0 5.5T 0 disk
The drives are there, but the md0 device is gone! That scared me a bit. I checked the RAID status:
$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>
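For comparison, when the array is healthy, /proc/mdstat should list md0 as active, roughly like this (the block count here is illustrative, not copied from my machine):

```
md0 : active raid1 sdb[0] sdc[1]
      5860522496 blocks super 1.2 [2/2] [UU]

unused devices: <none>
```

Here md0 wasn't listed at all, so the array was never assembled at boot.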
I searched the logs for md0:
$ grep "md0" /var/log/syslog
Mar 29 08:44:07 saturno systemd[1]: dev-md0.device: Job dev-md0.device/start timed out.
Mar 29 08:44:07 saturno systemd[1]: Timed out waiting for device /dev/md0.
Mar 29 08:44:07 saturno systemd[1]: dev-md0.device: Job dev-md0.device/start failed with result 'timeout'.
Mar 29 08:55:04 saturno systemd[1]: dev-md0.device: Job dev-md0.device/start timed out.
Mar 29 08:55:04 saturno systemd[1]: Timed out waiting for device /dev/md0.
Mar 29 08:55:04 saturno systemd[1]: dev-md0.device: Job dev-md0.device/start failed with result 'timeout'.
I found a post on Super User where another user had almost the same problem. I ran this command to assemble the RAID array, scanning the drives on the system, with verbose output:
$ sudo mdadm --assemble --scan --verbose
mdadm: Devices UUID-c5d7f13c:4d9bc7ac:bff90942:065235e7 and UUID-c5d7f13c:4d9bc7ac:bff90942:065235e7 have the same name: /dev/md0
mdadm: Duplicate MD device names in conf file were found.
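The error means mdadm.conf contains two ARRAY lines claiming the same device name. A quick way to spot duplicates, sketched here against a throwaway sample file since I didn't want to experiment on the real /etc/mdadm/mdadm.conf:

```shell
# Hypothetical sample reproducing the duplicate; the real file is /etc/mdadm/mdadm.conf
cat > /tmp/mdadm.conf.sample <<'EOF'
ARRAY /dev/md0 metadata=1.2 UUID=c5d7f13c:4d9bc7ac:bff90942:065235e7
ARRAY /dev/md0 metadata=1.2 UUID=c5d7f13c:4d9bc7ac:bff90942:065235e7
EOF

# Print any device name that appears on more than one ARRAY line
awk '$1 == "ARRAY" { print $2 }' /tmp/mdadm.conf.sample | sort | uniq -d
# → /dev/md0
```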
I had to remove the duplicate entry, editing the file with:
$ sudo nano /etc/mdadm/mdadm.conf
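After removing the duplicate, the file should contain exactly one ARRAY line per array, along these lines (the metadata= value is my assumption; the UUID is the one from the duplicate error above):

```
# /etc/mdadm/mdadm.conf — exactly one ARRAY line per array
ARRAY /dev/md0 metadata=1.2 UUID=c5d7f13c:4d9bc7ac:bff90942:065235e7
```

Since Ubuntu copies mdadm.conf into the initramfs, running `sudo update-initramfs -u` afterwards is probably also needed so a stale duplicate doesn't survive in there.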
That's odd, because this configuration had been working for almost two weeks. When I ran the command again, it worked fine:
$ sudo mdadm --assemble --scan --verbose
mdadm: looking for devices for /dev/md0
mdadm: No super block found on /dev/sda2 (Expected magic a92b4efc, got 00000476)
mdadm: no RAID superblock on /dev/sda2
mdadm: No super block found on /dev/sda1 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sda1
mdadm: No super block found on /dev/sda (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sda
mdadm: /dev/sdc is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdb is identified as a member of /dev/md0, slot 0.
mdadm: added /dev/sdc to /dev/md0 as 1
mdadm: added /dev/sdb to /dev/md0 as 0
mdadm: /dev/md0 has been started with 2 drives.
The md0 device is up again! I tried to access the share from a Windows machine, but it still wasn't available. I ran:
$ lsblk -o NAME,SIZE,FSTYPE,TYPE,MOUNTPOINT
NAME SIZE FSTYPE TYPE MOUNTPOINT
sda 931.5G disk
├─sda1 1G vfat part /boot/efi
└─sda2 930.5G ext4 part /
sdb 5.5T linux_raid_member disk
└─md0 5.5T ext4 raid1
sdc 5.5T linux_raid_member disk
└─md0 5.5T ext4 raid1
The RAID is assembled, but it has no mount point. I had to mount it manually:
$ sudo mount /dev/md0 /media/share
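The missing mount point makes sense if /media/share is mounted from /etc/fstab: at boot the mount timed out together with dev-md0.device, and nothing retried it after I assembled the array by hand. An fstab entry along these lines (the options are my assumption, not what I currently have) would at least keep a failed array from blocking boot:

```
# /etc/fstab — hypothetical entry for the RAID share
/dev/md0  /media/share  ext4  defaults,nofail,x-systemd.device-timeout=30  0  2
```

nofail lets the boot continue if md0 is absent, and x-systemd.device-timeout shortens the 90-second default wait.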
Then I checked again:
$ lsblk -o NAME,SIZE,FSTYPE,TYPE,MOUNTPOINT
NAME SIZE FSTYPE TYPE MOUNTPOINT
sda 931.5G disk
├─sda1 1G vfat part /boot/efi
└─sda2 930.5G ext4 part /
sdb 5.5T linux_raid_member disk
└─md0 5.5T ext4 raid1 /media/share
sdc 5.5T linux_raid_member disk
└─md0 5.5T ext4 raid1 /media/share
I checked the shares from Windows and they work fine.
My question is: how can I find out what caused the RAID assembly to fail in the first place? I don't want this to happen again.