Ubuntu Server: mdadm device disappears after reboot

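I have a server (Ubuntu 22.04 LTS) running a software RAID 1 with mdadm, made up of two WD Red disks. After booting the server, the Windows clients could not find the Samba shares. I logged in to the server and ran: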

$ lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda      8:0    0 931.5G  0 disk
├─sda1   8:1    0     1G  0 part /boot/efi
└─sda2   8:2    0 930.5G  0 part /
sdb      8:16   0   5.5T  0 disk
sdc      8:32   0   5.5T  0 disk

The drives are there, but the md0 device is gone! I got a bit scared. I checked the RAID status:

$ cat /proc/mdstat

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>

I searched the logs for md0:

$ grep "md0" /var/log/syslog
Mar 29 08:44:07 saturno systemd[1]: dev-md0.device: Job dev-md0.device/start timed out.
Mar 29 08:44:07 saturno systemd[1]: Timed out waiting for device /dev/md0.
Mar 29 08:44:07 saturno systemd[1]: dev-md0.device: Job dev-md0.device/start failed with result 'timeout'.
Mar 29 08:55:04 saturno systemd[1]: dev-md0.device: Job dev-md0.device/start timed out.
Mar 29 08:55:04 saturno systemd[1]: Timed out waiting for device /dev/md0.
Mar 29 08:55:04 saturno systemd[1]: dev-md0.device: Job dev-md0.device/start failed with result 'timeout'.
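
The timeouts in that output can be counted per device with a small helper, which makes it easier to see how many boots were affected. This is only a sketch: `count_device_timeouts` is a hypothetical name, and it assumes the systemd message wording shown in the log above and a syslog-style text file.

```shell
# Count "Timed out waiting for device" messages for one device.
# Hypothetical helper; the log path defaults to /var/log/syslog.
count_device_timeouts() {
    dev="$1"
    log="${2:-/var/log/syslog}"
    # -F matches the text literally; the trailing period keeps
    # /dev/md0 from also matching /dev/md01.
    grep -c -F "Timed out waiting for device $dev." "$log"
}

# Usage: count_device_timeouts /dev/md0
```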

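I found this Super User post by a user who had almost the same problem. I ran this command to assemble the RAID array, scanning the drives in the system and printing verbose output: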

$ sudo mdadm --assemble --scan --verbose
mdadm: Devices UUID-c5d7f13c:4d9bc7ac:bff90942:065235e7 and UUID-c5d7f13c:4d9bc7ac:bff90942:065235e7 have the same name: /dev/md0
mdadm: Duplicate MD device names in conf file were found.

I had to remove the duplicate device entry using:

$ sudo nano /etc/mdadm/mdadm.conf
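
Before editing, the duplicate entries can be located without opening the file. A minimal sketch, assuming each array is declared on a line whose first word is the full keyword `ARRAY` and whose second word is the device name; `find_duplicate_arrays` is a hypothetical helper name, not part of mdadm.

```shell
# Print array device names that appear on more than one ARRAY line.
# Hypothetical helper; the config path defaults to the Ubuntu location.
find_duplicate_arrays() {
    conf="${1:-/etc/mdadm/mdadm.conf}"
    # Field 2 of an ARRAY line is the device name, e.g. /dev/md0.
    # Commented-out lines (#ARRAY ...) are skipped because their
    # first field is not exactly "ARRAY".
    awk '$1 == "ARRAY" { print $2 }' "$conf" | sort | uniq -d
}

# Usage: find_duplicate_arrays /etc/mdadm/mdadm.conf
```

Empty output means no device name is declared twice. The header comment of the stock Ubuntu mdadm.conf also asks for `update-initramfs -u` to be run after the file is edited, so that the copy used at boot stays in sync.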

This is strange, because this configuration had been running for almost 2 weeks. When I ran the command again, it worked:

$ sudo mdadm --assemble --scan --verbose
mdadm: looking for devices for /dev/md0
mdadm: No super block found on /dev/sda2 (Expected magic a92b4efc, got 00000476)
mdadm: no RAID superblock on /dev/sda2
mdadm: No super block found on /dev/sda1 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sda1
mdadm: No super block found on /dev/sda (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sda
mdadm: /dev/sdc is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdb is identified as a member of /dev/md0, slot 0.
mdadm: added /dev/sdc to /dev/md0 as 1
mdadm: added /dev/sdb to /dev/md0 as 0
mdadm: /dev/md0 has been started with 2 drives.

The md0 device is up again! I tried to access the shares from a Windows computer, but they were not available yet. I ran:

$ lsblk -o NAME,SIZE,FSTYPE,TYPE,MOUNTPOINT
NAME     SIZE FSTYPE            TYPE  MOUNTPOINT
sda    931.5G                   disk
├─sda1     1G vfat              part  /boot/efi
└─sda2 930.5G ext4              part  /
sdb      5.5T linux_raid_member disk
└─md0    5.5T ext4              raid1
sdc      5.5T linux_raid_member disk
└─md0    5.5T ext4              raid1

The RAID is up, but it has no mount point. I had to mount it manually:

$ sudo mount /dev/md0 /media/share
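
Whether the array is mounted can also be checked from a script rather than by reading `lsblk` output. A minimal sketch, assuming the `/proc/mounts` layout (device in field 1, mount point in field 2); `mountpoint_of` is a hypothetical helper name.

```shell
# Print the mount point(s) of a block device, or nothing if unmounted.
# Hypothetical helper; reads /proc/mounts unless another file is given.
mountpoint_of() {
    dev="$1"
    mounts="${2:-/proc/mounts}"
    # Compare the device field exactly so /dev/md0 does not match /dev/md01.
    awk -v dev="$dev" '$1 == dev { print $2 }' "$mounts"
}

# Usage: mountpoint_of /dev/md0
```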

And checked again:

mauricio@saturno:~$ lsblk -o NAME,SIZE,FSTYPE,TYPE,MOUNTPOINT
NAME     SIZE FSTYPE            TYPE  MOUNTPOINT
sda    931.5G                   disk
├─sda1     1G vfat              part  /boot/efi
└─sda2 930.5G ext4              part  /
sdb      5.5T linux_raid_member disk
└─md0    5.5T ext4              raid1 /media/share
sdc      5.5T linux_raid_member disk
└─md0    5.5T ext4              raid1 /media/share

I checked the shares, and they were working fine.

The question is: how can I find out what caused the RAID to fail? I don't want this to happen again.
