如何让 Linux 识别我热插拔的新 SATA /dev/sda 驱动器而无需重新启动?

如何让 Linux 识别我热插拔的新 SATA /dev/sda 驱动器而无需重新启动?

热插拔故障的 SATA /dev/sda 驱动器工作正常,但是当我换入新驱动器时,它无法被识别:

[root@fs-2 ~]# tail -18 /var/log/messages
May 5 16:54:35 fs-2 kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
May 5 16:54:35 fs-2 kernel: ata1: SError: { PHYRdyChg CommWake }
May 5 16:54:40 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0)
May 5 16:54:45 fs-2 kernel: ata1: device not ready (errno=-16), forcing hardreset
May 5 16:54:45 fs-2 kernel: ata1: soft resetting link
May 5 16:54:50 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0)
May 5 16:54:55 fs-2 kernel: ata1: SRST failed (errno=-16)
May 5 16:54:55 fs-2 kernel: ata1: soft resetting link
May 5 16:55:00 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0)
May 5 16:55:05 fs-2 kernel: ata1: SRST failed (errno=-16)
May 5 16:55:05 fs-2 kernel: ata1: soft resetting link
May 5 16:55:10 fs-2 kernel: ata1: link is slow to respond, please be patient (ready=0)
May 5 16:55:40 fs-2 kernel: ata1: SRST failed (errno=-16)
May 5 16:55:40 fs-2 kernel: ata1: limiting SATA link speed to 1.5 Gbps
May 5 16:55:40 fs-2 kernel: ata1: soft resetting link
May 5 16:55:45 fs-2 kernel: ata1: SRST failed (errno=-16)
May 5 16:55:45 fs-2 kernel: ata1: reset failed, giving up
May 5 16:55:45 fs-2 kernel: ata1: EH complete

我尝试了一些方法来让服务器找到新的 /dev/sda,例如重新扫描 scsi 总线.sh但它们不起作用:

[root@fs-2 ~]# echo "---" > /sys/class/scsi_host/host0/scan
-bash: echo: write error: Invalid argument
[root@fs-2 ~]#
[root@fs-2 ~]# /root/rescan-scsi-bus.sh -l
[snip]
0 new device(s) found.
0 device(s) removed.
[root@fs-2 ~]#
[root@fs-2 ~]# ls /dev/sda
ls: /dev/sda: No such file or directory

我最终重启了服务器。/dev/sda 被识别了,我修复了软件 RAID,现在一切都正常了。但下次,如何让 Linux 识别我热插拔的新 SATA 驱动器而无需重新启动?

所讨论的操作系统是RHEL5.3:

[root@fs-2 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.3 (Tikanga)

硬盘是Seagate Barracuda ES.2 SATA 3.0-Gb/s 500-GB,型号ST3500320NS。

以下是 lscpi 输出:

[root@fs-2 ~]# lspci
00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3)
00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:0a.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0e.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02)
04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)

更新:在大概十几个案例中,我们被迫重启服务器,因为热插拔“无法正常工作”。感谢您提供的答案,让我们更深入地了解 SATA 控制器。我已将上面有问题的系统(主机名:fs-2)的 lspci 输出包括在内。我仍然需要一些帮助来了解该系统在热插拔方面究竟不支持哪些硬件。请让我知道除了 lspci 之外还有哪些其他输出可能有用。

好消息是,今天我们的一台服务器(主机名:www-1)上的热插拔“刚刚起作用”,这对我们来说非常罕见。以下是 lspci 输出:

[root@www-1 ~]# lspci
00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a3)
00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a3)
00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:0b.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Miscellaneous Control
00:19.4 Host bridge: Advanced Micro Devices [AMD] K10 [Opteron, Athlon64, Sempron] Link Control
03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02)
04:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
04:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
09:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064ET PCI-Express Fusion-MPT SAS (rev 04)

答案1

如果您的 SATA 控制器支持热插拔,它应该“可以正常工作(tm)”。

要强制重新扫描 SCSI 总线(每个 SATA 端口显示为 SCSI 总线)并查找新驱动器,您将使用:

echo "0 0 0" >/sys/class/scsi_host/host<n>/scan

综上所述,< n >是总线编号。

答案2

echo "- - -" >/sys/class/scsi_host/host<n>/scan
       ^ ^
        \_\_______ note spaces between the dashes.

答案3

在某些情况下,当驱动器发生故障时,Linux 不会意识到您实际上已将其从阵列中物理拔出。如果您遇到此问题(就像我今天早上遇到的那样),您可以执行以下操作:

echo 1 > /sys/block/<devnode>/device/delete

例如,就我而言,/dev/sda 出现故障,并且我不想重新启动服务器,因此我执行以下操作:

echo 1 > /sys/block/sda/device/delete

完成此操作后,新的驱动器(实际上已经物理添加)立即可见。

如果此时不可见,您也可以执行以下操作来强制重新扫描:

echo "- - -" > /sys/class/scsi_host/host<n>/scan

“- - -” 分别是通道、id 和 LUN 的通配符,因此,如果您愿意,可以通过指定数字将扫描限制到某个子集。

在开始之前,您还可以:

readlink /sys/block/<devnode>

它将显示具有正确主机号的路径,以便在删除后检查 /proc/scsi/scsi 是否消失。

答案4

这个怎么样(似乎在 Ubuntu 中有效):

sudo partprobe

相关内容