I'm at my wit's end with this problem.
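I want to add a hot-swap bay to my home server so that I can easily add and remove hard drives, for example to rotate off-site backups. The motherboard in question is an ASRock J4105-ITX with four native SATA ports, split between an ASM1062 and the Intel processor's SATA controller. Both work fine and use the ahci kernel module. There is a hot-plug option in the BIOS, but it does not seem to have any effect.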
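If a drive is disconnected (either via echo 1 > /sys/block/sdX/device/delete or by simply pulling the drive), no new device is recognized when it is reconnected. I tried forcing a rescan (echo "- - -" > /sys/class/scsi_host/host<n>/scan), but to no avail: the SATA port is effectively unusable until the next reboot. I also tried some more drastic commands, none of which had any effect: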
echo 1 > /sys/class/scsi_device/2:0:0:0/device/reset
echo 1 > /sys/devices/pci0000:00/0000:00:1f.2/rescan
echo 1 > /sys/devices/pci0000:00/0000:00:1f.2/reset
(taken from "How do I make Linux recognize a new SATA /dev/sda drive I hot swapped in without rebooting?")
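The manual rescan quoted above targets a single host<n>. As a minimal sketch, the same write can be applied to every SCSI host in a loop. The function name rescan_all_hosts and its optional SYSFS_ROOT argument are hypothetical additions for illustration (the argument only exists so the loop can be tried against a scratch directory); on a real system it has to run as root.

```shell
#!/bin/sh
# rescan_all_hosts [SYSFS_ROOT]
# Writes the wildcard triple "- - -" (channel, target, LUN) to the scan
# file of every SCSI host. SYSFS_ROOT defaults to /sys and is an
# assumption added for illustration, not part of the original post.
rescan_all_hosts() {
    root="${1:-/sys}"
    for scan in "$root"/class/scsi_host/host*/scan; do
        # Skip the unexpanded glob pattern when no host exists.
        [ -e "$scan" ] || continue
        # '%s\n' keeps printf from treating the leading "-" as an option.
        printf '%s\n' '- - -' > "$scan" && echo "rescanned ${scan%/scan}"
    done
}
```

Calling rescan_all_hosts without arguments acts on /sys. As described in this question, a rescan alone did not bring the ports back on this machine.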
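"Well, maybe the chipset doesn't support hot-plugging, or the BIOS is buggy." So I ordered two PCIe SATA controllers (one based on the ASM1064, the other on the Marvell 88SE9215). Both showed the same problem, even though other buyers reported that hot-plugging worked for them, so I suspect the problem is software-related (my installation? I'm running Arch Linux, which is kept up to date).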
Some hopefully useful information:
$ uname -a
Linux servername 5.14.14-arch1-1 #1 SMP PREEMPT Wed, 20 Oct 2021 21:35:18 +0000 x86_64 GNU/Linux
$ dmesg | grep ahci
[ 0.447450] ahci 0000:00:12.0: version 3.0
[ 0.447842] ahci 0000:00:12.0: SSS flag set, parallel bus scan disabled
[ 0.457970] ahci 0000:00:12.0: AHCI 0001.0301 32 slots 2 ports 6 Gbps 0x3 impl SATA mode
[ 0.457981] ahci 0000:00:12.0: flags: 64bit ncq sntf stag pm clo only pmp pio slum part sxs deso sadm sds apst
[ 0.458750] scsi host0: ahci
[ 0.459204] scsi host1: ahci
[ 0.469788] ahci 0000:01:00.0: AHCI 0001.0000 32 slots 4 ports 6 Gbps 0xf impl SATA mode
[ 0.469801] ahci 0000:01:00.0: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs
[ 0.470767] scsi host2: ahci
[ 0.471203] scsi host3: ahci
[ 0.471562] scsi host4: ahci
[ 0.471904] scsi host5: ahci
[ 0.472341] ahci 0000:04:00.0: SSS flag set, parallel bus scan disabled
[ 0.472376] ahci 0000:04:00.0: AHCI 0001.0200 32 slots 2 ports 6 Gbps 0x3 impl SATA mode
[ 0.472382] ahci 0000:04:00.0: flags: 64bit ncq sntf stag led clo pmp pio slum part ccc
[ 0.472803] scsi host6: ahci
[ 0.473011] scsi host7: ahci
$ lspci -v
[...]
01:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller (rev 11) (prog-if 01 [AHCI 1.0])
Subsystem: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller
Flags: bus master, fast devsel, latency 0, IRQ 127
I/O ports at e050 [size=8]
I/O ports at e040 [size=4]
I/O ports at e030 [size=8]
I/O ports at e020 [size=4]
I/O ports at e000 [size=32]
Memory at a1340000 (32-bit, non-prefetchable) [size=2K]
Expansion ROM at a1300000 [disabled] [size=256K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [70] Express Legacy Endpoint, MSI 00
Capabilities: [e0] SATA HBA v0.0
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: ahci
[...]
Answer 1
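I finally found the cause: my powertop tuning was too aggressive!
Because this server runs around the clock and electricity is fairly expensive here, I had added a systemd service that automatically applies all powertop tunables: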
$ cat /etc/systemd/system/powertop.service
[Unit]
Description=Powertop tunings
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/powertop --auto-tune
[Install]
WantedBy=multi-user.target
This is equivalent to opening the powertop TUI and setting every tunable to "Good". The crucial part is the four lines about Runtime PM for port ataX:
Good Runtime PM for port ata3 of PCI device: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller
Bad Runtime PM for port ata4 of PCI device: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller
Good Runtime PM for port ata5 of PCI device: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller
>> Good Runtime PM for port ata6 of PCI device: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller
Good Runtime PM for PCI Device Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller
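Each of these tunables runs a command of the form
echo 'auto' > '/sys/bus/pci/devices/0000:01:00.0/ata4/power/control';
which apparently prevents the SATA card from ever recognizing a new device on that port!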
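To see which ports are currently affected, the power/control files can be read back. The following is only a sketch: the function name list_ata_runtime_pm and the optional SYSFS_ROOT argument are hypothetical, and the glob assumes the ataN directories sit directly under the PCI device, as in the path shown above.

```shell
#!/bin/sh
# list_ata_runtime_pm [SYSFS_ROOT]
# Prints "<port> <setting>" for every ATA port found under a PCI device,
# e.g. "ata4 auto". SYSFS_ROOT defaults to /sys and is an assumption
# added for illustration.
list_ata_runtime_pm() {
    root="${1:-/sys}"
    for ctl in "$root"/bus/pci/devices/*/ata*/power/control; do
        # Skip the unexpanded glob pattern or unreadable entries.
        [ -r "$ctl" ] || continue
        port="${ctl%/power/control}"   # drop the trailing /power/control
        printf '%s %s\n' "${port##*/}" "$(cat "$ctl")"
    done
}
```

A port reported as auto corresponds to the "Good" state in powertop, and on corresponds to "Bad".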
Only after setting power/control to on (the "Bad" option in powertop) does the card find new devices following a rescan:
echo 0 0 0 | sudo tee /sys/class/scsi_host/host*/scan
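One way to apply that setting to every port at once is sketched below. The function name keep_ata_ports_on and the optional SYSFS_ROOT argument are hypothetical, and the glob again assumes the ataN directories live directly under the PCI device as in the path quoted earlier. It needs root on a real system.

```shell
#!/bin/sh
# keep_ata_ports_on [SYSFS_ROOT]
# Writes "on" to power/control of every ATA port, which disables runtime
# power management for the port (the state powertop labels "Bad").
# SYSFS_ROOT defaults to /sys and is an assumption added for illustration.
keep_ata_ports_on() {
    root="${1:-/sys}"
    for ctl in "$root"/bus/pci/devices/*/ata*/power/control; do
        # Skip the unexpanded glob pattern or entries we cannot write.
        [ -w "$ctl" ] || continue
        echo on > "$ctl"
    done
}
```

Since powertop --auto-tune resets these files to auto, the function would have to run after it, for example as an extra step following the auto-tune call at boot. That arrangement is an untested suggestion; a rescan of the SCSI hosts is still needed afterwards.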
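The only thing I am still missing is automatic rescanning: my desktop PC finds new devices on its own, without any write to hostX/scan. I can live with that for now. This was a very frustrating experience, so I hope this helps someone facing the same problem.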