As in this question, I want to ignore a drive entirely, but in my case the drive is exposed to the system as a SCSI drive. Two of the 21 drives in my server are failing:
[2524080.689492] scsi 0:0:90900:0: Direct-Access ATA ST3000DM001-1CH1 CC43 PQ: 0 ANSI: 6
[2524080.689502] scsi 0:0:90900:0: SATA: handle(0x000d), sas_addr(0x5003048001f298cf), phy(15), device_name(0x0000000000000000)
[2524080.689506] scsi 0:0:90900:0: SATA: enclosure_logical_id(0x5003048001f298ff), slot(3)
[2524080.689594] scsi 0:0:90900:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[2524080.690671] sd 0:0:90900:0: tag#1 CDB: Test Unit Ready 00 00 00 00 00 00
[2524080.690680] mpt2sas_cm0: sas_address(0x5003048001f298cf), phy(15)
[2524080.690683] mpt2sas_cm0: enclosure_logical_id(0x5003048001f298ff),slot(3)
[2524080.690686] mpt2sas_cm0: handle(0x000d), ioc_status(success)(0x0000), smid(17)
[2524080.690695] mpt2sas_cm0: request_len(0), underflow(0), resid(0)
[2524080.690698] mpt2sas_cm0: tag(65535), transfer_count(0), sc->result(0x00000000)
[2524080.690701] mpt2sas_cm0: scsi_status(check condition)(0x02), scsi_state(autosense valid )(0x01)
[2524080.690704] mpt2sas_cm0: [sense_key,asc,ascq]: [0x06,0x29,0x00], count(18)
[2524080.690728] sd 0:0:90900:0: Attached scsi generic sg0 type 0
[2524080.691269] sd 0:0:90900:0: [sdb] 5860533168 512-byte logical blocks: (3.00 TB/2.73 TiB)
[2524080.691285] sd 0:0:90900:0: [sdb] 4096-byte physical blocks
[2524111.163712] sd 0:0:90900:0: attempting task abort! scmd(ffff880869121800)
[2524111.163722] sd 0:0:90900:0: tag#2 CDB: Mode Sense(6) 1a 00 3f 00 04 00
[2524111.163729] scsi target0:0:90900: handle(0x000d), sas_address(0x5003048001f298cf), phy(15)
[2524111.163733] scsi target0:0:90900: enclosure_logical_id(0x5003048001f298ff), slot(3)
[2524111.442310] sd 0:0:90900:0: device_block, handle(0x000d)
[2524113.442331] sd 0:0:90900:0: device_unblock and setting to running, handle(0x000d)
[2524114.939280] sd 0:0:90900:0: task abort: SUCCESS scmd(ffff880869121800)
[2524114.939358] sd 0:0:90900:0: [sdb] Write Protect is off
[2524114.939366] sd 0:0:90900:0: [sdb] Mode Sense: 00 00 00 00
[2524114.939444] sd 0:0:90900:0: [sdb] Asking for cache data failed
[2524114.939501] sd 0:0:90900:0: [sdb] Assuming drive cache: write through
[2524114.940380] sd 0:0:90900:0: [sdb] Read Capacity(16) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[2524114.940387] sd 0:0:90900:0: [sdb] Sense not available.
[2524114.940566] sd 0:0:90900:0: [sdb] Read Capacity(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[2524114.940570] sd 0:0:90900:0: [sdb] Sense not available.
[2524114.940778] sd 0:0:90900:0: [sdb] Attached SCSI disk
[2524114.984489] mpt2sas_cm0: removing handle(0x000d), sas_addr(0x5003048001f298cf)
[2524114.984494] mpt2sas_cm0: removing : enclosure logical id(0x5003048001f298ff), slot(3)
[2524134.939383] mpt2sas_cm0: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
[2524134.940116] mpt2sas_cm0: removing handle(0x000e), sas_addr(0x5003048001f298d0)
[2524134.940122] mpt2sas_cm0: removing enclosure logical id(0x5003048001f298ff), slot(4)
[2524153.940404] scsi 0:0:90902:0: Direct-Access ATA ST3000DM001-1CH1 CC43 PQ: 0 ANSI: 6
[2524153.940418] scsi 0:0:90902:0: SATA: handle(0x000d), sas_addr(0x5003048001f298cf), phy(15), device_name(0x0000000000000000)
[2524153.940423] scsi 0:0:90902:0: SATA: enclosure_logical_id(0x5003048001f298ff), slot(3)
[2524153.940699] scsi 0:0:90902:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[2524153.942194] sd 0:0:90902:0: tag#0 CDB: Test Unit Ready 00 00 00 00 00 00
[2524153.942205] mpt2sas_cm0: sas_address(0x5003048001f298cf), phy(15)
[2524153.942208] mpt2sas_cm0: enclosure_logical_id(0x5003048001f298ff),slot(3)
[2524153.942212] mpt2sas_cm0: handle(0x000d), ioc_status(success)(0x0000), smid(12)
[2524153.942214] mpt2sas_cm0: request_len(0), underflow(0), resid(0)
[2524153.942217] mpt2sas_cm0: tag(65535), transfer_count(0), sc->result(0x00000000)
[2524153.942220] mpt2sas_cm0: scsi_status(check condition)(0x02), scsi_state(autosense valid )(0x01)
[2524153.942223] mpt2sas_cm0: [sense_key,asc,ascq]: [0x06,0x29,0x00], count(18)
[2524153.942361] sd 0:0:90902:0: Attached scsi generic sg0 type 0
[2524153.942833] sd 0:0:90902:0: [sdb] 5860533168 512-byte logical blocks: (3.00 TB/2.73 TiB)
[2524153.942840] sd 0:0:90902:0: [sdb] 4096-byte physical blocks
[2524154.190159] scsi 0:0:90903:0: Direct-Access ATA ST3000DM001-1CH1 CC43 PQ: 0 ANSI: 6
[2524154.190174] scsi 0:0:90903:0: SATA: handle(0x0022), sas_addr(0x5003048001ec55ed), phy(13), device_name(0x0000000000000000)
[2524154.190179] scsi 0:0:90903:0: SATA: enclosure_logical_id(0x5003048001ec55ff), slot(1)
[2524154.190368] scsi 0:0:90903:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[2524154.191634] sd 0:0:90903:0: tag#1 CDB: Test Unit Ready 00 00 00 00 00 00
[2524154.191639] mpt2sas_cm0: sas_address(0x5003048001ec55ed), phy(13)
[2524154.191642] mpt2sas_cm0: enclosure_logical_id(0x5003048001ec55ff),slot(1)
[2524154.191645] mpt2sas_cm0: handle(0x0022), ioc_status(success)(0x0000), smid(12)
[2524154.191648] mpt2sas_cm0: request_len(0), underflow(0), resid(0)
[2524154.191651] mpt2sas_cm0: tag(65535), transfer_count(0), sc->result(0x00000000)
[2524154.191654] mpt2sas_cm0: scsi_status(check condition)(0x02), scsi_state(autosense valid )(0x01)
[2524154.191657] mpt2sas_cm0: [sense_key,asc,ascq]: [0x06,0x29,0x00], count(18)
[2524154.191800] sd 0:0:90903:0: Attached scsi generic sg3 type 0
[2524154.192211] sd 0:0:90903:0: [sdd] 5860533168 512-byte logical blocks: (3.00 TB/2.73 TiB)
[2524154.192219] sd 0:0:90903:0: [sdd] 4096-byte physical blocks
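In our case this is an old server that we have decided not to upgrade or repair. I am now considering not even pulling the old drives: just leave them in place, shrink the array, and disable them. The array is not full, and we only use it as an additional backup location for some other servers.
So, I'm lazy and don't want to go to the server room: is there a way to disable these drives and move on? :-)
More information about the system: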
lspci -nn -v -s 05:00.0:
05:00.0 Serial Attached SCSI controller [0107]: LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:0087] (rev 05)
Subsystem: LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:3020]
Flags: bus master, fast devsel, latency 0, IRQ 29
I/O ports at 7000 [size=256]
Memory at df640000 (64-bit, non-prefetchable) [size=64K]
Memory at df600000 (64-bit, non-prefetchable) [size=256K]
Expansion ROM at df500000 [disabled] [size=1M]
Capabilities: [50] Power Management version 3
Capabilities: [68] Express Endpoint, MSI 00
Capabilities: [d0] Vital Product Data
Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [c0] MSI-X: Enable+ Count=16 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [1e0] #19
Capabilities: [1c0] Power Budgeting <?>
Capabilities: [190] #16
Capabilities: [148] Alternative Routing-ID Interpretation (ARI)
Kernel driver in use: mpt3sas
Kernel modules: mpt3sas
lsscsi -v:
[0:0:3:0] disk ATA ST3000DM001-1CH1 CC43 /dev/sdc
dir: /sys/bus/scsi/devices/0:0:3:0 [/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:2/end_device-0:0:2/target0:0:3/0:0:3:0]
[0:0:6:0] disk ATA ST3000DM001-1CH1 CC43 /dev/sdf
dir: /sys/bus/scsi/devices/0:0:6:0 [/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:5/end_device-0:0:5/target0:0:6/0:0:6:0]
[0:0:7:0] disk ATA ST3000DM001-1CH1 CC43 /dev/sdg
dir: /sys/bus/scsi/devices/0:0:7:0 [/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:6/end_device-0:0:6/target0:0:7/0:0:7:0]
[0:0:8:0] disk ATA ST3000DM001-1CH1 CC43 /dev/sdh
dir: /sys/bus/scsi/devices/0:0:8:0 [/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:7/end_device-0:0:7/target0:0:8/0:0:8:0]
[0:0:11:0] disk ATA ST3000DM001-1CH1 CC43 /dev/sdi
dir: /sys/bus/scsi/devices/0:0:11:0 [/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:10/end_device-0:0:10/target0:0:11/0:0:11:0]
[0:0:12:0] disk ATA ST3000DM001-1CH1 CC43 /dev/sdj
dir: /sys/bus/scsi/devices/0:0:12:0 [/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:11/end_device-0:0:11/target0:0:12/0:0:12:0]
[0:0:13:0] disk ATA ST3000DM001-1CH1 CC43 /dev/sdk
dir: /sys/bus/scsi/devices/0:0:13:0 [/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:12/end_device-0:0:12/target0:0:13/0:0:13:0]
[0:0:15:0] disk ATA ST3000DM001-1CH1 CC43 /dev/sdl
dir: /sys/bus/scsi/devices/0:0:15:0 [/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:14/end_device-0:0:14/target0:0:15/0:0:15:0]
[0:0:16:0] disk ATA ST3000DM001-1CH1 CC43 /dev/sdm
dir: /sys/bus/scsi/devices/0:0:16:0 [/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:15/end_device-0:0:15/target0:0:16/0:0:16:0]
[0:0:18:0] disk ATA ST3000DM001-1CH1 CC43 /dev/sdn
dir: /sys/bus/scsi/devices/0:0:18:0 [/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:17/end_device-0:0:17/target0:0:18/0:0:18:0]
[0:0:20:0] disk ATA ST3000DM001-1CH1 CC43 /dev/sdo
dir: /sys/bus/scsi/devices/0:0:20:0 [/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:19/end_device-0:0:19/target0:0:20/0:0:20:0]
[0:0:21:0] disk ATA ST3000DM001-1CH1 CC43 /dev/sdp
dir: /sys/bus/scsi/devices/0:0:21:0 [/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:20/end_device-0:0:20/target0:0:21/0:0:21:0]
[0:0:22:0] enclosu LSI CORP SAS2X36 0717 -
dir: /sys/bus/scsi/devices/0:0:22:0 [/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:21/end_device-0:0:21/target0:0:22/0:0:22:0]
[0:0:23:0] disk ATA ST3000DM001-1CH1 CC43 /dev/sdq
dir: /sys/bus/scsi/devices/0:0:23:0 [/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:1/expander-0:1/port-0:1:1/end_device-0:1:1/target0:0:23/0:0:23:0]
[0:0:24:0] disk ATA ST3000DM001-1CH1 CC43 /dev/sdr
dir: /sys/bus/scsi/devices/0:0:24:0 [/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:1/expander-0:1/port-0:1:2/end_device-0:1:2/target0:0:24/0:0:24:0]
[0:0:25:0] disk ATA ST3000DM001-1CH1 CC43 /dev/sds
dir: /sys/bus/scsi/devices/0:0:25:0 [/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:1/expander-0:1/port-0:1:3/end_device-0:1:3/target0:0:25/0:0:25:0]
[0:0:26:0] disk ATA ST3000DM001-1CH1 CC43 /dev/sdt
dir: /sys/bus/scsi/devices/0:0:26:0 [/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:1/expander-0:1/port-0:1:4/end_device-0:1:4/target0:0:26/0:0:26:0]
[0:0:28:0] disk ATA ST3000DM001-1CH1 CC43 /dev/sdu
dir: /sys/bus/scsi/devices/0:0:28:0 [/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:1/expander-0:1/port-0:1:6/end_device-0:1:6/target0:0:28/0:0:28:0]
[0:0:30:0] disk ATA ST3000DM001-1CH1 CC43 /dev/sdw
dir: /sys/bus/scsi/devices/0:0:30:0 [/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:1/expander-0:1/port-0:1:8/end_device-0:1:8/target0:0:30/0:0:30:0]
[0:0:31:0] disk ATA ST3000DM001-1CH1 CC43 /dev/sdx
dir: /sys/bus/scsi/devices/0:0:31:0 [/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:1/expander-0:1/port-0:1:9/end_device-0:1:9/target0:0:31/0:0:31:0]
[0:0:34:0] enclosu LSI CORP SAS2X28 0717 -
dir: /sys/bus/scsi/devices/0:0:34:0 [/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:1/expander-0:1/port-0:1:12/end_device-0:1:12/target0:0:34/0:0:34:0]
[0:0:25856:0]disk ATA ST3000DM001-1CH1 CC43 /dev/sda
dir: /sys/bus/scsi/devices/0:0:25856:0 [/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:14357/end_device-0:0:14357/target0:0:25856/0:0:25856:0]
[0:0:98760:0]disk ATA ST3000DM001-1CH1 CC43 -
dir: /sys/bus/scsi/devices/0:0:98760:0 [/sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0/port-0:0/expander-0:0/port-0:0:60931/end_device-0:0:60931/target0:0:98760/0:0:98760:0]
[2:0:0:0] disk ATA PLEXTOR PX-128M5 1.00 /dev/sdy
dir: /sys/bus/scsi/devices/2:0:0:0 [/sys/devices/pci0000:00/0000:00:1f.2/ata2/host2/target2:0:0/2:0:0:0]
lsscsi -Hv:
[0] mpt2sas
dir: /sys/class/scsi_host//host0
device dir: /sys/devices/pci0000:00/0000:00:03.0/0000:05:00.0/host0
[1] ahci
dir: /sys/class/scsi_host//host1
device dir: /sys/devices/pci0000:00/0000:00:1f.2/ata1/host1
[2] ahci
dir: /sys/class/scsi_host//host2
device dir: /sys/devices/pci0000:00/0000:00:1f.2/ata2/host2
[3] ahci
dir: /sys/class/scsi_host//host3
device dir: /sys/devices/pci0000:00/0000:00:1f.2/ata3/host3
[4] ahci
dir: /sys/class/scsi_host//host4
device dir: /sys/devices/pci0000:00/0000:00:1f.2/ata4/host4
[5] ahci
dir: /sys/class/scsi_host//host5
device dir: /sys/devices/pci0000:00/0000:00:1f.2/ata5/host5
[6] ahci
dir: /sys/class/scsi_host//host6
device dir: /sys/devices/pci0000:00/0000:00:1f.2/ata6/host6
smp_discover /dev/bsg/expander-0:0:
phy 0:S:attached:[500605b00507dd20:03 i(SSP+STP+SMP)] 6 Gbps
phy 1:S:attached:[500605b00507dd20:02 i(SSP+STP+SMP)] 6 Gbps
phy 2:S:attached:[500605b00507dd20:01 i(SSP+STP+SMP)] 6 Gbps
phy 3:S:attached:[500605b00507dd20:00 i(SSP+STP+SMP)] 6 Gbps
phy 12:U:attached:[5003048001f298cc:00 t(SATA)] 6 Gbps
phy 13:U:attached:[5003048001f298cd:00 t(SATA)] 6 Gbps
phy 14:U:attached:[5003048001f298ce:00 t(SATA)] 6 Gbps
phy 17:U:attached:[5003048001f298d1:00 t(SATA)] 6 Gbps
phy 19:U:attached:[5003048001f298d3:00 t(SATA)] 6 Gbps
phy 20:U:attached:[5003048001f298d4:00 t(SATA)] 6 Gbps
phy 21:U:attached:[5003048001f298d5:00 t(SATA)] 6 Gbps
phy 22:U:attached:[5003048001f298d6:00 t(SATA)] 6 Gbps
phy 23:U:attached:[5003048001f298d7:00 t(SATA)] 6 Gbps
phy 25:U:attached:[5003048001f298d9:00 t(SATA)] 6 Gbps
phy 26:U:attached:[5003048001f298da:00 t(SATA)] 6 Gbps
phy 27:U:attached:[5003048001f298db:00 t(SATA)] 6 Gbps
phy 28:U:attached:[5003048001f298dc:00 t(SATA)] 6 Gbps
phy 29:U:attached:[5003048001f298dd:00 t(SATA)] 6 Gbps
phy 31:U:attached:[5003048001f298df:00 t(SATA)] 6 Gbps
phy 32:U:attached:[5003048001f298e0:00 t(SATA)] 6 Gbps
phy 33:U:attached:[5003048001f298e1:00 t(SATA)] 6 Gbps
phy 34:U:attached:[5003048001f298e2:00 t(SATA)] 6 Gbps
phy 35:U:attached:[5003048001f298e3:00 t(SATA)] 6 Gbps
phy 36:D:attached:[5003048001f298fd:00 V i(SSP+SMP) t(SSP)] 6 Gbps
Answer 1
The very high SCSI target number (scsi 0:0:90903:0) indicates that, in this case, the hardware keeps losing and re-initializing the drive.
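One way to see how often each drive has been re-announced is to count the attach messages per SAS address in the kernel log. The following is a minimal sketch, assuming the log lines have the same "SATA: handle(...), sas_addr(...), phy(...)" shape as in the question; the function name count_attaches is made up for illustration.

```shell
#!/bin/sh
# count_attaches (hypothetical helper): read kernel log text on stdin and
# print, for each (sas_addr, phy) pair, how many times the HBA announced
# a SATA device there. Assumes the message format shown in the question.
count_attaches() {
    sed -n 's/.*SATA: handle([^)]*), sas_addr(\(0x[0-9a-f]*\)), phy(\([0-9]*\)).*/\1 \2/p' |
        sort | uniq -c |
        awk '{ print $2, "phy=" $3, "attaches=" $1 }'
}

# Typical use on a live system (may require root):
#   dmesg | count_attaches
```

A drive whose count keeps growing between runs is a likely candidate for the drop-and-reattach loop described above.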
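The MPT SAS hardware does most of the re-initialization work here, so we cannot fully control it from the kernel. Also, you mention 21 drives, so they are probably behind one or more SAS expanders.
So the question becomes: can ports on a SAS expander be disabled in software?
If the expander supports it (I believe it is optional in the standard), then yes!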
The package in question is smp_utils (sg3_utils will also help).
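What you want is:
1. Find the expander device as described in the man pages (probably
ls /dev/bsg/expand*
).
2. Confirm which phy the failing disk is attached to by comparing dmesg with the output of
smp_discover /dev/bsg/expander-...
3. Disable the PHY, in the form
smp_phy_control --phy=NN --op=di /dev/bsg/expander-...
Expanded for your case:
smp_phy_control --phy=13 --op=di /dev/bsg/expander-0:0
smp_phy_control --phy=15 --op=di /dev/bsg/expander-0:0
The phy numbers are already in your output (13 and 15), but confirm them with smp_discover before disabling anything. In particular, dmesg reports phy 13 under a different enclosure_logical_id (0x5003048001ec55ff) than phy 15 (0x5003048001f298ff), and smp_discover shows phy 13 of expander-0:0 attached to 5003048001f298cd rather than the 0x5003048001ec55ed from dmesg, so that drive may sit behind the other expander.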
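The confirmation step can be sketched as a small lookup: take the sas_addr that dmesg reports for the failing drive, find the phy that smp_discover lists for it, and only then build the disable command. This is a sketch under assumptions: the helper names phy_for_sas_addr and disable_cmd are hypothetical, the parsing assumes smp_discover lines shaped like those in the question, and --op=di is copied from the command above. The helper only prints the command; it does not run it.

```shell
#!/bin/sh
# phy_for_sas_addr ADDR (hypothetical helper): read smp_discover output on
# stdin and print the number of the phy whose attached SAS address is ADDR.
# A leading "0x" (as printed by dmesg) is stripped before matching.
phy_for_sas_addr() {
    addr=${1#0x}
    sed -n "s/^ *phy  *\([0-9]*\):[A-Z]:attached:\[${addr}:.*/\1/p"
}

# disable_cmd EXPANDER PHY (hypothetical helper): print, but do not run,
# the smp_phy_control invocation from the answer for review.
disable_cmd() {
    printf 'smp_phy_control --phy=%s --op=di %s\n' "$2" "$1"
}

# Typical use on a live system, with the address taken from dmesg:
#   phy=$(smp_discover /dev/bsg/expander-0:0 | phy_for_sas_addr 0x5003048001f298cf)
#   [ -n "$phy" ] && disable_cmd /dev/bsg/expander-0:0 "$phy"
```

An empty result means no device with that address is attached to this expander right now. That is what the smp_discover listing in the question shows for phy 15, which is absent because the drive had dropped off; in that case the phy number from dmesg is the only hint, and the other expander is worth checking as well.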
Answer 2
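I think what you are trying to do sounds similar to the setup for Oracle ASM disks, where Oracle accesses the blocks directly and system operators are advised to exclude the disks with udev so that they are not formatted while Oracle is using them.
Here is a link to a relevant page on oracle-base.com: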
https://oracle-base.com/articles/linux/udev-scsi-rules-configuration-in-oracle-linux