Adaptec pm80xx driver randomly drops drives

I am using an Adaptec ASA-71605H host bus adapter on Ubuntu 12.04.4.

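Modern Linux kernels ship with an open-source version of the required pm80xx kernel module. Adaptec also provides its own driver for Ubuntu 12.04. I have tested both, with the same result.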

The symptom I see is that sometimes only 14 of the 16 drives are available after boot.

The full dmesg log is available here; the interesting parts are:

[    3.591035] pm80xx 0000:01:00.0: driver version 0.1.37 / 1.0.15-1

[   50.749419] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[   50.749424] sas: ata1: end_device-1:0: dev error handler
[   50.749430] sas: ata2: end_device-1:1: dev error handler
[   50.749433] sas: ata3: end_device-1:2: dev error handler
[   55.900826] ata3.00: qc timeout (cmd 0xec)
[   55.900899] pm80xx:: mpi_sata_completion 2049: SATA IO STATUS 0x1 task ffff8807ee8cc000
[   55.900900] pm80xx:: mpi_sata_completion 2085: status:0x1, tag:0x2, task::0xffff8807ee8cc000
[   55.900831] pm80xx:: pm8001_chip_abort_task 4889: cmd_tag = 0x3, abort task tag = 0x2
[   55.900902] pm80xx:: mpi_sata_completion 2118: SAS Address of IO Failure Drive:50000d1106c76219<6>
[   55.900903] pm80xx:: mpi_sata_completion 2493: task 0xffff8807ee8cc000 done with io_status 0x1 resp 0x0 stat 0x8d but aborted by upper layer!
[   55.900906] pm80xx:: pm8001_mpi_task_abort_resp 3840: ABORT status = 0x0 task ffff8807ee8cc1c0
[   55.900907] pm80xx:: pm8001_mpi_task_abort_resp 3856: ABORT IO_SUCCESS for tag 3 ,task ffff8807ee8cc1c0
[   55.900911] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   66.049020] ata3.00: qc timeout (cmd 0xec)
[   66.049087] pm80xx:: mpi_sata_completion 2049: SATA IO STATUS 0x1 task ffff8807ee8cc000
[   66.049088] pm80xx:: mpi_sata_completion 2085: status:0x1, tag:0x2, task::0xffff8807ee8cc000
[   66.049025] pm80xx:: pm8001_chip_abort_task 4889: cmd_tag = 0x3, abort task tag = 0x2
[   66.049089] pm80xx:: mpi_sata_completion 2118: SAS Address of IO Failure Drive:50000d1106c76219<6>
[   66.049091] pm80xx:: mpi_sata_completion 2493: task 0xffff8807ee8cc000 done with io_status 0x1 resp 0x0 stat 0x8d but aborted by upper layer!
[   66.049093] pm80xx:: pm8001_mpi_task_abort_resp 3840: ABORT status = 0x0 task ffff8807ee8cc1c0
[   66.049094] pm80xx:: pm8001_mpi_task_abort_resp 3856: ABORT IO_SUCCESS for tag 3 ,task ffff8807ee8cc1c0
[   66.049098] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   96.181921] ata3.00: qc timeout (cmd 0xec)
[   96.182001] pm80xx:: mpi_sata_completion 2049: SATA IO STATUS 0x1 task ffff8807ee8cc000
[   96.182009] pm80xx:: mpi_sata_completion 2085: status:0x1, tag:0x2, task::0xffff8807ee8cc000
[   96.181934] pm80xx:: pm8001_chip_abort_task 4889: cmd_tag = 0x3, abort task tag = 0x2
[   96.182014] pm80xx:: mpi_sata_completion 2118: SAS Address of IO Failure Drive:50000d1106c76219<6>
[   96.182020] pm80xx:: mpi_sata_completion 2493: task 0xffff8807ee8cc000 done with io_status 0x1 resp 0x0 stat 0x8d but aborted by upper layer!
[   96.182025] pm80xx:: pm8001_mpi_task_abort_resp 3840: ABORT status = 0x0 task ffff8807ee8121c0
[   96.182029] pm80xx:: pm8001_mpi_task_abort_resp 3856: ABORT IO_SUCCESS for tag 3 ,task ffff8807ee8121c0
[   96.182043] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   96.337817] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0

[   96.354159] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[   96.354177] sas: ata1: end_device-1:0: dev error handler
[   96.354194] sas: ata2: end_device-1:1: dev error handler
[   96.354204] sas: ata3: end_device-1:2: dev error handler
[   96.354210] sas: ata4: end_device-1:3: dev error handler
[   96.510401] ata4.00: ATA-9: ST4000VN000-1H4168, SC43, max UDMA/133
[   96.510409] ata4.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[   96.511106] ata4.00: configured for UDMA/133
[   96.511134] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0
[   96.526013] scsi 1:0:3:0: Direct-Access     ATA      ST4000VN000-1H41 SC43 PQ: 0 ANSI: 5

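The first large block shows what a failed drive detection looks like: ata3.00 times out on the IDENTIFY command (cmd 0xec) three times and is never configured. The second block shows a successful detection, in which ata4.00 is identified and attached as a SCSI device.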
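To compare boots without reading the whole log by eye, the two patterns above can be counted per port. This is only a minimal sketch: it assumes the log lines look like the excerpts in this post, and the function name `summarize_dmesg` and the two regular expressions are my own, not part of any driver tooling.

```python
import re
from collections import Counter

# Patterns taken from the log excerpts in this post; other kernel
# versions may word these messages differently (assumption).
IDENTIFY_FAIL = re.compile(r"\]\s+(ata\d+\.\d+): failed to IDENTIFY")
CONFIGURED = re.compile(r"\]\s+(ata\d+\.\d+): configured for")


def summarize_dmesg(text):
    """Return (failures, configured) for a block of dmesg output.

    failures   -- Counter mapping port (e.g. 'ata3.00') to the number
                  of failed IDENTIFY attempts
    configured -- set of ports that reached 'configured for ...'
    """
    failures = Counter()
    configured = set()
    for line in text.splitlines():
        match = IDENTIFY_FAIL.search(line)
        if match:
            failures[match.group(1)] += 1
            continue
        match = CONFIGURED.search(line)
        if match:
            configured.add(match.group(1))
    return failures, configured


# Four lines copied from the excerpts above.
SAMPLE = """\
[   55.900911] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   66.049098] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   96.182043] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   96.511106] ata4.00: configured for UDMA/133
"""

if __name__ == "__main__":
    failed, ok = summarize_dmesg(SAMPLE)
    for port, count in sorted(failed.items()):
        print(f"{port}: {count} failed IDENTIFY attempts")
    print("configured:", ", ".join(sorted(ok)))
```

Running the same function on the saved dmesg output of each boot would show whether the failing ports really change from boot to boot.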

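All drives were tested several times without errors before going into the fully assembled system. The failing drives are not always the same ones; it appears to be completely random.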

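Another question suggests that the errors come from the shared IRQ 16, and I do in fact sometimes get error logs pointing to IRQ 16. Unfortunately, I don't know whether I can use a different IRQ, because the BIOS doesn't let me change it, and moving the card to another PCIe slot is not an option because of link speed.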
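To see which drivers actually share an interrupt line, the text of /proc/interrupts can be parsed. The sketch below is hedged: the helper `devices_on_irq` is my own, the column layout differs between kernel versions, and the sample text is made up for illustration (including the handler names), not taken from my machine.

```python
def devices_on_irq(interrupts_text, irq):
    """Return the handler names registered on one IRQ line.

    Expects text in the /proc/interrupts layout: a header row with one
    column per CPU, then rows of 'N: <count per CPU> <chip> <handlers>'.
    Handler names are assumed to be comma-separated and free of spaces.
    A row with no handler at all would return the chip name instead.
    """
    lines = interrupts_text.splitlines()
    if not lines:
        return []
    cpu_count = len(lines[0].split())  # header: CPU0 CPU1 ...
    for line in lines[1:]:
        fields = line.split()
        if not fields or fields[0] != f"{irq}:":
            continue
        rest = " ".join(fields[1 + cpu_count:])
        if not rest:
            return []
        names = [part.strip() for part in rest.split(",")]
        # The first chunk still carries the interrupt chip/type columns;
        # keep only its last token, which is the first handler name.
        names[0] = names[0].split()[-1]
        return names
    return []


# Hypothetical sample; real output will differ.
SAMPLE = """\
           CPU0       CPU1
  0:         45          0   IO-APIC-edge      timer
 16:      10234        311   IO-APIC-fasteoi   ehci_hcd:usb1, pm80xx
"""

if __name__ == "__main__":
    print(devices_on_irq(SAMPLE, 16))
```

On a live system the input would be the contents of /proc/interrupts instead of `SAMPLE`. More than one name on the line for IRQ 16 would confirm that the interrupt is shared.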

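Any help is welcome. I'm about to order an LSI controller to see whether that helps, but I would prefer to get this working with the Adaptec. As it stands, I'm very worried about trusting my data to this controller.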

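Update: The problem continues. Even when all drives are found, random kernel crashes occur in the libsas and pm80xx kernel modules, so this is not usable in production either. I'm considering buying an LSI 9201-16i...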