Troubleshooting failed disk hardware (no medium) on Linux?

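OK, I'm using my ASUS TX300CA, which has a tablet part containing the CPU and one disk (/dev/sda), and a keyboard dock containing another disk (/dev/sdb). The partitions of the drive in the keyboard dock were mounted, and when I tried to cat files on them I suddenly started getting "Input/output error: read" or similar (ls still worked). So I rebooted, and found that if the tablet part is attached to the keyboard dock, the system doesn't even boot (it only shows the splash screen with the ASUS logo and never reaches the GRUB boot menu).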

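Fortunately, my main Ubuntu 14.04 OS is installed on the drive in the tablet part, so I detached it and booted into the OS; then I attached the keyboard dock again. The messages in syslog do not immediately indicate any error: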

Oct 29 21:48:14 mypc kernel: [ 1348.596871] ACPI Error: [^^^XHC_.SSP1] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-359)
Oct 29 21:48:14 mypc kernel: [ 1348.596896] ACPI Error: Method parse/execution failed [\_SB.PCI0.LPCB.EC0._Q82] (Node f389e288), AE_NOT_FOUND (20150930/psparse-542)
Oct 29 21:48:14 mypc kernel: [ 1348.601331] asus_wmi: Unknown key 75 pressed
Oct 29 21:48:18 mypc kernel: [ 1352.297028] usb 4-1: new SuperSpeed USB device number 2 using xhci_hcd
Oct 29 21:48:18 mypc kernel: [ 1352.320942] usb 4-1: New USB device found, idVendor=05e3, idProduct=0612
Oct 29 21:48:18 mypc kernel: [ 1352.320953] usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Oct 29 21:48:18 mypc kernel: [ 1352.320959] usb 4-1: Product: USB3.0 Hub
Oct 29 21:48:18 mypc kernel: [ 1352.320964] usb 4-1: Manufacturer: GenesysLogic
Oct 29 21:48:18 mypc kernel: [ 1352.329092] hub 4-1:1.0: USB hub found
Oct 29 21:48:18 mypc kernel: [ 1352.329477] hub 4-1:1.0: 4 ports detected
....
Oct 29 21:48:26 mypc mtp-probe: checking bus 4, device 4: "/sys/devices/pci0000:00/0000:00:14.0/usb4/4-1/4-1.4"
Oct 29 21:48:26 mypc mtp-probe: bus: 4, device: 4 was not an MTP device
Oct 29 21:48:26 mypc kernel: [ 1360.719291] usb-storage 4-1.2:1.0: USB Mass Storage device detected
Oct 29 21:48:26 mypc kernel: [ 1360.719384] scsi host4: usb-storage 4-1.2:1.0
Oct 29 21:48:26 mypc kernel: [ 1360.719787] usbcore: registered new interface driver usb-storage
Oct 29 21:48:26 mypc kernel: [ 1360.723564] usbcore: registered new interface driver uas
Oct 29 21:48:27 mypc kernel: [ 1361.067216] ax88179_178a 4-1.4:1.0 eth0: register 'ax88179_178a' at usb-0000:00:14.0-1.4, ASIX AX88179 USB 3.0 Gigabit Ethernet, 74:d0:2b:0a:6b:62
Oct 29 21:48:27 mypc kernel: [ 1361.078810] usbcore: registered new interface driver ax88179_178a
Oct 29 21:48:27 mypc NetworkManager[1001]: <warn> failed to allocate link cache: (-12) Object not found
Oct 29 21:48:27 mypc NetworkManager[1001]: <info> (eth0): carrier is OFF
Oct 29 21:48:27 mypc NetworkManager[1001]: <info> (eth0): new Ethernet device (driver: 'ax88179_178a' ifindex: 4)
...

... the above shows that the Ethernet port and the USB hub in the dock are detected; the only disk-related lines are:

Oct 29 21:48:29 mypc kernel: [ 1363.961212] scsi 4:0:0:0: Direct-Access      osz osz  osz osz osz osz AD04 PQ: 0 ANSI: 6
Oct 29 21:48:29 mypc kernel: [ 1363.964557] sd 4:0:0:0: [sdb] Attached SCSI removable disk
Oct 29 21:48:29 mypc kernel: [ 1363.964978] sd 4:0:0:0: Attached scsi generic sg1 type 0
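
To pull just the storage-related lines out of a long log like the one above, a small filter can help. This is only a sketch: `filter_disk_lines` is a made-up helper name, and the pattern is an assumption about which messages matter here. When the sd driver can read a disk it normally also logs a capacity line (something like "N 512-byte logical blocks"); no such line appears in the three lines above.

```shell
# Print only the log lines that mention USB storage, SCSI hosts, or sd disks.
# Reads log text on stdin, so it works on dmesg output or a saved syslog file.
# The pattern is an illustrative guess, not an exhaustive list of messages.
filter_disk_lines() {
    grep -E 'usb-storage|scsi|sd [0-9]+:[0-9]+:[0-9]+:[0-9]+|\[sd[a-z]+\]'
}
```

For example: `dmesg | filter_disk_lines`, or `filter_disk_lines < /var/log/syslog`.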

So, the state of this disk is now:

  • sudo mount doesn't even show the partitions of /dev/sdb
  • sudo fdisk -l shows nothing for this device, only WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util fdisk doesn't support GPT. Use GNU Parted. (/dev/sdb was not bootable, though)
  • sudo parted -l does not report /dev/sdb at all
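
One more check that fits with the list above is to ask the kernel directly how large it thinks the device is. The sketch below reads `/sys/block/<dev>/size`, which is expressed in 512-byte sectors. The helper names `media_sectors` and `describe_media` are invented for illustration, and reading a size of 0 as "no medium" is my assumption about how this failure shows up, not something confirmed for this hardware.

```shell
# Print the device size in 512-byte sectors, as the kernel reports it.
# $1 = device name (e.g. sdb); $2 = sysfs root, defaulting to /sys.
media_sectors() {
    dev=$1
    root=${2:-/sys}
    size_file="$root/block/$dev/size"
    if [ ! -r "$size_file" ]; then
        echo "no such block device: $dev" >&2
        return 1
    fi
    cat "$size_file"
}

# Summarize the result; a size of 0 is treated here as "no medium reported".
describe_media() {
    sectors=$(media_sectors "$1" "${2:-/sys}") || return 1
    if [ "$sectors" -eq 0 ]; then
        echo "$1: no medium reported (0 sectors)"
    else
        echo "$1: $sectors sectors"
    fi
}
```

For example: `describe_media sdb`, compared against `describe_media sda` as a known-good reference.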

The only commands that show anything about it are:

$ sudo lshw -class disk -class storage -short
H/W path      Device     Class          Description
===================================================
/0/100/1f.2              storage        7 Series Chipset Family 6-port SATA Controller [A
/0/2          scsi0      storage        
/0/2/0.0.0    /dev/sda   disk           128GB SanDisk SSD U100
/0/3          scsi4      storage        
/0/3/0.0.0    /dev/sdb   disk           osz osz osz osz
/0/3/0.0.0/0  /dev/sdb   disk           

$ sudo smartctl --all /dev/sdb
smartctl 6.2 2013-07-26 r3841 [i686-linux-4.4.0-57-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/sdb: Unknown USB bridge [0x05e3:0x0735 (0x4104)]
Please specify device type with the -d option.

Use smartctl -h to get a usage summary

$ sudo smartctl --all -d scsi /dev/sdb
smartctl 6.2 2013-07-26 r3841 [i686-linux-4.4.0-57-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               osz osz
Product:              osz osz osz osz
Revision:             AD04
Logical block provisioning type unreported, LBPME=-1, LBPRZ=0
Device type:          disk
Local Time is:        Sun Oct 29 22:25:01 2017 CET
NO MEDIUM present on device
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

Also, gksu gnome-disks shows this disk.

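Well, I don't remember what this drive was called, but it certainly wasn't osz osz osz ... (which may change to orj orj... after a reboot, which I had to do because the system crashed again while I was writing this), so I can tell that something is wrong.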

I also managed to print a few things by reading directly from /sys:

$ cat /sys/bus/scsi/devices/4\:0\:0\:0/model 
 osz osz osz osz
$ cat /sys/bus/scsi/devices/4\:0\:0\:0/vendor
 orj orj
$ cat /sys/bus/scsi/devices/4\:0\:0\:0/dh_state
detached
$ cat /sys/bus/scsi/devices/4\:0\:0\:0/state
running
$ cat /sys/bus/scsi/devices/4\:0\:0\:0/type
0
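
Rather than reading the attributes one at a time as above, a small loop can dump every readable file in the device directory at once. This is a rough sketch; `dump_attrs` is a hypothetical helper name, and which attributes exist depends on the kernel version.

```shell
# Print "name: value" for every regular file directly inside a sysfs directory.
# $1 = directory, e.g. /sys/bus/scsi/devices/4:0:0:0
dump_attrs() {
    dir=$1
    for f in "$dir"/*; do
        # Skip subdirectories and anything that is not a readable regular file.
        [ -f "$f" ] && [ -r "$f" ] || continue
        # Attributes that cannot actually be read print an empty value;
        # the error message from cat is suppressed.
        printf '%s: %s\n' "$(basename "$f")" "$(cat "$f" 2>/dev/null)"
    done
}
```

For example: `dump_attrs /sys/bus/scsi/devices/4:0:0:0`.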

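So, my question is: what else can I do to troubleshoot a device in this state? Can I force the OS to rescan it somehow and dump more detailed error messages? Where should I look for those messages (i.e. syslog)? What other tools, if any, can I use to query a device in this state?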
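
On the rescan part of the question, one thing that might be worth trying is the SCSI sysfs interface: writing `1` to a device's `rescan` file asks the kernel to re-read that device, and writing `- - -` to a host's `scan` file asks it to rescan the whole host. The sketch below wraps both. The function names are invented for illustration, the address `4:0:0:0` and host `host4` are taken from the output above, and there is no guarantee that a rescan produces more useful messages for a device in this state.

```shell
# Ask the kernel to re-read one SCSI device.
# $1 = SCSI address (e.g. 4:0:0:0); $2 = sysfs root, defaulting to /sys.
rescan_scsi_device() {
    target="${2:-/sys}/bus/scsi/devices/$1/rescan"
    [ -w "$target" ] || { echo "cannot write $target (need root?)" >&2; return 1; }
    echo 1 > "$target"
}

# Ask the kernel to rescan every channel, target, and LUN on one SCSI host.
# $1 = host name (e.g. host4); $2 = sysfs root, defaulting to /sys.
rescan_scsi_host() {
    target="${2:-/sys}/class/scsi_host/$1/scan"
    [ -w "$target" ] || { echo "cannot write $target (need root?)" >&2; return 1; }
    echo '- - -' > "$target"
}
```

These need to run in a root shell, or the same write can be done as `echo 1 | sudo tee /sys/bus/scsi/devices/4:0:0:0/rescan`. Any resulting kernel messages should appear in `dmesg` and, assuming the default Ubuntu logging setup, in /var/log/syslog.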
