"kernel: Buffer I/O error on device" - does my server have a hardware problem?


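We have a Linux DB server running Red Hat 7.2.

We noticed many messages like the following, for all of the mounted disks, in /var/log/messages.

We need to understand whether this behavior is related to a hardware problem.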

Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4980*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4981*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4982*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4983*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4984*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4985*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4986*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4987*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4988*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4989*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4990*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4991*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4992*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4993*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4994*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4995*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4996*
Mar 29 13:28:22 server_DB kernel: Buffer I/O error on device sdb, logical block *N4997*

We also see these messages:

Mar 27 09:18:08 server_DB smartd[1734]: Monitoring 0 ATA and 26 SCSI devices
Mar 27 09:18:08 server_DB ModemManager[1755]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:00/0000:00*CO*/0000:02*CO*': not supported by any plugin
Mar 27 09:18:08 server_DB ModemManager[1755]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:00/0000:00*CO*/0000:02*CO*': not supported by any plugin
Mar 27 09:18:08 server_DB ModemManager[1755]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:00/0000:00*CO*/0000:01*CO*': not supported by any plugin
Mar 27 09:18:08 server_DB ModemManager[1755]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:00/0000:00*CO*/0000:01*CO*': not supported by any plugin
Mar 27 09:18:08 server_DB ModemManager[1755]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:80/0000:80*CO*/0000:81*CO*': not supported by any plugin
Mar 27 09:18:08 server_DB ModemManager[1755]: <warn>  Couldn't find support for device at '/sys/devices/pci0000:80/0000:80*CO*/0000:81*CO*': not supported by any plugin

I also checked the disk:

smartctl -a -d megaraid,0 /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-327.el7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST600MM0238
Revision:             BS04
User Capacity:        600,127,266,816 bytes [600 GB]
Logical block size:   512 bytes
Formatted with type 2 protection
Logical block provisioning type unreported, LBPME=0, LBPRZ=0
Rotation Rate:        10000 rpm
Form Factor:          2.5 inches
Logical Unit id:      0x5000c500a0f28343
Serial number:        W0M0LYD2
Device type:          disk
Transport protocol:   SAS
Local Time is:        Wed Mar 27 10:51:30 2019 UTC
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     24 C
Drive Trip Temperature:        60 C

Manufactured in week 45 of year 2017
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  50
Specified load-unload count over device lifetime:  300000
Accumulated load-unload cycles:  177
Elements in grown defect list: 0

Vendor (Seagate) cache information
  Blocks sent to initiator = 412242328
  Blocks received from initiator = 3213595579
  Blocks read from cache and sent to initiator = 312462212
  Number of read and write commands whose size <= segment size = 31915885
  Number of read and write commands whose size > segment size = 0

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 3178.45
  number of minutes until next internal SMART test = 12

Answer 1

This I/O error message is written to warn about a hardware error on sdb. For example, the problem could be with the disk or with the cable.

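I think that if you have a large number of disks showing errors at the same time, it is unlikely that the disks themselves are at fault :-). It could be a fault in the disk controller.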
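To check whether the errors really are spread across many disks rather than concentrated on one, the log can be tallied per device. The following is a minimal sketch, not a definitive tool: the function name `count_buffer_io_errors` is made up for illustration, and the regular expression assumes the message format shown in the question.

```python
import re
from collections import Counter

# Matches kernel lines in the format shown in the question, e.g.
#   "... kernel: Buffer I/O error on device sdb, logical block ..."
# The capture group is the device name (assumed to end at the comma).
BUFFER_IO_RE = re.compile(r"kernel: Buffer I/O error on device (\S+?),")


def count_buffer_io_errors(lines):
    """Return a Counter mapping device name -> number of Buffer I/O error lines."""
    counts = Counter()
    for line in lines:
        match = BUFFER_IO_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open file works, since a file object iterates over its lines, for example `count_buffer_io_errors(open("/var/log/messages", errors="replace"))` (reading that file usually requires root). If nearly every device shows up with a similar count, that points more toward something shared, such as the controller, than toward a single disk.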

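If you see "Buffer I/O error" but no specific messages about ATA or SCSI error codes, or about retry attempts in general, that might give some hint. But I really don't know :-).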
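One way to look for that hint is to separate the "Buffer I/O error" lines from any lower-level block, SCSI, or ATA error lines in the same log. This is only a sketch under stated assumptions: the patterns in `LOW_LEVEL_RE` are illustrative examples of what such kernel messages commonly look like, not an exhaustive or authoritative list, and `split_io_messages` is a hypothetical helper name.

```python
import re

# The message from the question.
BUFFER_IO_RE = re.compile(r"kernel: Buffer I/O error on device")

# ASSUMPTION: illustrative fragments of lower-level kernel error messages
# (block layer, SCSI sense data, libata). Adjust for your kernel and driver.
LOW_LEVEL_RE = re.compile(
    r"kernel: .*(?:blk_update_request: I/O error"
    r"|end_request: I/O error"
    r"|Sense Key"
    r"|hostbyte="
    r"|\bata\d+(?:\.\d+)?: )"
)


def split_io_messages(lines, low_level_re=LOW_LEVEL_RE):
    """Return (buffer_error_lines, low_level_error_lines) found in lines."""
    buffer_errors, low_level = [], []
    for line in lines:
        if BUFFER_IO_RE.search(line):
            buffer_errors.append(line)
        elif low_level_re.search(line):
            low_level.append(line)
    return buffer_errors, low_level
```

If the first list is long and the second is empty, the log matches the situation described above: buffer errors with no accompanying ATA or SCSI detail. An empty second list may also just mean the patterns do not fit the driver in use, so it is worth reading the surrounding log lines by hand as well.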

Of course, a software bug could cause any message to appear :-).

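To give an example of a software bug, although I know it is not the same bug: I have seen a kernel bug that showed "Buffer I/O error" without any ATA or SCSI error messages or retry attempts. Fedora bug 1553979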


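The "Buffer" part only means that it happened during a request for file data that can be cached in the page cache. For historical reasons, people sometimes call these requests "buffered IO".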
