How do I find the failing disk (volume)?

I am using vanilla Apache Hadoop 1.1.1, and I cannot start the DataNode because of the following:

2015-04-23 09:12:48,138 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2015-04-23 09:12:48,152 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2015-04-23 09:12:48,154 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2015-04-23 09:12:48,154 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2015-04-23 09:12:48,254 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2015-04-23 09:12:48,608 INFO org.apache.hadoop.hdfs.server.common.Storage: Cannot access storage directory /hadoop/data/05
2015-04-23 09:12:48,608 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /hadoop/data/05 does not exist.
2015-04-23 09:12:48,731 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: org.apache.hadoop.util.DiskChecker$DiskErrorException: Invalid value for volsFailed : 1 , Volumes tolerated : 0
    at org.apache.hadoop.hdfs.server.datanode.FSDataset.<init>(FSDataset.java:974)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:403)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:309)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1651)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1590)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1608)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1734)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1751)

2015-04-23 09:12:48,732 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at hadoop03
************************************************************/

Now I know that I could set the number of tolerated failed volumes above zero, but how do I find out which volume is actually failing? I assume it is an actual disk failure, since this is rather old hardware, but is there anything Hadoop-ish (or even standard Linux-ish) I could do to debug which disk is failing?
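
The startup log already names the suspect volume: /hadoop/data/05 is reported as inaccessible. As a starting point, here is a minimal shell sketch of how one might check whether that is just a missing mount or a genuinely failing disk; it assumes the volume sits on its own mount point, that smartmontools is installed, and that /dev/sdX is a placeholder for whatever device actually backs /hadoop/data/05:

    # Check whether the path is simply an unmounted or vanished mount point:
    df -h /hadoop/data/05
    mount | grep '/hadoop/data'

    # Look for kernel-level I/O errors around the time of the failure:
    dmesg | grep -i -E 'i/o error|ata[0-9]|sd[a-z]'

    # Query the drive's SMART health (/dev/sdX is a placeholder for the
    # device backing /hadoop/data/05):
    sudo smartctl -H /dev/sdX
    sudo smartctl -a /dev/sdX | grep -i -E 'reallocated|pending|uncorrectable'

If dmesg shows I/O errors and SMART reports reallocated or pending sectors, the disk itself is likely dying; if the directory is merely unmounted after a reboot, remounting it may bring the DataNode back without raising dfs.datanode.failed.volumes.tolerated in hdfs-site.xml.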
