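I am trying to run the Hadoop pi example. It runs without any problems on a single node, but now that I am running it on multiple nodes it fails with the error below. Any suggestions would be appreciated.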
mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- In: conf/mapred-site.xml -->
<property>
<name>mapred.job.tracker</name>
<value>master:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx2048m</value>
</property>
<property>
<name>mapred.shuffle.input.buffer.percent</name>
<value>0.2</value>
</property>
</configuration>
Console output:
Number of Maps = 3
Samples per Map = 10
14/10/11 20:34:20 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
14/10/11 20:34:54 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Starting Job
14/10/11 20:34:54 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/10/11 20:34:55 INFO input.FileInputFormat: Total input paths to process : 3
14/10/11 20:34:55 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/10/11 20:34:55 INFO mapreduce.JobSubmitter: number of splits:3
14/10/11 20:34:55 INFO mapreduce.JobSubmitter: adding the following namenodes' delegation tokens:null
14/10/11 20:34:55 INFO mapreduce.Job: Running job: job_201410112034_0001
14/10/11 20:34:56 INFO mapreduce.Job: map 0% reduce 0%
14/10/11 20:35:05 INFO mapreduce.Job: map 33% reduce 0%
14/10/11 20:35:08 INFO mapreduce.Job: map 100% reduce 0%
14/10/11 20:35:14 INFO mapreduce.Job: map 100% reduce 11%
14/10/11 20:35:31 INFO mapreduce.Job: Task Id : attempt_201410112034_0001_r_000000_0, Status : FAILED
org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:124)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:362)
at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253)
at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:234)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:149)
14/10/11 20:35:32 INFO mapreduce.Job: map 100% reduce 0%
14/10/11 20:35:41 INFO mapreduce.Job: map 100% reduce 11%
14/10/11 20:35:49 INFO mapreduce.Job: Task Id : attempt_201410112034_0001_m_000000_0, Status : FAILED
Too many fetch-failures
14/10/11 20:35:49 WARN mapreduce.Job: Error reading task outputhttp://userA:50060/tasklog?plaintext=true&attemptid=attempt_201410112034_0001_m_000000_0&filter=stdout
14/10/11 20:35:49 WARN mapreduce.Job: Error reading task outputhttp://userA:50060/tasklog?plaintext=true&attemptid=attempt_201410112034_0001_m_000000_0&filter=stderr
14/10/11 20:36:13 INFO mapreduce.Job: Task Id : attempt_201410112034_0001_r_000000_1, Status : FAILED
org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#2
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:124)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:362)
at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253)
at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:234)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:149)
14/10/11 20:36:14 INFO mapreduce.Job: map 100% reduce 0%
14/10/11 20:36:22 INFO mapreduce.Job: Task Id : attempt_201410112034_0001_m_000001_0, Status : FAILED
Too many fetch-failures
14/10/11 20:36:22 WARN mapreduce.Job: Error reading task outputhttp://userA:50060/tasklog?plaintext=true&attemptid=attempt_201410112034_0001_m_000001_0&filter=stdout
14/10/11 20:36:22 WARN mapreduce.Job: Error reading task outputhttp://userA:50060/tasklog?plaintext=true&attemptid=attempt_201410112034_0001_m_000001_0&filter=stderr
14/10/11 20:36:23 INFO mapreduce.Job: map 100% reduce 11%
14/10/11 20:36:32 INFO mapreduce.Job: map 100% reduce 100%
14/10/11 20:36:34 INFO mapreduce.Job: Job complete: job_201410112034_0001
14/10/11 20:36:34 INFO mapreduce.Job: Counters: 33
FileInputFormatCounters
BYTES_READ=354
FileSystemCounters
FILE_BYTES_READ=72
FILE_BYTES_WRITTEN=252
HDFS_BYTES_READ=765
HDFS_BYTES_WRITTEN=215
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=1
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
Job Counters
Data-local map tasks=5
Total time spent by all maps waiting after reserving slots (ms)=0
Total time spent by all reduces waiting after reserving slots (ms)=0
SLOTS_MILLIS_MAPS=11950
SLOTS_MILLIS_REDUCES=80809
Launched map tasks=5
Launched reduce tasks=3
Map-Reduce Framework
Combine input records=0
Combine output records=0
Failed Shuffles=1
GC time elapsed (ms)=6
Map input records=3
Map output bytes=54
Map output records=6
Merged Map outputs=3
Reduce input groups=2
Reduce input records=6
Reduce output records=0
Reduce shuffle bytes=84
Shuffled Maps =3
Spilled Records=12
SPLIT_RAW_BYTES=411
Job Finished in 100.067 seconds
Estimated value of Pi is 3.60000000000000000000
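Answer 1
One possible cause of this error is that the machines in the Hadoop cluster cannot communicate with each other properly. Every machine should be able to ping every other one (master to slaves, and slaves to each other as well). Depending on your setup, you may need to edit the /etc/hosts file on each machine so that they can reach one another by hostname.
For example, /etc/hosts could be configured as follows: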
127.0.0.1 localhost
<ipslave1> slave1
<ipmaster> master
<ipslave2> slave2
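To confirm that the hostnames really resolve and that the nodes can reach each other, something like the following rough Python sketch could be run on each machine. It assumes the hostnames master, slave1 and slave2 from the example above, and uses port 50060 only because that is the port shown in the task log URLs in the question (which also refer to a host named userA, so that name may be worth checking too). The helper names `check_host` and `report` are made up for illustration.

```python
import socket


def check_host(host, port, timeout=3.0):
    """Return (ip, reachable): the resolved IP (or None) and whether a TCP connect succeeded."""
    try:
        # Uses the system resolver, which normally consults /etc/hosts.
        ip = socket.gethostbyname(host)
    except OSError:
        # Name could not be resolved at all.
        return None, False
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        # Name resolved, but nothing accepted the connection (firewall, service down, wrong IP).
        return ip, False


def report(hosts, port):
    """Print one line per host and return the list of hosts that failed either check."""
    failed = []
    for host in hosts:
        ip, reachable = check_host(host, port)
        if ip is None:
            print(f"{host}: cannot resolve hostname")
        elif not reachable:
            print(f"{host}: resolves to {ip}, but port {port} is not reachable")
        else:
            print(f"{host}: resolves to {ip}, port {port} reachable")
        if not reachable:
            failed.append(host)
    return failed
```

Calling `report(["master", "slave1", "slave2"], 50060)` on every node should list each host as reachable. A host that cannot be resolved, or that resolves to an unexpected IP, points to a problem in /etc/hosts; a host that resolves but is not reachable suggests a firewall or a service that is not running.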