使用 Python 3.5 进行 Hadoop 流式传输:java.lang.RuntimeException:PipeMapRed.waitOutputThreads():子进程失败,代码为 127

使用 Python 3.5 进行 Hadoop 流式传输:java.lang.RuntimeException:PipeMapRed.waitOutputThreads():子进程失败,代码为 127

我正在尝试在基于 VMware Workstation VM 构建的集群上使用 Hadoop Streaming 运行我自己的映射器和化简器 Python 脚本。

所有虚拟机上的 Hadoop 版本为 2.7、Python 为 3.5、操作系统为 CentOS 7.2。

我有一台单独的机器,它充当客户端应用程序主机的角色,并将 mapreduce 作业提交给资源管理器。Map 和 Reduce 脚本也存储在那里。我使用以下 hadoop 命令来运行作业:

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar -output result1 -input /user/hadoop/hr/profiles -file /home/hadoop/map.py -mapper map.py -file /home/hadoop/reduce.py -reducer reduce.py

我还尝试在 -mapper 和 -reducer 脚本之前插入“python3”解释器:

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar -output result1 -input /user/hadoop/hr/profiles -file /home/hadoop/map.py -mapper "python3.5 map.py" -file /home/hadoop/reduce.py -reducer "python3.5 reduce.py"

然而,作业总是失败,并且我仍然在日志中收到相同的错误消息:

2016-10-07 21:57:10,485 INFO [IPC Server handler 1 on 41498] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1475888525921_0004_m_000001_0 is : 0.0
2016-10-07 21:57:10,520 FATAL [IPC Server handler 2 on 41498] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1475888525921_0004_m_000001_0 - exited : java.lang.RuntimeException: **PipeMapRed.waitOutputThreads(): subprocess failed with code 127**
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

2016-10-07 21:57:10,520 INFO [IPC Server handler 2 on 41498] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1475888525921_0004_m_000001_0: Error: java.lang.RuntimeException: **PipeMapRed.waitOutputThreads(): subprocess failed with code 127**
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

2016-10-07 21:57:10,521 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1475888525921_0004_m_000001_0: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

2016-10-07 21:57:10,523 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1475888525921_0004_m_000001_0 TaskAttempt Transitioned from RUNNING to FAIL_CONTAINER_CLEANUP
2016-10-07 21:57:10,523 INFO [ContainerLauncher #2] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_1475888525921_0004_01_000003 taskAttempt attempt_1475888525921_0004_m_000001_0
2016-10-07 21:57:10,524 INFO [ContainerLauncher #2] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1475888525921_0004_m_000001_0
2016-10-07 21:57:10,524 INFO [ContainerLauncher #2] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : slave-1:56838

Python 3.5 解释器安装在集群中的所有虚拟机上,其路径也添加到系统的 PATH 中。我可以使用以下命令在所有节点上启动解释器python3.5命令。

我尝试在 NameNode 上使用相同的脚本运行相同的命令,并且成功了。所以这似乎是 HDFS 安全问题。

我已经阅读了许多与此问题相关的帖子,并尝试了所有建议的方法,但仍然没有进展。

我已经尝试过以下操作:

  • 禁用 dfs 权限
  • 在 map 和 Reduce 脚本中将 stdout 重定向到 sdterr
  • 因为我用的是虚拟机,所以我减少了容器的 RAM 和 CPU 要求:256MB 和 1 核
  • 在“hadoop jar”命令中的 -mapper 和 -reducer 选项之前添加“python3”解释器
  • 我在脚本中将 CRLF 替换为 Unix LF

我的所有脚本都有/opt/rh/rh-python35/root/usr/bin/python3.5指向解释器位置的字符串。我已经多次测试了我的脚本 - 它们运行良好。

我对这个主题完全陌生,现在我陷入困境。如果您知道如何解决这个问题,请分享您的经验。提前致谢。

相关内容