Pig tutorial in local mode returns OutOfMemoryError

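I am new to Pig and am trying to learn from the Pig tutorial on the Apache website. I am using Hadoop 1.0.1 and Pig 0.11.1. The tutorial suggests trying Pig on two example scripts, script1-local.pig and script2-local.pig. But when I run the first example with the following command: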

pig -x local script1-local.pig

I get the following error (it seems to be related to the Java heap size):

2013-07-16 15:26:59,033 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2013-07-16 15:26:59,033 [main] INFO  org.apache.pig.Main - Logging error messages to: /root/pigtmp/pig_1373988419029.log
2013-07-16 15:26:59,246 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2013-07-16 15:26:59,332 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2013-07-16 15:27:00,287 [main] WARN  org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 3 time(s).
2013-07-16 15:27:00,287 [main] WARN  org.apache.pig.PigServer - Encountered Warning USING_OVERLOADED_FUNCTION 3 time(s).
2013-07-16 15:27:00,300 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,ORDER_BY,DISTINCT,FILTER
2013-07-16 15:27:00,446 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2013-07-16 15:27:00,469 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2013-07-16 15:27:00,492 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 5
2013-07-16 15:27:00,493 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 5
2013-07-16 15:27:00,518 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2013-07-16 15:27:00,533 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2013-07-16 15:27:00,536 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2013-07-16 15:27:00,537 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=208348
2013-07-16 15:27:00,537 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2013-07-16 15:27:00,557 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2013-07-16 15:27:00,563 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2013-07-16 15:27:00,563 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2013-07-16 15:27:00,563 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Distributed cache not supported or needed in local mode. Setting key [pig.schematuple.local.dir] with code temp directory: /tmp/1373988420563-0
2013-07-16 15:27:00,564 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting identity combiner class.
2013-07-16 15:27:00,637 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2013-07-16 15:27:00,648 [JobControl] WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-07-16 15:27:00,652 [JobControl] WARN  org.apache.hadoop.mapred.JobClient - No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
****file:/root/pigtmp/excite-small.log
2013-07-16 15:27:00,681 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-07-16 15:27:00,681 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2013-07-16 15:27:00,688 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2013-07-16 15:27:00,922 [Thread-3] INFO  org.apache.hadoop.util.ProcessTree - setsid exited with exit code 0
2013-07-16 15:27:00,925 [Thread-3] INFO  org.apache.hadoop.mapred.Task -  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1b4b0967
2013-07-16 15:27:00,936 [Thread-3] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed file:/root/pigtmp/excite-small.log:0+208348
2013-07-16 15:27:00,942 [Thread-3] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2013-07-16 15:27:01,059 [Thread-3] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
2013-07-16 15:27:01,138 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
2013-07-16 15:27:01,138 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases clean1,clean2,houred,ngramed1,raw
2013-07-16 15:27:01,138 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: raw[28,6],clean1[31,9],clean2[-1,-1],houred[39,9],ngramed1[42,11] C:  R: 
2013-07-16 15:27:01,145 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2013-07-16 15:27:01,148 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2013-07-16 15:27:01,148 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0001 has failed! Stop running all dependent jobs
2013-07-16 15:27:01,149 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-07-16 15:27:01,149 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2013-07-16 15:27:01,149 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Detected Local mode. Stats reported below may be incomplete
2013-07-16 15:27:01,150 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 

HadoopVersion   PigVersion  UserId  StartedAt   FinishedAt  Features
1.0.1   0.11.1  root    2013-07-16 15:27:00 2013-07-16 15:27:01 GROUP_BY,ORDER_BY,DISTINCT,FILTER

Failed!

Failed Jobs:
JobId   Alias   Feature Message Outputs
job_local_0001  clean1,clean2,houred,ngramed1,raw   DISTINCT    Message: Job failed! Error - NA 

Input(s):
Failed to read data from "file:///root/pigtmp/excite-small.log"

Output(s):

Job DAG:
job_local_0001  ->  null,
null    ->  null,
null    ->  null,
null    ->  null,
null


2013-07-16 15:27:01,150 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2013-07-16 15:27:01,152 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
Details at logfile: /root/pigtmp/pig_1373988419029.log
2013-07-16 15:27:01,153 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
Details at logfile: /root/pigtmp/pig_1373988419029.log
2013-07-16 15:27:01,154 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
Details at logfile: /root/pigtmp/pig_1373988419029.log
2013-07-16 15:27:01,154 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
Details at logfile: /root/pigtmp/pig_1373988419029.log
2013-07-16 15:27:01,155 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
Details at logfile: /root/pigtmp/pig_1373988419029.log

Has anyone run into the same problem?

Thanks.

Answer 1

As mentioned in this answer, you can try increasing the maximum heap size available to the client JVM:

export HADOOP_CLIENT_OPTS="-Xmx1024m"
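
The stack trace in the question fails inside MapTask$MapOutputBuffer.<init>, right after the log reports io.sort.mb = 100, so the heap is presumably too small to hold the 100 MB map-side sort buffer. The sketch below sets the option and re-runs the tutorial script in the same shell session. It assumes that the pig launcher on your machine passes HADOOP_CLIENT_OPTS through to the JVM and that script1-local.pig is in the current directory; the guard and the fallback message are illustrative additions, not part of the tutorial.

```shell
# Give the client JVM a 1 GB maximum heap (value taken from the answer above).
export HADOOP_CLIENT_OPTS="-Xmx1024m"

# Re-run the tutorial script only if the pig launcher is on PATH
# and the script is present in the current directory.
if command -v pig >/dev/null 2>&1 && [ -f script1-local.pig ]; then
    pig -x local script1-local.pig
else
    echo "pig or script1-local.pig not found; heap option is: $HADOOP_CLIENT_OPTS"
fi
```

The export only affects the current shell session, so it has to be repeated (or added to a shell profile) before later runs.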
