I installed hadoop-1.0.3 on my system (14.04) by following this tutorial.
I successfully ran the sample wordcount MapReduce program, as shown below:
hadoopuser@arul-PC:/usr/local/hadoop$ bin/hadoop jar hadoop*examples*.jar wordcount /user/hadoopuser/SampleData /user/hadoopuser/SampleOutput
14/06/17 15:25:45 INFO input.FileInputFormat: Total input paths to process : 3
14/06/17 15:25:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/06/17 15:25:45 WARN snappy.LoadSnappy: Snappy native library not loaded
14/06/17 15:25:45 INFO mapred.JobClient: Running job: job_201406171444_0002
14/06/17 15:25:46 INFO mapred.JobClient: map 0% reduce 0%
14/06/17 15:26:04 INFO mapred.JobClient: map 66% reduce 0%
14/06/17 15:26:13 INFO mapred.JobClient: map 100% reduce 0%
14/06/17 15:26:16 INFO mapred.JobClient: map 100% reduce 22%
14/06/17 15:26:28 INFO mapred.JobClient: map 100% reduce 100%
14/06/17 15:26:33 INFO mapred.JobClient: Job complete: job_201406171444_0002
14/06/17 15:26:33 INFO mapred.JobClient: Counters: 29
14/06/17 15:26:33 INFO mapred.JobClient: Job Counters
14/06/17 15:26:33 INFO mapred.JobClient: Launched reduce tasks=1
14/06/17 15:26:33 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=33037
14/06/17 15:26:33 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/06/17 15:26:33 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/06/17 15:26:33 INFO mapred.JobClient: Launched map tasks=3
14/06/17 15:26:33 INFO mapred.JobClient: Data-local map tasks=3
14/06/17 15:26:33 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=21208
14/06/17 15:26:33 INFO mapred.JobClient: File Output Format Counters
14/06/17 15:26:33 INFO mapred.JobClient: Bytes Written=880838
14/06/17 15:26:33 INFO mapred.JobClient: FileSystemCounters
14/06/17 15:26:33 INFO mapred.JobClient: FILE_BYTES_READ=2214875
14/06/17 15:26:33 INFO mapred.JobClient: HDFS_BYTES_READ=3671899
14/06/17 15:26:33 INFO mapred.JobClient: FILE_BYTES_WRITTEN=3775759
14/06/17 15:26:33 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=880838
14/06/17 15:26:33 INFO mapred.JobClient: File Input Format Counters
14/06/17 15:26:33 INFO mapred.JobClient: Bytes Read=3671523
14/06/17 15:26:33 INFO mapred.JobClient: Map-Reduce Framework
14/06/17 15:26:33 INFO mapred.JobClient: Map output materialized bytes=1474367
14/06/17 15:26:33 INFO mapred.JobClient: Map input records=77931
14/06/17 15:26:33 INFO mapred.JobClient: Reduce shuffle bytes=1207341
14/06/17 15:26:33 INFO mapred.JobClient: Spilled Records=255966
14/06/17 15:26:33 INFO mapred.JobClient: Map output bytes=6076101
14/06/17 15:26:33 INFO mapred.JobClient: Total committed heap usage (bytes)=517210112
14/06/17 15:26:33 INFO mapred.JobClient: CPU time spent (ms)=11530
14/06/17 15:26:33 INFO mapred.JobClient: Combine input records=629172
14/06/17 15:26:33 INFO mapred.JobClient: SPLIT_RAW_BYTES=376
14/06/17 15:26:33 INFO mapred.JobClient: Reduce input records=102324
14/06/17 15:26:33 INFO mapred.JobClient: Reduce input groups=82335
14/06/17 15:26:33 INFO mapred.JobClient: Combine output records=102324
14/06/17 15:26:33 INFO mapred.JobClient: Physical memory (bytes) snapshot=589725696
14/06/17 15:26:33 INFO mapred.JobClient: Reduce output records=82335
14/06/17 15:26:33 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1862012928
14/06/17 15:26:33 INFO mapred.JobClient: Map output records=629172
When I checked, the output file was present in the output folder:
hadoopuser@arul-PC:/usr/local/hadoop$ bin/hadoop dfs -ls /user/hadoopuser/SampleOutput
Found 3 items
-rw-r--r-- 1 hadoopuser supergroup 0 2014-06-17 15:26 /user/hadoopuser/SampleOutput/_SUCCESS
drwxr-xr-x - hadoopuser supergroup 0 2014-06-17 15:25 /user/hadoopuser/SampleOutput/_logs
-rw-r--r-- 1 hadoopuser supergroup 880838 2014-06-17 15:26 /user/hadoopuser/SampleOutput/part-r-00000
I tried to open it with the following command:
hadoopuser@avvenire-PC:/usr/local/hadoop$ bin/hadoop dfs -cat /user/hadoopuser/SampleOutput/part-r-0000
but I got the following result:
cat: File does not exist: /user/hadoopuser/SampleOutput/part-r-0000
Please suggest a solution. Thanks in advance.
Answer 1
Check the file name. It ends in five 0s, not four.
bin/hadoop dfs -cat /user/hadoopuser/SampleOutput/part-r-00000
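To avoid miscounting the zero padding, one option is to let the Hadoop FS shell match the part files by pattern instead of typing the exact name. The following is only a sketch: cat_job_output and the HADOOP_BIN variable are hypothetical names introduced here, and it assumes the command is run from /usr/local/hadoop as in the question.

```shell
# Hypothetical helper: print every part file under an HDFS job output
# directory, so the exact zero-padded file name never has to be typed.
# HADOOP_BIN is an assumed variable; it defaults to the bin/hadoop used above.
cat_job_output() {
    out_dir="$1"
    # The glob is quoted so the local shell passes it through unchanged and
    # the Hadoop FS shell expands it against HDFS.
    "${HADOOP_BIN:-bin/hadoop}" dfs -cat "${out_dir}/part-*"
}
```

For the job in the question it could be called as cat_job_output /user/hadoopuser/SampleOutput | head to preview the first few word counts.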
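Answer 2
I realize an answer has already been accepted, but this is what solved the same error when it happened to me (in case anyone else finds this thread).
TL;DR: make sure there are no conflicting folder names in your hadoop directory (for me it was /usr/local/hadoop).
When I generated my output I put it in a folder called output/. However, an earlier program had also written output, and I had saved that data in a folder named output inside my hadoop directory. This caused the problem: even though that folder did not show up when I ran bin/hadoop fs -ls, the command bin/hadoop fs -cat output/* was actually picking up the folder I had generated earlier rather than the output of the program I had just run. After I deleted that directory with rm -rf output/, the problem went away.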
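One plausible explanation, which is an assumption and not something the answer states, is that the local shell expands the unquoted output/* against the local working directory before hadoop ever sees it, so the local file names get looked up in HDFS. The sketch below reproduces that expansion with plain shell only; printf stands in for bin/hadoop fs -cat, and the directory and file names are made up for the demonstration.

```shell
# Plain-shell demonstration (no Hadoop needed): an unquoted glob is expanded
# against the LOCAL working directory before the command receives it.
demo_dir="$(mktemp -d)"
mkdir "$demo_dir/output"
touch "$demo_dir/output/old-part-00000"   # hypothetical leftover local file
cd "$demo_dir" || exit 1

# printf stands in for "bin/hadoop fs -cat" and shows the arguments it gets.
unquoted="$(printf '%s\n' output/*)"      # local shell expands the glob
quoted="$(printf '%s\n' 'output/*')"      # pattern is passed through untouched

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

If that is what happened, quoting the pattern, as in bin/hadoop fs -cat 'output/*', would be an alternative to deleting the local folder, since the pattern then reaches the Hadoop FS shell intact.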