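I have a 50 GB MapReduce job that is (basically) a WordCount application, and it produces the Map/Reduce progress percentages listed at the end of this question. It looks as though the Reducer waits until the Mappers have completely finished before it starts working. Is this normal behavior? If not, how can I troubleshoot why it is happening and change it?
Judging from the reduce percentages at the end, having to wait does not seem to cost much, since the Reduce part takes about 5 minutes while the Map part takes about 35 minutes. Still, if I could get the Reducer to keep working while the Mappers are doing their thing, those 5 minutes could be cut down.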
15/02/09 09:14:38 INFO mapred.JobClient: map 0% reduce 0%
15/02/09 09:17:08 INFO mapred.JobClient: map 1% reduce 0%
15/02/09 09:18:04 INFO mapred.JobClient: map 2% reduce 0%
15/02/09 09:18:34 INFO mapred.JobClient: map 3% reduce 0%
15/02/09 09:18:51 INFO mapred.JobClient: map 4% reduce 0%
15/02/09 09:19:10 INFO mapred.JobClient: map 5% reduce 0%
15/02/09 09:19:30 INFO mapred.JobClient: map 6% reduce 0%
15/02/09 09:19:48 INFO mapred.JobClient: map 7% reduce 0%
15/02/09 09:20:02 INFO mapred.JobClient: map 7% reduce 1%
15/02/09 09:20:07 INFO mapred.JobClient: map 8% reduce 1%
15/02/09 09:20:37 INFO mapred.JobClient: map 9% reduce 1%
15/02/09 09:20:49 INFO mapred.JobClient: map 9% reduce 2%
15/02/09 09:20:54 INFO mapred.JobClient: map 10% reduce 2%
15/02/09 09:20:58 INFO mapred.JobClient: map 10% reduce 3%
15/02/09 09:21:08 INFO mapred.JobClient: map 11% reduce 3%
15/02/09 09:21:25 INFO mapred.JobClient: map 12% reduce 3%
15/02/09 09:21:47 INFO mapred.JobClient: map 13% reduce 3%
15/02/09 09:22:09 INFO mapred.JobClient: map 14% reduce 3%
15/02/09 09:22:23 INFO mapred.JobClient: map 14% reduce 4%
15/02/09 09:22:30 INFO mapred.JobClient: map 15% reduce 4%
15/02/09 09:22:47 INFO mapred.JobClient: map 16% reduce 4%
15/02/09 09:22:57 INFO mapred.JobClient: map 16% reduce 5%
15/02/09 09:23:09 INFO mapred.JobClient: map 17% reduce 5%
15/02/09 09:23:19 INFO mapred.JobClient: map 18% reduce 5%
15/02/09 09:23:36 INFO mapred.JobClient: map 19% reduce 5%
15/02/09 09:23:55 INFO mapred.JobClient: map 20% reduce 5%
15/02/09 09:24:19 INFO mapred.JobClient: map 21% reduce 5%
15/02/09 09:24:38 INFO mapred.JobClient: map 22% reduce 5%
15/02/09 09:24:57 INFO mapred.JobClient: map 23% reduce 5%
15/02/09 09:25:10 INFO mapred.JobClient: map 24% reduce 5%
15/02/09 09:25:27 INFO mapred.JobClient: map 25% reduce 5%
15/02/09 09:25:51 INFO mapred.JobClient: map 26% reduce 5%
15/02/09 09:26:09 INFO mapred.JobClient: map 27% reduce 5%
15/02/09 09:26:19 INFO mapred.JobClient: map 28% reduce 5%
15/02/09 09:26:35 INFO mapred.JobClient: map 29% reduce 5%
15/02/09 09:26:49 INFO mapred.JobClient: map 30% reduce 5%
15/02/09 09:27:06 INFO mapred.JobClient: map 31% reduce 5%
15/02/09 09:27:18 INFO mapred.JobClient: map 32% reduce 5%
15/02/09 09:27:42 INFO mapred.JobClient: map 33% reduce 5%
15/02/09 09:27:51 INFO mapred.JobClient: map 34% reduce 5%
15/02/09 09:28:07 INFO mapred.JobClient: map 35% reduce 5%
15/02/09 09:28:26 INFO mapred.JobClient: map 36% reduce 5%
15/02/09 09:28:53 INFO mapred.JobClient: map 37% reduce 5%
15/02/09 09:29:10 INFO mapred.JobClient: map 38% reduce 5%
15/02/09 09:29:19 INFO mapred.JobClient: map 39% reduce 5%
15/02/09 09:29:37 INFO mapred.JobClient: map 40% reduce 5%
15/02/09 09:29:57 INFO mapred.JobClient: map 41% reduce 5%
15/02/09 09:30:13 INFO mapred.JobClient: map 42% reduce 5%
15/02/09 09:30:26 INFO mapred.JobClient: map 43% reduce 5%
15/02/09 09:30:47 INFO mapred.JobClient: map 44% reduce 5%
15/02/09 09:31:03 INFO mapred.JobClient: map 45% reduce 5%
15/02/09 09:31:12 INFO mapred.JobClient: map 46% reduce 5%
15/02/09 09:31:30 INFO mapred.JobClient: map 47% reduce 5%
15/02/09 09:31:40 INFO mapred.JobClient: map 48% reduce 5%
15/02/09 09:31:59 INFO mapred.JobClient: map 49% reduce 5%
15/02/09 09:32:15 INFO mapred.JobClient: map 50% reduce 5%
15/02/09 09:32:28 INFO mapred.JobClient: map 51% reduce 5%
15/02/09 09:32:45 INFO mapred.JobClient: map 52% reduce 5%
15/02/09 09:32:56 INFO mapred.JobClient: map 53% reduce 5%
15/02/09 09:33:18 INFO mapred.JobClient: map 54% reduce 5%
15/02/09 09:33:38 INFO mapred.JobClient: map 55% reduce 5%
15/02/09 09:33:40 INFO mapred.JobClient: map 55% reduce 0%
15/02/09 09:33:51 INFO mapred.JobClient: Task Id : attempt_201306131151_3706_r_000000_0, Status : FAILED
Task attempt_201306131151_3706_r_000000_0 failed to report status for 600 seconds. Killing!
15/02/09 09:33:55 INFO mapred.JobClient: map 56% reduce 0%
15/02/09 09:34:08 INFO mapred.JobClient: map 57% reduce 0%
15/02/09 09:34:35 INFO mapred.JobClient: map 58% reduce 0%
15/02/09 09:34:44 INFO mapred.JobClient: map 58% reduce 1%
15/02/09 09:35:02 INFO mapred.JobClient: map 59% reduce 1%
15/02/09 09:35:18 INFO mapred.JobClient: map 60% reduce 1%
15/02/09 09:35:25 INFO mapred.JobClient: map 60% reduce 2%
15/02/09 09:35:39 INFO mapred.JobClient: map 61% reduce 2%
15/02/09 09:36:06 INFO mapred.JobClient: map 62% reduce 3%
15/02/09 09:36:25 INFO mapred.JobClient: map 63% reduce 3%
15/02/09 09:36:49 INFO mapred.JobClient: map 63% reduce 4%
15/02/09 09:36:52 INFO mapred.JobClient: map 64% reduce 4%
15/02/09 09:37:07 INFO mapred.JobClient: map 65% reduce 4%
15/02/09 09:37:31 INFO mapred.JobClient: map 66% reduce 4%
15/02/09 09:37:51 INFO mapred.JobClient: map 67% reduce 4%
15/02/09 09:38:10 INFO mapred.JobClient: map 68% reduce 4%
15/02/09 09:38:19 INFO mapred.JobClient: map 69% reduce 4%
15/02/09 09:38:43 INFO mapred.JobClient: map 70% reduce 4%
15/02/09 09:39:03 INFO mapred.JobClient: map 71% reduce 4%
15/02/09 09:39:24 INFO mapred.JobClient: map 72% reduce 4%
15/02/09 09:39:42 INFO mapred.JobClient: map 73% reduce 4%
15/02/09 09:40:00 INFO mapred.JobClient: map 74% reduce 4%
15/02/09 09:40:29 INFO mapred.JobClient: map 75% reduce 4%
15/02/09 09:41:13 INFO mapred.JobClient: map 76% reduce 4%
15/02/09 09:41:31 INFO mapred.JobClient: map 77% reduce 4%
15/02/09 09:41:54 INFO mapred.JobClient: map 78% reduce 4%
15/02/09 09:42:06 INFO mapred.JobClient: map 79% reduce 4%
15/02/09 09:42:31 INFO mapred.JobClient: map 80% reduce 4%
15/02/09 09:43:02 INFO mapred.JobClient: map 81% reduce 4%
15/02/09 09:43:28 INFO mapred.JobClient: map 82% reduce 4%
15/02/09 09:43:53 INFO mapred.JobClient: map 83% reduce 4%
15/02/09 09:44:07 INFO mapred.JobClient: map 84% reduce 4%
15/02/09 09:44:23 INFO mapred.JobClient: map 85% reduce 4%
15/02/09 09:44:36 INFO mapred.JobClient: map 86% reduce 4%
15/02/09 09:44:49 INFO mapred.JobClient: map 87% reduce 4%
15/02/09 09:45:15 INFO mapred.JobClient: map 88% reduce 4%
15/02/09 09:45:42 INFO mapred.JobClient: map 89% reduce 4%
15/02/09 09:45:58 INFO mapred.JobClient: map 90% reduce 4%
15/02/09 09:46:28 INFO mapred.JobClient: map 91% reduce 4%
15/02/09 09:46:42 INFO mapred.JobClient: map 92% reduce 4%
15/02/09 09:46:57 INFO mapred.JobClient: map 93% reduce 4%
15/02/09 09:47:16 INFO mapred.JobClient: map 94% reduce 4%
15/02/09 09:47:28 INFO mapred.JobClient: map 95% reduce 4%
15/02/09 09:47:45 INFO mapred.JobClient: map 96% reduce 4%
15/02/09 09:48:09 INFO mapred.JobClient: map 97% reduce 4%
15/02/09 09:48:29 INFO mapred.JobClient: map 98% reduce 4%
15/02/09 09:48:31 INFO mapred.JobClient: map 98% reduce 0%
15/02/09 09:48:38 INFO mapred.JobClient: map 99% reduce 0%
15/02/09 09:48:44 INFO mapred.JobClient: Task Id : attempt_201306131151_3706_r_000000_1, Status : FAILED
Task attempt_201306131151_3706_r_000000_1 failed to report status for 600 seconds. Killing!
15/02/09 09:49:16 INFO mapred.JobClient: map 99% reduce 1%
15/02/09 09:49:25 INFO mapred.JobClient: map 99% reduce 2%
15/02/09 09:49:31 INFO mapred.JobClient: map 99% reduce 3%
15/02/09 09:49:38 INFO mapred.JobClient: map 100% reduce 4%
15/02/09 09:49:48 INFO mapred.JobClient: map 100% reduce 5%
15/02/09 09:50:02 INFO mapred.JobClient: map 100% reduce 6%
15/02/09 09:50:05 INFO mapred.JobClient: map 100% reduce 7%
15/02/09 09:50:12 INFO mapred.JobClient: map 100% reduce 8%
15/02/09 09:50:22 INFO mapred.JobClient: map 100% reduce 9%
15/02/09 09:50:27 INFO mapred.JobClient: map 100% reduce 10%
15/02/09 09:50:36 INFO mapred.JobClient: map 100% reduce 11%
15/02/09 09:50:42 INFO mapred.JobClient: map 100% reduce 12%
15/02/09 09:50:45 INFO mapred.JobClient: map 100% reduce 13%
15/02/09 09:50:56 INFO mapred.JobClient: map 100% reduce 14%
15/02/09 09:51:02 INFO mapred.JobClient: map 100% reduce 15%
15/02/09 09:51:05 INFO mapred.JobClient: map 100% reduce 16%
15/02/09 09:51:11 INFO mapred.JobClient: map 100% reduce 17%
15/02/09 09:51:17 INFO mapred.JobClient: map 100% reduce 18%
15/02/09 09:51:30 INFO mapred.JobClient: map 100% reduce 19%
15/02/09 09:51:39 INFO mapred.JobClient: map 100% reduce 20%
15/02/09 09:51:45 INFO mapred.JobClient: map 100% reduce 21%
15/02/09 09:51:48 INFO mapred.JobClient: map 100% reduce 22%
15/02/09 09:51:54 INFO mapred.JobClient: map 100% reduce 23%
15/02/09 09:52:00 INFO mapred.JobClient: map 100% reduce 24%
15/02/09 09:52:03 INFO mapred.JobClient: map 100% reduce 25%
15/02/09 09:52:07 INFO mapred.JobClient: map 100% reduce 26%
15/02/09 09:52:19 INFO mapred.JobClient: map 100% reduce 27%
15/02/09 09:52:22 INFO mapred.JobClient: map 100% reduce 28%
15/02/09 09:52:28 INFO mapred.JobClient: map 100% reduce 29%
15/02/09 09:52:34 INFO mapred.JobClient: map 100% reduce 30%
15/02/09 09:52:37 INFO mapred.JobClient: map 100% reduce 31%
15/02/09 09:52:46 INFO mapred.JobClient: map 100% reduce 32%
15/02/09 09:52:49 INFO mapred.JobClient: map 100% reduce 33%
15/02/09 09:53:31 INFO mapred.JobClient: map 100% reduce 66%
15/02/09 09:53:34 INFO mapred.JobClient: map 100% reduce 69%
15/02/09 09:53:37 INFO mapred.JobClient: map 100% reduce 70%
15/02/09 09:53:40 INFO mapred.JobClient: map 100% reduce 72%
15/02/09 09:53:43 INFO mapred.JobClient: map 100% reduce 73%
15/02/09 09:53:46 INFO mapred.JobClient: map 100% reduce 74%
15/02/09 09:53:49 INFO mapred.JobClient: map 100% reduce 76%
15/02/09 09:53:52 INFO mapred.JobClient: map 100% reduce 77%
15/02/09 09:53:55 INFO mapred.JobClient: map 100% reduce 78%
15/02/09 09:53:58 INFO mapred.JobClient: map 100% reduce 80%
15/02/09 09:54:01 INFO mapred.JobClient: map 100% reduce 81%
15/02/09 09:54:04 INFO mapred.JobClient: map 100% reduce 82%
15/02/09 09:54:07 INFO mapred.JobClient: map 100% reduce 84%
15/02/09 09:54:10 INFO mapred.JobClient: map 100% reduce 85%
15/02/09 09:54:13 INFO mapred.JobClient: map 100% reduce 86%
15/02/09 09:54:16 INFO mapred.JobClient: map 100% reduce 88%
15/02/09 09:54:19 INFO mapred.JobClient: map 100% reduce 89%
15/02/09 09:54:22 INFO mapred.JobClient: map 100% reduce 90%
15/02/09 09:54:25 INFO mapred.JobClient: map 100% reduce 92%
15/02/09 09:54:28 INFO mapred.JobClient: map 100% reduce 93%
15/02/09 09:54:31 INFO mapred.JobClient: map 100% reduce 94%
15/02/09 09:54:35 INFO mapred.JobClient: map 100% reduce 96%
15/02/09 09:54:38 INFO mapred.JobClient: map 100% reduce 97%
15/02/09 09:54:41 INFO mapred.JobClient: map 100% reduce 98%
15/02/09 09:54:44 INFO mapred.JobClient: map 100% reduce 100%
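The 35-minute and 5-minute figures can be checked against the timestamps in the log above. The following is only a minimal sketch: it assumes the timestamp format is `yy/MM/dd HH:mm:ss`, and the regular expression and variable names are made up for illustration. It uses three lines copied from the log (the first line, the first line showing map 100%, and the last line).

```python
import re
from datetime import datetime

# Assumed layout of a JobClient progress line: "yy/MM/dd HH:mm:ss INFO mapred.JobClient: map N% reduce M%"
PROGRESS = re.compile(
    r"^(\d\d/\d\d/\d\d \d\d:\d\d:\d\d) INFO mapred\.JobClient:\s+map (\d+)% reduce (\d+)%$"
)

def parse_progress(lines):
    """Return (timestamp, map_pct, reduce_pct) for every progress line; skip other lines."""
    parsed = []
    for line in lines:
        match = PROGRESS.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S")
            parsed.append((stamp, int(match.group(2)), int(match.group(3))))
    return parsed

# Three lines copied from the log above.
sample = [
    "15/02/09 09:14:38 INFO mapred.JobClient: map 0% reduce 0%",
    "15/02/09 09:49:38 INFO mapred.JobClient: map 100% reduce 4%",
    "15/02/09 09:54:44 INFO mapred.JobClient: map 100% reduce 100%",
]

progress = parse_progress(sample)
job_start = progress[0][0]
map_done = next(stamp for stamp, map_pct, _ in progress if map_pct == 100)
job_end = progress[-1][0]

map_seconds = int((map_done - job_start).total_seconds())
tail_seconds = int((job_end - map_done).total_seconds())

print(f"map phase: {map_seconds} s, reduce-only tail: {tail_seconds} s")
```

On these three lines the map phase comes out at 2100 seconds (35 minutes) and the reduce-only tail at 306 seconds (just over 5 minutes).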
Answer 1
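This is by design, because the reduce() algorithm needs semantic guarantees (also known as its preconditions) before it can start, namely that it has been given every value emitted for a key. This is one of the core aspects of understanding how MapReduce works. It is wise to learn the theory before trying to use MapReduce in practice, to avoid this kind of confusion later.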
There are sources that state directly that the Reduce algorithm can start only after Map has completed.
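To make the precondition concrete, here is a small word-count sketch in plain Python. It is not Hadoop code: `map_words`, `shuffle`, and `reduce_counts` are hypothetical helper names chosen for illustration, and the three input splits are made up. It compares reducing after all splits have been mapped with reducing when only the first split is available.

```python
from collections import defaultdict

def map_words(split):
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in split.split()]

def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle/sort."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped

def reduce_counts(word, counts):
    """Reduce step: only correct if `counts` holds every value emitted for `word`."""
    return word, sum(counts)

# Made-up input splits, one per mapper.
splits = ["the cat sat", "the dog sat", "the end"]

# Reduce after every split has been mapped: the precondition holds.
all_pairs = [pair for split in splits for pair in map_words(split)]
complete = dict(reduce_counts(w, c) for w, c in shuffle(all_pairs).items())

# Reduce while only the first split has been mapped: the precondition is violated.
early_pairs = map_words(splits[0])
premature = dict(reduce_counts(w, c) for w, c in shuffle(early_pairs).items())

print(complete["the"], premature["the"])
```

With the full map output the count for "the" is 3. The premature run reports 1 and has not seen "dog" or "end" at all, which is why reduce() must wait for all of the map output for its keys.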
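Bear in mind that although it is theoretically possible to write a "MapReduce" implementation (or the algorithms/functors that are left to the developer to write) in which Reduce can start before mapping has finished, doing so would break the "contract" of the standard MapReduce design. You would then not really be using MapReduce proper. You would have to be very, very careful to make sure that violating that contract does not lead to race conditions or locking problems.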
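Keep in mind that the design contract of the MapReduce framework exists for specific reasons: it is there to maximize data safety, fault tolerance, and performance all at once. Violating the contract means that from then on you are responsible for doing your own analysis to convince yourself that you have preserved the same guarantees that official MapReduce promises (or to convince yourself that you do not care if those guarantees are not met). In that case, once you have modified (for example) Hadoop's source code to suit your needs, the end product would no longer be MapReduce, because the MapReduce contract would be broken.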