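Question
I am trying to install Spark on a Hadoop cluster. I have already installed and tested Hadoop: I can browse HDFS and run the MapReduce examples. However, after installing Spark I cannot run it, because it fails to start with an EOFException.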
System information:
- Rocky Linux 8.8
- Kernel 4.18.0
- Hadoop-3.3.6
- spark-3.5.0-bin-without-hadoop
Setup:
- node32.cluster: master node
- node[33-35].cluster: HDFS and compute nodes
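Notes
- /opt/ (which contains Hadoop and Spark) and /home are NFS directories shared by all nodes, so I am sure the configuration is identical on every node.
- The hadoop:hadoop user exists on all nodes.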
Hadoop configuration
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://node32.cluster:10000</value>
</property>
</configuration>
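To cross-check which address HDFS clients are told to use, the default filesystem can be read straight out of core-site.xml. Below is a minimal Python sketch (standard library only); `default_fs` is a hypothetical helper name. It prefers fs.defaultFS and falls back to fs.default.name, the older, deprecated name of the same setting used above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def default_fs(core_site_xml: str):
    """Return (host, port) of the default filesystem in core-site.xml text.

    Prefers fs.defaultFS and falls back to the deprecated fs.default.name.
    Returns None if neither property is present.
    """
    root = ET.fromstring(core_site_xml)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    uri = props.get("fs.defaultFS") or props.get("fs.default.name")
    if uri is None:
        return None
    parsed = urlparse(uri)  # hdfs://host:port -> hostname, port
    return parsed.hostname, parsed.port


# Same content as the core-site.xml shown above.
CORE_SITE = """<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://node32.cluster:10000</value>
</property>
</configuration>"""

print(default_fs(CORE_SITE))  # ('node32.cluster', 10000)
```

For the real file, the text can be loaded with `pathlib.Path("/opt/hadoop-3.3.6/etc/hadoop/core-site.xml").read_text()`, assuming the standard $HADOOP_CONF_DIR layout from the environment section.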
hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/mnt/hadoop/data/name_node</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/mnt/hadoop/data/data_node</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node32</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
<value>true</value>
</property>
</configuration>
Spark configuration
spark-defaults.conf
spark.master yarn
spark.eventLog.enabled true
spark.eventLog.dir hdfs://node32.cluster:9000/spark-logs
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.fs.logDirectory hdfs://node32.cluster:9000/spark-logs
spark.history.fs.update.interval 10s
spark.history.ui.port 18
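spark-defaults.conf holds one property per line, with the key and value separated by whitespace. The following Python sketch lists every property whose value is an hdfs:// URI and compares its host and port with the NameNode address from the fs.default.name value in core-site.xml above. The helper name `hdfs_endpoints` and the hard-coded `namenode` tuple are illustrative assumptions.

```python
from urllib.parse import urlparse


def hdfs_endpoints(spark_defaults: str):
    """Map each property whose value is an hdfs:// URI to its (host, port)."""
    endpoints = {}
    for line in spark_defaults.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # key, then the rest of the line as value
        if len(parts) != 2:
            continue
        key, value = parts[0], parts[1].strip()
        if value.startswith("hdfs://"):
            parsed = urlparse(value)
            endpoints[key] = (parsed.hostname, parsed.port)
    return endpoints


# Relevant lines from the spark-defaults.conf shown above.
SPARK_DEFAULTS = """spark.master yarn
spark.eventLog.enabled true
spark.eventLog.dir hdfs://node32.cluster:9000/spark-logs
spark.history.fs.logDirectory hdfs://node32.cluster:9000/spark-logs
"""

namenode = ("node32.cluster", 10000)  # from fs.default.name in core-site.xml
for key, endpoint in hdfs_endpoints(SPARK_DEFAULTS).items():
    status = "OK" if endpoint == namenode else "MISMATCH"
    print(f"{key}: {endpoint[0]}:{endpoint[1]} -> {status}")
```

The comparison is purely textual: it checks that the two files name the same host:port and does not contact the cluster.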
spark-env.sh
export SPARK_DIST_CLASSPATH="$(hadoop classpath)"
Environment
export HADOOP_HOME="/opt/hadoop-3.3.6"
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
export SPARK_HOME="/opt/spark-3.5.0-bin-without-hadoop"
export LD_LIBRARY_PATH="$HADOOP_HOME/lib/native:$LD_LIBRARY_PATH"
Running Spark and the problem
$HADOOP_HOME/sbin/start-all.sh
jps output on the master node:
4069901 ResourceManager
4077965 Master
4068697 SecondaryNameNode
4078614 Worker
4088135 Jps
4067905 NameNode
jps output on the other nodes:
3350467 NodeManager
3357158 Jps
3349966 DataNode
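The jps output shows that a NameNode process is running, but not which port it listens on. Two HDFS ports appear in this configuration (10000 in core-site.xml and 9000 in spark-defaults.conf), and a plain TCP connect is one way to see whether anything accepts connections on each. Below is a minimal Python sketch; `port_open` is a hypothetical helper, and a successful connect only shows that some process is listening, not that it is the NameNode speaking Hadoop IPC.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    A successful connect only shows that *something* is listening; it does
    not prove that the listener speaks the Hadoop IPC protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host name did not resolve
        return False
```

For example, running `port_open("node32.cluster", 10000)` and `port_open("node32.cluster", 9000)` from node34 tests the same host:port pairs that the configuration files name.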
Running the example:
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--num-executors 1 \
$SPARK_HOME/examples/jars/spark-examples_2.12-3.5.0.jar
Error log from the node:
02:41:16.548 [Driver] ERROR org.apache.spark.SparkContext - Error initializing SparkContext.
java.io.IOException: DestHost:destPort node32.cluster:9000 , LocalHost:localPort node34/192.168.100.34:0. Failed on local exception: java.io.IOException: java.io.EOFException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_392]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_392]
...
Running spark-shell gives the same result:
spark-shell
Error log:
02:47:31.499 [main] ERROR org.apache.spark.SparkContext - Error initializing SparkContext.
java.io.EOFException: End of File Exception between local host is: "node32/192.168.100.32"; destination host is: "node32.cluster":9000; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_392]
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_392]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_392]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_392]
...
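Both errors name the same destination, node32.cluster:9000, although the messages are worded differently. When comparing many such log lines, a small helper can pull that endpoint out of the text. The Python sketch below is an illustration under stated assumptions: its regular expressions match only the two message shapes quoted above, and `destination` is a hypothetical name.

```python
import re

# Only the two message shapes seen in the logs above are recognised.
_PATTERNS = [
    re.compile(r"DestHost:destPort\s+([\w.-]+):(\d+)"),
    re.compile(r'destination host is:\s*"([\w.-]+)":(\d+)'),
]


def destination(log_line: str):
    """Return (host, port) of the IPC destination named in an error line."""
    for pattern in _PATTERNS:
        match = pattern.search(log_line)
        if match:
            return match.group(1), int(match.group(2))
    return None
```

Lines that match neither pattern return None rather than a guessed endpoint.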
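I read the Hadoop wiki page linked in the error, but I don't understand what the root cause is.
Conclusion
Can anyone help me solve this problem?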