What is wrong with the Spark Cassandra Connector? Can you help me fix it?
Scala file:
import com.datastax.spark.connector._
import org.apache.spark.sql.SparkSession
import org.apache.spark.{SparkConf, SparkContext}
object SparkTest {
  def main(args: Array[String]) {
    val spark = SparkSession.builder.appName("SparkTest").getOrCreate()
    println("Hello World2")
    val conf = new SparkConf().set("spark.cassandra.connection.host","localhost")
    val sc = new SparkContext(conf)
    val rdd1 = sc.cassandraTable("spark_parallalism_test", "test_table1")
    println(rdd1.first)
    sc.stop()
    spark.stop()
  }
}
The SBT file looks like this:
name := "SparkTest"
version := "1.0"
scalaVersion := "2.12.10"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.0.1"
libraryDependencies += "com.datastax.spark" % "spark-cassandra-connector_2.12" % "3.0.0"
Commands I run:
sbt package
spark-submit --class "SparkTest" --master local[4] target/scala-2.12/sparktest_2.12-1.0.jar
Error log:
20/12/28 23:41:57 WARN Utils: Your hostname, subhrangshu-Lenovo-V110-15ISK resolves to a loopback address: 127.0.1.1; using 192.168.43.14 instead (on interface wlp2s0)
20/12/28 23:41:57 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
20/12/28 23:41:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/rdd/reader/RowReaderFactory
	at SparkTest.main(SparkTest.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.rdd.reader.RowReaderFactory
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 13 more
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
What is my mistake in this example?
Answer 1
This question is better suited to StackOverflow, but I'll answer it here.
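The main cause is that sbt package compiles only your own code; it does not put the dependencies into the generated jar file, so the connector classes (here com.datastax.spark.connector.rdd.reader.RowReaderFactory) are missing at runtime. You either need to start spark-submit with --packages com.datastax.spark:spark-cassandra-connector_2.12:3.0.0, or build a fat jar with sbt assembly (provided by the sbt-assembly plugin), in which case spark-sql needs to be declared as provided.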
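A minimal sketch of the sbt assembly route, assuming the sbt-assembly plugin is used (the plugin version 0.15.0 below is an assumption; pick the release that matches your sbt). spark-sql is marked provided because spark-submit already puts Spark's classes on the classpath, while the connector is bundled into the jar.

```scala
// project/plugins.sbt
// Assumed plugin version; choose the sbt-assembly release that matches your sbt version
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0")

// build.sbt
name := "SparkTest"
version := "1.0"
scalaVersion := "2.12.10"
// "provided": compiled against, but left out of the fat jar,
// because spark-submit supplies Spark's own classes at runtime
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.0.1" % "provided"
// The connector is not part of the Spark distribution, so it is bundled into the jar
// (%% resolves to spark-cassandra-connector_2.12 for scalaVersion 2.12.x)
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "3.0.0"
```

Then run sbt assembly and pass the resulting fat jar (by default named SparkTest-assembly-1.0.jar under target/scala-2.12/) to the same spark-submit command. If assembly stops with deduplicate conflicts for files under META-INF, an assemblyMergeStrategy setting will be needed as well.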
See the Spark Cassandra Connector documentation for more details.
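Separately from the packaging problem, the program as posted would fail once the connector is on the classpath: SparkSession.builder...getOrCreate() already starts a SparkContext, and Spark allows only one active SparkContext per JVM, so new SparkContext(conf) throws a SparkException. A minimal sketch of the same program using a single session, assuming the keyspace spark_parallalism_test and table test_table1 from the question exist and Cassandra is reachable on localhost:

```scala
import com.datastax.spark.connector._
import org.apache.spark.sql.SparkSession

object SparkTest {
  def main(args: Array[String]): Unit = {
    // One SparkSession owns the single SparkContext allowed per JVM;
    // the Cassandra host is set on the builder instead of a separate SparkConf
    val spark = SparkSession.builder
      .appName("SparkTest")
      .config("spark.cassandra.connection.host", "localhost")
      .getOrCreate()

    // Reuse the session's context instead of constructing a second one
    val sc = spark.sparkContext

    // cassandraTable comes from the implicit conversion in com.datastax.spark.connector._
    val rdd1 = sc.cassandraTable("spark_parallalism_test", "test_table1")
    println(rdd1.first())

    // Stopping the session also stops its SparkContext
    spark.stop()
  }
}
```

This still has to be submitted with the connector available, either through --packages or as part of an assembly jar.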