I'm trying to talk from my host machine to a Docker container running Spark. Unfortunately, upon connection I see:
Caused by: java.lang.RuntimeException: java.io.InvalidClassException: org.apache.spark.rpc.netty.RpcEndpointVerifier$CheckExistence; local class incompatible: stream classdesc serialVersionUID = 5378738997755484868, local class serialVersionUID = 7789290765573734431
This appears to be something of a known issue with this container.
We get the client code's classpath using SBT [SO]:
sbt "export runtime:fullClasspath"
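In practice it helps to capture that output straight into a shell variable. A sketch; the --error flag (to silence sbt's logging) and the tail -n 1 both assume the classpath is the last line sbt prints:

FROM_ABOVE=$(sbt --error "export runtime:fullClasspath" | tail -n 1)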
and taking the classpath for my module, we run:
serialver -classpath $FROM_ABOVE org.apache.spark.rpc.netty.RpcEndpointVerifier\$CheckExistence
which yields:
org.apache.spark.rpc.netty.RpcEndpointVerifier$CheckExistence: private static final long serialVersionUID = 5378738997755484868L;
(The error message is coming from the Spark master: the stream classdesc UID, 5378738997755484868, is what my client sent over the wire and matches the serialver result above; the local UID, 7789290765573734431, is whatever the master computes from its own jars.)
On the host, we log in to the master container with:
docker exec -it $(docker ps | grep spark-master:3.2.1 | awk '{print $1}') /bin/bash
where we run:
for FILE in $(find spark/jars/ -name '*.jar') ; do echo "$FILE" ; unzip -l "$FILE" | grep CheckExistence ; done
and discover the class is in spark/jars/spark-core_2.12-3.2.1.jar. Hmm: the _2.12 suffix says this container's Spark was built against Scala 2.12, while my client is Scala 3, which interoperates with Scala 2.13 binaries but not (apparently) 2.12.
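For what it's worth, a quieter variant of that scan prints only the jars that actually contain the class (same unzip/grep idea, just using grep -q as a filter):

for JAR in spark/jars/*.jar ; do unzip -l "$JAR" | grep -q CheckExistence && echo "$JAR" ; done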
Mounting the host's /tmp into a fresh container:
docker run -it -v /tmp:/mnt/disk bde2020/spark-master:3.2.1-hadoop3.2 /bin/bash
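(As an aside, docker cp against the already-running master would avoid spinning up a second container; a sketch, assuming the jars live at /spark/jars inside the image, consistent with the find above:)

docker cp "$(docker ps -q --filter ancestor=bde2020/spark-master:3.2.1-hadoop3.2)":/spark/jars /tmp/jars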
From that container's shell I copied all the Spark jars to /mnt/disk/jars, which is /tmp/jars on the host. Then on the host:
CP="" ; for JAR in /tmp/jars/*.jar ; do CP="$CP:$JAR" ; done
serialver -classpath $CP org.apache.spark.rpc.netty.RpcEndpointVerifier\$CheckExistence
yielded 7789290765573734431L.
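The two checks can be collapsed into a single comparison once both classpaths are in hand. A sketch, where CLIENT_CP is a hypothetical variable holding the sbt classpath from earlier and CP is the container classpath built above:

CLS='org.apache.spark.rpc.netty.RpcEndpointVerifier$CheckExistence'
# diff prints both serialver lines (and exits non-zero) when the UIDs disagree
diff <(serialver -classpath "$CLIENT_CP" "$CLS") <(serialver -classpath "$CP" "$CLS")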
So, at this point it appears I am SOL and need to get a different container (or rebuild my client against Scala 2.12).
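Before swapping images, it's worth a sanity check on what the client build actually compiles against (show is a standard sbt task):

sbt "show scalaVersion"

If that reports a 3.x version, the obvious outs are pinning the client to Scala 2.12 to match the image, or finding a Spark 3.2.1 image built for Scala 2.13, which Scala 3 can consume; Spark 3.2 does publish 2.13 artifacts upstream.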