I've found that managing Spark clusters in Kubernetes is far easier using the Spark Operator. Here are some commands that helped me diagnose issues.
Dude, where's my application?
kubectl get sparkapplications
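If that comes back empty, it's worth ruling out a namespace mix-up first. These are standard kubectl flags, nothing operator-specific:
kubectl get sparkapplications --all-namespaces
# and see where any Spark-related pods actually live
kubectl get pods --all-namespaces | grep -i spark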
It can be annoying when
kubectl delete sparkapplication YOUR_APP
fails even though the application is clearly running. In my case, I suspected that a
kubectl rollout restart deployment spark-kubernetes-operator
had left an orphaned cluster behind.
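If the delete hangs rather than erroring out, one blunt but standard Kubernetes workaround (not specific to the Spark Operator) is to clear the resource's finalizers so the API server can actually remove it:
kubectl patch sparkapplication YOUR_APP --type=merge -p '{"metadata":{"finalizers":null}}'
kubectl delete sparkapplication YOUR_APP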
It's also possible that kubectl get sparkapplications returns nothing even though Spark pods are clearly running. In that case:
kubectl describe pod POD_NAME
and you should see something like:
...
Controlled By: StatefulSet/XXX
...
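You can get the same owner information without scrolling through the full describe output; this is plain kubectl JSONPath, with POD_NAME as the placeholder above:
kubectl get pod POD_NAME -o jsonpath='{range .metadata.ownerReferences[*]}{.kind}/{.name}{"\n"}{end}'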
Great, so it looks like the Spark Operator has set the cluster up by delegating to Kubernetes primitives. Let's see them:
kubectl get statefulsets
and then we can just:
kubectl delete statefulset XXX
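It's worth watching the pods terminate after that, because if an operator still has a parent resource on record it will quietly recreate the StatefulSet:
kubectl get pods -w
# if pods for the cluster reappear, something is still reconciling them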
OK, so, dude, where's my cluster?
But we're barking up the wrong tree. The YAML that created the cluster has kind: SparkCluster, so sparkapplications was the wrong CRD to be querying in the first place.
kubectl get crd | grep spark
sparkclusters.spark.apache.org 2025-11-04T10:52:56Z
...
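Another way to see which resource kinds (and their short names) an API group exposes, without grepping the CRD list:
kubectl api-resources --api-group=spark.apache.org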
Right, so now:
kubectl delete sparkclusters YOUR_CLUSTER
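To confirm the cleanup actually took, check that both the custom resource and the underlying pods are gone (standard commands again):
kubectl get sparkclusters
kubectl get pods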
Python
As a little aside, I was seeing strange errors when running PySpark commands that appeared to be a versioning problem. A few commands that came in useful were:
import sys
print(sys.path)
to print where the Python executable was getting its libraries from and:
from pyspark.version import __version__
print(__version__)
to make sure we really did have the correct PySpark version.
As it happened, it was the wrong version of the Iceberg runtime in spark.jars.packages.
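For completeness, here's a minimal PySpark sketch of the kind of check and fix involved. The Iceberg Maven coordinate and version numbers below are purely illustrative; the runtime artifact has to match your Spark minor version, Scala version and Iceberg release.

from pyspark.sql import SparkSession
from pyspark.version import __version__ as pyspark_version

# Illustrative coordinate only: 3.5 / 2.12 / 1.5.0 stand in for whatever
# Spark / Scala / Iceberg combination your cluster is actually running.
spark = (
    SparkSession.builder
    .appName("version-sanity-check")
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")
    .getOrCreate()
)

# The JVM-side Spark version should agree with the Python-side one;
# a mismatch here produces exactly the sort of strange errors mentioned above.
print(spark.version, pyspark_version)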