Saturday, May 17, 2025

DevOps lessons from Polaris and Iceberg

The Apache Iceberg and Polaris code bases make pretty good reading. There's some nice DevOps work here. Here are a few tidbits.

Iceberg's Kafka Connectors

As part of Iceberg's integration tests, Docker compose is used to fire up a Kafka Connect container. Interestingly, this container mounts the directory holding Iceberg's artifacts so it instantly has the latest implementation of Iceberg's Kafka Connect's SinkConnector and SinkTask. The test suite then starts the connector with a REST call (that's the nature of Kafka Connect) that contains all of the connector's config.

This docker-compose.yml also starts a MinIO container so Kafka Connects thinks it's writing to an AWS S3 bucket - all of this in on one self-contained laptop, no config required. (It uses TestContainers to do the heavy lifting of talking to the Docker daemon).

TestContainers has a very cool trick for cleaning up the containers when the tests are finished. It starts a sidecar called Ryuk that listens on a port connected to the JVM and kills the containers when that connection closes. You can see it while the tests are running with:

$ docker ps
CONTAINER ID   IMAGE                                 COMMAND                  CREATED         STATUS                            PORTS                                                           NAMES
...
f7c69b891330   testcontainers/ryuk:0.11.0            "/bin/ryuk"              5 seconds ago   Up 5 seconds                      0.0.0.0:32772->8080/tcp, :::32772->8080/tcp                     testcontainers-ryuk-3c88ed17-ec3d-4ce9-8830-dbbc1ca86294

Kind - K8s lite

You can run Polaris in a Kind cluster. Kind is Kubernetes in Docker. Although it is not CNCF compliant itself, it runs a CNCF compliant version of Kubernetes, just inside Docker containers. So, although it is not appropriate for a multi-node environment, Kind is great for development on a single machine.

Kind starts two Docker containers (kind-control-plane and kind-registry) as well as updating your ~/.kube/config file. If you ./run.sh in the Polaris codebase, you will see it start Kubernetes pods like etcdcore-dns and kube-proxy as well as a polaris container.

Metadata file

Iceberg creates a file called iceberg-build.properties when it's built. Having your project do this can be enormously useful when you've ssh-ed into a box and wondering exactly what version is running there because it's a test environment and nobody has keeping tracking of what is deployed where (ie, the real world). 

Iceberg is built with Gradle so uses the com.gorylenko.gradle-git-properties plugin but there appears to be an equivalent for Maven (git-commit-id-plugin).

Quarkus

Polaris has become heavily dependent on Quarkus. The Docker container just runs quarkus-run.jar. This Jar's main class is io.quarkus.bootstrap.runner.QuarkusEntryPoint. This loads /deployments/quarkus/quarkus-application.dat, a binary file that loads all the Polaris guff. Apparently, it's a binary file to minimise start up times.

The Docker image's entrypoint is /opt/jboss/container/java/run/run-java.sh. This script comes FROM the Dockerfile's Quarkus-friendly base image and contains sensible JVM defaults.

No comments:

Post a Comment