Wednesday, October 15, 2025

Configuring Polaris Part 1

To vend credentials, Polaris needs an AWS (or other cloud provider) account. But what if you want to talk to several AWS accounts? This ticket suggests an interesting workaround: use just one AWS account, and if you need to reach others, set up a role in that account that can assume roles in accounts outside the one it lives in.
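A minimal sketch of what that looks like on the AWS side, with hypothetical account IDs and role names (neither comes from Polaris): the role in the Polaris account gets an identity policy allowing it to call sts:AssumeRole on a role in the other account, and that other role's trust policy names the Polaris role as a principal.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::222222222222:role/polaris-storage-access"
    }
  ]
}
```

The target role in account 222222222222 then needs a matching trust policy listing the Polaris role's ARN as an allowed principal.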

We are working in a cross-cloud environment, talking not just to AWS but to GCP and Azure too. We happen to host Polaris in AWS, but that choice was arbitrary: we can give Polaris the ability to vend credentials for all three clouds no matter where it sits.

Integration with Spark

It's the spark.sql.catalog.YOUR_CATALOG.warehouse SparkConf value that identifies the catalog on the Polaris side.

YOUR_CATALOG is the name you'll use for the catalog in Spark SQL. The top-level value, spark.sql.catalog.YOUR_CATALOG, tells Spark which catalog implementation to use (Hive, Iceberg's SparkCatalog for Polaris, etc).

So, basically, your config should look something like:

spark.sql.catalog.azure.oauth2.token                                            POLARIS_ACCESS_TOKEN
spark.sql.catalog.azure.client_secret                                                         s3cr3t
spark.sql.catalog.azure.uri                                        http://localhost:8181/api/catalog
spark.sql.catalog.azure.token                                                  POLARIS_ACCESS_TOKEN
spark.sql.catalog.azure.type                                                                    rest
spark.sql.catalog.azure.scope                                                     PRINCIPAL_ROLE:ALL
spark.sql.catalog.azure.client_id                                                               root
spark.sql.catalog.azure.warehouse                                                              azure
spark.sql.catalog.azure.header.X-Iceberg-Access-Delegation                        vended-credentials
spark.sql.catalog.azure.credential                                                       root:s3cr3t
spark.sql.catalog.azure.cache-enabled                                                          false
spark.sql.catalog.azure.rest.auth.oauth2.scope                                    PRINCIPAL_ROLE:ALL
spark.sql.catalog.azure                                        org.apache.iceberg.spark.SparkCatalog 

This is the config specific to my Azure catalog. AWS and GCP would have very similar config.
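Since only the catalog name changes between clouds, you can generate the per-catalog keys programmatically. A sketch, assuming my own helper method and the localhost URI from the config above (neither is part of Polaris or Spark):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PolarisCatalogConf {
    /** Builds the spark.sql.catalog.* entries for one Polaris catalog. */
    static Map<String, String> forCatalog(String name, String uri, String credential) {
        String prefix = "spark.sql.catalog." + name;
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(prefix, "org.apache.iceberg.spark.SparkCatalog");
        conf.put(prefix + ".type", "rest");
        conf.put(prefix + ".uri", uri);
        conf.put(prefix + ".warehouse", name); // the catalog name on the Polaris side
        conf.put(prefix + ".credential", credential);
        conf.put(prefix + ".scope", "PRINCIPAL_ROLE:ALL");
        conf.put(prefix + ".header.X-Iceberg-Access-Delegation", "vended-credentials");
        conf.put(prefix + ".cache-enabled", "false");
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> azure =
                forCatalog("azure", "http://localhost:8181/api/catalog", "root:s3cr3t");
        azure.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Calling forCatalog("aws", ...) or forCatalog("gcp", ...) gives you the sibling catalogs with no copy-and-paste.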

Local Debugging

Add the JDWP agent argument to the jvmArgs of the quarkusRun task in build.gradle.kts:

tasks.named<QuarkusRun>("quarkusRun") {
  jvmArgs =
    listOf(
      "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005",
    )
}

then run with:

./gradlew --stop && ./gradlew run

You'll then be able to debug remotely by attaching to port 5005.

Polaris in integration tests

Naturally, you're going to want to write a suite of regression tests. This is where the wonderful Testcontainers shines: you can fire up a Polaris Docker container from Java code.

There are some configuration issues. AWS and Azure are easy to configure within Polaris: you just pass the credentials as environment variables. GCP is a little harder as it expects a JSON file containing its credentials (the Application Default Credentials file). Fortunately, Testcontainers lets you copy that file over once the container has started running.

          myContainer = new GenericContainer<>("apache/polaris:1.1.0-incubating")
                    // AWS
                    .withEnv("AWS_ACCESS_KEY_ID",     AWS_ACCESS_KEY_ID)
                    .withEnv("AWS_SECRET_ACCESS_KEY", AWS_SECRET_ACCESS_KEY)
                    // Azure
                    .withEnv("AZURE_CLIENT_SECRET", AZURE_CLIENT_SECRET)
                    .withEnv("AZURE_CLIENT_ID",     AZURE_CLIENT_ID)
                    .withEnv("AZURE_TENANT_ID",     AZURE_TENANT_ID)
                    // Polaris
                    .withEnv("POLARIS_ID",     POLARIS_ID)
                    .withEnv("POLARIS_SECRET", POLARIS_SECRET)
                    .withEnv("POLARIS_BOOTSTRAP_CREDENTIALS", format("POLARIS,%s,%s", POLARIS_ID, POLARIS_SECRET))
                    // GCP
                    .withEnv("GOOGLE_APPLICATION_CREDENTIALS", GOOGLE_FILE)
                    .waitingFor(Wait.forHttp("/q/health").forPort(8182).forStatusCode(200));
            myContainer.setPortBindings(List.of("8181:8181", "8182:8182"));
            myContainer.start();
            myContainer.copyFileToContainer(Transferable.of(googleCreds.getBytes()), GOOGLE_FILE);
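For reference, the file copied over is a standard GCP service-account key in the Application Default Credentials format, roughly this shape (all values elided here):

```json
{
  "type": "service_account",
  "project_id": "...",
  "private_key_id": "...",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "...",
  "client_id": "...",
  "token_uri": "https://oauth2.googleapis.com/token"
}
```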

The other thing you want for a reliable suite of tests is to wait until Polaris has actually started. Fortunately, Polaris is cloud native and offers a health endpoint (/q/health on port 8182) which Testcontainers can poll.

Polaris in EKS

I found I had to mix AWS's own library (software.amazon.awssdk:eks:2.34.6) with the official Kubernetes library (io.kubernetes:client-java:24.0.0) before I could interrogate the Kubernetes cluster in AWS from my laptop and look at the logs of the Polaris container.

        // AWS SDK v2: look up the EKS cluster's endpoint and certificate authority
        EksClient eksClient = EksClient.builder()
                                       .region(REGION)
                                       .credentialsProvider(DefaultCredentialsProvider.create())
                                       .build();

        DescribeClusterResponse clusterInfo = eksClient.describeCluster(
                DescribeClusterRequest.builder().name(clusterName).build());

        // AWS SDK v1 credentials, exchanged via STS to authenticate against the cluster
        AWSCredentials awsCredentials = new BasicAWSCredentials(
                AWS_ACCESS_KEY_ID,
                AWS_SECRET_ACCESS_KEY);
        var authentication = new EKSAuthentication(new STSSessionCredentialsProvider(awsCredentials),
                                                   region.toString(),
                                                   clusterName);

        // Point the Kubernetes client at the cluster, trusting its (base64-encoded) CA
        ApiClient client = new ClientBuilder()
                .setBasePath(clusterInfo.cluster().endpoint())
                .setAuthentication(authentication)
                .setVerifyingSsl(true)
                .setCertificateAuthority(Base64.getDecoder().decode(clusterInfo.cluster().certificateAuthority().data()))
                .build();
        Configuration.setDefaultApiClient(client);

Now you'll be able to query and monitor Polaris from outside AWS's Kubernetes offering, EKS.