Wednesday, February 11, 2026

My Polaris PR

I've had some issues with a federated (a.k.a. external) catalog in Polaris connecting to GCP, so I raised this ticket outlining the problem.

Having a bit of time to implement it, I've raised a PR. The first thing I had to do was get familiar with:

The Polaris Architecture

Note the DTOs are automatically generated (see spec/polaris-management-service.yml). See client/python/spec/README.md for full instructions, but running:

redocly bundle spec/polaris-catalog-service.yaml -o spec/generated/bundled-polaris-catalog-service.yaml

brings all the YAML together into a single, bundled spec.

The reason for doing it this way is to generate both the Python models (with make client-regenerate) and the Java ones (with ./gradlew :polaris-api-management-model:openApiGenerate) so that they are in lockstep with the spec.

So, the DTOs are auto-generated but the DPOs are hand-coded. This is because DPOs are internal, whereas DTOs are client-facing and that client could be Java, Python or something else.
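To make the distinction concrete, here is a hypothetical, much-simplified sketch (these are not the real Polaris classes):

// Hypothetical DTO: generated from the OpenAPI spec; an anemic, client-facing shape.
class ConnectionConfigInfoDto {
    private String uri;
    public String getUri() { return uri; }
    public void setUri(String uri) { this.uri = uri; }
}

// Hypothetical DPO: hand-coded and internal, so it can carry behaviour,
// e.g. building the properties used to bootstrap authentication.
class ConnectionConfigInfoDpo {
    private final String uri;
    ConnectionConfigInfoDpo(String uri) { this.uri = uri; }
    java.util.Map<String, String> asIcebergProperties() {
        return java.util.Map.of("uri", uri);
    }
}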

After making the change, it's:

./gradlew assemble -x test && ./gradlew publishToMavenLocal -x test

to push it to the awaiting code in my project.
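For the consuming project to pick those artifacts up, its build needs to resolve from the local Maven repository first. A minimal sketch in Gradle's Groovy DSL (the artifact coordinates and version are illustrative):

repositories {
    mavenLocal()    // the jars published by publishToMavenLocal land here
    mavenCentral()
}

dependencies {
    implementation 'org.apache.polaris:polaris-core:999-SNAPSHOT' // illustrative version
}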

Git

Then, as I make my changes, I keep pulling from the original repo with:

git pull https://github.com/apache/polaris.git main --rebase

The --rebase at the end is saying "make my branch exactly the same as the original repo, then add my deltas onto the end of its history."

Following the Polaris instructions, I noticed that my origin was the Polaris Git repo (see this with git remote -v). I actually found it easier to run:

git remote set-url origin https://github.com/PhillHenry/polaris.git
git remote add upstream https://github.com/apache/polaris
git push --force-with-lease origin 3451_federated_google_auth # this is the branch

to push my changes (and any from Apache) to my own branch.

Now, with:

$ git remote -v
origin  https://github.com/PhillHenry/polaris.git (fetch)
origin  https://github.com/PhillHenry/polaris.git (push)
upstream        https://github.com/apache/polaris (fetch)
upstream        https://github.com/apache/polaris (push)

I can keep my repo in synch with the original and ensure that my changes are always the last commits in the history with:

git fetch upstream
git fetch origin
git rebase upstream/main

as rebase flattens the history graph by rewriting my commits: they get new hashes, though the changes they carry stay the same.
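Pictorially, where D' and E' are my commits with their new hashes:

A---B---C          upstream/main
     \
      D---E        my branch, before the rebase

A---B---C          upstream/main
         \
          D'---E'  my branch, after the rebase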

To squash the commits, run:

git config --global core.editor "vim" # I prefer vim to emacs
git rebase -i HASH_OF_LAST_COMMIT_THAT_IS_NOT_YOURS

then edit the file so that the top commit keeps pick and each subsequent commit begins with squash. Save it and you'll be prompted to edit another file; put the final, informative commit message here. Save that too, then push.
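The todo file will look something like this (the hashes and messages are illustrative):

pick   a1b2c3d Add GCP authentication for federated catalogs
squash e4f5a6b Fix review comments
squash c7d8e9f Appease spotless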

If you get into a pickle,

git reset --hard
rm -fr ".git/rebase-merge"

gets you back to where you were.

Once you're happy

Don't forget to 

./gradlew spotlessApply

Note that this will change the files on disk. Also, run:

./gradlew build -x rat

For my 24-core Intel Ultra 9 185H:

BUILD SUCCESSFUL in 23m 52s

so, I don't want to do this too often...

Debugging

Polaris is heavily dependent on Quarkus, which was throwing an HTTP 400 according to the logs but gave no further information. At this point, it's good to put a breakpoint in org.jboss.resteasy.reactive.server.handlers.RequestDeserializeHandler; I suspected the problem was related to my new DTOs.

Google

Google by default stops an account from impersonating itself. 

So, to mitigate this in my integration tests, I've created two service accounts - one that my Polaris always runs as and the second to pretend to be the account that manages access to the external catalog. You get the Polaris SA to impersonate the external SA with:

gcloud iam service-accounts add-iam-policy-binding EXTERNAL_SA@PROJECT_ID.iam.gserviceaccount.com --member="serviceAccount:POLARIS_SA@PROJECT_ID.iam.gserviceaccount.com"  --role="roles/iam.serviceAccountTokenCreator"
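With that binding in place, generating a token as the external SA from code running as the Polaris SA looks roughly like this (a minimal sketch, assuming the google-cloud-iamcredentials client library; the account names are placeholders):

import com.google.cloud.iam.credentials.v1.GenerateAccessTokenResponse;
import com.google.cloud.iam.credentials.v1.IamCredentialsClient;
import com.google.cloud.iam.credentials.v1.ServiceAccountName;
import com.google.protobuf.Duration;
import java.util.List;

public class ImpersonationSketch {
    public static void main(String[] args) throws Exception {
        // Runs with the Polaris SA's application-default credentials.
        try (IamCredentialsClient client = IamCredentialsClient.create()) {
            // "-" is the wildcard for the project; the SA is identified by its email.
            ServiceAccountName externalSa =
                ServiceAccountName.of("-", "EXTERNAL_SA@PROJECT_ID.iam.gserviceaccount.com");
            GenerateAccessTokenResponse token =
                client.generateAccessToken(
                    externalSa,
                    List.of(), // no delegation chain
                    List.of("https://www.googleapis.com/auth/cloud-platform"),
                    Duration.newBuilder().setSeconds(3600).build());
            System.out.println(token.getAccessToken());
        }
    }
}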

An unexpected regression

Almost there, I came across this unexpected error:

2026-02-09 09:38:11,924 ERROR [io.qua.ver.htt.run.QuarkusErrorHandler] ... java.lang.NoClassDefFoundError: Could not initialize class com.google.cloud.iam.credentials.v1.stub.GrpcIamCredentialsStub
        at com.google.cloud.iam.credentials.v1.stub.IamCredentialsStubSettings.createStub(IamCredentialsStubSettings.java:145)       

The error was deep in some class initialization, so I added this code:

try {
    // Force the failing static initializer to run so we can see the real cause
    ProtoUtils.marshaller(GenerateAccessTokenRequest.getDefaultInstance());
} catch (Throwable t) {
    if (t.getCause() != null) {
        t.getCause().printStackTrace();
    }
    LOGGER.error("Failed to create IAM credentials stub", t);
}

which gave:

Caused by: com.google.protobuf.RuntimeVersion$ProtobufRuntimeVersionException: Detected incompatible Protobuf Gencode/Runtime versions when loading GenerateAccessTokenRequest: gencode 4.33.2, runtime 4.32.1. Runtime version cannot be older than the linked gencode version.
        at com.google.protobuf.RuntimeVersion.validateProtobufGencodeVersionImpl(RuntimeVersion.java:120)
        at com.google.protobuf.RuntimeVersion.validateProtobufGencodeVersion(RuntimeVersion.java:68)
        at com.google.cloud.iam.credentials.v1.GenerateAccessTokenRequest.<clinit>(GenerateAccessTokenRequest.java:32)
        ... 77 more
com.google.protobuf.RuntimeVersion$ProtobufRuntimeVersionException: Detected incompatible Protobuf Gencode/Runtime versions when loading GenerateAccessTokenRequest: gencode 4.33.2, runtime 4.32.1. Runtime version cannot be older than the linked gencode version.
        at com.google.protobuf.RuntimeVersion.validateProtobufGencodeVersionImpl(RuntimeVersion.java:120)
        at com.google.protobuf.RuntimeVersion.validateProtobufGencodeVersion(RuntimeVersion.java:68)
        at com.google.cloud.iam.credentials.v1.GenerateAccessTokenRequest.<clinit>(GenerateAccessTokenRequest.java:32)
        at org.apache.polaris.core.storage.gcp.GcpCredentialsStorageIntegration.createIamCredentialsClient(GcpCredentialsStorageIntegration.java:287)

Urgh. It appears that GenerateAccessTokenRequest (which is itself @com.google.protobuf.Generated) in JAR proto-google-cloud-iamcredentials-v1:2.83.0 says in its static initializer that it is associated with protobuf version 4.33.2. Meanwhile, RuntimeVersion in JAR protobuf-java:4.32.1 checks this against its own version and the check obviously fails.
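One way out (a sketch, assuming Gradle and that the rest of the dependency graph tolerates the newer runtime) is to force the protobuf runtime up to the gencode version:

configurations.all {
    resolutionStrategy {
        // the runtime must be at least as new as the gencode (4.33.2 here)
        force 'com.google.protobuf:protobuf-java:4.33.2'
    }
}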

Tuesday, February 3, 2026

Polaris Federation notes

Here are some miscellaneous notes I made as I worked my way through grokking Polaris Federation and Authorization.

First:

Polaris DevOps

Polaris has integration tests in the integration-tests/src/main/java/ directory, not the test directory as you might have expected. The reason is that this way they can be packaged as a JAR and used elsewhere in the codebase.

The advantage to doing it this way is that the same tests can be run against a local Polaris, a Polaris in the cloud, a Polaris running in Docker etc.

So, if we take CatalogFederationIntegrationTest, we can see it subclassed in the spark-tests Gradle package where it can be run with:

./gradlew :polaris-runtime-spark-tests:intTest

If you try to run the superclass in its own module with Gradle, it cannot be found as it's not in the test directory. If you try to run it with your IDE, you'll find that classes to be wired in at runtime are missing. Running the subclass, CatalogFederationIT, starts a MinIO docker container against which it can run.

Federation

The DTOs (Data Transfer Objects) for creating catalogs etc live in 

org.apache.polaris.core.admin.model 

For example, ExternalCatalog, which can be serialized into JSON.

These are passed across the wire and are turned into DPOs (Data Persistence Objects) that live in 

org.apache.polaris.core.connection.iceberg

In the case of the IcebergRestConnectionConfigInfoDpo, this DPO object is not a mere anemic domain model. It has the logic to, for instance, create the properties that will be used to instantiate the class that will govern authentication. It does this by delegating to this factory class:

org.apache.iceberg.rest.auth.AuthManagers

Notice that we have moved from Polaris to the world of Iceberg. The various AuthManagers implement access to OAuth2 providers, Google, AWS's SigV4, etc.
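The hand-off looks roughly like this (a sketch; I'm assuming Iceberg's rest.auth.type property is what selects the implementation):

import java.util.Map;
import org.apache.iceberg.rest.auth.AuthManager;
import org.apache.iceberg.rest.auth.AuthManagers;

public class AuthManagerSketch {
    public static void main(String[] args) {
        // Iceberg picks the AuthManager implementation from the properties;
        // "oauth2" is one of the built-in types.
        Map<String, String> properties = Map.of("rest.auth.type", "oauth2");
        AuthManager authManager = AuthManagers.loadAuthManager("my-catalog", properties);
        System.out.println(authManager.getClass().getName());
    }
}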

However, there is a mismatch. The AuthenticationParameters DTO classes don't fully align with the AuthManager classes. For instance, there doesn't appear to be a way of creating an external catalog with authorisation via org.apache.iceberg.gcp.auth.GoogleAuthManager.

So, after a day of investigating and trying to hack something together, it looks like this:
  • Iceberg can talk to Google no problem using org.apache.iceberg.gcp.auth.GoogleAuthManager.
  • However, there is currently no Polaris code to use GoogleAuthManager in an external catalog.
  • Instead, the only way to do it currently is to use the standard OAuth2 code.
  • However, Google does not completely follow the OAuth2 spec, hence this Iceberg ticket that led to the writing of GoogleAuthManager and this StackOverflow post that says GCP does not support the grant_type that Iceberg's OAuth2Util uses.
This has now been raised in this Polaris ticket.

Saturday, January 31, 2026

Notes on Poetry

The dependencies in Poetry can get into a mess. If hashes don't agree, then blatting the poetry.lock file will not help. Instead, run:

poetry cache clear pypi --all
poetry cache clear --all .
rm -rf ~/.cache/pypoetry

When updating a dependency, run:

poetry lock
poetry install

This will install an environment in a subfolder of:

~/.cache/pypoetry/virtualenvs/

You can point your IDE at the Python executable underneath this.
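Rather than hunting through that directory, you can ask Poetry for the path directly:

poetry env info --path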

Poetry is pretty good at showing you dependencies. Running something like:

poetry show pandas

shows you everything Pandas needs and everything that depends on it.

This was necessary when decoding a bizarre error in a Jupyter notebook where an import was failing when it was clearly there. In this case, statsmodels and Pandas seemed to be disagreeing.

The code to put in the notebook to check it was using the right version of a library is:

import statsmodels
import pandas

print(statsmodels.__version__)
print(pandas.__version__)
print(statsmodels.__file__)

Now, compare this to the Python environment:

poetry run python - <<EOF
import statsmodels, pandas
print(statsmodels.__version__)
print(statsmodels.__file__)
print(pandas.__version__)
EOF

I had changed dependency versions but my IDE (PyCharm) did not recognise the change until I restarted it.

Some useful one-liners

Run your tests with:

poetry run pytest

The whereabouts of your tests can be found in your pyproject.toml file. It should look something like:

[tool.pytest.ini_options]
testpaths = ["tests"]
pythonpath = "src"

With this, your tests can import anything under the ROOT/src directory.

Add a dependency with something like:

poetry add ipykernel

This will update your pyproject.toml metadata file.

Monday, January 12, 2026

The Federation: AWS Glue

Apache Polaris can act as a proxy to other catalogs. This still appears to be work in progress as the roadmap proposal has "Catalog Federation" as "Tentatively Planned" at least until release 1.5.

If you're running it from source, you'll need to enable:

polaris.features."ENABLE_CATALOG_FEDERATION"=true
polaris.features."SUPPORTED_CATALOG_CONNECTION_TYPES"=["ICEBERG_REST", "HIVE"]
polaris.features."SUPPORTED_EXTERNAL_CATALOG_AUTHENTICATION_TYPES"=["OAUTH", "BEARER", "SIGV4"]


Apache Polaris can be a proxy for an Iceberg REST endpoint. In each case, an org.apache.polaris.core.admin.model.ExternalCatalog is passed across the wire to create a catalog. Only the details differ.

AWS Glue

Glue is its own beast but it does offer an Iceberg REST endpoint. To use it, the AuthenticationParameters in the ExternalCatalog must be of type SigV4AuthenticationParameters.
"In IAM authentication, instead of using a password to authenticate against the [service], you create an authentication token that you include with your ... request. These tokens can be generated outside the [service] using AWS Signature Version 4(SigV4) and can be used in place of regular authentication." [1]
So, the SigV4AuthenticationParameters ends up taking the region, role ARN, etc. The role must be available to the Principal that is associated with the Polaris instance. In addition, there must be a --policy-document that allows the Action glue:GetCatalog.

Finally, the Glue database and table must be created with Parameters that contain iceberg.table.default.namespace and an IcebergInput block.

TL;DR - most of the work is in configuring AWS not the calling code.

[1] Security and Microservice Architecture on AWS

Tuesday, January 6, 2026

Cross account AWS Permissions

You can have one AWS account see the contents of the S3 bucket of another, entirely separate account if you configure it correctly. Note that S3 bucket names are unique across the whole AWS estate, irrespective of who owns them. This is for historical reasons, it seems.

Anyway, to have account Emrys (say) read the bucket of account Odin (say), run the commands below.

Note that all of this can be run from the same command line if you have Emrys as your [default] profile and [odin] as Odin's in ~/.aws/config and credentials. You'll need source_profile = odin in config to point to the correct credentials.

First, we create a role in Odin that Emrys will assume:

aws iam create-role --role-name S3ReadWriteRoleEmrys --assume-role-policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
              "AWS": "arn:aws:iam::EMRYS_ID:root"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}' --profile odin 

Then we create a policy and attach it to the role.

aws iam create-policy --policy-name ReadOdinS3IAMPolicy  --policy-document file://emrys-policy.json --profile odin

aws iam attach-role-policy   --role-name S3ReadWriteRoleEmrys   --policy-arn arn:aws:iam::ODIN_ID:policy/ReadOdinS3IAMPolicy --profile odin

Note that emrys-policy.json is just a collection of s3 Actions that act on a Resource that is Odin's bucket - nothing special.

Then, in the Emrys estate, we create a policy that allows assuming that role and attach it to our user:

aws iam create-policy \
  --policy-name AssumeOdinRole \
  --policy-document file://assume-account-odin-role.json

aws iam attach-user-policy \
  --user-name MY_USER \
  --policy-arn arn:aws:iam::EMRYS_ID:policy/AssumeOdinRole

where assume-account-odin-role.json just contains the sts:AssumeRole for Odin's S3ReadWriteRoleEmrys.

Finally, we get the temporary credentials to read the bucket with:

aws sts assume-role \
  --role-arn arn:aws:iam::ODIN_ID:role/S3ReadWriteRoleEmrys \
  --role-session-name s3-access

Just paste the output of this into your AWS environment variables.

"For a principal to be able to switch roles, there needs to be an IAM policy attached to it that allows the principal to call the AssumeRole action on STS [Security Token Service].

"In addition, IAM roles can also have a special type of a resource-based policy attached to them, called an IAM role trust policy. An IAM role trust policy can be written just like any other resource-based policy, by specifying the principals that are allowed to assume the role in question... Any principal that is allowed to access a role is called a trusted entity for that role." [1]

Note that the key in the Principal map is significant as it defines the category of the identity (see the examples after this list). It can be 
  • a Service that is allowed to assume a role (eg, EKS)
  • AWS which indicates an IAM user or role or an assumed role (see below). Note that root does not indicate the most powerful user as in Unix. On the contrary, it means anybody legitimately associated with this account.
  • Federated which means it's a provider external to the native AWS ecosystem.
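For example, each category appears in a trust policy like this (the values are illustrative):

"Principal": { "Service": "eks.amazonaws.com" }
"Principal": { "AWS": "arn:aws:iam::123456789012:root" }
"Principal": { "Federated": "accounts.google.com" }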
STS is the system an identity must apply to if it wishes to assume a role; it checks that the identity is indeed allowed to do so.

An assumed role looks like this:

arn:aws:sts::123456789012:assumed-role/S3ReadWriteRoleSF/snowflake

where S3ReadWriteRoleSF is the normal, IAM role name and snowflake is the session name. This session name is merely a tag and has no intrinsic permissions (although it may be used in Condition/StringEquals). This will be set in --role-session-name (see above) when assuming the role.
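For instance, a trust policy statement can pin the session name with a condition like this (a sketch; the IDs are placeholders, and I'm assuming the sts:RoleSessionName condition key):

{
  "Effect": "Allow",
  "Principal": { "AWS": "arn:aws:iam::123456789012:root" },
  "Action": "sts:AssumeRole",
  "Condition": { "StringEquals": { "sts:RoleSessionName": "snowflake" } }
}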

[1] Security and Microservice Architecture on AWS

Wednesday, December 31, 2025

Debugging JNI calls to the GPU

I'm playing around with a Java-based LLM (code here). When running a JVM that calls the GPU using TornadoVM, it crashed, and in the log I saw:

Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libcuda.so.1+0x1302b0]
C  [libcuda.so.1+0x332420]
C  [libtornado-ptx.so+0x64b8]  Java_uk_ac_manchester_tornado_drivers_ptx_PTXStream_cuLaunchKernel+0x198
j  uk.ac.manchester.tornado.drivers.ptx.PTXStream.cuLaunchKernel([BLjava/lang/String;IIIIIIJ[B[B)[[B+0 tornado.drivers.ptx@2.2.1-dev
...

Now, having found the shared object files (*.so), I ran:

objdump -d /usr/lib/x86_64-linux-gnu/libcuda.so.1 
objdump -d /usr/local/bin/Java/tornadovm-2.2.1-dev-ptx/lib/libtornado-ptx.so

and looked at the addresses in the stack dump.

First, libtornado-ptx.so. Note that the address (0x64b8) is the return address from a call, that is, the instruction after the call that went Pete Tong.

    64b3:       e8 b8 e1 ff ff          call   4670 <cuLaunchKernel@plt>
    64b8:       48 83 c4 30             add    $0x30,%rsp

So, it's the call to cuLaunchKernel that is interesting.

Similarly, in libcuda.so.1, the frame at 0x332420 is the return address of this call:

  33241b:       e8 00 de df ff          call   130220 <exit@plt+0x4e460>
  332420:       5a                      pop    %rdx

and the final (top most) stack frame:

  1302ab:       4d 85 e4                test   %r12,%r12
  1302ae:       74 58                   je     130308 <exit@plt+0x4e548>
  1302b0:       41 8b 04 24             mov    (%r12),%eax

The instruction test %x,%y is a common idiom in null checks: x and y are ANDed and je jumps if the Zero Flag is set. Note that this flag is set when the result of the AND is zero, so test %r12,%r12 followed by je jumps exactly when %r12 is zero, i.e. null.

So, it looks like we've essentially got what's equivalent to a NullPointerException in the machine code. Still looking at what's null... [Solved: I had to use a model that is compatible with GPULlama3.java.]

Monday, December 15, 2025

AWS cheatsheet

Various command lines that have helped me recently.

IAM

List a role's attached and inline policies with:

aws iam list-attached-role-policies --role-name $ROLE_NAME

aws iam list-role-policies --role-name $ROLE_NAME

Whoami with:

aws sts get-caller-identity 

Policies are collections of actions on services that can be attached to identities. List all homemade policies with:

aws iam list-policies --scope Local --query 'Policies[].Arn' --output table

Similarly, list all roles with:

aws iam list-roles --query 'Roles[].RoleName' --output table

List all the Actions for a policy with:

aws iam get-policy-version --policy-arn $POLICY_ARN --version-id $(aws iam get-policy --policy-arn $POLICY_ARN --query 'Policy.DefaultVersionId' --output text) --query 'PolicyVersion.Document.Statement[].Action'   --output json | jq -r '.[]' | sort -u

List all the trust policies for a given role:

aws iam get-role --role-name $ROLE_NAME --query 'Role.AssumeRolePolicyDocument' --output json

Note that assuming a role implies a temporary elevation of privileges, while attaching policies to a role is more about defining what that role actually is.

List everything attached to a policy:

aws iam list-entities-for-policy --policy-arn $POLICY_ARN

Instance profiles contain roles. They act as a bridge to securely pass an IAM role to an EC2 instance, enabling the instance to access other AWS services without needing to store long-term, hard-coded credentials like access keys. You can see them with:

aws iam list-instance-profiles-for-role --role-name $ROLE_NAME --query "InstanceProfiles[].InstanceProfileName" --output text

In short: 
  • Trust policies say who can access a role. 
  • Permission policies say what a role can do.
Note that this is why trust policies have only one action: sts:AssumeRole.

Secrets

See access to K8s secrets with:

kubectl logs -n kube-system -l app=csi-secrets-store-provider-aws -XXX

See an AWS secret with:

aws secretsmanager get-secret-value --secret-id $SECRET_ARN --region $REGION
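If you only want the payload, add a query:

aws secretsmanager get-secret-value --secret-id $SECRET_ARN --region $REGION --query SecretString --output text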

Deleting them is interesting as they will linger unless told otherwise:

aws --region $REGION secretsmanager  delete-secret --secret-id $SECRET_NAME --force-delete-without-recovery

Infra

To see why your EKS deployments aren't working:

kubectl get events --sort-by=.metadata.creationTimestamp | tail -20

Terraform seems to have a problem deleting load balancers in AWS. You can see them with:

aws elbv2 describe-load-balancers

List the classic (v1) load balancers:

aws elb describe-load-balancers --region $REGION

List the VPCs:

aws ec2 describe-vpcs --region $REGION

Glue

Create with:

aws glue create-database  --database-input '{"Name": "YOUR_DB_NAME"}'  --region $REGION

Create an Iceberg table with:

aws glue create-table \
    --database-name YOUR_DB_NAME \
    --table-input '
        {
            "Name": "TABLE_NAME",
            "TableType": "EXTERNAL_TABLE",
            "StorageDescriptor": {
                "Location": "s3://ROOT_DIRECTORY_OF_TABLE/",
                "Columns": [
                    { "Name": "id", "Type": "int" },
...
                    { "Name": "randomInt", "Type": "int" }
                ],
                "SerdeInfo": {
                    "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
                }
            },
            "Parameters": {
                "iceberg.table.default.namespace": "YOUR_DB_NAME"
            }
        }' \
    --open-table-format-input '
        {
            "IcebergInput": {
                "MetadataOperation": "CREATE",
                "Version": "2" 
            }
        }' \
    --region $REGION

Get all the databases with:

aws glue get-databases --query 'DatabaseList[*].Name' --output table

Get tables with:

aws glue get-tables --database-name YOUR_DB_NAME

Drop with:

aws glue delete-table --name TABLE_NAME --database-name YOUR_DB_NAME