Saturday, December 21, 2024

Debugging Polaris in Docker

Iceberg and Polaris sitting in a tree...

I have a proof of concept on GitHub that demonstrates how to use Apache Iceberg. Since I need Apache Polaris as Iceberg's metastore, I have it running in a container. 

If I create a catalog of FILE type, if the file path for storage is X, does this refer to the filesystem of Polaris or Spark?
Michael Collado
it's going to be both, so you shouldn't really use it in that scenario. The Polaris server is going to read/write metadata.json files in its own container's file system and the spark notebook will read/write data files in its own container's filesystem, so... [Discord]
In my PoC, I use a shared filesystem mount that both the Polaris container and the host's Spark instance write to.

However, tests were failing with minimal logging. When running the container as a non-root user, the error in the Polaris logs looks like:

{"timestamp":1734433498899,"level":"INFO","thread":"dw-51 - POST /api/catalog/v1/manual_spark/namespaces/my_namespace/tables/IcebergCRUDSpec","logger":"org.apache.polaris.service.exception.IcebergExceptionMapper","message":"Handling runtimeException Failed to get file system for path: file:/tmp/polaris/my_namespace/IcebergCRUDSpec/metadata/00000-0daa8a08-5b5d-459a-bdd0-0663534f2007.metadata.json","mdc":{"spanId":"6ea71bffea6af726","traceId":"8b485bf56e7e27aac2e47ede876e02bd","realm":"default-realm","request_id":null},"params":{}}

When running containerised Polaris as root, the tests passed, but I couldn't clean up the files on the shared filesystem mount afterwards because I was not running the test suite as root on the host.

Digging Deeper

That string ("Failed to get file system for path") led me to org.apache.iceberg.hadoop.Util.getFs. Unfortunately, the nested exception is wrapped in the error reported above and lost.

So, we start the container with these flags:

 -eJAVA_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:8788 -p8788:8788

since polaris-dropwizard-service expects JAVA_OPTS to be set.
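If the flags are typed into a shell rather than a build plugin, the `*` in the address is worth quoting so the shell cannot treat it as a glob. Here is a minimal sketch of building the option string; the function name jdwp_opts and the image name in the comment are my own placeholders, not anything from Polaris:

```shell
#!/bin/sh
# Print a JDWP agent option that listens on all interfaces at the port in $1.
# The double quotes keep the shell from expanding the '*'.
jdwp_opts() {
  printf '%s\n' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:$1"
}

# Hypothetical usage (the image name is a placeholder):
#   docker run -e "JAVA_OPTS=$(jdwp_opts 8788)" -p 8788:8788 <polaris-image>
```

With the port published, an IDE's remote debug configuration or jdb -attach localhost:8788 should be able to connect from the host.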

Great, now we can put a breakpoint in Util.getFs and printStackTrace on the nested exception. It shows:

Caused by: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name
        at jdk.security.auth/com.sun.security.auth.UnixPrincipal.<init>(UnixPrincipal.java:71)
        at jdk.security.auth/com.sun.security.auth.module.UnixLoginModule.login(UnixLoginModule.java:134)
        at java.base/javax.security.auth.login.LoginContext.invoke(LoginContext.java:754)
        at java.base/javax.security.auth.login.LoginContext$4.run(LoginContext.java:678)
        at java.base/javax.security.auth.login.LoginContext$4.run(LoginContext.java:676)
        at java.base/java.security.AccessController.doPrivileged(AccessController.java:714)
        at java.base/javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:676)
        at java.base/javax.security.auth.login.LoginContext.login(LoginContext.java:587)
        at org.apache.hadoop.security.UserGroupInformation$HadoopLoginContext.login(UserGroupInformation.java:2148)

A quick look at the JDK code shows that UnixSystem.getUsername appears to be returning a null. And this appears to be because there is no user with my ID in the container - d'oh.
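The same condition can be checked without a debugger by opening a shell in the container as the same user. This is a rough sketch that assumes getent is available in the image; uid_is_known is just a name I made up:

```shell
#!/bin/sh
# Succeeds only if the numeric UID in $1 has an entry in the passwd database.
uid_is_known() {
  getent passwd "$1" >/dev/null 2>&1
}

uid="$(id -u)"
if uid_is_known "$uid"; then
  echo "UID $uid has a passwd entry"
else
  echo "UID $uid has no passwd entry, so user name lookups will come back empty"
fi
```

If the second message is printed for the UID the container runs as, that matches the null name seen in the stack trace above.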

A Solution

One solution is to have a bespoke Docker entrypoint that creates the user (if it doesn't already exist) from a user ID passed in by Fabric8's docker-maven-plugin, and then runs Polaris as that user. If it's the same user that runs the integration tests, both Polaris and Spark can write to the same directory and tables can be dropped without permission issues.
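A minimal sketch of such an entrypoint follows. It is not the script from my PoC: the variable RUN_AS_UID and the user name polaris_it are hypothetical, and it assumes the container starts as root, that the image ships getent, useradd and setpriv, and that the image's original start command is passed in as arguments.

```shell
#!/bin/sh
# Sketch: run the given command as the UID in RUN_AS_UID (hypothetical name),
# creating a passwd entry for that UID first if the image has none.
set -e

# Print the user name for the numeric UID in $1, or nothing if there is none.
user_name_for_uid() {
  getent passwd "$1" | cut -d: -f1
}

main() {
  uid="${RUN_AS_UID:?set RUN_AS_UID to the numeric ID of the host user}"
  name="$(user_name_for_uid "$uid")"
  if [ -z "$name" ]; then
    name="polaris_it"  # hypothetical user name
    # Needs root; gives the UID a name so the JVM's Unix login can resolve it.
    useradd --uid "$uid" --no-create-home "$name"
  fi
  # Drop privileges and replace this shell with the original command.
  exec setpriv --reuid "$uid" --regid "$(id -g "$name")" --init-groups "$@"
}

# Only act when a command was supplied.
if [ "$#" -gt 0 ]; then
  main "$@"
fi
```

On the host, the value to pass is whatever id -u prints for the user running the test suite, so files written by Polaris end up owned by that user.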
