Wednesday, September 17, 2025

Configuring Polaris for Azure


Azure Config

To dispense tokens, you need to register what Azure calls an app. You do this with:

az ad app create \
  --display-name "MyMultiTenantApp" \
  --sign-in-audience "AzureADMultipleOrgs"

The sign-in-audience is an enum that determines if we're allowing single- or multi-tenant access (plus some Microsoft specific accounts).

Create the service principle with:

az ad sp create --id <appId>

where <appId> was spat out by the create command.

Finally, you need to assign roles to the app:

az role assignment create  --assignee <appId>  --role "Storage Blob Data Contributor" --scope /subscriptions/YOUR_SUBSCRIPTION/resourceGroups/YOUR_RESOURCE_GROUP

One last thing: you might need the credentials for this app so you can pass them to Polaris. You can do this with:

az ad app credential reset --id <appId>

and it will tell you the password. This is AZURE_CLIENT_SECRET (see below).

Polaris

First, Polaris needs these environment variables set to access Azure:

AZURE_TENANT_ID
AZURE_CLIENT_ID
AZURE_CLIENT_SECRET

Note that these are the credentials of the user connecting to Azure and not those of the principal that will dispense tokens. That is, they're your app's credentials not your personal ones.

Configuring the Catalog

The values perculiar to Azure that you need to add to the storageConfigInfo that is common to all cloud providers. These are:
  • tenantId. You can find this by running az account show --query tenantId.
  • multiTenantAppName. This is the Application (client) ID that was generated when the app was created. You can see it in Microsoft Entra ID -> App Registrations -> All Applications in the Azure portal or using the CLI: az ad app list, find your app with the name you created above and use its appId.
  • consentUrl. I'm not entirely sure what this is but can be generated with APPID=$(az ad app list --display-name "MyMultiTenantApp" --query "[0].appId" -o tsv) && echo "https://login.microsoftonline.com/common/oauth2/v2.0/authorize?client_id=$APPID&response_type=code&redirect_uri=http://localhost:3000/redirect&response_mode=query&scope=https://graph.microsoft.com/.default&state=12345"
Find out which url to use:

$ az storage account show --name odinconsultants --resource-group my_resource --query "{kind:kind, isHnsEnabled:isHnsEnabled}" -o table
Kind       IsHnsEnabled
---------  --------------
StorageV2  True

HNS stands for hierarchical nested structure.
For StorageV2 and HNS equal to True, use abfss in the allowedLocations part of the JSON sent to /api/management/v1/catalogs.

Testing the credentials

Check the validity of the SAS token with:

az storage blob list  --account-name odinconsultants  --container-name myblob --sas-token $SAS_TOKEN --output table

We get SAS_TOKEN by putting a break point in Iceberg's ADLSOutputStream constructor.

Iceberg bug?

Iceberg asserts that the key in the map of Azure properties sent by Polaris for the expiry time is the string is adls.sas-token-expires-at-ms.PROJECT_NAME. But Polaris is emitting the string expiration-time. I've raised a bug in the Iceberg project here (14069).

I also raised a concurrency bug here (14070) but closed it when I realised it had been fixed in the main branch even if it wasn't in the latest (1.9.2) release.

Anyway, the upshot is that my workaround is to mask the Iceberg code by putting this (almost identical) class first in my classpath.

No comments:

Post a Comment