After successfully configuring Polaris to vend AWS credentials, I've moved on to the Google Cloud Platform.
As far as the Spark client and Polaris are concerned, there's not much difference from AWS. The only major change is that when creating the Catalog, your JSON must include a gcsServiceAccount. This refers to a service account you need to create.
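For reference, the Catalog creation call looks something like this (the endpoint, catalog name and ${POLARIS_TOKEN} are illustrative; the bucket is the one I use later in this post):

$ curl -X POST http://localhost:8181/api/management/v1/catalogs \
  -H "Authorization: Bearer ${POLARIS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "catalog": {
      "name": "my_catalog",
      "type": "INTERNAL",
      "properties": { "default-base-location": "gs://odinconsultants_phtest/" },
      "storageConfigInfo": {
        "storageType": "GCS",
        "gcsServiceAccount": "my-vendable-sa@philltest.iam.gserviceaccount.com",
        "allowedLocations": ["gs://odinconsultants_phtest/"]
      }
    }
  }'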
You create the service account with something like:
$ gcloud iam service-accounts create my-vendable-sa --display-name "Vendable Service Account"
and then grant the user for whom it will deputize permission to create tokens for it:
$ gcloud iam service-accounts add-iam-policy-binding my-vendable-sa@philltest.iam.gserviceaccount.com --member="user:phillhenry@gmail.com" --role="roles/iam.serviceAccountTokenCreator"
where philltest is the project and phillhenry@gmail.com is the user for whom the token will proxy. Next, log in with Application Default Credentials, impersonating the service account:
$ gcloud auth application-default login --impersonate-service-account=my-vendable-sa@philltest.iam.gserviceaccount.com
Note it will warn you to run something like gcloud auth application-default set-quota-project philltest or whatever your project is called.
This will open your browser. After you sign in, it writes a credential JSON file (it will tell you where). You must point the environment variable GOOGLE_APPLICATION_CREDENTIALS at this file before you run Polaris.
Note that it's the application-default part of the command that writes the credential as JSON so it can be used by your applications.
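With a default gcloud setup, that means something like this (though use whatever path the login command actually printed):

$ export GOOGLE_APPLICATION_CREDENTIALS="$HOME/.config/gcloud/application_default_credentials.json"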
Not so fast...
My Spark code just barfed when it tried to write to GCS with:
my-vendable-sa@philltest.iam.gserviceaccount.com does not have serviceusage.services.use access
Note that if the error message mentions your user (in my case PhillHenry@gmail.com) and not the service account, you've pointed GOOGLE_APPLICATION_CREDENTIALS at the wrong application_default_credentials.json file.
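A quick way to check you're looking at the right file (assuming you have jq installed): the impersonated login writes a credential whose top-level type is impersonated_service_account, whereas a plain gcloud auth application-default login writes authorized_user.

$ jq .type "$GOOGLE_APPLICATION_CREDENTIALS"
"impersonated_service_account"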
Basically, I could write to the bucket using the command line but not using the token. I captured the token by putting a breakpoint in Iceberg's parsing of the REST response from Polaris. Using this token (BAD_ACCESS_TOKEN below), I ran:
$ curl -H "Authorization: Bearer ${BAD_ACCESS_TOKEN}" https://storage.googleapis.com/storage/v1/b/odinconsultants_phtest
{
  "error": {
    "code": 401,
    "message": "Invalid Credentials",
    "errors": [
      {
        "message": "Invalid Credentials",
        "domain": "global",
        "reason": "authError",
        "locationType": "header",
        "location": "Authorization"
      }
    ]
  }
}
The mistake I was making was that the command line was associated with me (PhillHenry), not the service account - that's why I could upload via the CLI. Check your CLOUDSDK_CONFIG environment variable to see which credential files you're using.
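The fix was to give the service account roles at the project level, one add-iam-policy-binding per role, something like:

$ gcloud projects add-iam-policy-binding philltest \
  --member="serviceAccount:my-vendable-sa@philltest.iam.gserviceaccount.com" \
  --role="roles/serviceusage.serviceUsageAdmin"

You can then list the roles the service account holds on the project: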
$ gcloud projects get-iam-policy philltest --flatten="bindings[].members" --format="table(bindings.role)" --filter="my-vendable-sa@philltest.iam.gserviceaccount.com"
ROLE
roles/serviceusage.serviceUsageAdmin
roles/storage.admin
roles/storage.objectAdmin
roles/storage.objectCreator
This appears to fix it because the service account must be able to see the project before it can see the project's buckets [SO].
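As a sanity check, you can mint a fresh token for the impersonated credentials and re-run the earlier curl; this time it should return the bucket's metadata rather than a 401:

$ GOOD_ACCESS_TOKEN=$(gcloud auth application-default print-access-token)
$ curl -H "Authorization: Bearer ${GOOD_ACCESS_TOKEN}" https://storage.googleapis.com/storage/v1/b/odinconsultants_phtest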
Now, Spark is happily writing to GCS.