After successfully configuring Polaris to vend AWS credentials, I've moved on to the Google Cloud Platform.
As far as the Spark client and Polaris are concerned, there's not much difference from AWS. The only major change is that when creating the catalog, the JSON you send must include a gcsServiceAccount, which names the service account you need to create.
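Here is a sketch of what that catalog-creation JSON might look like, written as a small bash function. The field names (storageConfigInfo, storageType, allowedLocations, gcsServiceAccount, default-base-location) are my reading of Polaris's management API and may differ between versions. The catalog name and gs:// location are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch: print a Polaris catalog-creation payload for a GCS-backed catalog.
# ASSUMPTION: field names follow Polaris's management API
# (CreateCatalogRequest -> catalog -> storageConfigInfo); check your version.
# Values are interpolated as-is, so keep them free of quotes/backslashes.
gcs_catalog_json() {
  local catalog_name="$1"    # hypothetical, e.g. my_gcs_catalog
  local base_location="$2"   # hypothetical, e.g. gs://my-bucket/warehouse
  local service_account="$3" # the service account Polaris will vend tokens for
  cat <<EOF
{
  "catalog": {
    "name": "${catalog_name}",
    "type": "INTERNAL",
    "properties": {
      "default-base-location": "${base_location}"
    },
    "storageConfigInfo": {
      "storageType": "GCS",
      "allowedLocations": ["${base_location}"],
      "gcsServiceAccount": "${service_account}"
    }
  }
}
EOF
}
```

For example, assuming Polaris is listening on localhost:8181 and a hypothetical POLARIS_TOKEN holds a token with catalog-admin rights, you could pipe it straight in:
$ gcs_catalog_json my_gcs_catalog gs://my-bucket/warehouse my-vendable-sa@philltest.iam.gserviceaccount.com | curl -X POST -H "Authorization: Bearer ${POLARIS_TOKEN}" -H "Content-Type: application/json" --data-binary @- http://localhost:8181/api/management/v1/catalogs
The service account named in gcsServiceAccount is the one whose creation is described next.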
You create the service account with something like:
$ gcloud iam service-accounts create my-vendable-sa --display-name "Vendable Service Account"
and then grant your user the right to create tokens for it:
$ gcloud iam service-accounts add-iam-policy-binding my-vendable-sa@philltest.iam.gserviceaccount.com --member="user:phillhenry@gmail.com" --role="roles/iam.serviceAccountTokenCreator"
where philltest is the project and phillhenry@gmail.com is the account that will be allowed to act as the service account.
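The two commands above, parameterised as bash functions so they're easy to rerun for another project. sa_email just spells out GCP's NAME@PROJECT.iam.gserviceaccount.com naming convention. create_vendable_sa and the GCLOUD override (set GCLOUD=echo for a dry run) are my own hypothetical helpers, not anything gcloud provides.

```shell
#!/usr/bin/env bash
# Sketch: create a service account and let a user mint tokens for it.
# GCLOUD defaults to the real gcloud; set GCLOUD=echo to print the commands.

# Service account emails follow NAME@PROJECT.iam.gserviceaccount.com.
sa_email() {
  local sa_name="$1" project="$2"
  echo "${sa_name}@${project}.iam.gserviceaccount.com"
}

create_vendable_sa() {
  local sa_name="$1" project="$2" user_email="$3"
  local gcloud="${GCLOUD:-gcloud}"
  # 1. Create the service account in the given project.
  "$gcloud" iam service-accounts create "$sa_name" \
    --project="$project" \
    --display-name="Vendable Service Account" || return 1
  # 2. Allow the user to create tokens on the service account's behalf.
  "$gcloud" iam service-accounts add-iam-policy-binding "$(sa_email "$sa_name" "$project")" \
    --project="$project" \
    --member="user:${user_email}" \
    --role="roles/iam.serviceAccountTokenCreator"
}
```

For a dry run of the setup above:
$ GCLOUD=echo create_vendable_sa my-vendable-sa philltest phillhenry@gmail.com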
That binding lets your user mint tokens for the service account, but it doesn't give you a key for the service account itself. For that you need:
$ gcloud iam service-accounts keys create ~/service-account.json --iam-account my-vendable-sa@philltest.iam.gserviceaccount.com
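Before going further, it's worth checking that the key file really belongs to the service account. This is a minimal sketch that assumes the usual layout of a service-account key file: a top-level "type": "service_account" plus a "client_email" field. The helper names are hypothetical, and the grep/sed parsing is deliberately crude (no jq, no handling of escaped quotes).

```shell
#!/usr/bin/env bash
# Sketch: read simple string fields from a key file and check it is a
# service-account key for the expected account. Crude parsing: assumes each
# field appears once and its value contains no escaped quotes.
json_string_field() {
  local field="$1" file="$2"
  grep -o "\"${field}\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" "$file" | head -n 1 \
    | sed 's/^"[^"]*"[[:space:]]*:[[:space:]]*"//; s/"$//'
}

# Succeeds only if the file is a service-account key for expected_sa.
key_matches_sa() {
  local key_file="$1" expected_sa="$2"
  [ "$(json_string_field type "$key_file")" = "service_account" ] &&
    [ "$(json_string_field client_email "$key_file")" = "$expected_sa" ]
}
```

Usage against the key created above:
$ key_matches_sa ~/service-account.json my-vendable-sa@philltest.iam.gserviceaccount.com && echo "right key"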
If an organization policy blocks service account key creation, you might need your friendly admin to run:
$ gcloud org-policies set-policy allow-key-creation.yaml
where that YAML file contains:
name: projects/fjords-450014/policies/iam.disableServiceAccountKeyCreation
spec:
  rules:
  - enforce: false
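Here is the same policy written out and applied from a script. This is a sketch: the helper names are hypothetical, the project ID is whichever project you're creating keys in, and applying the policy needs org-policy admin rights (hence the friendly admin). Set GCLOUD=echo to see the command without running it.

```shell
#!/usr/bin/env bash
# Sketch: write an org-policy override that stops enforcing
# iam.disableServiceAccountKeyCreation for one project, then apply it.
write_key_creation_policy() {
  local project_id="$1" out_file="${2:-allow-key-creation.yaml}"
  cat > "$out_file" <<EOF
name: projects/${project_id}/policies/iam.disableServiceAccountKeyCreation
spec:
  rules:
  - enforce: false
EOF
}

apply_key_creation_policy() {
  local project_id="$1" file="${2:-allow-key-creation.yaml}"
  write_key_creation_policy "$project_id" "$file" || return 1
  # Requires org-policy admin permissions on the project.
  "${GCLOUD:-gcloud}" org-policies set-policy "$file"
}
```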
Not so fast...
My Spark code just barfed when it tried to write to GCS with:
my-vendable-sa@philltest.iam.gserviceaccount.com does not have serviceusage.services.use access
Note that if the error message mentions your user (in my case PhillHenry@gmail.com) rather than the service account, you've pointed GOOGLE_APPLICATION_CREDENTIALS at the wrong application_default_credentials.json file.
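A quick way to see which kind of credentials GOOGLE_APPLICATION_CREDENTIALS points at, sketched as a hypothetical helper. It relies on the fact that a service-account key file has "type": "service_account", while the file written by `gcloud auth application-default login` has "type": "authorized_user" (that is, your user). It also prints CLOUDSDK_CONFIG, which controls where gcloud itself keeps its configuration and credentials (~/.config/gcloud by default on Linux and macOS).

```shell
#!/usr/bin/env bash
# Sketch: report the credentials a process launched from this shell would use.
# Returns success only if GOOGLE_APPLICATION_CREDENTIALS is a service-account key.
credentials_report() {
  local adc="${GOOGLE_APPLICATION_CREDENTIALS:-}"
  if [ -z "$adc" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set"
    return 1
  fi
  if [ ! -r "$adc" ]; then
    echo "cannot read $adc"
    return 1
  fi
  local cred_type
  # Crude extraction of the top-level "type" field (no jq needed).
  cred_type="$(grep -o '"type"[[:space:]]*:[[:space:]]*"[^"]*"' "$adc" | head -n 1 \
    | sed 's/^"type"[[:space:]]*:[[:space:]]*"//; s/"$//')"
  echo "GOOGLE_APPLICATION_CREDENTIALS=$adc (type: ${cred_type:-unknown})"
  # gcloud's own config directory, which holds the CLI's credentials.
  echo "CLOUDSDK_CONFIG=${CLOUDSDK_CONFIG:-$HOME/.config/gcloud (default)}"
  [ "$cred_type" = "service_account" ]
}
```

Run it in the shell you launch from; "authorized_user" means you're about to act as yourself, not as the service account.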
Basically, I could write to the bucket using the command line but not using the token. I captured the token by putting a breakpoint in Iceberg's code that parses the REST response from Polaris. Using the token (BAD_ACCESS_TOKEN), I ran:
$ curl -H "Authorization: Bearer ${BAD_ACCESS_TOKEN}" https://storage.googleapis.com/storage/v1/b/odinconsultants_phtest
{
  "error": {
    "code": 401,
    "message": "Invalid Credentials",
    "errors": [
      {
        "message": "Invalid Credentials",
        "domain": "global",
        "reason": "authError",
        "locationType": "header",
        "location": "Authorization"
      }
    ]
  }
}
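The same check wrapped in a small, hypothetical helper that reports only the HTTP status and what it suggests. The interpretation follows the general GCS JSON API behaviour: 401 when the credentials themselves are rejected, 403 when they're accepted but the principal isn't permitted. A vended token may be scoped down to particular paths, so a 403 on bucket metadata doesn't necessarily mean it can't write objects. CURL can be overridden with a stub for offline testing.

```shell
#!/usr/bin/env bash
# Sketch: probe a bucket's metadata endpoint with a bearer token.
# Returns success only on HTTP 200.
probe_bucket() {
  local token="$1" bucket="$2" status
  status="$("${CURL:-curl}" -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer ${token}" \
    "https://storage.googleapis.com/storage/v1/b/${bucket}")"
  case "$status" in
    200) echo "OK: token can read bucket metadata" ;;
    401) echo "401: token rejected (invalid or expired credentials)" ;;
    403) echo "403: token accepted but the principal lacks permission" ;;
    *)   echo "unexpected HTTP status: ${status}" ;;
  esac
  [ "$status" = "200" ]
}
```

Comparing the vended token with the CLI's own token makes the difference obvious, because `gcloud auth print-access-token` hands out a token for whichever account gcloud is logged in as:
$ probe_bucket "${BAD_ACCESS_TOKEN}" odinconsultants_phtest
$ probe_bucket "$(gcloud auth print-access-token)" odinconsultants_phtest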
The mistake I was making was that the command line was authenticated as me (PhillHenry), not as the service account; that's why I could upload from the CLI. Check your CLOUDSDK_CONFIG environment variable to see which credential files you're using. I then granted the service account roles on the project itself, after which its bindings look like this:
$ gcloud projects get-iam-policy philltest --flatten="bindings[].members" --format="table(bindings.role)" --filter="my-vendable-sa@philltest.iam.gserviceaccount.com"
ROLE
roles/serviceusage.serviceUsageAdmin
roles/storage.admin
roles/storage.objectAdmin
roles/storage.objectCreator
This appears to fix it because the service account must be able to see the project before it can see the project's buckets [SO].
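If you want to grant these roles from the command line, something like the following would do. It's a sketch with a hypothetical function name. The role list you pass in is up to you; the four above are what the listing shows, though roles/serviceusage.serviceUsageConsumer is the narrower role that contains serviceusage.services.use. Set GCLOUD=echo for a dry run.

```shell
#!/usr/bin/env bash
# Sketch: bind each given role to a service account at the project level.
# Usage: grant_project_roles PROJECT SA_EMAIL ROLE [ROLE...]
grant_project_roles() {
  local project="$1" sa="$2"
  shift 2
  local role
  for role in "$@"; do
    "${GCLOUD:-gcloud}" projects add-iam-policy-binding "$project" \
      --member="serviceAccount:${sa}" \
      --role="$role" || return 1
  done
}
```

For example:
$ grant_project_roles philltest my-vendable-sa@philltest.iam.gserviceaccount.com roles/serviceusage.serviceUsageConsumer roles/storage.objectAdmin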
Now, Spark is happily writing to GCS.