tofu state list
followed by
tofu state rm XXX
I had to delete the load balancers manually through the AWS Web Console and then also the EKS cluster itself. I then had to manually delete any references to them from my JSON.
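For example, the state surgery looked something like this (hypothetical addresses; yours will come from tofu state list):
# Stop tofu tracking the resources I had already deleted by hand
tofu state rm aws_eks_cluster.cluster
tofu state rm kubernetes_service.jupyter_lb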
Tip: regularly delete the directory in which the Terraform lives, as state accumulates there that the next run implicitly relies upon. If you don't, then after a major refactor you run the configuration and everything looks fine. You check in thinking you've done a good job, but there was an invisible dependency on the previous run and checking out to a fresh directory fails. So:
Delete all generated files regularly
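In practice, after a big refactor that meant something like this before re-running (a sketch; paths assume the default CDKTF layout and that any state you care about lives remotely):
# Blow away generated Terraform, cached providers and any local state
rm -rf cdktf.out .terraform
rm -f terraform.tfstate terraform.tfstate.backup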
I was getting lots of:
│ Error: Get "https://21D13D424AA794FA2A76DE52CA79FBE9.gr7.eu-west-2.eks.amazonaws.com/api/v1/namespaces/default/services/jupyter-lb": dial tcp: lookup 21D13D424AA794FA2A76DE52CA79FBE9.gr7.eu-west-2.eks.amazonaws.com on 127.0.0.1:53: no such host
│
even after blatting my Terraform cdktf.out/stacks directory. It turned out state files were accumulating in the root directory of my project (which contains cdktf.out). Once they too were blatted, things looked better.
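If you hit the same thing, this is a quick way to spot the strays (a sketch):
# List state files lurking outside cdktf.out
find . -name 'terraform.tfstate*' -not -path './cdktf.out/*'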
Changing the generated cdk.tf.json file resulted in:
│ Error: Inconsistent dependency lock file
│
│ The following dependency selections recorded in the lock file are inconsistent with the current configuration:
│ - provider registry.opentofu.org/hashicorp/helm: required by this configuration but no version is selected
│
│ To update the locked dependency selections to match a changed configuration, run:
│ tofu init -upgrade
The solution was to run tofu init -upgrade
GCP
You might see this error when running Terraform on GCP:
│ Error: Error setting access_token
│
│ with data.google_client_config.gcp-polaris-deployment_currentClient_7C40CA9C,
│ on cdk.tf.json line 25, in data.google_client_config.gcp-polaris-deployment_currentClient_7C40CA9C:
│ 25: }
│
│ oauth2: "invalid_grant" "reauth related error (invalid_rapt)" "https://support.google.com/a/answer/9368756"
It's nothing really to do with TF but rather your GCP credentials. Log in with gcloud auth application-default login and try again. D'oh.
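That is (a sketch; the second command is just a sanity check):
# Refresh the Application Default Credentials that the google provider uses
gcloud auth application-default login
# Should now print a fresh token rather than an invalid_grant error
gcloud auth application-default print-access-token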
AWS
These commands showed what was still hanging around in the VPC after a destroy:
aws ec2 describe-network-interfaces --filters Name=vpc-id,Values=$VPC --region $REGION
aws ec2 describe-internet-gateways --filters Name=attachment.vpc-id,Values=$VPC --region $REGION
aws ec2 describe-subnets --filters Name=vpc-id,Values=$VPC --region $REGION
aws ec2 describe-security-groups --filters Name=vpc-id,Values=$VPC --region $REGION
This last one showed 3 security groups.
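Once you can see what is left, something like this would clear out the non-default security groups (a sketch, assuming $VPC and $REGION are set as above and that the lingering network interfaces have already gone):
# The default group can't be deleted, so filter it out of the query
for SG in $(aws ec2 describe-security-groups --filters Name=vpc-id,Values=$VPC --region $REGION \
              --query "SecurityGroups[?GroupName!='default'].GroupId" --output text) ; do
  aws ec2 delete-security-group --group-id $SG --region $REGION
done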
These AWS entities lingered because my tofu destroy was always hanging, and it never finished because there were Kubernetes finalizers preventing it. To avoid this, I needed to run:
kubectl patch installation default -p '{"metadata":{"finalizers":[]}}' --type=merge
kubectl patch service YOUR_LOAD_BALANCER -p '{"metadata":{"finalizers":null}}' --type=merge
Also, CRDs need to be destroyed:
# Strip the finalizers from every CRD, then force-delete it
for CRD in $(kubectl get crds --no-headers | awk '{print $1}') ; do
  kubectl patch crd $CRD --type=json -p='[{"op": "remove", "path": "/metadata/finalizers"}]'
  kubectl delete crd $CRD --force
done
I would then run these scripts via a local-exec provisioner in a resource.
I asked on the DevOps Discord server how normal this was:
PhillHenry: I'm using Terraform to manage my AWS stack that (amongst other things) creates a load balancer using an aws-load-balancer-controller. I'm finding destroying the stack just hangs then times out after 20 minutes. I've had to introduce bash scripts that patch finalizers in services and installations plus force delete CRDs. Finally, tofu destroy cleans everything up but I can't help feeling I'm doing it all wrong by having to add hacks. Is this normal? If not, can somebody point me in the right direction over what I'm doing wrong?
snuufix: It is normal with buggy providers, it's just sad that even AWS is one.
Redeploying
Redeploying a component was simply a matter of running:
tofu apply -replace=kubernetes_manifest.sparkConnectManifest --auto-approve
This is a great way to redeploy just my Spark Connect pod when I've changed its config.
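If you can't remember the exact address to pass to -replace, tofu state list (from earlier) will tell you; for example (the grep pattern is just illustrative):
# Find the address of the Spark Connect manifest, then recreate only that resource
tofu state list | grep -i spark
tofu apply -replace=kubernetes_manifest.sparkConnectManifest --auto-approve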
If you want to find out what version of a Helm chart you're using because you forgot to pin it, this might help: it's where Helm caches the charts it downloads.
$ ls -ltr ~/.cache/helm/repository/
...
-rw-r--r-- 1 henryp henryp 107929 Nov 12 10:41 spark-kubernetes-operator-1.3.0.tgz
-rw-r--r-- 1 henryp henryp 317214 Nov 20 15:56 eks-index.yaml
-rw-r--r-- 1 henryp henryp 433 Nov 20 15:56 eks-charts.txt
-rw-r--r-- 1 henryp henryp 36607 Nov 24 09:15 aws-load-balancer-controller-1.15.0.tgz
-rw-r--r-- 1 henryp henryp 493108 Dec 11 14:47 kube-prometheus-stack-51.8.0.tgz
-rw-r--r-- 1 henryp henryp 38337 Dec 15 12:20 aws-load-balancer-controller-1.16.0.tgz
...
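If the release is still installed, helm list reports the chart version more directly (a sketch):
# The CHART column shows, for example, aws-load-balancer-controller-1.16.0
helm list --all-namespaces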