eksctl create cluster --name $CLUSTERNAME --nodes 3
but this really hides a huge amount of what is going on. Apart from IAM, eksctl automatically creates:
- a new Virtual Private Cloud (VPC) in which sit the K8s control plane and workers. A VPC is "a logically isolated and secure network environment that is separate from the rest of the AWS cloud" [1]
- two public subnets and two private subnets (best practice if you want high availability). Putting the worker nodes in the private subnets means they cannot be maliciously scanned from the internet.
- all necessary NAT Gateways to allow the private subnets to access the internet
- Internet Gateways allowing the internet to talk to your public subnets.
- Route Tables, which are just rules for network traffic: "Routers use a route table to determine the best path for data packets to take between networks" [2]. (The CDKTF sketch just below shows roughly what this hidden networking looks like if you were to build it yourself.)
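To make that plumbing concrete, here is a minimal CDKTF (Java) sketch of the same components - trimmed to one public and one private subnet rather than eksctl's two of each. It assumes the prebuilt AWS provider Java bindings (the package paths and the Eip domain argument vary between provider versions), that this code sits inside a TerraformStack alongside the AWS_REGION constant used later, and made-up CIDR blocks; it is an illustration, not eksctl's actual template.
import com.hashicorp.cdktf.providers.aws.vpc.Vpc;
import com.hashicorp.cdktf.providers.aws.subnet.Subnet;
import com.hashicorp.cdktf.providers.aws.internet_gateway.InternetGateway;
import com.hashicorp.cdktf.providers.aws.eip.Eip;
import com.hashicorp.cdktf.providers.aws.nat_gateway.NatGateway;
import com.hashicorp.cdktf.providers.aws.route_table.RouteTable;
import com.hashicorp.cdktf.providers.aws.route.Route;

// The VPC: a logically isolated network for the control plane and workers
Vpc vpc = Vpc.Builder.create(this, "vpc")
        .cidrBlock("10.0.0.0/16")
        .enableDnsSupport(true)
        .enableDnsHostnames(true)
        .build();

// Public subnet: reachable from the internet via the Internet Gateway
Subnet publicSubnet = Subnet.Builder.create(this, "publicSubnet1")
        .vpcId(vpc.getId())
        .cidrBlock("10.0.1.0/24")
        .availabilityZone(AWS_REGION + "a")   // eksctl spreads these across two AZs
        .mapPublicIpOnLaunch(true)
        .build();

// Private subnet: where the worker nodes hide from internet scans
Subnet privateSubnet = Subnet.Builder.create(this, "privateSubnet1")
        .vpcId(vpc.getId())
        .cidrBlock("10.0.101.0/24")
        .availabilityZone(AWS_REGION + "a")
        .build();

// Internet Gateway: lets the internet talk to the public subnets
InternetGateway igw = InternetGateway.Builder.create(this, "igw")
        .vpcId(vpc.getId())
        .build();

// NAT Gateway (sits in a public subnet): lets the private subnets reach out
Eip natEip = Eip.Builder.create(this, "natEip").domain("vpc").build();
NatGateway nat = NatGateway.Builder.create(this, "nat")
        .subnetId(publicSubnet.getId())
        .allocationId(natEip.getId())
        .build();

// Route Tables: default route via the IGW for public traffic, via the NAT for private
RouteTable publicRouteTable = RouteTable.Builder.create(this, "publicRt").vpcId(vpc.getId()).build();
Route.Builder.create(this, "publicDefaultRoute")
        .routeTableId(publicRouteTable.getId())
        .destinationCidrBlock("0.0.0.0/0")
        .gatewayId(igw.getId())
        .build();

RouteTable privateRouteTable = RouteTable.Builder.create(this, "privateRt").vpcId(vpc.getId()).build();
Route.Builder.create(this, "privateDefaultRoute")
        .routeTableId(privateRouteTable.getId())
        .destinationCidrBlock("0.0.0.0/0")
        .natGatewayId(nat.getId())
        .build();

// (A real stack would also associate each route table with its subnets via
// aws_route_table_association and repeat the subnets across a second AZ.)
eksctl does all of this - plus security groups, IAM roles and the cluster itself - from that one command.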
You can see some details with:
$ eksctl get cluster --name=$CLUSTERNAME --region=$REGION
NAME VERSION STATUS CREATED VPC SUBNETS SECURITYGROUPS PROVIDER
spark-cluster 1.32 ACTIVE 2025-10-27T10:36:02Z vpc-REDACTED subnet-REDACTED,subnet-REDACTED,subnet-REDACTED,subnet-REDACTED,subnet-REDACTED,subnet-REDACTED sg-REDACTED EKS
Terraform
If you use Terraform, you might need to configure your local kubectl to talk to the EKS cluster by hand.
First, back up your old config with:
mv ~/.kube ~/.kube_bk
then run:
aws eks update-kubeconfig --name $CLUSTERNAME --region $REGION
But if you are running aws via Docker, this will have updated ~/.kube/config in the container, not the host. So, run:
docker run --rm -it -v ~/.aws:/root/.aws -v ~/.kube:/root/.kube amazon/aws-cli eks update-kubeconfig --name $CLUSTERNAME --region $REGION
Now it will write to your host's config, but even then you'll have to edit the command at the end of the file (the exec block that shells out to aws to fetch a token) so it points to a non-Docker binary. Yes, that means installing the AWS CLI natively - preferably in a bespoke directory so you can continue using the Docker version for everything else.
Another issue I had was that the connection details for the new EKS cluster differed from those in my ~/.kube/config. This in itself was not a problem, as you can add a local-exec provisioner (using Java and CDKTF):
LocalExecProvisioner.builder()
.when("create") // Run only when the resource is created
.command(String.format(
"aws eks update-kubeconfig --name %s --region %s",
CLUSTER_NAME,
AWS_REGION)
)
.type("local-exec")
.build()
which depends on the EksCluster and the DataAwsEksClusterAuth; in turn, the resources that were failing are made to depend on it.
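One way to wire this up is to hang the provisioner off a do-nothing resource from the null provider. A minimal sketch, assuming the prebuilt null provider bindings (the Java package name below is a guess and differs between versions) and the cluster and eksAuthData objects defined elsewhere in the stack:
import java.util.List;
import com.hashicorp.cdktf.LocalExecProvisioner;
// Assumed package name for the prebuilt null provider bindings; check your version
import com.hashicorp.cdktf.providers.null_provider.resource.Resource;

// A do-nothing resource whose only job is to run the provisioner at the right time
Resource kubeconfigUpdater = Resource.Builder.create(this, "update_kubeconfig")
        .dependsOn(List.of(cluster, eksAuthData))   // wait for the cluster and its auth data
        .provisioners(List.of(
                LocalExecProvisioner.builder()
                        .type("local-exec")
                        .when("create")
                        .command(String.format(
                                "aws eks update-kubeconfig --name %s --region %s",
                                CLUSTER_NAME, AWS_REGION))
                        .build()))
        .build();

// Anything that reads ~/.kube/config then gets .dependsOn(List.of(kubeconfigUpdater)) in its builder.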
However, this introduced other problems.
First, I tried to make the reading of ~/.kube/config depends_on the EKS cluster. That way, I'd only read it once the cluster was up and running, right? Well, no. This introduces a circular dependency, and in any case the file is read before the cluster is even started.
Any fiddling with the dependency tree leads to reading ~/.kube/config when it's stale. So, you need to initialize the Kubernetes provider configuration (which appears to be global and otherwise implicit) directly with:
String base64CertData = cluster.getCertificateAuthority().get(0).getData();
String decodedCert = com.hashicorp.cdktf.Fn.base64decode(base64CertData); // the CA cert is stored base64-encoded

KubernetesProvider kubernetesProvider = KubernetesProvider.Builder.create(this, "kubernetes")
        .host(cluster.getEndpoint())           // the new cluster's API server
        .clusterCaCertificate(decodedCert)
        .token(eksAuthData.getToken())         // dynamically generated token
        .build();
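The eksAuthData above is presumably the DataAwsEksClusterAuth data source mentioned earlier, defined before the provider; something along these lines (class path again per the prebuilt AWS provider bindings):
import com.hashicorp.cdktf.providers.aws.data_aws_eks_cluster_auth.DataAwsEksClusterAuth;

// Fetches a short-lived token for authenticating against the new cluster's API server
DataAwsEksClusterAuth eksAuthData = DataAwsEksClusterAuth.Builder.create(this, "eksAuth")
        .name(cluster.getName())
        .build();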
Strangely, you still need to define the environment variable KUBE_CONFIG_PATH, as some resources need it - albeit only after the file has been correctly amended with the current cluster's details.
Zombie Clusters
Running:
tofu destroy -auto-approve
just kept hanging. So, I ran:
tofu state list | grep -E "(nat_gateway|eip|eks_cluster)"
and found some EKS components running that I had to kill with:
tofu destroy -auto-approve -target=...
Finally, kubectl get pods barfed with a no such host error, since the cluster endpoint in my kubeconfig no longer existed.
Load balancers
The next problem was that the tofu destroy action was constantly saying:
aws_subnet.publicSubnet2: Still destroying... [id=subnet-XXX, 11m50s elapsed]
So, I ran:
aws ec2 describe-network-interfaces \
--filters "Name=subnet-id,Values=subnet-XXX" \
--query "NetworkInterfaces[].[NetworkInterfaceId, Description, InterfaceType, Status]" \
--output table
and got an ENI that I tried to delete with:
aws ec2 delete-network-interface --network-interface-id eni-XXX
only to be told that it was still in use. Ho, hum:
$ aws ec2 describe-network-interfaces \
--network-interface-ids eni-XXX \
--query "NetworkInterfaces[0].{ID:NetworkInterfaceId, Description:Description, Status:Status, Attachment:Attachment}" \
--output json
...
"InstanceOwnerId": "amazon-elb",
...
So the ENI belongs to a load balancer; let's see which one:
$ aws elb describe-load-balancers \
--query "LoadBalancerDescriptions[?contains(Subnets, 'subnet-XXX')].[LoadBalancerName]" \
--output text
which gives me its name, and now I can kill it with:
aws elb delete-load-balancer --load-balancer-name NAME
Finally, the destroy just wasn't working, failing ultimately with:
│ Error: deleting EC2 VPC (vpc-XXX): operation error EC2: DeleteVpc, https response error StatusCode: 400, RequestID: 8412a305-..., api error DependencyViolation: The vpc 'vpc-XXX' has dependencies and cannot be deleted.
Just going into the web console and deleting it there was the simple but curious solution.
[1] Architecting AWS with Terraform
[2] The Self-Taught Cloud Computing Engineer