When trying to run ArgoCD, I came across this problem that was stopping me from connecting. Using kubectl port-forward..., I was finally able to connect. But even then, if I ran:
$ kubectl get services --namespace argocd
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
argocd-applicationset-controller ClusterIP 10.98.20.142 <none> 7000/TCP,8080/TCP 19h
argocd-dex-server ClusterIP 10.109.252.231 <none> 5556/TCP,5557/TCP,5558/TCP 19h
argocd-metrics ClusterIP 10.106.130.22 <none> 8082/TCP 19h
argocd-notifications-controller-metrics ClusterIP 10.109.57.97 <none> 9001/TCP 19h
argocd-redis ClusterIP 10.100.158.58 <none> 6379/TCP 19h
argocd-repo-server ClusterIP 10.111.224.112 <none> 8081/TCP,8084/TCP 19h
argocd-server LoadBalancer 10.102.214.179 <pending> 80:30081/TCP,443:30838/TCP 19h
argocd-server-metrics ClusterIP 10.96.213.240 <none> 8083/TCP 19h
Why was my EXTERNAL-IP still pending? It appears that this is a natural consequence of running my K8s cluster in Minikube [SO].
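The same condition can be read from the API objects instead of the table. A LoadBalancer service shows <pending> for as long as its status.loadBalancer.ingress field is empty. Below is a minimal sketch, assuming the input is the parsed JSON from kubectl get services --namespace argocd -o json. The function name pending_load_balancers and the hand-written SAMPLE (two of the services listed above, heavily trimmed) are illustrative only.

```python
import json

def pending_load_balancers(service_list):
    """Return the names of LoadBalancer services that have no external address yet."""
    pending = []
    for svc in service_list.get("items", []):
        if svc.get("spec", {}).get("type") != "LoadBalancer":
            continue  # ClusterIP services never receive an external IP
        ingress = svc.get("status", {}).get("loadBalancer", {}).get("ingress")
        if not ingress:  # missing or empty list is what kubectl shows as <pending>
            pending.append(svc["metadata"]["name"])
    return pending

# Hand-written sample, trimmed to the fields the function reads (an assumption,
# not real cluster output).
SAMPLE = json.loads("""
{
  "items": [
    {"metadata": {"name": "argocd-redis"},
     "spec": {"type": "ClusterIP"},
     "status": {"loadBalancer": {}}},
    {"metadata": {"name": "argocd-server"},
     "spec": {"type": "LoadBalancer"},
     "status": {"loadBalancer": {}}}
  ]
}
""")

print(pending_load_balancers(SAMPLE))
```

Against a real cluster, the kubectl output could be piped in and loaded with json.load(sys.stdin) in place of SAMPLE.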
So, I decided to build my own Kubernetes cluster.
This step-by-step guide proved really useful. I built a small cluster of 2 nodes on heterogeneous hardware. Note that although you can use different OSs and hardware, you really need to use the same version of K8s on all boxes (see this SO).
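A quick way to confirm that the versions line up is to compare the kubelet version each node reports. This is a rough sketch, assuming the input is the parsed JSON from kubectl get nodes -o json. The helper names and the hand-written SAMPLE_NODES (using my two node names) are made up for illustration.

```python
def kubelet_versions(node_list):
    """Map each node name to the kubelet version it reports."""
    return {
        node["metadata"]["name"]: node["status"]["nodeInfo"]["kubeletVersion"]
        for node in node_list.get("items", [])
    }

def versions_match(node_list):
    """True when every node reports the same kubelet version."""
    return len(set(kubelet_versions(node_list).values())) <= 1

# Hand-written sample, trimmed to the fields read above (an assumption,
# not real cluster output).
SAMPLE_NODES = {
    "items": [
        {"metadata": {"name": "adele"},
         "status": {"nodeInfo": {"kubeletVersion": "v1.28.2"}}},
        {"metadata": {"name": "nuc"},
         "status": {"nodeInfo": {"kubeletVersion": "v1.28.2"}}},
    ]
}

print(kubelet_versions(SAMPLE_NODES))
print(versions_match(SAMPLE_NODES))
```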
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
adele Ready <none> 18h v1.28.2 192.168.1.177 <none> Ubuntu 18.04.6 LTS 5.4.0-150-generic containerd://1.6.21
nuc Ready control-plane 18h v1.28.2 192.168.1.148 <none> Ubuntu 22.04.4 LTS 6.5.0-18-generic containerd://1.7.2
Great! However, Flannel did not seem to be working properly:
$ kubectl get pods --namespace kube-flannel -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-flannel-ds-4g8gg 0/1 CrashLoopBackOff 34 (2m53s ago) 152m 192.168.1.148 nuc <none> <none>
kube-flannel-ds-r4xvt 0/1 CrashLoopBackOff 26 (3m11s ago) 112m 192.168.1.177 adele <none> <none>
And journalctl -fu kubelet was puking "Error syncing pod, skipping" messages.
Aside: Flannel is a container on each node that coordinates the segmentation of the virtual network. For coordination, it can use etcd, which can be thought of like Zookeeper in the Java ecosystem. "Flannel does not control how containers are networked to the host, only how the traffic is transported between hosts." [GitHub]
The guide seemed to omit one detail, which led me to see the Flannel container puking something like this error:
E0427 06:08:23.685930 13405 memcache.go:265] couldn’t get current server API group list: Get “https://X.X.X.X:6443/api?timeout=32s 2”: dial tcp X.X.X.X:6443: connect: connection refused
Following this SO answer revealed that the cluster's CIDR had not been set. So, I patched it, following this [SO] advice, like so:
kubectl patch node nuc -p '{"spec":{"podCIDR":"10.244.0.0/16"}}'
kubectl patch node adele -p '{"spec":{"podCIDR":"10.244.0.0/16"}}'
which will work until the next reboot (one of the SO answers describes how to make that permanent, as does this one).
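For context, and as far as I know, 10.244.0.0/16 is the network that Flannel's stock manifest expects for the whole cluster. When the controller manager allocates node CIDRs itself, it normally gives each node its own smaller block out of that range (a /24 by default for IPv4). The sketch below only illustrates that carving using Python's standard ipaddress module. The helper node_subnets is hypothetical, and this is not what the patch commands above do, since they give both nodes the full /16.

```python
import ipaddress

def node_subnets(cluster_cidr, node_names, new_prefix=24):
    """Pair each node with the next /new_prefix block inside cluster_cidr."""
    network = ipaddress.ip_network(cluster_cidr)
    blocks = network.subnets(new_prefix=new_prefix)  # generator of subnets, in order
    return {name: str(block) for name, block in zip(node_names, blocks)}

# Node names are the two from this cluster; the allocation order is an assumption.
print(node_subnets("10.244.0.0/16", ["nuc", "adele"]))
# {'nuc': '10.244.0.0/24', 'adele': '10.244.1.0/24'}
```

A /16 holds 256 such /24 blocks, so the range comfortably covers a two-node cluster.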
Anyway, this was the puppy and now the cluster seems to be behaving well.
Incidentally, this gives a lot of log goodies:
kubectl cluster-info dump