Tuesday, February 11, 2025

Kafka on Kubernetes notes

I installed Strimzi, an operator that manages a Kafka cluster inside a Kubernetes cluster. The instructions seem easy.

helm repo add strimzi https://strimzi.io/charts/
helm repo update
# clear out any previous installation first
kubectl delete namespace kafka --ignore-not-found
helm install strimzi-kafka-operator strimzi/strimzi-kafka-operator --namespace kafka --create-namespace

Then run your configuration of choice:

kubectl apply -f kafka/kraft/kafka-ephemeral.yaml -n kafka

from the strimzi-kafka-operator/examples directory. Note: you'll have to change the cluster name in that YAML from my-cluster to your cluster name (kafka for me).
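
A quick way to make that change (a sketch, assuming you're happy to rename every occurrence in the example file, since the name also appears in labels tying other resources to the cluster):

sed -i 's/my-cluster/kafka/g' kafka/kraft/kafka-ephemeral.yaml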

But I had a helluva job making it work. This post describes how I solved the many problems.

What is Strimzi?

First, let's look at the architecture of Kubernetes.

Kubernetes is built on the notion of primitives that are composed to create the whole. You can extend Kubernetes using a CustomResourceDefinition. A CRD "allows us to add any new resource type to a cluster dynamically. We simply provide the API server with the name of the new resource type and a specification that’s used for validation, and immediately the API server will allow us to create, read, update, and delete resources of that new type." [1]

Note that CRDs have no actual functionality of their own. For that you need Operators. CRDs are the plug into which Operators fit. Operators are notified of events on the CRD's resources via an Informer. "The Kubernetes API server provides a declarative API, where the primary actions are to create, read, update, and delete resources in the cluster." [1] That is, you tell K8s what you want and it does its best to achieve that aim. Your Operator is that workhorse.

In our example above, we deploy the CRDs (and the operator itself) with helm install... and create a resource of the new Kafka type with kubectl apply -f....
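
You can list the resource types Strimzi has added with:

kubectl get crds | grep strimzi.io

Expect entries like kafkas.kafka.strimzi.io and kafkatopics.kafka.strimzi.io, though the exact list depends on the chart version.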

"DaemonSets are used to manage the creation of a particular Pod on all or a selected set of nodes in a cluster. If we configure a DaemonSet to create Pods on all nodes, then if new nodes are added to the cluster, new pods will be created to run on these new nodes." [2] This is useful for system related pods like Flannel, a CNI (Container Network Interface) plugin, which should run on each node in the cluster. Contrast this to ReplicaSets for which a typical use case is managing your application pods.

"The kube-dns Service connects to a DNS server Deployment called Core­DNS that listens for changes to Services in the Kubernetes cluster. CoreDNS updates the DNS server configuration as required to stay up to date with the current cluster configuration." [1]. CoreDNS gave me some issues too (see below).

BTW, you can also look at Cruise Control, an open-source Java application that "helps run Apache Kafka clusters at large scale".

Debugging

And it's Flannel that started crashing for me with strange errors. It was largely through bloody-mindedness that I fixed things. Here are a few things I learned along the way.

The first is that I needed to start kubeadm with --pod-network-cidr=10.244.0.0/16 [SO] when I was following the instructions in a previous post (apparently, you need a different CIDR if you use another plugin like Calico). This prevents "failed to acquire lease" error messages.
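
For the record, the bootstrap command on the control-plane node looks something like this, 10.244.0.0/16 being the CIDR Flannel expects by default:

sudo kubeadm init --pod-network-cidr=10.244.0.0/16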

Flannel was still complaining with a "cni0" already has an IP address different from 10.... error message [SO]. It appears that some network config from the previous installation needed to be rolled back. Well, kubeadm reset does warn you that "The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d".

So, to completely expunge all traces of the previous Kubernetes installation, I needed to run something like this script on all boxes:

# tear down kubeadm state (add --force to skip the confirmation prompt)
sudo kubeadm reset
# remove the CNI config that kubeadm reset warns it leaves behind
sudo rm -rf /etc/cni/net.d
# clear any IPVS rules kube-proxy created
sudo ipvsadm --clear
# stop everything so nothing recreates stale state
sudo systemctl stop kubelet docker containerd.service
# remove the old kubectl credentials
rm -rf ~/.kube && echo Done!
# bring the runtime and kubelet back up, ready for a fresh init/join
sudo systemctl restart containerd kubelet

Bear in mind that this will completely blat your installation, so you will need to run sudo kubeadm init... again.

Next, I was getting infinite loops [GitHub coredns] in domain name resolution, which causes the pod to crash with CrashLoopBackOff. This official doc helped me. As I understand it, kubelet should not pass the host's /etc/resolv.conf to CoreDNS when that file points at a local stub resolver (as systemd-resolved's does, via 127.0.0.53): CoreDNS would forward queries to 127.0.0.53, which from inside its own pod is itself, and so on forever. Running:

# sudo doesn't apply to the shell's redirection, so append via tee
echo "resolvConf: /run/systemd/resolve/resolv.conf" | sudo tee -a /etc/kubernetes/kubelet.conf
sudo systemctl restart kubelet

solved that. (On kubeadm clusters, the kubelet's own configuration usually lives in /var/lib/kubelet/config.yaml, so if appending to kubelet.conf doesn't take, check which file your kubelet actually reads.) Note that /run/systemd/resolve/resolv.conf should contain something like:

nameserver 192.168.1.254
nameserver fe80::REDACTED:d575%2
search home

and no more. If you have more than three nameserver entries [GitHub], kubelet prints an error but seems to continue anyway.
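
If you suspect a loop is still happening, the CoreDNS logs are the place to look (on kubeadm clusters the pods carry the k8s-app=kube-dns label):

kubectl logs -n kube-system -l k8s-app=kube-dns --tail=20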

Next came "Failed to watch *v1.Namespace" errors in my coredns pods.

First I tried debugging by deploying a test pod with:

kubectl run -n kafka dns-test --image=busybox --restart=Never -- sleep 3600
kubectl exec -it -n kafka dns-test -- sh

(If you want to exec into a pod that has more than one container, add a -c CONTAINER_NAME [SO].)

This confirmed that there was indeed a network issue, as it could not contact the api-server either. Note that although BusyBox is convenient, you might prefer "alpine rather than busybox as ... we’ll want to use some DNS commands that require us to install a more full-featured DNS client." [1]
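
From inside the test pod, a basic check is resolving the API server's Service name; if this times out, the path to CoreDNS is broken:

kubectl exec -it -n kafka dns-test -- nslookup kubernetes.default.svc.cluster.local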

Outside the containers, this worked on the master host:

nslookup kubernetes.default.svc.cluster.local 10.96.0.10

but not on a worker box. It should have worked, as 10.96.0.10 is the virtual IP address of the kube-dns Service fronting CoreDNS (see this by running kubectl get svc -n kube-system).
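
On a stock kubeadm cluster that Service looks something like this (10.96.0.10 and the ports are the defaults; yours may differ):

NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   5d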

The problem wasn't Kubernetes config at all but firewalls. Running this on my boxes:

# allow all traffic from the other cluster nodes (broad, but fine for diagnosis)
sudo iptables -A INPUT -s IP_ADDRESS_OF_OTHER_BOX -j ACCEPT
sudo iptables -A FORWARD -s IP_ADDRESS_OF_OTHER_BOX -j ACCEPT

(where IP_ADDRESS_OF_OTHER_BOX is for each box in the cluster) finally allowed my Strimzi Kafka cluster to start and all the other pods seemed happy too. Note there are security implications to these commands as they allow all traffic from IP_ADDRESS_OF_OTHER_BOX.
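
Bear in mind too that rules added with iptables -A don't survive a reboot. On Debian-flavoured boxes (an assumption about your distro), the usual way to persist them is:

sudo apt install iptables-persistent
sudo netfilter-persistent save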

Notes on logging

To get the full, untruncated output of the various tools, you'll need:

kubectl get pods -A --output=wide
journalctl --no-pager -xeu kubelet
systemctl status -l --no-pager kubelet

And to test connections to the name servers, use:

curl -v telnet://10.96.0.10:53

rather than ping as ping may be disabled.

This command shows all events in a given namespace:

kubectl get events -n kube-system
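
The default ordering of events can be hard to follow; sorting by creation time usually reads better:

kubectl get events -n kube-system --sort-by=.metadata.creationTimestamp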

and this will print out a kubectl describe command for each pod; run those to see every pod's recent events:

kubectl get pods -A --no-headers | while read NS POD REST ; do echo "kubectl describe pod $POD -n $NS | grep -A20 ^Events" ; done

Simple scriptlets, but they helped me, so I'm making a note for future reference.

[1] The Book of Kubernetes, Alan Hohn
[2] The Kubernetes Workshop
