Deploying Kubernetes With Kafka: Best Practices
Not sure how to set up and run Apache Kafka on Kubernetes? Learn best practices and the steps to follow to deploy Apache Kafka on a Kubernetes cluster.
Kafka
Kafka is an open-source event streaming platform first developed by LinkedIn, then donated to the Apache Software Foundation. Kafka is written in Java and Scala. Apache Kafka aims to provide the user with a unified, low-latency, high-throughput platform for handling real-time data.
Since we are talking about running Kafka on Kubernetes: Kafka runs as a cluster of nodes called Kafka brokers. Kafka was first developed as a messaging queue and works on a publish-subscribe model. It is a popular message queue for distributed systems, and is commonly used to stream data in Internet of Things use cases.
How Apache Kafka Works:
- Producers create messages and publish them to topics.
- Kafka categorizes the messages into topics and stores them; once written, the records are immutable.
- Consumers subscribe to a specific topic and consume the messages provided by the producers.
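To make the flow above concrete, here is a quick sketch using the console tools that ship with a Kafka distribution. This assumes a broker is reachable at localhost:9092, and the topic name demo-topic is just an example:

```
# Create a topic with one partition and one replica
kafka-topics.sh --create --topic demo-topic \
  --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1

# Producer: publish messages to the topic (type lines, Ctrl+C to stop)
kafka-console-producer.sh --topic demo-topic --bootstrap-server localhost:9092

# Consumer: subscribe to the topic and read all messages from the beginning
kafka-console-consumer.sh --topic demo-topic \
  --bootstrap-server localhost:9092 --from-beginning
```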
Zookeeper In Kafka
Zookeeper is used in Kafka for controller election, and acts as a service-discovery mechanism for the Kafka brokers deployed in a Kubernetes cluster. Zookeeper notifies Kafka of topology changes, so the nodes in the cluster know when a broker has joined or failed, and when a topic has been removed or added. Zookeeper must be deployed before the Kafka cluster, and should itself run with multiple replicas for high availability.
Kubernetes
Nowadays every big company has shifted to Kubernetes, or is planning to do so. Kubernetes manages the cluster of master and worker nodes and allows you to deploy, scale, and automate containerized workloads such as Kafka. Kubernetes was first developed by Google, drawing on its in-house container orchestrator known as Borg, and is now maintained and developed under the CNCF.
Kafka On Kubernetes
There are a variety of reasons why Kafka's architecture in Kubernetes is attractive. If your organization is standardizing the use of Kubernetes as an application platform, then this is a great reason to consider running Kafka there as well. Running Kafka on Kubernetes enables organizations to simplify operations such as updates, restarts, and monitoring that are more or less integrated into the Kubernetes platform.
Steps To Follow To Deploy Kafka On Kubernetes Cluster:
As we’ve discussed, Kafka depends on Zookeeper to track its configuration, including which topics have been added or removed, and to serve that information to brokers and subscribers. So first, we have to deploy Zookeeper in the cluster.
Step 1: Deploying Zookeeper
kind: Deployment
apiVersion: apps/v1
metadata:
  name: zookeeper-deploy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: zookeeper-1
  template:
    metadata:
      labels:
        app: zookeeper-1
    spec:
      containers:
        - name: zoo1
          image: digitalwonderland/zookeeper
          ports:
            - containerPort: 2181
          env:
            - name: ZOOKEEPER_ID
              value: "1"
            - name: ZOOKEEPER_SERVER_1
              value: zoo1
This will create a Zookeeper Deployment with two replicas. The containers run the digitalwonderland/zookeeper image, expose port 2181, and set two environment variables: ZOOKEEPER_ID with a value of 1 and ZOOKEEPER_SERVER_1 with a value of zoo1.
To create this resource:
kubectl.exe create -f .\zookeeper-deploy.yaml
Step 2: Exposing Zookeeper Service
apiVersion: v1
kind: Service
metadata:
  name: zoo1
  labels:
    app: zookeeper-1
spec:
  ports:
    - name: client
      port: 2181
      protocol: TCP
    - name: follower
      port: 2888
      protocol: TCP
    - name: leader
      port: 3888
      protocol: TCP
  selector:
    app: zookeeper-1
This will create a Service for Zookeeper and route traffic to the Zookeeper pods. The Service is named zoo1, which we will use later when we deploy the Kafka brokers.
To create this resource:
kubectl.exe create -f .\zooservice.yaml
Step 3: Deploying Kafka
apiVersion: v1
kind: Service
metadata:
  name: kafka-service
  labels:
    name: kafka
spec:
  ports:
    - port: 9092
      name: kafka-port
      protocol: TCP
  selector:
    app: kafka
    id: "0"
  type: LoadBalancer
To create this resource:
kubectl.exe create -f .\kafka-service.yaml
Now that we have the service, we can start the Kafka broker.
Step 4: Deploying Kafka Broker
kind: Deployment
apiVersion: apps/v1
metadata:
  name: kafka-broker0
spec:
  replicas: 1  # one replica per Deployment: KAFKA_BROKER_ID is fixed below, and two brokers must not share an ID
  selector:
    matchLabels:
      app: kafka
      id: "0"
  template:
    metadata:
      labels:
        app: kafka
        id: "0"
    spec:
      containers:
        - name: kafka
          image: wurstmeister/kafka
          ports:
            - containerPort: 9092
          env:
            - name: KAFKA_ADVERTISED_PORT
              value: "30718"
            - name: KAFKA_ADVERTISED_HOST_NAME
              value: 192.168.1.240
            - name: KAFKA_ZOOKEEPER_CONNECT
              value: zoo1:2181
            - name: KAFKA_BROKER_ID
              value: "0"
            - name: KAFKA_CREATE_TOPICS
              value: admintome-test:1:1
KAFKA_ADVERTISED_HOST_NAME is set to the IP address we noted earlier. Also, note that we tell the Kafka broker to automatically create an admintome-test topic with 1 partition and 1 replica. You can create multiple topics using the same syntax, separating them with commas (i.e. topic1:1:1,topic2:1:1).
To create this resource:
kubectl.exe create -f .\kafka-broker.yaml
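With all four resources created, you can sanity-check the deployment with kubectl. The label selectors below match the manifests above:

```
# Check that the Zookeeper and Kafka pods are running
kubectl get pods -l app=zookeeper-1
kubectl get pods -l app=kafka

# Check the services and note the external IP assigned to kafka-service
kubectl get svc zoo1 kafka-service

# Tail the broker log to confirm it registered with Zookeeper
kubectl logs deployment/kafka-broker0
```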
You can clone the accompanying repository from GitHub to start working right away.
Best Practice For Running Kafka On A Kubernetes Cluster
1. Low Latency Network And Storage
Kafka demands low latency from both network and storage, which means it needs low-contention, high-throughput storage with minimal noisy-neighbor interference. To provide this, use high-performance disks such as SSDs, and prefer placements where data access is local to the broker's pod, which improves overall system performance.
Kafka performance also depends on a low-latency, high-bandwidth network. Don't place all brokers on a single node, as this would reduce availability. Spreading brokers across different availability zones is an acceptable trade-off between latency and availability.
If the storage in a container is not persistent, data will be lost after a restart. An emptyDir volume survives container restarts within the same pod, but if the pod is deleted or rescheduled, the data is gone, and the replacement broker first has to re-replicate all of its data, which is a time-consuming process. That's why you should use a persistent volume. The storage should also be non-local, so that Kubernetes is free to schedule the broker onto another node after a restart or relocation.
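As a sketch, a PersistentVolumeClaim like the following could back the broker's data directory instead of emptyDir (the claim name, size, and the assumption that your cluster has a default non-local storage class are all illustrative):

```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kafka-broker0-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
```

You would then reference this claim in the broker Deployment's volumes section and mount it at the container's Kafka data directory.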
2. High Availability For Kafka Brokers
Kafka runs as a cluster of brokers that can be deployed across different nodes of the Kubernetes cluster. Just as Kubernetes can automatically recover nodes or containers by restarting them, Kafka can rebuild brokers after a node failure; however, these rebuilds impose an I/O cost on the application while they run. Consider a data replication strategy that lets you take advantage of multiple brokers for better performance, but also allows for faster failover in the event of a node failure.
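One way to keep brokers on separate nodes is a podAntiAffinity rule in the broker Deployment's pod template. This is a sketch, assuming the app: kafka label used in the manifests above:

```
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: kafka
          topologyKey: kubernetes.io/hostname
```

With topologyKey set to kubernetes.io/hostname, the scheduler refuses to place two broker pods on the same node; using a zone topology key instead spreads them across availability zones.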
3. Data Protection
Kafka replicates topics across the cluster. With replication, you can achieve fault tolerance, and with data mirroring, you can have data available across data centers. Consider your performance requirements and use only as many brokers as you need to achieve that level of performance, while leveraging replication to other hosts in the cluster to maximize availability, performance, and protection.
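For example, with three or more brokers you could create a fault-tolerant topic like this (the topic name is illustrative; each partition will then be replicated to three brokers, tolerating the loss of up to two):

```
kafka-topics.sh --create --topic orders \
  --bootstrap-server localhost:9092 \
  --partitions 3 --replication-factor 3
```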
4. Data Security
Encryption Of Data In-flight Using SSL / TLS: This allows your data to be encrypted between your producers and Kafka, and also with your consumers and Kafka. This is a very common pattern that everyone has used when connecting to the web. The “S” in HTTPS stands for secure (that beautiful green padlock you get everywhere on the web).
Authentication Using SSL Or SASL: This allows your producers and consumers to authenticate themselves to the Kafka cluster, which verifies their identity. It is also a secure way for your clients to present an identity.
Authorization Using ACLs: Once clients are authenticated, the Kafka brokers check them against access control lists (ACLs) to determine whether a particular client is authorized to write to or read from a topic.
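As a sketch, the corresponding broker settings in server.properties might look like this (the keystore paths and passwords are placeholders, and the authorizer class shown applies to ZooKeeper-based clusters; adjust to your setup):

```
# TLS listener for encrypted client traffic
listeners=SSL://:9093
ssl.keystore.location=/etc/kafka/secrets/kafka.keystore.jks
ssl.keystore.password=changeit
ssl.truststore.location=/etc/kafka/secrets/kafka.truststore.jks
ssl.truststore.password=changeit

# Authenticate clients with their TLS certificates
ssl.client.auth=required

# Enable ACL-based authorization; deny access when no ACL matches
authorizer.class.name=kafka.security.authorizer.AclAuthorizer
allow.everyone.if.no.acl.found=false
```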
5. Operator
An operator is a method of packaging, deploying, and managing a Kubernetes application. There are many Kafka operators available; one of the best known is Strimzi. The Strimzi operator makes it easy to spin up a Kafka cluster in minutes, with almost no configuration required. It also provides TLS encryption for traffic inside the cluster.
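For illustration, a minimal Strimzi Kafka custom resource looks roughly like this (the cluster name, replica counts, and storage sizes are examples; check the Strimzi documentation for your version):

```
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3
    listeners:
      - name: tls
        port: 9093
        type: internal
        tls: true
    storage:
      type: persistent-claim
      size: 50Gi
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
  entityOperator:
    topicOperator: {}
    userOperator: {}
```

Once the operator is installed, applying this single resource brings up the brokers, Zookeeper, and the topic and user operators, replacing all of the manual manifests from the steps above.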
Confluent has also announced a Kubernetes operator for Apache Kafka recently. It can be deployed in the following:
- Amazon Elastic Kubernetes Service (EKS)
- Azure Kubernetes Service (AKS)
- Google Kubernetes Engine (GKE)
- Red Hat OpenShift Container Platform
- Pivotal Container Service (PKS)
- Vanilla Kubernetes
6. Monitoring
Monitoring makes it much easier to catch errors and manage the cluster. Prometheus can collect metrics from all Java, Kafka, and Zookeeper processes via the built-in JMX exporter. Strimzi also ships a Grafana dashboard for Kafka that visualizes key metrics and supplements them with resource usage, performance, and stability indicators. This way you get basic Kafka monitoring for free. The Confluent operator similarly supports metrics aggregation via JMX/Jolokia and export of aggregated metrics to Prometheus.
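One common discovery pattern is to annotate the broker pods so Prometheus scrapes them automatically. This sketch assumes you run the Prometheus JMX exporter as a Java agent or sidecar exposing metrics on port 9404, and that your Prometheus is configured to honor these annotations:

```
template:
  metadata:
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "9404"
```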
7. Health Check
Kubernetes uses liveness and readiness probes to find out if pods are healthy or not. If the liveness probe fails, Kubernetes will kill the container and automatically restart it, if the restart policy is configured accordingly. If the readiness probe fails, Kubernetes will remove the pod from serving requests. This means that manual intervention is no longer needed in such cases, which is a great advantage.
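For a Kafka broker container, the probes can be sketched as simple TCP checks on the client port (the timings are illustrative; brokers can take a while to start, so give the initial delay some headroom):

```
livenessProbe:
  tcpSocket:
    port: 9092
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  tcpSocket:
    port: 9092
  initialDelaySeconds: 30
  periodSeconds: 10
```

A TCP check only verifies that the broker is accepting connections; a stricter readiness probe could run a small script that queries the broker's own health, at the cost of more complexity.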
8. Logging
Logging makes it easier to debug issues. All containers in your Kafka installation should log to stdout and stderr, and you should aggregate all logs in a central logging infrastructure, such as Elasticsearch.
TL;DR
- Kafka is a distributed streaming platform that is used to publish and subscribe to streams of records.
- Zookeeper is used to keep a record of Kafka topics and brokers.
- Kafka on Kubernetes - deploy Zookeeper and its service to route traffic, then deploy Kafka broker and its service.
- Logging and monitoring are the best ways to keep service intact and know the errors and performance issues beforehand.
- Strimzi and Kafka Confluent Operator are the best ways to bootstrap a Kafka cluster, almost without any configuration.
- Prometheus and Grafana are the best tools to monitor your cluster.
- The health check restarts a container when its liveness probe fails; when the readiness probe fails, the container is removed from service and stops receiving traffic.
- Kafka brokers have to be deployed across the cluster to ensure high availability.
- Persistent storage has to be used so that data survives container restarts.
- SSDs are best suited to serving high throughput.