Deploy Kafka Cluster with Zookeeper on GCP — The hard way
If you accidentally landed on this page looking for the original (Franz) Kafka, however unlikely that may seem, I have nothing to offer you except this wonderful quote. 😄
By believing passionately in something that still does not exist, we create it. The nonexistent is whatever we have not sufficiently desired
— Franz Kafka
But if you came here for Apache Kafka, I assume you already know what Kafka is. For the uninitiated, here is a fun illustrated introduction to Kafka, aptly named Gently Down The Stream. But don’t stop there; read a more serious introduction here.
With that out of the way, let's get straight to the point of this post. We recently had the opportunity to deploy self-managed Kafka clusters at scale for multiple customers as part of migrating them from on-premises and AWS environments to Google Cloud Platform. This post is a how-to guide on deploying a highly available Kafka (version 3.0.0) cluster with a Zookeeper ensemble on Compute Engine.
Kafka and Its Keeper
We are going to deploy this Kafka cluster along with a Zookeeper ensemble. I am sure some of you are wondering why Kafka still needs a keeper. Zookeeper is a well-known open-source coordination server, used to manage the Kafka cluster's metadata, among other things. To avoid external metadata management and simplify Kafka clusters, future versions of Kafka will not need an external Zookeeper; instead, Kafka will manage its metadata quorum internally using KRaft (Kafka Raft metadata mode). Unfortunately, KRaft is not production-ready yet, so for this post we will go with Zookeeper.
What will we build
We will deploy a highly available Kafka cluster in the Mumbai (asia-south1) region on GCP. We will configure a 3-node Zookeeper ensemble (or cluster) spread across 3 availability zones and configure 3 Kafka brokers (you can configure more, depending on your throughput requirements).
Let’s get started!
VPC, Subnets, and firewall rules
If you already have a VPC and subnets, feel free to skip this section. If you don’t have one already, follow along.
Configure your project and region in gcloud
gcloud auth login
gcloud config set project YOUR-PROJECT-NAME
gcloud config set compute/region asia-south1
For the sake of this guide, let’s create a simple VPC and one subnet. In production, you would ideally provision these in a private subnet dedicated to the database layer.
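The original commands are not shown here; a minimal sketch could look like the following. The VPC and subnet names and the CIDR range are placeholders; adjust them to your environment.

```shell
# Create a custom-mode VPC (name "kafka-vpc" is a placeholder)
gcloud compute networks create kafka-vpc --subnet-mode=custom

# Create one subnet in the Mumbai region; the range is illustrative
gcloud compute networks subnets create kafka-subnet \
  --network=kafka-vpc \
  --region=asia-south1 \
  --range=10.10.0.0/24
```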
We will use the network tag kafka to allow traffic between the Kafka and Zookeeper nodes. Let's create a firewall rule. This one is rather permissive; in production you might want to open only the necessary ports.
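A sketch of such a rule (the rule and network names are placeholders). In production you would likely restrict this to the ports actually used: 2181, 2888, and 3888 for Zookeeper, and 9092 for Kafka.

```shell
# Permissive rule: allow all TCP/UDP/ICMP between VMs tagged "kafka"
gcloud compute firewall-rules create kafka-internal \
  --network=kafka-vpc \
  --allow=tcp,udp,icmp \
  --source-tags=kafka \
  --target-tags=kafka
```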
Configure Zookeeper
Let’s provision a VM and configure Zookeeper on it. While launching these VMs, we will assign the network tag kafka so that the firewall rule we created above comes into effect and allows traffic between them.
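One way to launch the first node; the machine type, OS image, and subnet name are assumptions, so size and adjust for your own setup.

```shell
# No external IP: we will SSH through IAP instead
gcloud compute instances create zk-01 \
  --zone=asia-south1-a \
  --machine-type=e2-small \
  --subnet=kafka-subnet \
  --image-family=debian-11 \
  --image-project=debian-cloud \
  --tags=kafka \
  --no-address
```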
Verify that the VM is up and running:
gcloud compute instances list --filter="tags.items=kafka"
SSH to this VM. Install and configure Zookeeper.
gcloud beta compute ssh zk-01 \
--tunnel-through-iap \
--zone=asia-south1-a
Install the JDK. After installing it, it’s important to set the Java heap size in production environments. This helps avoid swapping, which would degrade Zookeeper’s performance. Refer to the Zookeeper administration guide here.
sudo apt update
sudo apt install default-jdk -y
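As an illustration, the heap can be pinned through a conf/java.env file, which bin/zkEnv.sh sources at startup. The values below are placeholders, not a recommendation; keep the heap well below physical RAM.

```shell
# /opt/zookeeper/conf/java.env — sourced by zkEnv.sh
# Illustrative heap settings; tune for your machine
export SERVER_JVMFLAGS="-Xms1g -Xmx1g"
```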
Add a user to run the Zookeeper services and grant it the relevant permissions; we will go with zkadmin.
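A sketch of the user setup and the Zookeeper install; the Zookeeper version (3.7.0) is illustrative, so substitute a current stable release.

```shell
# Create the service user
sudo useradd -m -s /bin/bash zkadmin

# Download and extract Zookeeper into /opt/zookeeper
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.7.0/apache-zookeeper-3.7.0-bin.tar.gz
sudo mkdir -p /opt/zookeeper /data/zookeeper
sudo tar -xzf apache-zookeeper-3.7.0-bin.tar.gz -C /opt/zookeeper --strip-components=1

# Give zkadmin ownership of the install and data directories
sudo chown -R zkadmin:zkadmin /opt/zookeeper /data/zookeeper
```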
Time to configure Zookeeper. Let’s make a config file.
vi /opt/zookeeper/conf/zoo.cfg
Add the following configuration.
tickTime=2000
dataDir=/data/zookeeper
clientPort=2181
initLimit=10
syncLimit=5
We can quickly verify that everything went well so far by starting Zookeeper:
cd /opt/zookeeper && /opt/zookeeper/bin/zkServer.sh start
You can connect and verify
bin/zkCli.sh -server 127.0.0.1:2181
You should see something like this
WATCHER::WatchedEvent state:SyncConnected type:None path:null
[zk: 127.0.0.1:2181(CONNECTED) 0]
Let’s stop Zookeeper and proceed with the rest of the setup.
sudo bin/zkServer.sh stop
Let’s create a systemd unit file so that we can manage Zookeeper as a service — sudo vi /etc/systemd/system/zookeeper.service
Add the following
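A sketch of the unit file, assuming the zkadmin user and the /opt/zookeeper layout from above:

```ini
[Unit]
Description=Zookeeper Daemon
After=network.target

[Service]
Type=forking
User=zkadmin
Group=zkadmin
WorkingDirectory=/opt/zookeeper
ExecStart=/opt/zookeeper/bin/zkServer.sh start /opt/zookeeper/conf/zoo.cfg
ExecStop=/opt/zookeeper/bin/zkServer.sh stop /opt/zookeeper/conf/zoo.cfg
ExecReload=/opt/zookeeper/bin/zkServer.sh restart /opt/zookeeper/conf/zoo.cfg
TimeoutSec=30
Restart=on-failure

[Install]
WantedBy=multi-user.target
```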
Now that we have tested Zookeeper on one instance, we can create an image out of it and use it to spin up 2 more nodes. Let’s make an image.
gcloud beta compute machine-images \
create zookeeper-machine-image \
--source-instance zk-01 \
--source-instance-zone asia-south1-a
We can now use this image to create two more VMs, zk-02 and zk-03, in the asia-south1-b and asia-south1-c zones respectively. Let's create these VMs.
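One way to launch them from the machine image (the beta instances create command accepts a source machine image):

```shell
gcloud beta compute instances create zk-02 \
  --zone=asia-south1-b \
  --source-machine-image=zookeeper-machine-image

gcloud beta compute instances create zk-03 \
  --zone=asia-south1-c \
  --source-machine-image=zookeeper-machine-image
```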
Once all the Zookeeper nodes are up and running, we will create a myid file on each of them. The myid file consists of a single line containing only that machine's id, so the myid on server 1 would contain the text "1" and nothing else. The id must be unique within the Zookeeper ensemble and have a value between 1 and 255.
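For example, with dataDir set to /data/zookeeper as above:

```shell
# On zk-01
echo "1" | sudo tee /data/zookeeper/myid
# On zk-02
echo "2" | sudo tee /data/zookeeper/myid
# On zk-03
echo "3" | sudo tee /data/zookeeper/myid
```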
Let’s now edit the Zookeeper configuration file on all nodes to form the cluster quorum. We will add the hostname and ports of each node to the configuration file. You can find the FQDN of your Zookeeper nodes by running hostname -A; it will usually be in the format [SERVER-NAME].[ZONE].[PROJECT-NAME].internal, for example zk-01.asia-south1-a.c.cloudside-academy.internal. If your Zookeeper cluster is a single-zone deployment, you can simply resolve using the VM name (for example, ping zk-03). Since our deployment is multi-zone and names cannot be resolved across zones without an FQDN, we will use FQDNs. If you want to keep things simpler, you may add the static internal IP of each node instead of the domain name.
SSH to each Zookeeper node. On each node, switch to the zkadmin user (su -l zkadmin) and open /opt/zookeeper/conf/zoo.cfg. The config file on all nodes should look like this:
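A sketch of the full file; the FQDNs follow the example project name from above, so substitute your own FQDNs or static IPs.

```properties
tickTime=2000
dataDir=/data/zookeeper
clientPort=2181
initLimit=10
syncLimit=5
# Allow four-letter-word admin commands (stat, ruok, conf, isro, ...)
4lw.commands.whitelist=*
# server.<myid>=<host>:<peer-port>:<leader-election-port>
server.1=zk-01.asia-south1-a.c.cloudside-academy.internal:2888:3888
server.2=zk-02.asia-south1-b.c.cloudside-academy.internal:2888:3888
server.3=zk-03.asia-south1-c.c.cloudside-academy.internal:2888:3888
```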
We have added the 4lw.commands.whitelist=* option to whitelist four-letter-word commands like stat, ruok, conf, and isro.
Start Zookeeper Cluster
We are now ready to start the cluster. Let’s enable zookeeper.service on all nodes and start it.
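Assuming the systemd unit created earlier, run on each node:

```shell
sudo systemctl daemon-reload
sudo systemctl enable zookeeper
sudo systemctl start zookeeper
```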
You may now connect to one of the zookeeper nodes and check the cluster status.
You will see the status as either leader or follower for each node.
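The status check looks like this:

```shell
# Run on each Zookeeper node; prints "Mode: leader" on exactly one
# node and "Mode: follower" on the others
sudo /opt/zookeeper/bin/zkServer.sh status
```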
Setting Up Kafka
The Zookeeper ensemble is up and running, so let’s move on to configuring the Kafka brokers. We will configure a broker on one VM and, from its image, spin up two more Kafka brokers.
You might want to choose the right instance type in production and also attach an SSD disk for storage. For this guide, we will go ahead with an e2-small VM. Let’s reserve three static internal IPs for our Kafka brokers.
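A sketch of both steps. The subnet name and the broker-01 address (10.10.0.20) are assumptions, while 10.10.0.21 and 10.10.0.22 match the broker configs used later in this guide.

```shell
# Reserve three static internal addresses
for i in 20 21 22; do
  gcloud compute addresses create kafka-ip-$i \
    --region=asia-south1 \
    --subnet=kafka-subnet \
    --addresses=10.10.0.$i
done

# Launch the first broker pinned to one of the reserved IPs
gcloud compute instances create kafka-broker-01 \
  --zone=asia-south1-a \
  --machine-type=e2-small \
  --subnet=kafka-subnet \
  --private-network-ip=10.10.0.20 \
  --image-family=debian-11 \
  --image-project=debian-cloud \
  --tags=kafka \
  --no-address
```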
SSH to the VM and start setting up Kafka.
gcloud beta compute ssh kafka-broker-01 \
--tunnel-through-iap \
--zone=asia-south1-a
Create a user for running Kafka services and download binaries
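A sketch of this step; Kafka 3.0.0 with Scala 2.13 matches the version mentioned at the start, and /data/kafka matches the paths used later.

```shell
# Kafka needs a JDK too, since this VM was built fresh
sudo apt update && sudo apt install default-jdk -y

# Create the service user
sudo useradd -m -s /bin/bash kafka

# Download and extract Kafka into /data/kafka
wget https://archive.apache.org/dist/kafka/3.0.0/kafka_2.13-3.0.0.tgz
sudo mkdir -p /data/kafka
sudo tar -xzf kafka_2.13-3.0.0.tgz -C /data/kafka --strip-components=1
sudo chown -R kafka:kafka /data/kafka
```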
You can use the following configuration file to begin with. For this first Kafka broker, we will assign broker.id as 0; as you add more brokers, increment the id. Use the internal IP of the broker VM to configure advertised.listeners and listeners. Finally, configure zookeeper.connect to include all Zookeeper nodes (use either FQDNs or IP addresses).
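A sketch of the file; the broker IP (10.10.0.20) and the Zookeeper FQDNs follow the example naming used in this guide, so substitute your own.

```properties
broker.id=0
# Replace 10.10.0.20 with this broker's internal IP
listeners=PLAINTEXT://10.10.0.20:9092
advertised.listeners=PLAINTEXT://10.10.0.20:9092
log.dirs=/data/kafka/logs
num.partitions=3
default.replication.factor=3
min.insync.replicas=2
offsets.topic.replication.factor=3
log.retention.hours=168
zookeeper.connect=zk-01.asia-south1-a.c.cloudside-academy.internal:2181,zk-02.asia-south1-b.c.cloudside-academy.internal:2181,zk-03.asia-south1-c.c.cloudside-academy.internal:2181
zookeeper.connection.timeout.ms=18000
```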
Move the existing sample config file aside and create a new config with the contents above.
Create a Kafka Service
Let’s create a systemd file — sudo vi /etc/systemd/system/kafka.service
Add the following:
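A sketch of the unit file, assuming the kafka user and the /data/kafka layout from above:

```ini
[Unit]
Description=Apache Kafka Broker
Requires=network.target
After=network.target

[Service]
Type=simple
User=kafka
Group=kafka
ExecStart=/data/kafka/bin/kafka-server-start.sh /data/kafka/config/server.properties
ExecStop=/data/kafka/bin/kafka-server-stop.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
```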
Start the service — systemctl start kafka
You can now connect to one of the Zookeeper nodes and verify that the Kafka broker with id 0 shows up.
# SSH to any zookeeper node and cd /opt/zookeeper
bin/zkCli.sh -server 127.0.0.1:2181
After connecting, run
ls /brokers/ids
You should see this
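With only the first broker running, the list contains just broker 0:

```
[zk: 127.0.0.1:2181(CONNECTED) 0] ls /brokers/ids
[0]
```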
Create an Image
We are now ready to make an image out of this VM and use it to spin up more Kafka broker nodes. Let’s make an image
gcloud beta compute machine-images \
create kafka-machine-image \
--source-instance kafka-broker-01 \
--source-instance-zone asia-south1-a
Let’s now create the kafka-broker-02 and kafka-broker-03 VMs in the asia-south1-b and asia-south1-c zones respectively.
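One way to launch them (a sketch; depending on your gcloud version you may need to assign the reserved internal IPs separately):

```shell
# Make sure each new VM ends up with the internal IP used in its
# server.properties (10.10.0.21 and 10.10.0.22 in this guide)
gcloud beta compute instances create kafka-broker-02 \
  --zone=asia-south1-b \
  --source-machine-image=kafka-machine-image

gcloud beta compute instances create kafka-broker-03 \
  --zone=asia-south1-c \
  --source-machine-image=kafka-machine-image
```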
Once these VMs are online, all you need to do is SSH to each one and change the following properties in the configuration file: vi /data/kafka/config/server.properties
For kafka-broker-02 VM
broker.id=1
listeners=PLAINTEXT://10.10.0.21:9092
advertised.listeners=PLAINTEXT://10.10.0.21:9092
For kafka-broker-03 VM
broker.id=2
listeners=PLAINTEXT://10.10.0.22:9092
advertised.listeners=PLAINTEXT://10.10.0.22:9092
Since we are launching these VMs from an image, there may be some leftover data belonging to broker 0, so we have to clean up the data (log) directories before starting the service.
sudo rm -rf /data/kafka/logs/ && sudo systemctl start kafka
You can now finally verify the status of all brokers from one of the Zookeeper nodes.
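For example, from any Zookeeper node:

```shell
cd /opt/zookeeper && bin/zkCli.sh -server 127.0.0.1:2181
# Inside the zkCli shell, run:
#   ls /brokers/ids
# With all three brokers healthy, this returns [0, 1, 2]
```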
Congratulations! You have now set up Kafka the hard way on Google Compute Engine. Let’s clean up now.
Cleanup
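If you followed along with the resource names used in this guide, the teardown looks roughly like this; adjust for any names of your own (firewall rules, reserved addresses, VPC, and subnet).

```shell
# Delete the VMs, zone by zone
gcloud compute instances delete zk-01 kafka-broker-01 --zone=asia-south1-a --quiet
gcloud compute instances delete zk-02 kafka-broker-02 --zone=asia-south1-b --quiet
gcloud compute instances delete zk-03 kafka-broker-03 --zone=asia-south1-c --quiet

# Delete the machine images
gcloud beta compute machine-images delete zookeeper-machine-image kafka-machine-image --quiet

# Also delete any firewall rules and reserved internal addresses you
# created; list yours with "gcloud compute addresses list" and
# "gcloud compute firewall-rules list"
```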
Hope you found this guide useful. If you need help running self-managed Kafka at scale on Google Compute Engine, reach out to us! Happy stream processing :)