Setting up a Multi-Broker Kafka Cluster – Beginner's Guide


Kafka is an open source distributed messaging system used by many organizations for a wide range of use cases, including stream processing, log aggregation, and metrics collection.

Note: This tutorial is based on a Red Hat 7 derivative. However, it will work on most Linux systems.

Multi-Node Kafka Cluster Setup

This tutorial will guide you through setting up the latest Kafka cluster from scratch.

Prerequisites

1. You need a Zookeeper cluster before setting up a Kafka cluster. Refer to this Zookeeper cluster setup guide if you don't have one.

2. Launch three instances. Make sure you allow traffic between the Zookeeper and Kafka instances in the security groups (Zookeeper listens on port 2181 by default, and the Kafka brokers on port 9092).

3. Set a hostname on each of the three instances for identification, using the following commands.

sudo hostnamectl set-hostname node1   # on the first instance
sudo hostnamectl set-hostname node2   # on the second instance
sudo hostnamectl set-hostname node3   # on the third instance

Kafka Installation

Perform the following tasks on all the servers.

1. Update the server.

sudo yum update -y

2. Install Java 8.

sudo yum install -y java-1.8.0-openjdk
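You can verify the installation by checking the Java version.

java -version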

3. Get the latest version of Kafka from here.

cd /opt
sudo wget http://mirror.fibergrid.in/apache/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz

4. Untar the Kafka binary.

sudo tar -xvf kafka_2*

5. Rename the extracted versioned Kafka folder to kafka.

sudo mv kafka_2.11-0.10.0.0 kafka

Kafka Configuration

6. Open the server.properties file (/opt/kafka/config/server.properties), find the zookeeper.connect parameter at the bottom, and enter the Zookeeper IPs as shown below. Replace zk1, zk2, and zk3 with the IPs or DNS names of your Zookeeper instances.

zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
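Each broker also needs a unique broker.id in the same server.properties file; it defaults to 0 on every node, which will cause conflicts. On node1 you would set the following, and use 2 and 3 on node2 and node3 respectively.

broker.id=1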

Create a Kafka Service

1. Create a systemd file.

sudo vi /lib/systemd/system/kafka.service

Copy the following contents into the kafka.service unit file.

[Unit]
Description=Kafka
After=network.target

[Service]
User=ec2-user
WorkingDirectory=/opt/kafka
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
Restart=on-abort

[Install]
WantedBy=multi-user.target

2. Reload the systemd daemon.

sudo systemctl daemon-reload
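If you want Kafka to start automatically at boot, enable the service as well.

sudo systemctl enable kafka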

Managing Kafka Service

Once the Kafka service is created, you can manage it like any other Linux service.

1. To start the Kafka service,

sudo service kafka start

2. To stop and restart,

sudo service kafka stop
sudo service kafka restart
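3. To verify that the broker came up cleanly on each node, check the service status. If something looks wrong, the broker logs under /opt/kafka/logs are the first place to look.

sudo service kafka status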

Testing The Kafka Cluster

To test the Kafka cluster setup, we will create a topic and a few messages. Then we will try to consume them from a different node to confirm that the cluster is working as intended.

To test, cd into the Kafka bin directory to get access to the Kafka scripts.

cd /opt/kafka/bin

Step 1: Create a topic from Kafka node 1 with a replication factor of 3. Replace zk with your Zookeeper IP or DNS name.

./kafka-topics.sh --create --zookeeper zk:2181 --replication-factor 3 --partitions 1 --topic test

Step 2: Describe the topic from Kafka node 2.

./kafka-topics.sh --describe --zookeeper zk:2181 --topic test

Step 3: Produce messages using the following command. Enter the messages in the terminal, one message per line. Use Ctrl + C to exit.

./kafka-console-producer.sh --broker-list kafka-node:9092 --topic test

Step 4: From a different node, try to consume the messages using the following command.

./kafka-console-consumer.sh --zookeeper zk:2181 --from-beginning --topic test

If you were able to complete all the tests mentioned above, you have a working Kafka cluster.

If you face any issue during the setup, feel free to drop us a comment below.


Service Discovery and Other Cluster Management Techniques Using Consul


Consul is a cluster management tool from HashiCorp, and it is very useful for building advanced microservices architectures. Consul is a distributed configuration system that provides high availability, multi-datacenter support, service discovery, and strong fault tolerance. Managing microservices with Consul is therefore quite easy and simple.

Infrastructure based on current microservice architectures has the following challenges:

  1. Uncertain service locations
  2. Service configurations
  3. Failure Detection
  4. Load balancing between multiple data-centers

Since Consul is distributed and agent-based, it can solve all of the above challenges easily.

Consul Technology and Architecture

Consul is an agent-based tool, which means it should be installed on each and every node of the cluster, running as either a server or a client agent. HashiCorp provides an open source binary package for installing Consul, which can be downloaded from https://www.consul.io/downloads.html.

To install Consul on all the nodes, we need to download the binary file and place it in a bin folder on the PATH (such as /usr/local/bin), so that we can run it from anywhere on the node.
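As a rough sketch, downloading and installing the binary on a node would look like the following (0.6.4 matches the build shown later in this post; replace it with the latest version from the downloads page).

cd /tmp
wget https://releases.hashicorp.com/consul/0.6.4/consul_0.6.4_linux_amd64.zip
unzip consul_0.6.4_linux_amd64.zip
sudo mv consul /usr/local/bin/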

Consul needs to be started as a process, and it will continuously share information. For this, we should start the agent on all the nodes and join them together so that they can communicate with each other.

Communication between nodes is done through a gossip protocol, which means each node passes data to a few other nodes, which eventually propagate it to the rest, much like a virus spreads.

Consul's gossip protocol and service discovery

Before getting into the demonstration, I would like to explain the architecture of this tool. Basically, the agent is started in server mode on the nodes where services are running, and a client agent can be used for the UI and for querying information about the server cluster. That said, a client agent can also run services of its own.

Microservice architecture with Consul

To start the agent as a server, we need to pass -server as a parameter.

$ consul agent -server -data-dir /tmp/consul
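On a real multi-server cluster, you would typically also pass -bootstrap-expect with the number of servers and -bind with the node's own IP, roughly like this:

$ consul agent -server -bootstrap-expect 3 -data-dir /tmp/consul -bind 172.20.20.11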

Consul does not automatically join the cluster. Each node needs to be joined to the others by specifying the hostname or IP address of another node.

$ consul join 172.20.20.11

Consul maintains information about the cluster members, and this can be seen from any instance's console.

$ consul members
Node        Address             Status  Type    Build  Protocol  DC
consuldemo  172.28.128.16:8301  alive   server  0.6.4  2         dc1

Consul exposes information about the instances through an API, and because of this, Consul can be used by other infrastructure applications, for example a dashboard, a monitoring tool, or your own event management system.

$ curl localhost:8500/v1/catalog/nodes
[{"Node":"consuldemo","Address":"172.28.128.16","TaggedAddresses":{"lan":"127.0.0.1","wan":"127.0.0.1"},"CreateIndex":4,"ModifyIndex":110}]

Similarly, we can run the Consul agent on the client, and we need to join this client to the server cluster so that we can set up our querying mechanism, dashboard, or cluster monitoring.

$ consul agent -data-dir /tmp/consul -client 172.28.128.17 -ui-dir /home/your_user/dir -join 172.28.128.16

Service discovery is another great feature of Consul. For our infrastructure services, we need to create a separate service configuration file in JSON format for Consul. Service configuration files should be kept inside the consul.d configuration folder so that the Consul agent picks them up, so we need to create consul.d inside the /etc/ folder.

$ sudo mkdir /etc/consul.d

Let us assume we have a service named "nginx" running on port 80. We will create a service configuration file for this "nginx" service inside our consul.d folder.

$ echo '{"service": {"name": "nginx", "tags": ["rails"], "port": 80}}' \
| sudo tee /etc/consul.d/nginx.json

Later, when we start the agent, we can see that the services defined inside the consul.d folder are synced with the Consul agent.

$ consul agent -server -data-dir /tmp/consul -config-dir /etc/consul.d
==> Starting Consul agent...
...
[INFO] agent: Synced service 'nginx'
...

This means the service is communicating with the Consul agent, so the availability of the node and the health of the service can be shared across the cluster.

We can query services using either DNS or the HTTP API. If we are using DNS, we query with dig, and the DNS name takes the form SERVICE_NAME.service.consul. If we are running multiple instances of the same application, we can separate them with tags, and the DNS name becomes TAG.SERVICE_NAME.service.consul. Since we have internal DNS names within the cluster, we can manage the DNS issues that usually occur when a load balancer fails.

$ dig @127.0.0.1 -p 8600 nginx.service.consul
...
;; QUESTION SECTION:
;nginx.service.consul. IN A

;; ANSWER SECTION:
nginx.service.consul. 0 IN A 172.28.128.16

If we use the HTTP API to query the service, it looks like this:

$ curl http://localhost:8500/v1/catalog/service/nginx
[{"Node":"consuldemo","Address":"172.28.128.16","ServiceID":"nginx", \
"ServiceName":"nginx","ServiceTags":["rails"],"ServicePort":80}]

Here we can see how helpful Consul is for service discovery.

Just like service discovery, health checking of the nodes is also taken care of by Consul. Consul exposes the status of each node so that we can easily detect failures among the nodes. For this example, I manually crashed the nginx service:

$ curl http://localhost:8500/v1/health/state/critical
[{"Node":"my-first-agent","CheckID":"service:nginx","Name":"Service 'nginx' check","Status":"critical","Notes":"","ServiceID":"nginx","ServiceName":"nginx"}]

Infrastructure configurations are usually stored as key/value pairs, and since Consul provides a key/value store, we can use it for dynamic configuration. For example:

$ curl -X PUT -d 'test' http://localhost:8500/v1/kv/nginx/key1
true
$ curl -X PUT -d 'test' http://localhost:8500/v1/kv/nginx/key2?flags=42
true
$ curl -X PUT -d 'test' http://localhost:8500/v1/kv/nginx/sub/key3
true
$ curl http://localhost:8500/v1/kv/?recurse
[{"CreateIndex":97,"ModifyIndex":97,"Key":"nginx/key1","Flags":0,"Value":"dGVzdA=="},
{"CreateIndex":98,"ModifyIndex":98,"Key":"nginx/key2","Flags":42,"Value":"dGVzdA=="},
{"CreateIndex":99,"ModifyIndex":99,"Key":"nginx/sub/key3","Flags":0,"Value":"dGVzdA=="}]

Since key/value configuration works well for infrastructure, Consul can act as a distributed, asynchronously replicated solution for centralized dynamic configuration.
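Note that the values come back base64-encoded ("dGVzdA==" is just "test"). A quick sketch of reading one key back, assuming the jq utility is available:

$ curl -s http://localhost:8500/v1/kv/nginx/key1 | jq -r '.[0].Value' | base64 --decode
test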

A big feature of Consul is that everything is exposed in the UI: we can check the health of cluster members, store and delete key/values, manage services, and more. To get to this dashboard, open the following URL in a browser.

http://consul_client_IP:8500/ui

For a live demo, Consul provides a demo dashboard at https://demo.consul.io/ui/.

Setting up Consul Using Ansible

For installation and basic configuration, download the Ansible role and simply run the sample playbook from here (https://github.com/PrabhuVignesh/consul_installer).

OR

Download the Ansible role from Ansible Galaxy:

$ ansible-galaxy install PrabhuVignesh.consul_installer

Or simply download the Ansible playbook with the Vagrantfile from here (https://github.com/PrabhuVignesh/consul_experiment) and follow the instructions in the README.md file.

Conclusion

Converting your application into microservices is not a big deal; making it scale is always the challenging part. This challenge can be solved by combining tools like Consul, Serf, and message queues. Together they make your microservices scalable, fault tolerant, and highly available for zero-downtime applications.


How to Setup a Replicated GlusterFS Cluster on AWS EC2


GlusterFS is one of the best open source distributed file systems. If you want a highly available distributed file system for your applications, GlusterFS is a good option.

Note: AWS offers a managed scalable file storage service called Elastic File System (EFS). If you don't want the administrative overhead of GlusterFS clusters, you can give EFS a try.


This guide covers all the necessary steps to set up a GlusterFS cluster using ec2 instances and extra EBS volumes. Here we are setting up a two-node cluster; however, you can increase the node count based on your needs.

Instance Configurations

1. Create two instances with extra EBS volumes.

2. Make sure the instances can talk to each other.

3. Log in to the servers and set the hostnames as node1 and node2 using the following commands.

sudo hostnamectl set-hostname node1   # on the first server
sudo hostnamectl set-hostname node2   # on the second server

Run bash in the terminal for the new hostname to show up.

4. Make an entry in the /etc/hosts file with the IP and hostname of both servers as shown below. The IPs should be resolvable using the hostnames. Change the IPs in the following to your server IPs.

172.31.21.201 node1
172.31.21.202 node2

Instance Port Configuration

You need to open the following ports in the ec2 security groups, as well as in the server firewall if it is enabled.

111 - portmapper (rpcbind)
24007 - GlusterFS Daemon
24008 - GlusterFS Management
38465 to 38467 - GlusterFS NFS service
49152 to n - Depends on the number of bricks
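If firewalld is enabled on the servers, a sketch of opening these ports would look like the following (the 49152-49160 range assumes a handful of bricks; widen it if you add more).

sudo firewall-cmd --permanent --add-port=111/tcp --add-port=24007-24008/tcp --add-port=38465-38467/tcp --add-port=49152-49160/tcp
sudo firewall-cmd --reload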

Create Mount Points for EBS Volumes

You need to do the following on both the ec2 instances.

1. Format the volume to xfs.

sudo mkfs -t xfs /dev/xvdb

xvdb is the name of the EBS volume. You can list the available devices using the lsblk command.

2. Create a mount directory named /gshare and mount the formatted volume.

sudo mkdir /gshare
sudo mount /dev/xvdb /gshare

3. Add the mount to /etc/fstab so that it persists across reboots.

/dev/xvdb    /gshare   xfs   defaults,nofail     0   0
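You can verify the entry without a reboot by unmounting the volume and remounting everything from fstab.

sudo umount /gshare
sudo mount -a
df -h /gshare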

GlusterFS Installation

Perform the following steps on both the servers.

1. Create a GlusterFS repo file.

sudo vi /etc/yum.repos.d/Gluster.repo

Copy the following to the repo file.

[gluster38]
name=Gluster 3.8
baseurl=http://mirror.centos.org/centos/7/storage/$basearch/gluster-3.8/
gpgcheck=0
enabled=1

2. Install GlusterFS server.

sudo yum install glusterfs-server -y

3. Start and verify the glusterd service.

sudo systemctl start glusterd
sudo systemctl status glusterd

GlusterFS Configuration

1. From node1 execute the following command to create a trusted storage pool with node2.

sudo gluster peer probe node2

After successful execution, you should get peer probe: success. as the output.

2. Check the peer status using the following command.

[user@node1 ~]$ sudo gluster peer status
Number of Peers: 1

Hostname: node2
Uuid: 47ee8304-36ea-4b95-9214-4854bc98b737
State: Peer in Cluster (Connected)

3. Create a data directory on the gshare mount on both the servers.

sudo mkdir /gshare/data

4. Create a GlusterFS HA shared volume.

sudo gluster volume create gdata replica 2 node1:/gshare/data node2:/gshare/data

5. Start the gdata volume.

sudo gluster volume start gdata

6. By default, NFS is disabled. If you want NFS functionality for the GlusterFS volume, you can enable it using the following command.

sudo gluster volume set gdata nfs.disable off

7. Set the volume permissions on the gdata volume for client access. Here I am using the 172.* CIDR; you need to replace it based on your network range.

sudo gluster volume set gdata auth.allow "172.*"

8. To get all the info about the volume, execute the following command.

sudo gluster volume info gdata

GlusterFS Client Setup

1. Enable fuse kernel module.

sudo modprobe fuse

2. Install all the glusterFS client dependencies.

sudo yum install fuse fuse-libs openib libibverbs -y

3. Install the GlusterFS client.

sudo yum install glusterfs-client -y

GlusterFS Client Configuration

The data will get replicated only if you are writing from a GlusterFS client. You can mount the GlusterFS volume on any number of clients. We highly recommend mapping the gluster nodes to a domain name and using it on the clients for mounting.

Note: A client machine is not part of the GlusterFS cluster. It is the machine on which you want to mount the replicated volume.

1. Create a client mount directory.

sudo mkdir /gfdata

2. Mount the GlusterFS replicated volume on the /gfdata directory.

sudo mount -t glusterfs node1:/gdata /gfdata
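To make the client mount persist across reboots, you can also add it to /etc/fstab; the backupvolfile-server mount option lets the client fall back to node2 if node1 is unreachable at mount time.

node1:/gdata  /gfdata  glusterfs  defaults,_netdev,backupvolfile-server=node2  0  0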

Troubleshooting GlusterFs

1. You can view all the GlusterFS server logs in the following directory.

/var/log/glusterfs

To monitor logs in real time, you can use tail -f along with the path to the log file.

2. All GlusterFS client logs are saved in the following directory, in a log file named after the volume.

/var/log/glusterfs/

Linux VI Editor Shortcuts, Tips and Productivity Hacks For Beginners


In our last blog post, we covered Linux CLI productivity tips. When you work on Linux systems, the vi editor is something you use very often for editing files. It is not like using a GUI editor, and people using it for the first time get intimidated because you have to control the editing with various keystrokes. However, with a little practice, using vi editor shortcuts in your day-to-day activities will save you time and increase your productivity, and you will start loving the vi editor. Moreover, it is a very powerful text editor in the Linux ecosystem.

VI Editor Shortcuts For Beginners

In this article, we will cover the essential shortcuts that you can use in day-to-day Linux activities involving the vi editor.

Note: Most of the commands explained in this tutorial work in normal mode. (Press ESC to make sure you are in normal mode before executing commands.)

Setting up VI environment

Before diving into shortcuts and commands, you must understand the vi editor settings. You can set all the necessary vi editor parameters in the ~/.vimrc file, which is loaded by default. If you don't have that file, create one using touch ~/.vimrc

The following are common parameters that you might want in the vimrc file.

set number
set autoindent
set shiftwidth=4
set softtabstop=4
set expandtab

The set number parameter displays line numbers in the vi editor. You can unset this temporarily by running :set nonumber from the editor.

Cut Copy Paste

1. Press ESC, press v, and move the cursor to select the text you want to copy. Use capital V to select whole lines.

2. Press y to copy (yank) the selection.

3. Place the cursor in the desired location and press p to paste.

Deleting Lines/Words

1. Place the cursor on the line you want to delete and press dd to delete that line

2. To delete a specific number of lines, you can use d10d. This deletes 10 lines starting from the cursor line. You can use any number in place of 10, based on your needs.

3. To select and delete specific lines, use Shift + V and then the up and down arrows to make the selection. Once you have selected the lines, press d to delete them all.

In insert mode, you can do the following.

1. Ctrl + w will delete the word to the left of the cursor.

2. Ctrl + u will delete everything to the left of the cursor on the current line.

Searching and Replacing Texts

You can search through your file by pressing /.

1. For example, if you want to search for the keyword data, you would do the following.

/data

To find the next occurrence, just press n.

Note: Searches are case-sensitive. If you want a case-insensitive search, run :set ignorecase in the editor.

2. Use the following syntax for replacing a pattern.

:%s/pattern/replace/g

3. Use the following syntax to replace every occurrence with a confirmation prompt. It will highlight each occurrence and prompt you before replacing it.

:%s/pattern/replace/gc
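You can also restrict a substitution to a range of lines. For example, to replace only within lines 10 through 20, use the following.

:10,20s/pattern/replace/g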

Convert to Upper and Lowercase

For any case conversion, keep the cursor on the line and use the following shortcuts.

1. To convert a line to uppercase, use gUU

2. To convert a line to lowercase, use guu

Copying Contents From Another File

This is not something you do very often. However, if you need to add the contents of another file to the file you are editing, place the cursor on the desired line and use the following command. The contents of the specified file will be inserted starting on the line after the cursor.

:r /path/to/file

For example, if you want to copy the CPU info into an existing file, you would use the following command.

:r /proc/cpuinfo

Executing/Copying Contents From Command

1. To execute commands from the editor, you can use :! <command>. For example,

:! pwd

2. If you want the output of a command to be copied into the file being edited, you can use :r! <command>. For example, to get and copy the eth0 IP address, you could do the following.

:r! ip addr | grep eth0 | grep inet | awk '{print $2}' | cut -d / -f1

Getting Help

At any point in time, if you need help with the vi editor, you can run the vimtutor command to open the command-line tutorial in your terminal.

You can use all the normal vi commands to browse through the help document.

Wrapping Up

VI is a very powerful editor; understanding its full functionality takes time and constant practice. We have explained some commands, tips, and vi editor shortcuts that will save you time while working with the vi editor. If you have a tip of your own, please share it with us in the comments section. It could help others.
