In this blog, we will look at the high-level Kubernetes best practices that have to be taken into consideration when setting up a Kubernetes cluster.
Let’s look at the list in detail.
1. Kubernetes Networking (Cloud, Hybrid, or On-Prem):
The Kubernetes network has to be designed so that it can accommodate future cluster and application requirements.
One common Kubernetes networking mistake organizations make is using CIDR ranges that are not part of the organization’s network. When they later want the clusters to join a hybrid network, they end up having to migrate.
It is better to talk to the organization’s network team before finalizing the network design. This way, you can carve out and reserve an IP range even if the cluster is not yet part of a hybrid network.
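A quick way to catch a clashing range before it reaches the network team is to test the proposed cluster CIDRs against the ranges already in use. The sketch below uses Python’s standard `ipaddress` module; the function name `find_overlaps` and every range in it are hypothetical values chosen for illustration, not recommendations.

```python
import ipaddress


def find_overlaps(proposed, reserved):
    """Return (proposed, reserved) pairs of CIDR strings that overlap."""
    clashes = []
    for p in proposed:
        p_net = ipaddress.ip_network(p)
        for r in reserved:
            if p_net.overlaps(ipaddress.ip_network(r)):
                clashes.append((p, r))
    return clashes


# Hypothetical ranges, for illustration only.
corporate_ranges = ["10.0.0.0/16", "10.1.0.0/16"]
cluster_ranges = {
    "nodes": "10.2.0.0/20",
    "pods": "10.1.128.0/17",  # falls inside 10.1.0.0/16
    "services": "10.3.0.0/20",
}

for name, cidr in cluster_ranges.items():
    clashes = find_overlaps([cidr], corporate_ranges)
    print(name, cidr, "OVERLAP" if clashes else "ok", clashes)
```

A check like this could run in the same pipeline that provisions the cluster, so an overlapping range fails early instead of surfacing during a hybrid connectivity project.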
Each cloud provider gives multiple options for Node and Pod networking.
For example, Google Kubernetes Engine offers multi-cluster services and VPC-native clusters whose pod IPs are routable within the same VPC and from peered VPCs.
AWS EKS provides options for VPC-native and secondary-range-based pod networking.
But if you don’t want to expose your pod IPs, you might need to use something like an IP masquerading agent in your cluster so that outgoing traffic always has the Node IP as its source identity.
Ingress and egress traffic design is also essential. There could be API gateways, on-prem systems, proxy servers, and third-party APIs that the cluster apps need to connect to.
Your design should include all the access requirements so that you won’t face any access restrictions during implementation.
2. Kubernetes Security, Compliance & Benchmarks
The following are generic Kubernetes security best practices.
- Understand the compliance requirements and security benchmarks as per your organization’s policy. If you are using managed services, make sure they comply with the organization’s compliance policies.
- Take a look at the CIS benchmark for Kubernetes. Aqua Security also has a utility named kube-bench that checks a Kubernetes cluster against the CIS benchmarks.
- Will there be any apps that handle PCI/PII data? If yes, then segregate these apps in terms of access and storage based on organizational policy.
- Implement pod security (disallowing root and privileged containers, enforcing a read-only root file system, etc.).
- Access container registries securely.
- Implement Network policies to control pod-to-pod traffic and isolate apps as per access requirements.
- Design the CI/CD pipeline so that only approved container images are deployed in the cluster. Container images should be scanned for vulnerabilities, and the CI process should fail if an image does not meet the security requirements.
- Have a discussion with the organization’s security experts and get a signoff on the design and tooling before starting the implementation.
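To make the pod security item above concrete, here is a minimal sketch of a check that a pipeline could run against a pod spec before deployment. It only looks at three real `securityContext` fields (`privileged`, `runAsNonRoot`, `readOnlyRootFilesystem`); the function name and the wording of the findings are assumptions for this example, and a real setup would more likely rely on Pod Security Admission or a policy engine.

```python
def pod_security_findings(pod_spec):
    """Return a list of human-readable findings for a pod spec dict."""
    findings = []
    pod_ctx = pod_spec.get("securityContext", {})
    for container in pod_spec.get("containers", []):
        ctx = container.get("securityContext", {})
        name = container.get("name", "<unnamed>")
        if ctx.get("privileged", False):
            findings.append(f"{name}: privileged container")
        # The container-level setting wins; otherwise use the pod-level one.
        non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False))
        if not non_root:
            findings.append(f"{name}: runAsNonRoot is not set")
        if not ctx.get("readOnlyRootFilesystem", False):
            findings.append(f"{name}: root file system is writable")
    return findings


# Hypothetical pod spec, for illustration only.
spec = {
    "containers": [
        {"name": "app", "securityContext": {"privileged": True}},
    ]
}
for finding in pod_security_findings(spec):
    print(finding)
```

Failing the pipeline whenever the returned list is non-empty keeps the rule simple to explain to application teams.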
3. Kubernetes Cluster Access
It is very important to design and document the way the Kubernetes cluster is accessed.
The following are the key considerations.
- Restricting manual cluster-admin access. Instead, cluster-admin access should only be allowed through automation.
- Implement RBAC authorization and integrate it with the organization’s IAM.
- Allow Kubernetes API access via service accounts with limited privileges.
- Implement policy-based access for fine-grained access controls using CNCF tools like Open Policy Agent or Kyverno.
- Consider options for OpenID Connect (OIDC).
- Have a good audit mechanism for checking the roles and removing unused users, roles, service accounts, etc.
Design the access levels so that you can hand off responsibilities to other teams using the cluster. It would save time for everyone, and you can focus more on the engineering part rather than working on repeated tasks.
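As an illustration of a service account with limited privileges, the sketch below builds a namespaced read-only `Role` and a matching `RoleBinding` as plain Python dictionaries and prints them as JSON, which `kubectl apply -f` accepts alongside YAML. The helper names, namespace, role name, and service account name are all hypothetical.

```python
import json


def read_only_role(namespace, name, resources):
    """Build a Role that can only get, list, and watch core-group resources."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {"apiGroups": [""], "resources": resources,
             "verbs": ["get", "list", "watch"]}
        ],
    }


def bind_role(role, service_account):
    """Bind a Role to a service account in the Role's own namespace."""
    namespace = role["metadata"]["namespace"]
    role_name = role["metadata"]["name"]
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-{service_account}",
                     "namespace": namespace},
        "subjects": [{"kind": "ServiceAccount", "name": service_account,
                      "namespace": namespace}],
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io",
                    "kind": "Role", "name": role_name},
    }


# Hypothetical names, for illustration only.
role = read_only_role("team-a", "pod-reader", ["pods", "pods/log"])
binding = bind_role(role, "ci-bot")
print(json.dumps([role, binding], indent=2))
```

Generating these objects from code (or templates) rather than by hand also makes the later audit easier, because every role in the cluster can be traced back to a reviewed definition.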
4. Kubernetes High Availability & Scaling
High availability is another key factor in a Kubernetes cluster.
Here you need to consider the worker node availability across different availability zones.
Also, consider Pod Topology Spread Constraints to spread pods in different availability zones.
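The idea behind a topology spread constraint can be sketched in a few lines. The dictionary below mirrors the real fields of a `topologySpreadConstraints` entry, while `zone_skew` is a simplified, hypothetical helper that computes the difference between the fullest and emptiest zone; the zone names and the `app: web` label are made up for the example.

```python
from collections import Counter

# Shape of one entry under spec.topologySpreadConstraints in a pod template.
spread_constraint = {
    "maxSkew": 1,
    "topologyKey": "topology.kubernetes.io/zone",
    "whenUnsatisfiable": "DoNotSchedule",
    "labelSelector": {"matchLabels": {"app": "web"}},
}


def zone_skew(pod_zones, zones):
    """Difference between the most and least populated zone."""
    counts = Counter(pod_zones)
    per_zone = [counts.get(zone, 0) for zone in zones]
    return max(per_zone) - min(per_zone)


# Hypothetical placement: two pods in zone-a, one in zone-b, none in zone-c.
zones = ["zone-a", "zone-b", "zone-c"]
placement = ["zone-a", "zone-a", "zone-b"]
skew = zone_skew(placement, zones)
print("skew:", skew, "within maxSkew:", skew <= spread_constraint["maxSkew"])
```

With `whenUnsatisfiable` set to `DoNotSchedule`, the scheduler refuses placements that would push the skew past `maxSkew`; `ScheduleAnyway` turns the same rule into a preference.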
When we talk about scaling, it’s not just the autoscaling of instances or pods.
It’s about how gracefully you can scale down and scale up the apps without any service interruptions.
Depending on the type of apps that need to be hosted on Kubernetes, you can design deployments to evict the pods gracefully during scale-down and patching activities.
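One common building block for graceful eviction is a PodDisruptionBudget. The sketch below shows the shape of such an object and a deliberately simplified model of how many voluntary evictions it would permit; the helper `evictions_allowed`, the object name, and the numbers are assumptions for illustration, and the real controller also handles percentages and unhealthy pods.

```python
# Shape of a PodDisruptionBudget that keeps at least two "web" pods running.
pdb = {
    "apiVersion": "policy/v1",
    "kind": "PodDisruptionBudget",
    "metadata": {"name": "web-pdb"},
    "spec": {
        "minAvailable": 2,
        "selector": {"matchLabels": {"app": "web"}},
    },
}


def evictions_allowed(healthy_pods, min_available):
    """Simplified model: evictions that keep healthy pods >= min_available."""
    return max(healthy_pods - min_available, 0)


# Hypothetical situation: three healthy replicas during a node drain.
print(evictions_allowed(3, pdb["spec"]["minAvailable"]))
```

In this simplified model, a drain can evict one of the three replicas at a time and must wait for a replacement to become healthy before evicting the next one.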
Also, consider chaos engineering experiments before production to check the cluster and app stability.
5. Kubernetes Ingress
Ingress is an essential component of Kubernetes clusters. There are many ways to set up a Kubernetes ingress.
Also, there are different types of ingress controllers.
Evaluate the options and choose the one that best suits your organization’s compliance policies and scaling requirements.
- Have separate ingress controllers for the platform tools.
- Plan SSL/TLS certificate management for ingress endpoints.
- Do not try to route all the apps through the same ingress. As the number of apps grows, you could end up with one big configuration file that creates issues.
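One way to avoid a single ever-growing ingress definition is to generate one small Ingress object per app and assign it to the appropriate ingress class. The sketch below does that with plain dictionaries; the helper name, the class names `platform` and `apps`, the hosts, and the `<app>-tls` secret naming are all hypothetical conventions for this example.

```python
import json


def ingress_for(app, host, ingress_class, service_port=80):
    """Build a single-host Ingress with TLS for one app."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": f"{app}-ingress", "namespace": app},
        "spec": {
            "ingressClassName": ingress_class,
            "tls": [{"hosts": [host], "secretName": f"{app}-tls"}],
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": app,
                                            "port": {"number": service_port}}},
                }]},
            }],
        },
    }


# Hypothetical apps: platform tooling and business apps use separate classes.
manifests = [
    ingress_for("grafana", "grafana.example.com", "platform"),
    ingress_for("shop", "shop.example.com", "apps"),
]
print(json.dumps(manifests, indent=2))
```

Each app team then owns a small object of its own, and the platform tools stay on a controller that app traffic cannot affect.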
6. Kubernetes Backup & Restore Strategy
Whether it is a managed service or a custom Kubernetes implementation, it is essential to back up the cluster.
When we say backup, it is primarily backing up etcd.
You should have a very good design to automate the backup of the Kubernetes cluster and its associated components.
Also, a design to restore the cluster is required.
There are also options to take a dump of existing objects in JSON format. You can use the dump to restore the objects in the same or a different cluster.
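A JSON dump (for example, the output of `kubectl get <kind> -o json`) carries fields that the API server sets itself, and those usually need to be removed before the objects can be applied to another cluster. The sketch below strips a commonly removed set of fields; the exact list in `SERVER_SET_METADATA` and the function name `portable` are assumptions, so adjust them to what your restore process needs.

```python
import copy
import json

# Fields populated by the API server; an assumed, not exhaustive, list.
SERVER_SET_METADATA = (
    "uid", "resourceVersion", "creationTimestamp",
    "generation", "managedFields", "selfLink",
)


def portable(obj):
    """Return a copy of a dumped object without server-populated fields."""
    clean = copy.deepcopy(obj)
    clean.pop("status", None)
    metadata = clean.get("metadata", {})
    for field in SERVER_SET_METADATA:
        metadata.pop(field, None)
    return clean


# Hypothetical dumped object, for illustration only.
dumped = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "app-config", "namespace": "shop",
                 "uid": "1234", "resourceVersion": "42"},
    "data": {"LOG_LEVEL": "info"},
}
print(json.dumps(portable(dumped), indent=2))
```

Keeping the cleaned objects in version control next to the etcd snapshots gives you a second, cluster-independent path for restoring workloads.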
7. Kubernetes Node & Container Image Patching & Lifecycle Management
Patching is a recurring process.
When it comes to Kubernetes, there is both node patching and container image patching.
Make sure you implement DevSecOps principles in your CI/CD pipelines.
Here are some best practices:
- An automated pipeline integrated with container scanning tools to patch container images on a monthly schedule.
- An automated pipeline to perform node patching without downtime.
- An automated pipeline to manage the lifecycle of container images. You don’t want to keep a large number of outdated versions in your registry.
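The image lifecycle item above usually comes down to a retention rule. A minimal sketch, assuming you can list each tag with its push time and know which tags are still deployed: keep the newest few tags plus anything in use, and mark the rest for deletion. The function name, the default of five tags, and the sample data are hypothetical.

```python
from datetime import datetime


def tags_to_delete(images, keep_latest=5, in_use=()):
    """images is a list of (tag, pushed_at) pairs; returns tags to remove."""
    newest_first = sorted(images, key=lambda item: item[1], reverse=True)
    keep = {tag for tag, _ in newest_first[:keep_latest]} | set(in_use)
    return [tag for tag, _ in newest_first if tag not in keep]


# Hypothetical registry contents, for illustration only.
images = [
    ("1.0.0", datetime(2024, 1, 10)),
    ("1.1.0", datetime(2024, 2, 10)),
    ("1.2.0", datetime(2024, 3, 10)),
    ("1.3.0", datetime(2024, 4, 10)),
]
print(tags_to_delete(images, keep_latest=2, in_use=["1.0.0"]))
```

Passing the currently deployed tags as `in_use` protects a long-running release from being cleaned up just because it is old.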
8. Kubernetes Cluster Upgrades
Generally, you can perform a cluster upgrade in two ways:
- Upgrade the existing cluster (in-place upgrade).
- Create a new cluster and migrate the apps to the new cluster.
You need a very good automated pipeline design to perform a cluster upgrade.
There could be Networking, DNS, and other component changes during an upgrade. It all depends on the design & organizational policies.
9. Kubernetes Cluster Capacity & Storage
Cluster capacity is a very important topic of discussion.
You need to decide on the number of clusters you need to run.
Some organizations prefer running multiple clusters to reduce the blast radius and make maintenance easier, while others prefer one big cluster with either a large number of worker nodes or fewer nodes with a huge instance capacity.
You can decide on the cluster capacity based on your needs and the size of the team to manage the clusters.
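A rough first estimate of node count can come from simple arithmetic on pod resource requests. The sketch below is a back-of-the-envelope model only: the function name, the 20% per-node reservation for system components, and the 25% headroom are assumptions, and it ignores per-node pod limits, daemonsets, and bin-packing inefficiency.

```python
import math


def nodes_needed(pods, cpu_per_pod, mem_per_pod_gib, node_cpu, node_mem_gib,
                 reserved_fraction=0.2, headroom=0.25):
    """Estimate worker nodes from total pod requests and node size."""
    usable_cpu = node_cpu * (1 - reserved_fraction)
    usable_mem = node_mem_gib * (1 - reserved_fraction)
    by_cpu = pods * cpu_per_pod / usable_cpu
    by_mem = pods * mem_per_pod_gib / usable_mem
    # The scarcer resource decides; headroom covers spikes and node failures.
    return math.ceil(max(by_cpu, by_mem) * (1 + headroom))


# Hypothetical workload: 200 pods requesting 0.5 CPU and 1 GiB each.
print("8 CPU / 32 GiB nodes:", nodes_needed(200, 0.5, 1, 8, 32))
print("64 CPU / 256 GiB nodes:", nodes_needed(200, 0.5, 1, 64, 256))
```

Running the same workload through two node sizes makes the trade-off between many small nodes and a few large ones easier to discuss with the team that will operate the clusters.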
Next comes the storage part.
Plan how you want to attach volumes to containers. Follow all the standard storage security practices on Kubernetes.
When it comes to the cloud, there is out-of-the-box support for provisioning storage.
If you are planning to run StatefulSets, it is very important to design the storage for high throughput and maximum availability.
Backup and restore for StatefulSets is equally important.
10. Kubernetes Logging & Monitoring
Most organizations have a centralized logging and monitoring system, and they prefer to integrate Kubernetes with these systems.
Here are some of the logging and monitoring best practices.
- Estimate how much log data will be generated.
- Design mechanisms to ingest Kubernetes logs into the logging systems, taking the large data volume into account.
- Plan how to scale the logging and monitoring components deployed in the cluster.
- Define data retention as per the organization’s policy.
- Define and document the KPIs for monitoring.
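The log volume estimate and the retention policy from the list above can be combined into a quick sizing calculation. This is a sketch under stated assumptions: the function names, the per-pod log rate, the average line size, and the replication factor are placeholders that you should replace with measurements from a representative workload.

```python
SECONDS_PER_DAY = 86_400


def daily_log_volume_gib(pods, lines_per_sec_per_pod, avg_line_bytes):
    """Raw log volume generated per day, in GiB."""
    bytes_per_day = (pods * lines_per_sec_per_pod
                     * avg_line_bytes * SECONDS_PER_DAY)
    return bytes_per_day / 1024 ** 3


def retained_volume_gib(daily_gib, retention_days, replication=1):
    """Storage needed to hold the retention window, before compression."""
    return daily_gib * retention_days * replication


# Hypothetical cluster: 300 pods, 20 lines/s each, 200 bytes per line.
daily = daily_log_volume_gib(300, 20, 200)
print(f"per day: {daily:.1f} GiB")
print(f"30-day retention, 2 copies: {retained_volume_gib(daily, 30, 2):.0f} GiB")
```

Even a coarse number like this helps decide whether the existing centralized logging system can absorb the cluster or needs to be scaled first.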
These are some of the Kubernetes design best practices that often get missed while setting up a Kubernetes cluster.
Missing these aspects while implementing Kubernetes could lead to issues in the overall cluster and might force compromises on the business.
It is not just about creating a Kubernetes cluster using automation. You also need to take Kubernetes cluster lifecycle management into consideration and plan automation efforts accordingly.
Ideally, the solution or technical architect should keep all the items mentioned here (there could be many more worth considering) as a checklist while designing the cluster architecture, to make sure they are implemented during the IaC development.