Kubernetes has become the gold standard for running applications today.
Yet in recent years, several companies have chosen to move away from it because of the complexity and operational challenges it brings.
In this blog, I want to share three such stories and the lessons behind them. Learning from others' experiences is a great way to avoid expensive mistakes in infrastructure management and to make better design choices when working with Kubernetes.
1. Gitpod Story
Gitpod is a cloud-based development environment that provides pre-configured, ready-to-code workspaces for developers. It allows users to spin up a development environment in seconds.
Gitpod started with Kubernetes as the backbone for their cloud-based development environments.
Like many others, they believed Kubernetes’ scalability, automation, and orchestration would be perfect for handling thousands of development environments daily.
But as they grew, they ran into unexpected challenges. They struggled with the unique needs of development environments, which are highly stateful and interactive.
Here’s where things started to break:
- CPU Bursts – Developers need instant processing power, but Kubernetes scheduling wasn’t fast enough, causing frustrating delays.
- Memory Management – RAM is expensive, so they tried overbooking it, which Kubernetes allows by setting memory requests lower than limits (see the sketch after this list). But without proper swap space, the OOM (Out of Memory) killer would simply terminate processes when a node ran out of memory.
- Storage Performance – Fast SSDs improved performance, but they were tied to specific nodes. Persistent Volume Claims (PVCs) should have helped, but they proved slow and unreliable.
- Security & Isolation – Developers needed root-like access to install packages and configure their environments, but that clashed with Kubernetes' strict security model.
- Networking Complexity – Each environment had to be isolated for security, but also needed flexible access for developers.
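To make the overbooking trade-off concrete, here is a minimal sketch, using the official Kubernetes Python client, of a pod whose memory request sits well below its limit. This is not Gitpod's actual configuration; the pod name, image, and sizes are hypothetical.

```python
# Minimal sketch: overcommitting memory by setting requests below limits,
# using the official Kubernetes Python client (pip install kubernetes).
# Pod name, image, and sizes are hypothetical placeholders.
from kubernetes import client, config

config.load_kube_config()  # reads your local kubeconfig

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="workspace-demo"),
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="dev-env",
                image="ubuntu:22.04",
                command=["sleep", "infinity"],
                resources=client.V1ResourceRequirements(
                    # The scheduler reserves only the request...
                    requests={"cpu": "1", "memory": "2Gi"},
                    # ...while the container may burst up to the limit.
                    # If several pods burst at once and the node runs out
                    # of RAM, the kernel OOM killer terminates processes,
                    # the exact failure mode described above.
                    limits={"cpu": "4", "memory": "8Gi"},
                ),
            )
        ],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

Setting requests below limits lets the scheduler pack more pods per node, but it also means physical memory can be exhausted when several pods burst simultaneously.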
So they built Gitpod Flex, a Kubernetes-inspired but simplified control plane.
It addresses most of these issues by:
- Removing Kubernetes overhead while keeping declarative APIs.
- Offering better security, performance, and ease of deployment.
- Supporting self-hosting in under 3 minutes with better compliance.
Gitpod learned the limits of Kubernetes for this workload the hard way and built a leaner, more efficient alternative.
Source: Gitpod Blog
2. Juspay Story
Juspay has a payment processing backend called Hyperswitch.
For Hyperswitch, Kafka plays an important role in event streaming, ensuring smooth data flow between application servers and storage.
Initially, Kubernetes was the go-to choice for container orchestration, providing a managed environment for scaling Kafka nodes.
However, as the workload grew, several unexpected challenges came up, making Kafka on Kubernetes inefficient and costly.
Here are the three main pain points:
- Resource Allocation Inefficiencies: Kubernetes resource management made it hard to right-size Kafka nodes, leading to wasted CPU and memory. At scale, this resulted in much higher costs than expected.
- Auto-Scaling Struggles: Kafka is stateful, but Kubernetes auto-scaling is designed for stateless applications (a minimal HPA sketch follows this list). This led to 15-second message-processing delays and increased latency during scale-ups.
- Operational Complexity with Strimzi: Managing Kafka clusters with Strimzi (an operator for running Apache Kafka on Kubernetes) became a manual, error-prone process. Newly added nodes often failed to integrate into the cluster, requiring frequent manual intervention.
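To illustrate the mismatch, here is a minimal sketch of a CPU-based HorizontalPodAutoscaler pointed at a Kafka StatefulSet, written with the Kubernetes Python client. This is not Juspay's actual configuration; the names and thresholds are hypothetical.

```python
# Sketch: a CPU-based HPA, the stateless-oriented mechanism described
# above, targeting a Kafka StatefulSet. Names/thresholds are hypothetical.
from kubernetes import client, config

config.load_kube_config()

hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="kafka-hpa"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="StatefulSet", name="kafka"
        ),
        min_replicas=3,
        max_replicas=6,
        metrics=[
            client.V2MetricSpec(
                type="Resource",
                resource=client.V2ResourceMetricSource(
                    name="cpu",
                    target=client.V2MetricTarget(
                        type="Utilization", average_utilization=70
                    ),
                ),
            )
        ],
    ),
)
client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```

The HPA will happily add broker pods, but Kafka does not assign partitions to new brokers on its own, so the extra capacity sits idle until someone rebalances. That gap is a big part of why autoscaling stateful systems this way falls short.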
To cut costs and improve Kafka performance, Hyperswitch migrated from Kubernetes to EC2.
Here are the results.
- 28% cost reduction, from $180/month per instance on Kubernetes to $130/month on EC2.
- Easy vertical scaling, upgrading from T-class to C-class instances without disruption (a short boto3 sketch follows this list).
- More control over performance, ensuring stable operations under peak loads.
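For context on what that vertical scaling looks like in practice, resizing an EC2 instance is a short scripted operation. Below is a rough boto3 sketch; the region, instance ID, and target type are placeholders, and for Kafka you would roll through brokers one at a time so the cluster stays available.

```python
# Sketch: resizing an EC2 instance with boto3 (pip install boto3).
# The region, instance ID, and target type are hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
instance_id = "i-0123456789abcdef0"

# An instance must be stopped before its type can be changed.
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

# Switch from a burstable T-class to a compute-optimized C-class type.
ec2.modify_instance_attribute(
    InstanceId=instance_id,
    InstanceType={"Value": "c5.2xlarge"},
)

ec2.start_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```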
While Kubernetes excels at stateless, highly dynamic applications, stateful systems like Kafka often require more control over resources and scaling.
Source: Juspay Blog
3. Threekit Story
Threekit is an enterprise-grade 3D visualization and augmented reality (AR) platform that enables businesses to create interactive experiences for e-commerce, retail, etc.
In 2018, the Threekit team went looking for a fully managed compute solution.
Their platform required batch processing to efficiently handle large-scale rendering, data transformation, and content generation.
Kubernetes had just emerged as the industry standard, making it the clear choice at the time.
Kubernetes soon revealed the following problems:
- High Costs: Running a cluster required redundant management nodes and over-provisioning due to slow autoscaling, leading to wasted resources.
- Scaling Issues: Managing high job volumes was challenging, and solutions like Argo introduced additional complexity.
- Operational Overhead: Even simple tasks required deep Kubernetes expertise, adding a DevOps burden. Maintaining the cluster demanded dedicated Kubernetes engineers.
- Lock-In Trap: Kubernetes clusters created dependencies that made it difficult to integrate external resources or migrate to a different setup.
While Kubernetes solved hardware management problems, it made infrastructure more complex and expensive to maintain.
To reduce complexity, they adopted Google Cloud Run.
- Cloud Run scales to zero, meaning costs are based only on actual usage, unlike Kubernetes, which required paying for idle resources.
- While Kubernetes scaling took minutes, Cloud Run scales up in seconds, ensuring seamless handling of traffic spikes.
- Cloud Run, built on Google’s Borg, eliminates the need for Kubernetes cluster maintenance, simplifying deployments.
- Cloud Run jobs allowed up to 10,000 tasks per batch with built-in retries, removing the need for custom job-scheduling infrastructure (a minimal example follows this list).
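As a rough illustration of how little scheduling infrastructure this requires, here is a minimal sketch that triggers a Cloud Run job from Python using the google-cloud-run client. The project, region, and job names are hypothetical.

```python
# Sketch: triggering a Cloud Run job execution from Python
# (pip install google-cloud-run). Resource names are hypothetical.
from google.cloud import run_v2

jobs = run_v2.JobsClient()
operation = jobs.run_job(
    request=run_v2.RunJobRequest(
        name="projects/my-project/locations/us-central1/jobs/render-batch",
    )
)
execution = operation.result()  # blocks until the execution completes
print(f"Finished execution: {execution.name}")
```

Task counts and retries are configured on the job itself, so there is no separate scheduler to build or maintain.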
For companies focused on simpler, more cost-efficient, and scalable solutions, services like Cloud Run offer a compelling alternative, eliminating infrastructure overhead without compromising performance.
Source: Threekit Blog
Conclusion
For those who transitioned from legacy systems to VMs and then to Kubernetes, it is clear that Kubernetes isn’t always the best fit for every workload.
In my personal experience, we tried to host all stateful apps (databases, messaging systems, etc.) outside of Kubernetes. (Although many companies successfully run those in Kubernetes.)
Usually, problems start when you begin operating at scale: maintenance overhead, cost, and so on.
Does This Mean Kubernetes Isn’t the Ideal Platform for Apps?
Not at all! Kubernetes is more popular than ever, with rapid adoption across industries, including AI/ML workloads, cloud-native applications, and large-scale microservices architectures.
However, it’s not a one-size-fits-all solution. While Kubernetes excels in many areas, there are certain use cases where it may not be the best choice.
If you have any doubts about this blog, drop them in a comment!
Want to Stay Ahead in DevOps & Cloud? Join the Free Newsletter Below.