The Kubernetes community has been adding built-in features to deploy, manage, and scale AI/ML applications efficiently.
In this blog, I will keep track of the native AI/ML features offered by Kubernetes (Alpha, Beta, and GA).
Gateway API Inference Extension
The Kubernetes Gateway API Inference Extension is an official Kubernetes project that extends the Gateway API to support serving ML models.
It addresses the traffic-routing challenges for modern GenAI and LLM inference workloads.
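At the heart of the extension are new CRDs such as InferencePool, which groups a set of model-serving Pods behind a Gateway and points the Gateway at an endpoint-picker extension for model-aware load balancing. Below is a rough sketch of an InferencePool; the API version, pool name, selector labels, and extensionRef are taken from the project's early alpha CRDs and should be treated as illustrative assumptions that may differ in the version you install.

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha2   # alpha API group; subject to change
kind: InferencePool
metadata:
  name: llama-pool                  # hypothetical pool name
spec:
  selector:
    app: llama-server               # must match the labels on your model-serving Pods
  targetPortNumber: 8000            # port the serving containers listen on
  extensionRef:
    name: llama-endpoint-picker     # endpoint-picker extension that performs model-aware routing
```

An HTTPRoute can then reference this InferencePool as a backend instead of a plain Service, so the Gateway routes inference traffic to the pool.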
Mounting Container Images as Volumes (Beta)
Kubernetes 1.31 introduced image volumes as an alpha feature (since promoted to beta), which lets you mount OCI images directly as volumes in Pods.

OCI images are images that follow the Open Container Initiative specifications. With this feature, you can store binary artifacts in images and mount them into Pods.
This is particularly useful for ML projects dealing with LLMs. Large Language Model deployment often involves pulling models from various sources like cloud object storage or other URIs.
OCI images containing model data make it much easier to manage and switch between different models. One project already experimenting with a similar feature is KServe, which has a feature called Modelcars.
Modelcars uses OCI images that contain model data; native OCI volume support in Kubernetes simplifies some of the challenges that approach currently faces.
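Here is a minimal sketch of a Pod that mounts a model image this way, assuming the ImageVolume feature gate is enabled on the cluster; the image names are hypothetical placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: model-server
spec:
  containers:
  - name: server
    image: registry.example.com/inference-server:latest   # hypothetical serving image
    volumeMounts:
    - name: model-weights
      mountPath: /models            # model files from the OCI image appear here (read-only)
  volumes:
  - name: model-weights
    image:                          # OCI image volume source (behind the ImageVolume feature gate)
      reference: registry.example.com/models/llama-8b:latest   # hypothetical image containing only model data
      pullPolicy: IfNotPresent
```

Switching models then becomes a matter of changing the image reference rather than rebuilding the serving image or wiring up object-storage downloads.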
Kubernetes Device Plugins (Stable)
GPUs are one of the key requirements for AI and ML applications.
To support this need, Kubernetes offers a feature called device plugins.
These plugins let nodes advertise specialized hardware to the kubelet, giving containers access to devices such as NVIDIA and AMD GPUs. Device plugins typically run as DaemonSets.
This setup enables Kubernetes to efficiently manage and allocate the necessary hardware for running AI and ML workloads.
For example, you can run GPU nodes on Amazon EKS by deploying the NVIDIA device plugin.
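Once the device plugin DaemonSet is running on the GPU nodes, a Pod requests GPUs through the extended resource the plugin advertises. A minimal sketch, using a placeholder training image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training
spec:
  containers:
  - name: trainer
    image: registry.example.com/training-job:latest   # hypothetical training image
    resources:
      limits:
        nvidia.com/gpu: 1   # extended resource advertised by the NVIDIA device plugin
```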
Official Kubernetes Community Projects
The following are other key official Kubernetes community projects to watch.
- JobSet: For distributed training orchestration
- Kueue: For intelligent job queueing with topology awareness
- LeaderWorkerSet: An API for deploying groups of Pods as a single unit, designed for multi-host inference workloads (see the sketch below).
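As an example of the last item, LeaderWorkerSet scales a leader Pod together with a fixed-size group of worker Pods as one replica, which maps naturally onto multi-host inference. A rough sketch, assuming the LeaderWorkerSet CRDs are installed; the image name is a placeholder and the field layout follows the project's v1 API, so verify it against the version you deploy.

```yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: llm-inference
spec:
  replicas: 2                  # two independent leader+worker groups
  leaderWorkerTemplate:
    size: 4                    # Pods per group: 1 leader + 3 workers
    workerTemplate:
      spec:
        containers:
        - name: worker
          image: registry.example.com/vllm:latest   # hypothetical inference image
          resources:
            limits:
              nvidia.com/gpu: 1   # each worker gets one GPU via the device plugin
```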