How to Set Up Cloud Controller Manager in AWS with Kubeadm

Cloud Controller Manager on AWS Kubeadm Cluster

In this guide, I have added detailed steps to set up the Kubernetes cloud controller manager on an AWS kubeadm cluster.

The idea of this setup is to understand the AWS configuration involved in running the cloud controller manager on a self-hosted Kubernetes cluster.

If you want to understand how the cloud controller manager works, please go through the detailed Kubernetes architecture guide.

Prerequisites

This setup is performed on Ubuntu 22.04 instances of type t2.medium.

You need a minimum of two nodes for this setup. (One controller and one worker node).

Also, ensure you have opened up all the required ports for Kubernetes on both nodes.

Let’s get started with the setup.

Step 1: Attach IAM Roles

Both controller and worker nodes need IAM roles with required permissions for the cloud controller manager to interact with the AWS APIs.

IAM Policy for the Controller Node

Create an IAM role with the following permissions and attach it to the controller node.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:DescribeTags",
                "ec2:DescribeInstances",
                "ec2:DescribeRegions",
                "ec2:DescribeRouteTables",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVolumes",
                "ec2:DescribeAvailabilityZones",
                "ec2:CreateSecurityGroup",
                "ec2:CreateTags",
                "ec2:CreateVolume",
                "ec2:ModifyInstanceAttribute",
                "ec2:ModifyVolume",
                "ec2:AttachVolume",
                "ec2:AuthorizeSecurityGroupIngress",
                "ec2:CreateRoute",
                "ec2:DeleteRoute",
                "ec2:DeleteSecurityGroup",
                "ec2:DeleteVolume",
                "ec2:DetachVolume",
                "ec2:RevokeSecurityGroupIngress",
                "ec2:DescribeVpcs",
                "elasticloadbalancing:AddTags",
                "elasticloadbalancing:AttachLoadBalancerToSubnets",
                "elasticloadbalancing:ApplySecurityGroupsToLoadBalancer",
                "elasticloadbalancing:CreateLoadBalancer",
                "elasticloadbalancing:CreateLoadBalancerPolicy",
                "elasticloadbalancing:CreateLoadBalancerListeners",
                "elasticloadbalancing:ConfigureHealthCheck",
                "elasticloadbalancing:DeleteLoadBalancer",
                "elasticloadbalancing:DeleteLoadBalancerListeners",
                "elasticloadbalancing:DescribeLoadBalancers",
                "elasticloadbalancing:DescribeLoadBalancerAttributes",
                "elasticloadbalancing:DetachLoadBalancerFromSubnets",
                "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
                "elasticloadbalancing:ModifyLoadBalancerAttributes",
                "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
                "elasticloadbalancing:SetLoadBalancerPoliciesForBackendServer",
                "elasticloadbalancing:AddTags",
                "elasticloadbalancing:CreateListener",
                "elasticloadbalancing:CreateTargetGroup",
                "elasticloadbalancing:DeleteListener",
                "elasticloadbalancing:DeleteTargetGroup",
                "elasticloadbalancing:DescribeListeners",
                "elasticloadbalancing:DescribeLoadBalancerPolicies",
                "elasticloadbalancing:DescribeTargetGroups",
                "elasticloadbalancing:DescribeTargetHealth",
                "elasticloadbalancing:ModifyListener",
                "elasticloadbalancing:ModifyTargetGroup",
                "elasticloadbalancing:RegisterTargets",
                "elasticloadbalancing:DeregisterTargets",
                "elasticloadbalancing:SetLoadBalancerPoliciesOfListener",
                "iam:CreateServiceLinkedRole",
                "kms:DescribeKey"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
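
If you prefer the AWS CLI over the console, the role, inline policy, and instance profile can be created and attached roughly as shown below. This is a minimal sketch: the role, profile, and policy file names and the instance ID are placeholders, and controller-policy.json is assumed to contain the JSON above saved to a file.

# Trust policy so EC2 instances can assume the role
cat > ec2-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Principal": { "Service": "ec2.amazonaws.com" }, "Action": "sts:AssumeRole" }
  ]
}
EOF

aws iam create-role --role-name k8s-controller-role \
    --assume-role-policy-document file://ec2-trust.json
aws iam put-role-policy --role-name k8s-controller-role \
    --policy-name k8s-controller-policy --policy-document file://controller-policy.json

# An instance profile wraps the role so it can be attached to an EC2 instance
aws iam create-instance-profile --instance-profile-name k8s-controller-profile
aws iam add-role-to-instance-profile --instance-profile-name k8s-controller-profile \
    --role-name k8s-controller-role
aws ec2 associate-iam-instance-profile --instance-id <controller-instance-id> \
    --iam-instance-profile Name=k8s-controller-profile

The same steps apply to the worker role, using the worker policy from the next section.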

IAM Policy for the Worker Nodes

Create an IAM role with the following permissions and attach it to the worker nodes.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:DescribeRegions",
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:GetRepositoryPolicy",
                "ecr:DescribeRepositories",
                "ecr:ListImages",
                "ecr:BatchGetImage"
            ],
            "Resource": "*"
        }
    ]
}

Step 2: Set Hostname (On All Nodes)

Ensure each node's hostname is set to its private DNS name. If not, change the hostname using the following command. This is a requirement, because the cloud controller manager looks up EC2 instances by the node name.

Run the following command on each node (controller and worker).

sudo hostnamectl set-hostname $(curl -s http://169.254.169.254/latest/meta-data/local-hostname)

Verify the hostname by executing the hostname command.

ubuntu@node01:~$ hostname
ip-172-31-16-213.us-west-2.compute.internal

Step 3: Install Kubeadm System Utilities

The container runtime (CRI-O), kubelet, kubeadm, and kubectl are the core utilities that must be present on each node (controller and worker).

    Install the utilities using the following shell script, utilities.sh. It installs kubeadm version 1.30.

    #!/bin/bash
    #
    # Common setup for all servers (Control Plane and Nodes)
    
    set -euxo pipefail
    
    # Kubernetes Variable Declaration
    KUBERNETES_VERSION="v1.30"
    CRIO_VERSION="v1.30"
    KUBERNETES_INSTALL_VERSION="1.30.0-1.1"
    
    # Disable swap
    sudo swapoff -a
    
    # Keeps the swap off during reboot
    (crontab -l 2>/dev/null; echo "@reboot /sbin/swapoff -a") | crontab - || true
    sudo apt-get update -y
    
    # Create the .conf file to load the modules at bootup
    cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
    overlay
    br_netfilter
    EOF
    
    sudo modprobe overlay
    sudo modprobe br_netfilter
    
    # Sysctl params required by setup, params persist across reboots
    cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
    net.bridge.bridge-nf-call-iptables  = 1
    net.bridge.bridge-nf-call-ip6tables = 1
    net.ipv4.ip_forward                 = 1
    EOF
    
    # Apply sysctl params without reboot
    sudo sysctl --system
    
    sudo apt-get update -y
    sudo apt-get install -y apt-transport-https ca-certificates curl gpg
    
    # Install CRI-O Runtime
    sudo apt-get update -y
    sudo apt-get install -y software-properties-common curl apt-transport-https ca-certificates
    
    # Ensure the apt keyrings directory exists (it may not by default on Ubuntu 22.04)
    sudo mkdir -p /etc/apt/keyrings
    
    curl -fsSL https://pkgs.k8s.io/addons:/cri-o:/stable:/$CRIO_VERSION/deb/Release.key |
        sudo gpg --dearmor -o /etc/apt/keyrings/cri-o-apt-keyring.gpg
    
    echo "deb [signed-by=/etc/apt/keyrings/cri-o-apt-keyring.gpg] https://pkgs.k8s.io/addons:/cri-o:/stable:/$CRIO_VERSION/deb/ /" |
        sudo tee /etc/apt/sources.list.d/cri-o.list
    
    sudo apt-get update -y
    sudo apt-get install -y cri-o
    
    sudo systemctl daemon-reload
    sudo systemctl enable crio --now
    sudo systemctl start crio.service
    
    echo "CRI runtime installed successfully"
    
    # Install kubelet, kubectl, and kubeadm
    curl -fsSL https://pkgs.k8s.io/core:/stable:/$KUBERNETES_VERSION/deb/Release.key |
        sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
    
    echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/$KUBERNETES_VERSION/deb/ /" |
        sudo tee /etc/apt/sources.list.d/kubernetes.list
    
    sudo apt-get update -y
    sudo apt-get install -y kubelet="$KUBERNETES_INSTALL_VERSION" kubectl="$KUBERNETES_INSTALL_VERSION" kubeadm="$KUBERNETES_INSTALL_VERSION"
    
    # Prevent automatic updates for kubelet, kubeadm, and kubectl
    sudo apt-mark hold kubelet kubeadm kubectl
    
    sudo apt-get update -y
    
    # Install jq, a command-line JSON processor
    sudo apt-get install -y jq
    
    # Retrieve the local IP address of the primary interface and set it for kubelet
    # (eth0 on t2 instances; adjust the interface name, e.g. ens5, if yours differs)
    local_ip="$(ip --json addr show eth0 | jq -r '.[0].addr_info[] | select(.family == "inet") | .local')"
    
    # Write the local IP address to the kubelet default configuration file
    cat > /etc/default/kubelet << EOF
    KUBELET_EXTRA_ARGS=--node-ip=$local_ip
    EOF

    To run the script, provide execute permission.

    chmod +x utilities.sh

    Log in as root and execute the script.

    ./utilities.sh
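
    After the script finishes, a quick sanity check confirms the components are installed and CRI-O is running (the exact patch versions may differ):

    kubeadm version -o short
    kubelet --version
    crio --version
    sudo systemctl status crio --no-pager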

    Step 4: Initialize Kubeadm Configuration (Only in Controller)

    In this step, we will initialize the control plane with configurations required for the cloud controller manager.

    Create a configuration file named kubeadm.config.

    apiVersion: kubeadm.k8s.io/v1beta3
    kind: ClusterConfiguration
    apiServer:
      certSANs:
        - 127.0.0.1
        - 52.38.15.235
      extraArgs:
        bind-address: "0.0.0.0"
        cloud-provider: external
    clusterName: kubernetes
    scheduler:
      extraArgs:
        bind-address: "0.0.0.0"
    controllerManager:
      extraArgs:
        bind-address: "0.0.0.0"
        cloud-provider: external
    networking:
      podSubnet: "10.244.0.0/16"
      serviceSubnet: "10.96.0.0/12"
    ---
    apiVersion: kubeadm.k8s.io/v1beta3
    kind: InitConfiguration
    nodeRegistration:
      name: ip-172-31-21-29.us-west-2.compute.internal
      kubeletExtraArgs:
        cloud-provider: external
    

    In certSANs, replace 52.38.15.235 with your controller's public IP, and change the nodeRegistration name to your controller's private DNS name (the hostname we set in Step 2). Modify the pod and service subnets if required.
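
    If you need to look up these values, the instance metadata service provides both (assuming IMDSv1 is reachable, as in Step 2):

    # Controller public IP for certSANs
    curl -s http://169.254.169.254/latest/meta-data/public-ipv4

    # Private DNS name for the nodeRegistration name
    curl -s http://169.254.169.254/latest/meta-data/local-hostname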

    Initialize the cluster using the configuration file.

    kubeadm init --config=kubeadm.config

    At the end of the initialization, the controller generates a token for joining the workers. This is required to create the worker join configuration file in the next step.

    The output would look like the following. Note down the token and the cert hash.

    kubeadm join 172.31.21.29:6443 --token wthapw.xgfjvonbiidea4nr --discovery-token-ca-cert-hash sha256:8b2127f960d88432fb35fd7488a501ad189e1a4ab319158d0e01a1db7fec96d7
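
    Before running kubectl commands on the controller, copy the admin kubeconfig to your home directory (this is also printed at the end of the kubeadm init output). If you lose the join command, a new token and hash can be printed at any time. A minimal sketch:

    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config

    # Regenerate the full join command (token and cert hash) if needed
    kubeadm token create --print-join-command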
    

    Step 5: Join the Worker Nodes (Only in Worker Nodes)

    Create a configuration file named kubeadm-join-config.yaml on each worker node.

    ---
    apiVersion: kubeadm.k8s.io/v1beta3
    kind: JoinConfiguration
    discovery:
      bootstrapToken:
        token: 30am0s.hleb5xs1dyz4ridc   
        apiServerEndpoint: "172.31.21.29:6443"
        caCertHashes:
          - "sha256:8b2127f960d88432fb35fd7488a501ad189e1a4ab319158d0e01a1db7fec96d7" 
    nodeRegistration:
      name: ip-172-31-18-193.us-west-2.compute.internal
      kubeletExtraArgs:
        cloud-provider: external
    

    Change the token, apiServerEndpoint, and caCertHashes values to the values your control plane generated when it initialized, and set the nodeRegistration name to the worker node's private DNS name (its full hostname).

    To join the workers to the controller, use the following command.

    kubeadm join --config kubeadm-join-config.yaml

    Modify the values in the configuration file and run the kubeadm join command on each worker node to join it to the controller node.
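
    To verify the join, list the nodes from the controller. Because the kubelets were started with cloud-provider: external, each node carries the node.cloudprovider.kubernetes.io/uninitialized taint until the cloud controller manager (deployed in Step 7) initializes it. Nodes will also stay NotReady until a pod network add-on (for example Calico or Flannel) is installed, which this guide assumes you handle as part of your usual kubeadm setup.

    kubectl get nodes -o wide

    # Check for the uninitialized taint (replace <node-name> with one of your nodes)
    kubectl describe node <node-name> | grep -i taint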

    Step 6: Tag AWS Resources

    Tagging is essential for the cloud controller manager because it identifies which AWS resources belong to which cluster.

    For example, if a cluster uses an AWS Network Load Balancer and the cluster is destroyed, only the NLB belonging to that cluster is deleted, without affecting other resources. To tag the resources, the cluster ID is required.

    To find the Cluster ID, use the following command.

    kubectl config view

    The output would be like this.

    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: DATA+OMITTED
        server: https://172.31.21.29:6443
      name: kubernetes
    contexts:
    - context:
        cluster: kubernetes
        user: kubernetes-admin
      name: kubernetes-admin@kubernetes
    current-context: kubernetes-admin@kubernetes
    kind: Config
    preferences: {}
    users:
    - name: kubernetes-admin
      user:
        client-certificate-data: DATA+OMITTED
        client-key-data: DATA+OMITTED
    

    In the clusters section, the value of name is taken as the Cluster ID.

    If an AWS resource is managed by a single cluster, use the tag key kubernetes.io/cluster/kubernetes with the value owned. If the resource is shared by multiple clusters, keep the same tag key kubernetes.io/cluster/kubernetes but set the value to shared.

    Tags should be added to the resources the controller and worker nodes consume, such as the VPC, subnets, EC2 instances, security groups, etc.
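
    Tags can be added from the console or with the AWS CLI. A minimal sketch, assuming hypothetical resource IDs and the cluster ID kubernetes from the kubeconfig shown below:

    # Tag the instances, subnets, security groups, and VPC used by the cluster
    aws ec2 create-tags \
        --resources i-0abc1234 i-0def5678 subnet-0abc1234 sg-0abc1234 vpc-0abc1234 \
        --tags Key=kubernetes.io/cluster/kubernetes,Value=owned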


    Step 7: Configure the Cloud Controller Manager

    Clone the AWS cloud provider repository to the control plane node where you have kubectl access.

    git clone https://github.com/kubernetes/cloud-provider-aws.git

    Navigate to the base directory. It has all the Kubernetes manifests for the cloud controller manager and the kustomization file.

    cd cloud-provider-aws/examples/existing-cluster/base

    Create the daemonset using the following command. -k is for Kustomize.

    kubectl create -k .

    To verify the daemonset is running properly, use the following command.

    kubectl get daemonset -n kube-system

    To ensure the CCM pod is running, use the following command.

    kubectl get pods -n kube-system
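
    If you want to filter directly, you can use the daemonset name and labels; the base manifests in the repository name the daemonset aws-cloud-controller-manager (check the manifests if yours differ):

    kubectl get daemonset aws-cloud-controller-manager -n kube-system
    kubectl get pods -n kube-system -l k8s-app=aws-cloud-controller-manager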

    Step 8: Provision Network Load Balancer

    To test whether the cloud controller manager is working, we will deploy a sample Nginx deployment and expose it with a Service of type LoadBalancer. The cloud controller manager should create a Network Load Balancer that routes traffic to the Nginx pods.

    Create a deployment file for the Nginx web server using the following YAML. Execute it directly.

    cat <<EOF | kubectl apply -f -
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment
    spec:
      replicas: 3  
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx-container
            image: nginx:latest
            ports:
            - containerPort: 80
    EOF

    Create a Service to provision the load balancer (nginx-service-nlb.yaml). Here the annotation service.beta.kubernetes.io/aws-load-balancer-type: nlb is very important: it tells the cloud controller manager to create a Network Load Balancer. Without it, a Classic Load Balancer is deployed by default.

    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: Service
    metadata:
      name: nginx-service
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-type: nlb
    spec:
      selector:
        app: nginx
      ports:
        - protocol: TCP
          port: 80
          targetPort: 80
      type: LoadBalancer
    EOF

    To ensure the deployment is running properly, use the following command.

    kubectl get pods

    To ensure the service is running properly, use the following command.

    kubectl get svc

    You should see the load balancer endpoint in the EXTERNAL-IP column, as shown below.

    $ kubectl get svc nginx-service
    NAME            TYPE           CLUSTER-IP      EXTERNAL-IP                                                                     PORT(S)        AGE
    nginx-service   LoadBalancer   10.101.171.34   aa64c7e28c3384b5598493b6fbb04d4c-f53de39b06106733.elb.us-west-2.amazonaws.com   80:30249/TCP   39s

    The registration process takes a few minutes. If you check the load balancer after a while, you should see the worker nodes registered as healthy targets. The service's NodePort is used as the health check port.
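
    You can also confirm the load balancer and target registration from the AWS side. A rough sketch using the AWS CLI; the target group ARN is a placeholder you look up first:

    # List the load balancers in the region
    aws elbv2 describe-load-balancers \
        --query 'LoadBalancers[].{Name:LoadBalancerName,DNS:DNSName,Type:Type}'

    # Find the target group ARN, then check target health
    aws elbv2 describe-target-groups --query 'TargetGroups[].TargetGroupArn'
    aws elbv2 describe-target-health --target-group-arn <target-group-arn>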

    AWS Network Load Balancer for the Kubernetes service, with worker nodes as registered targets.

    Once the nodes are registered to the NLB, you should see the Nginx homepage if you visit the Load balancer URL as shown below.

    Accessing Nginx deployment over NLB URL

    The following image shows the CCM load balancer traffic workflow to pods.

    Possible Errors & Troubleshooting

    To find the logs of the CCM pod, first find the pod name, then use the following command.

    kubectl logs -n kube-system aws-cloud-controller-manager-zwctc
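
    The pod name can be found by listing the kube-system pods, or the logs can be streamed via the label selector (assuming the default labels from the repository manifests):

    kubectl get pods -n kube-system | grep aws-cloud-controller-manager
    kubectl logs -n kube-system -l k8s-app=aws-cloud-controller-manager --tail=100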

    The following are the errors I faced during this setup.

    Error 1: Cloud provider could not be initialized

    E1028 08:39:59.394822       1 tags.go:95] Tag "KubernetesCluster" nor "kubernetes.io/cluster/..." not found; Kubernetes may behave unexpectedly.
    F1028 08:39:59.394858       1 main.go:106] Cloud provider could not be initialized: could not init cloud provider "aws": AWS cloud failed to find ClusterID
    

    Solution:

    Ensure all resources used by the CCM (the controller and worker nodes and the resources they consume) are tagged with the correct cluster ID, as described in Step 6.

    Error 2: node has no providerID

    E1028 13:25:07.544115       1 node_controller.go:277] Error getting instance metadata for node addresses: error fetching node by provider ID: Invalid format for AWS instance (), and error by node name: could not look up instance ID for node "ip-172-31-30-122": node has no providerID
    

    Solution:

    Before starting the cluster configuration, ensure every server's hostname is its private DNS name, and keep in mind that the kubeadm configuration files must use the private DNS names as well.

    Example:

    root@ip-172-31-21-29:~# hostname
    ip-172-31-21-29.us-west-2.compute.internal
    

    Error 3: couldn’t get current server API group list

    E1030 06:04:54.829375    9140 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
    The connection to the server localhost:8080 was refused - did you specify the right host or port?
    

    Solution:

    Ensure you have a copy of the admin.conf file in your home directory; if not, make a copy using the cp command.

    mkdir -p ~/.kube
    cp /etc/kubernetes/admin.conf ~/.kube/config

    Example:

    root@ip-172-31-21-29:~# ls -l /etc/kubernetes/ ~/.kube/
    /etc/kubernetes/:
    total 36
    -rw------- 1 root root 5648 Oct 30 05:17 admin.conf
    -rw------- 1 root root 5676 Oct 30 05:48 controller-manager.conf
    -rw------- 1 root root 2112 Oct 30 05:49 kubelet.conf
    drwxr-xr-x 3 root root 4096 Oct 30 06:17 manifests
    drwxr-xr-x 3 root root 4096 Oct 30 05:17 pki
    -rw------- 1 root root 5624 Oct 30 05:17 scheduler.conf
    
    /root/.kube/:
    total 12
    drwxr-x--- 4 root root 4096 Oct 30 06:07 cache
    -rw------- 1 root root 5648 Oct 30 06:07 config
    

    Error 4: error execution phase preflight

    [preflight] Running pre-flight checks
    error execution phase preflight: [preflight] Some fatal errors occurred:
            [ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
            [ERROR Port-10250]: Port 10250 is in use
            [ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
    [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
    To see the stack trace of this error execute with --v=5 or higher
    

    Solution:

    Usually, this error happens when you remove a worker node and try to join it again. To resolve the issue, remove the existing kubelet.conf and ca.crt files.

    sudo rm -f /etc/kubernetes/kubelet.conf
    sudo rm -f /etc/kubernetes/pki/ca.crt
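
    Alternatively, kubeadm reset cleans up the previous join attempt on the worker (note that it does not revert iptables rules or CNI configuration by itself):

    sudo kubeadm reset -f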

    Then join the worker node again using the following command.

    kubeadm join --config kubeadm-join-config.yaml --ignore-preflight-errors=Port-10250

    Conclusion

    In this guide, I have done a high-level setup and tested provisioning a Network Load Balancer.

    Next, I will be testing storage provisioning using the cloud controller manager.

    I have added these steps after several rounds of trial and error due to the limited information in the documentation. If you face any error during the setup, do drop a comment and I will look into it. To explore more, you can refer to the official documentation.

    Also, if you are getting started with the Kubeadm cluster for CKA certification, take a look at the detailed kubeadm setup guide.

    Comments
    1. In case Step 2 doesn't work as expected, try this instead (it uses the IMDSv2 token-based metadata endpoint):

      # Step 1: Obtain a token
      TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

      # Step 2: Use the token to retrieve the local hostname
      LOCAL_HOSTNAME=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" -s http://169.254.169.254/latest/meta-data/local-hostname)

      # Step 3: Set the hostname
      sudo hostnamectl set-hostname "$LOCAL_HOSTNAME"

    2. I have followed this doc and it was very helpful, but when I install the nginx ingress I am facing an issue with the load balancer: the NLB gets created, but when I check the health of the instances in the target group, one node shows as healthy and the other as unhealthy. Can you please help with this?

      1. Hi Subhash,

        This guide is focused on understanding how the cloud controller manager works with a kubeadm cluster. You can check with the kubeadm community on this issue.

    3. You deployed the nginx application and also the service as a load balancer, but I need to deploy the nginx-ingress controller to route traffic between different microservices. I followed your doc but I am not able to deploy the nginx-ingress controller. Please help with how to deploy the nginx ingress controller on a kubeadm cluster.
