3x Reduction in GitHub Actions Runner Costs with AWS EC2

Much of your project’s commercial success hangs on its infrastructure.

Infrastructure can speed up or stall delivery and time-to-market, and it can significantly increase or decrease the time and effort spent on updates and on addressing customer feedback.

Robust DevOps and TestOps play an extremely important role in how well you manage your quality assurance. Consequently, they greatly affect the quality you offer to your customers.

In the world of software development, TestOps, a blend of testing and operations, is the part of DevOps that emphasizes integrating testing throughout the stages of the development lifecycle.

It introduces multiple useful practices. Among them, the use of Continuous Integration (CI) pipelines plays a pivotal role, especially if automated testing is heavily used in the DevOps workflows of a project.

These pipelines are usually managed by CI/CD tools like Jenkins, GitHub Actions, and TeamCity.

They also often rely on cloud providers, with AWS being a popular choice because it offers great scalability and is considered very fail-safe.

While running CI pipelines is an excellent solution for test automation, the use of AWS tends to be not exactly budget-friendly.

In this article, we will look into a workflow that can significantly reduce AWS costs when running CI pipelines on self-hosted GitHub Actions runners backed by AWS EC2.

Choosing the Right CI Approach with GitHub Actions

GitHub Actions offers two main approaches for running CI workflows:

  1. GitHub-Hosted Runners: These are provided directly by GitHub and offer a convenient, though sometimes costly, solution for businesses that are not keen on managing infrastructure. The costs rise quickly as your project scales.
  2. Self-Hosted Runners: These are free and customizable, although you pay for the machines they run on. They are a great solution for companies willing to maintain their own infrastructure. However, they require a dedicated engineer or even a team, depending on your project scale, for administration and maintenance.

The conventional way of running self-hosted CI pipelines on AWS is to keep EC2 instances running continuously, whether or not a workflow is using them.

However, because the load is not distributed evenly and there’s plenty of idle time for which you still pay, it is not the most cost-efficient method.

An innovative solution to this is dynamically creating and deleting EC2 instances per workflow to cut idle time. At first glance, this may seem like a complicated task that requires a lot of effort.

Fortunately, this method is streamlined by Philips Labs’ Terraform AWS GitHub Runner, which is what we will use in our scheme.

Philips Labs Terraform AWS GitHub Runner

First, let’s take a closer look into how Philips Labs Terraform AWS GitHub Runner plays into our workflow.

This open-source project uses Terraform to automate and scale the provisioning of AWS resources. All the Terraform modules and Lambda functions used for this workflow are part of this repository.

https://github.com/philips-labs/terraform-aws-github-runner/

In particular, it automatically deploys and manages GitHub Actions self-hosted runners on AWS.

Besides cutting idle time, it spares you the manual handling of GitHub Actions self-hosted runners, which frees up a lot of your time and effort.

Pricing Comparison for Different CI Methods

Now that we have outlined the possible ways of running CI workflows, let’s calculate their cost using an example project and compare them.

Let’s consider a scenario where each member of a team of 10 runs 20 minutes of PR checks per working day, over 20 working days (one calendar month):

1. GitHub-Hosted Runners: At $0.008 per minute, the cost is $32/month

    We use the setup offered by the official documentation: a Linux machine with 2 cores and 16 GB of RAM.

    We use the following calculation formula:

    0.008 * 20 * 10 * 20 = 32 USD per month

2. 24/7 EC2 Instances: Two t4g.xlarge instances running round the clock cost approximately $193.54/month

    Again, we use a configuration described in the official documentation: a t4g.xlarge Linux machine with 4 vCPUs and 16 GB of RAM. It costs $0.1344 per hour. To allow 2 workflows to run in parallel, we have to use 2 machines.

    We use a formula for 24/7 operation:

    0.1344 * 2 * 24 * 30 = 193.536 USD per month

3. Auto-scaling EC2 Instances: This method costs just $8.96/month

    We use the same machine type as in the case with 24/7 EC2 instances: t4g.xlarge Linux machines with 4 vCPUs and 16 GB of RAM. However, since we run our EC2 instances on demand only, we use the same time frame as in the first variant.

    Each of the 10 team members, again, runs 20 minutes of PR checks per day, for 20 working days. We need additional AWS services to implement this setup: Lambda, SQS, and S3 buckets.

    Cost-wise, their contribution is negligible: less than $1 per month. However, you will need to configure them separately, which we cover below.

    The per-minute rate is the hourly rate divided by 60: 0.1344 / 60 = 0.00224 USD. Now, we use the following calculation formula:

    0.00224 * 20 * 10 * 20 = 8.96 USD per month

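As you can see, the third setup offers a significant cost reduction compared to the other two: it is roughly 3.6 times cheaper than GitHub-hosted runners and roughly 21.6 times cheaper than the 24/7 EC2 instances.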
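The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the three formulas: the rates and workload figures come from the scenario in this article, while the function and constant names are our own and purely illustrative.

```python
# Rates and workload taken from the scenario described in this article.
GITHUB_RATE_PER_MINUTE = 0.008  # USD per minute, GitHub-hosted Linux runner
EC2_RATE_PER_HOUR = 0.1344      # USD per hour, t4g.xlarge

TEAM_SIZE = 10
MINUTES_PER_PERSON_PER_DAY = 20
WORKING_DAYS = 20


def monthly_ci_minutes() -> int:
    """Total CI minutes the team consumes in one month."""
    return TEAM_SIZE * MINUTES_PER_PERSON_PER_DAY * WORKING_DAYS


def github_hosted_cost() -> float:
    """Option 1: pay GitHub per minute of CI time."""
    return GITHUB_RATE_PER_MINUTE * monthly_ci_minutes()


def always_on_ec2_cost(instances: int = 2, days: int = 30) -> float:
    """Option 2: keep the instances running 24/7 for the whole month."""
    return EC2_RATE_PER_HOUR * instances * 24 * days


def on_demand_ec2_cost() -> float:
    """Option 3: pay the EC2 rate only for the minutes actually used."""
    return EC2_RATE_PER_HOUR / 60 * monthly_ci_minutes()


print(f"GitHub-hosted: {github_hosted_cost():.2f} USD")
print(f"24/7 EC2:      {always_on_ec2_cost():.2f} USD")
print(f"On-demand EC2: {on_demand_ec2_cost():.2f} USD")
```

Changing `TEAM_SIZE` or `MINUTES_PER_PERSON_PER_DAY` shows how the gap between the options moves with your own workload. Note that the 24/7 figure does not depend on the workload at all, which is exactly why idle time is so expensive.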

Automating Dynamic EC2 GitHub Actions Runners: A Detailed Workflow

Now that we have established that running EC2 instances on demand and creating one instance per workflow is much cheaper than the other options, let’s look into how it’s done.

Here is the high-level workflow of the setup.

Figure: Automating EC2 instance creation for CI workflows using GitHub Actions

Note: All the code required for the entire setup is part of the Philips Labs GitHub repository.

Step 1: Infrastructure Setup with Terraform

First, we need to ensure idempotency and replicability in the provisioning of AWS resources. For this, we use Terraform, an infrastructure-as-code (IaC) tool.

With its help, we define our infrastructure once and set it up for consistent use, regardless of the number of instances we need to spawn. At this stage, we define the required EC2 instances, networking configurations, and associated security settings.

Step 2: Event-Driven Triggers with GitHub and AWS

• GitHub Webhooks:

We start by setting up a webhook in our GitHub repository. This webhook fires an HTTP request when specific GitHub events occur, such as a new pull request or a push to a branch.

• AWS API Gateway:

We then configure an API Gateway to receive webhook events. This acts as the entry point for the webhook data and helps us secure and manage incoming traffic before it triggers downstream processes.

• Lambda for Event Verification:

We implement an AWS Lambda function that is triggered by the API Gateway. This function verifies the signature of the incoming event to make sure it is a legitimate request from GitHub.
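To illustrate the verification step, here is a minimal sketch of how a webhook signature can be checked. GitHub signs each delivery with HMAC-SHA256 using the webhook secret and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The function name is our own, and this is not the code from the Philips Labs repository.

```python
import hashlib
import hmac
from typing import Optional


def is_valid_github_signature(secret: str, payload: bytes,
                              signature_header: Optional[str]) -> bool:
    """Check the X-Hub-Signature-256 header against the raw request body."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    digest = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    expected = ("sha256=" + digest).encode("utf-8")
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature were correct.
    return hmac.compare_digest(expected, signature_header.encode("utf-8"))
```

The signature must be computed over the raw request body, byte for byte. Parsing the JSON and serializing it again before hashing would usually change the bytes and make valid requests fail.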

Step 3: Queue Management with AWS SQS

Once the event is verified, the Lambda function sends a message to an Amazon Simple Queue Service (SQS) queue.

This message describes which instance the CI workload requires. SQS acts as a buffer, so requests are not lost during bursts and are processed in an orderly way.
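As a rough sketch of this step, the function below builds a request message and hands it to SQS. The message fields (`repository`, `event`, `instance_type`) are hypothetical and chosen only for illustration; the real message format is defined in the Philips Labs repository. The client is passed in as a parameter, and in a Lambda function it would typically be `boto3.client("sqs")`.

```python
import json
from typing import Any, Dict


def build_runner_request(repository: str, event_name: str,
                         instance_type: str = "t4g.xlarge") -> str:
    """Serialize a runner request. The field names are illustrative only."""
    return json.dumps(
        {
            "repository": repository,
            "event": event_name,
            "instance_type": instance_type,
        },
        sort_keys=True,
    )


def enqueue_runner_request(sqs_client: Any, queue_url: str,
                           message_body: str) -> Dict[str, Any]:
    """Send the request to the queue and return the SQS response.

    sqs_client is expected to behave like boto3.client("sqs").
    """
    return sqs_client.send_message(QueueUrl=queue_url, MessageBody=message_body)
```

Passing the client in, rather than creating it inside the function, keeps the function easy to test with a stand-in object.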

Step 4: Dynamic Resource Provisioning

Another Lambda function, triggered by messages in the SQS queue, creates a new EC2 instance from the launch template that Terraform provisioned in Step 1. To expedite the setup, the GitHub Actions runner binary is pre-stored in an Amazon S3 bucket.

This Lambda function configures the new EC2 instance to pull the binary directly from S3, which avoids redundant downloads and speeds up the runner setup.
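The sketch below shows what such a handler could look like in simplified form. It assumes that each SQS record body is a JSON object with an optional `instance_type` field (a hypothetical format) and that an EC2 launch template already exists whose user data downloads the runner binary from S3. The handler and template names are our own, and the EC2 client would typically be `boto3.client("ec2")`.

```python
import json
from typing import Any, Dict, List


def handle_scale_up(event: Dict[str, Any], ec2_client: Any,
                    launch_template_name: str) -> List[str]:
    """Launch one EC2 instance per SQS record and return the instance IDs."""
    instance_ids: List[str] = []
    # Lambda delivers SQS messages as event["Records"], each with a "body".
    for record in event.get("Records", []):
        request = json.loads(record["body"])
        response = ec2_client.run_instances(
            # Assumption: the launch template's user data fetches the
            # runner binary from S3 and registers the runner on boot.
            LaunchTemplate={"LaunchTemplateName": launch_template_name},
            InstanceType=request.get("instance_type", "t4g.xlarge"),
            MinCount=1,
            MaxCount=1,
        )
        instance_ids.extend(i["InstanceId"] for i in response["Instances"])
    return instance_ids
```

Keeping the instance setup in the launch template means the handler only decides when to launch and which size to use, while the machine image, networking, and security settings stay defined in one place.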

Step 5: Cleanup and Scale Down

After the CI workflow is completed, you need to scale down and terminate resources so that you don’t pay for them longer than necessary.

A final Lambda function is responsible for monitoring when the EC2 instances become idle. Once it detects an idle instance, it terminates it.
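The idle check can be sketched as follows. We assume the function already knows when each instance last ran a job, however that information is collected, and that an instance counts as idle after a configurable grace period. The five-minute default and all names here are our own assumptions, not values from the Philips Labs module.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def find_idle_instances(last_busy: Dict[str, datetime], now: datetime,
                        idle_after: timedelta) -> List[str]:
    """Return IDs of instances that have not run a job for idle_after."""
    return sorted(
        instance_id
        for instance_id, last_seen in last_busy.items()
        if now - last_seen >= idle_after
    )


def scale_down(ec2_client: Any, last_busy: Dict[str, datetime], now: datetime,
               idle_after: timedelta = timedelta(minutes=5)) -> List[str]:
    """Terminate idle instances and return the IDs that were terminated.

    ec2_client is expected to behave like boto3.client("ec2").
    """
    idle = find_idle_instances(last_busy, now, idle_after)
    if idle:  # terminate_instances must not be called with an empty list
        ec2_client.terminate_instances(InstanceIds=idle)
    return idle
```

A short grace period lets a runner pick up the next queued job without a fresh boot, while a longer one increases the idle time you pay for, so the value is worth tuning to your workload.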

Insights We Have Gained

Cost Efficiency

This is just one example of a dynamic CI strategy, but it’s clear that such strategies can drastically reduce our infrastructure maintenance costs, so we plan to look into and implement more of them.

For us, this strategy translates into increased budget flexibility: quite simply, we would rather spend this money elsewhere.

Agility at Scale

We also realized that efficient CI cycles can shorten delivery and the time it takes to incorporate feedback. This can be our competitive advantage, as we are now able to offer our customers more agility and faster development.

Motivation to Innovate More

It may seem like a small thing to optimize a single process, but we view it as part of a bigger effort to encourage innovation within our company.

This success motivates our QA engineers and developers to seek out inventive solutions that can help us deliver higher-quality products.

Conclusion

AWS is one of the most widely used and lauded cloud providers. However, it comes at a significant cost. While many companies are quite ready to pay for it, there are ways to optimize it, and in this article we have described one of them.

This approach combines Terraform, Lambda, SQS, and S3 to let you use the full potential of AWS tools and create a more dynamic and cost-effective CI environment.
