In this comprehensive guide, I explain what Docker is, its evolution, the core Linux concepts underlying it, and how it works.
Docker has become the de facto standard for container-based implementations. It is the base for container-based orchestration, from small-scale deployments to large-scale enterprise applications.
Docker quickly gained popularity and adoption in the DevOps community because it was built for portability and designed for modern microservice architectures.
In this blog, you will learn:
- What is Docker?
- Why Docker is beneficial and how it differs from other container technologies
- Docker core architecture and its key components
- Container evolution and the underlying concept of Linux Containers
- What is a container, and what Linux features make it work?
- The difference between a process, container, and a VM
Here, the idea is to get your basics right to understand what Docker really is and how it works.
What is Docker?
Docker is a popular open-source project written in Go and originally developed by dotCloud (a PaaS company).
It is a container engine that uses Linux kernel features like namespaces and control groups to create containers on top of an operating system, so you can call it OS-level virtualization.
Docker was initially built on top of Linux Containers (LXC). Later, Docker replaced LXC with its own container runtime, libcontainer (now part of runc). I have explained the core LXC and container concepts towards the end of the article.
You might ask how Docker differs from Linux Containers (LXC), since the concepts and implementation look similar.
Besides being a container technology, Docker has well-defined wrapper components that make packaging applications easy. Before Docker, running containers was not easy. Docker does all the work of decoupling your application from the infrastructure by packaging all of the application's system requirements into a container.
For example, if you have a Java JAR file, you can run it on any server that has Java installed. In the same way, once you package an application into a container image with Docker, you can run it on any host that has Docker installed.
You can have containers up and running by executing a few Docker commands and parameters.
Difference Between Docker & Container
Docker is a technology or a tool developed to manage containers efficiently.
So, can I run a container without Docker?
Yes, of course. You can use LXC to run containers on Linux servers. In addition, newer tools like Podman offer workflows similar to Docker's.
Things you should know about Docker:
- Docker is not LXC
- Docker is not a Virtual Machine Solution.
- Docker is not a configuration management system and is not a replacement for Chef, Puppet, Ansible, etc.
- Docker is not a platform as a service technology.
- Docker is not a container.
What Makes Docker So Great?
Docker has an efficient workflow for moving the application from the developer’s laptop to the test environment to production. You will understand more about it when you look at a practical example of packaging an application into a Docker image.
Do you know that starting a Docker container takes less than a second?
It is incredibly fast, and it can run on any host with a compatible Linux kernel. (Windows is supported as well.)
Note: You cannot run a Windows container on a Linux host because there is no Linux kernel support for Windows binaries.
Docker uses a copy-on-write union file system for its image storage. When changes are made to a container, only those changes are written to disk, using the copy-on-write model.
With copy-on-write, all your containers share optimized storage layers.
Docker Adoption Statistics
Here is the Google Trends data on Docker. You can see that it has been an exploding topic for the last five years.
Here is a survey result from Datadog, which shows the rise in Docker adoption.
Docker Core Architecture
The following sections look at the Docker architecture and its associated components, and at how those components work together to make Docker work.
Docker's architecture has changed a few times since its inception. When I published the first version of this article, Docker was built on top of LXC.
Here are some notable architectural changes in Docker:
- Docker moved from LXC to libcontainer in 2014
- runc – a CLI for spinning up containers that follows the OCI specification
- containerd – Docker separated its container management component into containerd in 2016
OCI: Open Container Initiative is an open industry standard for container runtime and specifications.
When Docker was initially launched, it had a monolithic architecture. Now it is separated into the following three different components.
- Docker Engine (dockerd)
- docker-containerd (containerd)
- docker-runc (runc)
Docker and other big organizations contributed to a standard container runtime and management layer. Hence, containerd and runc are now part of the Cloud Native Computing Foundation (CNCF), with contributors from many organizations.
Note: When you install Docker, all these components get installed. You don't have to install them separately. For explanation purposes, we describe them as separate components.
Now let's have a look at each Docker component.
The Docker engine comprises the Docker daemon, an API interface, and the Docker CLI. The Docker daemon (dockerd) runs continuously as the dockerd systemd service. It is responsible for building Docker images.
To manage images and run containers, dockerd calls containerd.
containerd is another system daemon service that is responsible for downloading Docker images and running them as containers. It exposes its API to receive instructions from the dockerd service.
runc is the container runtime responsible for creating the namespaces and cgroups required for a container. It then runs the container commands inside those namespaces. The runc runtime is implemented as per the OCI specification.
Read this excellent 3 part blog post series to understand more about container runtimes.
How Does Docker Work?
We have seen the core building blocks of Docker.
Now let’s understand the Docker workflow using the Docker components.
The following official high-level Docker architecture diagram shows the common Docker workflow.
The Docker ecosystem is composed of the following components:
- Docker Daemon (dockerd)
- Docker Client
- Docker Images
- Docker Registries
- Docker Containers
What is a Docker Daemon?
Docker has a client-server architecture. The Docker daemon (dockerd), the server, is responsible for all actions related to containers.
The daemon receives commands from the Docker client through the CLI or REST API. The Docker client can be on the same host as the daemon or on any other host.
By default, the Docker daemon listens on the docker.sock UNIX socket. If you have a use case for accessing the Docker API remotely, you need to expose it over a host port. One such use case is running Docker containers as Jenkins agents.
If you want to run Docker inside Docker, you can mount docker.sock from the host machine.
What is a Docker Image?
Images are the basic building blocks of Docker. An image contains the OS libraries, dependencies, and tools needed to run an application.
Images can be prebuilt with application dependencies for creating containers. For example, if you want to run an Nginx web server as an Ubuntu container, you need to create a Docker image with the Nginx binary and all the OS libraries required to run Nginx.
What is a Dockerfile?
Docker has a concept of a Dockerfile that is used for building images. A Dockerfile is a text file that contains one instruction (command) per line.
Here is an example of a Dockerfile.
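A minimal sketch for the Nginx-on-Ubuntu image described above might look like this (the base image tag and the package name are illustrative):

```dockerfile
# Start from an official Ubuntu base (parent) image
FROM ubuntu:22.04

# Install the Nginx binary and its OS library dependencies;
# each instruction adds a read-only layer to the image
RUN apt-get update && \
    apt-get install -y nginx && \
    rm -rf /var/lib/apt/lists/*

# Document the port the container listens on
EXPOSE 80

# Run Nginx in the foreground so the container keeps running
CMD ["nginx", "-g", "daemon off;"]
```

You would then build and run it with commands like `docker build -t my-nginx .` and `docker run -d -p 8080:80 my-nginx`.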
A Docker image is organized in a layered fashion. Every instruction in a Dockerfile adds a layer to the image. The topmost writable layer of the image is the container.
Every image is created from a base image.
For example, you can take an Ubuntu base image and create another image on top of it with the Nginx application. A base image can be a parent image or an image built from a parent image.
You might ask where this base (parent) image comes from. There are Docker utilities to create the initial parent base image: they take the required OS libraries and bake them into a base image. You usually don't have to do this yourself, because official base images for Linux distros are readily available.
The top layer of an image is writable and used by the running container. Other layers in the image are read-only.
What is a Docker Registry?
It is a repository (storage) for Docker images.
A registry can be public or private. For example, Docker Inc provides a hosted registry service called Docker Hub. It allows you to upload and download images from a central location.
Note: By default, when you install Docker, it looks for images in the public Docker Hub unless you specify a custom registry in the Docker settings.
If your repository is public, other Docker Hub users can access all your images. You can also create private repositories in Docker Hub.
Docker Hub works somewhat like Git: you can build images locally on your laptop, commit them, and then push them to Docker Hub.
Tip: When using Docker in enterprise networks/projects, set up your own Docker registry instead of using the public Docker Hub. All cloud providers offer their own container registry services.
What is a Docker Container?
Docker containers are created from existing images. A container is the writable layer of the image.
If you try to relate image layers and a container, here is how it looks for an Ubuntu-based image.
You can package your applications in a container, commit it, and make it a golden image to build more containers from it.
Containers can be started, stopped, committed, and terminated. If you terminate a container without committing it, all the changes made in the container will be lost.
Ideally, containers are treated as immutable objects, and it is not recommended to make changes to a running container; do so only for testing purposes.
Two or more containers can be linked together to form a tiered application architecture. However, hosting highly scalable applications with Docker has been made easy by container orchestration tools like Kubernetes.
Evolution of Containers
If you think containerization is a new technology, it is not. Google has been using its container technology in its infrastructure for years.
The concept of containers started way back in the 2000s. In fact, the roots go back to 1979 when we had chroot, a concept of changing the root directory of a process.
Here is a timeline of container-based projects, starting in 2000.
| Year | Milestone |
|------|-----------|
| 2000 | FreeBSD Jails introduced the container concept |
| 2003 | The Linux-VServer project was released with the concept of OS-level virtualization |
| 2005 | Solaris Zones, an OS-level virtualization project, was introduced |
| 2007 | Google released a paper on Generic Process Containers |
| 2008 | Initial release of LXC containers |
| 2011 | Cloud Foundry announced Warden |
| 2013 | lmctfy was open-sourced by Google |
| 2013 | The Docker project was announced by dotCloud |
| 2014 | Rocket (rkt) was announced by CoreOS |
| 2016 | Windows container preview released as part of Windows Server 2016 |
What is a Linux container (LXC)?
Now let’s understand what a Linux Container is.
In a typical virtualized environment, one or more virtual machines run on top of a physical server using a hypervisor like Xen, Hyper-V, etc.
On the other hand, Containers run on top of the operating system’s kernel. You can call it OS-level virtualization. Before getting into the underlying container concepts, you need to understand two key Linux concepts.
- Userspace: All the code required to run user programs (applications, processes) is called userspace. When you initiate a program action, for example, creating a file, the process in userspace makes a system call to kernel space.
- Kernel space: This is the heart of the operating system, where the kernel code interacts with the system hardware, storage, etc.
A container is a Process.
You start a process when you start an application, for example, an Nginx web server. A process is a running instance of a program with limited isolation of its own.
What if we could isolate a process with only the files and configuration required to run and operate it? That is essentially what a container does.
A container is a process with enough isolation of userspace components to give a feeling of a separate operating system.
The parent container process may have child processes, so you can say a container is also a group of processes.
For example, when you start an Nginx service, it starts a parent Nginx process. The parent process spawns child processes like the cache manager, cache loader, and workers.
So when you start an Nginx container, you are starting a master Nginx process in its isolated environment.
I will show you this practically in the sections below.
Each container has its isolated userspace, and you can run multiple containers on a single host.
Does that mean a container has the whole OS?
No. Unlike a VM, which has its own kernel, a container contains only the files related to a specific distro and uses the shared host kernel.
More interestingly, you can run containers based on different Linux distros on a single host, all sharing the same kernel space.
For example, you can run RHEL-, CentOS-, and SUSE-based containers on an Ubuntu server. This is possible because only the userspace differs between Linux distros; the kernel space is the same.
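You can see this split on any Linux machine, even without Docker: the kernel version comes from the shared kernel, while the distro identity comes from userspace files that every image ships its own copy of.

```shell
# Kernel version: shared by the host and every container running on it
uname -r

# Distro identity: read from a userspace file (each container image
# carries its own /etc/os-release, so it can differ per container)
cat /etc/os-release
```

Inside an RHEL-based container on an Ubuntu host, `uname -r` reports the Ubuntu host's kernel, while `/etc/os-release` reports RHEL.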
Underlying Concept of Linux Containers
The following image gives you a visual representation of Linux containers.
Containers are isolated using the two Linux kernel features called namespaces and control groups.
A real-world analogy would be an Apartment building. Even though it’s a single big building, each condo/flat is isolated for individual households having their own identity with metered water, gas, and electricity. We use concrete, steel structures, and other construction materials to establish this isolation. You do not have visibility into other homes unless they allow you in.
Similarly, you can relate this to a single host running multiple containers. To isolate containers, each with its own CPU, memory, IP address, mount points, and processes, you need two Linux kernel features: namespaces and control groups.
A container is all about having a well-isolated environment to run a service (process). To achieve that level of isolation, a container should have its own file system, IP address, mount points, process IDs, etc. You can achieve this using Linux namespaces.
Namespaces are responsible for containers’ mount points, user, IP address, process management, etc. So essentially, it sets boundaries for the containers.
Following are the key namespaces in the Linux kernel:
- pid namespace: Responsible for isolating the process (PID: Process ID).
- net namespace: It manages network interfaces (NET: Networking).
- ipc namespace: It manages access to IPC resources (IPC: InterProcess Communication).
- mnt namespace: Responsible for managing the filesystem mount points (MNT: Mount).
- uts namespace: Isolates kernel and version identifiers. (UTS: Unix Timesharing System).
- usr namespace: Isolates user IDs. In simple words, it isolates the user ids between the host and container.
- cgroup namespace: It isolates the control group information from the container process.
Using the above namespaces, a container can have its own network interfaces, IP address, etc. Each container has its own namespace, and the processes running inside that namespace do not have any privileges outside it.
Interestingly, you can list the namespaces on a Linux machine using the lsns command.
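As a quick illustration that needs no Docker at all, every process on a Linux machine already belongs to one namespace of each type, visible under /proc:

```shell
# Each symlink under /proc/<pid>/ns names a namespace type and its inode ID.
# Processes that share a namespace (e.g. all processes inside one container)
# show the same inode ID for that entry.
ls -l /proc/self/ns
```

The `lsns` utility from util-linux summarizes the same information for all processes on the host.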
Linux Control groups
When starting a service, we usually don't specify any memory or CPU limits. Instead, we leave it to the kernel to prioritize and allocate resources for the services.
However, you can explicitly set CPU and memory limits for your services using a Linux kernel feature called cgroups (control groups). It is not a straightforward approach; you need to make some extra configurations and tweaks to make it work.
Since you can run multiple containers inside a host, there should be a mechanism to limit resource usage, device access, etc. Here is where control groups come into the picture.
Linux control groups manage the resources used by a container. You can restrict the CPU, memory, network, and IO resources of a container using a Linux control group.
So what happens if I don’t limit the CPU & Memory resource of a container?
A single container might use all the host's resources, causing other containers to crash because of resource unavailability.
Tools like Docker abstract away all the complex backend configurations and let you specify these resource limits with simple parameters.
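As a quick check (again, no Docker needed), you can inspect which control groups your own shell belongs to:

```shell
# Show the cgroup membership of the current process.
# On cgroup v1 this prints one line per controller (cpu, memory, ...);
# on cgroup v2 it prints a single unified line starting with "0::".
cat /proc/self/cgroup
```

With Docker, those same kernel knobs are exposed as simple flags, for example `docker run --memory=256m --cpus=0.5 nginx`, which Docker translates into cgroup limits for the container's processes.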
Why Are Containers Better Than VMs?
Containers have some key advantages over VMs. Let's take a look at them.
Resource Utilisation & Cost
- You can use VMs to run your applications independently, which means one service per VM. But it can still be underutilized. And resizing a VM is not an easy task for a production application.
- On the other hand, containers can run with very minimal CPU and memory requirements. Also, you can even run multiple containers inside a VM for application segregation. Plus, resizing a container takes seconds.
Provisioning & Deployment
- Provisioning a VM and deploying applications on it might take minutes to hours, depending on the workflow involved. Even rollback takes time.
- But you can deploy a container in seconds and roll it back in seconds as well.
- Drift management in VMs is not easy. You need full-fledged automation and processes to ensure all environments stay similar. Following immutable deployment models avoids drift in VM environments.
- With containers, once the image is baked, it will be the same in all environments. So for any change, you start by making the change in the dev environment and re-baking the container image.
What is the difference between containerd & runc?
containerd is responsible for managing containers, and runc is responsible for running them (creating namespaces and cgroups, and running commands inside the container) with input from containerd.
What is the difference between the Docker engine & the Docker daemon?
The Docker engine is composed of the Docker daemon, the REST interface, and the Docker CLI. The Docker daemon is the dockerd systemd service, responsible for building Docker images and sending Docker instructions to the containerd runtime.
By now, you should have a good understanding of what Docker is and how it works.
The best feature of Docker is collaboration. Docker images can be pushed to a repository and pulled down to any other host to run containers from that image.
Moreover, Docker Hub has thousands of images created by users, and you can pull those images to your hosts based on your application requirements. Docker images are also widely used with container orchestration tools like Kubernetes.
If you want to run Docker for production workloads, make sure you follow the recommended practices for Docker images.
You can read my article on how to reduce Docker image size, where I have listed all the standard approaches to optimizing Docker images.
Also, if you are trying to become a DevOps engineer, I highly recommend you get hands-on experience with Docker.