What is Docker? How Does it Work?
Docker has become the de facto standard when it comes to container-based implementations. From small-scale implementations to large-scale enterprise applications, Docker serves as the base for container-based orchestration.
Docker gained so much popularity and adoption in the DevOps community in a short time because of the way it’s developed for portability and designed for modern microservice architecture.
In this blog, you will learn:
- Container evolution and the underlying concept of Linux Containers
- What really is a container and what Linux features make it work.
- The difference between a process, container, and a VM
- What Docker is, why it is useful, and how it differs from other container technologies.
- Docker core architecture and its key components
The idea here is to get your basics right so that you understand what Docker really is and how it works.
Evolution of Containers
If you think containerization is a new technology, it is not. Google has been using its own container technology in its infrastructure for years.
The concept of containers started way back in the 2000s. In fact, the roots go back to 1979, when chroot introduced the concept of changing the root directory of a process.
Here is a list of container-based projects from 2000 onward.
|Year|Project|
|---|---|
|2000|FreeBSD jails introduced the container concept.|
|2003|Linux-VServer project released with the concept of OS-level virtualization.|
|2005|Solaris Zones, an OS-level virtualization project, introduced.|
|2007|Google released a paper on Generic Process Containers.|
|2008|The initial release of LXC containers.|
|2011|Cloud Foundry announced Warden.|
|2013|lmctfy open-sourced by Google.|
|2013|Docker project announced by dotCloud.|
|2014|Rocket (rkt) announced by CoreOS.|
|2016|Windows container preview released as part of Windows Server 2016.|
What is a Linux container (LXC)?
Before diving directly into Docker concepts, you first need to understand what a Linux container is.
In a typical virtualized environment, one or more virtual machines run on top of a physical server using a hypervisor like Xen, Hyper-V, etc.
Containers, on the other hand, run on top of the operating system’s kernel. You can call it OS-level virtualization. Before getting into the underlying container concepts, you need to understand two key Linux concepts.
- Userspace: All the code that is required to run user programs (applications, processes) is called userspace. When you initiate a program action, for example, to create a file, the process in the userspace makes a system call to kernel space.
- Kernel Space: This is the heart of the operating system, where you have the kernel code which interacts with the system hardware, storage, etc.
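The userspace-to-kernel hand-off described above can be sketched in Python. This is a minimal illustration, assuming a POSIX-like system; the helper name `create_file_with_syscalls` is made up for this example. The `os.open`, `os.write`, and `os.close` functions are thin wrappers around the corresponding system calls.

```python
import os
import tempfile


def create_file_with_syscalls(directory, name, data):
    """Create a file via thin wrappers around the open/write/close system calls.

    Hypothetical helper for illustration only.
    """
    path = os.path.join(directory, name)
    # os.open asks the kernel for a file descriptor (open system call).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # os.write hands the bytes to the kernel (write system call).
        written = os.write(fd, data)
    finally:
        # os.close releases the descriptor (close system call).
        os.close(fd)
    return path, written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path, written = create_file_with_syscalls(
            tmp, "hello.txt", b"hello from userspace\n"
        )
        print(f"wrote {written} bytes to {path}")
```

The program itself never touches the disk. It only asks the kernel to do so, which is the boundary that containers rely on.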
A container is a Process
When you start an application, for example, an Nginx web server, you are actually starting a process. A process is a running instance of a program with limited isolation from other processes on the host.
What if we could isolate the process with only the files and configuration required for it to run and operate? That is what a container does.
A container is basically a process with enough isolation of userspace components so that it gives a feeling of a separate operating system.
The parent container process may have a child process. So you can say, a container is also a group of processes.
For example, when you start an Nginx service, it starts a parent Nginx process. The parent process then spawns its child processes like cache manager, cache loader, and workers.
So when you start an Nginx container, you are starting a master Nginx process in its isolated environment.
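The parent and child relationship can be observed with a small Python sketch. It does not start Nginx. It only assumes a Python interpreter is available, and the function name is hypothetical. The parent starts one child process, and the child reports the PID of its parent.

```python
import os
import subprocess
import sys


def spawn_child_and_get_parent_pid():
    """Start a child Python process and return the parent PID the child reports."""
    child_code = "import os; print(os.getppid())"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    print("parent pid:", os.getpid())
    print("parent pid seen by child:", spawn_child_and_get_parent_pid())
```

Both printed PIDs should match, which is the same kind of process tree a master Nginx process forms with its workers.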
I will show you this practically in the sections below.
Each container has its isolated userspace, and you can run multiple containers on a single host.
Does that mean a container has the whole OS?
No. As opposed to a VM with its own kernel, a container just contains the required files related to a specific distro and uses the shared host kernel.
More interestingly, you can run containers based on different Linux distros on a single host that shares the same kernel space.
For example, you can run RHEL, CentOS, or SUSE based containers on an Ubuntu server. It is possible because, for all the Linux distros, only the userspace is different, and the kernel space is the same.
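One way to see the split between userspace and kernel is to compare the distro name, which comes from a userspace file, with the kernel release, which comes from the running kernel. The sketch below assumes the distro ships an `/etc/os-release` file. The function names are hypothetical.

```python
import os
import platform


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict (double quotes removed)."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def describe_environment(os_release_path="/etc/os-release"):
    """Return the userspace distro name and the kernel release of this environment."""
    distro = "unknown"
    if os.path.exists(os_release_path):
        with open(os_release_path, encoding="utf-8") as handle:
            distro = parse_os_release(handle.read()).get("PRETTY_NAME", "unknown")
    return {"userspace_distro": distro, "kernel_release": platform.release()}


if __name__ == "__main__":
    print(describe_environment())
```

Run inside containers built from different distros on the same host, the `userspace_distro` value would be expected to differ while `kernel_release` stays the same.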
Underlying Concept of Linux Containers
The following image gives you a visual representation of Linux containers.
Containers are isolated in a host using the two Linux kernel features called namespaces and control groups.
A real-world analogy would be an apartment building. Even though it’s a single big building, each condo/flat is isolated for individual households having their own identity with metered water, gas, and electricity. We use concrete, steel structures, and other construction materials to establish this isolation. You do not have visibility into other homes unless they allow you in.
Similarly, you can relate this to a single host containing multiple containers. To isolate containers with their own CPU, memory, IP address, mount points, processes, you need two Linux kernel features called namespaces and control groups.
A container is all about having a well-isolated environment to run a service (Process). To achieve that level of isolation, a container should have its own file system, IP address, mount points, process IDs, etc. You can achieve this using the Linux Namespaces.
Namespaces are responsible for containers to have their own mount points, user, IP address, process management, etc. Essentially, they set boundaries for the containers.
Following are the key namespaces in Linux
- pid namespace: Isolates process IDs (PID: Process ID).
- net namespace: Isolates network interfaces (NET: Networking).
- ipc namespace: Isolates access to IPC resources (IPC: InterProcess Communication).
- mnt namespace: Isolates filesystem mount points (MNT: Mount).
- uts namespace: Isolates the hostname and domain name (UTS: Unix Timesharing System).
- user namespace: Isolates user IDs. In simple words, it isolates the user IDs between the host and the container.
- cgroup namespace: Isolates the control group information from the container process.
Using these namespaces a container can have its own network interfaces, IP address, etc. Each container will have its own namespace and the processes running inside that namespace will not have any privileges outside its namespace.
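On Linux, the namespaces a process belongs to are visible as symbolic links under `/proc/<pid>/ns`. The sketch below reads them for the current process. It assumes a Linux host with `/proc` mounted and returns an empty result elsewhere. The function name is hypothetical.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_type: identifier} for a process, or {} if unavailable."""
    ns_dir = f"/proc/{pid}/ns"
    namespaces = {}
    try:
        entries = sorted(os.listdir(ns_dir))
    except OSError:
        # Not Linux, /proc is not mounted, or the process does not exist.
        return namespaces
    for entry in entries:
        try:
            # Each entry is a symlink whose target names the namespace instance.
            namespaces[entry] = os.readlink(os.path.join(ns_dir, entry))
        except OSError:
            continue
    return namespaces


if __name__ == "__main__":
    for ns_type, identifier in list_namespaces().items():
        print(f"{ns_type:20} {identifier}")
```

Two processes that show the same identifier for a namespace type share that namespace. A process inside a container would typically show different identifiers from a process on the host.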
Interestingly, you can list the namespaces on a Linux machine from the command line.
Linux Control groups
When we start a service, we don’t specify any memory or CPU limit. We leave it to the kernel to prioritize and allocate resources for the services.
However, you can explicitly set CPU and memory limits for your services using a Linux kernel feature called cgroups. It is not a straightforward approach; you need to make some extra configurations and tweaks to make it work.
Since you can run multiple containers inside a host, there should be a mechanism to limit resource usage, device access, etc. Here is where control groups come into the picture.
The resources used by a container are managed by Linux control groups. You can restrict the CPU, memory, network, and IO resources of a container using Linux control groups.
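The control groups a process belongs to are listed in `/proc/<pid>/cgroup`, one line per hierarchy in the form `hierarchy-ID:controller-list:path`. The sketch below parses that format. It assumes a Linux host, and the function names are hypothetical.

```python
def parse_cgroup_file(text):
    """Parse /proc/<pid>/cgroup content into a list of dicts."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split only on the first two colons; the path itself may contain colons.
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append(
            {
                "hierarchy_id": int(hierarchy_id),
                # The controller list is empty on a cgroup v2 (unified) hierarchy.
                "controllers": controllers.split(",") if controllers else [],
                "path": path,
            }
        )
    return entries


def read_own_cgroups(path="/proc/self/cgroup"):
    """Return the parsed cgroup membership of this process, or [] if unavailable."""
    try:
        with open(path, encoding="utf-8") as handle:
            return parse_cgroup_file(handle.read())
    except OSError:
        return []


if __name__ == "__main__":
    for entry in read_own_cgroups():
        print(entry)
```

Inside a container, the reported path usually differs from the one a host process sees, because the container runtime places the container in its own cgroup.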
So what happens if I don’t limit the CPU & Memory resource of a container?
Well, a single container might end up using all the host resources leaving other containers to crash because of resource unavailability.
Tools like Docker abstract away all the complex backend configurations and let you specify these resource limits with simple parameters.
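As a rough illustration of those simple parameters, the sketch below assembles a `docker run` command line with the `--memory` and `--cpus` flags. It only builds and prints the command and does not execute it. The helper name, the container name, and the limit values are assumptions chosen for the example.

```python
import shlex


def build_docker_run_command(image, memory=None, cpus=None, name=None):
    """Build a `docker run` argument list with optional resource limits.

    Hypothetical helper: it only assembles the arguments, it does not run Docker.
    """
    command = ["docker", "run", "--detach"]
    if name:
        command += ["--name", name]
    if memory:
        # Upper bound on memory, for example "256m".
        command += ["--memory", memory]
    if cpus is not None:
        # Number of CPUs the container may use, for example 0.5.
        command += ["--cpus", str(cpus)]
    command.append(image)
    return command


if __name__ == "__main__":
    command = build_docker_run_command("nginx", memory="256m", cpus=0.5, name="web")
    print(shlex.join(command))
```

Docker translates limits like these into cgroup settings for the container, so you do not have to configure the cgroup hierarchy by hand.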
What is Docker?
Docker is a popular open-source project written in Go and developed by dotCloud (a PaaS company).
It is basically a container engine that uses the Linux Kernel features like namespaces and control groups to create containers on top of an operating system.
Meaning, all the container concepts and functionalities we learned in the LXC section are made very simple by Docker. Just by executing a few Docker commands and parameters, we will have containers up and running.
You might ask how Docker is different from a Linux Container (LXC), as all the concepts and implementation look similar.
Docker was initially built on top of Linux containers (LXC). Later, Docker replaced LXC with its own container runtime, libcontainer (now part of runc).
Well, apart from just being a container technology, Docker has well-defined wrapper components that make packaging applications easy. Before Docker, it was not easy to run containers. Docker does all the work to decouple your application from the infrastructure by packing all application system requirements into a container.
For example, if you have a Java jar file, you can run it on any server that has Java installed. In the same way, once you package a container with the required applications using Docker, you can run it on any other host that has Docker installed.
Difference Between Docker & Container
Docker is a technology or a tool developed to manage container implementations efficiently.
So, can I run a container without Docker?
Yes, of course. You can use LXC technology to run containers on Linux servers.
Things you should know about Docker:
- Docker is not LXC
- Docker is not a Virtual Machine Solution.
- Docker is not a configuration management system and is not a replacement for Chef, Puppet, Ansible, etc.
- Docker is not a platform as a service technology.
What Makes Docker So Great?
Docker has an efficient workflow for moving the application from the developer’s laptop to the test environment to production. You will understand more about it when you look at a practical example of packaging an application into a Docker image.
Do you know that starting a Docker container can take less than a second?
It is incredibly fast, and it can run on any host with a compatible Linux kernel (Windows is supported as well).
Note: You cannot run a Windows container on a Linux host because Windows containers need a Windows kernel. You can read about Windows containers from here
Docker uses a copy-on-write union file system for its image storage. Whenever changes are made to a container, only the changes are written to disk using the copy-on-write model.
With copy-on-write, you will have optimized shared storage layers for all your containers.
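The copy-on-write idea can be modeled with Python’s `collections.ChainMap`, which reads through a stack of mappings but writes only to the first one. This is a toy model and not Docker’s storage driver: the layer contents and the `start_container` helper are made up for illustration.

```python
from collections import ChainMap

# Read-only image layers, shared by every container started from the image.
base_layer = {"/etc/os-release": "ubuntu", "/usr/sbin/nginx": "nginx binary"}
config_layer = {"/etc/nginx/nginx.conf": "worker_processes 1;"}


def start_container(*image_layers):
    """Return a file view with a fresh writable layer on top of the image layers.

    Layers are passed bottom to top. ChainMap searches mappings left to right,
    and writes only ever touch the first mapping (the writable layer).
    """
    writable_layer = {}
    return ChainMap(writable_layer, *reversed(image_layers))


if __name__ == "__main__":
    container_a = start_container(base_layer, config_layer)
    container_b = start_container(base_layer, config_layer)

    # Changing a file in container A stores the new copy in A's writable layer only.
    container_a["/etc/nginx/nginx.conf"] = "worker_processes 4;"

    print(container_a["/etc/nginx/nginx.conf"])  # value from A's writable layer
    print(container_b["/etc/nginx/nginx.conf"])  # value from the shared image layer
    print(container_a.maps[0])                   # only the changed file is stored
```

Both containers share the same two image layers in memory, and each one pays only for the files it changes.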
Docker Adoption Statistics
Here is the Google Trends data on Docker. You can see it has been an exploding topic for the last five years.
Here is a survey result from Datadog which shows the rise in Docker adoption.
Docker Core Architecture
In the following sections, we will look at the Docker architecture and its associated components. We will also look at how each component works together to make Docker work.
Docker architecture has changed a few times since its inception. When this article was first released, Docker was built on top of LXC.
Here are some notable architectural changes that happened to Docker:
- Docker moved from LXC to libcontainer in 2014
- runc – a CLI for spinning up containers that follow all OCI specifications.
- containerd – Docker separated its container management component to containerd in 2016
OCI: Open Container Initiative is an open industry standard for container runtime and specifications.
When Docker was initially launched, it had a monolithic architecture. Now it is separated into the following three different components.
- Docker Engine (dockerd)
- docker-containerd (containerd)
- docker-runc (runc)
Docker and other big organizations decided to contribute to common container runtime and management layers. Hence, containerd is now part of the Cloud Native Computing Foundation and runc is maintained under the Open Container Initiative, with contributors from many organizations.
Note: When installing Docker, all these components get installed. You don’t have to install them separately. For explanation, we are showing them as different components.
Now let’s have a look at each Docker component.
Docker engine is composed of the Docker daemon, an API interface, and the Docker CLI. The Docker daemon (dockerd) runs continuously as the dockerd systemd service. It is responsible for building Docker images.
To manage images and run containers, dockerd calls the containerd API.
containerd is another system daemon service that is responsible for downloading Docker images and running them as containers. It exposes its API to receive instructions from dockerd.
runc is the container runtime, which is responsible for creating the namespaces and cgroups required for a container. It then runs the container commands inside those namespaces. runc runtime is implemented as per the OCI specification.
To understand more about container runtimes read this excellent 3 part blog post series.
How Does Docker Work?
We have seen the core components for Docker. But to build, ship, share and run docker containers, there are other components involved.
Let’s look at the key Docker components in a Docker ecosystem.
Docker is composed of the following five components:
- Docker Daemon (dockerd)
- Docker Client
- Docker Images
- Docker Registries
- Docker Containers
Here is the official high-level docker architecture diagram that shows the common Docker workflow.
Docker has a client-server architecture. The Docker daemon (dockerd), or server, is responsible for all the actions that are related to containers.
The daemon receives the commands from the Docker client through CLI or REST API. Docker client can be on the same host as a daemon or it can be present on any other host.
By default, the Docker daemon listens on the docker.sock UNIX socket. If you have a use case that requires accessing the Docker API remotely, you need to expose it over a host port. One such use case is running Docker as Jenkins agents.
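To make the client and daemon relationship concrete, here is a minimal sketch that talks to the daemon’s REST API over the UNIX socket using only the Python standard library. It assumes the socket lives at `/var/run/docker.sock`, which is the usual default but may differ on your host, and it calls the Engine API’s `/_ping` endpoint. The class and function names are hypothetical.

```python
import http.client
import os
import socket


class UnixSocketHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks over a UNIX domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5):
        # The host name is only used for the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def ping_docker(socket_path="/var/run/docker.sock"):
    """Return the daemon's reply to GET /_ping, or None if it is unreachable."""
    if not os.path.exists(socket_path):
        return None
    connection = UnixSocketHTTPConnection(socket_path)
    try:
        connection.request("GET", "/_ping")
        return connection.getresponse().read().decode("utf-8")
    except (OSError, http.client.HTTPException):
        # Covers a stopped daemon, missing permissions on the socket, and timeouts.
        return None
    finally:
        connection.close()


if __name__ == "__main__":
    print("Docker daemon reply:", ping_docker())
```

The Docker CLI works along the same lines: it is one client of this API, which is why any other program with access to the socket can drive the daemon as well.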
If you want to run Docker inside Docker, you can use the docker.sock from the host machine.
Images are the basic building blocks of Docker. You need an image to run a Docker container. Images contain the OS libraries, dependencies, and tools to run an application.
Images can be prebuilt with application dependencies for creating containers. For example, if you want to run an Nginx web server as a Ubuntu container, you need to create a Docker image with the Nginx binary and all the OS libraries required to run Nginx.
Docker has a concept of a Dockerfile that is used for building the image. A Dockerfile is basically a text file that contains one instruction per line.
Here is an example of a Dockerfile.
A Docker image is organized in a layered fashion. Every instruction in a Dockerfile adds a layer to the image. A running container adds a writable layer on top of the image layers.
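The relationship between Dockerfile instructions and build steps can be sketched with a small parser. The Dockerfile text below is a hypothetical example written for this sketch, not the one from the original article, and the parser handles only blank lines, comments, and backslash line continuations. In practice, instructions such as RUN, COPY, and ADD add filesystem content, while others like CMD only record metadata.

```python
# A hypothetical Dockerfile used only for this illustration.
# The raw string keeps the trailing backslash that continues the RUN line.
EXAMPLE_DOCKERFILE = r"""
# Start from an official base image
FROM ubuntu:22.04
RUN apt-get update && \
    apt-get install -y nginx
COPY index.html /var/www/html/
CMD ["nginx", "-g", "daemon off;"]
"""


def parse_instructions(text):
    """Return a list of (INSTRUCTION, arguments) tuples in build order."""
    instructions = []
    pending = ""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments unless we are inside a continued line.
        if not pending and (not line or line.startswith("#")):
            continue
        if line.endswith("\\"):
            # A trailing backslash continues the instruction on the next line.
            pending += line[:-1].strip() + " "
            continue
        full_line = (pending + line).strip()
        pending = ""
        keyword, _, arguments = full_line.partition(" ")
        instructions.append((keyword.upper(), arguments.strip()))
    return instructions


if __name__ == "__main__":
    for step, (keyword, arguments) in enumerate(
        parse_instructions(EXAMPLE_DOCKERFILE), start=1
    ):
        print(f"step {step}: {keyword} {arguments}")
```

Each printed step corresponds to one entry in the image build history, stacked in the order the instructions appear.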
Every image is created from a base image.
For example, you can use a base image of Ubuntu and create another image with the Nginx application in it. A base image can be a parent image or an image built from a parent image. Check out this Docker article to know more about it.
You might ask where this base image (parent image) comes from. There are Docker utilities to create the initial parent base image. Basically, they take the required OS libraries and bake them into a base image. You don’t have to do this yourself because official base images are available for all the major Linux distros.
The top layer of an image is writable and used by the running container. Other layers in the image are read-only.
A Docker registry is a repository for Docker images. Using a Docker registry, you can share images. It acts as a central repository for Docker images.
A registry can be public or private. Docker Inc provides a hosted registry service called Docker Hub. It allows you to upload and download images from a central location.
Note: By default, when you install docker, it looks for images from the public Docker hub unless you specify a custom registry in Docker settings.
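The default lookup on Docker Hub shows up in how short image names are expanded. The sketch below approximates that expansion: a name without a registry host is assumed to live on `docker.io`, single-word names fall under `library/`, and a missing tag becomes `latest`. It is a simplified model that ignores digests, and the function name and the example registry host are hypothetical.

```python
def normalize_image_reference(reference):
    """Split an image reference into (registry, repository, tag).

    Simplified model of Docker's name expansion; digests (@sha256:...) are
    not handled.
    """
    name, tag = reference, "latest"
    last_slash = reference.rfind("/")
    last_colon = reference.rfind(":")
    # A colon after the last slash separates the tag; an earlier one is a port.
    if last_colon > last_slash:
        name, tag = reference[:last_colon], reference[last_colon + 1:]

    first, separator, rest = name.partition("/")
    # The first path component is a registry host if it looks like a host name.
    if separator and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, rest
    else:
        registry, repository = "docker.io", name
        if "/" not in repository:
            # Official images on Docker Hub live under the "library" namespace.
            repository = "library/" + repository
    return registry, repository, tag


if __name__ == "__main__":
    for ref in ("nginx", "ubuntu:22.04", "registry.example.com:5000/team/app:1.2"):
        print(ref, "->", normalize_image_reference(ref))
```

A reference that names a registry host explicitly bypasses Docker Hub, which is how images from a private or cloud provider registry are addressed.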
If your repository is public, all your images can be accessed by other Docker hub users. You can also create a private registry in Docker Hub.
Docker Hub works much like a Git hosting service: you build your images locally on your laptop, commit them, and then push them to Docker Hub.
Tip: When using docker in enterprise networks/project, set up your own docker registries instead of using the public docker hub. All cloud providers have their own container registry services.
A Docker container is the execution environment for Docker. Containers are created from images. A container is a writable layer on top of the image.
If you try to relate image layers and a container, here is how it looks for an Ubuntu-based image.
You can package your applications in a container, commit it, and make it a golden image to build more containers from it.
Containers can be started, stopped, committed, and terminated. If you terminate a container without committing it, all the container changes will be lost.
Ideally, containers are treated as immutable objects, and it is not recommended to make changes to a running container. Make changes to a running container only for testing purposes.
Two or more containers can be linked together to form a tiered application architecture. However, hosting highly scalable applications with Docker has been made easy with the advent of container orchestration tools like Kubernetes.
Why Containers Are Better Than VMs?
Containers have some key advantages over VMs. Let’s take a look at those.
Resource Utilisation & Cost
- You can use VMs to run your applications independently, which means one service per VM. But it can still be underutilized. And resizing a VM is not an easy task for a production application.
- Containers, on the other hand, can run with very minimal CPU and memory requirements. Also, you can even run multiple containers inside a VM for application segregation. Plus, resizing a container takes seconds.
Provisioning & Deployment
- Provisioning a VM and deploying applications on it might take minutes to hours depending on the workflow involved. Even rollback takes time.
- But you can deploy a container in seconds and roll it back in seconds as well.
- Drift management in VMs is not easy. You need to have full-fledged automation and process in place to make sure all the environments are similar. Following immutable deployment models avoids drift in VM environments.
- When it comes to containers, once the image gets baked, it will be the same in all the environments. For any changes, you need to start making changes in the dev environment and re-bake the container image.
What is the difference between containerd & runc?
containerd is responsible for managing containers, and runc is responsible for running them (creating namespaces and cgroups and running commands inside the container) with inputs from containerd.
What is the difference between the Docker engine & Docker daemon?
Docker engine is composed of the Docker daemon, a REST interface, and the Docker CLI. The Docker daemon is the dockerd systemd service responsible for building Docker images and sending instructions to the containerd runtime.
The best feature of Docker is collaboration.
Docker images can be pushed to a repository and can be pulled down to any other host to run containers from that image.
Moreover, Docker Hub has thousands of images created by users, and you can pull those images down to your hosts based on your application requirements. Docker images are also commonly used with container orchestration tools like Kubernetes.
If you want to run Docker for production workloads, make sure you follow Docker images’ recommended practices.
You can get started by installing Docker and running the basic operations.