What is Docker? How Does it Work?


Docker has become the de facto standard for container-based implementations. From small-scale deployments to large-scale enterprise applications, Docker serves as the base for container-based orchestration.

Docker gained popularity and adoption in the DevOps community so quickly because it is built for portability and designed for modern microservice architectures.

In this blog you will learn:

  1. Container evolution and the underlying concept of Linux containers
  2. What a container really is and which Linux features make it work
  3. The difference between a process, a container, and a VM
  4. What Docker is and why it is different from other container technologies
  5. Docker's core architecture and its key components

The idea here is to get your basics right so that you understand what Docker really is and how it works.

Evolution of Containers

If you think containerization is a new technology, it is not. Google has been using its own container technology in its infrastructure for years.

The concept of containers took shape in the 2000s. In fact, the roots go back to 1979, when chroot introduced the concept of changing the root directory of a process.

Here is a list of container-based projects from 2000 onwards.

| Year | Technology |
| --- | --- |
| 2000 | FreeBSD jails introduced the container concept. |
| 2003 | The Linux VServer project was released with the concept of OS-level virtualization. |
| 2005 | Solaris Zones, an OS-level virtualization project, was introduced. |
| 2007 | Google released a paper on Generic Process Containers. |
| 2008 | Initial release of LXC containers. |
| 2011 | Cloud Foundry announced Warden. |
| 2013 | lmctfy was open-sourced by Google. |
| 2013 | The Docker project was announced by DotCloud. |
| 2014 | Rocket (rkt) was announced by CoreOS. |
| 2016 | The Windows container preview was released as part of Windows Server 2016. |

What is a Linux container (LXC)?

Before diving directly into Docker concepts, you first need to understand what a Linux container is.

In a typical virtualized environment, one or more virtual machines run on top of a physical server using a hypervisor like Xen, Hyper-V, etc.

Containers, on the other hand, run on top of the operating system's kernel. You can call it OS-level virtualization. Before getting into the underlying container concepts, you need to understand two key Linux concepts.

  1. Userspace: All the code required to run user programs (applications, processes) is called userspace. When you initiate a program action, for example, creating a file, the process in userspace makes a system call to kernel space.
  2. Kernel space: This is the heart of the operating system, where the kernel code interacts with the system hardware, storage, etc.
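To make the userspace/kernel-space split concrete, here is a small Python sketch: the script itself runs entirely in userspace, and every file operation below is a thin wrapper that makes a system call into kernel space (the file name is arbitrary).

```python
import os

# os.open() is a userspace wrapper around the open(2) system call,
# which the kernel handles to actually create the file.
fd = os.open("demo.txt", os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)

# write(2) and close(2) are also system calls into kernel space.
os.write(fd, b"hello from userspace\n")
os.close(fd)

# Reading the file back goes through the read(2) system call.
with open("demo.txt") as f:
    print(f.read().strip())  # hello from userspace
```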

A container is a Process

When you start an application, for example, an Nginx web server, you are actually starting a process. A process is a running instance of a program with limited isolation.

What if we could isolate a process with only the files and configuration it needs to run and operate? That is essentially what a container does.

A container is basically a process with enough isolation of userspace components so that it gives a feeling of a separate operating system.

The parent container process may have child processes, so you can also say a container is a group of processes.

For example, when you start an Nginx service, it starts a parent Nginx process. The parent process then spawns child processes like the cache manager, cache loader, and workers.

Image Src: https://www.nginx.com

So when you start an Nginx container, you are starting a master Nginx process in its isolated environment.

I will show you this practically in the sections below.

Each container has its isolated userspace, and you can run multiple containers on a single host.

Does that mean a container has the whole OS?

No. As opposed to a VM with its own kernel, a container just contains the required files related to a specific distro and uses the shared host kernel.

More interestingly, you can run containers based on different Linux distros on a single host, all sharing the same kernel space.

For example, you can run RHEL-, CentOS-, and SUSE-based containers on an Ubuntu server. This is possible because, across Linux distros, only the userspace differs; the kernel space is the same.
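As a sketch, assuming Docker is installed on the host, you can verify this yourself by starting containers from different distro images and checking the kernel version each one reports (the image tags are common Docker Hub tags):

```shell
# Kernel version reported by the host.
uname -r

# Containers built from different distro userspaces all report the
# same kernel version, because they share the host's kernel.
docker run --rm centos:7 uname -r
docker run --rm opensuse/leap uname -r
docker run --rm ubuntu:20.04 uname -r
```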

Underlying Concept of Linux Containers

The following image gives you a visual representation of Linux containers.

what is a linux container

Containers are isolated within a host using two Linux kernel features called namespaces and control groups.

A real-world analogy is an apartment building. Even though it is a single big building, each condo/flat is isolated for an individual household, with its own identity and metered water, gas, and electricity. Concrete, steel structures, and other construction materials establish this isolation. You do not have visibility into other homes unless they let you in.

Similarly, you can relate this to a single host containing multiple containers. To isolate containers with their own CPU, memory, IP address, mount points, and processes, you need two Linux kernel features: namespaces and control groups.

Linux Namespaces

A container is all about having a well-isolated environment to run a service (process). To achieve that level of isolation, a container should have its own file system, IP address, mount points, process IDs, etc. You can achieve this using Linux namespaces.

Namespaces are responsible for containers having their own mount points, users, IP addresses, process management, etc. Essentially, they set boundaries for the containers.

The following are the key namespaces in Linux:

  1. pid namespace: Responsible for isolating processes (PID: process ID).
  2. net namespace: Manages network interfaces (NET: networking).
  3. ipc namespace: Manages access to IPC resources (IPC: inter-process communication).
  4. mnt namespace: Responsible for managing filesystem mount points (MNT: mount).
  5. uts namespace: Isolates hostname and version identifiers (UTS: Unix Timesharing System).
  6. user namespace: Isolates user IDs. In simple words, it separates the user IDs of the host and the container.
  7. cgroup namespace: Isolates the control group information from the container process.

Using these namespaces, a container can have its own network interfaces, IP address, etc. Each container has its own namespaces, and the processes running inside a namespace have no privileges outside it.

Interestingly, you can list the namespaces on a Linux machine using the lsns command.

listing namespaces in linux
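If lsns is not available, you can also inspect a process's namespaces directly under /proc; each entry below is a handle to one of the namespaces listed above:

```shell
# Every process exposes its namespace membership in /proc/<pid>/ns.
# "self" refers to the current shell process; expect entries such as
# pid, net, mnt, uts, ipc, user, and cgroup.
ls /proc/self/ns
```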

Linux Control groups

When we start a service, we usually don't specify any memory or CPU limits. We leave it to the kernel to prioritize and allocate resources for the services.

However, you can explicitly set CPU and memory limits for your services using a Linux kernel feature called cgroups. It is not a straightforward approach; you need some extra configuration and tweaks to make it work.

Since you can run multiple containers inside a host, there should be a mechanism to limit resource usage, device access, etc. Here is where control groups come into the picture.

The resources used by a container are managed by Linux control groups. You can restrict the CPU, memory, network, and IO resources of a container using Linux control groups.

So what happens if I don’t limit the CPU & Memory resource of a container?

Well, a single container might end up using all the host resources leaving other containers to crash because of resource unavailability.

Tools like Docker abstract away all the complex backend configuration and let you specify these resource limits with simple parameters.
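For example, assuming Docker is installed, the following sketch shows how Docker turns those cgroup settings into simple flags (the image and the limit values are arbitrary):

```shell
# Start an Nginx container limited to 512 MB of RAM and half a CPU core.
# Docker translates these flags into cgroup settings under the hood.
docker run -d --rm --name limited --memory=512m --cpus=0.5 nginx

# Verify the effective limits of the running container.
docker stats --no-stream limited

docker stop limited
```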

What is Docker?

Docker is a popular open-source project written in Go and originally developed by DotCloud (a PaaS company).

It is basically a container engine that uses the Linux Kernel features like namespaces and control groups to create containers on top of an operating system.

Meaning, all the container concepts and functionalities we covered in the LXC section are made very simple by Docker. Just by executing a few Docker commands and parameters, you can have containers up and running.

You might ask how Docker is different from a Linux container (LXC), since all the concepts and implementation look similar.

Docker was initially built on top of Linux containers (LXC). Later, Docker replaced LXC with its own container runtime, libcontainer (now part of runc).

Well, apart from being a container technology, Docker has well-defined wrapper components that make packaging applications easy. Before Docker, it was not easy to run containers. In other words, Docker does all the work of decoupling your application from the infrastructure by packing all the application's system requirements into a container.

For example, if you have a Java JAR file, you can run it on any server that has Java installed. In the same way, once you package a container with the required applications using Docker, you can run it on any other host that has Docker installed.

Difference Between Docker & Container

Docker is a technology or a tool developed to manage container implementations efficiently.

So, can I run a container without Docker?

Yes, of course. You can use LXC to run containers on Linux servers.

Things you should know about Docker:

  1. Docker is not LXC.
  2. Docker is not a virtual machine solution.
  3. Docker is not a configuration management system and is not a replacement for Chef, Puppet, Ansible, etc.
  4. Docker is not a platform-as-a-service technology.

What Makes Docker So Great?

Docker has an efficient workflow for moving the application from the developer’s laptop to the test environment to production. You will understand more about it when you look at a practical example of packaging an application into a Docker image.

Do you know that starting a docker container takes less than a second?

It is incredibly fast, and it can run on any host with a compatible Linux kernel. (It supports Windows as well.)

Note: You cannot run a Windows container on a Linux host because there is no Linux kernel support for Windows. You can read more about Windows containers here.

Docker uses a copy-on-write union filesystem for its image storage. Whenever changes are made to a container, only those changes are written to disk using the copy-on-write model.

With copy-on-write, all your containers share optimized storage layers.

Docker Adoption Statistics

Here is the Google Trends data for Docker. You can see that it has been an exploding topic for the last five years.

Here is a survey result from Datadog which shows the rise in Docker adoption.

source: datadog.com

Docker Core Architecture

In the following sections, we will look at the Docker architecture and its associated components. We will also look at how each component works together to make Docker work.

Docker architecture has changed a few times since its inception. When this article was first released, Docker was built on top of LXC.

Here are some notable architectural changes that happened to Docker:

  1. Docker moved from LXC to libcontainer in 2014.
  2. runc, a CLI for spinning up containers that follows all OCI specifications, was introduced.
  3. Docker separated its container management component into containerd in 2016.

OCI: Open Container Initiative is an open industry standard for container runtime and specifications.

When Docker was initially launched, it had a monolithic architecture. Now it is separated into the following three components:

  1. Docker Engine (dockerd)
  2. docker-containerd (containerd)
  3. docker-runc (runc)

Docker and other big organizations decided to contribute to a common container runtime and management layers. Hence, containerd and runc are now part of the Cloud Native Computing Foundation, with contributors from many organizations.

Note: When you install Docker, all these components get installed together. You don't have to install them separately. For explanation purposes, we show them as different components.

docker core architecture

Now let's have a look at each Docker component.

Docker Engine

The Docker engine is composed of the Docker daemon, an API interface, and the Docker CLI. The Docker daemon (dockerd) runs continuously as the dockerd systemd service. It is responsible for building Docker images.

To manage images and run containers, dockerd calls the docker-containerd APIs.

docker-containerd (containerd)

containerd is another system daemon service that is responsible for downloading Docker images and running them as containers. It exposes its API to receive instructions from the dockerd service.

docker-runc

runc is the container runtime, which is responsible for creating the namespaces and cgroups required for a container. It then runs the container commands inside those namespaces. The runc runtime is implemented per the OCI specification.

To understand more about container runtimes read this excellent 3 part blog post series.

How Does Docker Work?

We have seen the core components of Docker. But to build, ship, share, and run Docker containers, other components are involved as well.

Let’s look at the key Docker components in a Docker ecosystem.

Docker Components

Docker is composed of the following components:

  1. Docker Daemon (dockerd)
  2. Docker Client
  3. Docker Images
  4. Docker Registries
  5. Docker Containers

Here is the official high-level docker architecture diagram that shows the common Docker workflow.

image source: docs.docker.com

Docker Daemon

Docker has a client-server architecture. Docker Daemon (dockerd) or server is responsible for all the actions that are related to containers.

The daemon receives the commands from the Docker client through CLI or REST API. Docker client can be on the same host as a daemon or it can be present on any other host.

By default, the Docker daemon listens on the docker.sock UNIX socket. If you have a use case for accessing the Docker API remotely, you need to expose it over a host port. One such use case is running Docker containers as Jenkins agents.

If you want to run docker inside docker, you can use the docker.sock from the host machine.
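As a sketch (assuming Docker is installed, and using the official docker CLI image), mounting the host's docker.sock into a container lets the Docker client inside it talk to the host daemon:

```shell
# "docker ps" inside the container talks to the host's dockerd through
# the mounted UNIX socket, so it lists the host's containers instead
# of starting a nested daemon.
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock docker:cli docker ps
```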

Docker Images

Images are the basic building blocks of Docker. You need an image to run a Docker container. Images contain the OS libraries, dependencies, and tools to run an application.

Images can be prebuilt with application dependencies for creating containers. For example, if you want to run an Nginx web server as a Ubuntu container, you need to create a Docker image with the Nginx binary and all the OS libraries required to run Nginx.

Docker has the concept of a Dockerfile, which is used for building images. A Dockerfile is basically a text file that contains one instruction (command) per line.

Here is an example of a Dockerfile.

example Dockerfile reference
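A minimal sketch of such a Dockerfile for the Nginx-on-Ubuntu example above (the base image tag is illustrative, and it assumes an index.html file sits in the build context):

```Dockerfile
# Base (parent) image; each instruction below adds a layer to the image.
FROM ubuntu:20.04

# Install the application into the image.
RUN apt-get update && apt-get install -y nginx

# Copy application content from the build context.
COPY index.html /var/www/html/

# Document the port the service listens on.
EXPOSE 80

# The command the container runs as its main process.
CMD ["nginx", "-g", "daemon off;"]
```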

A Docker image is organized in a layered fashion. Every instruction in a Dockerfile adds a layer to the image. The topmost writable layer, created when the image runs, is the container.

Every image is created from a base image.

For example, you can use an Ubuntu base image and create another image with the Nginx application in it. A base image can be a parent image or an image built from a parent image. Check out this Docker article to learn more about it.

You might ask where this base image (parent image) comes from. There are Docker utilities to create the initial parent base image. Basically, they take the required OS libraries and bake them into a base image. You don't have to do this yourself, because official base images are available for all the major Linux distros.

The top layer of an image is writable and is used by the running container. The other layers in the image are read-only.

Docker registry

It is a repository for Docker images. Using the Docker registry, you can share images. It acts as a central repository for the Docker images.

A registry can be public or private. Docker Inc provides a hosted registry service called Docker Hub. It allows you to upload and download images from a central location.

Note: By default, when you install docker, it looks for images from the public Docker hub unless you specify a custom registry in Docker settings.

If your repository is public, all your images can be accessed by other Docker hub users. You can also create a private registry in Docker Hub.

Docker Hub works somewhat like Git: you can build your images locally on your laptop, commit them, and then push them to Docker Hub.

Tip: When using Docker in enterprise networks/projects, set up your own Docker registry instead of using the public Docker Hub. All major cloud providers offer their own container registry services.

Docker Container

It is the execution environment for Docker. Containers are created from images; a container is the writable layer on top of an image.

If you try to relate image layers and a container, here is how it looks for an Ubuntu-based image.

You can package your applications in a container, commit it, and make it a golden image to build more containers from it.

Containers can be started, stopped, committed, and terminated. If you terminate a container without committing it, all the container changes will be lost.

Ideally, containers are treated as immutable objects, and it is not recommended to make changes to a running container. Make changes to a running container only for testing purposes.

Two or more containers can be linked together to form a tiered application architecture. However, hosting highly scalable applications with Docker has been made easy by the advent of container orchestration tools like Kubernetes.

You Might Like: List of Containers Orchestration Tools

Why Containers Are Better Than VMs?

Containers have some key advantages over VMs. Let's take a look at those.

Resource Utilisation & Cost

  1. You can use VMs to run your applications independently, which means one service per VM. But it can still be underutilized. And resizing a VM is not an easy task for a production application.
  2. Containers, on the other hand, can run with very minimal CPU and memory requirements. Also, you can even run multiple containers inside a VM for application segregation. Plus, resizing a container takes seconds.

Provisioning & Deployment

  1. Provisioning a VM and deploying applications on it might take minutes to hours depending on the workflow involved. Even rollback takes time.
  2. But you can deploy a container in seconds and roll it back in seconds as well.

Drift Management

  1. Drift management in VMs is not easy. You need to have full-fledged automation and process in place to make sure all the environments are similar. Following immutable deployment models avoids drift in VM environments.
  2. When it comes to containers, once an image gets baked, it will be the same in all environments. For any changes, you need to start making the change in the dev environment and re-bake the container image.

Recommended Course: Docker Mastery: The Complete Toolset From a Docker Captain

FAQs

What is the difference between containerd & runc?

containerd is responsible for managing the container, and runc is responsible for running containers (creating namespaces and cgroups, and running commands inside the container) with inputs from containerd.

What is the difference between the Docker engine & Docker daemon?

The Docker engine is composed of the Docker daemon, the REST interface, and the Docker CLI. The Docker daemon is the systemd dockerd service, responsible for building Docker images and sending Docker instructions to the containerd runtime.

Conclusion

The best feature of Docker is collaboration.

Docker images can be pushed to a repository and can be pulled down to any other host to run containers from that image.

Moreover, Docker Hub has thousands of images created by users, and you can pull those images to your hosts based on your application requirements. Docker is also widely used with container orchestration tools like Kubernetes.

If you want to run Docker for production workloads, make sure you follow the recommended practices for Docker images.

You can get started by installing Docker and run the basic operations.


SQL For Data Science: A Comprehensive Beginners Guide

SQL For Data Science

In this blog, I am going to talk about the importance of SQL for data science, some statistics, and key concepts that you need to be aware of as a beginner in pursuit of a data scientist career path.

I have also covered some of the key SQL practices that have to be followed when working with data.

You will also learn the fundamentals of SQL and of working with data, so that you can begin analyzing it for data science purposes.

SQL – Data Science Statistics

Data science has emerged as one of the most popular fields of the 21st century, because there is a pressing need to analyze and construct insights from data. Industries transform raw data into finished data products.

Doing so requires several essential tools to churn the raw data. Together with Python and R, SQL is now considered one of the most in-demand skills in data science.

As per KDnuggets, SQL is the third most in-demand skill for data scientist jobs.

importance of SQL skill for data scientist

SQL isn't going anywhere. SQL is more widely used among data scientists and data engineers than R or Python, and the fact that SQL remains a language of choice is incredibly significant.

In the chart below, from Stack Overflow's 2020 developer survey, you can see that SQL eclipses both Python and R in popularity.

SQL demand as per developers survey

What’s SQL?

Structured Query Language (SQL) is a database language used to create, retrieve, and manipulate data in a relational database management system (RDBMS) such as MySQL, Oracle, SQL Server, PostgreSQL, etc., which stores data in relational database tables.

A relational database table consists of columns and rows, where columns describe the properties of the stored data and rows hold the actual data entries.

As the name suggests, SQL comes into the picture when we have structured data, e.g., in the form of tables.

On the other hand, the databases that are not well structured or not relational, and therefore do not use SQL are known as NoSQL databases. Examples of such databases are MongoDB, DynamoDB, Cassandra, etc.

Some interesting facts about SQL that you should know:

  • One of the interesting facts about SQL is that it's made up of descriptive words. In other words, most of the commands used in SQL are reasonably easy to understand compared to many other computer languages. That makes SQL, as a language, really simple to read and learn. For example, if you want to select the column NAME from the STUDENT table, you can write the SQL command as
SELECT NAME FROM STUDENT;
  • SQL statement keywords are not case sensitive. It means that
SELECT ROLL_NO, NAME FROM STUDENT;

is equivalent to,

Select ROLL_NO, NAME From STUDENT;
  • There is an ISO standard for SQL, but most implementations differ slightly in syntax. So you may encounter queries that work in SQL Server but do not work in MySQL.
  • SQL is a non-procedural language, which means you can't write a complete application using SQL alone, but you can interact and communicate with data. That makes it a relatively simple but clear language.
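You can check the keyword case-insensitivity yourself. Here is a small sketch using Python's built-in sqlite3 module (SQLite follows the same keyword rules; the table and data are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (ROLL_NO INTEGER, NAME TEXT)")
conn.execute("INSERT INTO STUDENT VALUES (1, 'Asish'), (2, 'Raj')")

# Keywords in upper case...
upper = conn.execute("SELECT ROLL_NO, NAME FROM STUDENT").fetchall()

# ...and the same statement with mixed-case keywords.
mixed = conn.execute("Select ROLL_NO, NAME From STUDENT").fetchall()

# Both spellings return identical results.
print(upper == mixed)  # True
print(upper)           # [(1, 'Asish'), (2, 'Raj')]
```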

Why SQL for Data Science?

  • According to research, more than 2.5 quintillion bytes of data are produced every day. Storing such vast amounts of data makes databases a necessity.
  • Industries are now giving more importance to the value of data. Data can be used in various ways to analyze more significant business problems, solve business problems, make predictions, and understand customer needs.
  • While performing manipulations on data, SQL lets you access the data directly. This is one of the main advantages of using SQL, because it can speed up workflow executions.

Before diving deep into SQL, beginners should have prior knowledge of the relational model, some important terminology related to it, and keys in the relational model.

Using SQL for Data Science

So now, the question is, how do you bring SQL together for use in data science?

Every data science project starts with the aim of finding a solution to a problem.

These are a few approaches and standard practices followed when beginning with a new data science problem that requires SQL.

As a beginner, it’s essential to know how to work through a problem from beginning to end.

Here we are going to discuss the key practices that have to be followed before starting to write SQL queries for any data science-related projects.

  1. Understanding Data
  2. Understanding the Business Use Case
  3. Data Profiling
  4. Troubleshooting
  5. Data Formatting
  6. Review Process
standard practices for using SQL in data science

Let’s understand each stage in detail.

Data Understanding

So the very first step is understanding the data, and it is the most crucial one. This is why you need to spend enough time on data-model diagrams and discussing the associations in the data, because knowing the data is the key to writing successful queries.

It is very beneficial to take the time to get to know the data as much as possible before interpreting it. It's essential to know the associations and dependencies within the data.

That brings us to our second step, which is business understanding.

Business Understanding

As we start to get familiar with the data, we naturally move on to questions about the business problem we are striving to solve.

Usually, you will move back and forth between studying the data and consulting a subject matter specialist who knows the core business problem and is striving to resolve it.

This is crucial in being capable of wrapping your head around the problem.

One of the things to be aware of is the unspoken need. There may be a business problem; for example, one wants to predict whether or not a customer is likely to buy the product.

That looks pretty straightforward and easy. But as one digs into the data more, questions start to appear: which customers? Which products? Some of the unspoken points are specific logical exclusions.

If you can get this and understand a problem, query writing will just become more comfortable. If you genuinely know the data and understand the problem, then writing the queries is like filling in the blanks.

Profiling Data

When we are trying to understand the data, we will start profiling the data. This is where we perform descriptive statistics.

It’s also a significant opportunity to classify any data quality issues before diving into the analysis. It’s a good approach to perform this profiling step before finalizing any of the data that we are extracting.

To catch problems, we need a plan for exactly which data components we need.

We need to understand the data we are going after, and we get some of that understanding from the profiling we have done.

When we start retrieving the data, it will begin with a SELECT statement, so we have to use SELECT and FROM.

Start with Select

Remember, you're always going to start with the SELECT statement. That's the great thing about SQL: it's consistent in that way.

It is recommended to start simple, particularly if you’re new to the data. Begin with just one table, add more data, add in another table, check the result, and then go back from there.

If you're using subqueries, remember to always start with the innermost query and then work outward as you build. That leads to the next point, which is to test along the way.
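A sketch of that inner-to-outer workflow, using Python's built-in sqlite3 module (the orders table and numbers are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, product TEXT, price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "pen", 2.0), (2, "book", 10.0), (3, "bag", 30.0)],
)

# Step 1: start with the innermost query and check its result first.
avg_price = conn.execute("SELECT AVG(price) FROM orders").fetchone()[0]
print(avg_price)  # 14.0

# Step 2: once the inner result looks right, wrap it as a subquery
# and build outward.
above_avg = conn.execute(
    "SELECT product FROM orders "
    "WHERE price > (SELECT AVG(price) FROM orders)"
).fetchall()
print(above_avg)  # [('bag',)]
```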

Test & Troubleshoot

Don't wait to test the query until you have combined various sources and all the calculations are done and complete. Think of the query as little building blocks.

If we write a calculation of, say, something's average selling price, we should look at how many values we get back for that calculation from the table and make sure the result appears right.

Then combine the result with different tables and then examine that. If we understand the data, we could jump in a little bit faster. But make sure that our order of manipulation is accurate.

This matters because it's easy to get results back, but getting back the correct results we need is a little bit harder.

When troubleshooting, always start short and simple, and slowly reconstruct the query to see where things went wrong. It's essential to start with the basic things first.

Gradually work backward to understand where things have gone wrong.

Format & Comment

When writing a query, the next thing to look at is making sure we format it accurately and comment correctly.
Clean code says a lot about the coder.

Ensure that it’s easy to read the query, use recommended indentation, comment strategically where we need to do so, etc.

We never know when we will need to revisit the query, or when we will need to hand it over to someone else who has to update it from there.

Just keep the code clean. Format the comments where necessary and strategically.

Review

Now make sure that we have reviewed what we have done till now. A lot of times, what happens is we write the query, we will be using it for our model, and we will be looking at various stats and stuff like that.

Then we need to go back and change that query. Always make sure to review the query to see if anything has changed. Has the data changed? Are business rules different? Do we require to update the date indicators?

And be cautious when you're going back and using old queries.
These approaches take us through a problem from start to end, and it all begins with understanding the data and the problem.

Make sure to spend the time there. Make sure to spend the time thinking about what we are doing before we start writing the queries.

It's going to save time in the long run. Then go through and understand the data by profiling it. Make sure to test enough and keep the code clean with proper comments.

DBA vs Data Scientist

Many people use SQL in their jobs: backend developers, QA engineers, system engineers, data analysts, and obviously data scientists. But the role worth comparing with the data scientist is the DBA (database administrator).

The DBA is responsible for managing the whole database and supervising it. However, the data scientist is typically a user of that database. The DBA will be responsible for granting permissions to people and deciding who has access to what data. They’re often responsible for managing the tables and creating them. 

DBAs and data scientists are related to each other in such a way that they both use SQL to interpret the data, to query it, and retrieve it. They both write quite complex queries, but the main difference is that the data scientist is the end-user. However, the DBA is the one who manages it, directs it, and maintains the database as a whole.

Relational Model in DBMS

Now, let's start with the practical basics to understand the underlying concepts of the relational model.

The relational model illustrates how data is stored in relational databases. The data is stored in the form of relations, or tables. Suppose a table STUDENT has the attributes ROLL_NO, NAME, PHONE, and AGE, as shown in the following table.

STUDENT TABLE

| ROLL_NO | NAME | PHONE | AGE |
| --- | --- | --- | --- |
| 1 | Asish | 9438351764 | 25 |
| 2 | Raj | | 30 |
| 3 | Sidhu | 8908987865 | 40 |
| 4 | Emraan | 7658908765 | 25 |

Important Terminologies

  • Relation Schema: A relation schema depicts the name of the relation and its attributes, e.g., STUDENT (ROLL_NO, NAME, PHONE, AGE) is the relation schema for STUDENT. If a schema has more than one relation, it is termed a relational schema.
  • Attribute: Attributes are the properties that define a relation, e.g., ROLL_NO and NAME in the STUDENT relation.
  • Tuple: Every row in the relation is called a tuple. The above relation contains 4 tuples.
  • Column: Column denotes the set of values for a particular attribute. The column ROLL_NO is extracted from the relation STUDENT.
ROLL_NO
1
2
3
4
  • Degree: The number of attributes present in the relation is called the degree of relation. The STUDENT relation defined above has degree 4.
  • Cardinality: The number of tuples in a relation is called cardinality. The STUDENT relation defined above has cardinality 4.
  • NULL Values: A value that is not known or not available (NA) is called a NULL value. It is represented by a blank space. e.g., the PHONE of the STUDENT with ROLL_NO 2 is NULL.

Keys in Relational Model

Let’s consider two tables named as STUDENT and STUDENT_COURSE.

STUDENT TABLE

ROLL_NO | NAME   | PHONE      | AGE
--------|--------|------------|----
1       | Asish  | 9438351764 | 25
2       | Raj    |            | 30
3       | Sidhu  | 8908987865 | 40
4       | Emraan | 7658908765 | 25

STUDENT_COURSE TABLE

ROLL_NO | COURSE_ID | COURSE_NAME
--------|-----------|------------
1       | C1        | ML
2       | C2        | SQL
4       | C1        | ML

  • Candidate Key: A minimal set of attributes that can uniquely identify a tuple is known as a candidate key. For example, ROLL_NO and PHONE in the STUDENT relation.
  • Super Key: Any set of attributes that can uniquely identify a tuple is called a super key. For example, ROLL_NO, (ROLL_NO, NAME), etc.
  • Primary Key: In the relational model, a primary key is a particular choice of a minimal set of attributes that uniquely identifies a tuple in a relation. There can be more than one candidate key in a relation, of which one is chosen as the primary key. For example, ROLL_NO and PHONE are both candidate keys for the relation STUDENT, but ROLL_NO can be chosen as the primary key.
  • Alternate Key: A candidate key other than the primary key is known as an alternate key. For example, ROLL_NO and PHONE are both candidate keys for the relation STUDENT; with ROLL_NO as the primary key, PHONE is an alternate key.
  • Foreign Key: A foreign key is a field in one table that refers to the primary key of another table and is used to connect the two. For example, ROLL_NO in STUDENT_COURSE is a foreign key referencing ROLL_NO in the STUDENT relation.
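The primary/foreign key relationship between STUDENT and STUDENT_COURSE can be sketched in code. This is a minimal illustration using Python's built-in sqlite3 module (the article's examples otherwise target MySQL-style SQL; SQLite's dialect differs slightly, e.g. it enforces foreign keys only after a pragma):

```python
import sqlite3

# In-memory database; schema follows the STUDENT / STUDENT_COURSE tables above.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only with this pragma
conn.execute("""CREATE TABLE STUDENT (
    ROLL_NO INTEGER PRIMARY KEY,   -- primary key: uniquely identifies a tuple
    NAME    TEXT NOT NULL,
    PHONE   TEXT,                  -- candidate (alternate) key in the article
    AGE     INTEGER)""")
conn.execute("""CREATE TABLE STUDENT_COURSE (
    ROLL_NO     INTEGER REFERENCES STUDENT(ROLL_NO),  -- foreign key
    COURSE_ID   TEXT,
    COURSE_NAME TEXT)""")
conn.execute("INSERT INTO STUDENT VALUES (1, 'Asish', '9438351764', 25)")
conn.execute("INSERT INTO STUDENT_COURSE VALUES (1, 'C1', 'ML')")  # valid reference
try:
    # ROLL_NO 99 does not exist in STUDENT, so the FK constraint rejects the row
    conn.execute("INSERT INTO STUDENT_COURSE VALUES (99, 'C2', 'SQL')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

The rejected insert shows the point of a foreign key: the database itself guarantees that every STUDENT_COURSE row refers to a real student.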

Prerequisite:

Make sure that you have already installed an SQL database IDE to execute the SQL queries used in this article. If you haven't, you may also use an online IDE. Here are links to popular online SQL IDEs:

  1. MySQL
  2. SQLite, MariaDB, PostgreSQL, MS SQL
  3. Oracle

Getting Started with SQL

In the 21st century, data is critical, and everyone is collecting it all the time. The people who can use and communicate with data, who critically analyze it and draw insights from it to make better decisions, are the ones who will shape the world we live in.

They are data scientists. But to be a good data scientist, one needs to know how to retrieve and work with data. And to do that, he/she needs to be very well versed in SQL, the standard language for communicating with databases.

Creating Tables

As discussed before, the data scientist isn't normally the one in charge of the whole database; that's usually left to the DBA or another administrator. However, data scientists may be able to write and create their own tables, so it's necessary to have a basic knowledge of how this works.

The CREATE TABLE statement is used to create tables in SQL. We know that a table comprises rows and columns. So whenever creating tables we have to give all the data to SQL such as the names of the columns, the data type of data to be stored in columns, constraints names, etc. Let’s take a look at how to use the CREATE TABLE statement to create tables in SQL.

Syntax

CREATE TABLE table_name
(
column1 data_type(size) constraint_name,
column2 data_type(size) constraint_name,
column3 data_type(size) constraint_name,
....
);

table_name: Name of the table to be created.
column1: Name of the first column and so on.
data_type: Type of data that can be stored in the field.
size: Size of the data that can be stored in the particular column. For example, if a column's data_type is varchar with size 20, that column can store a string of at most 20 characters.
constraint_name: Name of the constraint, for example NOT NULL, UNIQUE, PRIMARY KEY, etc.

Example

The query below will create a table named STUDENT with four columns: ROLL_NO, NAME, SUBJECT, and AGE.

CREATE TABLE STUDENT
(
ROLL_NO               int(5)            PRIMARY KEY,
NAME                  varchar(30)       NOT NULL,
SUBJECT               varchar(30),
AGE                   int(2)            NOT NULL                
);

The above query creates a table named STUDENT. The ROLL_NO attribute is of type int and can store an integer of up to 5 digits. The two columns NAME and SUBJECT are of type varchar and can store characters; the size 30 in brackets means these two attributes can contain a maximum of 30 characters. The last column AGE is of type int and can store an integer of up to 2 digits.

Note:

  1. An error will be returned if one tries to insert a row with no value for a NOT NULL column. It is important to know that the NULL value in SQL is different from a zero value. NULL represents a missing value, and it has one of three different interpretations:
    • The value unknown (value exists but is not known)
    • Value not available (exists but is purposely withheld)
    • Attribute not applicable (undefined for this tuple)
  2. Primary keys can not be null. That means primary keys must have a value.
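The CREATE TABLE statement and note 1 can be tried out with Python's built-in sqlite3 module. This is a sketch, not the article's MySQL setup: SQLite accepts but ignores the declared sizes int(5)/varchar(30), while engines such as MySQL enforce them.

```python
import sqlite3

# Run the article's CREATE TABLE statement against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE STUDENT (
    ROLL_NO int(5)      PRIMARY KEY,
    NAME    varchar(30) NOT NULL,
    SUBJECT varchar(30),
    AGE     int(2)      NOT NULL)""")
conn.execute("INSERT INTO STUDENT VALUES (1, 'Amiya', 'ML', 22)")  # ok
try:
    # NAME is omitted, so it defaults to NULL and violates NOT NULL (note 1)
    conn.execute("INSERT INTO STUDENT (ROLL_NO, AGE) VALUES (2, 25)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)  # NOT NULL constraint failed: STUDENT.NAME
```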

Adding/Inserting Data to The Table

The INSERT INTO statement of SQL is used to insert a new row in a table.

Syntax

INSERT INTO table_name VALUES (value1, value2, value3,…);
table_name: name of the table.
value1, value2,.. : value of first column, second column,… for the new record

Example

INSERT INTO STUDENT VALUES(1, 'Amiya', 'ML', 22);
INSERT INTO STUDENT VALUES(2, 'Asish', 'DS', 25);
INSERT INTO STUDENT VALUES(3, 'Ankit', NULL, 22);
INSERT INTO STUDENT VALUES(4, 'Tony', 'OS', 30);
INSERT INTO STUDENT VALUES(5, 'Raj', 'DBMS', 28);

Now the table STUDENT is filled with 5 rows and 4 columns.
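From application code, the same five inserts are usually done with placeholders rather than string literals. A sketch with Python's sqlite3 module (the "?" placeholders are sqlite3's parameter style; Python's None is stored as SQL NULL):

```python
import sqlite3

# Insert the five rows above in one executemany() call.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (ROLL_NO int, NAME varchar(30), SUBJECT varchar(30), AGE int)")
rows = [
    (1, "Amiya", "ML", 22),
    (2, "Asish", "DS", 25),
    (3, "Ankit", None, 22),   # None -> SQL NULL for SUBJECT
    (4, "Tony", "OS", 30),
    (5, "Raj", "DBMS", 28),
]
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", rows)
print(conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0])  # 5
```

Placeholders also protect against SQL injection, which matters as soon as the values come from user input.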

Retrieving Data

SELECT is the most commonly used statement in SQL. The SELECT statement is used to retrieve data from the database: one can retrieve either the whole table or rows selected according to some specified rule.

To fetch all the fields from the table STUDENT

Syntax

SELECT * FROM table_name;

Example

SELECT * FROM STUDENT;

Output

ROLL_NO | NAME  | SUBJECT | AGE
--------|-------|---------|----
1       | Amiya | ML      | 22
2       | Asish | DS      | 25
3       | Ankit | NULL    | 22
4       | Tony  | OS      | 30
5       | Raj   | DBMS    | 28

Query to fetch the fields ROLL_NO, NAME, AGE from the table Student:

Syntax

SELECT column1, column2 FROM table_name;

Example

SELECT ROLL_NO, NAME, AGE FROM STUDENT;

Output

ROLL_NO | NAME  | AGE
--------|-------|----
1       | Amiya | 22
2       | Asish | 25
3       | Ankit | 22
4       | Tony  | 30
5       | Raj   | 28
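The difference between SELECT * and selecting named columns is easy to see from code. A sketch with Python's sqlite3 module (an illustration; the article's SQL itself is engine-agnostic here):

```python
import sqlite3

# Build the STUDENT table used throughout this section.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (ROLL_NO int, NAME varchar(30), SUBJECT varchar(30), AGE int)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", [
    (1, "Amiya", "ML", 22), (2, "Asish", "DS", 25), (3, "Ankit", None, 22),
    (4, "Tony", "OS", 30), (5, "Raj", "DBMS", 28)])

all_fields = conn.execute("SELECT * FROM STUDENT").fetchall()
print(all_fields[0])    # every column of the first row

some_fields = conn.execute("SELECT ROLL_NO, NAME, AGE FROM STUDENT").fetchall()
print(some_fields[0])   # only the three requested columns
```

Naming only the columns you need is the first step toward the filtering techniques in the next section.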

Adding Comments to SQL

Adding comments to your code is necessary not only because it makes your queries easier to follow, both for you and for others; it also helps with troubleshooting and makes the code easier to share.

For single-line comments in SQL, use “--” (a double hyphen) at the beginning of the comment. For multi-line comments, use “/* your comment */”: the ‘/*’ marks the starting point of the comment, which is terminated when ‘*/’ is encountered.

Example:

-- This is a single line comment 
SELECT * FROM Customers;

/* This is a multi 
line comment */ 
SELECT * FROM Customers;

Filtering, Sorting and Calculating Data with SQL

So far what you’ve been studying with only has been a few thousand records, but several databases have a lot of data in them. Eventually, you might operate with databases with the millions or even tens of millions of records in it. Clearly, you’re never going to need to look at all of that data at once. So SQL gives us various methods to pair down and sort your data so you can quickly get the results you want. So let’ discuss that in this module.

Basics of Filtering with SQL

Filtering is very important because it narrows down the data you want to retrieve. Filtering is also done while exploring the data, to get an idea of the data we need to analyze as part of our model.

Why Filter?

  1. Be precise about the data that we want to retrieve
  2. Decrease the number of records we retrieve
  3. Reduce the strain on the client application
  4. Increase query performance
  5. Governance limitations

WHERE Clause

WHERE keyword is used for retrieving filtered data in a result set. It is used to retrieve data according to particular criteria and can also be used to filter data by matching patterns.

Syntax

SELECT column1,column2 
FROM TableName
WHERE column_name operator value;

column1 , column2: Fields in the table
TableName: Name of the table 
column_name: Name of the field
operator: Operation to be performed for filtering
value: The exact value to get related data in result

List of operators that can be used with where clause:

Operator | Description
---------|------------
>        | Greater than
>=       | Greater than or equal to
<        | Less than
<=       | Less than or equal to
=        | Equal to
<>       | Not equal to
BETWEEN  | Within an inclusive range
LIKE     | Search for a pattern
IN       | Match any of multiple possible values for a column

STUDENT TABLE

ROLL_NO | NAME  | SUBJECT | AGE
--------|-------|---------|----
1       | Amiya | ML      | 22
2       | Asish | DS      | 25
3       | Ankit | NULL    | 22
4       | Tony  | OS      | 30
5       | Raj   | DBMS    | 28

Example 1: To fetch record of students with age equal to 22

SELECT * 
FROM STUDENT 
WHERE Age=22;

Output:

ROLL_NO | NAME  | SUBJECT | AGE
--------|-------|---------|----
1       | Amiya | ML      | 22
3       | Ankit | NULL    | 22

Example 2: To fetch records of students where ROLL_NO is between 2 and 4 (inclusive).

SELECT * 
FROM STUDENT
WHERE ROLL_NO BETWEEN 2 AND 4;

Output:

ROLL_NO | NAME  | SUBJECT | AGE
--------|-------|---------|----
2       | Asish | DS      | 25
3       | Ankit | NULL    | 22
4       | Tony  | OS      | 30

Example 3: To fetch NAME and SUBJECT of students where Age is 22 or 30.

SELECT NAME, SUBJECT
FROM STUDENT
WHERE Age IN (22,30);

Output:

NAME  | SUBJECT
------|--------
Amiya | ML
Ankit | NULL
Tony  | OS
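The three filtering examples above (=, BETWEEN, IN) can be tried out with Python's sqlite3 module. A sketch, assuming the same STUDENT table:

```python
import sqlite3

# Rebuild the STUDENT table from this section.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (ROLL_NO int, NAME varchar(30), SUBJECT varchar(30), AGE int)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", [
    (1, "Amiya", "ML", 22), (2, "Asish", "DS", 25), (3, "Ankit", None, 22),
    (4, "Tony", "OS", 30), (5, "Raj", "DBMS", 28)])

age_22 = conn.execute("SELECT NAME FROM STUDENT WHERE AGE = 22").fetchall()
print(sorted(n for (n,) in age_22))       # ['Amiya', 'Ankit']

rolls = conn.execute(
    "SELECT NAME FROM STUDENT WHERE ROLL_NO BETWEEN 2 AND 4").fetchall()
print(sorted(n for (n,) in rolls))        # ['Ankit', 'Asish', 'Tony']

ages = conn.execute("SELECT NAME FROM STUDENT WHERE AGE IN (22, 30)").fetchall()
print(sorted(n for (n,) in ages))         # ['Amiya', 'Ankit', 'Tony']
```

Note that BETWEEN is inclusive at both ends, matching Example 2 above.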

Wildcards in SQL

What are wildcards?

  1. Wildcards are special characters used to match parts of a value.
  2. Search patterns are made from literal text, wildcard characters, or a combination of both.
  3. Wildcards are used with the LIKE operator.
  4. They can be used only with strings, not with non-text data types.
  5. They are helpful for data scientists as they explore string variables.

There are four basic wildcard operators:

Operator | Description
---------|------------
%                     | Substitutes for zero or more characters
_                     | Substitutes for exactly one character (Note: not supported by DB2)
[range_of_characters] | Matches a set or range of characters defined inside the brackets (Note: does not work with SQLite)
[^range_of_characters] or [!range_of_characters] | Matches a non-matching set or range of characters specified inside the brackets

Syntax:

SELECT column1,column2 
FROM TableName 
WHERE column 
LIKE wildcard_operator; 

column1 , column2: fields in the table 
TableName: name of table 
column: name of field used for filtering data

STUDENT TABLE

ROLL_NO | NAME  | SUBJECT | AGE
--------|-------|---------|----
1       | Amiya | ML      | 22
2       | Asish | DS      | 25
3       | Ankit | NULL    | 22
4       | Tony  | OS      | 30
5       | Raj   | DBMS    | 28

Example 1: To fetch records from Student table with NAME ending with letter ‘A’.

SELECT * 
FROM STUDENT
WHERE NAME LIKE '%A';

Output:

ROLL_NO | NAME  | SUBJECT | AGE
--------|-------|---------|----
1       | Amiya | ML      | 22

Example 2: To fetch records from the Student table with SUBJECT starting with ‘DB’ followed by exactly two more characters.

SELECT *
FROM STUDENT
WHERE SUBJECT LIKE 'DB__';

Output:

ROLL_NO | NAME  | SUBJECT | AGE
--------|-------|---------|----
5       | Raj   | DBMS    | 28

Example 3: To fetch records from the Student table with NAME containing letters ‘a’, ‘b’, or ‘c’.

SELECT * 
FROM STUDENT
WHERE NAME
LIKE '%[A-C]%';

Output:

ROLL_NO | NAME  | SUBJECT | AGE
--------|-------|---------|----
1       | Amiya | ML      | 22
2       | Asish | DS      | 25
3       | Ankit | NULL    | 22
5       | Raj   | DBMS    | 28

Example 4: To fetch records from the Student table with NAME not containing the letters ‘a’, ‘b’, or ‘c’.

SELECT * 
FROM STUDENT
WHERE NAME
LIKE '%[^A-C]%';

Output:

ROLL_NO | NAME  | SUBJECT | AGE
--------|-------|---------|----
4       | Tony  | OS      | 30

Downsides of Wildcards:

  1. Statements with wildcards take more time to execute, especially when a wildcard is used at the beginning of the search pattern.
  2. It is better to use another operator where possible (=, <, >=, etc.).
  3. The placement of wildcards is important.
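The % and _ wildcards above can be tried with Python's sqlite3 module. A sketch; note that SQLite's LIKE is case-insensitive for ASCII letters, and the [range] wildcards are SQL Server-specific, so they are not shown here:

```python
import sqlite3

# Rebuild the STUDENT table from this section.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (ROLL_NO int, NAME varchar(30), SUBJECT varchar(30), AGE int)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", [
    (1, "Amiya", "ML", 22), (2, "Asish", "DS", 25), (3, "Ankit", None, 22),
    (4, "Tony", "OS", 30), (5, "Raj", "DBMS", 28)])

# % matches any run of characters (including none)
ends_a = conn.execute("SELECT NAME FROM STUDENT WHERE NAME LIKE '%A'").fetchall()
print(ends_a)   # [('Amiya',)]

# each _ matches exactly one character, so 'DB__' matches the 4-letter 'DBMS'
db = conn.execute("SELECT SUBJECT FROM STUDENT WHERE SUBJECT LIKE 'DB__'").fetchall()
print(db)       # [('DBMS',)]
```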

Sorting with ORDER BY

To sort data with SQL, one can use the ORDER BY clause. Before going into its syntax, first understand why we sort data.

Why sort data?

  1. Sorting data in an appropriate way is very helpful when inspecting it. Otherwise, data is returned in whatever order the database chooses, which can make it harder to interpret.
  2. Updated and deleted data can change this order.
  3. The sequence of retrieved data cannot be assumed if an order was not specified.
  4. Sorting data logically helps keep the information you want on top.

ORDER BY statement is used to sort the retrieved data in either ascending or descending according to one or more columns.

  • By default ORDER BY sorts data in ascending order.
  • We can use the keyword DESC to sort data in descending order and the keyword ASC to sort in ascending order.

Syntax:

SELECT *
FROM TableName 
ORDER BY column1 ASC|DESC , column2 ASC|DESC;

TableName: name of the table.
column_name: name of the column by which the data is going to be arranged. 
ASC: to sort the data in ascending order. 
DESC: to sort the data in descending order. 
| : use either ASC or DESC to sort in ascending or descending order

STUDENT TABLE

ROLL_NO | NAME  | SUBJECT | AGE
--------|-------|---------|----
1       | Amiya | ML      | 22
2       | Asish | DS      | 25
3       | Ankit | NULL    | 22
4       | Tony  | OS      | 30
5       | Raj   | DBMS    | 28

Example 1: Sort according to a single column: In the following example, let's retrieve all data from the table STUDENT and sort it in descending order by the column AGE.

SELECT * 
FROM STUDENT
ORDER BY AGE DESC;

Output:

ROLL_NO | NAME  | SUBJECT | AGE
--------|-------|---------|----
4       | Tony  | OS      | 30
5       | Raj   | DBMS    | 28
2       | Asish | DS      | 25
1       | Amiya | ML      | 22
3       | Ankit | NULL    | 22

Example 2: Sort according to multiple columns: In this example, let's retrieve all data from the table STUDENT and sort it first in ascending order by the column AGE, and then in descending order by the column ROLL_NO.

SELECT * 
FROM STUDENT
ORDER BY AGE ASC, ROLL_NO DESC;

Output:

ROLL_NO | NAME  | SUBJECT | AGE
--------|-------|---------|----
3       | Ankit | NULL    | 22
1       | Amiya | ML      | 22
2       | Asish | DS      | 25
5       | Raj   | DBMS    | 28
4       | Tony  | OS      | 30
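The multi-column sort from Example 2 can be run through Python's sqlite3 module. A sketch, assuming the same STUDENT table:

```python
import sqlite3

# Rebuild the STUDENT table from this section.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (ROLL_NO int, NAME varchar(30), SUBJECT varchar(30), AGE int)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", [
    (1, "Amiya", "ML", 22), (2, "Asish", "DS", 25), (3, "Ankit", None, 22),
    (4, "Tony", "OS", 30), (5, "Raj", "DBMS", 28)])

# Ascending on AGE first; ROLL_NO DESC only breaks ties within equal ages.
result = conn.execute(
    "SELECT NAME FROM STUDENT ORDER BY AGE ASC, ROLL_NO DESC").fetchall()
print([name for (name,) in result])  # ['Ankit', 'Amiya', 'Asish', 'Raj', 'Tony']
```

Ankit (roll 3) comes before Amiya (roll 1) because both are 22 and the tie-breaker is descending.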

Arithmetic Operations

One can perform various arithmetic operations on the data stored in the tables by using arithmetic operators. Arithmetic Operators are:

Operator | Description
---------|------------
+        | Addition
-        | Subtraction
/        | Division
*        | Multiplication
%        | Modulus

STUDENT TABLE

ROLL_NO | NAME  | SUBJECT | AGE
--------|-------|---------|----
1       | Amiya | ML      | 22
2       | Asish | DS      | 25
3       | Ankit | NULL    | 22
4       | Tony  | OS      | 30
5       | Raj   | DBMS    | 28

Example 1: Let’s perform addition operation on the data items, items include either a single column or multiple columns. Let’s add 10 with column AGE:

SELECT ROLL_NO, NAME, AGE, AGE + 10
AS "AGE + 10" 
FROM STUDENT;

Output:

ROLL_NO | NAME  | AGE | AGE + 10
--------|-------|-----|---------
1       | Amiya | 22  | 32
2       | Asish | 25  | 35
3       | Ankit | 22  | 32
4       | Tony  | 30  | 40
5       | Raj   | 28  | 38

Example 2: Let’s perform addition of 2 columns:

SELECT ROLL_NO, NAME, AGE, AGE + ROLL_NO
AS "AGE + ROLL_NO" 
FROM STUDENT;

Output:

ROLL_NO | NAME  | AGE | AGE + ROLL_NO
--------|-------|-----|--------------
1       | Amiya | 22  | 23
2       | Asish | 25  | 27
3       | Ankit | 22  | 25
4       | Tony  | 30  | 34
5       | Raj   | 28  | 33

Similarly, you can try the other arithmetic operations.
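The computed column from Example 1 can be sketched with Python's sqlite3 module; the expression is evaluated per row and aliased with AS, and the stored data is unchanged:

```python
import sqlite3

# Rebuild the STUDENT table from this section.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (ROLL_NO int, NAME varchar(30), SUBJECT varchar(30), AGE int)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", [
    (1, "Amiya", "ML", 22), (2, "Asish", "DS", 25), (3, "Ankit", None, 22),
    (4, "Tony", "OS", 30), (5, "Raj", "DBMS", 28)])

# AGE + 10 is a derived column; the alias "AGE + 10" is just its display name.
rows = conn.execute(
    'SELECT NAME, AGE, AGE + 10 AS "AGE + 10" FROM STUDENT').fetchall()
print(rows[0])   # ('Amiya', 22, 32)

# Verify the stored AGE values were not modified by the query.
print(conn.execute("SELECT AGE FROM STUDENT WHERE ROLL_NO = 1").fetchone())  # (22,)
```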

Aggregate Functions

Aggregate functions are used for all kinds of things; they are beneficial for finding the highest or lowest values, the total number of records, the average value, and so on. In descriptive statistics, where we are getting to know and understand our data, you will use these aggregate functions a lot. Here are some important ones:

FUNCTION | DESCRIPTION
---------|------------
AVG()    | Averages a column of values
COUNT()  | Counts the number of values
MIN()    | Finds the minimum value
MAX()    | Finds the maximum value
SUM()    | Sums the column values

STUDENT TABLE

ROLL_NO | NAME  | SUBJECT | AGE
--------|-------|---------|----
1       | Amiya | ML      | 22
2       | Asish | DS      | 25
3       | Ankit | NULL    | 22
4       | Tony  | OS      | 30
5       | Raj   | DBMS    | 28

AVG( ): Returns the average of the values in a numeric column.

Syntax:

SELECT AVG(column_name) FROM table_name;

Example:

SELECT AVG(AGE) 
AS AVGAGE 
FROM STUDENT; 

Output:

AVGAGE
25.4000

MAX( ): The MAX( ) function returns the maximum value of the selected column.

Syntax:

SELECT MAX(column_name) FROM table_name;

Example:

SELECT MAX(AGE) 
AS MAXAGE 
FROM STUDENT; 

Output:

MAXAGE
30

SUM( ): The SUM( ) function returns the sum of all the values of the selected column.

Syntax:

SELECT SUM(column_name) FROM table_name;

Example:

SELECT SUM(AGE) 
AS SUMAGE 
FROM STUDENT; 

Output:

SUMAGE
127

Similarly, you can try the other aggregate functions.
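The three aggregates above can be computed in a single query. A sketch with Python's sqlite3 module over the same STUDENT table:

```python
import sqlite3

# Rebuild the STUDENT table from this section.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (ROLL_NO int, NAME varchar(30), SUBJECT varchar(30), AGE int)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", [
    (1, "Amiya", "ML", 22), (2, "Asish", "DS", 25), (3, "Ankit", None, 22),
    (4, "Tony", "OS", 30), (5, "Raj", "DBMS", 28)])

# One row per query when only aggregates are selected.
avg_age, max_age, sum_age = conn.execute(
    "SELECT AVG(AGE), MAX(AGE), SUM(AGE) FROM STUDENT").fetchone()
print(avg_age, max_age, sum_age)  # 25.4 30 127
```

The average, maximum, and sum match the three outputs shown above (SQLite prints 25.4 where MySQL shows 25.4000).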

Grouping Data with SQL

The GROUP BY statement in SQL is used to arrange identical data into groups with the help of aggregate functions; e.g., if a particular column has the same value in several rows, those rows are placed in one group.

Note:

  • The GROUP BY clause is used with the SELECT statement.
  • In the query, the GROUP BY clause is placed after the WHERE clause.
  • In the query, the GROUP BY clause is placed before the ORDER BY clause, if one is used.

Syntax:

SELECT column1, function_name(column2)
FROM table_name
WHERE condition
GROUP BY column1, column2
ORDER BY column1, column2;

function_name: Name of the function used for example, SUM() , AVG().
table_name: Name of the table.
condition: Condition used.

Let’s consider a different tables in this case.

EMPLOYEE TABLE

SI NO | NAME  | SALARY
------|-------|-------
1     | Sandy | 5000
2     | Karan | 1000
3     | Tony  | 3000
4     | Karan | 5000

Example: Group By a single column: grouping by a single column means placing all the rows with the same value of that particular column in one group. Consider the query shown below:

SELECT NAME, SUM(SALARY) 
FROM Employee 
GROUP BY NAME;

Output:

NAME  | SALARY
------|-------
Karan | 6000
Sandy | 5000
Tony  | 3000
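The grouping above can be sketched with Python's sqlite3 module: the two Karan rows fall into one group, and SUM(SALARY) is computed per group.

```python
import sqlite3

# Rebuild the EMPLOYEE table used in this section.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (SI_NO int, NAME varchar(30), SALARY int)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    (1, "Sandy", 5000), (2, "Karan", 1000), (3, "Tony", 3000), (4, "Karan", 5000)])

totals = conn.execute(
    "SELECT NAME, SUM(SALARY) FROM Employee GROUP BY NAME").fetchall()
print(sorted(totals))  # [('Karan', 6000), ('Sandy', 5000), ('Tony', 3000)]
```

Four input rows collapse to three output rows, one per distinct NAME.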

HAVING Clause

You know that WHERE clause is used to place conditions on columns but what if you want to place conditions on groups?

This is where the HAVING clause comes into use. You can use the HAVING clause to place conditions that decide which groups will be part of the final result set. Also, you cannot use aggregate functions like SUM(), COUNT(), etc. with the WHERE clause, so you have to use the HAVING clause if you want any of these functions in your conditions.

Syntax:

SELECT column1, function_name(column2)
FROM table_name
WHERE condition
GROUP BY column1, column2
HAVING condition
ORDER BY column1, column2;

function_name: Name of the function used for example, SUM() , AVG().
table_name: Name of the table.
condition: Condition used.

Example:

SELECT NAME, SUM(SALARY) FROM Employee 
GROUP BY NAME
HAVING SUM(SALARY) > 5000; 

Output:

NAME  | SALARY
------|-------
Karan | 6000
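The HAVING example can be run with Python's sqlite3 module; note that the condition SUM(SALARY) > 5000 is applied to the groups produced by GROUP BY, not to individual rows.

```python
import sqlite3

# Rebuild the EMPLOYEE table used in this section.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (SI_NO int, NAME varchar(30), SALARY int)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    (1, "Sandy", 5000), (2, "Karan", 1000), (3, "Tony", 3000), (4, "Karan", 5000)])

# Only groups whose total salary exceeds 5000 survive the HAVING filter.
rich = conn.execute(
    "SELECT NAME, SUM(SALARY) FROM Employee "
    "GROUP BY NAME HAVING SUM(SALARY) > 5000").fetchall()
print(rich)  # [('Karan', 6000)]
```

Sandy's group totals exactly 5000, so it is excluded: the comparison is strictly greater-than.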

SQL for Various Data Science Languages

Because of its popularity and versatility, SQL is also used in many big data applications. Below is a brief description of, and a few resources for, how SQL is used with common big data and data science languages.

SQL for R

The R language was developed by statisticians to help statisticians and developers work with data faster and more efficiently.

It is an open-source programming language and generally comes with the Command-line interface. It is available across extensively used platforms like Windows, Linux, and macOS. 

R is an essential tool for Data Science. It is highly popular and is the first choice of many statisticians and data scientists.

For those studying R and who may be well-versed in SQL, the sqldf package provides a mechanism to handle R data frames using SQL. 

Also, for experienced R programmers, sqldf can be a useful tool for data manipulation.

Resources

  1. SQLDF Package
  2. Documentation
  3. Examples

SQL for Python

Python is an open-source, interpreted, high-level language. It supports object-oriented programming.

It is one of the best languages used by data scientists for various data science projects/applications. Python provides a lot of functionalities to deal with mathematics and statistics.

It provides excellent libraries for data science applications. python-sql is a library for writing SQL queries in a Pythonic way.

Resources

  1. Python-SQL Package Documentation

SQL with Hadoop

Apache Hive is a data warehouse tool that implements an SQL-like interface between the user and the Hadoop Distributed File System (HDFS).

It is a software project that provides data query and analysis. It supports reading, writing, and managing large-scale datasets stored in distributed storage, queried with SQL syntax.

It is not developed for Online Transaction Processing (OLTP) workloads.

Resources

  1. Hive Overview
  2. Documentation

SQL for Spark

Apache Spark is an open-source distributed general-purpose framework. 

Spark is structured around Spark Core, the engine that handles scheduling and the RDD abstraction and connects Spark to the right data store (an RDBMS, S3, or Elasticsearch).

Numerous libraries work on top of Spark Core, including Spark SQL, which lets us execute SQL-like commands on distributed data sets; GraphX for graph queries; and Spark Streaming, which allows for the input of continuously streaming log data.

Resources

  1. Overview
  2. Documentation

FAQs

What is the best way to learn SQL for data science?

There are thousands of SQL courses online, but most don't teach you to use SQL in the real world; real-world SQL doesn't look like what the courses teach. However, you can follow the references in this article to learn SQL for data science.

What is the role of SQL in data science?

SQL code is more effective, produces easily reproducible scripts, and keeps you closer to the data. It is very beneficial when working with large data sets in a data science project.

What elements of SQL do data scientists need to know?

Aspiring data scientists should have the following SQL skills:
1. Knowledge of the relational database model
2. Knowledge of the SQL commands
3. SQL indexing
4. SQL joins
5. Primary & foreign keys
6. Subqueries
7. Creating tables and retrieving data from the tables

What types of SQL databases are best for data science?

In general, out of all the relational databases, MySQL remains the most popular database for organizations. However, PostgreSQL is also a choice for some implementations. When it comes to the cloud, there are managed database solutions optimized for data science workloads.

Which SQL certification is best for a data analyst/scientist?

There are no specific SQL certifications. However, you can get one from online course portals like Coursera, Udacity, Udemy, etc


Pluralsight Current Offers & Review

One Month Pluralsight Free Subscription

In this post, I have added the latest pluralsight coupons and offers to get discounts on Pluralsight subscription. I have also added some freebies offered by Pluralsight.

If you want to learn DevOps tools, programming, automation, virtualization, cloud computing, and more, you can find an on-demand course for each on Pluralsight.

I started learning Ansible from Pluralsight. I had a good experience, as the course content was to the point, and I was able to grasp all the explained concepts.

Pluralsight Coupons [Free Month to 50%]

From my personal experience, I was never able to complete any course in a month’s time. There could be several reasons, starting from busy work schedules to personal commitments.

If you are someone like me who needs quarterly or annual subscriptions, here are some Pluralsight offers that you can use.

10 Days Free Account

If you are a new Pluralsight user, start with the 10-day free trial. If you like the courses on the platform, you can continue with the subscription or cancel anytime during the trial period.

Get One Month Free Pluralsight Discount

If you want to learn specific tools or technology from Pluralsight, you can buy their one-month subscription and get another month for completely free.

[50% Off] On Monthly Subscription

If you are a new user, get a flat 50% off on your first month of a Pluralsight subscription using the “Stay Home, Skill Up” offer.

[33% Off] On Annual Subscription

If you are learning continuously, you definitely need a yearly subscription to Pluralsight. Currently, Pluralsight is running a promotion with a 33% discount on the annual plan. This is a limited-time offer.

Note: Occasionally there will be a 33% off on Pluralsight discount. We will update the link when the promo is available.

Pluralsight Free Skills For All Account

Get free access to assessments, expert-led courses, and other benefits. The courses are limited in this account. However, the courses get rotated every week so that you can get access to other courses for free.

Get Weekly Free Courses From Pluralsight

Pluralsight has a weekly free course program where you will get free course updates on your email. It is a great offer from pluralsight. To get this, you need to signup for their weekly free course subscription list.

Pluralsight 100% Off Subscription

Pluralsight has a VIP program which offers a one-year free subscription.

To qualify for this program, you have to be a technology VIP. You need to provide a valid URL that shows you are a technology VIP.

Please check out the VIP program page to check the eligibility criteria.

You Might Like: Linux Foundation Offers

Gift Pluralsight Subscription

You can gift your colleagues and friends with a Pluralsight subscription.

You can gift anything from a personal monthly plan to a premium yearly plan. It includes everything that comes with a standard subscription.

Checkout Pluralsight gift from here.

How to Redeem Pluralsight Coupon

If you have an exclusive offer code from Pluralsight, use the steps below to redeem the coupon.

  1. Head over to the Pluralsight offer redemption page.
  2. Signup with all the details.

Moreover, Pluralsight has both Android and iOS apps, so you can access the courses from anywhere. All the course screencasts are in HD quality (both audio and video). And if you wish to watch the videos offline, you can use their desktop player, which has offline functionality.

Get the Discount Updates

Learning is a continuous requirement for me and I make use of discounts for annual subscriptions. So I keep updating this page whenever a new Pluralsight discount offer is available.

You can bookmark this page, or get updates through our email list for the latest Pluralsight offer.

Pluralsight Review

Now let's take a look at what Pluralsight has to offer users.

Pluralsight Course Offerings

Pluralsight offers courses to IT in the following key areas.

  1. Cloud Computing
  2. Machine Learning / AI
  3. Software Development [Courses on popular programming language]
  4. IT Ops [Courses on Infrastructure, Virtualisation, and Cloud]
  5. Data Professional [Course on data analytics, engineering, and science ]
  6. Information & Cyber Security [Courses on IT security certifications]
  7. Data Science Courses

Pluralsight Vs Udemy

Unlike Udemy, when you subscribe to Pluralsight, you will get access to all the courses on the platform. Whereas in Udemy, you need to purchase individual courses. This option is very good if you want to learn from just one course. You can get a Udemy course as low as $3.95

In Pluralsight, you can learn from multiple courses from beginner to advanced levels with one subscription.

Pluralsight Vs Linkedin Learning

Both Pluralsight and LinkedIn Learning offer a single subscription model. LinkedIn Learning was previously known as Lynda.com.

Here the comparison is mostly about course availability rather than quality. Both platforms offer quality courses, as authors are screened for their technology background before being onboarded to teach on the platform.

You can try both platforms using their free trials. LinkedIn offers one month of free access to all their courses.

During the free trial, you can decide which platform caters to your needs.

FAQ’s

How do I get my free subscription to Pluralsight?

To get free access, you have to be a technology VIP. If you are a new user, you can get a 10-day free account. Also, you can get a one-month free account if you sign up for Microsoft Visual Studio Dev Essentials.

Is Pluralsight worth subscription?

It depends on what you want to learn. Pluralsight focuses on professionals who want to upskill their career or learn new technology. Before paying for the subscription, sign up for the 10-day free trial and check for yourself whether it meets your requirements.

What are the best courses on Pluralsight?

Most of the courses on Pluralsight are focused on IT professionals. Cloud computing, DevOps, and programming are among the best course categories on Pluralsight.

Does Pluralsight offer free coupons?

Pluralsight does not offer any free coupons for the subscription. You will get discounts on the standard plans, ranging from a free month's subscription to one year of free access for technology VIPs.

Pluralsight Latest Updates

Pluralsight launched Tour de Tech, a three-week tech program to speed up IT skills. The program has stages of challenges with big prizes.


Kubernetes Certification Coupon: 15% Off + $100 Off On Kubernetes Course bundle

Kubernetes Certification Coupon

The Cloud Native Computing Foundation has three Kubernetes certifications. They are among the most preferred certifications for organizations due to the wide adoption of Kubernetes.

  1. Certified Kubernetes Administrator (CKA) Program
  2. Certified Kubernetes Application Developer (CKAD) Program
  3. Certified Kubernetes Security Specialist (CKS) Program

Kubernetes Certification Coupon and Training

For DevOpsCube readers, Linux Foundation has offered a 15% Discount on all its kubernetes training and certification programs.

CKA Certification Voucher code

Get 15% Off ($45 discount) on the Certified Kubernetes Administrator exam and training Courses.

Note: You will get an extra $100 if you opt for the CKA course bundle along with a $45 off.

Coupon: Use code DCUBEOFFER

CKAD Certification Coupon

Get 15% Off ($45 discount) on the Certified Application Developer exam and training Courses.

Coupon: Use code DCUBEOFFER

CKS Voucher Code

Get 15% Off ($45 discount) on the Certified Kubernetes Security Specialist exam and training Courses. You need to pass the CKA exam to appear for CKS Exam.

Coupon: Use code DCUBEOFFER

Cloud Developer Career Track 75%

75% Off voucher on Cloud Developer Career Track. This is a limited time offer.

This track includes course tracks and CKAD certification.

CKA With Cloud Engineer Bootcamp

If you opt for cloud engineer Bootcamp, you can save $300 and get certified on both Linux & CKA, plus access to DevOps, Docker, Kubernetes, and SRE courses.

The following offers are available for Kubernetes certification and training.

  1. Certified Kubernetes Administrator (CKA) Certification + Training Bundle
  2. Certified Kubernetes Application Developer (CKAD) + Training Bundle

With a training bundle, you will get $100 off, plus an additional 16% off using the Kubernetes certification coupon code DCUBEOFFER. (Note: a 20% to 40% off sitewide offer is available for a limited time; check Linux Foundation Coupons.)

This offer is not limited to Kubernetes certification and training. You can use it for any subscription in Linux Foundation.

Kubernetes Certification Curriculum Changes [2020 Updates]

The Linux Foundation announced the new curriculum for the CKA, which will be valid from the 1st of September 2020 (CKA20).

Here are some key changes in the new curriculum:

  1. Exam duration has been reduced to 2 hrs from 3 hrs.
  2. Kubernetes v1.19 is the default version for the exam.

From the exam perspective, the focus now shifts more toward troubleshooting; the detailed changes are below:

  1. Cluster Architecture, Installation & Configuration – 25%
  2. Workloads & Scheduling – 15%
  3. Services & Networking – 20%
  4. Storage – 10%
  5. Troubleshooting – 30%

About Kubernetes Certification (CKA, CKAD & CKS)

The Kubernetes certifications test your Kubernetes skills through a live practical exam. The exam does not have objective-type questions like other certifications.

[Latest Update] Linux Foundation has announced the new certification program called Certified Kubernetes Security Specialist (CKS).

CKS certification registration and the course will be made available in November 2020. You must be CKA certified to take the CKS exam. This exam is focused on Kubernetes security, cluster hardening, and secure build and deployment practices.

Exam Overview:

  1. It is an online proctored exam.
  2. The full emphasis is on practical tasks rather than theoretical questions.
  3. You will have to complete 24 practical tasks in 2 hours.
  4. You get one free retake, and the certification is valid for 3 years.
  5. Once you register, your exam eligibility is valid for 12 months.
  6. Cheating is not possible in the proctored environment; however, use of the kubernetes.io docs is permitted.

Skills required for CKA:

  1. Application Lifecycle Management – 8%
  2. Installation, Configuration & Validation – 12%
  3. Core Concepts – 19%
  4. Networking – 11%
  5. Scheduling – 5%
  6. Security – 12%
  7. Cluster Maintenance – 11%
  8. Logging / Monitoring – 5%
  9. Storage – 7%
  10. Troubleshooting – 10%

Skills Required for CKAD:

  1. Core Concepts – 13%
  2. Configuration – 18%
  3. Multi-Container Pods – 10%
  4. Observability – 18%
  5. Pod Design – 20%
  6. Services & Networking – 13%
  7. State Persistence – 8%

Skills Required for CKS:

  1. Kubernetes Cluster Setup – 10%
  2. Kubernetes Cluster Hardening – 15%
  3. System Hardening – 15%
  4. Minimize Microservice Vulnerabilities – 20%
  5. Supply Chain Security – 20%
  6. Monitoring, Logging and Runtime Security – 20%

Kubernetes Certification Tips and Resources

Here are a few resources with plenty of tips and tricks to crack the Kubernetes certification exams.

  1. Kubernetes Certification Tips from a CKA Certified administrator
  2. The Complete Certified Kubernetes Administrator (CKA) Exam Study Guide
  3. Taking the Certified Kubernetes Administrator Exam
  4. Nail Your Kubernetes Admin Exam on the First Try
  5. Kubernetes Tutorials: Best Resources To Get Started With Kubernetes

Linux Foundation Coupon: 65% Off for DevOps Enthusiasts


The Linux Foundation is the largest open-source non-profit organization. It supports open-source initiatives and projects, with a main focus on shared technology.

As you know, open-source projects have seen wide adoption in recent years, and certification has become an important way to demonstrate your skills. The Linux Foundation does a great job with its training and certification programs on open-source technologies. You can check out all Linux Foundation offerings from here.

Linux Foundation Coupon

The Linux Foundation runs several offers each year for developers and sysadmins. The following are the latest offers from the Linux Foundation, exclusive to tech enthusiasts.

Currently, the Linux Foundation is running an exclusive 15% off sitewide sale. Use the link and coupon given below.

Exclusive Offer: Get 15% + $100 discount on All Linux Foundation training courses and certifications Course Bundles using coupon DCUBEOFFER

Check out other Linux foundation special offers. These offers will be applicable only during special sale days.

  1. 60% Off Certification Bundle – Black Friday Offer
  2. 40% Off on Developer Bundle + Free DevOps Course
  3. 20% Off Sitewide Training and Certifications using Coupon TECHIES20

Cloud Engineer Bootcamp [Save $1300]

The Linux Foundation has released a new Cloud Engineer Bootcamp program, which spans six months. This Bootcamp equips beginners with the skills required for the latest cloud computing technologies.

Note: Use coupon DCUBEOFFER to get an extra 15% discount on the Cloud Engineer Bootcamp program.

Here are the key benefits of the Cloud Engineer Bootcamp program.

  1. Interactive live sessions with instructors
  2. You will learn from the cloud-native foundation's host instructors.
  3. Well-recognized cloud-native certifications.

The six-month Cloud Engineer Bootcamp has the following curriculum.

  1. Essentials of Linux System Administration (LFS201)
  2. Linux Networking and Administration (LFS211)
  3. Linux Foundation Certified System Administrator exam (LFCS certification)
  4. Containers Fundamentals (LFS253)
  5. DevOps and SRE Fundamentals (LFS261)
  6. Kubernetes Fundamentals (LFS258)
  7. Certified Kubernetes Administrator Exam (CKA Certification)

By the end of the Bootcamp, you will be certified in both Linux (LFCS) and Kubernetes (CKA).

Advanced Cloud Engineer Bootcamp [Save $1300]

The Linux Foundation also has an Advanced Cloud Engineer Bootcamp program, focusing more on container technology.

This Bootcamp offers end-to-end training on container fundamentals, logging, monitoring, deployment, service mesh, and much more. The associated training courses cover tools like Prometheus, Fluentd, Helm, etc.

With the Advanced Cloud Engineer Bootcamp, you can save $1300 off the list price of $2300. You can save an additional 15% with the site-wide promo code.

30% Off Node.js/OpenJS Training and Certification [JSNAD]

Use coupon NODEJS30 to get a 30% discount on all Linux Foundation Node.js certification exams and training programs. This offer is valid until December 2020.

The following Node.js programs are eligible for this discount.

  1. Node.js Application Development (LFW211) + JSNAD Exam Bundle
  2. OpenJS Node.js Application Developer (JSNAD)
  3. OpenJS Node.js Services Developer (JSNSD)

Linux Foundation Black Friday/Cyber Monday Offer [2020]

Get all the certification bundles for $189, plus an extra 15% off using the DCUBEOFFER coupon.

[60% Savings] Developer Course Bundle Offer

The developer bundle offers a free DevOps course with each chosen course. You will get the bundle for $179, which is worth $498.

Linux Foundation Course Offerings

Linux Foundation offers training in the following key domains.

  1. Linux
  2. Embedded Linux
  3. Open Source Development
  4. Kubernetes
  5. Node.js
  6. Cloud Foundry
  7. Docker
  8. Hadoop
  9. Hyperledger
  10. ONAP
  11. OpenStack
  12. OPNFV

Featured Linux Foundation Courses

The following are the featured Linux Foundation courses.

1. Kubernetes Administration (LFS458): This course covers building and administering Kubernetes clusters.

2. DevOps Fundamentals: Implementing Continuous Delivery (LFS261): This course covers the processes and patterns involved in building a robust CI/CD solution. It is focused on DevOps and SRE for aspiring DevOps enthusiasts.

3. Linux System Administration (LFS301): This course is focused on professional Linux system administration.

4. Linux Networking and Administration (LFS211): This course is structured to train people to become Linux System Engineers.

5. Service Mesh Fundamentals (LFS243): Learn service mesh concepts using the Envoy proxy and the Service Mesh Interface (SMI).

Featured Linux Foundation Certifications

You can take up the Linux Foundation certification exam from anywhere in the world. The following are the key certification programs offered by the Linux Foundation. You can use our exclusive Linux Foundation Coupons to get discounts on all the certification registration.

  1. Linux Foundation Certified SysAdmin (LFCS)
  2. Certified Kubernetes Administrator (CKA)
  3. Certified Kubernetes Application Developer (CKAD)
  4. Certified Kubernetes Security Specialist (CKS)
  5. Cloud Foundry Certified Developer (CFCD)
  6. Certified Hyperledger Fabric Administrator (CHFA)
  7. OpenJS Node.js Application Developer (JSNAD)

Frequently Asked Questions

Does Linux Foundation offer Coupons for certification and training?

Yes. You can always get 15% off all training and certification programs using the coupon code DCUBEOFFER. You can also make use of occasional promotional offers from the Linux Foundation to get up to 65% off course bundles.

How long is the Linux Foundation discount valid?

The 15% discount is valid throughout the year. Exclusive promotions are valid for a period of seven days to one month.

How to redeem the discount code?

On the checkout page, you have the option to apply a coupon. Once you apply the coupon, you will see the discounted price instantly.

Is there a discount on the Cloud engineer Bootcamp program?

Yes. The Linux Foundation has basic and advanced Cloud Engineer Bootcamp programs. You can save $1300 off the list price, and get an additional 15% discount using coupon code DCUBEOFFER.
