Python Numpy Tutorial For Beginners With Examples


This Python NumPy tutorial for beginners covers basic NumPy concepts, practical examples, and real-world NumPy use cases related to machine learning and data science.

What is NumPy?

NumPy, short for Numerical Python, is a general-purpose array-processing package for Python. It provides multidimensional arrays whose core operations are implemented in C, which makes them fast. NumPy also ships with many built-in functions and is the fundamental package for scientific computing with Python.

The NumPy library provides multidimensional array and matrix data structures. Its central type is ndarray, a homogeneous n-dimensional array object, with methods to operate on it efficiently.

It is one of the very important libraries used in the field of Data Science & Machine Learning.

Here is an interesting search trend for NumPy for the last five years.

NumPy trends

Why do we need NumPy?

You might ask why we need NumPy arrays when we already have Python lists.

The answer is that we can perform an operation on all the elements of a NumPy array at once, which is not possible with Python lists.

For example, we can’t multiply two lists directly; we have to do it element by element. This is where NumPy comes into play.

Example:

list1 = [2, 4, 6, 7, 8]
list2 = [3, 4, 6, 1, 5]

print(list1*list2)

Output:

TypeError: can't multiply sequence by non-int of type 'list'

The same thing can be done easily with NumPy arrays. Here is how it works.

Numpy Array Example:

import numpy as np

list1 = [2, 4, 6, 7, 8]
list2 = [3, 4, 6, 1, 5]

arr1 = np.array(list1)
arr2 = np.array(list2)

print(arr1*arr2)

Output:

[ 6 16 36  7 40]
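For comparison, here is a minimal sketch of the same element-wise product written with plain lists. It needs an explicit comprehension, whereas the NumPy version is a single operator. The variable names products_list and products_array are illustrative.

```python
import numpy as np

list1 = [2, 4, 6, 7, 8]
list2 = [3, 4, 6, 1, 5]

# Pure-Python version: pair the elements with zip() and multiply them one by one
products_list = [a * b for a, b in zip(list1, list2)]

# NumPy version: the * operator works on whole arrays at once
products_array = np.array(list1) * np.array(list2)

print(products_list)            # [6, 16, 36, 7, 40]
print(products_array.tolist())  # [6, 16, 36, 7, 40]
```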

Python Numpy Tutorial

In this section, we will start with installing NumPy and then cover the key concepts involved. We will also go through some practical examples of the NumPy library.

Installing NumPy In Python

Before you use NumPy you must install the NumPy library as a prerequisite. It can be installed with conda, with pip, or with a package manager.

If you use conda, you can install it with:

conda install numpy

If you use pip, you can install it with:

pip install numpy
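A quick way to confirm that the installation worked is to import the package and print its version. This is only a sanity check; the version printed depends on your environment.

```python
import numpy as np

# If the import succeeds, NumPy is installed; the version string varies by setup
print(np.__version__)
```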

Arrays in NumPy

An array in NumPy is a table of elements, all of the same type, indexed by a tuple of non-negative integers. The number of dimensions of the array is called its rank (NumPy exposes it as ndim). A tuple of integers giving the size of the array along each dimension is known as the shape of the array. The array class in NumPy is called ndarray.

Example

  1. First of all import the numpy library
import numpy as np
  1. Creating an array object using np.array()
array = np.array([
    [1, 2, 3, 5, 6],
    [2, 1, 5, 6, 7]
])
  1. Printing the array dimension using array.ndim
print("No. of dimensions of the array: ", array.ndim)

Output:

No. of dimensions of the array: 2
  1. Printing the shape of the array using array.shape
print("Shape of the array: ", array.shape)

Output:

Shape of the array: (2, 5)
  1. Printing the size of the array using array.size. The size of the array means nothing but the total number of elements in the given array.
print("Size of the array: ", array.size)

Output:

Size of the array: 10
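The three attributes above are related: ndim is the length of shape, and size is the product of the entries in shape. A small sketch that checks this on the same array (the dtype line is an extra, and the exact integer type it prints is platform dependent):

```python
import numpy as np

array = np.array([
    [1, 2, 3, 5, 6],
    [2, 1, 5, 6, 7]
])

rows, cols = array.shape

print(array.ndim == len(array.shape))  # True
print(array.size == rows * cols)       # True
print(array.dtype)                     # an integer type, e.g. int64 on many systems
```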

Creating a NumPy Array

Arrays in NumPy can be created in multiple ways and with any number of dimensions. They can be built from Python sequences such as lists and tuples, or with helper functions such as np.zeros( ) and np.full( ).

Example:

  1. First of all import the numpy library
import numpy as np
  1. Creating a rank 1 array by passing one python list
values = [1, 2, 3, 5, 6]  # not named `list`, which would shadow the built-in type
array = np.array(values)
print(array)

Output:

[1 2 3 5 6]
  1. Creating a rank 2 array by passing two python lists
list1 = [1, 2, 3, 5, 6]
list2 = [3, 1, 4, 5, 1]
array = np.array(
    [list1, list2]
)
print(array)

Output:

[[1 2 3 5 6]
[3 1 4 5 1]]
  1. Creating an array by passing a python tuple
values = (1, 2, 3, 5, 6)  # not named `tuple`, which would shadow the built-in type
array = np.array(values)
print(array)

Output:

[1 2 3 5 6]
  1. Creating a 3×4 array with all zeros using np.zeros( )
array = np.zeros((3, 4))
print(array)

Output:

[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
  1. Creating a constant value array of complex type using np.full( )
array = np.full((3, 3), 6, dtype='complex')
print(array)

Output:

[[6.+0.j 6.+0.j 6.+0.j]
[6.+0.j 6.+0.j 6.+0.j]
[6.+0.j 6.+0.j 6.+0.j]]
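Besides np.zeros( ) and np.full( ), a few other constructors come up often. The sketch below shows np.ones, np.arange, np.linspace, and np.eye; the shapes and ranges are arbitrary illustrative choices.

```python
import numpy as np

ones = np.ones((2, 3))          # 2x3 array filled with 1.0
evens = np.arange(0, 10, 2)     # start, stop (excluded), step
points = np.linspace(0, 1, 5)   # 5 evenly spaced values from 0 to 1, both included
identity = np.eye(3)            # 3x3 identity matrix

print(ones.shape)        # (2, 3)
print(evens.tolist())    # [0, 2, 4, 6, 8]
print(points.tolist())   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(identity.trace())  # 3.0
```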

NumPy Array Indexing

NumPy arrays can be indexed with integers, slices, or another array of indices. A slice returns a view of the original array, whereas indexing with an index array returns a copy. Note that the first element has index 0, the second has index 1, and so on, whereas the last element has index -1, the second to last has index -2, and so on.

Example:

  1. Creating a rank 1 array by passing one python list
values = [1, 2, 3, 5, 6]  # not named `list`, which would shadow the built-in type
array = np.array(values)
print(array)

Output:

[1 2 3 5 6]
  1. Accessing elements and creating new array by passing indices
newArray = array[
    np.array([2, 0, 1, 4])
]
print(newArray)

Output:

[3 1 2 6]
  1. Index values can be negative
newArray = array[
    np.array([2, 0, -1, -4])
]
print(newArray)

Output:

[3 1 6 2]
  1. Basic slicing occurs when an object is a slice object that is of the form [start: stop: step]. Note that stop position is not included in slicing.
newArray = array[1:-2:1]
print(newArray)

Output:

[2 3]
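The difference between a view and a copy matters as soon as you modify the result. The following sketch writes into a slice and into an index-array result and then checks which one changed the original; np.shares_memory is used only to make the relationship explicit.

```python
import numpy as np

array = np.array([1, 2, 3, 5, 6])

view = array[1:3]               # basic slice: a view onto the same data
copy = array[np.array([1, 2])]  # index array: an independent copy

view[0] = 100  # also changes array[1]
copy[1] = -1   # leaves the original untouched

print(array.tolist())  # [1, 100, 3, 5, 6]
print(view.tolist())   # [100, 3]
print(copy.tolist())   # [2, -1]
print(np.shares_memory(array, view))  # True
print(np.shares_memory(array, copy))  # False
```

If you need an independent array from a slice, call .copy() on it.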

Basic slicing and indexing in a multidimensional array

Slicing and indexing in a multidimensional array are a little bit tricky compared to slicing and indexing in a one-dimensional array.

import numpy as np

array = np.array([
    [2, 4, 5, 6],
    [3, 1, 6, 9],
    [4, 5, 1, 9],
    [2, 9, 1, 7]
])
print(array)

# Slicing and indexing in 4x4 array
# Print first two rows and first two columns
print("\n", array[0:2, 0:2])

# Print all rows and last two columns
print("\n", array[:, 2:4])

# Print all columns of the middle two rows
print("\n", array[1:3, :])

Output:

[[2 4 5 6]
[3 1 6 9]
[4 5 1 9]
[2 9 1 7]]

[[2 4]
[3 1]]

[[5 6]
[6 9]
[1 9]
[1 7]]

[[3 1 6 9]
[4 5 1 9]]
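A few more access patterns on the same 4x4 array may help: picking a single row, a single column, one element, and every other row and column with a step. This is a small sketch; the variable names are illustrative.

```python
import numpy as np

array = np.array([
    [2, 4, 5, 6],
    [3, 1, 6, 9],
    [4, 5, 1, 9],
    [2, 9, 1, 7]
])

second_row = array[1]          # one index selects a whole row
third_col = array[:, 2]        # ':' keeps all rows, 2 selects the third column
element = array[1, 2]          # row 1, column 2
every_other = array[::2, ::2]  # rows 0 and 2, columns 0 and 2

print(second_row.tolist())   # [3, 1, 6, 9]
print(third_col.tolist())    # [5, 6, 1, 1]
print(element)               # 6
print(every_other.tolist())  # [[2, 5], [4, 1]]
```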

NumPy Operations on Array

NumPy arrays support operations on a single array or on a combination of arrays. These include basic arithmetic as well as unary and binary operations.

Example:

  1. Creating two different two dimensional arrays
array1 = np.array([
    [2, 4, 5],
    [3, 1, 6]
])

array2 = np.array([
    [1, 5, 9],
    [10, 32, 78]
])
  1. Adding 5 to every element in array1
newArray = array1 + 5
print(newArray)

Output:

[[ 7 9 10]
[ 8 6 11]]
  1. Multiplying every element in array2 by 2
newArray = array2 * 2
print(newArray)

Output:

[[ 2 10 18]
[ 20 64 156]]
  1. Sum of every element in array2
newArray = array2.sum()
print(newArray)

Output:

135
  1. Adding two arrays
newArray = array1 + array2
print(newArray)

Output:

[[ 3 9 14]
[13 33 84]]
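Two closely related features are worth a short sketch: sum( ) accepts an axis argument to add up columns or rows separately, and arithmetic between arrays of different but compatible shapes is handled by broadcasting. The weights array below is an illustrative assumption.

```python
import numpy as np

array2 = np.array([
    [1, 5, 9],
    [10, 32, 78]
])

col_sums = array2.sum(axis=0)  # add down each column
row_sums = array2.sum(axis=1)  # add across each row

# Broadcasting: the 1-D array is applied to every row of array2
weights = np.array([1, 10, 100])
scaled = array2 * weights

print(col_sums.tolist())  # [11, 37, 87]
print(row_sums.tolist())  # [15, 120]
print(scaled.tolist())    # [[1, 50, 900], [10, 320, 7800]]
```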

String Operations using NumPy

The numpy.char module is used to perform vectorized string operations on arrays of dtype numpy.str_ or numpy.bytes_ (these were also available as numpy.unicode_ and numpy.string_ before NumPy 2.0).

  1. numpy.char.upper( )

Returns the uppercased string from the given string. It converts all lowercase characters to uppercase. If no lowercase characters exist, it returns the original string.

Example:

import numpy as np

string = "devopscube"
print(np.char.upper(string))

Output:

DEVOPSCUBE

Similarly, one can use numpy.char.lower( ) to convert all uppercase characters to lowercase.

  1. numpy.char.split( )

Returns a list of strings after breaking the given string by the specified separator.

Example:

import numpy as np

string = "devopscube.com"
print(np.char.split(string, sep='.'))

Output:

['devopscube', 'com']
  1. numpy.char.title( )

It is used to convert the first character in each word to Uppercase and remaining characters to Lowercase in the string and returns a new string.

Example:

import numpy as np

string = "devopsCube is An amaZing pOrtal"
print(np.char.title(string))

Output:

Devopscube Is An Amazing Portal
  1. numpy.char.equal( )

This function checks string1 == string2 element-wise and returns a boolean result (True or False).

Example:

import numpy as np

string1 = "Devopscube"
string2 = "Devopscube.com"
print(np.char.equal(string1, string2))

Output:

False
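The examples above pass single strings, but these functions are most useful on arrays, where they are applied to every element. A minimal sketch, assuming a small illustrative array called names:

```python
import numpy as np

names = np.array(["devopscube", "NumPy", "python"])

upper = np.char.upper(names)             # uppercases every element
matches = np.char.equal(names, "NumPy")  # compares every element with one string

print(upper.tolist())    # ['DEVOPSCUBE', 'NUMPY', 'PYTHON']
print(matches.tolist())  # [False, True, False]
```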

To learn about more string functions in NumPy, refer to String operations in the NumPy documentation.

Mathematical Functions in NumPy

NumPy contains a large number of various mathematical operations. NumPy provides standard trigonometric functions, functions for arithmetic operations, handling complex numbers, etc.

  1. numpy.sin( )

This mathematical function helps the user to calculate trigonometric sine for given values.

Example:

import numpy as np

x = 0
print(np.sin(x))

x = np.pi / 2
print(np.sin(x))

x = np.pi / 4
print(np.sin(x))

Output:

0.0
1.0
0.7071067811865475

Similarly, one can use numpy.cos( ) and numpy.tan( ) to calculate the trigonometric cosine and tangent, and numpy.arcsin( ), numpy.arccos( ), and numpy.arctan( ) to calculate the inverse sine, inverse cosine, and inverse tangent respectively for given values.
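To make the relationship between a trigonometric function and its inverse concrete, here is a minimal sketch. The angle is an arbitrary choice, and np.isclose is used because floating-point results are only approximately equal.

```python
import numpy as np

angle = np.pi / 6                # 30 degrees in radians

sine = np.sin(angle)             # approximately 0.5
recovered = np.arcsin(sine)      # arcsin undoes sin on [-pi/2, pi/2]

print(np.isclose(sine, 0.5))         # True
print(np.isclose(recovered, angle))  # True

# The identity sin^2 + cos^2 = 1 holds up to rounding error
print(np.isclose(np.sin(angle) ** 2 + np.cos(angle) ** 2, 1.0))  # True
```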

  1. numpy.round( )

This mathematical function rounds an array to the given number of decimals.

Example:

import numpy as np

arr = [.33345, .1234, 1.456789]
# np.round_ was removed in NumPy 2.0; np.round works in old and new versions
roundOffValues = np.round(arr, decimals=3)
print(roundOffValues)

Output:

[0.333 0.123 1.457]
  1. numpy.log( )

This mathematical function calculates the natural logarithm of all elements in the input array.

Example:

import numpy as np

arr = [1, 3, 50]
logValues = np.log(arr)
print(logValues)

Output:

[0. 1.09861229 3.91202301]
  1. numpy.exp( )

This mathematical function helps users to calculate the exponential of all elements in the input array.

Example:

import numpy as np

arr = [1, 3, 50]
expValues = np.exp(arr)
print(expValues)

Output:

[2.71828183e+00 2.00855369e+01 5.18470553e+21]

To learn about more mathematical functions in NumPy, refer to Mathematical functions in the NumPy documentation.

NumPy Use Cases in Data Science & Machine Learning

NumPy is a very popular Python library for large multi-dimensional array and matrix processing. With the help of a large collection of high-level mathematical functions it is very useful for fundamental scientific computations in Machine Learning.

It is particularly useful for,

  1. Linear Algebra
  2. Fourier Transform
  3. Random Number Generations

High-level libraries like TensorFlow use NumPy internally when manipulating tensors.

NumPy Linear Algebra Examples

Many ML concepts are tied to linear algebra. It helps you to:

  1. Understand PCA (Principal Component Analysis),
  2. Build better ML algorithms from scratch,
  3. Process graphics in ML,
  4. Understand matrix factorization.

In fact, it could be said that ML relies heavily on matrix operations.

The Linear Algebra module of NumPy offers various methods to apply linear algebra on any NumPy array. One can find:

  1. Rank, determinant, transpose, trace, inverse, etc. of an array.
  2. Eigenvalues and eigenvectors of the given matrices
  3. The dot product of two scalars as well as of two vectors.
  4. Solve a linear matrix equation and much more!

Let’s look at some NumPy sample exercises.

1. Find rank, determinant, transpose, trace, inverse, etc. of an array using Numpy

Example:

  1. Creating a 3×3 NumPy array
array = np.array([
    [6, 1, 1],
    [4, -2, 5],
    [2, 8, 7]
])
  1. Calculating rank of array
rank = np.linalg.matrix_rank(array)
print(rank)

Output:

3
  1. Calculating determinant of array
determinant = np.linalg.det(array)
print(determinant)

Output:

-306.0
  1. Calculating trace of array
trace = np.trace(array)
print(trace)

Output:

11
  1. Calculating transpose of array
transpose = np.transpose(array)
print(transpose)

Output:

[[ 6 4 2]
[ 1 -2 8]
[ 1 5 7]]
  1. Calculating inverse of array
inverse = np.linalg.inv(array)
print(inverse)

Output:

[[ 0.17647059 -0.00326797 -0.02287582]
[ 0.05882353 -0.13071895 0.08496732]
[-0.11764706 0.1503268 0.05228758]]
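A simple way to sanity-check an inverse is to multiply it by the original matrix and compare the result with the identity matrix. The sketch below does that with np.allclose, since floating-point arithmetic rarely gives exact zeros and ones.

```python
import numpy as np

array = np.array([
    [6, 1, 1],
    [4, -2, 5],
    [2, 8, 7]
])

inverse = np.linalg.inv(array)

# A matrix times its inverse should be (approximately) the identity matrix
product = array @ inverse
print(np.allclose(product, np.eye(3)))  # True

# The determinant is non-zero, which is why the inverse exists
print(np.isclose(np.linalg.det(array), -306.0))  # True
```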

2. Find eigenvalues and eigenvectors of the given matrices using NumPy

import numpy as np

array = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

eigenVal, eigenVec = np.linalg.eig(array)
print(eigenVal)
print(eigenVec)

Output:

[ 1.61168440e+01 -1.11684397e+00 -1.30367773e-15]
[[-0.23197069 -0.78583024 0.40824829]
[-0.52532209 -0.08675134 -0.81649658]
[-0.8186735 0.61232756 0.40824829]]
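The eigenvectors are returned as the columns of the second result, so eigenVec[:, i] belongs to eigenVal[i]. A short sketch that checks the defining relation A v = λ v for each pair:

```python
import numpy as np

array = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

eigenVal, eigenVec = np.linalg.eig(array)

# Check A @ v == lambda * v for every eigenvalue/eigenvector pair
for i in range(len(eigenVal)):
    left = array @ eigenVec[:, i]
    right = eigenVal[i] * eigenVec[:, i]
    print(np.allclose(left, right))  # True for each pair
```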

3. Find the dot product of two scalar values and vector values using NumPy

import numpy as np

scalarProduct = np.dot(6, 9)
print("Dot Product of scalar values  : ", scalarProduct)

# Note: 4 + 3j and 2 + 6j are complex scalars, so np.dot( ) simply multiplies them
vector_a = 4 + 3j
vector_b = 2 + 6j

vectorProduct = np.dot(vector_a, vector_b)
print("Dot Product of vector values  : ", vectorProduct)

Output:

Dot Product of scalar values : 54
Dot Product of vector values : (-10+30j)
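The values in the example above are single numbers. For one-dimensional arrays, np.dot( ) computes the sum of the element-wise products, and the @ operator performs matrix-vector multiplication. A minimal sketch with illustrative values:

```python
import numpy as np

vector_a = np.array([1, 2, 3])
vector_b = np.array([4, 5, 6])

# 1*4 + 2*5 + 3*6 = 32
dot = np.dot(vector_a, vector_b)
print(dot)  # 32

# Matrix-vector product: each row of the matrix is dotted with vector_a
matrix = np.array([
    [1, 0, 2],
    [0, 1, 3]
])
print((matrix @ vector_a).tolist())  # [7, 11]
```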

4. Solve a linear matrix equation using Numpy

import numpy as np

A = np.array([
    [1, 3],
    [2, 4]
])

b = np.array([
    [7],
    [10]
])

x = np.linalg.solve(A, b)
print(x)

Output:

[[1.]
[2.]]
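It is good practice to verify a solution by substituting it back into the equation. The sketch below checks that A @ x reproduces b for the system above.

```python
import numpy as np

A = np.array([
    [1, 3],
    [2, 4]
])
b = np.array([
    [7],
    [10]
])

x = np.linalg.solve(A, b)

# Substitute the solution back: A @ x should reproduce b
print(np.allclose(A @ x, b))  # True
print(x.ravel().tolist())
```

np.linalg.solve( ) is generally preferred over computing np.linalg.inv(A) @ b, because it avoids forming the inverse explicitly.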

NumPy Fourier Transform Examples

In mathematics, a Fourier transform (FT) is a mathematical transform that decomposes a function (often a function of time, or a signal) into its constituent frequencies. Some of the important applications of the FT include:

  1. Fast large-integer and polynomial multiplication,
  2. Efficient matrix-vector multiplication for Toeplitz, circulant and other structured matrices,
  3. Filtering algorithms,
  4. Fast algorithms for discrete cosine or sine transform (e.g. Fast Discrete Cosine Transform used for JPEG and MPEG/MP3 encoding and decoding),
  5. Solving difference equations.

Example:

  1. Using np.fft.fft( ), get the 1D Fourier Transform
import numpy as np

A = np.array([2, 4, 6, 8, 9])
result = np.fft.fft(A)
print(result)

Output:

[29. +0.j -5.30901699+5.93085309j -4.19098301+1.03681323j
-4.19098301-1.03681323j -5.30901699-5.93085309j]
  1. Using np.fft.fft2( ), get the 2D Fourier Transform
import numpy as np

A = np.array([
    [2, 4, 6, 8, 9],
    [3, 1, 6, -2, -4]
])
result = np.fft.fft2(A)
print(result)

Output:

[[ 33. +0.j -6.47213595 -3.52671151j
2.47213595 +5.7063391j 2.47213595 -5.7063391j
-6.47213595 +3.52671151j]
[ 25. +0.j -4.14589803+15.38841769j
-10.85410197 -3.63271264j -10.85410197 +3.63271264j
-4.14589803-15.38841769j]]
  1. Using np.fft.fftn( ), get the N-D Fourier Transform
import numpy as np

A = np.array([
    [2.3, 4.1, 6.5, 8, 9],
    [3, -1.2, 6, -2, -4]
])
result = np.fft.fftn(A)
print(result)

Output:

[[ 31.7 +0.j -7.22558014 -1.82338546j
4.62558014 +7.41621639j 4.62558014 -7.41621639j
-7.22558014 +1.82338546j]
[ 28.1 +0.j -3.53966744+12.90709507j
-12.26033256 -4.50909046j -12.26033256 +4.50909046j
-3.53966744-12.90709507j]]
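To connect these transforms with the idea of "constituent frequencies", here is a small sketch that builds a 5 Hz sine wave and recovers that frequency from its spectrum. The sample rate and the signal are assumptions chosen for illustration. It uses np.fft.rfft, the variant for real-valued input, together with np.fft.rfftfreq to label the frequency bins.

```python
import numpy as np

sample_rate = 100                          # samples per second (assumed)
t = np.arange(sample_rate) / sample_rate   # one second of sample times
signal = np.sin(2 * np.pi * 5 * t)         # a pure 5 Hz sine wave

spectrum = np.fft.rfft(signal)                            # FFT for real input
freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)   # frequency of each bin

# The strongest component should sit at the frequency of the sine wave
peak = freqs[np.argmax(np.abs(spectrum))]
print(peak)

# The inverse transform recovers the original samples
restored = np.fft.irfft(spectrum, n=len(signal))
print(np.allclose(restored, signal))  # True
```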

NumPy Random Number Generations

numpy.random.rand(d0, d1, …, dn) creates an array of the given shape and fills it with random values drawn uniformly from [0, 1), where d0, d1, …, dn are the dimensions of the returned array.

Example:

  1. Randomly constructing 1D array
import numpy as np

A = np.random.rand(7)
print(A)

Output:

[0.3126664 0.99492257 0.73164575 0.77857217 0.94840314 0.10833222
0.14896065]
  1. Randomly constructing 2D array
import numpy as np

A = np.random.rand(3, 4)
print(A)

Output:

[[0.22751116 0.09730939 0.97083485 0.67629309]
[0.94896123 0.96087311 0.8725199 0.48835455]
[0.86496409 0.32296315 0.72891428 0.27729306]]
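The outputs above change on every run. When you need reproducible results, for example in tests or experiments, you can create a seeded generator with np.random.default_rng( ), which is the interface recommended in current NumPy documentation. A minimal sketch; the seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same numbers
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

first = rng_a.random((3, 4))   # uniform floats in [0, 1), shape 3x4
second = rng_b.random((3, 4))
print(np.array_equal(first, second))  # True

# Random integers from 1 to 6 (high is exclusive), like ten dice rolls
rolls = rng_a.integers(low=1, high=7, size=10)
print(rolls)
```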

Conclusion

In this python Numpy tutorial, I have covered the key concepts with some practical examples pertaining to Machine learning use cases.

If you want to have a look at more Numpy practical examples, you can check out the practical NumPy examples.

Now, I would like to hear from you. What Numpy use cases are you currently working on?


Python Web Scraping Tutorial: Step by Step Guide for Beginners


This article covers Python web scraping techniques using Python libraries.

One of the most important things in the field of Data Science is the skill of getting the right data for the problem you want to solve. Data Scientists don’t always have a prepared database to work on but rather have to pull data from the right sources. For this purpose, APIs and Web Scraping are used.

  1. API (Application Programming Interface): An API is a set of methods and tools that allows one to query and retrieve data dynamically. Reddit, Spotify, Twitter, Facebook, and many other companies provide free APIs that enable developers to access the information they store on their servers; others charge for access to their APIs.
  2. Web Scraping: A lot of data isn’t accessible through data sets or APIs but rather exists on the internet as Web pages. So, through web-scraping, one can access the data without waiting for the provider to create an API.

What’s Web Scraping?

Web scraping is a technique to fetch data from websites. While surfing on the web, many websites don’t allow the user to save data for private use.

One way is to manually copy-paste the data, which is both tedious and time-consuming.

Web Scraping is the automatic process of data extraction from websites. This process is done with the help of web scraping software known as web scrapers.

They automatically load and extract data from the websites based on user requirements. These can be custom built to work for one site or can be configured to work with any website.

Why Python for Web Scraping?

There are a number of web scraping tools available, and several languages have libraries that support web scraping.

Among all these languages, Python is considered one of the best for web scraping because of features such as a rich library ecosystem, ease of use, and dynamic typing.

Here are some of the most commonly used Python 3 web scraping libraries.

  1. Beautiful Soup
  2. Selenium
  3. Requests
  4. Lxml
  5. MechanicalSoup
  6. Urllib (urllib.request in Python 3; urllib2 was its Python 2 name)

Now let’s discuss the steps involved in web scraping by implementing a scraper in Python with Beautiful Soup.

Building Web Scraper Using Python

In this section, we will look at the step by step guide on how to build a basic web scraper using python Beautiful Soup module.

  1. First of all, to get the HTML source code of the web page, send an HTTP request to the URL of that web page one wants to access. The server responds to the request by returning the HTML content of the webpage. For doing this task, one will use a third-party HTTP library called requests in python.
  2. After accessing the HTML content, the next task is parsing the data. Because most HTML data is nested, it is not possible to extract data reliably through simple string processing. So there is a need for a parser that can create a nested/tree structure of the HTML data, e.g. html5lib or lxml.
  3. The last task is navigating and searching the parse tree that was created using the parser. For this task, we will be using another third-party python library called Beautiful Soup. It is a very popular Python library for pulling data from HTML and XML files.

Step 1: Install required third party libraries

Before starting with the code, install the required third party libraries in your Python environment.

pip install requests
pip install lxml
pip install beautifulsoup4

Step 2: Get the HTML content from the web page

To get the HTML source code of the web page, use the requests library as shown below. I am using this webpage as the example.

import requests

source = requests.get('https://devopscube.com/project-management-software').text

Step 3: Parsing the HTML content

Pass the HTML content to Beautiful Soup and specify which parser to use. Here we are using the lxml parser.

from bs4 import BeautifulSoup

soup = BeautifulSoup(source, 'lxml')

To print the visual representation of the parse tree created from the raw HTML content write down this code.

print(soup.prettify())

Step 4: Navigating and searching the parse tree

Now, we would like to extract some useful data from the HTML content. The soup object contains all the data in a nested structure that can be extracted programmatically. In our example, we are scraping a web page that contains a headline and its corresponding website.

We can start parsing out the information that we want now just like before. Let’s start by grabbing the headline and its official website.

So to grab the first headline and its official website for the first post on this page let’s inspect this web page and see if we can figure out what the structure is.

python web scrapping inspect web page

From the above diagram, you can see that the whole content including the headline and the official website is under the article tag. So let’s start off by first grabbing this entire first article that contains all of this information.

article = soup.find('article')

Now let’s grab the headline. So if we look in the HTML source code, we have our <div> tag and within that <h3> tag the headline is present. So the code for grabbing the headline is

headline = article.div.h3.text
print(headline)

Output:

Backlog.com

Next, let’s grab the website. So if we look in the HTML source code, we have our <div> tag with its class = “entry-content” and inside that, we have a link inside <a> tag and the text of that link contains the official website. So the code for grabbing the website is

officialWebsite = article.find('div', class_='entry-content').a.text
print(officialWebsite)

Output:

www.backlog.com

The complete Python web scraping code is given below.

# Python program to illustrate web scraping

import requests
from bs4 import BeautifulSoup  # the lxml parser must be installed, but need not be imported

# Download the page and parse it into a tree
source = requests.get('https://devopscube.com/project-management-software').text
soup = BeautifulSoup(source, 'lxml')

# Grab the first article, then its headline and official website
article = soup.find('article')
headline = article.div.h3.text
print(headline)
officialWebsite = article.find('div', class_='entry-content').a.text
print(officialWebsite)

Output:

Backlog.com
www.backlog.com
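The script above depends on the live page keeping the same structure, and it raises an AttributeError if any tag is missing. As a sketch of a more defensive variant, the hypothetical helper extract_first_post below runs on an inline HTML string that imitates the structure described earlier. The sample markup is an assumption, not the real page source, and Python's built-in html.parser is used so that no network access or extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure described in this tutorial
SAMPLE_HTML = """
<article>
  <div><h3>Backlog.com</h3></div>
  <div class="entry-content">
    <p>Website: <a href="https://www.backlog.com">www.backlog.com</a></p>
  </div>
</article>
"""

def extract_first_post(html):
    """Return (headline, website) for the first <article>; None for missing parts."""
    soup = BeautifulSoup(html, "html.parser")
    article = soup.find("article")
    if article is None:
        return None, None

    heading = article.find("h3")
    content = article.find("div", class_="entry-content")
    link = content.find("a") if content is not None else None

    headline = heading.get_text(strip=True) if heading is not None else None
    website = link.get_text(strip=True) if link is not None else None
    return headline, website

print(extract_first_post(SAMPLE_HTML))               # ('Backlog.com', 'www.backlog.com')
print(extract_first_post("<p>no article here</p>"))  # (None, None)
```

Checking each lookup for None keeps the scraper from crashing when a site changes its layout.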

Real-World Python Web Scraping Projects

Here are some real-world project ideas you can try for web scraping using Python.

  1. Price monitoring in e-commerce websites
  2. News syndication from multiple news websites and blogs.
  3. Competitor content analysis
  4. Social media analysis for trending contents.
  5. COVID-19 data tracker

Also look at some of the Python web scraping examples from GitHub.

Note: Web scraping is not considered good practice if you try to scrape web pages without the website owner’s consent. It may also cause your IP to be blocked permanently by a website.
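One common courtesy check before scraping is to read the site's robots.txt file, which states which paths automated clients may visit. The sketch below uses the standard library's urllib.robotparser on a hypothetical robots.txt; the rules, the user agent name my-scraper, and the example.com URLs are all illustrative assumptions. For a real site you would call set_url( ) and read( ) instead of parse( ).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only
ROBOTS_TXT = """
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch(user_agent, url) reports whether the rules allow the request
allowed_blog = parser.can_fetch("my-scraper", "https://example.com/blog/post")
allowed_private = parser.can_fetch("my-scraper", "https://example.com/private/data")

print(allowed_blog)     # True
print(allowed_private)  # False
```

Passing this check does not replace the owner's consent or the site's terms of use; it is only a first filter.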

Web Scraping Courses

If you want to learn full-fledged web scraping techniques, you can try the following on-demand courses.

  1. Web Scraping in Python [Datacamp]
  2. Web scraping courses [Udemy]
  3. Using Python to Access Web Data [Coursera]

Conclusion

So, in this Python web scraping tutorial we learned how to create a web scraper. I hope you got a basic idea of web scraping and understood this simple example.

From here, you can try to scrape any other website of your choice.


List of DevOps Blogs and Resources for Learning


As DevOps continues to mature, various definitions and opinions emerge. Organizations adopting DevOps culture are looking into experts, customers, and vendors to help them achieve DevOps.

There are many blogs and articles which would help you to get a good insight on DevOps.

Also, the experience shared by the current practitioners will help you identify your organization’s current stand on DevOps. For this, we have a list of top books, blogs, and resources that will definitely help you on your DevOps journey.

If you are trying to become a DevOps engineer or want to keep up with current DevOps trends, this blog is for you.

List of Best DevOps Books

This list ranges from books covering the cultural aspects of DevOps to a few tool-based books that are popular in the DevOps domain.

Following is the list of popular DevOps books.

For Leaders & Managers & Engineers

  1. The Phoenix Project
  2. The DevOps Handbook
  3. Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation

Some Tool Based Books

  1. The DevOps 2.0 Toolkit
  2. The Docker Book
  3. Kubernetes eBook
  4. The Terraform Book
  5. The Art of Monitoring

I know what you are thinking now. “There are other cool tool based books that can be added to the list”

I agree, there are other good books related to Ansible, Jenkins, etc that can be added to the list. I just want to keep the list minimal with some important ones. I don’t mind adding a few more if you really think it is required in this list 🙂

List of DevOps Blogs and Resources

Following is the list of top DevOps blogs to get more insights into key DevOps trends.

Dzone DevOps

Dzone DevOps category contains great articles and blog posts from various industry experts and blogs. In this section, you can find theoretical to practical implementations of tools and practices from different industry verticals. Moreover, new articles and blog posts are added every day to keep you updated with the industry trends.

Devops.com

Devops.com is strictly about DevOps. It publishes news on strategies, insights, best practices, case studies, events, and much more, which would interest every DevOps enthusiast. Also, it is more focused on the enterprise and business side of DevOps.

DevOps on Quora:

Another interesting place to discuss and know more about DevOps. Here you can ask questions and answer other people’s queries.


Medium DevOps Publications

You can find a ton of articles from medium publications related to DevOps. Most of the articles are from the DevOps community and organizations working on DevOps methodologies.

DevOps Sub-Reddit

The DevOps page on Reddit lists news and happenings in the DevOps domain. It also lists pure tech tutorials on DevOps tools for devs and ops people who want to follow DevOps culture.

Linkedin DevOps groups:

Linkedin DevOps groups are the best source of DevOps resources. Here is the list of groups you should check out.

  1. DevOps
  2. DevOps discussions

DevOps Podcasts

If you are an avid podcast listener, here is the list you should have a look at.

  1. The ship show
  2. To Be Continuous
  3. DevOps Radio

DevOps Influencers

By following influencers you can keep yourself updated with the latest technology trends. Here is the list of DevOps influencers you can follow.

  1. Martin Fowler
  2. Jez Humble
  3. John Arundel
  4. Gene Kim
  5. Kelsey Hightower

DevOps Tools List

A blog post published by Skillslane lists the tools and technologies that could be used for your DevOps toolchain. Check it out here: The Ultimate List of DevOps tools.

DevOps Courses

There is a plethora of online resources available for understanding DevOps and its related technologies. However, if you want guided coaching, you can choose any one of the following options.

  1. Udemy DevOps Courses
  2. Pluralsight DevOps Courses
  3. Linux Academy Courses

DevOpsCube Blog Tutorials

In DevOpsCube, we have many interesting articles on DevOps tools, implementations, and designs. Check out our articles on different categories.

Want to receive interesting content and resources related to DevOps?

Signup for DevopsCube email newsletter from the homepage.

White Papers & Reference Architectures

Reference architectures provide tons of information on standards and guidelines on using different cloud services. You can also learn how other organizations use cloud services for their use cases. Here are some resources you can start with.

  1. AWS reference architectures
  2. GCP Whitepapers
  3. Azure Whitepapers

Good Engineering Blogs

It’s very interesting to know about how big organizations architect their highly scalable applications.

You can read Engineering Blogs of big organizations to gain some insights on how they design systems and overcome several scalability and performance issues with open source tooling and their customized in-house tooling.

Plus these guys have created some awesome open-source projects that are worth looking at. Also, these blogs are not dedicated to DevOps, however, you will find a ton of useful information related to infrastructure and other development practices.

Here are my favorite engineering blogs.

  1. Netflix Engineering Blog
  2. Twitter Engineering Blog
  3. Capital One Tech
  4. Facebook Engineering
  5. Stackoverflow Engineering Blog
  6. Uber Engineering Blog
  7. Walmart Engineering Blog
  8. Linkedin Engineering Blog

Dedicated DevOps Forum

A dedicated forum for DevOps discussions and Q/A.

  1. DevOps Tech Discussions
  2. Stackexchange DevOps

Use Blog Syndication Tools

It is not easy to bookmark all the devops blogs and resources and check for content updates.

I hear you!

Use services like feedly.com to syndicate all your favorite blogs in one place. You can categorize the feeds as per your needs. I use Feedly personally for my reading.

The list ends here. I hope these resources help you out with understanding DevOps, evolving methodologies, and tools around it.

I’d like to hear from you.

Which Devops blog or resource in the list do you like the most?

Are you going to try any book?

Or maybe you read from blogs!

Either way, let me know by leaving a quick comment below.


Become A DevOps Engineer in 2020: A Comprehensive Guide


As of the current IT market, the DevOps domain is one of the best options for IT folks in terms of salary and career growth. One common question I get quite often is “How to become a DevOps engineer?”

In this blog, I will try to answer this with my own experiences in practicing DevOps in different organizations.

Many people (including me) argue that there is no such thing as a “DevOps Engineer” or a “DevOps Team” because it is not a thing. However, everyone in the industry has now gotten used to the term “DevOps engineer”, and as long as you understand the DevOps philosophy, these titles don’t matter much.

Having said that, there are a few misconceptions about what DevOps really means. One such misconception is “Automation is DevOps”. Developing skills related to automation is not enough to become a DevOps Engineer.

Wikipedia says,

DevOps (a clipped compound of development and operations) is a culture, movement or practice that emphasizes the collaboration and communication of both software developers and other information-technology (IT) professionals while automating the process of software delivery and infrastructure changes.

From the above definition, it is clear that DevOps is not about any tools or technologies. It is a philosophy for making different IT teams work together to deliver better and faster results through continuous feedback.

Read this article to understand DevOps better: What does DevOps really mean?

Here is an interesting trend graph showing DevOps popularity in the last 5 years.

devops trends for five years

Organizations trying to achieve DevOps require people who have collaborative skills, are willing to change and adopt new technologies, and have a good understanding of systems, automation tools, CI tools, version control systems, networking, and project management tools, all of which are required for getting an app into production without much delay.

Also, the design or the pipeline designed by the team should be able to deliver small updates or releases without much manual intervention. This could happen only if there is a cultural shift in the way teams work.

Skillsets To Become a DevOps Engineer

You must understand the fact that DevOps is not specific to developers or system engineers. It’s for anyone who is passionate about evolving practices and technologies and is willing to work in a collaborative environment where everything is automated to make everyone’s life easier.

In this article, I will explain how you should prepare yourself for tools and technologies to adapt and work in DevOps culture.

Note: In this article, I have covered many verticals. It is not possible for a beginner to be a master of everything. However, having a fair amount of knowledge in these areas will help you if you are pursuing a career in DevOps

Understanding DevOps Culture

The first and foremost thing is to understand the DevOps culture. It is all about bringing people together to work towards a common goal in an efficient way. Before getting into DevOps toolsets, IT managers should make sure every member of the team is mentored on how DevOps works and on its cultural aspects. This avoids a lot of confusion in the team.

People will stop pointing fingers over issues once they understand that when there is a delay or an issue in project delivery, everyone involved in the project is equally responsible.

Once you practice DevOps culture, you will stop saying that “CI/CD and automation is DevOps”.

Resources

  1. DevOps Culture and Mindset [Coursera]
  2. The Phoenix Project [Recommended eBook]

Learn about *nix Systems

We are in an era where we cannot live without Linux/Unix systems. You should get a good understanding and working knowledge of the Linux distributions that are widely used by organizations (RHEL, CentOS, Ubuntu, CoreOS, etc.).

According to a Linux Foundation case study, 90% of the public cloud workload runs on Linux.

Public cloud linux usage

Here is another interesting study from Red Hat, which shows the different Linux distros being used in the public cloud.

Linux distro use in public cloud

Now you have enough reasons to focus on Linux.

When it comes to Linux, it is all about the terminal; GUIs are less preferred in the *nix world. Get your hands dirty with the terminals of these systems.

You can use VirtualBox or AWS/GCP/Azure to spin up Linux servers.

You can start with the following.

  1. Understand the Linux boot process.
  2. Install and configure web servers (Apache, Nginx, Tomcat, etc.).
  3. Learn how Linux processes work.
  4. Learn how SSH works.
  5. Learn about different file systems.
  6. Learn about system logging, monitoring, and troubleshooting.
  7. Learn about important protocols (SSL, TLS, TCP, UDP, FTP, SFTP, SCP, SSH).
  8. Learn to manage services and try to create a service on your own (init.d, systemd).
  9. Host static/dynamic websites on web servers.
  10. Set up load balancers and reverse proxies (Nginx, HAProxy, etc.).
  11. Break something and learn to troubleshoot.

Resources

  1. Introduction to Linux [edX]
  2. Learn Linux in 5 days [Udemy]

Understand How Infrastructure Components Work

The basic building block of any organization is its infrastructure. It could be in the cloud or in an on-premises data center. An overall understanding of infrastructure components is a must for anyone who wants to practice or work in a DevOps environment. You should have at least a basic understanding of the following.

Networking


  1. Subnets
  2. Public network
  3. Private network
  4. CIDR Notations
  5. Static/Dynamic IPs
  6. Firewall
  7. Proxy
  8. NAT
  9. Public & Private DNS
  10. Troubleshooting
  11. VPN
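Subnets and CIDR notation are easier to grasp with a quick experiment. The following is a minimal sketch using Python's standard `ipaddress` module; the `10.0.0.0/16` network and the `describe` helper are made up purely for illustration and do not come from any specific environment.

```python
import ipaddress

# Hypothetical private network, used only for illustration.
network = ipaddress.ip_network("10.0.0.0/16")

# Split the /16 into /24 subnets and keep the first three.
subnets = list(network.subnets(new_prefix=24))[:3]


def describe(net):
    """Return a short summary of a network written in CIDR notation."""
    return {
        "cidr": str(net),
        "netmask": str(net.netmask),
        "addresses": net.num_addresses,
        "private": net.is_private,
    }


for net in subnets:
    print(describe(net))

# Check whether a given IP address belongs to a subnet.
print(ipaddress.ip_address("10.0.1.25") in subnets[1])
```

Changing the prefix length (for example `/24` to `/26`) and watching how the address count changes is a quick way to build intuition for subnet sizing.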

High Availability


  1. Clusters
  2. Fail Over Mechanisms
  3. Disaster Recovery

Security


  1. PKI Infrastructure
  2. SSL certificates

Storage


  1. SAN
  2. Backups
  3. NFS

Single Sign On


  1. Active Directory/LDAP

Load Balancers

  1. L4 Load Balancers
  2. L7 Load Balancers
  3. Load balancing algorithms
  4. Reverse Proxy
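To make "load balancing algorithms" a little more concrete, here is a rough sketch of two common ones, round robin and least connections. The class names and the backend names (`app-1`, `app-2`, `app-3`) are hypothetical; real load balancers such as Nginx or HAProxy implement these ideas with health checks, weights, and much more.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Send each new request to the backend with the fewest active connections."""

    def __init__(self, backends):
        self.active = {backend: 0 for backend in backends}

    def pick(self):
        # On a tie, min() returns the first backend in insertion order.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call this when a connection to the backend is closed.
        self.active[backend] -= 1


backends = ["app-1", "app-2", "app-3"]  # hypothetical backend names

rr = RoundRobinBalancer(backends)
rr_picks = [rr.pick() for _ in range(4)]
print(rr_picks)

lc = LeastConnectionsBalancer(backends)
first = lc.pick()
second = lc.pick()
lc.release(first)   # the first connection finishes
third = lc.pick()   # the freed backend is eligible again
print(first, second, third)
```

Round robin ignores how busy each backend is, while least connections adapts to long-running requests. That trade-off is usually what you are choosing between when you configure a load balancer.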

VPN


  1. Site to Site VPN
  2. Client to site VPN

There could be more, but I have highlighted the key components of an IT infrastructure.

Get Certified On Cloud

When I say “Get Certified”, please do not use exam dumps just to pass the certification. That adds very little value for you. It may be good for the organization to show clients that it has certified cloud engineers, but it does not make you a better engineer.

AWS currently holds the largest share of the public cloud market. Here is the report from Business Wire.

public cloud market share

Pick any one public cloud, preferably AWS, and learn about all its core infrastructure services. Get hands-on with all the core services and understand how they work.

Watch AWS re:Invent videos and understand how other organizations are using AWS services for hosting their applications. Trust me, you will learn a lot from these videos, and no online training will provide that much information on how to run production workloads on AWS.

If you are planning to get certified on GCP, watch the Google Cloud Next videos.

Use the certification to gauge yourself on the respective platform.

Resources:

  1. Ryans AWS Certification Courses
  2. Google Certified Associate Cloud Engineer Certification
  3. Microsoft Azure – Beginner’s Guide + AZ-900 preparation

Learn to Automate

Automation has become an important aspect of every organization. We no longer create servers manually; we automate their creation.

As per a report from Red Hat, many organizations are investing in their automation initiatives. Check out this data.

Organization Devops Automation Budget

Everything from provisioning servers to application configuration and deployment should be automated. You can learn any of the following DevOps toolsets that fit your needs.

For Dev Environment


  1. Vagrant
  2. Docker Desktop
  3. Minikube
  4. Minishift

For infrastructure provisioning


  1. Terraform
  2. CLIs (of respective cloud provider)

For Configuration Management


  1. Ansible
  2. Chef
  3. Puppet
  4. SaltStack

VM image management


  1. Packer

Resources:

  1. Learn DevOps: Infrastructure Automation With Terraform
  2. Ansible for the Absolute Beginner – Hands-On – DevOps
  3. Docker for the Absolute Beginner

Containers, Distributed Systems & Service Mesh

Container adoption is increasing day by day. The organization you work for might not be using containers now; however, it is best to have hands-on knowledge of a container technology like Docker. It will give you a competitive edge among your peers.

Once you understand Docker, you can try out clustering and orchestration tools like Kubernetes, Docker Swarm, etc.

These platforms are best suited for microservices-based architecture.

Here is an interesting Kubernetes usage trend by Datadog.

kubernetes usage trends

Here is the rising search trend for Kubernetes over the last five years.

kubernetes user trends

A service mesh is an advanced topic when it comes to distributed systems. If you are a beginner to container toolsets, you can learn this after gaining a good knowledge of microservices-based architecture.

Resources

  1. Kubernetes Tutorials For Beginners: Getting Started Guide
  2. Best kubernetes courses
  3. https://github.com/kelseyhightower/kubernetes-the-hard-way

Logging & Monitoring

Logging and monitoring are very important aspects of any infrastructure.

Most of the apps deployed in the infrastructure will produce logs. Depending on the architecture and design, logs are pushed to and stored in a logging infrastructure.

Almost every company has a logging infrastructure. Commonly used stacks are Splunk and ELK. There are also a few SaaS companies, like Loggly, that provide logging infrastructure.

Logging systems are used by developers, operations teams, and security teams to monitor, troubleshoot, and audit applications and infrastructure.

In every organization, mission-critical applications will be monitored 24/7. There will be monitoring dashboards. Generally, dashboards are created from logging sources, or metrics generated by the application.

Also, there will be alerting systems. Based on the rules configured in the monitoring systems, alerts will be triggered.

For example, an alert could be triggered as a Slack notification, Jira ticket, email alert, ServiceNow incident ticket, or xMatters phone call. Alerting workflows differ from organization to organization.
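The idea of rules triggering alerts can be sketched in a few lines. This is a simplified illustration, not how any particular monitoring tool works: the `AlertRule` class, the metric names, the thresholds, and the channel names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    name: str
    metric: str
    threshold: float
    channel: str  # hypothetical destination, e.g. "slack" or "email"


def evaluate(rules, metrics):
    """Return (rule name, channel) for every rule whose metric exceeds its threshold."""
    fired = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is not None and value > rule.threshold:
            fired.append((rule.name, rule.channel))
    return fired


rules = [
    AlertRule("HighCPU", "cpu_percent", 90.0, "slack"),
    AlertRule("HighErrorRate", "error_rate", 0.05, "email"),
]

# Made-up metric values standing in for data scraped from an application.
metrics = {"cpu_percent": 95.2, "error_rate": 0.01}

alerts = evaluate(rules, metrics)
print(alerts)  # [('HighCPU', 'slack')]
```

Real alerting systems add things like evaluation windows, deduplication, and escalation, but the core loop of comparing metrics against configured rules is the same.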

As a DevOps engineer, you should be able to query logs and troubleshoot issues in non-prod and prod environments. Understanding regular expressions is very important for querying logs in any logging tool.
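Here is a small sketch of how regular expressions help when digging through logs. The log format and the sample lines are invented for illustration; real applications will use different formats, and tools like Splunk or ELK have their own query languages layered on top of similar pattern matching.

```python
import re
from collections import Counter

# Made-up sample lines in a hypothetical log format.
LOG_LINES = [
    "2024-01-15 10:02:11 ERROR payment-service Timeout calling gateway",
    "2024-01-15 10:02:12 INFO auth-service User login succeeded",
    "2024-01-15 10:02:15 ERROR auth-service Invalid token",
    "2024-01-15 10:02:19 WARN payment-service Retry attempt 2",
    "2024-01-15 10:02:20 ERROR payment-service Timeout calling gateway",
]

# Named groups make the extracted fields self-describing.
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<service>[\w-]+) (?P<message>.*)$"
)


def parse(line):
    """Return the fields of a log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


entries = [entry for entry in map(parse, LOG_LINES) if entry]
errors = [entry for entry in entries if entry["level"] == "ERROR"]

# Count errors per service to see where the problems are concentrated.
errors_by_service = Counter(entry["service"] for entry in errors)
print(errors_by_service)
```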

Resources

  1. Elastic Stack – In-Depth & Hands-On
  2. Monitoring and Alerting with Prometheus
  3. Art of Monitoring [eBook]
  4. Regular Expressions (Regex) Tutorial

Understand Security Best Practices (DevSecOps)

DevSecOps is another area that deals with integrating security practices into each stage of DevOps.

Wikipedia says,

DevSecOps is an augmentation of DevOps to allow for security practices to be integrated into the DevOps approach. The traditional centralised security team model must adopt a federated model allowing each delivery team the ability to factor in the correct security controls into their DevOps practices.

Check Point’s 2020 security survey shows the different types of cyber attacks by region.

In cloud environments, crypto mining is a common attack. This mostly happens when cloud access secrets are poorly maintained and attackers get access to them.

When it comes to DevOps, secret management for applications and infrastructure components should follow standard security practices.

The following image shows the key DevSecOps standard practices published by Red Hat.

Source: Redhat.com

HashiCorp Vault is a great secret management tool you can look at. There are many workflows available to manage environment secrets.
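One basic practice that applies whichever tool you use is to keep secrets out of source code and inject them at runtime. The sketch below illustrates only that general idea with environment variables; it does not use the Vault API, and the `get_secret` and `mask` helpers and the `DB_PASSWORD` name are hypothetical.

```python
import os


def get_secret(name, env=os.environ):
    """Fetch a secret from the environment and fail loudly if it is missing."""
    value = env.get(name)
    if not value:
        raise RuntimeError(f"Missing required secret: {name}")
    return value


def mask(value, visible=4):
    """Mask a secret for logging, keeping only the last few characters."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# A fake environment dict stands in for real environment variables here.
fake_env = {"DB_PASSWORD": "s3cr3t-value"}

password = get_secret("DB_PASSWORD", env=fake_env)
print(mask(password))  # ********alue
```

In a real setup the environment would be populated by your secret manager or deployment tooling rather than a dictionary in the script.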

Resources:

  1. HashiCorp Vault: The Advanced Course
  2. Vault Tutorial
  3. What is container security?

Learn Coding & Scripting

In today’s world, we treat everything as code. Even though there are enough tools to automate almost everything, you might need custom functionality that a tool does not offer. In such cases, coding/scripting comes in handy.

For example, Jenkins pipeline as code requires an understanding of Groovy, writing a custom Ansible module requires an understanding of Python, and writing a Kubernetes operator requires Golang experience.

You can learn the following commonly used scripting languages.

  1. Bash/Shell
  2. Python
  3. Golang
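As a small taste of the kind of custom scripting that comes up in day-to-day work, here is a sketch of a disk usage check in Python. The function names and the 80% threshold are arbitrary choices for illustration, not a standard.

```python
import shutil


def disk_usage_percent(path="."):
    """Return the percentage of disk space used on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def check_disk(path=".", threshold=80.0):
    """Return a (status, percent) tuple; status is WARNING at or above the threshold."""
    percent = disk_usage_percent(path)
    status = "WARNING" if percent >= threshold else "OK"
    return status, round(percent, 1)


status, percent = check_disk(".")
print(f"{status}: {percent}% used")
```

A script like this could be run from cron or wrapped into a monitoring check; the same logic could just as well be written in Bash or Go.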

Golang is getting really popular in the DevOps domain. A lot of DevOps tooling is built with Golang nowadays. In fact, tools like Kubernetes and Terraform are written in Go.

JFrog ran a survey on Golang adoption during GopherCon, and 18% of the respondents said they use Golang for DevOps-related work.

Golang devOps adoption survey

Resources

  1. Complete Python Bootcamp: Go from zero to hero in Python 3
  2. Learn How To Code: Google’s Go (golang)
  3. Linux Shell Scripting: A Project-Based Approach to Learning

Learn Git, Learn to Document, Learn about GitOps

It is very important to version control everything you do (except passwords and secrets :P). Git is the best version control tool. There are plenty of tutorials available on Git, and it will not take much time to learn the important Git operations.

You can start with GitHub or Bitbucket as your remote code repository.

Once you understand Git, learn about GitOps.

So what is GitOps anyway? Here is how gitops.tech explains it:

GitOps is a way of implementing Continuous Deployment for cloud native applications. It focuses on a developer-centric experience when operating infrastructure, by using tools developers are already familiar with, including Git and Continuous Deployment tools.

The next important thing is to document everything important that you do. Every repository must have a README file that explains your code clearly. Good documentation will help not only you but also anyone who tries to use your code.

Resources:

  1. Git Complete: The definitive, step-by-step guide to Git
  2. Git Basics Every Developer and Administrator Should Know

Understand End To End Application Delivery Lifecycle

When it comes to the application delivery lifecycle, there are three important concepts you need to be aware of.

  1. Continuous Integration
  2. Continuous Delivery
  3. Continuous Deployment
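The difference between continuous delivery and continuous deployment is easy to blur, so here is a toy sketch. Both run the same automated stages; in continuous delivery the production release waits for a manual approval, while in continuous deployment it goes out automatically. The `run_pipeline` function and the stage names are hypothetical and do not reflect how any specific CI/CD tool is configured.

```python
def run_pipeline(stages, auto_deploy, approved=False):
    """Run stages in order and stop at the first failure.

    stages is a list of (name, callable) pairs; each callable returns True on success.
    """
    log = []
    for name, step in stages:
        if not step():
            log.append(f"{name}: failed")
            return log
        log.append(f"{name}: ok")

    # Continuous deployment releases automatically; continuous delivery
    # keeps the build ready and waits for a manual approval.
    if auto_deploy or approved:
        log.append("deploy to production: ok")
    else:
        log.append("deploy to production: waiting for approval")
    return log


# Stand-in stages that always succeed; real ones would build and test code.
stages = [
    ("build", lambda: True),
    ("unit tests", lambda: True),
    ("deploy to staging", lambda: True),
]

delivery = run_pipeline(stages, auto_deploy=False)
deployment = run_pipeline(stages, auto_deploy=True)

print(delivery[-1])
print(deployment[-1])
```

Continuous integration corresponds to the early stages: every change is built and tested automatically as soon as it is merged.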

Read this release process management article to understand how a typical application development, build, testing, deployment, approval, and validation process works.

Learn to use CI/CD tools like Jenkins, Travis CI, GoCD, etc.

Here is a good pictorial representation of the CI/CD process by BMC.

CI/CD in devops
source: bmc

DevOps vs SRE

SRE is another evolving topic in the DevOps community.

SRE is a set of practices and philosophies that emerged from Google.

Here is what Google says about DevOps and SRE:

DevOps and SRE are not two competing methods for software development and operations, but rather close friends designed to break down organizational barriers to deliver better software faster.

I recommend these official documents from Google to understand more about SRE.

  1. What is SRE?
  2. SRE vs. DevOps: competing standards or close friends?

Read Read and Read

Nothing builds knowledge like reading. Read at least one DevOps-related engineering blog regularly. Follow the engineering blogs of companies like Netflix, Twitter, Google, etc. Learn which toolsets they use, their deployment strategies, and their latest open source projects.

Follow like-minded people on LinkedIn, Reddit, Medium, Quora, etc.

Resources

  1. List of Best DevOps Blogs & Resources

Write a Blog

It’s good to share your experiences and learning with others. You can publish tutorials, learnings, and your experiences on your personal blog. It will help others, and it will create a personal brand for you. It takes less than 30 minutes to set up a WordPress blog or a Medium blog. If you want help to start your blog, drop a message to us at [email protected]

Whenever you learn something new, you can write about it. It will be a reference for you as well as others. You can share it on LinkedIn groups, DZone, etc.

Conclusion

The tools and processes involved in DevOps are not limited to what is mentioned in this article. However, these are commonly used open source tools and technologies you can start with to become a DevOps engineer.

Now I’d like to hear your thoughts:

What’s your key takeaway from this?

Or maybe you have a question about the different verticals explained.

Either way, leave a comment below right now.

How to Become A DevOps Engineer