To better understand Kubernetes and how important it is, we will start by reviewing some basic concepts. Then, we will explore the reasons why Kubernetes is important. Lastly, we will dive into Kubernetes and some of its components. So let us start by reviewing containers.
Containers are built from a few core pieces: container images, which package software together with its dependencies; namespaces, which provide resource isolation; and Linux kernel cgroups, which enforce resource limits on CPU, RAM, and more.
The Linux kernel cgroups, or control groups, limit resource utilization by processes. The resources that cgroups can control include CPU, memory, and network, among others. For example, you can cap the amount of memory a group of processes may use, or give processes different shares of the host system's CPU. Lastly, you can use cgroups to measure resource usage.
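On Linux, cgroup v2 limits are applied by writing values into files under `/sys/fs/cgroup`. The sketch below, in Python, only *builds* those file-to-value writes rather than performing them (which would require root); the group name `demo` and the amounts are hypothetical:

```python
# Sketch: express cgroup v2 limits as writes to files under /sys/fs/cgroup.
# The group name "demo" and the amounts are made up for illustration;
# actually applying them requires root on a cgroup v2 host.
def cgroup_limits(name, memory_max_bytes, cpu_quota_us, cpu_period_us):
    """Return the (file, value) writes that would cap a group's memory and CPU."""
    base = f"/sys/fs/cgroup/{name}"
    return {
        f"{base}/memory.max": str(memory_max_bytes),           # hard memory cap
        f"{base}/cpu.max": f"{cpu_quota_us} {cpu_period_us}",  # CPU quota per period
    }

# 256 MiB of memory, and at most 50ms of CPU time per 100ms period (half a CPU)
limits = cgroup_limits("demo", 256 * 1024 * 1024, 50000, 100000)
for path, value in limits.items():
    print(path, "<-", value)
```

A container runtime performs essentially these writes (via the kernel's cgroup interface) for every container it starts.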
Namespaces, on the other hand, limit what the processes or containers within a namespace can see. Namespaces provide isolation for containers and prevent them from seeing other containers' file systems. In short, namespaces make it possible for each container to have its own file system, network stack, and users, among other resources.
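The visibility idea can be shown with a toy Python model. This is *not* an interface to real kernel namespaces (those are created with syscalls like `unshare`/`clone`); it only models the guarantee: each container has a private view, so files written inside one are invisible to another:

```python
# Toy model of namespace isolation -- NOT real kernel namespaces.
# Each container holds a private "view" of the filesystem; what one
# container writes simply does not exist in another container's view.
class Container:
    def __init__(self, name):
        self.name = name
        self.rootfs = {}  # this container's private filesystem view

    def write(self, path, data):
        self.rootfs[path] = data

    def sees(self, path):
        return path in self.rootfs

a = Container("app-a")
b = Container("app-b")
a.write("/etc/secret.conf", "token=abc")   # hypothetical file and contents

print(a.sees("/etc/secret.conf"))  # True: visible inside a's own view
print(b.sees("/etc/secret.conf"))  # False: b has a separate view entirely
```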
Container images are a set of files and directories needed to create containers and run software in them. Container images are composed of layers, which can be combined to present those files and directories as a single unified view. Layers provide a Copy-on-Write (CoW) view of the files in the container image: a file from a lower layer is not duplicated until something writes to it, so a file that is only ever read is shared rather than copied.
Implementing layers requires a Union file system (FS), such as OverlayFS or AUFS, which can provide merged views of files. Docker, a container runtime, has used both of these file systems, OverlayFS (its current default) and the older AUFS, to manage images and layers. A union FS lets you start new containers very efficiently: there is no need to duplicate the whole image, because a new container can be created just by adding a new empty layer at the top. That empty layer gives each container its own writable space without affecting other containers.
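The layer mechanics above can be sketched in a few lines of Python. This is a simplified model of what a union FS does, assuming each layer is just a mapping of paths to contents (the file names are made up); upper layers shadow lower ones, and writes land only in the container's own top writable layer:

```python
# Sketch of a union filesystem's merged view. Each layer maps paths to
# file contents; merging bottom-to-top lets upper layers shadow lower ones.
def merged_view(layers):
    """Merge layers (bottom first) into the unified view the container sees."""
    view = {}
    for layer in layers:
        view.update(layer)  # entries in upper layers shadow lower ones
    return view

base     = {"/bin/sh": "shell-v1", "/etc/os-release": "debian"}  # shared image layers
app      = {"/app/server.py": "print('hi')"}
writable = {}  # each container gets its own empty writable layer on top

writable["/tmp/log"] = "started"  # a write lands ONLY in the top layer (CoW)
view = merged_view([base, app, writable])
print(view["/tmp/log"])    # visible in the merged view...
print("/tmp/log" in base)  # ...but the shared lower layers are untouched
```

Because the lower layers never change, any number of containers can share them, which is why starting a new container is cheap.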
In summary, container images allow you to package software to be able to run it in containers. Therefore, in a container, you’ll find all the dependencies and libraries needed to run applications in different computing spaces.
Containers, in a way, virtualize the host operating system (OS): the application runs directly on the host CPU and talks to the host OS kernel. This gives you the ability to run several containers on a single OS. With containers, you can easily share the server's storage, memory, network, and CPU resources among different applications. Additionally, even when sharing the server's resources, containers are independent of each other, which provides isolation for your applications.
Another great benefit of using containers is portability. Because containerized applications come with all dependencies, libraries, and files already packaged, you can deploy your applications anywhere. You do not need to worry about needing a specific OS or server, whether you want to deploy your application on-prem or in the cloud; this makes containers easy to deploy.
A Virtual Machine (VM), as the name suggests, is a virtual computer system running on top of a physical computer or server. VMs, like containers, can be used to deploy applications. However, each VM needs its own guest OS besides the host OS, which means allocating resources specifically for that OS. And on top of the extra OS, a VM, just like a containerized application, also needs the libraries, binaries, and any other files required to run the application.
Suppose you want to deploy three applications, so you create three VMs on your personal computer because you want to run one application in each VM. Besides the OS on your computer, you will need three extra OSs, one for each VM. That already takes up a lot of space on the system. On top of that, you need all the files required by your applications. And, if one of your VMs takes up all the resources in your system, you won't be able to run the other two applications. Although VMs allow you to package applications and run them in complete isolation, they carry far more overhead than containers.
As we mentioned before, containers talk to the host kernel, so they do not require a separate OS as VMs do. VMs are usually much larger than containers and also require a lot more memory. VMs also take longer to boot, while containers can start very quickly. In summary, with containers, your resources are not wasted on virtualization of hardware, additional OSs, and simulation of block storage. These are just a few of the reasons why containers have proven to be more efficient than VMs.
Although containers are great, you could encounter several issues if you tried to manage several of them manually. If you use containers just for testing or have very few containers in your environment, you could probably manually manage them. However, if you are planning on creating a large container infrastructure, then this is where things get more complicated.
Managing an extensive container infrastructure includes having a scheduler to know where to run the containers. You need to be able to seamlessly move containers and know where to move them so that your applications keep running smoothly. You also need a load balancer to redirect traffic to the proper server or container, not to overload some of them with too much traffic while others don't get any.
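To make the load-balancing requirement concrete, here is a minimal round-robin balancer sketched in Python; the server names are hypothetical, and real load balancers also weigh health and load, which this deliberately omits:

```python
# Minimal round-robin load balancer sketch: requests are spread evenly
# across backends in turn. Server names are made up for illustration.
import itertools

class RoundRobinBalancer:
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)  # endless rotation over backends

    def pick(self):
        """Return the backend that should receive the next request."""
        return next(self._cycle)

lb = RoundRobinBalancer(["server-1", "server-2", "server-3"])
print([lb.pick() for _ in range(6)])
# ['server-1', 'server-2', 'server-3', 'server-1', 'server-2', 'server-3']
```

Even this trivial policy shows the point: no single server is flooded while others sit idle.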
Additionally, you need to monitor the container's health, as some of them might completely fail, and you will need to replace them right away. Some containers might malfunction, and they can start using more resources than needed; in such a case, you will need to identify the faulty container and then replace it. You also need to control how your containers access resources such as storage or network.
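The monitor-and-replace loop just described can be sketched as a small Python function. The container names, states, and the 512 MB limit are all hypothetical; the point is the pattern an orchestrator automates:

```python
# Sketch of a health-check-and-replace loop. `containers` maps hypothetical
# container names to their reported state; anything failed or over its
# memory limit gets swapped for a fresh replacement.
def reconcile(containers, limit_mb=512):
    """Replace failed or resource-hogging containers; return who was replaced."""
    replaced = []
    for name, state in containers.items():
        if state["status"] == "failed" or state["memory_mb"] > limit_mb:
            containers[name] = {"status": "running", "memory_mb": 0}
            replaced.append(name)
    return replaced

fleet = {
    "web-1": {"status": "running", "memory_mb": 120},
    "web-2": {"status": "failed",  "memory_mb": 0},
    "web-3": {"status": "running", "memory_mb": 900},  # leaking past its limit
}
print(reconcile(fleet))  # ['web-2', 'web-3']
```

Doing this by hand for a handful of containers is tedious; for thousands, it has to be automated.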
You need to consider several other things when you have a large container infrastructure, and as you can imagine, managing this type of infrastructure manually is practically impossible. Even if it were possible, it would be very inefficient. That's why you'd need a container orchestrator, which can do all the things we mentioned above and much more. With a container orchestrator, you can automate the tasks required to run containerized applications. So, in other words, to have a reliable container infrastructure, you need a suitable container orchestrator, and here is where Kubernetes comes into play.
Kubernetes, also known as k8s, is an open-source container orchestration system that allows you to automatically manage containerized workloads and services. This means that Kubernetes can help you automatically manage, scale, and deploy your containerized applications. Kubernetes was initially developed at Google but was open-sourced in 2014.
Kubernetes provides automatic load balancing, so you do not have to worry about servers being flooded. If your containers need to be managed, i.e., need to be activated, suspended, or just shut down, Kubernetes takes care of it. Also, if containers fail or malfunction, Kubernetes can replace them. In addition, Kubernetes can dynamically scale; if you need to scale up or down your application based on demand, Kubernetes helps with that as well. In other words, Kubernetes ensures that your applications always work as expected.
Whenever you deploy Kubernetes, you get a Kubernetes Cluster. A Kubernetes Cluster is a combination of several components that work together to successfully run containerized applications. To understand a little more about Kubernetes and what a Kubernetes cluster is, let’s review some components that make up a cluster.
A Pod is a computing unit that can host one or more containers that share resources such as storage and network. This means that a Pod can run a single container, but it can also run several containers that need to work together. In other words, Kubernetes uses Pods to manage and interact with containers. Pods can request compute and memory resources, depending on the tasks they need to perform.
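To show what such a resource request looks like, here is a minimal Pod definition expressed as a Python dict whose keys mirror the Kubernetes manifest fields (normally you would write this as YAML; the name, image, and amounts are illustrative):

```python
# A minimal Pod definition as a Python dict mirroring the Kubernetes
# manifest structure. The name, image, and resource amounts are examples.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web"},
    "spec": {
        "containers": [{
            "name": "web",
            "image": "nginx:1.25",
            "resources": {
                "requests": {"cpu": "250m", "memory": "128Mi"},  # what the Pod asks for
                "limits":   {"cpu": "500m", "memory": "256Mi"},  # hard ceilings
            },
        }],
    },
}
print(pod["spec"]["containers"][0]["resources"]["requests"]["cpu"])  # 250m
```

The `requests` values feed scheduling decisions, while the `limits` are enforced on the node through the same cgroup mechanism described earlier.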
Pods are tied to the machines where they are created and remain there until they are destroyed. However, you can have replicas of the same Pod in different machines. Kubernetes uses a DaemonSet to ensure that all or some of your machines run a copy of a Pod. As you add machines to your cluster, Pods are added to those machines. Similarly, if a machine is removed, the DaemonSet ensures that the pods associated with that machine are also destroyed.
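The "one copy per machine" guarantee can be sketched as a tiny function: given the current set of machines, the desired Pod set follows directly, so adding or removing a machine changes the Pod set accordingly. The `logger` Pod name and node names are hypothetical:

```python
# Toy sketch of the DaemonSet guarantee: exactly one copy of a Pod
# (a hypothetical "logger") per machine currently in the cluster.
def daemonset_pods(nodes):
    """Return the desired Pod set: one logger Pod per node."""
    return {f"logger-{node}" for node in nodes}

print(sorted(daemonset_pods({"node-a", "node-b"})))
# node-a leaves the cluster and node-c joins: its Pod goes away, node-c gets one
print(sorted(daemonset_pods({"node-b", "node-c"})))
```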
A characteristic of Pods worth noting is that they are ephemeral, meaning they are expected to be short-lived and replaceable. When a Pod fails, Kubernetes can replace it with a new copy without interrupting the workflow.
Given that Pods are ephemeral, you can lose files when a container fails, or you might encounter issues when trying to share files among containers working on the same Pod. A solution to this is the usage of volumes and persistent volumes (to learn more about volumes and Kubernetes storage, make sure to check out our next article: What is Kubernetes Storage?)
Nodes are worker machines, also called worker nodes, which can be physical or virtual machines. Nodes usually contain several Pods and have all the services needed to run those Pods. In short, Nodes are in charge of running your containerized applications. To give you more information about Nodes, we will review two components found on every Node: the kubelet and kube-proxy.
Please note that there is a third component in a Node called container runtime, which needs to be installed to run Pods; however, we won’t be discussing that component in this article.
The kubelet is an agent responsible for making sure the containers assigned to its Node are actually running and healthy. The kubelet is in charge of communicating with the Control Plane (discussed in the next section). If Pods fail, the kubelet follows the instructions from the Control Plane and creates or destroys Pods accordingly. The kubelet also reports the Node's health back to the Control Plane.
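At its core, the kubelet's job is a comparison between desired and actual state, which can be sketched in Python (the Pod names are made up; the real kubelet works with full Pod specs, not bare names):

```python
# Sketch of the kubelet's reconciliation step: compare the Pods the
# Control Plane wants on this Node with what is actually running,
# then compute what to start and what to stop. Pod names are examples.
def kubelet_sync(desired, running):
    """Return (pods to start, pods to stop) for this Node."""
    to_start = desired - running
    to_stop = running - desired
    return to_start, to_stop

desired = {"web", "cache"}      # what the Control Plane asked for
running = {"web", "old-job"}    # what this Node is actually running
start, stop = kubelet_sync(desired, running)
print(sorted(start), sorted(stop))  # ['cache'] ['old-job']
```

The kubelet repeats this loop continuously, which is how failed Pods get noticed and replaced.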
The kube-proxy is a network proxy that allows network traffic to be redirected into the Pods in the Node. The kube-proxy allows network communication inside and outside of a Kubernetes cluster. Suppose you have a Pod that runs a web page and a Pod that runs your database. If your web page needs to communicate with the database, the kube-proxy is in charge of making that communication successful.
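What kube-proxy provides for that web-page-to-database connection can be sketched as a mapping from a stable service name to one of the Pod endpoints behind it. The service name, Pod addresses, and random choice of backend are all illustrative simplifications of what kube-proxy actually programs into the node's network rules:

```python
# Sketch of service routing in the spirit of kube-proxy: a stable service
# name maps to the Pod endpoints backing it, and each connection is sent
# to one of them. Names and addresses are made up for illustration.
import random

endpoints = {
    "database-service": ["10.0.1.4:5432", "10.0.2.7:5432"],  # the database Pods
}

def route(service):
    """Pick a concrete backend Pod for a connection to `service`."""
    return random.choice(endpoints[service])

# The web Pod only ever dials "database-service"; routing picks the Pod.
backend = route("database-service")
print(backend in endpoints["database-service"])  # True
```

Because the web Pod addresses the service rather than a Pod directly, database Pods can come and go without the web page noticing.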
The control plane contains several components that are in charge of controlling and managing all the Kubernetes services required to run and deploy an application successfully. The control plane is responsible for managing all the Pods and ensuring that the desired state is achieved. We won’t be discussing each component in detail in this article, but you can find them in the Control Plane Components section in the Kubernetes documentation.
In the control plane, you have components such as the kube-apiserver, which exposes the Kubernetes API and allows you to interact with your cluster. You also have etcd, a consistent key-value store that holds all the important data used by Kubernetes. A third component is the kube-scheduler, which decides when and where to run a Pod; it determines which Node a Pod should run on based on each Node's resource availability. Then you have the kube-controller-manager, which runs the controller services in your cluster. Controllers watch the cluster's state and make or request changes to move it toward the desired state. Lastly, you have the cloud-controller-manager, which allows you to connect your Kubernetes cluster to a cloud provider.
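The scheduler's placement decision can be sketched as a simple fit check: find a Node with enough free CPU and memory for the Pod's request. The real kube-scheduler filters and scores Nodes on many more criteria; the Node names and numbers here are hypothetical:

```python
# Sketch of a scheduling decision: place the Pod on a Node that has
# enough free CPU (cores) and memory (MiB) for its request.
# Node names and capacities are made up; the real scheduler also
# scores nodes on affinity, taints, spreading, and more.
def schedule(pod_request, nodes):
    """Return the name of the first Node that fits the Pod, or None."""
    for name, free in nodes.items():
        if free["cpu"] >= pod_request["cpu"] and free["mem"] >= pod_request["mem"]:
            return name
    return None  # Pod stays pending until a Node frees up

nodes = {
    "node-a": {"cpu": 0.2, "mem": 256},   # nearly full
    "node-b": {"cpu": 2.0, "mem": 4096},  # plenty of headroom
}
print(schedule({"cpu": 0.5, "mem": 512}, nodes))  # node-b
```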
To sum it all up, this article provided a brief introduction to Kubernetes, some of its components, and their functionalities. However, we did not discuss storage for Kubernetes. As we mentioned before, Pods are ephemeral and you can lose data if they are destroyed. For that reason, you need a reliable software storage solution to keep your data safe, so that whenever a Pod fails, the new Pods can resume the task being performed by the faulty Pod.
Also, if you need or are planning to deploy stateful applications, then you need somewhere to store your data. This is where you will need a storage solution that works well with Kubernetes to provide your Pods with Persistent Volumes. If you would like to learn more about storage for Kubernetes and how to take Kubernetes to the next level with stateful applications, make sure to read our next article: What is Kubernetes Storage?.