Kubernetes is a fast-evolving ecosystem that automates tasks administrators have handled manually for years. If a process is stuck, Kubernetes restarts it. If a task needs more computational power, Kubernetes schedules it accordingly. Kubernetes does this, and a lot more, by orchestrating containers.
To keep containerized applications running, Kubernetes uses Pods, or compute units, to manage and interact with containers. Pods are ephemeral, meaning that Pods have a short lifetime. Additionally, network objects like load balancers and even IP addresses have a short lifetime as well. This short lifetime exposes a need for application storage where you can reliably and permanently store your data.
In this article, we will focus on Kubernetes storage. Therefore, it is recommended that you have some previous knowledge about Kubernetes and its components. To learn more about Kubernetes components make sure to check our previous article: What Is Kubernetes and How It Works.
Application storage is essential to Kubernetes. Without application storage, you might lose data whenever a container crashes. Also, you might not be able to share files between containers running in the same Pod.
Nowadays, there are many storage options for Kubernetes. However, for application storage to work successfully with Kubernetes, it needs to meet the following criteria:
Compute units with a short lifecycle need reliable storage to store their results or access their assets. Kubernetes has a concept for this type of persistent storage requirement: a Pod can request that a persistent volume (PV) be attached. This request is made through a so-called "Persistent Volume Claim" or "PVC."
Consider the following example of a PVC: A Pod declaration claims 10 GB of persistent storage. The Pod will only start up successfully if the system can meet this requirement. Kubernetes will try to satisfy the claim by attaching a pre-provisioned storage volume. When no suitable pre-provisioned volume is available and dynamic provisioning is set up, Kubernetes will create a new storage volume and attach it to the Pod.
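As a rough illustration of the example above, the following Python sketch builds a PVC and a Pod that mounts it, then prints both as JSON, which kubectl accepts just like YAML. The names "data-claim" and "worker", the busybox image, and the mount path are assumptions made up for this sketch, not part of any real setup.

```python
import json

# Hypothetical names for illustration: "data-claim", "worker", "/data".
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "data-claim"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        # "10Gi" means gibibytes; "10G" would request decimal gigabytes.
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "worker"},
    "spec": {
        "containers": [
            {
                "name": "app",
                "image": "busybox",
                "command": ["sleep", "3600"],
                # Mount the claimed volume inside the container.
                "volumeMounts": [{"name": "data", "mountPath": "/data"}],
            }
        ],
        "volumes": [
            {
                "name": "data",
                # The Pod refers to the claim by name, not to a volume directly.
                "persistentVolumeClaim": {"claimName": pvc["metadata"]["name"]},
            }
        ],
    },
}

# Wrap both objects in a List so they can be applied in one step.
manifest = {"apiVersion": "v1", "kind": "List", "items": [pvc, pod]}
print(json.dumps(manifest, indent=2))
```

Saving the output to a file and running kubectl apply -f on it would create both objects. The Pod stays pending until the claim is bound to a volume.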
To learn more about persistent volumes, see the official Kubernetes documentation section on persistent volumes.
A storage object in Kubernetes can either be created by a storage administrator ("Pre-provisioned Storage") or dynamically by Kubernetes itself. Kubernetes can dynamically create storage objects if it is installed with a capable storage driver. Kubernetes can use both kinds of storage to fulfill a Persistent Volume Claim.
After creation, a storage object is attached to a Pod and becomes ready to be used. For example, an application can read and write data to it.
What happens to the volume once its claim is deleted is determined by a rule called the "reclaimPolicy." Two options are in common use: "Delete" and "Retain." A third option, "Recycle," is deprecated.
For dynamically provisioned volumes, the default policy is "Delete." That means that the volume and all data on it are removed once the PVC gets deleted. This ensures that data from one use case will not end up in a different scenario. But it also means that your data is no longer available to any other process.
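On a PersistentVolume, the policy is set in the "persistentVolumeReclaimPolicy" field. The sketch below is a minimal illustration in Python that builds a pre-provisioned volume manifest with an explicit policy. The volume name, the NFS server, and the export path are hypothetical placeholders.

```python
import json

# "Recycle" also exists in the API but is deprecated, so this sketch omits it.
RECLAIM_POLICIES = ("Retain", "Delete")


def make_pv(name, size, reclaim_policy):
    """Build a manifest for a pre-provisioned, NFS-backed PersistentVolume."""
    if reclaim_policy not in RECLAIM_POLICIES:
        raise ValueError(f"unsupported reclaim policy: {reclaim_policy}")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            "persistentVolumeReclaimPolicy": reclaim_policy,
            # Hypothetical NFS export; replace with a real storage backend.
            "nfs": {"server": "nfs.example.com", "path": "/exports/results"},
        },
    }


# A volume whose data should survive the deletion of its claim.
retained_pv = make_pv("results-pv", "10Gi", "Retain")
print(json.dumps(retained_pv, indent=2))
```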
The "Retain" policy ensures that data from a former persistent volume is kept. It allows you to post-process data after a persistent volume claim is deleted. This includes:
Having administrators pre-provision storage gives you the highest degree of control over your data. On the other hand, automating this process creates a lot of flexibility for users. This is where dynamically provisioned storage kicks in.
First, Kubernetes talks to a storage system, or API. Then the storage system, in turn, hands out a storage object that can be consumed as a "PersistentVolume." The dynamically created resources can be local storage on the Kubernetes cluster nodes, simple NFS shares, or highly scalable storage objects served by a modern software-defined storage (SDS) system.
The way Kubernetes makes these requests and the way storage solutions respond to them are standardized. Both follow the specification of the "Container Storage Interface" (CSI).
Kubernetes has a plugin system that allows integrating vendor CSI drivers into any Kubernetes cluster. These volume plugins are usually accompanied by a vendor-specific client that allows consuming storage objects inside a Kubernetes cluster. Dynamic storage provisioning allows for rapid storage access within seconds. This gives users the flexibility to act without any other human being involved.
The image below shows an example of dynamic storage provisioning. It includes the Quobyte CSI plugin and the Quobyte client, which allows a Pod to consume storage. If you want to learn more about how Quobyte works with Kubernetes, check out this Kubernetes Storage page.
Not all application demands for storage are the same. This is why Kubernetes has a concept of "Storage Classes." Storage classes are referenced every time a PVC requests a certain storage object.
Storage classes are maintained by a Kubernetes administrator. This person decides which qualities of storage are available to users on the specific Kubernetes cluster.
By declaring different storage classes, it is possible to access totally different storage systems. It's also possible to access different device classes (SSD/HDD) within a single storage system, as well as storage with certain levels of redundancy, backup policies outside of Kubernetes, or immutability of files.
Declaring a StorageClass effectively means exposing the different available storage qualities to Kubernetes users.
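To make this more concrete, here is a minimal Python sketch that declares two storage classes backed by the same CSI driver and a PVC that selects one of them by name. The provisioner name "csi.example.com" and the "tier" parameter are invented for illustration. Real drivers define their own provisioner names and parameters, so check your vendor's documentation.

```python
import json

# Hypothetical CSI driver name; each vendor publishes its own.
PROVISIONER = "csi.example.com"


def make_storage_class(name, parameters, reclaim_policy="Delete"):
    """Build a StorageClass manifest for the hypothetical driver above."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": PROVISIONER,
        # Parameters are driver-specific; "tier" is an assumption of this sketch.
        "parameters": parameters,
        "reclaimPolicy": reclaim_policy,
    }


fast = make_storage_class("fast", {"tier": "ssd"})
archive = make_storage_class("archive", {"tier": "hdd"}, reclaim_policy="Retain")

# A claim picks a storage quality by referencing the class name.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "db-claim"},
    "spec": {
        "storageClassName": fast["metadata"]["name"],
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

for obj in (fast, archive, pvc):
    print(json.dumps(obj, indent=2))
```

Users only ever see the class names "fast" and "archive". The mapping to actual devices or policies stays with the administrator who maintains the classes.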
In Kubernetes, storage resources can be accessed in different modes: exclusively by one Pod, or with shared write and/or read access among different Pods. Your storage vendor must support these different modes; sometimes, only a subset will work.
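In the Kubernetes API, these modes appear as the "accessModes" values ReadWriteOnce, ReadOnlyMany, ReadWriteMany, and ReadWriteOncePod. Note that ReadWriteOnce is scoped to a single node rather than a single Pod. The following Python sketch shows one way to check a claim's requested modes against what a driver supports. The set of driver modes is a made-up example, not taken from any real vendor.

```python
# Access modes defined by the Kubernetes API, with a short description each.
ACCESS_MODES = {
    "ReadWriteOnce": "read-write, mounted by a single node",
    "ReadOnlyMany": "read-only, mounted by many nodes",
    "ReadWriteMany": "read-write, mounted by many nodes",
    "ReadWriteOncePod": "read-write, mounted by a single Pod",
}


def unsupported_modes(requested, supported):
    """Return the requested access modes that the driver does not offer."""
    unknown = [mode for mode in requested if mode not in ACCESS_MODES]
    if unknown:
        raise ValueError(f"not a Kubernetes access mode: {unknown}")
    return [mode for mode in requested if mode not in supported]


# Hypothetical driver that cannot share a writable volume across nodes.
driver_modes = {"ReadWriteOnce", "ReadOnlyMany"}

print(unsupported_modes(["ReadWriteOnce"], driver_modes))  # prints []
print(unsupported_modes(["ReadWriteMany"], driver_modes))  # prints ['ReadWriteMany']
```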
To avoid data being mounted from one Pod or application into another Pod, Kubernetes does bookkeeping. This bookkeeping stores the relationship between a PVC and a Persistent Volume in the Kubernetes database. This bi-directional link between volumes and volume claims is called "claimRef." The claim reference inside a Persistent Volume declaration specifies which PVC claimed it.
This is especially important if a volume has the reclaimPolicy "Retain" and is to be reused in a different context. A cluster administrator can control which PVC is allowed to claim this specific volume by setting the "claimRef" to a specific PVC.
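A reservation of this kind could look roughly like the sketch below, written in Python. The PersistentVolume carries a "claimRef" with the namespace and name of the one PVC allowed to bind it. The namespace "analytics", the claim name "postprocess-claim", and the NFS details are hypothetical, and the helper function is a deliberately simplified model of the rule, not the logic Kubernetes itself runs.

```python
import json

# A retained volume reserved for exactly one claim (all names are hypothetical).
pv = {
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {"name": "results-pv"},
    "spec": {
        "capacity": {"storage": "10Gi"},
        "accessModes": ["ReadWriteOnce"],
        "persistentVolumeReclaimPolicy": "Retain",
        # An empty class name keeps dynamic provisioning out of the picture.
        "storageClassName": "",
        "claimRef": {"namespace": "analytics", "name": "postprocess-claim"},
        "nfs": {"server": "nfs.example.com", "path": "/exports/results"},
    },
}


def claim_may_bind(volume, namespace, claim_name):
    """Simplified model: a volume with a claimRef accepts only that claim."""
    ref = volume["spec"].get("claimRef")
    if ref is None:
        return True  # unreserved volumes are open to any matching claim
    return ref["namespace"] == namespace and ref["name"] == claim_name


print(json.dumps(pv, indent=2))
print(claim_may_bind(pv, "analytics", "postprocess-claim"))  # prints True
print(claim_may_bind(pv, "default", "some-other-claim"))  # prints False
```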