What is Kubernetes Storage?

Kubernetes is a fast-evolving ecosystem that automates tasks that have been operational duties for administrators for years. If a process is stuck, Kubernetes restarts it. If a task needs more computational power, Kubernetes schedules that. Kubernetes does this and a lot more by orchestrating containers. To do so, it uses Pods, its basic compute units, to manage and interact with containers. If you are new to Kubernetes and the terms mentioned so far are unfamiliar, make sure to check our previous article, What Is Kubernetes, and Why Is Important?

Pods are ephemeral, meaning that they have a short lifetime. Network objects like load balancers and even IP addresses are short-lived as well. This exposes a need for application storage where you can reliably store your data.

Application storage is essential to Kubernetes. Without application storage, you might lose data whenever a container crashes. You also might not be able to share files between containers running in the same Pod. However, for application storage to work successfully with Kubernetes, it needs to meet the following criteria:

  • Application storage needs to be dynamically attached to compute units
  • Application storage needs a lifecycle separate from Pods
  • Ideally, application storage needs to be provisioned dynamically
  • Application storage needs Quality of Service (QoS)
  • Ideally, application storage needs to be accessible by more than one compute entity to allow easy scale out
  • Application storage needs to be secured by other means than network IP addresses

Dynamically Attached Storage

Compute units with a short lifecycle need reliable storage to store their results or access their assets. Kubernetes has a concept for this type of persistent storage requirement: a Pod can request that a persistent volume be attached. This request is made through a so-called "Persistent Volume Claim" or "PVC." A PVC claims, for example, 10 GB of persistent storage, and the Pod declaration references that claim. The Pod will only start up successfully if the system can meet this requirement. Kubernetes tries to satisfy the claim by attaching a pre-provisioned storage volume or by creating a new storage volume and then attaching it to the Pod.
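
The relationship between a claim and a Pod can be sketched in a few lines of Python that build both objects as plain dictionaries. This is only an illustration: the names results-claim and worker, the busybox image, and the mount path are placeholders, and the size is written as 10Gi because Kubernetes quantities are usually given in binary units. Kubernetes accepts manifests in JSON, so the printed List can be saved to a file and applied with kubectl.

```python
import json

# The claim: asks the cluster for 10Gi of storage, mountable read/write by one node.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "results-claim"},  # placeholder name
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

# The Pod: does not state a size itself, it only references the claim by name.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "worker"},  # placeholder name
    "spec": {
        "containers": [{
            "name": "app",
            "image": "busybox:1.36",  # placeholder image
            "command": ["sh", "-c", "date > /data/started && sleep 3600"],
            "volumeMounts": [{"name": "results", "mountPath": "/data"}],
        }],
        "volumes": [{
            "name": "results",
            "persistentVolumeClaim": {"claimName": pvc["metadata"]["name"]},
        }],
    },
}


def to_manifest(*objects):
    """Wrap the objects in a v1 List so they fit into one JSON file."""
    return json.dumps(
        {"apiVersion": "v1", "kind": "List", "items": list(objects)}, indent=2
    )


if __name__ == "__main__":
    print(to_manifest(pvc, pod))
```

Until the claim is bound to a volume, a Pod like this one stays in the Pending state.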

Storage Lifecycle

A storage object in Kubernetes can either be created by a storage administrator ("Pre-provisioned Storage") or dynamically by Kubernetes itself. Kubernetes can dynamically create storage objects if it is installed with a capable storage driver. Kubernetes can use both kinds of storage to fulfill a Persistent Volume Claim.
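
The two ways of fulfilling a claim can be pictured with a small toy model. The sketch below is not how Kubernetes is implemented; the PV class, the bind function, and the rule of picking the smallest volume that fits are simplifying assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PV:
    """Toy stand-in for a Persistent Volume."""
    name: str
    size_gi: int
    claimed_by: Optional[str] = None
    dynamic: bool = False  # True if created on demand by a storage driver


def bind(claim_name: str, size_gi: int, volumes: List[PV],
         can_provision: bool) -> Optional[PV]:
    """Fulfill a claim from the pool, or provision a new volume if allowed."""
    # Prefer an unclaimed, pre-provisioned volume that is large enough.
    candidates = [v for v in volumes
                  if v.claimed_by is None and v.size_gi >= size_gi]
    if candidates:
        chosen = min(candidates, key=lambda v: v.size_gi)
    elif can_provision:
        # No match in the pool: a capable storage driver creates a volume.
        chosen = PV(name=f"pvc-{claim_name}", size_gi=size_gi, dynamic=True)
        volumes.append(chosen)
    else:
        return None  # the claim would stay pending
    chosen.claimed_by = claim_name
    return chosen


if __name__ == "__main__":
    pool = [PV("static-20", 20), PV("static-50", 50)]
    print(bind("reports", 10, pool, can_provision=True))
    print(bind("video", 100, pool, can_provision=True))
    print(bind("backup", 100, pool, can_provision=False))
```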

After creation, a storage object is attached to a Pod and becomes ready to be used, i.e., an application can read and write data to it.

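What happens to a volume once its claim is deleted is determined by a rule called the reclaim policy (the "reclaimPolicy" field of a StorageClass, or "persistentVolumeReclaimPolicy" on a Persistent Volume). Two policies are in common use: "Delete" and "Retain." A third policy, "Recycle," is deprecated.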

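For dynamically provisioned volumes, the default policy is "Delete." That means the volume and all data on it are removed once the PVC is deleted. This ensures that data from one use case will not end up in a different scenario. But it also means that your data is no longer available to any other process. Manually created Persistent Volumes default to "Retain" unless specified otherwise.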

The "Retain" policy ensures that data from a former persistent volume is kept. It allows you to post-process data after a persistent volume claim has been deleted. That could include actions like archiving the data from that volume using other Pods or PVCs. It can also include fulfilling legal requirements regarding deleting data before using the same storage object again. Or it can simply ensure a four-eyes process where humans decide if data should really be deleted.
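
The difference between the two policies can be summarized in a short toy model. The Volume class and the on_claim_deleted function below are hypothetical names for illustration only; the one detail taken from Kubernetes itself is that a retained volume moves to the "Released" phase and has to be handled by an administrator before it can be claimed again.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Volume:
    """Toy stand-in for a Persistent Volume."""
    name: str
    reclaim_policy: str  # "Delete" or "Retain"
    phase: str = "Bound"


def on_claim_deleted(volume: Volume) -> Optional[Volume]:
    """Model what happens to a volume after its PVC is deleted."""
    if volume.reclaim_policy == "Delete":
        # The volume object and the data behind it are removed.
        return None
    if volume.reclaim_policy == "Retain":
        # The data stays; the volume waits for manual handling.
        volume.phase = "Released"
        return volume
    raise ValueError(f"unsupported reclaim policy: {volume.reclaim_policy}")


if __name__ == "__main__":
    print(on_claim_deleted(Volume("scratch", "Delete")))
    print(on_claim_deleted(Volume("audit-logs", "Retain")))
```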

Dynamic Storage Provisioning

Pre-provisioning storage by administrators gives you the highest degree of control over your data. On the other hand, automating this process creates a lot of flexibility for users. This is where dynamically provisioned storage comes in. Kubernetes talks to a storage system, or its API, and the storage system, in turn, hands out a storage object that can be consumed as a "PersistentVolume."

The dynamically created resources can range from local storage on the Kubernetes cluster nodes, to simple NFS shares, to highly scalable storage objects served by a modern software-defined storage (SDS) system.

The way Kubernetes makes these requests and the way storage solutions respond to them are standardized: they follow the specification of the "Container Storage Interface" (CSI). Kubernetes has a plugin system that allows integrating vendor CSI drivers into any Kubernetes cluster. From a technical point of view, these plugins are usually accompanied by a vendor-specific client that allows consuming storage objects inside a Kubernetes cluster. This kind of storage provisioning makes storage available within seconds and gives users the flexibility to act without any other human being involved.
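
From the user's side, dynamic provisioning comes down to two objects: a StorageClass that names a CSI driver, and a PVC that names the StorageClass. The Python sketch below builds both as dictionaries and prints them as JSON. The driver name csi.example.com and the object names are hypothetical; a real vendor driver documents its own provisioner string.

```python
import json

# Hypothetical CSI driver name, used only for illustration.
PROVISIONER = "csi.example.com"

storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "dynamic-example"},
    "provisioner": PROVISIONER,        # which driver creates the volumes
    "reclaimPolicy": "Delete",         # applied to volumes created by this class
    "volumeBindingMode": "Immediate",  # provision as soon as a claim appears
}

claim = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "scratch-space"},
    "spec": {
        # Naming the class is what triggers dynamic provisioning.
        "storageClassName": storage_class["metadata"]["name"],
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "5Gi"}},
    },
}

if __name__ == "__main__":
    for manifest in (storage_class, claim):
        print(json.dumps(manifest, indent=2))
```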

In the image below, you can see an example of the Quobyte CSI plugin and the Quobyte client, which together allow a Pod to consume storage. If you want to learn more about how Quobyte works with Kubernetes, check out this Kubernetes Storage page.

Dynamic Storage Provisioning - Kubernetes CSI Plugin

Quality of Service: StorageClasses

Not all applications place the same demands on storage. This is why Kubernetes has the concept of "StorageClasses." A PVC references a storage class whenever it requests a certain kind of storage object. Storage classes are maintained by a Kubernetes administrator, who decides which storage qualities are available to users of the specific Kubernetes cluster. By declaring different storage classes, it is possible to access entirely different storage systems or different device classes (SSD/HDD) within a single storage system. Storage classes can also expose storage with certain levels of redundancy, backup policies outside of Kubernetes, or immutability of files.

Declaring StorageClasses effectively means exposing the different available storage qualities to Kubernetes users.
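
As a rough illustration, an administrator could describe the available tiers in a small table and generate one StorageClass per tier. Everything specific in the sketch below is an assumption: the tier names, the driver name csi.example.com, and especially the keys under "parameters", which are invented here because each CSI driver defines its own.

```python
from typing import Dict

# Hypothetical tiers. The parameter keys are made up for illustration;
# real keys are defined by the CSI driver in use.
TIERS: Dict[str, Dict[str, str]] = {
    "fast-ssd": {"media": "ssd", "replicas": "3"},
    "archive-hdd": {"media": "hdd", "replicas": "2"},
}


def storage_class(name: str, parameters: Dict[str, str],
                  provisioner: str = "csi.example.com") -> dict:
    """Build a StorageClass object for one tier."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        "parameters": dict(parameters),
    }


def claim_for(tier: str, size: str) -> dict:
    """Build a PVC that asks for storage of the given tier."""
    if tier not in TIERS:
        raise KeyError(f"no such storage class: {tier}")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": f"{tier}-claim"},
        "spec": {
            "storageClassName": tier,
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


classes = [storage_class(name, params) for name, params in TIERS.items()]

if __name__ == "__main__":
    for sc in classes:
        print(sc["metadata"]["name"], sc["parameters"])
    print(claim_for("fast-ssd", "1Gi")["spec"])
```

Users then only choose a class name in their claim; the details behind each tier stay with the administrator.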

Access Patterns: RWO, ROX, RWX, RWOP

In Kubernetes, storage resources can be accessed in different modes: exclusively by one Pod, or with shared read and/or write access among different Pods. Your storage vendor must support these modes; sometimes, only a subset will work.

  • RWO - ReadWriteOnce: Using this mode, a single worker node can mount this volume read/write. On this worker node, multiple pods can access the data.
  • ROX - ReadOnlyMany: Using this mode, a volume can be mounted by many worker nodes in read-only mode. Thus, it can be used by many Pods or applications at the same time for reading.
  • RWX - ReadWriteMany: Using this mode, a volume can be mounted by many worker nodes in read/write mode. The application or the storage administrator needs to make sure that locking problems are mitigated.
  • RWOP - ReadWriteOncePod: This is the only mode in which a single Pod or application has exclusive read and write access to a Persistent Volume.
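
The four modes can be restated as a small decision function. This is a simplified model of the rules in the list above, not Kubernetes code; the function name can_mount and its arguments are hypothetical.

```python
from typing import List, Tuple


def can_mount(mode: str, current: List[Tuple[str, str]], node: str,
              read_only: bool = False) -> bool:
    """Return True if one more Pod on `node` may use the volume.

    `current` lists the (node, pod) pairs that already use the volume.
    """
    if mode == "ReadWriteOnce":
        # One node at a time; several Pods on that node are fine.
        return all(n == node for n, _ in current)
    if mode == "ReadOnlyMany":
        # Any number of nodes, as long as the mount is read-only.
        return read_only
    if mode == "ReadWriteMany":
        # Any number of nodes, reading and writing.
        return True
    if mode == "ReadWriteOncePod":
        # Exactly one Pod in the whole cluster.
        return not current
    raise ValueError(f"unknown access mode: {mode}")


if __name__ == "__main__":
    in_use = [("node-a", "pod-1")]
    print(can_mount("ReadWriteOnce", in_use, "node-a"))
    print(can_mount("ReadWriteOnce", in_use, "node-b"))
    print(can_mount("ReadWriteOncePod", in_use, "node-a"))
```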

To learn more about persistent volumes, see the official Kubernetes documentation section on persistent volumes.

Storage Access

To prevent a volume that belongs to one Pod or application from being mounted into another Pod, Kubernetes does bookkeeping. It stores the relationship between a PVC and a Persistent Volume in the Kubernetes database. The link is bi-directional: the Persistent Volume declaration carries a "claimRef" that names the PVC bound to it, and the PVC in turn records the name of its volume.

This is especially important if a volume has the reclaimPolicy "Retain" and is to be reused in a different context. A cluster administrator can control which PVC may claim this specific volume by setting its "claimRef" to that PVC.
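
A minimal sketch of such a reservation is shown below, again as a Python dictionary printed as JSON. The function name reserved_volume, the object names, and the NFS server and path are placeholders; the NFS backend is chosen only to keep the example short.

```python
import json


def reserved_volume(pv_name: str, claim_name: str, namespace: str) -> dict:
    """Build a Persistent Volume that only the named PVC can bind to."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": pv_name},
        "spec": {
            "capacity": {"storage": "10Gi"},
            "accessModes": ["ReadWriteOnce"],
            "persistentVolumeReclaimPolicy": "Retain",
            # An empty class name keeps the volume out of dynamic provisioning.
            "storageClassName": "",
            # The reservation: namespace and name of the one allowed claim.
            "claimRef": {"namespace": namespace, "name": claim_name},
            # Placeholder backend for illustration.
            "nfs": {"server": "nfs.example.com", "path": "/exports/archive"},
        },
    }


if __name__ == "__main__":
    pv = reserved_volume("archive-pv", "archive-claim", "reporting")
    print(json.dumps(pv, indent=2))
```

A claim with a different name or in a different namespace will not be bound to this volume, even if its size and access mode match.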
