A Not So Minor Gap
Container technology has made phenomenal progress and is getting ready to take over application infrastructures in public and in private clouds. Storage access is an important part of the container technology stack and most container orchestration systems are able to routinely provision access to storage systems as part of their orchestration process.
So at first sight, it seems that storage integration is a problem solved. However, the solution has one important gap: it doesn’t provide for secure access. As container infrastructures lack a notion of cluster-wide user identity, there is no way to use existing mechanisms of storage systems for access control. Consequently, existing data cannot be accessed in a controlled way and data written by containers has to live in a separate universe.
How Storage is Attached to Containers
Let’s start with some background on how containers and storage work together. Containers access storage by mapping a part of the host namespace into the container’s namespace. Such a mapping is often referred to as a volume (which doesn’t necessarily correspond to a volume on the underlying storage system).
On a Linux host, applications run as processes that act as a user. Users are identified by their user id (or uid), which usually has a human-readable alias, the user name. Users can be members of one or more groups, each identified by a group id (or gid) and a group name. For file system access control, the user id and group ids of the acting process are checked against permission bits and ACLs to reach an access decision.
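As a rough illustration of how uids and gids feed into an access decision, the following Python sketch mimics the classic permission-bit check. It is a simplification under stated assumptions: ACLs and the special treatment of the superuser are ignored, and the function name `may_access` is hypothetical, not part of any real API.

```python
import stat

# Permission bits per class, indexed by the requested kind of access.
_BITS = {
    "owner": {"r": stat.S_IRUSR, "w": stat.S_IWUSR, "x": stat.S_IXUSR},
    "group": {"r": stat.S_IRGRP, "w": stat.S_IWGRP, "x": stat.S_IXGRP},
    "other": {"r": stat.S_IROTH, "w": stat.S_IWOTH, "x": stat.S_IXOTH},
}


def may_access(file_uid, file_gid, mode, proc_uid, proc_gids, want):
    """Return True if a process may perform `want` ('r', 'w' or 'x').

    Simplified sketch: no ACLs, no superuser bypass (assumptions).
    """
    # Exactly one class applies: owner first, then group, otherwise other.
    if proc_uid == file_uid:
        cls = "owner"
    elif file_gid in proc_gids:
        cls = "group"
    else:
        cls = "other"
    return bool(mode & _BITS[cls][want])


# A file owned by uid 1000, group 100, with mode rw-r----- (0o640).
owner_can_write = may_access(1000, 100, 0o640, 1000, {100}, "w")
group_can_write = may_access(1000, 100, 0o640, 1001, {100}, "w")
```

The point to take away is that the decision depends only on numeric ids. If those ids mean different things on different hosts, the check still runs, but its result is no longer meaningful.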
The processes of containerized applications run in user namespaces. By design, the host namespace and the container namespace are two separate universes, and the users (i.e. uids) inside the container have no prescribed relationship to the users of the host. This separation is necessary because user and group ids in the container are totally arbitrary, disconnected from the host, and fully under the control of the container’s owner. The need for this structure becomes obvious once you realize that Docker-style Linux containers often operate internally with a uid of 0, better known as root. Mapping container uids to arbitrary host uids makes uid 0 lose its privileges and safely contains whatever is running inside the container.
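To make the mapping concrete: on Linux, a user namespace describes its uid translation in `/proc/<pid>/uid_map`, one range per line in the form `inside-id outside-id length`. The sketch below parses such text and translates a container uid to a host uid. The helper names and the example range starting at 100000 are illustrative assumptions, not values taken from any particular container runtime.

```python
def parse_uid_map(text):
    """Parse uid_map-style text into (inside, outside, length) tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            entries.append((inside, outside, length))
    return entries


def to_host_uid(entries, container_uid):
    """Translate a container uid to the host uid, or None if unmapped."""
    for inside, outside, length in entries:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None


# Hypothetical mapping: container uids 0..65535 -> host uids 100000..165535.
example_map = parse_uid_map("0 100000 65536\n")
root_on_host = to_host_uid(example_map, 0)
```

Under this example mapping, root inside the container acts as the unprivileged host uid 100000. A second host is free to choose a different range, which is exactly why the resulting host uids cannot serve as a cluster-wide identity.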
Why Access Control is Tricky
While these mechanisms allow applications to run in isolation, they leave us in a bad position if we want to do access control: a running container on a host acts under arbitrary uids that may even differ across hosts. In order to do file system access control, we need a consistent mapping of identities across all machines in the cluster.
The current workaround to make storage at least somewhat accessible is to make the file system globally readable and writable, thereby effectively disabling any access control. In order to achieve at least some kind of isolation, volumes are exclusively associated with specific containers. This solution suggests itself when building a container infrastructure with a block storage system (think EBS volumes on AWS for public clouds, or a distributed block storage system like Ceph on-premises), as a file system volume on block storage is only accessible from one host anyway.
However, when building an infrastructure with NFS or a distributed file system, the volume-per-container-instance model becomes a major functional restriction, considering that many file systems can be accessed from multiple containers spread across hosts. What’s more, there are plenty of use cases where the data lifecycle needs to be independent of the container lifecycle, where containers across hosts need shared access to the same data (as with a CMS), or where containers need to access existing data while respecting existing permissions.
Putting Identity Back Together
Since the host operating system can’t provide us with a user identity, we need another form of identity for a running container so that file system access control can do its job. Here we can make use of the container infrastructure: it knows exactly who owns a running container and decides who is allowed to change its state (for example, stop it). Depending on the container platform, this identity can take several forms. Kubernetes, for example, has namespaces and service accounts. These identities need not be related to operating system-level identities and are an artifact of the container infrastructure. Unfortunately, not all container infrastructures have such mechanisms in place.
Even if we don’t want to tie together the identities for managing containers and managing access to file systems, we still need the support of the container infrastructure: it’s the only place where we can establish a strong form of identity that is valid across hosts.
Assuming the infrastructure provides us with a cluster-wide user identity, we need to do two things: make the identity available on the host and map all container access to this identity.
Cluster-Wide Identity is Possible After All
With the Quobyte Kubernetes volume plugin, we built a proof of concept for this approach. The plugin takes the namespace and service account name of a given container from the pod description and uses them as the identity required for file system access control (i.e. as the user and tenant accessing the file system volumes). It also configures Quobyte’s native FUSE client to map any access from the particular container to the given identity. In effect, any uid in the container (be it root or an arbitrary user) acts as the infrastructure-level identity when accessing the file system, and normal file system access control mechanisms (permissions and ACLs) can be used to control access to data.
For example, a container running as the service account alice will create files that are owned by alice, and all of its file system access will be checked with the container acting as user alice. This holds both for files created from the container and for files that were created outside the container infrastructure.
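The idea can be sketched in a few lines of Python. This is not the plugin's actual code: the `Identity` type and both function names are hypothetical, and treating the namespace as the tenant and the service account as the user is an assumption based on the description above. The pod fields `metadata.namespace` and `spec.serviceAccountName` are standard Kubernetes fields, and both fall back to `default` when unset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    tenant: str  # assumed to come from the Kubernetes namespace
    user: str    # assumed to come from the service account name


def identity_from_pod(pod):
    """Derive the infrastructure-level identity from a pod description."""
    namespace = pod.get("metadata", {}).get("namespace", "default")
    account = pod.get("spec", {}).get("serviceAccountName", "default")
    return Identity(tenant=namespace, user=account)


def effective_identity(pod, container_uid):
    """Identity used for file system access control.

    The container uid is deliberately ignored: root and any other
    in-container user all act as the pod's infrastructure identity.
    """
    return identity_from_pod(pod)


# Hypothetical pod description for a container run by service account alice.
pod = {
    "metadata": {"namespace": "team-a"},
    "spec": {"serviceAccountName": "alice"},
}
as_root = effective_identity(pod, 0)
as_user = effective_identity(pod, 1000)
```

Both calls yield the same identity, which is the essence of the approach: what the file system sees is determined by who owns the container, not by which uid happens to be used inside it.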
Secure access control to storage is an essential feature of a production-grade container infrastructure. A proof of concept with Quobyte’s Kubernetes plugin shows how to solve the problem, but any solution of this kind needs a container infrastructure with a notion of a cluster-wide identity along with a volume driver that is able to establish the corresponding mappings.