When designing storage solutions for your users, you may come across a key challenge: storage isolation. How do you ensure that user A cannot see or delete data belonging to user B?
There are two strategies: First, you can build physically separated customer environments, including storage targets. And second, you can rely on a logical layer of separation and share storage resources.
In this article we will compare the two approaches so you can understand the benefits of each.
Storage Isolation – Physically Separated Environments
Consider the following simple video encoding workflow:
As you can see, the infrastructure pattern is exactly the same; the only small difference is the storage target. And there is a very good reason for that: you do not want to deliver test footage (probably cat videos) to your users when they are paying for the latest blockbuster.
Needing two different storage targets leaves you with two choices. The first is to use separate storage clusters, which provides a very good level of storage isolation but is very costly. The second is to use a storage solution that can identify and differentiate between production and development clients.
Storage Isolation – Logical Layer Separation
With the second choice, NFS falls short. Access control for NFS is usually based on IP addresses, which is something you cannot rely on in a containerized world. NFS is also not the best choice from a performance point of view. In the second stage of our example, the transcoding process, you will only be fast if a large number of clients can access the storage in parallel. NFS suffers from bottlenecks because all requests must traverse a single node.
The solution to NFS bottlenecks is to use a parallel file system. With a parallel file system, you benefit from distributed read and write traffic across the whole storage cluster.
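To see why this matters, here is a back-of-the-envelope model in Python. All numbers are illustrative assumptions, not benchmarks: the point is simply that spreading traffic across N storage nodes divides the transfer time by N, while a single NFS head caps you at one node's bandwidth.

```python
# Illustrative model: with NFS every request funnels through one server;
# with a parallel file system, clients read stripes from all storage
# nodes at once. Numbers below are made up for the example.

def transfer_time_seconds(total_gb: float, node_gbps: float, num_nodes: int) -> float:
    """Time to move total_gb when traffic is spread evenly across
    num_nodes, each sustaining node_gbps gigabytes per second."""
    return total_gb / (node_gbps * num_nodes)

# 1 TB of source footage, 2 GB/s per storage node:
single_head = transfer_time_seconds(1000, 2, 1)  # NFS-style single server
parallel = transfer_time_seconds(1000, 2, 8)     # 8-node parallel cluster

print(f"single server: {single_head:.0f}s, parallel cluster: {parallel:.0f}s")
# → single server: 500s, parallel cluster: 62s
```

The model ignores metadata overhead and striping granularity, but the scaling behavior is the core advantage a parallel file system offers a transcoding farm.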
Let’s summarize what we just discussed. To be able to support modern transcoding pipelines, your storage should:
- Support multi-tenancy for security reasons
- Be a parallel storage system for performance reasons
There are some things on the storage system side that can make your life easier. Storage provisioning should be doable from within Kubernetes, so that it can be automated along with everything else. And of course, it would be nice not to have to run an S3 proxy within Kubernetes. What if your storage provider already included it?
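Provisioning from within Kubernetes usually means creating a PersistentVolumeClaim against a storage class backed by your storage provider's CSI driver. The sketch below builds such a manifest as a plain Python dict; the storage class name `quobyte-dev` and the claim name are assumptions for illustration, not values from any real cluster.

```python
# Sketch: a PersistentVolumeClaim manifest that triggers dynamic
# provisioning when applied. The storage class "quobyte-dev" is an
# illustrative assumption; use whatever class your CSI driver defines.

def make_pvc(name: str, storage_class: str, size_gi: int) -> dict:
    """Build a PVC manifest requesting size_gi gibibytes from storage_class."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            # Many transcoding pods mount the same volume in parallel:
            "accessModes": ["ReadWriteMany"],
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }

pvc = make_pvc("transcode-scratch", "quobyte-dev", 500)
```

In practice you would serialize this dict to YAML and `kubectl apply` it, or hand it to a Kubernetes client library, and your CI pipeline can create and tear down volumes without any manual storage-admin step.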
We can say that importing content is a solved exercise. We can also say that reducing complexity is always a good idea. So, if that is the case, why not use the very same Object Storage for delivering content? That simplifies things again. Let’s paint it in nice Quobyte colors:
Storage Isolation Options
First, you have the traditional way: a separate testing infrastructure targeting an isolated storage system, plus a production stack running somewhere else. On the “pro” side, you obtain strong isolation. On the “con” side, you pay for two storage solutions and two infrastructure environments (such as Kubernetes clusters), and every time you need a new environment, you need a new storage cluster.
The other alternative is to rely on logical separation. You can run different workloads isolated in Kubernetes namespaces, and you can separate storage access on a logical level as well. When a user mounts a development storage unit (a.k.a. volume), they are forced to authenticate. With that authentication, there is no way to access production storage at all, simply because the two belong to different tenants.
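The tenant model can be sketched in a few lines of Python. All names here are illustrative assumptions: the point is that every volume is owned by exactly one tenant, and a credential is only valid inside its own tenant, so cross-tenant access fails by construction rather than by a rule someone must remember to configure.

```python
# Minimal sketch of tenant-scoped access control (tenant and volume
# names are illustrative, not from any real deployment).

VOLUME_TENANT = {
    "dev-footage": "development",
    "blockbusters": "production",
}

def can_mount(credential_tenant: str, volume: str) -> bool:
    """A credential may only mount volumes owned by its own tenant."""
    return VOLUME_TENANT.get(volume) == credential_tenant

assert can_mount("development", "dev-footage")
assert not can_mount("development", "blockbusters")  # isolation holds
```

There is no “deny” rule to get wrong: a development credential simply has no path to a production volume, which is exactly the property you want from logical separation.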
Built on that foundation, you can scale production workloads and replicate infrastructure as needed.