Fault-Tolerant Container Infrastructures – The Essentials
Let’s say you want to build a reliable and efficient container-based infrastructure like Google’s. The best way to go is to turn your Linux servers into a single fault-tolerant, software-defined machine. Your first ingredient is a container scheduler like Mesos or Kubernetes. It starts containers and keeps them running – thereby breaking the limits of individual machines. Your second ingredient is a fault-tolerant distributed file system – like Quobyte. It must be able to serve application data to whichever host the container is scheduled on.
Putting the two together, you get a fault-tolerant application infrastructure. When machines fail, the scheduler will detect it and schedule the application container elsewhere – resources and constraints permitting. The distributed file system makes sure that the application data is accessible from the new host.
Communication Gone Awry: Zombie Containers
Distributed systems come with a number of intricacies. One of them is that applications that work with data often need exactly one running instance – otherwise the lack of coordination between instances can lead to data corruption. Even if we configure our scheduler accordingly, it may falsely believe a container instance to be dead: machines can be overloaded or suffer intermittent network connectivity problems. In that case the scheduler concludes that the machine and all the containers it hosts are gone. Consequently, it starts rescheduling new instances across the cluster. The result is zombie containers: the scheduler assumes they are dead, while they actually still go about and do their job.
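The failure detector behind this behavior can be sketched in a few lines. This is a toy model, not any real scheduler’s code: the heartbeat timeout, node names, and placement decision are all made up for illustration. Note that declaring a node dead and rescheduling says nothing about whether the old container actually stopped.

```python
import time

HEARTBEAT_TIMEOUT = 10.0  # seconds; hypothetical value


class Scheduler:
    """Toy failure detector: a node that misses heartbeats is declared dead."""

    def __init__(self):
        self.last_heartbeat = {}  # node -> timestamp of last heartbeat
        self.containers = {}      # container -> node it currently runs on

    def heartbeat(self, node):
        self.last_heartbeat[node] = time.monotonic()

    def failover(self, now=None):
        now = time.monotonic() if now is None else now
        rescheduled = []
        for container, node in list(self.containers.items()):
            if now - self.last_heartbeat.get(node, 0.0) > HEARTBEAT_TIMEOUT:
                # The scheduler only *assumes* the node is dead. If it is
                # merely overloaded or partitioned, the old container keeps
                # running -- a zombie -- while a new instance is started.
                new_node = "node-b"  # placeholder placement decision
                self.containers[container] = new_node
                rescheduled.append((container, new_node))
        return rescheduled
```

A node that is just slow looks exactly like a dead one here, which is the root of the zombie problem.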
The Consequences of Zombification
This violates the uniqueness condition, and we end up with two application instances talking to the same data storage. If both instances start writing to the same file, we face different failure scenarios: for one, the writes may interleave, leaving a mixture of both instances’ data in the file. For another, one write may silently overwrite the other, breaking data consistency. In the final and worst case, we lose the entire file.
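The first scenario is easy to reproduce locally. In this sketch, two file handles to the same file stand in for two container instances; the second writer is unaware of the first and clobbers part of its data:

```python
import os
import tempfile

# Two handles to the same file stand in for two container instances.
path = os.path.join(tempfile.mkdtemp(), "data.db")
with open(path, "w") as instance_one:
    instance_one.write("AAAAAAAAAA")  # instance 1 writes its record

# A zombie instance, unaware of instance 1, overwrites part of the file.
with open(path, "r+") as zombie:
    zombie.seek(4)
    zombie.write("BBB")

with open(path) as f:
    content = f.read()
print(content)  # "AAAABBBAAA" -- a mixture of both writers' data
```

Neither writer sees an error; the corruption is only visible to whoever reads the file afterwards.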
Imagine a MySQL database that holds all your users’ information. Any of the three cases just discussed would be disastrous. For example, simultaneous writes from both instances can wreck the entire database, or an integral part of it if the database is partitioned. Data loss ensues.
Containing the Threat: Implicit Locking
There are several ways to contain the threat. Some apply at scheduler level – like leases that are well-timed with restart delays. Others apply in collaboration with the application – like instance versioning. Quobyte takes a different route and adds implicit locking into the mix. It’s a tool that works particularly well with applications that keep their data in files, like the database in our example (and file-backed databases in general).
By now you may be thinking that file system interfaces like POSIX already support coordinating concurrent access to data via locks. So what’s new? Well, since many applications were designed without the failure modes of distributed systems in mind, these locks are rarely used. With implicit locking, we added a fully configurable mechanism to Quobyte that acquires application-bound locks for any file the application opens and keeps the locks active as long as the application is alive. This ensures that when a second application instance is started to operate on the same data, it blocks when trying to acquire the lock. Or, depending on configuration, it gets an IO error.
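The effect is analogous to what POSIX advisory locks provide when applications actually use them. The following sketch uses `fcntl.flock` on a local file to mimic the behavior – it is an analogy, not Quobyte’s implementation, which applies the locking transparently beneath the application:

```python
import fcntl
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shared.dat")
open(path, "w").close()

# First instance: acquire an exclusive advisory lock and hold it while alive.
holder = open(path, "r+")
fcntl.flock(holder, fcntl.LOCK_EX)

# Second (zombie) instance: a non-blocking attempt fails immediately,
# analogous to the "IO error" mode of implicit locking. Without LOCK_NB,
# the call would block instead -- the other configurable behavior.
intruder = open(path, "r+")
try:
    fcntl.flock(intruder, fcntl.LOCK_EX | fcntl.LOCK_NB)
    blocked = False
except BlockingIOError:
    blocked = True
print(blocked)  # True: the zombie cannot touch the data
```

The key difference is that here the application must call `flock` itself, whereas implicit locking acquires and releases the locks on the application’s behalf.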
Of course, there are questions around file-opening order that require more careful attention. But implicit locking solves the problem of data corruption caused by zombie containers for many legacy applications. And it does so in a completely unintrusive way.