What is a Distributed File System (DFS)?

DFS (distributed file system), as the name suggests, is a file system that is distributed across multiple file servers or multiple locations. Its primary purpose is to reliably store data or, more specifically, files.

A distributed system is composed of several servers connected via a computer network, such as Ethernet or the internet.

The opposite of a distributed system is a centralized system, e.g., a single server or storage appliance.

There are several advantages of a distributed system over a centralized one, depending on the capabilities of the distributed (storage) system:

Scale-out

Scale-out refers to a system's ability to grow along certain dimensions simply by adding more components. A distributed system that runs across multiple servers scales out by adding more machines, and with the proper architecture it can grow to very large clusters with thousands of servers. The distributed system aggregates the performance and capacity of all those servers to achieve results that are impossible for a centralized system or a single server. Distributed file systems therefore allow building very large systems that cannot be built with centralized ones: since there is a limit to how fast a single server can be and how much storage it can hold, a centralized system can never match the speed or storage capacity of a distributed file system.
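As a rough back-of-the-envelope sketch (all numbers below are made up purely for illustration), the aggregate capacity and throughput of a cluster grow linearly with the number of servers:

    # Illustrative only: aggregate capacity and bandwidth scale with server count.
    servers = 1000
    capacity_per_server_tb = 100      # e.g. 100 TB of disks in each server
    bandwidth_per_server_gbs = 5      # e.g. ~5 GB/s of disk/network bandwidth

    cluster_capacity_pb = servers * capacity_per_server_tb / 1000
    cluster_bandwidth_gbs = servers * bandwidth_per_server_gbs

    print(f"Usable capacity:     {cluster_capacity_pb:.0f} PB")
    print(f"Aggregate bandwidth: {cluster_bandwidth_gbs} GB/s")

No single server could provide 100 PB of capacity or 5,000 GB/s of bandwidth on its own, which is exactly the point of scale-out.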

Distributed file systems can handle larger loads than centralized systems because their protocols are designed to spread work across many servers: very large loads can be distributed efficiently over the whole cluster. A centralized system, e.g., a single server exporting NFS (Network File System), suffers from performance bottlenecks because that one server has to handle the entire load. NFS was designed for single-server setups and has no load-balancing capabilities, which is why it is a poor choice for distributed systems. With a distributed file system, you do not have to worry about load balancing because it happens automatically.
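As a minimal sketch of the idea (a generic hash-based placement example, not Quobyte's or any particular system's actual placement logic), a client can deterministically map each file to one of many servers so that no single machine serves all requests:

    # Generic illustration: spread files across servers by hashing the file name.
    import hashlib

    SERVERS = ["server-01", "server-02", "server-03", "server-04"]

    def pick_server(file_name: str) -> str:
        """Deterministically map a file to one of the servers."""
        digest = hashlib.sha256(file_name.encode()).hexdigest()
        return SERVERS[int(digest, 16) % len(SERVERS)]

    for name in ("genome.dat", "render_0001.exr", "app.log"):
        print(name, "->", pick_server(name))

Real distributed file systems use more sophisticated placement and can also stripe a single file across several servers, but the effect is the same: requests land on many machines instead of one.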

Because distributed file systems can scale out, there is no need to scale up. Scaling up, in contrast to scaling out, means making a single component larger or faster so that it can handle a greater load.

Fault tolerance

Leslie Lamport, one of the most accomplished researchers in distributed systems, famously said, “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.” He later contributed one of the most significant algorithms for distributed systems, Paxos, which helps build fault-tolerant distributed systems.

Fault tolerance is the property that enables a system to continue operating properly when some of its servers or disks fail, or when one or more faults occur within them. A fault-tolerant distributed system handles such component failures by spreading data redundantly across multiple machines. Thus, a distributed system achieves much better availability and data durability than any centralized system can.
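A minimal sketch of the underlying idea, using a toy in-memory API invented purely for illustration (Replica, write_block, and MIN_ACKS are not from any real product): a write is only acknowledged once enough copies exist, so the failure of a single server loses no data.

    # Toy synchronous-replication sketch; the API is hypothetical.
    MIN_ACKS = 2                          # copies required for a durable write

    class Replica:
        """In-memory stand-in for a storage server."""
        def __init__(self, up=True):
            self.up = up
            self.blocks = {}

        def store(self, block_id, data):
            if not self.up:
                raise ConnectionError("replica unreachable")
            self.blocks[block_id] = data

    def write_block(replicas, block_id, data):
        acks = 0
        for replica in replicas:
            try:
                replica.store(block_id, data)
                acks += 1
            except ConnectionError:
                continue                  # tolerate individual server failures
        if acks < MIN_ACKS:
            raise IOError("not enough replicas reachable; write not durable")
        return acks

    # One of three replicas is down; the write still succeeds with two copies.
    cluster = [Replica(), Replica(up=False), Replica()]
    print(write_block(cluster, "block-42", b"payload"), "copies written")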

Cost

When a distributed system is properly built, both scale-out and fault tolerance help reduce its cost compared to a centralized system. When you try to scale up, the cost grows disproportionately: at some point the system needs the fastest, and therefore most expensive, resources such as CPUs or network cards, and you then also have to tune those resources to get the best results out of them.

Scaling out, on the other hand, can be done with smaller and cheaper components and still achieve better performance than a single super-fast computer built from expensive hardware. That is possible because, in a distributed system, you get the sum of all components: instead of relying on a single device, distributed file systems benefit from the combined resources of every machine in the cluster.

Also, because distributed systems are fault-tolerant, you do not need the most reliable hardware: a component can fail, and the system takes care of it. For a centralized system, by contrast, you have to buy the most expensive hardware in the hope of avoiding failures, because that single system must be made as reliable as possible at all costs.

The Challenge of Distributed Storage Systems

In the context of storage, the challenge of a distributed file system or storage system is to store data redundantly across multiple servers so that the outage of a single server leads to neither data loss nor unavailability, while at the same time guaranteeing the consistency of the data. The consistency part is the trickiest because the distributed file system must create the illusion of a centralized system. Consistency has been the subject of extensive research; going into the details of how it is achieved is outside the scope of this article (a small illustrative sketch follows the list below). If you would like to learn more about this topic, make sure to check out the following articles written by our founders and developers:

  • B. Kolbeck, M. Högqvist, J. Stender, F. Hupfeld. “Flease – Lease Coordination without a Lock Server”. 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2011). [PDF]
  • J. Stender, M. Högqvist, B. Kolbeck. “Loosely Time-Synchronized Snapshots in Object-Based File Systems”. 29th IEEE International Performance Computing and Communications Conference (IPCCC 2010). [PDF]
  • J. Stender, B. Kolbeck, M. Högqvist, F. Hupfeld. “BabuDB: Fast and Efficient File System Metadata Storage”. 6th IEEE International Workshop on Storage Network Architecture and Parallel I/Os (SNAPI 2010). [PDF]
  • F. Hupfeld, B. Kolbeck, J. Stender, M. Högqvist, T. Cortes, J. Malo, J. Marti. “FaTLease: Scalable Fault-Tolerant Lease Negotiation with Paxos”. Cluster Computing 2009. [PDF]
  • J. Stender, B. Kolbeck, F. Hupfeld, E. Cesario, E. Focht, M. Hess, J. Malo, J. Marti. “Striping without Sacrifices: Maintaining POSIX Semantics in a Parallel File System”. 1st USENIX Workshop on Large-Scale Computing (LASCO ’08), Boston, 2008. [PDF]
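To give a flavor of the consistency problem, here is one well-known textbook technique, quorum replication; this is a generic illustration, not the specific approach taken in the papers above or in Quobyte. With N replicas, requiring W acknowledgements per write and reading from R replicas such that R + W > N guarantees that the read and write sets overlap, so a read always sees the latest acknowledged write:

    # Generic quorum illustration: R + W > N forces read/write sets to overlap.
    N, W, R = 3, 2, 2
    assert R + W > N

    class Replica:
        def __init__(self):
            self.version, self.value = 0, None

    def write(replicas, value, version):
        for rep in replicas[:W]:          # in reality: any W reachable replicas
            rep.version, rep.value = version, value

    def read(replicas):
        answers = [(rep.version, rep.value) for rep in replicas[-R:]]
        return max(answers)               # the newest version wins

    replicas = [Replica() for _ in range(N)]
    write(replicas, "hello", version=1)
    print(read(replicas))                 # (1, 'hello'): the read saw the write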

Quobyte is a distributed file system that protects your data using synchronous replication and erasure coding. Quobyte’s architecture is based on 15 years of research and development and, as a result, delivers linear scalability of performance and capacity up to hundreds of petabytes.

