A DFS (distributed file system), as the name suggests, is a file system that is distributed across multiple file servers or multiple locations. Its primary purpose is to reliably store data or, more specifically, files.
A distributed system is composed of several servers connected via a computer network, such as Ethernet or the internet.
The opposite of a distributed system is a centralized system, e.g., a single server or storage appliance.
There are several advantages of a distributed system over a centralized one, depending on the abilities of the distributed (storage) system:
Scale-out refers to the ability of a system to grow along certain dimensions as you add more components. A distributed system that runs across multiple servers can scale out simply by adding more machines. With the proper architecture, distributed systems can grow to very large clusters with thousands of servers, aggregating the performance and capacity of all of them to achieve results that are impossible with a centralized system or a single server. Distributed file systems therefore allow building very large systems that cannot be built with centralized systems: there is a limit to how fast a single server can be and how much storage it can hold, so a centralized system can never reach the speed or storage capacity of a distributed file system.
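To make that aggregation concrete, here is a minimal back-of-the-envelope sketch in Python; the per-server numbers are hypothetical and only illustrate how capacity and throughput add up across a cluster.

```python
# Back-of-the-envelope illustration of scale-out aggregation.
# The per-server numbers below are hypothetical examples, not
# measurements of any particular system.

servers = 1000                  # number of machines in the cluster
capacity_per_server_tb = 100    # usable storage per server (TB)
throughput_per_server_gbps = 3  # sustained throughput per server (GB/s)

cluster_capacity_pb = servers * capacity_per_server_tb / 1000
cluster_throughput_gbps = servers * throughput_per_server_gbps

print(f"Aggregate capacity:   {cluster_capacity_pb:.0f} PB")
print(f"Aggregate throughput: {cluster_throughput_gbps} GB/s")
# A single server, however large, is capped at its own capacity and
# throughput; the cluster total grows with every machine you add.
```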
Distributed file systems can handle larger loads than centralized systems because their protocols allow them to spread even very large loads efficiently across multiple servers. A centralized system, e.g., one based on NFS (Network File System), suffers from performance bottlenecks because a single server has to handle the entire load. NFS was designed for single-server setups and has no load-balancing capabilities, which makes it a poor choice for distributed workloads. With a distributed file system, you do not have to worry about load balancing because the system does it automatically.
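As a rough illustration of how load can be spread automatically, the following sketch hashes each file path to one of several servers. This is a deliberately simplified placement scheme for illustration only; real distributed file systems use more sophisticated placement and rebalancing policies.

```python
import hashlib

# Simplified illustration of spreading requests across servers by
# hashing the file path. The server names are hypothetical.

SERVERS = ["server-1", "server-2", "server-3", "server-4"]

def pick_server(path: str) -> str:
    """Map a file path to a server deterministically."""
    digest = hashlib.sha256(path.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(SERVERS)
    return SERVERS[index]

# Distribute 10,000 files and count how many land on each server.
files = [f"/data/file{i:05d}" for i in range(10_000)]
load = {s: 0 for s in SERVERS}
for f in files:
    load[pick_server(f)] += 1

print(load)  # each server ends up with roughly a quarter of the files
```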
Because distributed file systems can scale out, there is no need to scale up. Unlike scaling out, scaling up means making a single component larger or faster so it can handle a greater load.
Leslie Lamport, one of the most accomplished researchers in distributed systems, famously said, "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." He later contributed one of the most significant algorithms for distributed systems: Paxos, which makes it possible to build fault-tolerant distributed systems.
Fault tolerance is the property that enables a system to continue operating properly when some of its servers or disks fail (or when one or more faults occur within them). A fault-tolerant distributed system handles such component failures by spreading data across multiple machines. Thus, a distributed system offers much better availability and data durability than you can achieve with any centralized system.
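To illustrate the idea of spreading data for fault tolerance, here is a toy sketch of N-way replication: each write is stored on several servers, and a read succeeds as long as at least one replica is still alive. This is a minimal model of the concept, not how any specific file system implements replication.

```python
# Toy model of N-way replication: a write is stored on several
# servers so the data survives the failure of some of them.

class Server:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.blocks = {}

def write(servers, key, data, replicas=3):
    """Store the data on the first `replicas` healthy servers."""
    targets = [s for s in servers if s.alive][:replicas]
    if len(targets) < replicas:
        raise RuntimeError("not enough healthy servers for the replication factor")
    for s in targets:
        s.blocks[key] = data

def read(servers, key):
    """Return the data from any healthy server that holds a replica."""
    for s in servers:
        if s.alive and key in s.blocks:
            return s.blocks[key]
    raise RuntimeError("data unavailable: all replicas lost")

cluster = [Server(f"server-{i}") for i in range(5)]
write(cluster, "file.txt", b"hello")

cluster[0].alive = False            # one replica's server fails...
print(read(cluster, "file.txt"))    # ...the data is still readable
```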
When a distributed system is properly built, both scale-out and fault tolerance help reduce its cost compared to a centralized system. When you try to scale up, costs rise steeply: at some point the system needs the fastest and most expensive components, such as high-end CPUs or network cards, and you then also have to tune those components to get the best results out of them.
On the other hand, you can scale out with smaller and cheaper components and still achieve better performance than a single super-fast computer built from expensive hardware. That is possible because, in a distributed system, you get the combined performance of all components: instead of relying on a single device, a distributed file system draws on the aggregated resources of every server in the cluster.
Also, because distributed systems are fault-tolerant, you do not need the most reliable hardware: a component can fail, and the system takes care of it. With a centralized system, by contrast, you have to buy the most expensive hardware in the hope of avoiding failures, because that single system has to be as reliable as possible at any cost.
In the context of storage, the challenge for a distributed file or storage system is to store data redundantly across multiple servers so that the outage of a single server leads neither to data loss nor to unavailability, while at the same time guaranteeing the consistency of the data. The consistency part is the trickiest because the distributed file system must create the illusion of a centralized system. Consistency has been the subject of extensive research; however, the details of how to achieve it are outside the scope of this article. If you would like to learn more about this topic, please make sure to check out the following articles written by our founders and developers:
Quobyte is a distributed file system that protects your data using synchronous replication and erasure coding. Quobyte's architecture is based on 15 years of research and development and, as a result, gives you linear scalability of performance and capacity up to hundreds of petabytes.
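For a rough intuition of what erasure coding means, here is a minimal XOR-parity sketch: k data blocks plus one parity block can survive the loss of any single block. Production erasure codes (and Quobyte's implementation) are more general and tolerate multiple losses; this example only conveys the basic principle.

```python
# Minimal XOR-parity sketch: one parity block protects k data blocks
# against the loss of any single block. This illustrates the principle
# of erasure coding, not any particular product's implementation.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

data_blocks = [b"AAAA", b"BBBB", b"CCCC"]   # k = 3 data blocks
parity = xor_blocks(data_blocks)            # 1 parity block

# Simulate losing one data block and rebuilding it from the rest.
lost_index = 1
surviving = [b for i, b in enumerate(data_blocks) if i != lost_index]
recovered = xor_blocks(surviving + [parity])

assert recovered == data_blocks[lost_index]
print("recovered block:", recovered)
```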