What is a Distributed File System (DFS)?

A distributed file system (DFS), as the name suggests, is a file system that is distributed across multiple file servers or locations. Its primary purpose is to reliably store data or, more specifically, files.

Let's start with the "distributed system" part. The opposite of a distributed system is a centralized system, e.g. a single server or storage appliance. A distributed system is composed of several servers that are connected via a computer network, such as Ethernet or the internet.

There are several advantages of a distributed system over a centralized one, depending on the abilities of the distributed (storage) system:

Scale-out

A distributed system that works across multiple servers can grow by adding more machines. This is known as scaling out. Distributed systems with the right architecture can scale to very large clusters with thousands of servers. A centralized system, or a single server, cannot match the performance or storage capacity of such a cluster.
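As a rough illustration of scaling out, the sketch below spreads files over a list of servers with a stable hash and shows that aggregate capacity grows with every machine added. The function names, the server names and the modulo-based placement rule are assumptions made for this example, not a description of any particular product.

```python
import hashlib


def pick_server(file_name: str, servers: list[str]) -> str:
    """Map a file name to one server using a stable hash (hypothetical placement rule)."""
    digest = hashlib.sha256(file_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def total_capacity_tb(servers: list[str], capacity_per_server_tb: int) -> int:
    """Aggregate raw capacity, assuming every server contributes the same amount."""
    return len(servers) * capacity_per_server_tb


servers = ["s1", "s2", "s3"]
print(pick_server("report.txt", servers))        # one of s1, s2, s3
print(total_capacity_tb(servers, 10))            # capacity with three servers
print(total_capacity_tb(servers + ["s4"], 10))   # capacity after adding a fourth
```

Simple modulo placement moves most files to a different server whenever the cluster grows, so real systems generally use placement schemes that limit this reshuffling.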

Scale-up

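Unlike scaling out, scaling up means making a single component larger or faster to handle a greater load, for example by adding CPUs, memory or disks to one server. A single machine can only be upgraded so far, which is where scaling out takes over.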

Fault tolerance

Leslie Lamport, one of the most accomplished researchers in distributed systems, famously said, "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." He later contributed some of the most significant algorithms, such as Paxos, that made it possible to build fault-tolerant distributed systems.

Fault tolerance is the property that enables a system to continue operating properly when some of its servers or disks fail. A fault-tolerant distributed system handles such failures by spreading data across multiple machines. As a result, it can offer much better availability and data durability than a centralized system.
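The idea of spreading data across machines can be sketched as simple replication: every file is copied to several servers, and a read succeeds as long as one copy is reachable. The `Cluster` class, its placement rule and the replica count of three are hypothetical choices for this illustration only.

```python
class Cluster:
    """Toy in-memory cluster that keeps several copies of every file."""

    def __init__(self, server_names, replicas=3):
        self.stores = {name: {} for name in server_names}  # per-server storage
        self.down = set()                                   # servers currently failed
        self.replicas = replicas

    def _targets(self, key):
        # Hypothetical placement: pick `replicas` consecutive servers.
        names = sorted(self.stores)
        start = sum(key.encode("utf-8")) % len(names)
        return [names[(start + i) % len(names)] for i in range(self.replicas)]

    def write(self, key, data):
        # Simplification: assumes all target servers are up at write time.
        for name in self._targets(key):
            self.stores[name][key] = data

    def read(self, key):
        # Any single surviving replica is enough to serve the read.
        for name in self._targets(key):
            if name not in self.down and key in self.stores[name]:
                return self.stores[name][key]
        raise IOError("all replicas unavailable")


cluster = Cluster(["s1", "s2", "s3", "s4"])
cluster.write("report.txt", b"hello")
failed = cluster._targets("report.txt")[:2]
cluster.down.update(failed)              # two of the three replica servers fail
print(cluster.read("report.txt"))        # still readable from the third copy
```

With three copies, the file stays readable after two server failures. Keeping those copies consistent while servers fail and recover is the harder problem.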

The Challenge of Distributed Storage Systems

In the context of storage, the challenge of a distributed file system or storage system is to store data redundantly across multiple servers, so that the outage of a single server doesn't lead to data loss or unavailability, while at the same time guaranteeing the consistency of that data. Consistency is the trickiest part because the distributed file system must create the illusion of a centralized system. This has been the subject of extensive research in computer science, some of it done by our own founders and developers, which you can find here:

  • B. Kolbeck, M. Högqvist, J. Stender, F. Hupfeld. “Flease - Lease Coordination without a Lock Server”. 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2011).
  • J. Stender, M. Högqvist, B. Kolbeck. “Loosely Time-Synchronized Snapshots in Object-Based File Systems”. 29th IEEE International Performance Computing and Communications Conference (IPCCC 2010).
  • J. Stender, B. Kolbeck, M. Högqvist, F. Hupfeld. “BabuDB: Fast and Efficient File System Metadata Storage”. 6th IEEE International Workshop on Storage Network Architecture and Parallel I/Os (SNAPI 2010).
  • F. Hupfeld, B. Kolbeck, J. Stender, M. Högqvist, T. Cortes, J. Malo, J. Marti. “FaTLease: Scalable Fault-Tolerant Lease Negotiation with Paxos”. Cluster Computing, 2009.
  • J. Stender, B. Kolbeck, F. Hupfeld, E. Cesario, E. Focht, M. Hess, J. Malo, J. Marti. “Striping without Sacrifices: Maintaining POSIX Semantics in a Parallel File System”. 1st USENIX Workshop on Large-Scale Computing (LASCO '08), Boston, 2008.
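
To make the tension between redundancy and consistency more concrete, here is a minimal sketch of one textbook approach, quorum replication with version numbers. It is not the method of the papers listed above, which center on leases, snapshots, metadata storage and striping, and the constants and function names are assumptions for this example. With N replicas, a write needs W acknowledgements and a read needs R answers; choosing R + W > N means every read overlaps the latest successful write.

```python
N, W, R = 3, 2, 2  # R + W > N, so read and write quorums always share a replica


def write(replicas, reachable, key, value, version):
    """Store (version, value) on the reachable replicas; fail without a write quorum."""
    if len(reachable) < W:
        return False
    for i in reachable:
        replicas[i][key] = (version, value)
    return True


def read(replicas, reachable, key):
    """Ask the reachable replicas and return the value with the highest version."""
    if len(reachable) < R:
        raise IOError("no read quorum")
    answers = [replicas[i][key] for i in reachable if key in replicas[i]]
    if not answers:
        return None
    return max(answers, key=lambda answer: answer[0])[1]


replicas = [{} for _ in range(N)]
write(replicas, [0, 1, 2], "file", "v1", version=1)  # all replicas reachable
write(replicas, [0, 1], "file", "v2", version=2)     # replica 2 misses the update
print(read(replicas, [1, 2], "file"))                # v2, despite the stale replica 2
```

The reader never sees the stale copy on its own, which is the "illusion of a centralized system" in miniature. A real system also has to assign version numbers safely and repair stale replicas.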

Quobyte is a distributed file system that protects your data using synchronous replication and erasure coding. Quobyte's architecture is based on 15 years of research and development and, as a result, is able to give you linear scalability of performance and capacity up to hundreds of petabytes.
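
For readers unfamiliar with erasure coding, the general idea can be sketched with the simplest possible code: a single XOR parity block that allows any one lost block to be rebuilt. This is a generic illustration with hypothetical function names, not Quobyte's actual scheme. Production systems typically use stronger codes that tolerate several simultaneous failures.

```python
def xor_blocks(blocks):
    """XOR equally sized byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def encode(data_blocks):
    """Return the data blocks plus one parity block (assumes equal block sizes)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, lost_index):
    """Rebuild the block at lost_index by XOR-ing all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


stripe = encode([b"abcd", b"efgh", b"ijkl"])  # three data blocks, one parity block
print(recover(stripe, 1))                     # b'efgh'
```

Storing three data blocks plus one parity block costs about 1.33 times the raw data size, compared with 3 times for three full replicas.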
