A DFS (distributed file system), as the name suggests, is a file system that is distributed across multiple file servers or multiple locations. Its primary purpose is to reliably store data or, more specifically, files.
Let's start with the distributed system part. The opposite of a distributed system is a centralized system, e.g. a single server or storage appliance. A distributed system is composed of several servers that are connected via a computer network, such as Ethernet or the internet.
There are several advantages of a distributed system over a centralized one, depending on the capabilities of the distributed (storage) system:
A distributed system that works across multiple servers can handle a greater load by adding more machines; this is known as scaling out. Distributed systems with the right architecture can scale to very large clusters with thousands of servers. A centralized system, i.e. a single server, cannot match the performance or storage capacity of such a distributed file system.
Scaling up, in contrast, means making a single component larger or faster so that it can handle a greater load.
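The difference is easy to see with some back-of-the-envelope arithmetic. The numbers below are purely hypothetical, but they show why aggregate capacity and throughput grow linearly as you add servers:

```python
# Toy illustration with made-up numbers: in a scale-out cluster,
# capacity and throughput are the sum over all servers.
def cluster_totals(num_servers, tb_per_server=100, gbps_per_server=10):
    """Return (total capacity in TB, total throughput in Gbit/s)."""
    return num_servers * tb_per_server, num_servers * gbps_per_server

# Scaling out from 4 to 40 servers multiplies both totals by 10.
print(cluster_totals(4))   # (400, 40)
print(cluster_totals(40))  # (4000, 400)
```

Scaling up, by comparison, is limited by how large a single box can get; scaling out has no such ceiling, provided the software can coordinate that many machines.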
Leslie Lamport, one of the most accomplished researchers in distributed systems, famously said "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable". He later contributed some of the most significant algorithms for building fault-tolerant distributed systems.
Fault tolerance is the property that enables a system to continue operating properly when some of its servers or disks fail (or one or more faults occur within them). A fault-tolerant distributed system handles such component failures by spreading redundant data across multiple machines. As a result, a distributed system can achieve much better availability and data durability than any centralized system.
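The idea of spreading data across machines can be sketched in a few lines. This is a deliberately minimal model, not any real system's protocol: each "server" is a dict, a write goes to every reachable replica, and a read succeeds as long as one replica survives:

```python
# Minimal sketch of replication for fault tolerance (hypothetical,
# not a real DFS protocol): store each file on several servers so
# the data survives a single-server outage.
class ReplicatedStore:
    def __init__(self, num_servers=3):
        # Each "server" is just a dict mapping file name -> contents.
        self.servers = [dict() for _ in range(num_servers)]
        self.failed = set()   # indices of servers that are down

    def write(self, name, data):
        # Send the write to every replica that is still reachable.
        for i, server in enumerate(self.servers):
            if i not in self.failed:
                server[name] = data

    def read(self, name):
        # Any surviving replica can answer the read.
        for i, server in enumerate(self.servers):
            if i not in self.failed and name in server:
                return server[name]
        raise IOError("data lost: all replicas unavailable")

store = ReplicatedStore(num_servers=3)
store.write("report.txt", b"quarterly numbers")
store.failed.add(0)                  # one server goes down...
print(store.read("report.txt"))      # ...the data is still readable
```

With three replicas, this toy store tolerates the loss of any two servers; a centralized system with one copy tolerates none.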
In the context of storage, the challenge for a distributed file system or storage system is to store data redundantly across multiple servers so that the outage of a single server leads to neither data loss nor unavailability, while at the same time guaranteeing the consistency of the data. Consistency is the trickiest part, because the distributed file system must create the illusion of a centralized system. This has been the subject of extensive research in computer science, some of it by our own founders and developers, which you can find here:
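One classic way to reason about consistency across replicas is a quorum system; this is a textbook technique, not necessarily what any particular DFS implements. With N replicas, require W acknowledgements per write and R replies per read; if W + R > N, every read quorum overlaps every write quorum, so a read always contacts at least one replica that saw the latest acknowledged write:

```python
# Quorum overlap rule (textbook sketch, not a specific product's
# design): with n replicas, w write acks, and r read replies,
# reads are guaranteed to see the latest write iff w + r > n.
def quorums_overlap(n, w, r):
    return w + r > n

# n=3 replicas: W=2/R=2 quorums always intersect; W=1/R=1 do not.
print(quorums_overlap(3, 2, 2))  # True
print(quorums_overlap(3, 1, 1))  # False
```

The rule captures the trade-off: larger write quorums make writes slower but allow smaller (faster) read quorums, and vice versa.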
Quobyte is a distributed file system that protects your data using synchronous replication and erasure coding. Quobyte's architecture is based on 15 years of research and development and, as a result, gives you linear scalability of performance and capacity up to hundreds of petabytes.
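To give a feel for erasure coding, here is the simplest possible variant: single XOR parity, as in RAID-5. This is an illustration of the general idea only; production systems typically use more powerful codes (e.g. Reed-Solomon style) that tolerate multiple simultaneous losses:

```python
# Hedged sketch of erasure coding using XOR parity: k data chunks
# plus one parity chunk survive the loss of any single chunk, at a
# storage overhead of only 1/k (vs. full replication's 100%+).
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(chunks):
    """Return the parity chunk for equally sized data chunks."""
    parity = chunks[0]
    for c in chunks[1:]:
        parity = xor_bytes(parity, c)
    return parity

def recover(surviving, parity):
    """Rebuild the single missing data chunk from survivors + parity."""
    missing = parity
    for c in surviving:
        missing = xor_bytes(missing, c)
    return missing

data = [b"AAAA", b"BBBB", b"CCCC"]      # k = 3 data chunks
parity = encode(data)
# Chunk 1 is lost; rebuild it from the other chunks plus parity.
print(recover([data[0], data[2]], parity))  # b'BBBB'
```

Spread across four servers (three data chunks plus parity), this layout survives the loss of any one server while storing only a third more data than the original, which is why erasure coding is attractive for large-capacity clusters.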