What is a Parallel File System?

The term "parallel file system" is used with two very different meanings. The first refers to a client's ability to do IO against multiple servers in parallel. The second is used mostly in high-performance computing (HPC) and refers to a specific IO pattern: many clients reading from and writing to a single file at the same time.

Parallel IO

Parallel IO means that a client accessing storage can directly access several storage servers in parallel to take advantage of the aggregated bandwidth of multiple servers. Often, parallel IO also removes bottlenecks like NFS gateways and improves load distribution. This use of the term parallel IO is often associated with pNFS (short for parallel NFS). Most high-performance or scale-out file systems offer parallel IO.
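As a rough illustration of the idea, the following Python sketch stripes a file across several storage servers and reads it back by contacting all of them concurrently. It is a toy model under stated assumptions, not how pNFS or any particular file system is implemented: `FakeServer`, `write_striped`, `read_striped`, and the round-robin layout with a 4-byte stripe unit are all hypothetical names and choices made for this example.

```python
from concurrent.futures import ThreadPoolExecutor

STRIPE_SIZE = 4  # bytes per stripe unit (tiny, for illustration only)


class FakeServer:
    """Hypothetical in-memory stand-in for one storage server."""

    def __init__(self):
        self.stripes = {}  # stripe index -> bytes

    def read_stripe(self, index):
        return self.stripes[index]


def write_striped(servers, data):
    """Distribute data round-robin across servers, one stripe unit at a time."""
    count = 0
    for offset in range(0, len(data), STRIPE_SIZE):
        index = offset // STRIPE_SIZE
        server = servers[index % len(servers)]
        server.stripes[index] = data[offset:offset + STRIPE_SIZE]
        count += 1
    return count


def read_striped(servers, stripe_count):
    """Fetch all stripe units concurrently, contacting each server directly."""

    def fetch(index):
        return servers[index % len(servers)].read_stripe(index)

    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        # map() preserves input order, so the stripes reassemble correctly.
        return b"".join(pool.map(fetch, range(stripe_count)))


servers = [FakeServer() for _ in range(3)]
payload = b"parallel file systems stripe data"
stripe_count = write_striped(servers, payload)
result = read_striped(servers, stripe_count)
```

The point of the sketch is that the client itself works out which server holds which part of the file and talks to each one directly, so no single gateway sits in the data path.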

The opposite of a parallel file system is a system in which each client talks to a single server or gateway. Any NFS-based system (except those that explicitly offer pNFS) is such a centralized storage system.

A file system with parallel IO is a must-have for demanding throughput workloads such as 4K video streaming, transcoding, and editing, image processing, or big data analytics, to name a few. However, small-file workloads also benefit when the client communicates directly with the servers that hold the data, rather than going through an NFS gateway that adds another network hop and therefore latency.

Parallel read/write

In high-performance computing, parallel file systems allow distributed applications to read and, more importantly, write to a single file from many clients simultaneously without locking, i.e., in parallel. This very specific IO pattern is mainly found in research and is often associated with MPI. If you don't know what MPI or MPI-IO is, chances are high that you don't need a parallel file system in the HPC sense.
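The access pattern itself can be sketched without MPI. In the Python example below, several writers each open the same file independently and write to their own non-overlapping byte range without taking any lock. This is only an emulation on a local file using threads. A real HPC application would typically run as separate processes on separate nodes and use MPI-IO calls such as `MPI_File_write_at`. The names `write_block`, `BLOCK_SIZE`, and `NUM_WRITERS` are hypothetical and chosen for this example.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 8   # bytes each writer owns (tiny, for illustration only)
NUM_WRITERS = 4  # stand-ins for the ranks of a distributed application


def write_block(path, rank):
    """Write this rank's block at its own offset; no lock is taken."""
    block = bytes([ord("A") + rank]) * BLOCK_SIZE
    # Each writer opens the file independently, as a separate client would.
    with open(path, "r+b") as f:
        f.seek(rank * BLOCK_SIZE)
        f.write(block)
    return len(block)


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Pre-size the shared file so every writer's region already exists.
    with open(path, "r+b") as f:
        f.truncate(BLOCK_SIZE * NUM_WRITERS)

    # All writers target the same file concurrently, each at a disjoint offset.
    with ThreadPoolExecutor(max_workers=NUM_WRITERS) as pool:
        written = list(pool.map(lambda r: write_block(path, r), range(NUM_WRITERS)))

    with open(path, "rb") as f:
        contents = f.read()
finally:
    os.remove(path)
```

Because the byte ranges never overlap, the writers do not need to coordinate with each other. Sustaining this pattern efficiently across many nodes is what a parallel file system in the HPC sense is designed for.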

Quobyte is a distributed parallel file system and offers parallel IO. Learn more about how Quobyte can help you achieve scalable performance or make your HPC cluster more reliable.
