Gartner defines scalability as the measure of a system’s ability to increase or decrease in performance and cost in response to changes in application and system processing demands. Scalability is essential to consider when choosing a storage solution, because prioritizing it from the start leads to lower maintenance costs, a better user experience, and higher agility. But before you can make a decision, you first need to understand the different ways a solution can scale.
Scale-out refers to the ability of a system to scale in certain dimensions when you add more components. In a storage or file system, sometimes also called scale-out NAS (network-attached storage), these components are drives (hard disks, NVMe) and servers. The more interesting part is which dimensions of the storage system scale when you add more components, most importantly capacity and performance.
A proper scale-out file system should scale in all of these dimensions. If you can only scale capacity, as with many storage appliances, you will often run out of performance for the applications that access the growing amount of storage. Most use cases and applications need performance and capacity to grow in lockstep; archival storage is one of the few exceptions.
Another important aspect of a scale-out NAS system is determining how far it can actually scale. All distributed systems have limits regarding the number of servers and/or drives they can have. Good systems have limits in the thousands or tens of thousands of servers, so those limits are more of a theoretical issue.
Other systems, especially those where scale-out was added later, have much lower limits, such as 16 servers. You might say 16 servers are enough for you today, but are you ready to move to a new system when you need the 17th? Or, even worse, to start a new cluster that is completely independent?
Similarly, it's essential to look for practical scalability limits, which might be much lower than the theoretical limit. One example is file or storage systems that rely on so-called "consistent hashing" to determine the location of data. Whenever the cluster membership changes (outage, new or removed server), data needs to be moved to match the new placement. The more servers you have, the more outages or failures you'll see, and the resulting data movement can make the cluster unstable, leading to higher latencies and partial unavailability.
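To make the data-movement effect concrete, here is a minimal sketch of a generic consistent-hash ring in Python. It is not the placement logic of any particular product; the names `HashRing` and `moved_fraction`, the server and file names, and the number of virtual nodes are all assumptions chosen for illustration. It counts how many keys change their home server when the cluster membership changes.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Illustrative consistent-hash ring with virtual nodes (hypothetical)."""

    def __init__(self, servers, vnodes=100):
        # Each server is placed on the ring at `vnodes` pseudo-random points.
        self._points = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._positions = [position for position, _ in self._points]

    def locate(self, key: str) -> str:
        """Return the server owning the first ring point after the key's hash."""
        idx = bisect.bisect_right(self._positions, _hash(key)) % len(self._points)
        return self._points[idx][1]


def moved_fraction(old_servers, new_servers, num_keys=10_000):
    """Fraction of keys whose owning server differs between two memberships."""
    old, new = HashRing(old_servers), HashRing(new_servers)
    moved = sum(
        old.locate(f"file-{i}") != new.locate(f"file-{i}") for i in range(num_keys)
    )
    return moved / num_keys


if __name__ == "__main__":
    servers = [f"server-{i}" for i in range(10)]
    print("add one server:   ", moved_fraction(servers, servers + ["server-10"]))
    print("remove one server:", moved_fraction(servers, servers[:-1]))
```

In this sketch only a fraction of the keys is remapped per change (on the order of one part in eleven when growing from 10 to 11 servers), but in a real system every remapped key stands for data that has to be copied over the network, and every outage or expansion triggers another round of it.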
Linear scaling is the term or feature to look out for. It means that when you double the resources, you also double the performance. This holds regardless of the starting point: performance doubles whether you go from 4 to 8 servers or from 100 to 200. There are no diminishing returns as the system scales.
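One way to check for linear scaling is to compare the measured speedup against the ideal speedup at each cluster size. The following sketch does that in Python; the function names and all throughput figures are hypothetical and only serve to show the calculation.

```python
def scaling_efficiency(base_servers, base_throughput, servers, throughput):
    """Measured speedup divided by ideal (linear) speedup; 1.0 means linear."""
    ideal_speedup = servers / base_servers
    measured_speedup = throughput / base_throughput
    return measured_speedup / ideal_speedup


def efficiencies(measurements):
    """Efficiency of every measurement relative to the first (smallest) one."""
    base_servers, base_throughput = measurements[0]
    return [
        scaling_efficiency(base_servers, base_throughput, servers, throughput)
        for servers, throughput in measurements
    ]


# Hypothetical measurements: (number of servers, aggregate throughput in GB/s)
linear_system = [(4, 8.0), (8, 16.0), (100, 200.0), (200, 400.0)]
diminishing_system = [(4, 8.0), (8, 14.0), (100, 120.0), (200, 150.0)]

if __name__ == "__main__":
    print("linear:     ", efficiencies(linear_system))
    print("diminishing:", efficiencies(diminishing_system))
```

A system that scales linearly keeps the efficiency at 1.0 for every cluster size, while a system with diminishing returns shows the value dropping as servers are added.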
The file system itself should scale linearly, and so should the access layer. If you use a protocol that doesn't have native support for parallel IO and load balancing, like NFS, your access layer and gateway nodes will quickly become a bottleneck. What use is a scalable storage system when clients cause congestion at the NFS gateways?
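The gateway bottleneck can be illustrated with a deliberately simple model: clients can never see more throughput than the slower of the two layers delivers. The function name and all numbers below are hypothetical, and the model ignores caching, network topology, and protocol overhead.

```python
def client_throughput(storage_servers, per_server_gbps, gateways, per_gateway_gbps):
    """Throughput visible to clients, capped by the slower of the two layers."""
    backend = storage_servers * per_server_gbps  # what the storage can deliver
    access = gateways * per_gateway_gbps         # what the gateways can pass on
    return min(backend, access)


if __name__ == "__main__":
    # Hypothetical figures: 2 GB/s per storage server, 5 GB/s per gateway.
    for servers in (4, 8, 16):
        print(servers, "servers, 2 gateways:", client_throughput(servers, 2.0, 2, 5.0))
```

Under these assumptions, growing the storage cluster from 8 to 16 servers changes nothing for the clients, because the two gateways already cap throughput at 10 GB/s.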
Finally, the question is how the resources you add benefit the file systems and users on the system. Ideally, new resources should increase the performance of all file systems (sometimes called exports or shares) on the storage system. However, if you have a storage system where resources are pre-allocated to a specific file system, then your ability to scale is not uniform. This is a big issue because you have to add significantly more resources to scale all file systems.
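The difference between uniform and pre-allocated scaling can be sketched with a small model of peak throughput per file system. The file system names, the per-server throughput, and both function names are hypothetical, and "peak" here means what a single file system could reach if it were the only one busy.

```python
PER_SERVER_GBPS = 2.0  # hypothetical throughput contributed by one server


def peak_preallocated(allocation):
    """Each file system can only use the servers assigned to it."""
    return {fs: servers * PER_SERVER_GBPS for fs, servers in allocation.items()}


def peak_shared(file_systems, total_servers):
    """Every file system can draw on the whole pool of servers."""
    return {fs: total_servers * PER_SERVER_GBPS for fs in file_systems}


# Three file systems on 12 servers; then 4 servers are added.
allocation = {"home": 4, "scratch": 4, "projects": 4}
grown_allocation = dict(allocation, scratch=8)  # new servers go to one file system

if __name__ == "__main__":
    print("pre-allocated, before:", peak_preallocated(allocation))
    print("pre-allocated, after: ", peak_preallocated(grown_allocation))
    print("shared pool, before:  ", peak_shared(allocation, 12))
    print("shared pool, after:   ", peak_shared(allocation, 16))
```

In the pre-allocated case only the file system that received the new servers gets faster, so giving every file system the same boost would take three times as many servers in this example. In the shared pool, the same four servers raise the ceiling for all file systems at once.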
A lack of thin provisioning or over-subscription, and a requirement to pre-allocate storage resources (drives, groups, or servers) to a single file system, are all warning signs that the file system will not allow you to scale out uniformly. Most block-based distributed file systems lack thin provisioning and have very static resource allocations. That makes uniform scale-out a complex, manual task, or an impossible one.
Quobyte is a distributed scale-out file system that scales performance and capacity linearly and uniformly with the number of resources.