When planning your storage architecture, you need to weigh both cost and performance, but most importantly, you need to know which type of enterprise storage should house your data. Enterprise storage systems today fall into three categories: SAN, NAS, and object. Each has both advantages and disadvantages. So how do you know which one is right for you?
| Storage type | Abstraction | Protocols | Random IO support | Typical applications |
|---|---|---|---|---|
| SAN | "virtual disks" (LUNs, block storage) | Fibre Channel, iSCSI, NVMe over Fabrics | yes | Virtual machines, some databases |
| NAS | files and directories | NFS, SMB/CIFS, native file system clients | yes | Virtually any, including virtual machines, databases, analytics, machine learning |
| S3 / Object | write-once objects | S3 protocol, Swift | no, only full object overwrites | Limited, mostly archival |
A storage area network, otherwise known as SAN or block storage, is block-based storage that is accessible over a network. SAN uses the same abstraction as a hard drive, where blocks of data can be read or written at a specific location, which is why it’s called block-based storage. Most applications require a file system on top to organize the data stored on the block storage. The exceptions are a few databases and virtual machines that consume block storage directly. Unlike direct-attached storage (DAS), a SAN is accessed over the network. The protocols used are Fibre Channel, iSCSI (e.g., over Ethernet), or NVMe over Fabrics.
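To make the block abstraction concrete, here is a minimal sketch in Python. It is a toy, not a SAN client: the `BlockDevice` class, the 4 KiB `BLOCK_SIZE`, and the `lun0.img` file name are all assumptions made for illustration, and an ordinary local file stands in for the LUN. The point is only that the interface knows about numbered blocks, not about files or directories.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size in bytes; real devices vary


class BlockDevice:
    """Toy block device backed by a local file (illustration only)."""

    def __init__(self, path, num_blocks):
        self.num_blocks = num_blocks
        # Pre-allocate the "virtual disk" as zero-filled blocks.
        with open(path, "wb") as f:
            f.write(bytes(num_blocks * BLOCK_SIZE))
        self._f = open(path, "r+b")

    def _check(self, block_no):
        if not 0 <= block_no < self.num_blocks:
            raise IndexError(f"block {block_no} out of range")

    def write_block(self, block_no, data):
        # Writes address a block number, nothing more: no names, no directories.
        self._check(block_no)
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self._f.seek(block_no * BLOCK_SIZE)
        self._f.write(data)
        self._f.flush()

    def read_block(self, block_no):
        self._check(block_no)
        self._f.seek(block_no * BLOCK_SIZE)
        return self._f.read(BLOCK_SIZE)

    def close(self):
        self._f.close()


def demo():
    """Write block 5, then read blocks 4 and 5 back."""
    with tempfile.TemporaryDirectory() as tmp:
        dev = BlockDevice(os.path.join(tmp, "lun0.img"), num_blocks=8)
        dev.write_block(5, b"x" * BLOCK_SIZE)
        untouched = dev.read_block(4)
        written = dev.read_block(5)
        dev.close()
        return untouched, written
```

Everything that gives the data meaning, such as which blocks belong to which file, has to come from a layer above this interface.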
The major downside of SAN is that you need a file system on top, and that file system then has to be exported to storage consumers. Needing a file system on top really sounds like something Monty Python’s “Royal Society for Putting Things on Top of Other Things” would do. All jokes aside, when one or more head nodes export the file system, they often introduce a significant performance bottleneck, e.g., when exporting the file system via NFS.
NAS stands for network-attached storage, but in practice it means file system storage that is accessed over a network. The significant advantage of NAS storage is that applications and users can directly use the file system. There’s no extra layer needed. However, the term NAS doesn't say anything about the storage architecture behind the file system that you see.
For example, a simple Linux server exporting local storage via NFS is considered NAS. However, this kind of NAS storage is monolithic, can only be scaled up (i.e., by adding more drives to a single box), and doesn't offer much fault tolerance. SOHO (small office/home office) NAS boxes and so-called filers (monolithic enterprise NAS appliances) are other examples of this. In contrast, a scale-out NAS system has an architecture where you can add more servers or boxes to increase capacity and, ideally, performance as well.
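Whatever the architecture behind it, an application sees a NAS as ordinary files and directories. The short Python sketch below illustrates that, including the random IO a file system allows. It assumes a temporary local directory as a stand-in for an NFS or SMB mount point, and the `projects/report.txt` path and the `update_in_place` helper are made up for the example.

```python
import os
import tempfile


def update_in_place(mount_point):
    """Create a file in a directory tree, then overwrite a few bytes in the middle."""
    path = os.path.join(mount_point, "projects", "report.txt")
    os.makedirs(os.path.dirname(path))
    with open(path, "wb") as f:
        f.write(b"status: DRAFT; owner: alice\n")
    # Random IO: seek to an offset and overwrite only part of the file.
    with open(path, "r+b") as f:
        f.seek(len(b"status: "))
        f.write(b"FINAL")
    with open(path, "rb") as f:
        return f.read()


def demo():
    # The temporary directory stands in for a mounted NAS export (assumption).
    with tempfile.TemporaryDirectory() as mount_point:
        return update_in_place(mount_point)
```

The application uses the same calls it would use on a local disk, which is why no extra layer is needed between a NAS and its consumers.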
The second aspect of NAS storage is the primary access protocol. Many NAS systems use NFS - a protocol invented around the year the movie “Back to the Future” premiered! NFS was designed for clients accessing a single server. It is dated and has severe limitations in terms of performance, security, and fault tolerance. Many true scale-out systems rely on a native protocol to enable parallel IO and avoid NFS bottlenecks. That is how you can identify a true scale-out NAS: it doesn't use NFS as its primary access protocol.
Enter the new kid on the block: object storage. Object storage, also known as object-based storage, is the last category of enterprise storage. It is a data storage strategy that divides data into distinct units, or objects, and stores them in isolated buckets together with all relevant metadata and a unique identifier. Object storage has a flat namespace, as opposed to file systems or a NAS, which have a hierarchical folder structure. Because Amazon popularized it as cost-effective storage for large amounts of data, the Amazon product name S3 is often used synonymously with the term object storage.
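The model described above (buckets, a flat key namespace, metadata stored with each object, and whole-object writes) can be sketched in a few lines of Python. This is a toy in-memory store written for illustration, not the S3 API: the `ObjectStore` class, its method names, and the example bucket and keys are all hypothetical.

```python
import hashlib


class ObjectStore:
    """Toy in-memory object store (illustration only, not the S3 API)."""

    def __init__(self):
        self._buckets = {}

    def create_bucket(self, bucket):
        self._buckets.setdefault(bucket, {})

    def put_object(self, bucket, key, data, metadata=None):
        # A put always replaces the whole object; there is no partial update.
        self._buckets[bucket][key] = {
            "data": bytes(data),
            "metadata": dict(metadata or {}),
            "checksum": hashlib.sha256(data).hexdigest(),
        }

    def get_object(self, bucket, key):
        return self._buckets[bucket][key]

    def list_keys(self, bucket, prefix=""):
        # "Folders" are only a naming convention: listing filters keys by prefix.
        return sorted(k for k in self._buckets[bucket] if k.startswith(prefix))


store = ObjectStore()
store.create_bucket("scans")
store.put_object("scans", "2024/01/img-001.raw", b"v1", {"camera": "A"})
store.put_object("scans", "2024/02/img-002.raw", b"v1")
# Changing an object means writing a complete new version under the same key.
store.put_object("scans", "2024/01/img-001.raw", b"v2-longer", {"camera": "B"})
```

Note that the slashes in the keys do not create directories; the bucket is one flat dictionary, and anything that looks like a hierarchy comes from prefix filtering in `list_keys`.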
There are two main differences between object storage and both SAN and NAS. The first difference is consistency. Both SAN and NAS provide strong consistency: when you write to a file or block, the next read is guaranteed to return the data you just wrote. This consistency model is very intuitive, and most applications rely on it. Object storage, on the other hand, has traditionally offered very relaxed guarantees (so-called eventual consistency), which in practice means that a read might return any value previously written to the object. Some services have since tightened this (Amazon S3, for example, has provided strong read-after-write consistency since late 2020), but applications that target object storage in general still have to cope with stale reads. For this reason, object storage is mainly used for write-once or archival data.
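The difference between the two consistency models is easy to see in a small simulation. The Python sketch below is an assumption-laden toy: it models one primary copy and one replica, with replication triggered explicitly by a hypothetical `replicate()` call so the behavior is deterministic. Real systems replicate in the background and differ in many details.

```python
class EventuallyConsistentStore:
    """Toy model: writes land on a primary and reach the replica later."""

    def __init__(self):
        self._primary = {}
        self._replica = {}
        self._pending = []  # writes not yet applied to the replica

    def put(self, key, value):
        self._primary[key] = value
        self._pending.append((key, value))

    def get(self, key, from_replica=True):
        # Reads served by the replica can lag behind the latest write.
        source = self._replica if from_replica else self._primary
        return source.get(key)

    def replicate(self):
        # Stand-in for background replication catching up.
        for key, value in self._pending:
            self._replica[key] = value
        self._pending.clear()


store = EventuallyConsistentStore()
store.put("report", "v1")
store.replicate()
store.put("report", "v2")

stale_read = store.get("report")                         # replica still has "v1"
strong_read = store.get("report", from_replica=False)    # primary has "v2"
store.replicate()
converged_read = store.get("report")                     # replica caught up: "v2"
```

A strongly consistent system behaves as if every read went to the primary. With eventual consistency, the application itself has to tolerate the `stale_read` case.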
The second major difference is protocol. Object storage is accessed via HTTP - the same protocol your web browser uses to request the page you are reading right now. This makes it easy to access object storage from a variety of applications. However, HTTP was never designed for speed or efficiency, whereas SAN and NAS protocols are all about performance.
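As a rough illustration of what HTTP access means, the Python sketch below builds (but does not send) the requests for storing and fetching an object. The endpoint `http://objects.example.internal`, the path-style `/bucket/key` layout, and the bucket and key names are assumptions for the example. Real services such as S3 also require signed authentication headers, which are left out here.

```python
import urllib.parse
import urllib.request

ENDPOINT = "http://objects.example.internal"  # hypothetical endpoint


def object_url(bucket, key):
    # The key is URL-encoded; slashes are kept as part of the object name.
    return f"{ENDPOINT}/{bucket}/{urllib.parse.quote(key)}"


def build_put(bucket, key, data):
    """Build an HTTP PUT that uploads the whole object in one request."""
    return urllib.request.Request(
        object_url(bucket, key),
        data=data,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


def build_get(bucket, key):
    """Build an HTTP GET that downloads the object."""
    return urllib.request.Request(object_url(bucket, key), method="GET")


put_req = build_put("scans", "2024/01/img 001.raw", b"raw image bytes")
get_req = build_get("scans", "2024/01/img 001.raw")
print(put_req.get_method(), put_req.full_url)
print(get_req.get_method(), get_req.full_url)
```

Because every operation is a plain HTTP verb on a URL, any language with an HTTP client can talk to object storage, at the cost of per-request HTTP overhead.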
The performance tier describes the IOPS (input/output operations per second) and throughput your storage system delivers. Often, the terms SAN, NAS, and object are misused to describe different performance and cost tiers of storage: SAN for expensive low-latency storage, NAS for the general-purpose, mid-performance tier, and object storage for "cheap and deep." Unfortunately, this is largely based on historical attributes of these storage types. Today, NAS storage can be as low latency as SAN and as scalable as object storage.
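IOPS and throughput are linked by the size of each IO: throughput is roughly IOPS multiplied by IO size. The Python sketch below works through two cases. The workload numbers are invented purely to show the arithmetic and are not measurements of any product.

```python
MIB = 1024 * 1024


def throughput_mib_per_s(iops, io_size_bytes):
    """Throughput implied by a given IOPS rate at a fixed IO size."""
    return iops * io_size_bytes / MIB


# Many small random IOs: high IOPS, modest throughput.
small_io = throughput_mib_per_s(iops=10_000, io_size_bytes=4096)

# Few large sequential IOs: low IOPS, high throughput.
large_io = throughput_mib_per_s(iops=500, io_size_bytes=MIB)

print(f"10,000 IOPS at 4 KiB = {small_io} MiB/s")
print(f"500 IOPS at 1 MiB    = {large_io} MiB/s")
```

This is why a single tier label says little on its own: a system can look fast on one metric and slow on the other, depending on the IO pattern of the workload.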
So do these labels still tell you which storage to pick? Honestly, no. The challenge in enterprise storage today is scale-out workloads such as big data analytics (Hadoop, Spark, etc.), machine learning, image analysis, and 3D rendering, just to name a few. These workloads require an unprecedented amount of storage capacity along with scalable performance. Rather than the access protocol, the question today is whether your storage system can scale out performance and capacity without bottlenecks, on demand, to match the needs of your users and applications.
Similarly, differentiation by access protocol has become less important, as good storage systems let your users access the same data through a range of protocols. A Hadoop cluster, for example, should be able to access the same data as your machine learning applications or workstations.
There are better ways to determine the right storage system for you. For example, you should always consider these three main features:
There’s no official right or wrong answer when choosing your enterprise storage solution; there is just the right answer for you and your needs. Quobyte gives you the scale-out for modern workloads and all protocols to serve a broad range of applications while providing the ability to combine flash+HDD for performance and cost.
Learn more about Quobyte - a distributed scale-out file system that runs on commodity servers with flash and hard disks.