The Battle is On: SAN vs. NAS vs. Object

When considering how to build out your storage architecture, you need to consider both cost and performance, but most importantly, you need to know what type of enterprise storage to use to house your data. Enterprise storage systems are divided into three categories today: SAN, NAS, and Object. Like most storage systems, each has both advantages and disadvantages. So how do you know which one is right for you?

|             | Abstraction                           | Protocols                               | Random IO support               | Typical applications                                                              |
|-------------|---------------------------------------|-----------------------------------------|---------------------------------|-----------------------------------------------------------------------------------|
| SAN         | "virtual disks" (LUNs, block storage) | Fibre Channel, iSCSI, NVMe over Fabrics | yes                             | Virtual machines, some databases                                                  |
| NAS         | files and directories                 | NFS, CIFS, native file system clients   | yes                             | Virtually any, including virtual machines, databases, analytics, machine learning |
| S3 / Object | write-once objects                    | S3 protocol, Swift                      | no, only full object overwrites | Limited, mostly archival                                                          |

SAN - Storage Area Network

Storage area network storage, otherwise known as SAN or block storage, refers to block-based storage accessible over a network. SAN uses the same abstraction as a hard drive, where blocks of data can be read or written at a specific location, which is why it's called block-based storage. Most applications require a file system on top to organize the data stored on the block storage. The exceptions are a few databases and virtual machines that consume block storage directly. Unlike direct-attached storage (DAS), a SAN is accessed over the network. The protocols used are Fibre Channel, iSCSI (typically over Ethernet), and NVMe over Fabrics (NVMe-oF).
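The block abstraction described above can be sketched in a few lines. This is a toy model only: it simulates a "LUN" with a regular local file (a real SAN reaches the device over the network), and the block size and helper names are illustrative, not part of any real SAN API.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # 4 KiB, a common block size; real LUNs vary

# Simulate a small "LUN" with a regular file. A real SAN exposes the
# same interface -- read or write fixed-size blocks at an offset --
# except the device is reached over the network instead of locally.
fd, path = tempfile.mkstemp()
os.ftruncate(fd, 16 * BLOCK_SIZE)  # a 64 KiB virtual disk

def write_block(dev, block_no, data):
    # Block storage has no notion of files: just data at an offset.
    assert len(data) == BLOCK_SIZE
    os.pwrite(dev, data, block_no * BLOCK_SIZE)

def read_block(dev, block_no):
    return os.pread(dev, BLOCK_SIZE, block_no * BLOCK_SIZE)

write_block(fd, 3, b"x" * BLOCK_SIZE)
result = read_block(fd, 3)

os.close(fd)
os.unlink(path)
```

Everything above the block layer (which block holds which file, free-space tracking, names, permissions) is exactly what the file system on top has to provide.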

The major downside of SAN is that you need a file system on top that needs to be exported to storage consumers. Needing a file system on top really sounds like something Monty Python’s “Royal Society for Putting Things on Top of Other Things” would do. All jokes aside, when one or more head nodes export the file system, this often introduces a significant performance bottleneck, e.g., when the file system is exported via NFS.

Storage area network (SAN) diagram

NAS - Network-Attached Storage

NAS stands for network-attached storage; in practice, it is file system storage accessed over a network. The significant advantage of NAS is that applications and users can use the file system directly - no extra layer is needed. However, the term NAS says nothing about the storage architecture behind the file system that you see.

For example, a simple Linux server exporting local storage via NFS is considered NAS. However, this kind of NAS storage is monolithic, can only be scaled up (i.e., add more drives into a single box), and doesn't offer a lot of fault tolerance. SOHO (small office home office) NAS boxes or so-called filers (monolithic enterprise NAS appliances) are other examples of this. In contrast, a scale-out NAS system has an architecture where you can add more servers or boxes to increase capacity, and ideally, also performance.

The second aspect of NAS storage is the primary access protocol. Many NAS systems use NFS - a protocol invented around the year the movie “Back to the Future” premiered! NFS was designed for clients accessing a single server; it is dated and has severe limitations in terms of performance, security, and fault tolerance. Many true scale-out systems rely on a native protocol for parallel IO, avoiding NFS bottlenecks. That is how you can identify a true scale-out NAS: it doesn't use NFS as its primary access protocol.

Object Storage

Now enter the new kid on the block - object storage. Object storage, also known as object-based storage, is the third category of enterprise storage. It is a data storage strategy that divides data into distinct units, or objects, and stores them in buckets along with all relevant metadata and a unique identifier. Object storage has a flat namespace, as opposed to file systems, or a NAS, which have a hierarchical folder structure. Since Amazon popularized it as cost-effective storage for large amounts of data, the Amazon product name S3 is used synonymously with the term object storage.
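The object model - data plus metadata plus an identifier, in a flat namespace - can be sketched as a toy in-memory store. This is illustrative only: the class, bucket name, and keys are made up, and the ETag-style content hash merely mimics the S3 convention.

```python
import hashlib

# Toy in-memory object store. One flat dictionary per bucket: keys like
# "photos/2024/cat.jpg" look hierarchical, but the "/" is just another
# character in the key -- there are no real directories.
class Bucket:
    def __init__(self):
        self._objects = {}  # key -> object record

    def put(self, key, data, metadata=None):
        # S3-style ETag: a content hash identifying this object version.
        etag = hashlib.md5(data).hexdigest()
        self._objects[key] = {"data": data, "metadata": metadata or {}, "etag": etag}
        return etag

    def get(self, key):
        return self._objects[key]["data"]

    def list_keys(self, prefix=""):
        # "Folder" listings are really just key-prefix filters.
        return [k for k in self._objects if k.startswith(prefix)]

bucket = Bucket()
bucket.put("photos/2024/cat.jpg", b"...jpeg bytes...", {"content-type": "image/jpeg"})
bucket.put("photos/2024/dog.jpg", b"...jpeg bytes...")
```

Note that `put` always replaces the whole object - there is no way to update a byte range in place, which is the "only full object overwrites" limitation from the table above.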

There are two main differences between object storage and both SAN and NAS. The first difference is consistency. Both SAN and NAS provide strong consistency: when you write to a file or block, you have the guarantee that the next read will return the latest data you wrote. This consistency model is very intuitive, and most applications rely on it. Object storage, on the other hand, has traditionally offered very relaxed consistency guarantees (also called eventual consistency), which in practice means you have few guarantees: a read might return any value previously written to the object. Applications have to be able to cope with this, which is why object storage is mainly used for write-once data or archival. (Amazon S3 has since added strong read-after-write consistency, but many other object stores remain eventually consistent.) You can find out more about the differences between file and object here.
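The difference between the two consistency models can be made concrete with a toy eventually-consistent object: until replicas converge, a read may return any previously written version. This is a deliberately simplified model, not how any particular object store is implemented.

```python
import random

class EventuallyConsistentObject:
    """Toy model: writes propagate to replicas lazily, so until the
    replicas converge, a read may be served a stale version."""

    def __init__(self):
        self._versions = []  # every version ever written, oldest first

    def put(self, data):
        self._versions.append(data)

    def get(self):
        # Before convergence, any replica -- and thus any version --
        # might answer the read. This is "eventual consistency".
        return random.choice(self._versions)

    def converge(self):
        # Eventually all replicas agree on the newest version.
        self._versions = self._versions[-1:]

obj = EventuallyConsistentObject()
obj.put(b"v1")
obj.put(b"v2")
stale_possible = obj.get()  # may be b"v1" or b"v2" -- no guarantee
obj.converge()
fresh = obj.get()           # after convergence: always b"v2"
```

A strongly consistent file system or block device behaves like the post-convergence state at all times, which is why most applications can simply assume it.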

The second major difference is protocol. Object storage is accessed via the HTTP protocol - the same protocol your web browser uses to request the page you are reading right now. This makes it easy to access object storage from a variety of applications. However, HTTP was never designed for speed or efficiency, whereas SAN and NAS protocols are all about performance.
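To make the "object storage is just HTTP" point concrete, here is the shape of the request a client sends to fetch an object. The bucket, key, and endpoint are made-up examples, and a real S3 request would also carry an `Authorization` header (AWS Signature Version 4), omitted here for brevity.

```python
# Fetching an object is an ordinary HTTP GET on a URL.
bucket = "example-bucket"
key = "reports/2024.csv"
endpoint = "s3.amazonaws.com"

request = (
    f"GET /{key} HTTP/1.1\r\n"
    f"Host: {bucket}.{endpoint}\r\n"   # virtual-hosted-style addressing
    "Connection: close\r\n"
    "\r\n"
)
```

The upside is universality: anything that can speak HTTP can use object storage. The downside is exactly the overhead described above - per-request headers, connection setup, and text-based framing that block and file protocols avoid.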

Performance Tiers

The performance tier describes the IOPS (input/output operations per second) and throughput your managed storage system delivers. Often, the category names themselves are misused to describe different performance and cost tiers: SAN for expensive low-latency storage, NAS for the general-purpose mid-performance tier, and object storage as "cheap and deep." Unfortunately, this mapping is largely based on historical attributes of each category. Today, NAS storage can be as low latency as SAN and as scalable as object storage.
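IOPS and throughput are two views of the same activity, linked by the IO size, which is why quoting either number alone can mislead. The figures below are illustrative examples, not measurements of any particular system.

```python
# throughput = IOPS x IO size. Small random IOs are IOPS-bound;
# large sequential IOs are throughput-bound.
def throughput_mb_per_s(iops, io_size_bytes):
    return iops * io_size_bytes / 1_000_000

# A system doing 100,000 random 4 KiB IOs per second:
small_random = throughput_mb_per_s(100_000, 4096)
# The same system streaming 2,000 sequential 1 MiB IOs per second:
large_sequential = throughput_mb_per_s(2_000, 1_048_576)
```

The 4 KiB workload reaches only about 410 MB/s despite its high IOPS, while the 1 MiB workload exceeds 2 GB/s at a fraction of the IOPS - so a fair comparison of storage tiers has to state the IO size.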

Are these storage categories still helpful today?

Honestly, no. The challenges in enterprise storage today are scale-out workloads like big data analytics (Hadoop, Spark, etc.), machine learning, image analysis, and 3D rendering, to name a few. These workloads require an unprecedented amount of storage capacity and also scalable performance. Rather than the access protocol, the question today is whether your storage system can scale out performance and capacity without bottlenecks, on demand, to match the needs of your users and applications.

Similarly, differentiation through access protocols has become less important, as good storage systems let your users access the same data through a range of protocols. For example, a Hadoop cluster should be able to access the same data as your machine learning applications or workstations.

There are better ways to determine the right storage system for you. For example, you should always consider these three main features:

  • Scale-out
    Can you scale performance and capacity? Does the system scale linearly, or do you have diminishing returns?
  • Unified Storage
    Can your users and applications access and share data through many protocols? This allows you to avoid having many storage silos and reduces wasted resources.
  • Storage Media
    Can you combine flash and hard drives in the same storage system and cluster? What kind of performance and costs are associated with the combination of flash (high performance) and hard drives (cost-effective)?
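The "does it scale linearly, or do you have diminishing returns" question in the first bullet can be answered with a simple benchmark procedure: measure aggregate throughput at several cluster sizes and compare against perfect linear scaling from the smallest measurement. The numbers below are hypothetical.

```python
# Efficiency of 1.0 means perfect linear scaling; lower values mean
# diminishing returns as nodes are added.
def scaling_efficiency(base_nodes, base_throughput, nodes, throughput):
    ideal = base_throughput * (nodes / base_nodes)
    return throughput / ideal

# Hypothetical measurements: cluster size -> aggregate GB/s
measurements = {4: 10.0, 8: 19.0, 16: 36.0}

for nodes, tp in measurements.items():
    eff = scaling_efficiency(4, 10.0, nodes, tp)
    print(f"{nodes} nodes: {eff:.0%} of linear scaling")
```

In this made-up run, efficiency drops from 100% at 4 nodes to 90% at 16 - a mild, acceptable falloff; a steeper curve would indicate a bottleneck (often a metadata service or head node) that caps the system's useful size.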

There’s no official right or wrong answer when choosing your enterprise storage solution; there is just the right answer for you and your needs. Quobyte gives you the scale-out for modern workloads and all protocols to serve a broad range of applications while providing the ability to combine flash+HDD for performance and cost.

Learn more about Quobyte - a distributed scale-out file system that runs on commodity servers with flash and hard disks.
