Keys to Building Really Scalable Storage

Quobyte offers tips for high-capacity infrastructures capable of handling billions of files and hundreds of petabytes

September 18, 2018

Santa Clara, CA · September 18, 2018 — Hyperscale data centers need enormous capacities for high-performance computing, artificial intelligence, big data analytics, containerized infrastructures, and other challenging modern workloads. Environments with billions of files and hundreds of petabytes delivering 24/7 applications all require large, complex, future-capable storage installations.

Quobyte® Inc., a leading developer of modern storage system software used by global-scale companies, outlines the following characteristics of massive storage infrastructures that maintain performance and manageability under demanding workloads. Companies of all sizes can benefit from these tips for building scalable storage.

Keep it on File
Block- and object-based platforms cannot match the flexibility of file-based systems, especially when performance is a consideration. Block storage worked well in traditional systems where only a handful of machines shared a common resource, but it performs poorly at scale and becomes too complex to administer. Object storage is attractive for its ability to grow to millions or even billions of objects; however, today's file systems offer this scalability too. While object systems succeed for archival data, they struggle with primary workloads that need high IOPS and low latency, particularly small-file workloads. Some applications have difficulty interfacing with the specialized protocol of object storage without a performance-robbing gateway, which is impractical in large environments.

United we Scale
Conventional methods of scaling create storage server sprawl and silos, hindering access to resources and data by users and applications. In contrast, a “unified” approach to storage consolidates heterogeneous communication protocols into a single pool where data is accessible from and between Linux, Windows, or Mac systems via NFS, SMB, or S3. Unified storage platforms serve legacy applications as well as new ones, and are effective for environments with both traditional and modern workloads. For example, a Windows user can edit a large file at the same time a Mac user is reading that file, without copying or moving it to another system, and users can easily share a data set across the globe via S3.
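The simultaneous-access scenario above relies on ordinary shared-file semantics: two clients hold the same file open, one writing and one reading, with no copy in between. A minimal local sketch of that behavior (a temporary file stands in for a file on a unified shared mount; the path and data are illustrative):

```python
import os
import tempfile

# Stand-in for a file on a shared, unified mount (path is illustrative).
path = os.path.join(tempfile.mkdtemp(), "dataset.bin")

with open(path, "wb") as editor:           # "Windows user" editing the file
    editor.write(b"frame-0001")
    editor.flush()                         # make the write visible to readers
    with open(path, "rb") as reader:       # "Mac user" reading the same file
        assert reader.read() == b"frame-0001"   # sees the data, no copy made
    editor.write(b"frame-0002")            # editing continues after the read

with open(path, "rb") as reader:
    assert reader.read() == b"frame-0001frame-0002"
```

On a unified platform the same file would additionally be reachable as an S3 object, but that requires a running endpoint and is not shown here.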

Open Sesame
Open-source, open computing ecosystems are the preferred choice of hyperscale environments because they offer innovation, affordability, and ease of integration. The largest data centers in the world use OpenStack to manage compute, storage, and networking economically, so storage platforms should be fully functional with OpenStack – and vice versa. The storage should also support important open-source projects, interfaces, and components such as Cinder for incorporating block-based devices, Manila for shared files, Glance for images, and Keystone for authentication.
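The Cinder integration mentioned above is typically wired up through a backend section in `cinder.conf`. A hedged sketch of what such a section might look like for a Quobyte-backed volume service (the driver path, registry hostname, and option names should be checked against the OpenStack release in use; the URL and mount path here are illustrative):

```ini
[DEFAULT]
enabled_backends = quobyte

[quobyte]
# Driver class and options are assumptions; verify against your Cinder docs.
volume_driver = cinder.volume.drivers.quobyte.QuobyteDriver
quobyte_volume_url = quobyte://registry.example.com/volumes
quobyte_mount_point_base = /var/lib/cinder/mnt
```

The general pattern, declaring a backend in `enabled_backends` and configuring its driver in a matching section, is how Cinder attaches any storage system, which is why OpenStack compatibility matters for a storage platform.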

No Hidden SPOFs
Fault tolerance is a must in a large infrastructure with vast numbers of hardware and software components. In addition to resiliency and redundancy with no Single Points of Failure (SPOFs), it must be extremely simple to locate and swap out any failed component: broken or misbehaving switches, faulty NICs, even bad network cables that can cause packet loss or corruption. Hidden SPOFs also include partial outages that cause “split-brain” scenarios, where it is unclear which version of a data set is correct and up to date. Preventing hidden SPOFs usually requires software capable of performing verification and consistency checks as part of its data protection features and of automatically handling disk and node failures.
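One common way such consistency checks work is to hash each replica of a piece of data and take a majority vote: replicas that disagree with the majority are flagged for repair, and if no majority exists the system refuses to guess, which is exactly the split-brain case. A simplified sketch of that idea (the function and its behavior are illustrative, not Quobyte's actual implementation):

```python
import hashlib

def verify_replicas(replicas):
    """Majority-vote consistency check across replicas of one data block.

    Returns the indices of replicas that disagree with the majority
    (i.e., candidates for automatic repair). Raises if no majority
    exists, which corresponds to an unresolvable split-brain.
    """
    digests = [hashlib.sha256(r).hexdigest() for r in replicas]
    counts = {}
    for d in digests:
        counts[d] = counts.get(d, 0) + 1
    winner, votes = max(counts.items(), key=lambda kv: kv[1])
    if votes <= len(replicas) // 2:
        raise RuntimeError("split-brain: no majority version of the data")
    return [i for i, d in enumerate(digests) if d != winner]

# Three replicas, one silently corrupted: the bad copy is identified.
good = b"block-data"
assert verify_replicas([good, good, b"corrupt"]) == [2]
```

With three or more replicas a single corrupted copy is both detectable and repairable automatically; with only two, a disagreement cannot be resolved by voting, which is one reason replication factors matter.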

Low Maintenance
IT staff time is a costly resource in hyperscale infrastructures. Data loads may double or triple from one year to the next, but staff and budget rarely will. High-capacity storage must be “low-touch” to keep management and maintenance to a minimum. Highly automated systems with self-monitoring and self-healing features unburden administrators tasked with running large-scale installations, and allow even small teams to manage tens to hundreds of petabytes.

“Hyperscale data centers need the size, the speed, but also the ease of use to deliver storage and data services at staggering volumes,” said Bjorn Kolbeck, Quobyte co-founder and CEO. “Lessons learned from the world’s largest users show us how to build infrastructures with a near-limitless ability to scale out in the future, no matter what your needs are today.”

About Quobyte

Building on a decade of research and experience with the open-source distributed file system XtreemFS and from working on Google’s infrastructure, Quobyte delivers on the promise of software-defined storage for the world’s most demanding application environments, including High Performance Computing (HPC), Media & Entertainment (M&E), Life Sciences, Financial Services, and Electronic Design Automation (EDA). Quobyte uniquely leverages hyperscaler parallel distributed file system technology to unify file, block, and object storage. This allows customers to easily replace storage silos with a single, scalable storage system — significantly saving manpower, money, and time spent on storage management. Quobyte allows companies to scale storage capacity and performance linearly on commodity hardware while eliminating the need to expand administrative staff, through the software’s ability to self-monitor, self-maintain, and self-heal.

Please visit for more information.


Media Contact

Judy Smith
JPR Communications
T: +1 818-798-1475

Victoria Koepnick
Quobyte Inc.
T: +1 650-564-3111