In a recent post, we discussed the technological advances happening in protein folding, crystallography, atomic-resolution visualization, and modeling. Success in this space promises that life sciences labs will be able to study existing, unmapped proteins with unprecedented speed, which in turn supports new predictive modeling techniques. Collectively, these advances may lead to cures for terrible diseases that have afflicted mankind for centuries.
As noted in that prior post, cryo-electron microscopy (cryo-EM) of a single protein at raw, atomic-scale resolution can generate hundreds of terabytes of data. Performing machine learning (ML)-driven analysis and modeling on this data requires storage with enough capacity and throughput to complete workloads without bottlenecking project workflows. Put differently, if large-scale image processing pipelines are not kept filled with data, the system's cost-effectiveness and viability break down.
Cryo-EM is the backbone of next-generation protein visualization, but the data quantities it generates are prodigious. Quobyte can help accelerate cryo-EM workflows and simplify storage operations in five key ways.
#1: Linear, Scale-Out Architecture
Conventional large-scale storage solutions often employ scale-up architectures. This can seem economical at the point of deployment, but as data needs grow, increasing storage density can become cost-prohibitive, and fabric limitations tend to bottleneck performance scaling. Such appliance-based solutions do not scale linearly, and their proprietary designs make both capacity and performance increases disproportionately expensive.
Quobyte is a true scale-out storage system built on commodity hardware rather than proprietary designs, and its performance scales linearly with the number of storage servers. For example, suppose a cryo-EM analysis cluster adds general-purpose GPU (GPGPU) acceleration, creating an imbalance in which data processing continually wastes cycles waiting for storage to supply more data. Quobyte users would only need to add storage nodes to scale performance, not buy a completely new, higher-end storage system. In short, labs can add disks whenever they need them without paying for resources they don't need.
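To make the linear-scaling argument concrete, here is a minimal sizing sketch. It assumes an idealized model in which aggregate throughput is simply node count times per-node throughput; the per-GPU and per-node figures and the helper names are hypothetical placeholders, not Quobyte measurements or APIs.

```python
import math


def aggregate_throughput_gbps(nodes: int, per_node_gbps: float) -> float:
    """Aggregate throughput under an idealized linear scale-out model."""
    return nodes * per_node_gbps


def nodes_needed(gpus: int, per_gpu_gbps: float, per_node_gbps: float) -> int:
    """Smallest node count whose aggregate throughput meets GPU demand."""
    if per_node_gbps <= 0:
        raise ValueError("per_node_gbps must be positive")
    demand = gpus * per_gpu_gbps
    return math.ceil(demand / per_node_gbps)


# Hypothetical figures, for illustration only.
PER_GPU_GBPS = 2.0   # data each GPU consumes per second (GB/s)
PER_NODE_GBPS = 3.0  # throughput one storage node delivers (GB/s)

before = nodes_needed(16, PER_GPU_GBPS, PER_NODE_GBPS)
after = nodes_needed(24, PER_GPU_GBPS, PER_NODE_GBPS)
print(f"16 GPUs need {before} nodes; 24 GPUs need {after} nodes "
      f"(add {after - before})")
```

With these assumed numbers, growing from 16 to 24 GPUs means adding 5 nodes to the existing 11 rather than replacing the storage system. Real clusters rarely scale perfectly, so measured per-node throughput should replace the placeholders.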
#2: Flexible, Tiered Media for Cost Optimization
Applications such as cryo-EM can easily scale into petabytes of storage for a single project. At the same time, potentially large segments of that data must be available for analysis at extremely high speeds. Hard disk remains the medium of choice for reasonably performant mass storage, but Quobyte adds an NVMe-based flash tier above it to keep the data pipeline to GPUs full. The ratio of NVMe flash to hard disk can be tuned to specific applications and workloads for maximum cost efficiency. This tiered approach gives cryo-EM clusters just the right amount of “hot” storage for analytics and modeling while also providing ample nearline and long-term storage at an attractive per-terabyte price.
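The cost benefit of tiering comes down to simple arithmetic: only the hot working set pays flash prices. The sketch below compares a tiered layout with an all-flash one. The per-terabyte prices, the 2 PB project size, and the 200 TB hot set are hypothetical values chosen for illustration, and `Tier` and `tiered_cost` are illustrative names, not part of any Quobyte interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_tb: float  # hypothetical price, USD per terabyte


def tiered_cost(total_tb: float, hot_tb: float, flash: Tier, disk: Tier) -> float:
    """Cost of placing hot_tb on flash and the remaining data on disk."""
    if not 0 <= hot_tb <= total_tb:
        raise ValueError("hot_tb must be between 0 and total_tb")
    return hot_tb * flash.cost_per_tb + (total_tb - hot_tb) * disk.cost_per_tb


NVME = Tier("nvme", 100.0)  # assumed flash price
HDD = Tier("hdd", 20.0)     # assumed hard disk price

project_tb = 2000  # a 2 PB project
hot_tb = 200       # working set that must stay on flash

print(f"tiered:    ${tiered_cost(project_tb, hot_tb, NVME, HDD):,.0f}")
print(f"all-flash: ${tiered_cost(project_tb, project_tb, NVME, HDD):,.0f}")
```

Under these assumptions the tiered layout costs $56,000 versus $200,000 for all-flash. Tuning `hot_tb` to the actual working set is the knob the paragraph above describes.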
#3: 24/7 Uptime and Easy Maintenance
Life science labs can only drive a return on their equipment investments when that equipment is in use, and each time sysadmins take down storage infrastructure for upgrades, patching, or other servicing, another slice of ROI is lost. Quobyte’s built-in redundancy and robustness ensure around-the-clock operation. Admin tasks can be performed at any time, scheduled or ad hoc, without disrupting users or their applications. This gives organizations far more flexibility in how they perform admin operations and schedule their IT labor.
#4: Deeply Powerful Storage Made Simple
As in so many other fields, life sciences groups are in the business of analysis and modeling, which means running systems, not managing them. Many petascale storage systems evolved from decades-old architectures, and the software developed for them tends to resemble layers of bandages more than a cohesive, optimized experience. As a result, life sciences organizations have needed to retain IT staff with unusually specialized expertise, which in turn can lead to shortages of qualified (and often expensive) support.
Quobyte was designed from the outset for dynamic, hyperscale operation, but the platform hides that power within user-space software that emphasizes automated operations and an easy-to-use interface. Users who know how to manage Linux servers can go from installation to running Quobyte storage within minutes. Quobyte is just another Linux application to download, install, and manage, whether on-site or remotely.
#5: Strong Data Protection and Security
Over the last few years, spending on data security has continued to grow at an estimated 10% (IDC) to 16% (MarketsandMarkets) per year. The need to keep data safe from theft rises alongside total data volumes, and life sciences data, where a single advance can be worth many billions of dollars, is no exception. Quobyte protects data on two fronts. First, the platform uses end-to-end checksums to verify that the data sent from one end of a communication is exactly the data that arrives at the other end. (This guards against random bit errors as well as intentional tampering.) Second, Quobyte applies government-grade encryption to all data, both at rest and in transit, so any intercepted data is unreadable to unauthorized third parties.
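The end-to-end checksum idea can be sketched in a few lines: the writer computes a checksum before data leaves, and the reader recomputes it on arrival and rejects any mismatch. This sketch does not reflect Quobyte's actual algorithm, which the post does not specify; it uses CRC-32 from Python's `zlib` to catch accidental bit errors and, because a plain CRC can simply be recomputed by an attacker, adds an HMAC-SHA256 tag under an assumed shared key to illustrate tamper detection. All function names and the key are hypothetical.

```python
import hashlib
import hmac
import zlib


def send(payload: bytes) -> tuple[bytes, int]:
    """Writer side: compute a CRC-32 over the payload before it leaves."""
    return payload, zlib.crc32(payload)


def receive(payload: bytes, checksum: int) -> bytes:
    """Reader side: recompute the CRC-32 and reject any mismatch."""
    if zlib.crc32(payload) != checksum:
        raise ValueError("checksum mismatch: data corrupted in transit or at rest")
    return payload


def tag(payload: bytes, key: bytes) -> bytes:
    """Keyed tag: without the key, an attacker cannot forge a matching value."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(payload: bytes, key: bytes, expected: bytes) -> bool:
    """Constant-time comparison of the recomputed tag with the expected one."""
    return hmac.compare_digest(tag(payload, key), expected)


KEY = b"hypothetical-shared-key"  # assumption: writer and reader share a key

data, crc = send(b"micrograph block 0001")
assert receive(data, crc) == data  # intact data passes the CRC check

# Flip a single bit to simulate a random bit error.
corrupted = bytes([data[0] ^ 0x01]) + data[1:]
try:
    receive(corrupted, crc)
except ValueError as err:
    print(err)

print(verify(data, KEY, tag(data, KEY)))       # untouched data verifies
print(verify(corrupted, KEY, tag(data, KEY)))  # altered data fails
```

CRC-32 is cheap and reliably detects single-bit errors, while the HMAC tag costs more but also resists deliberate modification; a production system would pick per-path algorithms to balance those trade-offs.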
Cryo-EM and similar breakthroughs now enable long-awaited advances in protein folding and across the broader life sciences. A robust, cost-effective storage platform like Quobyte allows researchers to retain data at full resolution for the most accurate results while processing petascale projects faster.