Few disciplines generate more data than the life sciences, and striking the right balance between storage performance, capacity, and cost can make or break an entire research organization. This is especially true in genomics, where annual data volumes are rising exponentially. A Statista chart illustrates the trend: the genomics field generated 1 petabyte (PB) of data in the five years from 2009 to 2014, and in the following five years (2015-2019) that figure grew twentyfold.
Based in Huntsville, Alabama, the HudsonAlpha Institute for Biotechnology understands the need for effective, scalable storage like few others. The nonprofit organization continues to lead in its field, specializing in both agricultural research and human disease research. HudsonAlpha has steadily grown its IT infrastructure to the point that it now provides on-campus services to more than 50 associated life science companies.
If we had a new project come online and we needed another petabyte of storage, we could literally order a couple of servers, put them online in a matter of minutes, and have that capacity available to the Quobyte file system. If we were dealing with storage appliances, those abilities would be a lot more constrained.
Richard Johnson – HudsonAlpha System Architect
HudsonAlpha has two primary storage systems, each with a capacity of approximately 4.5PB. One, used for production and "hot" projects, runs on the Quobyte software storage solution. The other, used for archiving, runs on a separate object storage appliance. Because of its large data volumes, HudsonAlpha makes little use of cloud-based storage, except for what the organization calls "very cold storage."
“We embarked on proof-of-concept trials with a number of vendors,” says HudsonAlpha system architect Richard Johnson. “After testing for six months, what we discovered during that testing made us ultimately decide to go with Quobyte.”
Alternatives vs. Quobyte
According to Johnson, the obvious problem with all-flash solutions was how quickly they became unaffordable when scaled to petabyte-class capacities. A less obvious problem was that every flash-based solution he examined relied strictly on NFS. As a result, even though the individual drives offered exceptional performance, the throughput limits of NFS bottlenecked all single-stream processing. Ultimately, that was an even bigger problem than price.
HudsonAlpha also valued having a web-based user interface with real-time information. The group's previous storage platform offered only rudimentary on-screen reporting and limited ability to monitor and manage data conditions in real time. HudsonAlpha wanted more direct, immediate awareness of and control over its data, and Quobyte supplied that.
The appliance-based platforms HudsonAlpha evaluated imposed limits on the number of enclosures, the number of drives per enclosure, where certain drive types had to reside, and the mix of media the platform could tolerate. Additionally, scaling beyond the small number of enclosures an appliance platform supported required another large infrastructure and licensing investment: a forklift upgrade. That made it infeasible to add resources in increments of a few hundred terabytes; growth had to come in petabytes. All HudsonAlpha wanted was the ability to add another server to a cluster as needed and have that capacity immediately available.
Johnson notes, “Quobyte came at it from the standpoint of, ‘Show some more storage and we’ll give it to you.’”
Given the limitations of appliance-based options, HudsonAlpha was very enthusiastic about Quobyte's ability to run on any commodity, off-the-shelf hardware. This hardware agnosticism lets HudsonAlpha invest in higher-performance hardware for production.
“When we worked with HudsonAlpha to design a storage solution able to meet all their objectives, everyone agreed it was important for them not to buy more than was necessary for today,” says Andrew E Gauzza III, System Architect, HudsonAlpha. “Making sure they weren’t locked into a particular platform would give them maximum control over costs, life cycle management, and the ability to procure what they needed when they needed it. Hardware agnosticism is what lets us make sure we can deliver the right server and configuration for a job and stay within budget.”
Another major factor pushing HudsonAlpha toward a new storage platform was that the genomics group had no way to query its legacy parallel file system directly. Johnson describes having to run CLI commands and scrape their output, a time-intensive process that was far from intuitive.
In contrast, Quobyte offered much more flexible placement rules. For example, admins could specify which file types should and should not be placed on SSD media based on the workload and other factors. Quobyte supported a wealth of such criteria, whereas competing products were generally far more restrictive.
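To make the idea of file-type placement rules concrete, here is a minimal sketch of first-match rule evaluation in Python. It is not Quobyte's configuration format or API: the `PlacementRule` type, the glob patterns, and the "ssd"/"hdd" media names are all hypothetical, chosen only to show how an ordered list of rules can both send some files to SSD and explicitly keep others off it.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class PlacementRule:
    """Hypothetical rule: files whose names match `pattern` go to `media`."""
    pattern: str  # shell-style glob, e.g. "*.tmp"
    media: str    # target media class, e.g. "ssd" or "hdd"


# Illustrative policy only; patterns and media names are assumptions,
# not HudsonAlpha's real configuration or Quobyte's rule syntax.
RULES = [
    PlacementRule("*.fastq.gz", "hdd"),  # keep large raw reads OFF SSD, even in scratch
    PlacementRule("*.bai", "ssd"),       # small, randomly read index files
    PlacementRule("*.tmp", "ssd"),       # short-lived pipeline intermediates
    PlacementRule("scratch_*", "ssd"),   # anything else in the hypothetical scratch namespace
]

DEFAULT_MEDIA = "hdd"


def choose_media(filename: str, rules=RULES, default: str = DEFAULT_MEDIA) -> str:
    """Return the media class of the FIRST rule whose pattern matches `filename`.

    fnmatchcase is used so matching is case-sensitive on every OS.
    """
    for rule in rules:
        if fnmatchcase(filename, rule.pattern):
            return rule.media
    return default


if __name__ == "__main__":
    for name in ["sample1.bai", "scratch_reads.fastq.gz", "scratch_notes.txt", "report.pdf"]:
        print(name, "->", choose_media(name))
```

Because evaluation stops at the first match, rule order carries meaning: placing the `*.fastq.gz` rule ahead of `scratch_*` is what keeps a file like `scratch_reads.fastq.gz` off SSD.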
“We’ve been able to structure the volumes in a way that I can get project-level capacity information at a glance,” says HudsonAlpha’s Johnson. “I can get user-level capacity information. I have real-time access to how big people’s home directories are, which we heavily use. We have scripts that run all the time to gather information and help us plot it. We use it for chargeback reports and showback reports and all kinds of things. It’s incredibly powerful.”
The API access behind these scripts is proving to be a valuable tool for day-to-day operations. Accurately measuring the resources each project consumes allows HudsonAlpha to issue chargebacks and recover costs.
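The chargeback and showback reports Johnson mentions reduce to simple arithmetic once per-project capacity figures are available. The sketch below assumes those figures have already been gathered (for example, by a capacity-reporting script) as byte counts; the `ProjectUsage` type, the project names, and the per-terabyte rate are hypothetical, and nothing here calls a Quobyte API.

```python
from collections import defaultdict
from dataclasses import dataclass

BYTES_PER_TB = 10**12  # decimal terabyte


@dataclass(frozen=True)
class ProjectUsage:
    """One capacity sample for a project (hypothetical report format)."""
    project: str
    bytes_used: int


def bytes_by_project(usages):
    """Sum bytes per project, in case a project appears in several samples."""
    totals = defaultdict(int)
    for u in usages:
        totals[u.project] += u.bytes_used
    return dict(totals)


def monthly_charges(usages, rate_per_tb_month):
    """Chargeback: bill each project pro rata for the terabytes it stores."""
    return {
        project: round(b / BYTES_PER_TB * rate_per_tb_month, 2)
        for project, b in bytes_by_project(usages).items()
    }


def showback_shares(usages):
    """Showback: each project's fraction of total consumed capacity."""
    totals = bytes_by_project(usages)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {project: 0.0 for project in totals}
    return {project: b / grand_total for project, b in totals.items()}


if __name__ == "__main__":
    # Hypothetical projects and rate, for illustration only.
    usage = [
        ProjectUsage("crop-genomics", 250 * BYTES_PER_TB),
        ProjectUsage("neonatal", 50 * BYTES_PER_TB),
    ]
    print(monthly_charges(usage, rate_per_tb_month=5.0))
    print(showback_shares(usage))
```

Keeping showback separate from chargeback reflects the distinction in Johnson's quote: showback reports consumption without billing it, while chargeback converts the same figures into costs.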
Conducting these platform comparisons and deciding to move forward with Quobyte was fundamental to HudsonAlpha's mission, because storage is at the heart of every genomics activity. From cancer research to neonatal work to drought-resistant crop improvements, the organization's work benefits people around the world. As Johnson says, "if the storage doesn't work, the research doesn't work."
HudsonAlpha Institute for Biotechnology is a nonprofit institute dedicated to innovation in genomic technology and sciences across a spectrum of biological challenges. Its research and teaching missions focus on genomics and genetics for truly individualized medicine, developing new and sustainable energy sources, and understanding the normal function of cells and organisms. In addition, HudsonAlpha is home to the Genome Sequencing Center – one of the few centers in the world that specializes in de novo eukaryotic whole genome sequencing, assembly and analysis – the CAP-accredited and CLIA-licensed Clinical Services Lab, and The Smith Family Clinic for Genomic Medicine.