Overcoming storage obstacles to continue the advancement of life sciences research.
On April 28, 1990, a New York Times headline proclaimed, “10,000 Are Expected to Lose Jobs to Spotted Owl.” The U.S. Forest Service predicted job losses of up to 150,000 as conservationists faced off against the lumber industry. Over the years, the owl became a symbol of the struggle between progressive and legacy industry agendas in a battle with no winners. After three decades, though, researchers at Oregon State University devised a way to harness big data analytics and cutting-edge computing infrastructure to bring constructive, practical guidance to both sides.
The project arose within the USDA Forest Service, which had been working on methods to help protect Northwest owls since the 1990s. The Forest Service teamed with OSU scientists and devised a plan: the group deployed roughly 1,500 autonomous recording units across a large forest area. The units recorded audio around the clock, and scientists periodically collected the data. It took a year and a half for researchers to devise an algorithm able to parse the audio and identify different animal species. At first, the system could identify seven. Today, it can identify about thirty, distinguish male from female, and even spot behavioral changes within a species over time.
“That project generates about 250 terabytes of data every month or so,” says Christopher Sullivan, assistant director for biocomputing at the OSU Center for Quantitative Life Sciences (CQLS) and author of the animal-tracking algorithm. “That’s also image data, because we create spectrograms from the audio and process those spectrograms through a convolutional neural net we made to do species identification. We scale to about a petabyte of data, so we keep taking data off and reusing the space. Otherwise, we would just infinitely be buying storage.”
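The audio-to-spectrogram step Sullivan describes can be illustrated with a minimal sketch. The frame size, hop length, and sample rate below are hypothetical choices for illustration, not the CQLS pipeline’s actual settings, and the CNN classification stage is omitted:

```python
import numpy as np

def spectrogram(audio, frame=256, hop=128):
    """Turn a 1-D audio signal into a magnitude spectrogram
    (frequency bins x time frames). Parameters are illustrative."""
    window = np.hanning(frame)
    # Slice the signal into overlapping windowed frames.
    frames = [audio[i:i + frame] * window
              for i in range(0, len(audio) - frame + 1, hop)]
    # One FFT per frame; keep magnitudes of the positive frequencies.
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T

# 1 second of a 440 Hz tone at an assumed 16 kHz sample rate.
t = np.linspace(0, 1, 16000, endpoint=False)
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (freq_bins, time_frames) -> (129, 124)
```

In a pipeline like the one described, each resulting 2-D array would be treated as an image and fed to a convolutional network trained to label species.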
Obviously, processing up to a petabyte of data can consume massive, specialized resources. Compute work happens on IBM Power System AC922 servers, collectively containing more than 6,000 processors across 20 racks in two server rooms to serve 2,500 users. The AC922 architecture connects AI-optimized GPUs to the CPU over NVLink, much more directly than conventional server architectures. The trick was coming up with a file system and storage solution able to keep massive datasets close to compute resources. Swapping data in and out with external scratch resources doubled processing time. With so many departments and groups depending on Sullivan’s team — not to mention the owls — that was time no one could afford.
Quobyte was our path to faster deployment of user file space and tenancy... This is what gives us time to do research.
OSU team member replaces SD card in devices used to record audio in the field
The most obvious storage solution was the public cloud, renowned for its near-infinite scalability and supposedly unbeatable cost efficiencies. The reality, as Sullivan discovered, doesn’t always match the reputation. The CQLS maintains up to 18PB of storage, with the amount changing as users add and remove their own drives to and from OSU’s infrastructure. Naturally, there were concerns with the time needed to upload and download such massive workloads, even given the group’s 100 Gbps core network. Ultimately, though, cost proved to be the primary cloud barrier.
“One of the main reasons we have Quobyte is because of cloud costs. With the amount of data we’re talking about, and the rate at which we cycle it, doing this in the cloud would kill our budget. I’m not just using a couple of gigs here and there. I’m using petabytes of data. We’d be constantly moving stuff there and back. With all that ingress and egress, we would literally hemorrhage money to the cloud.”
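Sullivan’s egress argument is easy to check with back-of-envelope arithmetic. The per-gigabyte rate below is an assumed, illustrative figure (actual cloud provider pricing varies by provider, tier, and volume); the 250 TB/month figure comes from the project itself:

```python
# Illustrative egress-cost estimate; the rate is an assumption, not a quote.
EGRESS_USD_PER_GB = 0.05          # hypothetical cloud egress price
monthly_tb_moved = 250            # ~250 TB generated (and cycled out) per month
monthly_cost = monthly_tb_moved * 1000 * EGRESS_USD_PER_GB
print(f"${monthly_cost:,.0f}/month")  # prints "$12,500/month"
```

Even at that single assumed rate, moving one month’s data back out of the cloud costs five figures, before storage or compute charges, which is the budget pressure the quote describes.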
After evaluating a host of storage options, Sullivan and the CQLS turned to Quobyte. Since then, the CQLS has found that Quobyte’s ROI is most impressive when dealing with large files, as opposed to millions of highly similar small files. This is particularly advantageous when working on AI training, wherein, according to Sullivan, TIFF files can consume 20 to 200 gigabytes each. Concurrently, those files may need to be correlated with data from sensors, secondary cameras, microphones, and more. Everything must flow through one server, which puts a massive load on compute and storage.
Additionally, Sullivan cites the cost savings of using COTS (commodity off-the-shelf) hardware rather than “canned, outrageously expensive” storage appliance solutions, which would have also entailed managing considerable, specialized support (labor) resources. With Quobyte, a single person can manage the entire storage infrastructure. Not least, Sullivan found Quobyte’s support to be much more efficient than that of competing providers, especially in how Quobyte assigned dedicated support reps to prevent fragmented assistance, with different reps making different (and sometimes conflicting) recommendations.
Since OSU builds much of the CQLS’s value and monetization around tenancy, provisioning and management of tenant data and applications is paramount. Sullivan describes experiences with other storage options and how difficult they made regular operations, such as setup, management, and scaling, especially across redundant servers. He describes his pre-Quobyte experiences as “brutal, brutal, brutal. I’d rather have a root canal.”
Because of Quobyte’s easy compatibility with industry-standard servers and storage infrastructure, building out Quobyte-managed resources within the CQLS proved fast and straightforward. The scalability Sullivan wanted from the cloud could be had in-house, and at a multi-petabyte scale, total storage TCO proved even lower than what any cloud solution could offer.
The distinction between Quobyte and its competitors led Sullivan to the realization that there’s more to storage infrastructure than “just turning it on.” He points to the ease of turning on storage from Amazon and similar providers, but that’s different from getting that storage to do things. When it came to getting large-scale, complex work done, no solution made tasks easier or more effective than Quobyte. Sullivan boils the idea down to a single word: enablement. Quobyte was one of several technologies within OSU’s infrastructure that enabled users to implement their applications and achieve superior results affordably.
When Sullivan first began working for OSU, what is now the CQLS serviced six departments. Today, the group handles 26 departments along with a host of outside private and government organizations; the Forest Service is only one of many. With the CQLS’s infrastructure operating at peak efficiency, that leaves only one bottleneck on Sullivan’s mind: having just two rooms for his datacenter. That problem will have to be solved in the future. For now, Quobyte is helping OSU make the most of its physical resources, which in turn helps improve the world by solving big problems that truly matter.
“In the end for us,” says Sullivan, “Quobyte was our path to faster deployment of user file space and tenancy. That’s what our users care about. And if our users are happy, obviously our stress level is low, our return is much better, and my group is getting other things done. This is what gives us time to do research.”
Audio data is collected around Oregon, including Medford, OR
Oregon State is an international public research university that draws people from all 50 states and more than 100 countries.
With a record $380 million in competitive research grants and contracts in 2021, Oregon State University continues to lead the way with practical, problem-solving research that improves lives, protects natural resources, and generates economic growth to transform our future for the better.
Oregon State’s researchers are top-ranked in their fields, hold leadership positions in international and national professional organizations, have received prestigious honors, and earned global reputations.
Quobyte delivers performance to OSU’s research community, simplicity to the administrators, and the cost reduction that they need to remain within their grant funding.
Quobyte’s functionality and cost efficiency help make the project robust, performant, easily managed, and globally available.