On April 28, 1990, a New York Times headline proclaimed, “10,000 Are Expected to Lose Jobs to Spotted Owl.” The U.S. Forest Service predicted job losses of up to roughly 150,000 as conservationists faced off against the lumber industry. Over the years, the owl became a symbol of the struggle between progressive and legacy industry agendas, in a battle with no winners. Three decades later, though, researchers at Oregon State University devised a way to harness big data analytics and cutting-edge computing infrastructure to bring constructive, practical guidance to both sides.
Finding a storage solution with a file system capable of effectively managing massive datasets.
Discovering a storage solution that could provide a cloud-like experience while working with large amounts of data, at a much lower cost than cloud solutions.
Budgets can be tight, so OSU needed a storage system able to run on any standard x86 server as well as on existing hardware.
OSU’s time needs to go to research, not storage. The team had struggled to find a storage solution that simplified installation, setup, management, and scaling across redundant servers.
The project arose within the USDA Forest Service, which had been working on methods to help protect Northwest owls since the 1990s. The Forest Service teamed with OSU scientists to devise a plan: the group deployed roughly 1,500 autonomous recording units across a large forest area, the units recorded audio around the clock, and scientists periodically collected the data. It took researchers a year and a half to devise an algorithm able to parse the audio and identify different animal species. At first, the system could identify seven species. Today, it can identify about thirty, distinguish males from females, and even spot behavioral changes within a species over time.
“That project generates about 250 terabytes of data every month or so,” says Christopher Sullivan, assistant director for biocomputing at the OSU Center for Quantitative Life Sciences (CQLS) and author of the animal-tracking algorithm. “That’s also image data, because we create spectrograms from the audio and process those spectrograms through a convolutional neural net we made to do species identification. We scale to about a petabyte of data, so we keep taking data off and reusing the space. Otherwise, we would just infinitely be buying storage.”
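The spectrogram step Sullivan mentions is a standard way to turn audio into an image a convolutional network can consume. The following is a minimal, dependency-free sketch of a magnitude short-time Fourier transform (STFT) spectrogram, assuming a Hann window and hypothetical frame and hop sizes; OSU’s actual parameters, any log scaling, and the neural network itself are not described here and are not shown.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Periodic Hann window of length n; tapers frame edges to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_magnitudes(frame: list[float]) -> list[float]:
    """Magnitudes of DFT bins 0..n//2 (the non-negative frequencies of a real signal).

    A direct O(n^2) DFT keeps the sketch dependency-free; a real pipeline would use an FFT.
    """
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal: list[float], frame_len: int = 256, hop: int = 128) -> list[list[float]]:
    """Magnitude STFT: one row per time frame, one column per frequency bin.

    The frame_len and hop defaults are hypothetical, not OSU's settings.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    window = hann(frame_len)
    rows = []
    # Slide a window over the signal; each windowed frame becomes one spectrogram row.
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        rows.append(dft_magnitudes(frame))
    return rows


if __name__ == "__main__":
    sample_rate = 8000  # hypothetical sample rate in Hz
    # A pure 1 kHz tone: with 64-sample frames each bin spans 8000/64 = 125 Hz,
    # so the energy should land in bin 1000/125 = 8.
    tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
    spec = spectrogram(tone, frame_len=64, hop=32)
    peak_bins = {max(range(len(row)), key=row.__getitem__) for row in spec}
    print(len(spec), len(spec[0]), peak_bins)
```

Each row is one moment in time and each column one frequency band, so a bird or owl call shows up as a visual pattern that an image classifier can learn. In practice the magnitudes are usually converted to decibels before being rendered or fed to a network.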
Processing up to a petabyte of data can consume massive, specialized resources. Compute work happens on IBM Power System AC922 servers, collectively containing more than 6,000 processors across 20 racks in two server rooms and serving 2,500 users. The AC922 architecture links its AI-optimized NVIDIA GPUs directly to the POWER9 CPUs over NVLink rather than through a conventional PCIe bus, placing GPU resources much closer to the CPU than in conventional server architectures. The trick was coming up with a file system and storage solution able to keep massive datasets close to those compute resources. Swapping data in and out of external scratch storage doubled processing time. With so many departments and groups depending on Sullivan’s team (not to mention the owls), that was time no one could afford.
Quobyte was our path to faster deployment of user file space and tenancy… This is what gives us time to do research.
Chris Sullivan, Assistant Director for Biocomputing, Oregon State University Center for Quantitative Life Sciences (formerly the Center for Genome Research and Biocomputing)
Quobyte: Big Cloud Brought On-Site
The most obvious storage solution was the public cloud, renowned for its near-infinite scalability and supposedly unbeatable cost efficiencies. The reality, as Sullivan discovered, doesn’t always match the reputation. The CQLS maintains up to 18 PB of storage, with the amount changing as users add their own drives to OSU’s infrastructure and remove them. Naturally, there were concerns about the time needed to upload and download such massive datasets, even over the group’s 100 Gbps core network. Ultimately, though, cost proved to be the primary barrier to the cloud.
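To put the transfer-time concern in numbers: at a full line rate of 100 Gbps, moving one decimal petabyte (10^15 bytes) takes about 22 hours, roughly a month of project data (250 TB) about 5.6 hours, and the full 18 PB about 400 hours, or 16.7 days. The helper below is a back-of-the-envelope sketch; its `efficiency` parameter is a hypothetical stand-in for real-world throughput, which would fall below line rate once protocol overhead and contention are counted.

```python
PB = 1e15  # decimal petabyte, in bytes
TB = 1e12  # decimal terabyte, in bytes


def transfer_hours(data_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move data_bytes over a link of link_gbps gigabits per second.

    efficiency is a hypothetical fraction of line rate actually achieved (1.0 = ideal);
    it is an assumption, not a measured OSU figure.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be positive and efficiency must be in (0, 1]")
    bits = data_bytes * 8                           # bytes -> bits
    bits_per_second = link_gbps * 1e9 * efficiency  # Gbps -> bits per second
    return bits / bits_per_second / 3600            # seconds -> hours


for label, size in [("250 TB", 250 * TB), ("1 PB", PB), ("18 PB", 18 * PB)]:
    hours = transfer_hours(size, 100)
    print(f"{label}: {hours:.1f} h ({hours / 24:.1f} days) at 100 Gbps line rate")
```

Moving data to the cloud and back again doubles these figures, and halving the achieved throughput doubles them again, before any provider charges are counted.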
“One of the main reasons we have Quobyte is because of cloud costs,” Sullivan says. “With the amount of data we’re talking about, and the rate at which we cycle it, doing this in the cloud would kill our budget. I’m not just using a couple of gigs here and there. I’m using petabytes of data. We’d be constantly moving stuff there and back. With all that ingress and egress, we would literally hemorrhage money to the cloud.”
After evaluating a host of storage options, Sullivan and the CQLS turned to Quobyte. Since then, the CQLS has found that Quobyte’s ROI is most impressive with large files, as opposed to millions of highly similar small files. This is particularly advantageous for AI training, where, according to Sullivan, individual TIFF files can run from 20 to 200 gigabytes. At the same time, those files may need to be correlated with data from sensors, secondary cameras, microphones, and more. Everything must flow through one server, which puts a massive load on compute and storage.
Sullivan also cites the cost savings of using commercial off-the-shelf (COTS) hardware rather than “canned, outrageously expensive” storage appliances, which would also have required considerable specialized support staff. With Quobyte, a single person can manage the entire storage infrastructure. Finally, Sullivan found Quobyte’s support much more efficient than that of competing providers, especially because Quobyte assigned dedicated support reps. That arrangement prevented fragmented assistance, in which different reps make different (and sometimes conflicting) recommendations.
Because OSU builds much of the CQLS’s value and monetization around tenancy, provisioning and managing tenant data and applications are paramount. Sullivan recalls how difficult other storage options made regular operations such as setup, management, and scaling, especially across redundant servers. He sums up his pre-Quobyte experiences as “brutal, brutal, brutal. I’d rather have a root canal.”
About Oregon State
Oregon State is an international public research university that draws people from all 50 states and more than 100 countries.
With a record $380 million in competitive research grants and contracts in 2021, Oregon State University continues to lead the way with practical, problem-solving research that improves lives, protects natural resources, and generates economic growth to transform our future for the better.
Oregon State’s researchers are top-ranked in their fields, hold leadership positions in international and national professional organizations, have received prestigious honors, and earned global reputations.