The challenge with data in life sciences today? Managing the sheer volume of it.
The first human genome took 15 years and $4 billion to sequence. Today's next-gen sequencers can sequence a genome in days for less than $1,000. More genomes are being sequenced, which means more data is being analyzed—and it all has to be stored somewhere. In fact, the Public Library of Science (PLOS) estimates that genomic data could soon surpass YouTube as the biggest generator of data. It's clear that life sciences teams have their work cut out for them.
However, the storage challenge goes beyond managing a flood of data. Teams of scientists often need to work on the same data at the same time, even if collaborators are in a lab half a world away. When these researchers access large genome data sets or high-res medical images, they need fast access. And research takes time—some research studies can last for decades. Data generated during the beginning of the study needs to remain accessible over the lifespan of the entire project.
Storage is a pain point in the life sciences IT community, and it definitely came up at the recent Converged IT Summit in San Francisco hosted by BioTeam and Cambridge Healthtech Institute. Legacy storage solutions run into a lot of problems with the vast amounts of data generated by life sciences work. Scaling a monolithic high-performance disk array can break the research budget fast. And because science is unpredictable and studies last a long time, data needs to be readily available when it's needed, even if that turns out to be decades from now.
Here's the thing: to scientists, it's not just data. It's their life's work. It's work that is building a better future. A quick look at the headlines on GenomeWeb shows that genomic research drives innovation across healthcare, veterinary medicine, agriculture, climate science, and more. And the innovation isn't just academic. Thirty percent of the companies on the MIT Technology Review's 2015 list of the 50 Smartest Companies are life sciences companies.
THE FUTURE OF SCIENTIFIC DATA MANAGEMENT
At the Converged IT Summit, I noted a shift in the discussion around storage. Both in presentations and roundtable discussions, I heard attendees talking about the need for smarter ways to handle data, both storing it and sharing it with research teams. The common thread was the importance of collaboration—between IT and scientists, between teams of researchers in an organization, and amongst the scientific community at large. The bottom line: the right infrastructure needs to make data available to those who can do the most good with it.
Why is this a shift? Traditionally life sciences researchers have relied on a single tier of scale-out NAS to get research done. But as the data flood has gotten worse, life sciences teams need a more flexible approach. In several conversations, I heard about answers to the data management puzzle that rely on more flexible tiered approaches to storage—including tiering to object storage.
A SINGLE STORAGE TIER JUST WON’T CUT IT IN 2015
Chris Dagdigian, Co-founder and Principal Consultant at BioTeam, covered tiered storage and object storage at Bio-IT World 2015 in his Trends From the Trenches presentation. He offers the opinion that multiple storage tiers aren’t a nice-to-have—they’re a requirement. A single tier just doesn’t offer the flexibility to cost-effectively manage life sciences data for the long term.
So what do storage tiers look like in the context of life sciences?
Tiered storage for life sciences should have the ability to use flash, high-performance disk, object storage, tape libraries, and cloud. Intelligent software automatically moves data between storage tiers so teams can use storage resources in the way that makes the most sense for their budgets—and their work. With tiered storage, the physical location of the file—and the type of storage the file is stored on—is invisible to the scientists doing the research. Files are accessed in the same way regardless of which storage tier they reside on, so the storage infrastructure truly gets out of the way of the research.
My colleagues here at Quantum sketched out what a tiered infrastructure for life sciences might look like—unsurprisingly, it’s similar to what Dagdigian lays out in his presentation:
- SSD for high-speed and IOPS-sensitive workflows (5-50TB)
- High-performance disk for active projects and scratch space (50-500TB)
- Object storage as a massively scalable extension of your online storage (100TB-PBs)
- Tape for long-term retention at the lowest possible cost (100TB-PBs)
- Cloud for long-term retention
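The tiering described above can be sketched as a simple placement policy that routes files to media based on how recently they were touched. This is an illustrative sketch only; the thresholds, tier names, and `FileInfo` fields are assumptions for the example, not Quantum's actual policy engine.

```python
from dataclasses import dataclass

# Tier names mirror the list above; thresholds are illustrative assumptions.
TIERS = ["ssd", "disk", "object", "tape"]

@dataclass
class FileInfo:
    path: str
    days_since_access: int
    size_gb: float

def choose_tier(f: FileInfo) -> str:
    """Place hot data on fast media and cold data on cheap, durable media."""
    if f.days_since_access <= 7:
        # IOPS-sensitive hot data goes to SSD; large hot files to fast disk.
        return "ssd" if f.size_gb < 1 else "disk"
    if f.days_since_access <= 90:
        return "disk"      # active project / scratch data
    if f.days_since_access <= 365:
        return "object"    # online but cooling off
    return "tape"          # long-term retention at lowest cost
```

In a real tiering system this decision runs continuously in the background, and the file's path stays the same for researchers no matter which tier the bytes land on.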
OBJECTS OBJECTS EVERYWHERE
So why does object storage have such an important place in a tiered storage solution—and how does it extend your online storage?
First, it’s helpful to understand what object storage is.
Object storage is built on erasure coding, a family of forward error correction (FEC) techniques originally designed for telecommunications: it encodes and stores information across a dispersed collection of devices in a way that is resilient to device failures. This design brings a valuable set of benefits for life sciences teams.
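To make the FEC idea concrete, here is a toy sketch using a single XOR parity shard. Production object stores use stronger codes (such as Reed-Solomon) that tolerate multiple simultaneous failures; this minimal version tolerates exactly one lost shard and is purely illustrative.

```python
def encode(data: bytes, k: int) -> list:
    """Split data into k equal shards plus one XOR parity shard."""
    data += b"\0" * ((-len(data)) % k)  # pad to a multiple of k
    size = len(data) // k
    shards = [bytearray(data[i * size:(i + 1) * size]) for i in range(k)]
    parity = bytearray(size)
    for shard in shards:
        for i, b in enumerate(shard):
            parity[i] ^= b
    return [bytes(s) for s in shards] + [bytes(parity)]

def reconstruct(shards: list, lost: int) -> bytes:
    """Rebuild the shard at index `lost` by XORing all surviving shards."""
    size = len(next(s for s in shards if s is not None))
    out = bytearray(size)
    for i, s in enumerate(shards):
        if i != lost:
            for j, b in enumerate(s):
                out[j] ^= b
    return bytes(out)
```

Because the parity shard is the XOR of all data shards, any single shard (a failed disk or node) can be rebuilt from the survivors, and no single device holds a complete copy of the data.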
First, object storage is built to scale massively, even at hundreds of petabytes. Capacity can be expanded simply by adding more nodes. More importantly, newer storage nodes can be added over time, and data can be migrated from the old nodes to the new in the background without compromising access, making forklift technology upgrades a thing of the past. Quantum's object storage also has robust data integrity features: self-healing capabilities continuously monitor data integrity in the background and automatically repair objects if a problem is detected.
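The self-healing idea can be illustrated with a toy scrub loop: each object's checksum is recorded at write time, and a background pass re-hashes every stored copy, repairing any corrupt copy from a healthy one. The class and method names here are hypothetical for illustration, not Quantum's API.

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class ObjectStore:
    """Toy store with two replica 'nodes' and a background scrub."""

    def __init__(self):
        self.replicas = [{}, {}]   # each dict maps object id -> bytes
        self.manifest = {}         # object id -> expected checksum

    def put(self, oid: str, data: bytes):
        for node in self.replicas:
            node[oid] = data
        self.manifest[oid] = checksum(data)

    def scrub(self) -> list:
        """Verify every copy; repair corrupt copies from a healthy replica."""
        repaired = []
        for oid, expected in self.manifest.items():
            good = next((n[oid] for n in self.replicas
                         if checksum(n[oid]) == expected), None)
            for node in self.replicas:
                if checksum(node[oid]) != expected and good is not None:
                    node[oid] = good
                    repaired.append(oid)
        return repaired
```

The key point for decades-long studies is that corruption is caught and repaired in the background, rather than discovered years later when a researcher finally re-opens the data.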
So why do some people say object storage is the future of scientific data?
As part of a tiered approach to storing research data, object storage presents an attractive answer to data growth in the face of limited budgets. With petascale archives that need to remain accessible for decades, storage has to support long-term requirements. That means storage that avoids the administrative cost and headache of forklift migrations has an edge.
The right kind of storage infrastructure can enable science rather than hold it back.
Object storage also makes it easier to share access to data. As huge genomic data sets are created, teams have to figure out how to get that data to the scientists who need to work with it. In fact, at Converged IT Summit I heard mention of bike couriers outfitted with huge backpacks full of high-capacity disk. These couriers are tasked with transporting life sciences data around Manhattan for collaboration amongst research facilities. Extending your online storage with object storage certainly offers a better alternative to pedal power. Because object storage can spread data across multiple sites and geographies, it makes it easier to support collaboration across different locations.
And the durability of object storage has immense benefits for life sciences research as well, particularly in the context of forced hardware refreshes and new technologies. When researchers return to their original data in 2, 5, 10, or 20 years, they get exactly what they're looking for, with no lost information, regardless of how the rest of the IT environment has evolved.
YOUR DATA + YOUR RESEARCH + THE RIGHT INFRASTRUCTURE = BRIGHTER FUTURE
The best thing about science is that it never stops. Techniques, instrumentation, software analysis—they’re all advancing at light speed. As long as we can keep data ready and available for analysis, these advances can be put to work mining new insights. While a sequenced genome might not reveal the cure for cancer today, it could lead to medical breakthroughs years from now as our ability to interpret and analyze improves. Discovery is a long-term investment—that’s precisely why it’s important that we keep supporting life sciences innovation with our own innovative approach to data management.
Ready to Learn More?
Learn more about Quantum end-to-end storage solutions for genomics, bioinformatics, and medical imaging data. Visit www.quantum.com/lifesciences for customer stories, case studies, and solution info.