Today, researchers across scientific disciplines are benefiting from technology innovations in both software and hardware.

Think: high-performance computing and compute acceleration technologies like GPUs. Unmanned drones and robots that enable scientists to go places humans cannot, whether in space, on land, or at sea. Ultra-high-definition 4K and 8K video formats. Advanced sensors that collect infrared, ultraviolet, microwave, and radar data. And analytics that make it easier to make sense of all this data.

The worldwide buzz about the first-ever detection of gravitational waves, announced at LIGO in February, only reinforces how technology is helping to advance our understanding of the physical world.

Whether in chemistry, genomics, bioinformatics, climate science, particle physics, or cancer research, data today can be analyzed and mined for insight more effectively than ever. But managing scientific data at the petascale, enabling demanding high-speed workflows, and supporting collaboration across teams, departments, and institutes calls for specialized storage infrastructure.

The right kind of storage infrastructure can make all the difference in finding the answers to today’s scientific questions.

While it might seem strange to quote a famous German-language poet in a post about storage and scientific research, I’m going to do it anyway. Because the pursuit of scientific knowledge all starts with questions, and with the desire to understand.

The Austro-Hungarian poet Rainer Maria Rilke is famous for having said:

“…try to love the questions themselves.”

I don’t think any of my scientist friends fully subscribe to that philosophy. They might love the questions, yes. They might love the journey, and the whole scientific pursuit of discovery, yes. But they also love those moments of epiphany, those “a-ha” moments when they get closer to answers.

So how can storage help the scientific process? Here are…

5 WAYS THAT STORAGE INFRASTRUCTURE CAN MAKE ALL THE DIFFERENCE IN SCIENTIFIC RESEARCH

  1. Access Made Easy
  2. Reanalyze, Replicate, Reproduce
  3. Easy to Grow and Scale
  4. Cost-Effective Resource Allocation
  5. Interoperable & Integrated

1. ACCESS MADE EASY

Storage isn’t just about storing bits on a disk. The objective of a storage solution is to ensure that people have access to the information they need, when they need it, how they need it.

Most researchers need shared access, self-service access, and for some things, high-speed access.

Shared access enables more efficient workflows when individuals and teams are collaborating. And yet, not all storage solutions optimize for sharing. Some optimize for the high IOPS of local storage at the expense of sharing. An inability to share data across systems and between users can lead to inefficient, serialized workflows, where data has to be moved from local storage to some other repository before other teams can work on it. Not good!

Self-service access means that scientists don’t have to wait. Most researchers do not want to file an IT ticket to request archived data and then wait a few hours (or a few days) to get it back. Even when data has been archived for long-term storage on different infrastructure, it’s ideal if researchers can still access the files they need themselves, from the location where they expect to find them, without the intermediation and delays of a ticket queue.
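
To picture what self-service access looks like from the researcher’s seat, here is a minimal Python sketch. The file path is hypothetical, and the behavior assumes a tiered file system that transparently stages archived files back to disk on first read:

```python
from pathlib import Path

# Hypothetical path on a tiered file system. The file may physically
# live on tape or object storage right now, but it still shows up here.
archived_file = Path("/project/genomics/run_2014/sample_001.fastq")

# With transparent tiering, an ordinary read triggers the retrieval
# automatically: no IT ticket, no manual restore step. The first read
# may simply take longer while the file is staged back to disk.
with archived_file.open("rb") as f:
    header = f.read(1024)

print(f"Read {len(header)} bytes from {archived_file}")
```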

High-speed access matters to data-intensive applications and workloads, particularly in high performance computing. And storage that delivers the speeds required by applications and HPC clusters—while also letting customers spread data across multiple tiers, so they’re not forced to put all of their data on expensive disk—well, that is yet another way that storage infrastructure can help research teams today.

2. REANALYZE, REPLICATE, REPRODUCE

When I talk to customers in other industries, such as media and entertainment, I sometimes refer to this requirement as “Reuse, Repurpose, Remonetize.” In the M&E world, video that was first shot years ago may be reused and repurposed as part of a new movie, documentary, or television show. Sometimes old sports video from decades ago can be remastered and remonetized in a new form.

Content producers recognize the value of remixing older content with new video—and that it’s important to archive content effectively. Because “your archive is only as valuable as your ability to retrieve it quickly.”

In the scientific world, of course, people’s goals and objectives are different. The focus is on curing cancer rather than entertaining customers. If anything, that makes the requirement to reference older, archived data even more critical.

Sometimes research projects last for years. Before publishing a genomics paper, for example, researchers might need to reanalyze some of the original raw sequencing results using newer bioinformatics techniques, in order to augment the original analysis.

And then there is all the discussion and controversy in recent years about the challenges of reproducing scientific results. I’m told this debate has increased the pressure to preserve the original raw data as well as the secondary data (that is, the results of the analysis), rather than simply publishing the results.
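
One common practice here, sketched below with hypothetical paths and layout, is to checksum both the raw data and the derived results into a manifest at archive time, so anyone attempting a reanalysis or reproduction later can verify they are starting from exactly the same bits:

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so even huge raw files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(raw_dir: Path, results_dir: Path, manifest_path: Path) -> None:
    """Record checksums for raw data and derived results side by side."""
    manifest = {
        "raw": {str(p): sha256_of(p)
                for p in sorted(raw_dir.rglob("*")) if p.is_file()},
        "results": {str(p): sha256_of(p)
                    for p in sorted(results_dir.rglob("*")) if p.is_file()},
    }
    manifest_path.write_text(json.dumps(manifest, indent=2))

# Hypothetical usage:
# write_manifest(Path("raw_sequencing"), Path("analysis_results"),
#                Path("MANIFEST.json"))
```

Stored alongside the archive, a manifest like this turns “we kept the raw data” into something a reviewer can actually verify.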

So not only are scientists likely to want to reanalyze their data—it’s also possible they will need to reference it again so they (or others) can attempt to reproduce their results.

What this means is that storage infrastructure that makes it easy (not just possible—but easy) to retrieve and reanalyze older archived data can make scientific workflows more efficient.

3. EASY TO GROW AND SCALE

The need to reanalyze older data leads to another key storage attribute in the realm of scientific research. Teams need to be able to scale—and to scale big. It’s not uncommon for our customers to have 15PB (yes, petabytes) of data today and to know that it will grow to, say, 25 or 30PB in the next several years.

So a storage solution that makes it easy to grow a file system on the fly, without downtime, without stopping people from doing their jobs—well, that can be useful.

And a storage solution that enables organizations to scale capacity with different types of storage, so they can balance the tradeoffs between cost and risk, is yet another way that storage infrastructure can make it easy to grow and scale.

And there’s more. A storage solution that makes it easy to archive on ingest, creating near-immediate copies of data without any backup headaches, becomes incredibly useful too, especially as datasets grow to a point where traditional backup simply isn’t an option.
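
To make the archive-on-ingest idea concrete, here is an illustrative Python sketch (not how StorNext implements it internally; the directories are hypothetical, and it relies on the third-party watchdog package) that copies each newly landed file to an archive location as soon as it appears:

```python
import shutil
from pathlib import Path

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

INGEST_DIR = Path("/data/ingest")      # hypothetical landing area
ARCHIVE_DIR = Path("/archive/ingest")  # hypothetical archive tier mount

class ArchiveOnIngest(FileSystemEventHandler):
    """Copy each newly ingested file to the archive tier right away,
    so a protected second copy exists without a separate backup job."""

    def on_created(self, event):
        if event.is_directory:
            return
        src = Path(event.src_path)
        dest = ARCHIVE_DIR / src.relative_to(INGEST_DIR)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        print(f"Archived {src} -> {dest}")

if __name__ == "__main__":
    observer = Observer()
    observer.schedule(ArchiveOnIngest(), str(INGEST_DIR), recursive=True)
    observer.start()
    try:
        observer.join()
    except KeyboardInterrupt:
        observer.stop()
        observer.join()
```

A production version would also wait for each file to finish writing before copying it; the point is simply that the second copy is created at ingest time, not in a nightly backup window.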

Of course, the need to grow and scale isn’t always about capacity. Many institutes and departments are securing more grants and taking on more projects, which drives up performance requirements and increases the number of users who need access to the data repository.

So yet another way storage infrastructure can support scientific workflows is to make it easy to scale—to scale capacity, to scale users, and to scale performance.

4. COST-EFFECTIVE RESOURCE ALLOCATION

Everyone has to live within some kind of budget, right?

In the world of science, grant money and charitable gifts create opportunity and also create limitations. So even in the noble disciplines of science, bills have to be paid and tradeoffs must be made.

Storage infrastructure that provides the ability to combine different types of storage such as flash, disk, object storage, tape, and cloud—each with different cost attributes—can give organizations the flexibility to deploy the type of storage that best balances their needs for performance, scale, access, and budget.

At the Converged IT Summit in San Francisco in late 2015, one of the founders of BioTeam, Chris Dagdigian, spoke about how multi-tier storage is the future of scientific data. Chris described multi-tier storage solutions quite similar to the tiered storage solutions that Quantum customers use, summarized here (a toy policy sketch in Python follows the list):

  • SSD for high-speed and IOPS-sensitive workflows (5-50TB)
  • High-performance disk for active projects and scratch space (50-500TB)
  • Object storage as a massively scalable extension of online storage (100TB-PBs)
  • Tape for long-term retention at the lowest possible cost (100TB-PBs)
  • Cloud for long-term retention
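
As a rough illustration of how a policy over those tiers might be expressed, here is a toy sketch in Python. The thresholds are invented for the example; real platforms such as StorNext apply tiering through their own policy engines:

```python
from dataclasses import dataclass

@dataclass
class FileInfo:
    days_since_access: int
    iops_sensitive: bool  # e.g. scratch data for active compute jobs

def choose_tier(f: FileInfo) -> str:
    """Toy policy mirroring the tiers above: hot, IOPS-sensitive data
    on SSD, active projects on fast disk, colder bulk data on object
    storage, and the oldest data on tape or cloud for cheap retention."""
    if f.iops_sensitive and f.days_since_access < 7:
        return "SSD"
    if f.days_since_access < 90:
        return "high-performance disk"
    if f.days_since_access < 365:
        return "object storage"
    return "tape or cloud"

# A file untouched for over a year lands on the cheapest tier.
print(choose_tier(FileInfo(days_since_access=400, iops_sensitive=False)))
```

The specific cutoffs matter less than the principle: data placement becomes an automated policy decision rather than a manual chore.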

It’s simple, actually: with the new normal being 2,000 cores and petabytes of disk, customers need the flexibility to deploy multiple tiers of storage (each with different performance and cost attributes) to meet the needs of their workflow and their data.

So a multi-tier storage solution that lets organizations combine flash, disk, object storage, tape, and cloud can make all the difference in an organization’s ability to stay within budget and yet still enable the scientists to move research forward.

5. INTEROPERABLE & INTEGRATED

Finally, rip and replace doesn’t work in environments where resources need to be allocated wisely.

So a storage solution that avoids rip and replace can make a big difference. Research teams can benefit from storage that integrates easily into existing infrastructure and with existing applications.

Quantum’s multi-tier storage infrastructure was created with interoperability as a design goal. It provides heterogeneous support for all the leading operating systems: Linux, Windows, UNIX, and even macOS. Not all storage solutions can say that, and even when they can, performance might be compromised on one of the operating systems, such as Windows.

Quantum StorNext multi-tier storage also supports different types of network connectivity options—including Fibre Channel, Ethernet, iSCSI, and InfiniBand. StorNext supports standard file sharing protocols—including NFS and SMB—and also offers high-speed protocols from Quantum that deliver faster performance, both over FC and Ethernet.

THE RIGHT KIND OF STORAGE INFRASTRUCTURE CAN MAKE ALL THE DIFFERENCE

At a recent machine data event, Doug Merritt, the CEO of Splunk, said that his company was lucky to give birth to the Splunk product at the same time as the modern digital revolution we’re all living through. I feel the same way about StorNext multi-tier storage. As the amount of scientific data has grown astronomically—as the need for sharing and collaboration between researchers has grown—as demands for performance and access speeds have increased, well, our StorNext storage platform has become even more valuable to our users. And it can make all the difference.

Ready To Learn More?

Curious to learn more about Quantum’s storage solutions for scientific research, and how the team of storage experts at Quantum feels privileged to work with top research institutes and universities around the world to empower insights with intelligent multi-tier storage? Visit our solution page to learn more: Smart Storage for Scientific Research.
