If you have followed International Data Corp (IDC), the well-known technology analyst firm over the last few years, you may be familiar with their work on the Global Datasphere, a measure and forecast of the amount of new data created and stored across the globe annually. Their latest analysis reports that over 64 Zettabytes of data were created in 2020. That’s about 32 trillion (2) hour movies, that’s three stacks of DVDs (without jewel cases) to the sun.
IDC expects continued growth compounding at 19% a year into the foreseeable future. Data created over the next three years will amount to more than all the data created over the past 30 years; three times more data will be created over the next five years than was created in the past five years.
Remarkably, only 2% of that data is being stored for future use and analysis. Expectations are that stored data too will continue to grow, particularly because every forward-looking organization is recognizing the value of data as the vehicle of digital transformation. Data is driving the opportunity to create value, to invent new revenue streams, and effectively make and confirm strategic directions.
Which brings us to the topic of cold data…
What is Cold Data?
Production workloads naturally access lots of data. If you think of data as having a lifecycle, ‘hot’ data is data that is actively being used, requiring high performance access while ‘warm’ data is still frequently accessed over a given timeframe. Cold data is inactive data that is never or infrequently accessed. Industry analysts project that 60% of all stored data is cold data.1
Increasingly, cold data is being preserved not because an organization is required to save the data, but because the data is being recognized as having inherent and potential value.
Classically, cold data was limited to data that was preserved to meet regulatory or in-house compliance policies that require retention for some number of years. The data often was simply written to tape media, taken offline, and moved to a storage facility for the rare event that the data would ever need to be accessed again – not so anymore.
Why You Need a Cold Storage Strategy
So, our whole orientation toward cold data is changing, especially as its value gets recognized – on the one hand and on the other hand, its enormity and its growth becomes overwhelming. With the digitization of everything, the incessant data collection of sensors, the volume of video and imagery sources, plus the data-intensive requirements and periodic recalibration of data analysis, artificial intelligence and deep learning workloads, the amount of cold data that must be stored is going to grow, and its application and use cannot and will not remain dormant.
Key Considerations for a More Focused Cold Storage IT Strategy
As cold data grows, it requires a more focused IT strategy and approach to meet the infrastructure requirements. Key considerations include:
Budget and technology limitations. Whereas data storage requirements continue to rise, IT budgets cannot keep pace. Moreover, while solid state disk drives (SSDs) will approach the cost and capacity characteristics of today’s hard disk drives over the next several years, HDD density growth is flattening, leaving no option for a lower-cost tier of random-access storage (which has been the most effective strategy to maintain accessibility while reducing cost). DNA-based storage, while shows promise, is many years from widespread adoption and commercialization, and initially will only serve as the coldest of cold archives (i.e., relatively simple to encode, but hard to read). The largest cloud providers, for example, have already discovered that they need to rely on slower, lower-cost media to meet these constraints.
Forever archiving. Virtually all our knowledge, in whatever form and whatever topic, has been digitized. More and more, data needs to be preserved for its historic relevance (we’ll want to look at this again) and its future value (we’ll want to analyze this again). Research data, medical records, media content, genomic data, and AI/ML modeling data are all obvious candidates that must be preserved for years and decades. Data is information and information is data. With continuous advancement of deep learning algorithms, we will continue to expand our vehicles to mine value and innovate new uses. Cold storage needs to be secure, durable, and self-healing for decades.
Online, unfettered access for enrichment, value creation, and innovation. Extracting value from growing data stores becomes even more problematic. To extract value, you need to know what data you have and have meaningful ways to organize and find relevant subsets. We need more data (the metadata) about the data to maintain and grow its relevance. This requires analysis and enrichment of the data itself both upfront and over time for continued enhancement. To do so, the data must remain easily accessible for the long term. Currently, organizations are challenged to meet these goals with inhouse platforms due to cost constraints. Often reliant on public cloud storage, data sovereignty and control become major issues; plus, accessibility is hampered by access and storage charges that spike when cold data is accessed.
As we look forward to the future, the demands of cold data growth will ultimately give rise to new storage and service solutions.
The Future of Cold Storage Roundtable – October 5, 2021
Register here for our October 5th (8:30AM PT) roundtable on the Future of Cold Storage, and receive the IDC InfoBrief, “Data Deluge: Why Every Enterprise Needs a Cold Storage Strategy.”
1 Note that, whereas there is a bit of a preoccupation around the amount of stored data that is cold, in fact, 99% of all data is cold. As 98% of data doesn’t get saved, this guarantees that this data will never get accessed again, so, by our definition, this data too is cold. That’s a lot of lost opportunity.