Jason Buffington is the senior analyst for data protection at the Enterprise Strategy Group (ESG) and has been working with data protection and availability for over 25 years. Check out all of Jason’s ESG coverage at http://bit.ly/jbESG. This ‘guest’ blog is another installment in an ESG series, at Quantum’s invitation.
When people use the term ‘archive,’ they often mean many different, and often erroneous, things:
- Some call any use of tape instead of disk an ‘archive,’ as opposed to a ‘backup’
- Some refer to ‘archiving’ as the grooming or moving of data from primary storage
- Some refer to any long-term retention mechanism (greater than a year) as an archive, even if the copies of data were originally created by a backup application or just a drag-and-drop of a folder or other object
Often, the term ‘Archive’ creates confusion in itself, regardless of the technology implementation behind it. Instead, let me first offer ESG’s guidance on differentiating a ‘backup’ from an ‘archive,’ regardless of the media:
A Backup is a copy of some collection or container of data (volume, folder, datastore, mailstore, etc.) for the purposes of being prepared to restore that data to a previous point in time, often as part of mitigating a crisis of some size (from server component failure to site-wide catastrophes).
An Archive is the intentional preservation of a subset of data, based on the perceived value of the data due to regulatory, operational, historical or governance purposes. After identifying which data is to be preserved, various move or copy functions are enacted for its preservation to secondary or tertiary storage.
Notice that a backup is mostly focused on a logical container and the intended outcome (preparation to restore), whereas an archive focuses on the value of the data and its preservation. Notice also that both can hold data for a long period of time and both can use a variety of storage media – and that is the point. Many folks need to separate their misperceptions of “why” and “how” they store data for long-term retention. With that being said, let’s talk about Long-Term Retention (LTR) as the method by which data will be preserved, regardless of the backup or archive motivation … and some considerations for being successful with LTR.
Many presume that an LTR solution will be accessed infrequently and will typically be far slower than production storage – but that actually isn’t the case. According to ESG’s Backup and Archiving Convergence Trends research report, most IT respondents working with LTR solutions have the following business requirements:
- 58% move data into (an LTR solution) at least daily … plus 30% weekly
- 30% retrieve data (from an LTR solution) at least daily … plus 32% weekly
- 85% of data retrieved (from an LTR solution) is 24 months old or less
- 35% of retrievals (from an LTR solution) are less than 1GB … 75% are less than 100GB
- 58% of retrievals are done by IT alone … plus 37% self-service and/or IT
If one were to look at those characteristics, without prejudicial terms like ‘archive,’ an IT Architect wouldn’t necessarily jump to tape as the ‘obvious’ medium, any more than they might jump to disk or cloud-based storage as the logical choices. And yet, when we put a label like ‘archive’ on the solution, many folks erroneously assume the solution will be built on slow, infrequently-accessed tapes.
The decision gets even less clear when considering users’ expectations for retrieval times, which are often measured in minutes or less.
The key to appreciating this data chart is to recognize that LTR systems will vary in almost every aspect of performance (ingest, retrieval, frequency, speed, etc.) and therefore the right answer for architecting the storage for LTR, or backup, or archive is “it depends.” It depends on the value of the data and the performance characteristics of your business. The one assured reality is that ‘archive’ is not synonymous with tape alone: tape and disk (block, file and object) media each have various merits, whether on-premises or through a service provider.
There are a few other realities that many folks don’t always consider when looking at Long-Term Retention strategies. Archiving addresses several operational considerations around primary storage and backups, such as:
- If your LTR goal is to move or “groom” stagnant data from your primary storage, then you will not only reclaim expensive space and operational costs, but you’ll have less to back up or recover. Check out the short ESG podcast on why Everyone Should Archive, period.
- Similarly, since one would presume that your LTR storage is specifically designed to retain precious and pristine data for long periods of time, that data doesn’t have to be backed up anymore (or at least not nearly as frequently). In some cases, this is a driving force behind using a solution that combines backup and archiving into a single (or integrated) ‘data-protection-and-management’ workflow, while intentionally leveraging different tiers of storage for high-performance access and lower-cost, long-term retention.
- And similarly again, in regard to tiering, the last, albeit ugly, reality in long-term retention is the stumbling that happens when IT asks users to radically change their behavior in order to gain the IT optimizations. Instead, IT organizations and partners should be looking for solutions in which tiering, or other transparent integration, lets more- and less-performant storage work together. Perhaps the primary storage tiers within itself, or to secondary tape, alternative disk platforms or cloud-based repositories. Perhaps the backup and archiving software work together to archive after the backups, but leave a stub for those requesting access from the primary store.
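To make the ‘groom and leave a stub’ idea concrete, here is a minimal Python sketch of an age-based tiering policy. Everything in it is an illustrative assumption rather than any vendor’s implementation: the `Item` record, the `groom` and `open_item` functions, the tier names and the 180-day idle threshold are all hypothetical, and a real product would move actual data rather than flip fields in memory.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Item:
    """Hypothetical record for one file or object on primary storage."""
    name: str
    last_accessed: date
    tier: str = "primary"   # "primary" or "ltr" (assumed tier names)
    is_stub: bool = False   # True once only a pointer remains on primary


def groom(items, today, max_idle_days=180):
    """Move items idle longer than max_idle_days to the LTR tier, leaving a stub.

    The 180-day default is an arbitrary illustration, not a recommendation.
    Returns the names of the items that were moved.
    """
    cutoff = today - timedelta(days=max_idle_days)
    moved = []
    for item in items:
        if item.tier == "primary" and item.last_accessed < cutoff:
            item.tier = "ltr"
            item.is_stub = True
            moved.append(item.name)
    return moved


def open_item(item, today):
    """Open an item by the same name either way; a stub triggers a recall."""
    if item.is_stub:
        # Transparent recall: bring the data back to primary storage.
        item.tier = "primary"
        item.is_stub = False
    item.last_accessed = today
    return item.name
```

The point of the sketch is the user experience: callers of `open_item` use the same name whether the data sits on primary storage or in the LTR tier, so the optimization happens without asking users to change their behavior.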
There are several combinations and permutations for LTR, but all of the successful ones revolve around as consistent a user experience as possible, with automation and integration enabling the solution itself, coupled with the right software and media choice(s) that fit your business and technical requirements for Long-Term Retention … or archive goals. Quantum’s Artico NAS appliance and other StorNext-based solutions are designed with this in mind.