As I noted in an earlier blog, customers planning to move data applications (e.g., backup and archive) to the cloud must consider five key factors in selecting and migrating data. These are:
- Ongoing data transfer volume
- Expected frequency of ongoing data usage
- Data recall performance requirements
- Application integration
- Ongoing management
In the next several blogs, I’d like to show why these factors matter by walking through how they affect your design and planning as you migrate a few common data use cases to the cloud. The four use cases we’ll consider are:
- Site disaster recovery
- Data center off-site copy (for backup)
- Compliance archive
- Remote site primary, archive and backup
Let’s start by looking at central site disaster recovery.
On the surface, disaster recovery seems like a natural use case for a cloud service. By definition, the data must be transferred off-site. Sharing the cost of standby disaster recovery resources with other customers should yield a positive ROI. And the ability to draw on standby cloud service provider (CSP) resources in case of a disaster seems like a winning proposition. Some CSPs even offer specialized deployment and testing services optimized for this use case. Let’s test this against our five operational considerations.
Disaster recovery data generally passes the test of “ongoing data transfer volume.” While you will need to bulk load a very large initial full copy of your data to the cloud, many CSPs offer burst bandwidth or even a physical data transfer service to help seed the initial volumes. Once this initial dataset is loaded, hourly or daily replicas that use modern snapshot technology to limit the volume sent will likely fit within your available bandwidth. The DR use case also fits comfortably with the second factor, expected frequency of usage: this data is expected never to be used, but if it is needed, it will be important enough to justify the associated cost. The challenges, if any, arise in recall performance and application integration.
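A quick back-of-the-envelope calculation shows why the initial load, not ongoing replication, is the hard part. The Python sketch below is illustrative only: the 100 TB dataset, the 500 GB daily snapshot delta, the 1 Gbps link and the 80% link-efficiency factor are all assumed figures, and `transfer_hours` and `delta_fits_window` are hypothetical helper names, not part of any product.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours to move data_gb over a link_mbps link.

    Uses decimal units (1 GB = 8,000 megabits); efficiency is an assumed
    fraction of the nominal link rate that replication can actually use.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and efficiency in (0, 1]")
    megabits = data_gb * 8_000
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3_600


def delta_fits_window(
    delta_gb: float, link_mbps: float, window_hours: float, efficiency: float = 0.8
) -> bool:
    """True if a snapshot delta of delta_gb can be sent within window_hours."""
    return transfer_hours(delta_gb, link_mbps, efficiency) <= window_hours


if __name__ == "__main__":
    # Hypothetical figures: 100 TB initial copy, 500 GB daily delta, 1 Gbps link.
    initial = transfer_hours(100_000, 1_000)
    print(f"Initial load: {initial:.1f} h ({initial / 24:.1f} days)")  # 277.8 h (11.6 days)
    print("Daily delta fits an 8 h window:", delta_fits_window(500, 1_000, 8))  # True
```

With these assumed numbers the initial load takes roughly 11.6 days, which is why a burst-bandwidth or physical transfer service is attractive, while each daily delta needs well under two hours of the same link.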
If the following conditions apply, then the DR use case is a great one for moving data to the cloud:
- The application environment is virtualized (using the same technology as the CSP),
- The applications are standard enough (and separated enough) to operate stand-alone in the cloud service environment,
- The data can be copied and integrated through a gateway which preserves its natural application format (e.g., if it needs a file system view, it can maintain a file system view), and
- A disaster network can be established to link your users directly with the CSP in case of a site disaster.
In case of a disaster, data is not recalled. You simply plan and contract for a failover into the compute and networking resources of your cloud service provider, then use that environment to keep the business operating while you separately recover the site. You do, of course, need to have maintained the disaster recovery data in a “warm” storage service; you don’t want to be negotiating hours of recall time from cold storage with your cloud service vendor. A working network directly from the CSP to your users is an absolutely critical element of this plan: working applications and storage do you no good if your users can’t reach them. In this scenario, the most challenging part of the operation is designing a detailed, well-thought-out plan for failing the (now primary) data back to your site once the site is back up and running. Even so, in a fully virtualized environment, a well-planned cloud deployment looks like a winning option.
What about the case where the applications have not yet been fully virtualized, or where they are tightly integrated with other applications or processes that must run at the original private site? If the data must be recalled to an application environment in your own custom datacenter to be of use, this use case is not a good early candidate for the public cloud. Suddenly recalling massive volumes of data across a shared network will not deliver the restore time the business needs to stay operational, and the recall will be quite expensive. For comparison, the AWS price to store data in S3 is $0.03 per GB per month, but the cost to recall that same GB is more than $0.10 (how much more depends on the number of “calls,” or requests, you need to make). And that does not include the cost of the network. While deduplication technology can reduce the volume somewhat, the time delay and cost will still be substantial. In this situation, you may want to consider maintaining this data in your own private cloud, or even in a privately hosted service that can be customized to your application needs.
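To put the recall problem in numbers, the sketch below combines the storage and recall prices quoted above with a simple transfer-time estimate. It is a rough model under stated assumptions, not a pricing calculator: the 50 TB volume, the 1 Gbps link and the 80% efficiency factor are hypothetical, the per-request price defaults to zero as a placeholder for your provider’s actual rate, and network circuit costs are left out, just as in the text. If deduplication shrinks what must move, pass the reduced volume as `data_gb`.

```python
from dataclasses import dataclass


@dataclass
class RecallEstimate:
    monthly_storage_usd: float  # cost to keep the copy in the cloud each month
    recall_usd: float           # one-time cost to pull the whole copy back
    recall_hours: float         # wall-clock time to pull it back over the link


def estimate_recall(
    data_gb: float,
    link_mbps: float,
    storage_usd_per_gb_month: float = 0.03,  # figure quoted in the text
    recall_usd_per_gb: float = 0.10,         # lower bound quoted in the text
    requests: int = 0,                       # number of GET-style calls (assumed)
    usd_per_1000_requests: float = 0.0,      # placeholder: use your CSP's price
    efficiency: float = 0.8,                 # assumed usable share of the link
) -> RecallEstimate:
    """Rough cost and time to recall data_gb; network circuit costs excluded."""
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and efficiency in (0, 1]")
    recall_usd = data_gb * recall_usd_per_gb + requests / 1000 * usd_per_1000_requests
    seconds = data_gb * 8_000 / (link_mbps * efficiency)  # 1 GB = 8,000 megabits
    return RecallEstimate(
        monthly_storage_usd=data_gb * storage_usd_per_gb_month,
        recall_usd=recall_usd,
        recall_hours=seconds / 3_600,
    )


if __name__ == "__main__":
    est = estimate_recall(50_000, 1_000)  # hypothetical 50 TB over a 1 Gbps link
    print(f"Storage: ${est.monthly_storage_usd:,.0f}/month")        # $1,500/month
    print(f"Recall:  ${est.recall_usd:,.0f} (before network costs)")  # $5,000
    print(f"Time:    {est.recall_hours:.0f} h ({est.recall_hours / 24:.1f} days)")  # 139 h (5.8 days)
```

Even with these optimistic assumptions, one full recall costs more than three months of storage for the same data and keeps the business waiting for nearly six days.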
You may decide that being unable to use the cloud for disaster recovery is a compelling enough reason to push for virtualizing the remaining applications and servers that haven’t yet been virtualized. If doing so also leads you to rethink your data protection strategy, I’d like to take this opportunity to suggest a VM-centric solution such as Quantum’s vmPRO with DXi. As we’ll review in a later blog, the leading-edge deduplication technology in these offerings can make it easier to migrate other data use cases to the cloud, while also simplifying your onsite operations today.
Unlike traditional backup applications, and unlike other backup applications designed for virtual environments, Quantum vmPRO Software backs up VMs in native VMware format. Check out the eBook “VM Data Protection for Dummies” or the vmPRO + LTFS Solution Brief “Backup, Archive and Restore VM Data Without a Backup App.”