One of the presentations at the Gartner Catalyst Conference was titled Modernizing Business Continuity and DR Using Virtualization and the Cloud.
It can be watched online for free here. If you do not have an account you can create one for free and watch three sessions for free.
This blogpost has a summary of what was presented here.
The outline of the presentation by Werner Zurcher is:
Server virtualization and public and private cloud services have dramatically changed the alternatives organizations have to ensure greater application availability and disaster recovery. This session provides detailed guidance on how to modernize business continuity and highlights lessons learned from bleeding edge organizations that are already using private and public clouds for DR. The key questions answered in this session include:
• How are technologies like server and storage virtualization and software-defined networks architected to improve business continuity?
• Who is enabling, and who is embracing cloud DR and for what use cases?
• What are the most common architectural pitfalls that should be avoided?
As IT becomes more and more important, organizations are looking for ways to improve their business continuity and disaster recovery capabilities. Superstorm Sandy and the Fukushima nuclear disaster in Japan increased the number of questions from Gartner customers on how to improve DR.
To start improving BC and DR, Gartner offers some advice:
- IT must know the business requirements for BC and DR.
- IT should use virtualization to improve mobility, availability and DR.
- Use automation to simplify disaster recovery.
- Use the cloud to support BC management and assure IT service recovery.
Gartner interviewed 16 of its clients to understand how they perform BC and DR. All Gartner customers who are innovative on DR have a high share of virtualized servers: between 75 and 99% of their servers were virtualized. That is much higher than the average. Gartner estimates that by the end of 2013, 67% of all servers will be running virtualized.
Most companies have 3 to 4 tiers of application protection, for example mission critical, critical, important and non-important. Each tier describes uptime requirements, availability, RTO and RPO. Many companies also have a Tier 0 level for critical infrastructure components. This level contains services like Active Directory, DNS and DHCP, which need to be available at all times. Organizations do not want to have to restore Microsoft AD first and only start restoring the business applications after that restore succeeds.
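As a rough illustration of the tiering idea, here is a small Python sketch. The tier names come from the text above; the RTO/RPO numbers, the service names and the `recovery_plan` helper are all made up for this example and are not figures from the Gartner presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    rto_hours: float  # recovery time objective (illustrative value)
    rpo_hours: float  # recovery point objective (illustrative value)


# Tier names follow the text; every number here is a made-up placeholder.
TIERS = {
    1: Tier("mission critical", 1, 0.25),
    2: Tier("critical", 4, 1),
    3: Tier("important", 24, 4),
    4: Tier("non-important", 72, 24),
}

# Hypothetical inventory: service name -> tier level.
# Level 0 means always-on infrastructure that is not restored during DR.
SERVICES = {
    "order-entry": 1,
    "Active Directory": 0,
    "intranet wiki": 4,
    "DNS": 0,
    "reporting": 3,
    "DHCP": 0,
}


def recovery_plan(services):
    """Split services into Tier 0 prerequisites and an ordered restore list."""
    verify_first = sorted(n for n, lvl in services.items() if lvl == 0)
    restore = sorted(
        (n for n, lvl in services.items() if lvl > 0),
        key=lambda n: (services[n], n),
    )
    return verify_first, restore


if __name__ == "__main__":
    verify_first, restore = recovery_plan(SERVICES)
    print("verify running at DR site:", ", ".join(verify_first))
    for name in restore:
        tier = TIERS[SERVICES[name]]
        print(f"restore {name}: {tier.name}, "
              f"RTO {tier.rto_hours} h, RPO {tier.rpo_hours} h")
```

The point of the split is that Tier 0 services are only checked, not restored: they are assumed to be running at the DR site already, so business applications can be recovered in tier order right away.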
The slide below shows the average spending on disaster recovery by industry. Clearly banks, telecom and airlines spend the largest share of their IT budget on DR.
The slide gives an indication of RTO and RPO per industry.
Benefits of server virtualization to DR
Virtualization makes DR much easier and less costly. Virtualization provides hardware abstraction: VMs can run on any supported hardware platform. When servers are virtualized, less hardware is needed in the secondary datacenter used for DR. Virtualization also enables using the cloud as a DR site.
DR Automation tooling
A quote from one of the Gartner customers who participated in the study: “We’re moving away from RecoverPoint to get away from replicating everything. I want only certain things to get replicated. Also, with LUN-based replication, if everything on the same LUN isn’t related, it gets more difficult to move or test (failover and failback).
Death to array-based replication”
This quote comes from a Gartner client using Zerto Virtual Replication. They are moving to the cloud for DR this fall. In the Gartner study, 10 of the 16 customers were using software replication. 7 of those 10 are using a combination of software and hardware replication (replication performed at the storage level).
<note from author: In April 2012 I did a review of Site Recovery Manager, Zerto Virtual Replication and Virtualsharp Reliable DR. Since then new versions have been released. Read the review here.
I also blogged about using Zerto for DR to the cloud in this blog post titled Is your data ready for the next natural disaster or other hazards to IT?>
Gartner recommends separating the data of a virtual machine across 3 different virtual disks:
- Disk 1 for the operating system and application software files
- Disk 2 for the paging/swap file and temporary files
- Disk 3 for data
Disk 1 and disk 2 only need to be replicated once per day. Replicating the paging disk may be needed to reserve the storage at the DR site. In practice, because the virtual machine restarts at the secondary site in case of a disaster, the replicated paging/swap files are overwritten at startup.
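The disk split can be expressed as a per-disk replication schedule. The sketch below is only an illustration: the once-per-day interval for disks 1 and 2 is from the text, while the 15-minute interval for the data disk and the names `REPLICATION_INTERVAL`, `replications_per_day` and `is_due` are assumptions made for this example, not something Gartner or any replication product prescribes.

```python
from datetime import datetime, timedelta

# Once per day for the OS and paging disks is from the text above.
# The 15-minute interval for the data disk is an assumed placeholder.
REPLICATION_INTERVAL = {
    "os": timedelta(days=1),         # disk 1: operating system, application files
    "paging": timedelta(days=1),     # disk 2: paging/swap and temporary files
    "data": timedelta(minutes=15),   # disk 3: data
}


def replications_per_day(role):
    """Number of replication cycles per day for a disk role."""
    return int(timedelta(days=1) / REPLICATION_INTERVAL[role])


def is_due(role, last_replicated, now):
    """True if the disk's replication interval has elapsed."""
    return now - last_replicated >= REPLICATION_INTERVAL[role]


if __name__ == "__main__":
    last = datetime(2013, 8, 1, 12, 0)
    now = datetime(2013, 8, 1, 12, 20)
    for role in REPLICATION_INTERVAL:
        print(role, replications_per_day(role), "cycles/day, due now:",
              is_due(role, last, now))
```

Separating the disks this way means only the data disk generates frequent replication traffic, which is the reason for the split in the first place.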
Automation makes DR much easier. Six participants in the Gartner study were using VMware Site Recovery Manager. One customer was using Hyper-V Replica with System Center Orchestrator, and one company was using Zerto Virtual Replication. <note: Veeam Backup & Replication does not offer an automated recovery feature on par with the ones mentioned by Gartner>
One of the best features of DR tools is automated test failover to a DR sandbox. This allows failover to be tested without disrupting the production environment. One of the customers is quoted: “Virtualization and SRM allow us to eat chips and watch TV during the DR-test.”
DR to the cloud
18% of the respondents in a recent Storage Magazine survey are using disaster recovery to the cloud (also known as Recovery as a Service, RaaS). Those were all small companies. Large companies all operate two or more datacenters which are used for DR. Gartner asked 1000 of its clients if they were using cloud for DR purposes. Only one (!) was using cloud for DR; another was moving to it. Gartner got the same response to another query of 1000 respondents at a later date.
<note from author: I guess most of Gartner's clients are large organizations. The actual usage of DRaaS overall will be higher>
So large organizations are not doing DR to the cloud yet.
An issue with DR as a Service is the bandwidth to and from the cloud. Companies are starting to use carrier Ethernet with 1 Gbps links to the cloud. When looking for DRaaS vendors, the distance between your location and that of the DRaaS provider is an important thing to consider. Even with a 1 Gbps connection, latency goes up and throughput goes down if the distance is too great. Of course the provider's location should not be too close either. Gartner recommends a distance of roughly 20 to 100 miles.
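To see why distance matters even on a 1 Gbps link, here is a back-of-the-envelope sketch in Python. It rests on two general networking rules of thumb that are not from the presentation: light travels through fiber at roughly 200,000 km/s, and a single TCP stream cannot move more than one window of data per round trip. The 64 KiB window (no window scaling) and the function names are assumptions for illustration; real round-trip times are higher than the propagation minimum, and modern TCP stacks usually use larger windows.

```python
FIBER_KM_PER_S = 200_000.0  # approximate speed of light in fiber (about 2/3 of c)
KM_PER_MILE = 1.609344


def min_rtt_seconds(distance_miles):
    """Lower bound on round-trip time from propagation delay alone."""
    return 2 * distance_miles * KM_PER_MILE / FIBER_KM_PER_S


def max_stream_mbps(window_bytes, rtt_seconds, link_mbps=1000.0):
    """Upper bound for one TCP stream: window / RTT, capped by the link speed."""
    return min(link_mbps, window_bytes * 8 / rtt_seconds / 1e6)


if __name__ == "__main__":
    window = 64 * 1024  # assumed 64 KiB window, no window scaling
    for miles in (20, 100, 1000):
        rtt = min_rtt_seconds(miles)
        mbps = max_stream_mbps(window, rtt)
        print(f"{miles:>5} miles: RTT >= {rtt * 1000:.2f} ms, "
              f"one stream <= {mbps:.0f} Mbps")
```

Under these assumptions a single stream can still fill the link at 20 miles, drops to about 326 Mbps at 100 miles, and to about 33 Mbps at 1,000 miles. Parallel streams or larger windows raise those limits, but the trend with distance stays the same.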
Gartner spoke to a Recovery as a Service provider. This provider has a couple of cloud based recovery services. Veeam Backup and Replication was used most by their customers. More information on using Veeam for DR here.
Very often European organizations are using three datacenters. Two are close to each other (around 30 km at most). Those two datacenters can be so close to each other because in Europe there are no earthquakes, hurricanes or other natural disasters hitting a whole region. DR is done using the primary and secondary DC. When datacenters are close together, it is easier for staff to get there for maintenance etc. Some organizations, like banks, use a third datacenter to store data just to be sure. Others use the cloud as a third datacenter to run webservers. If both datacenters are unavailable, at least some critical public-facing websites remain available.
Use the Business Continuity Management tools from the cloud (SaaS).
BCM tools offer features like emergency messaging and notification services. They also offer DR plans, business impact analysis, DR strategy, IT DR planning etc. It can be very handy to have quick access to these when a disaster has happened… <note from author: I am sure you stored recovery plans in several safe places outside where your production data lives>
Documentation with recovery instructions, who to inform, contact details etc. can even be accessed from a mobile phone when using SaaS tools.
Investigate cloud DR for appropriate IT services. Cloud DR is not an all-or-nothing proposition. You can protect a subset of your applications using cloud DR, and extend its usage once you are comfortable with it.
Test your DR plan regularly. All companies that participated in the Gartner study do this. However, most of them did not test a failback.