Gartner Catalyst session: Modernizing Business Continuity and DR Using Virtualization and the Cloud

One of the presentations at Gartner Catalyst Conference was titled Modernizing Business Continuity and DR Using Virtualization and the Cloud.

It can be watched online for free here. If you do not have an account you can create one for free and watch three sessions for free.

This blogpost has a summary of what was presented here.

The outline of the presentation by Werner Zurcher is:

Server virtualization and public and private cloud services have dramatically changed the alternatives organizations have to ensure greater application availability and disaster recovery. This session provides detailed guidance on how to modernize business continuity and highlights lessons learned from bleeding edge organizations that are already using private and public clouds for DR. The key questions answered in this session include:
• How are technologies like server and storage virtualization and software-defined networks architected to improve business continuity?
• Who is enabling, and who is embracing cloud DR and for what use cases?
• What are the most common architectural pitfalls that should be avoided?

As IT becomes more and more important, organizations are looking for ways to improve their business continuity and disaster recovery capabilities. Superstorm Sandy and the Fukushima nuclear disaster in Japan increased the number of questions from Gartner customers on how to improve DR.

To start improving BC and DR, Gartner gives some advice:

  • IT must know the business requirements for BC and DR.
  • IT should use virtualization to improve mobility, availability and DR.
  • Use automation to simplify disaster recovery.
  • Use the cloud to support BC management and assure IT service recovery.

 

Gartner interviewed 16 of its clients to understand how they perform BC and DR. All Gartner customers who are innovative on DR have a high share of virtualized servers: between 75 and 99% of their servers were virtualized. That is much higher than the average. Gartner estimates that by the end of 2013 67% of all servers will be running virtualized.

Most companies have 3 to 4 tiers of application protection, for example mission critical, critical, important and non important. Each tier describes uptime requirements, availability, RTO and RPO. Many companies also have a Tier 0 level for critical infrastructure components. This level holds services like Active Directory, DNS and DHCP, which need to be available at all times. Organizations do not want to depend on restoring Microsoft AD first and only after a successful restore start restoring the business applications.
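A tier catalog like the one described above can be written down as plain data. The sketch below is only an illustration: the tier names follow the text, but every RTO and RPO value is a made-up placeholder and not a Gartner figure.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Tier:
    name: str
    rto: timedelta  # maximum time allowed to get the service running again
    rpo: timedelta  # maximum acceptable data loss, expressed as time


# Hypothetical values, for illustration only.
TIERS = [
    Tier("Important", timedelta(hours=24), timedelta(hours=12)),
    Tier("Tier 0 (AD, DNS, DHCP)", timedelta(0), timedelta(0)),
    Tier("Non important", timedelta(hours=72), timedelta(hours=24)),
    Tier("Mission critical", timedelta(minutes=15), timedelta(minutes=5)),
    Tier("Critical", timedelta(hours=4), timedelta(hours=1)),
]


def recovery_order(tiers):
    """Return the tiers sorted so the tightest RTO is recovered first."""
    return sorted(tiers, key=lambda tier: tier.rto)


for tier in recovery_order(TIERS):
    print(f"{tier.name}: RTO {tier.rto}, RPO {tier.rpo}")
```

Sorting on RTO puts the Tier 0 infrastructure services at the front of the list, which matches the idea that business applications should not have to wait for an AD restore.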

The slide below shows the average spending on disaster recovery by industry. Clearly banks, telecom and airlines spend the largest share of their IT budget on DR.

The slide gives an indication of RTO and RPO per industry.

[Slide: Dr-slide1]

 

Benefits of server virtualization to DR

Virtualization makes DR much easier and less costly. Virtualization provides hardware abstraction: VMs can run on any supported hardware platform. In the secondary datacenter used for DR, less hardware is needed when servers are virtualized. Virtualization also enables using the cloud as a DR site.

DR Automation tooling

One of the Gartner customers who participated in the study said: “We’re moving away from RecoverPoint to get away from replicating everything. I want only certain things to get replicated. Also, with LUN-based replication, if everything on the same LUN isn’t related, it gets more difficult to move or test (failover and failback).”

Death to array-based replication


This quote comes from a Gartner client using Zerto Virtual Replication. They are moving to the cloud for DR this fall. In the Gartner study 10 of the 16 customers were using software replication. 7 of those 10 were using a combination of software and hardware replication (replication performed at the storage level).

<note from author: In April 2012 I did a review of Site Recovery Manager, Zerto Virtual Replication and VirtualSharp ReliableDR. Since then new versions have been released. Read the review here.

I also blogged about using Zerto for DR to the cloud in this blogposting titled Is your data ready for the next natural disaster or other hazards to IT?

>

Gartner recommends separating the data of virtual machines into 3 different virtual disks.

  • Disk 1 for operating system, application software files.
  • Disk 2 for paging/swap file and temporary files
  • Disk 3 for data

Disk 1 and disk 2 only need to be replicated once per day. Replicating the paging disk may be needed to reserve the storage at the DR site. In practice, because the virtual machine restarts at the secondary site in case of a disaster, the replicated paging/swap files are overwritten at startup.

Automation makes DR much easier. 6 participants of the Gartner study were using VMware Site Recovery Manager. One customer was using Hyper-V Replica with System Center Orchestrator and one company was using Zerto Virtual Replication. <note: Veeam Backup & Replication does not offer an automated recovery feature on par with the ones mentioned by Gartner>

One of the best features of DR tools is automated test failover to a DR sandbox. This allows failover to be tested without disrupting the production environment. One of the customers is quoted: “Virtualization and SRM allow us to eat chips and watch TV during the DR test.”

DR to the cloud

18% of the respondents in a recent Storage Magazine survey are using Disaster Recovery to the cloud (aka Recovery as a Service, RaaS). Those were all small companies. Large companies all operate two or more datacenters which are used for DR. Gartner asked 1000 of its clients if they were using cloud for DR purposes. Only one was using cloud for DR; another was moving to it. Gartner got the same response to a later query among 1000 respondents.

<note from author: I guess most of Gartner’s clients are large organizations. The actual usage of DRaaS overall will be higher>

So large organizations are not doing DR to the cloud yet.

An issue with DR as a Service is the bandwidth to and from the cloud. Companies are starting to use carrier Ethernet with 1 Gbps links to the cloud. When looking for DRaaS vendors, the distance between your location and that of the DRaaS provider is an important thing to consider. Even with a 1 Gbps connection, if the distance is too large the latency is high and the throughput low. Of course the provider’s location should not be too close either. Gartner recommends a distance of roughly 20 to 100 miles.
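Why distance hurts throughput even on a fast link can be shown with a bit of arithmetic. The sketch below rests on two assumptions that are not from the Gartner session: light travels through fiber at roughly 200,000 km/s, and a single TCP stream can have at most one window of data in flight per round trip. The 64 KiB window is just an example value (a stream without window scaling); real paths are also longer than the straight-line distance and add switching delay.

```python
FIBER_KM_PER_S = 200_000  # assumed propagation speed of light in fiber


def round_trip_ms(distance_km: float) -> float:
    """Best-case round-trip time over fiber, propagation delay only."""
    return 2 * distance_km / FIBER_KM_PER_S * 1000


def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound for one TCP stream: one window per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1_000_000


WINDOW = 64 * 1024  # example window size in bytes (assumption)

for km in (50, 160, 1000, 4000):
    rtt = round_trip_ms(km)
    cap = max_tcp_throughput_mbps(WINDOW, rtt)
    print(f"{km:>5} km: RTT >= {rtt:.2f} ms, single stream <= {cap:.0f} Mbps")
```

In this simplified model a nearby site can still fill most of a 1 Gbps link with one stream, while a site thousands of kilometers away cannot, unless larger windows, parallel streams or WAN optimization are used.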

Gartner spoke to a Recovery as a Service provider. This provider has a couple of cloud based recovery services. Veeam Backup and Replication was used most by their customers. More information on using Veeam for DR here. 

Very often European organizations are using three datacenters. Two are close to each other (around 30 km max). Those two datacenters can be so close to each other because in Europe there are no earthquakes, hurricanes or other natural disasters hitting a whole region. DR is done using the primary and secondary DC. When datacenters are close together it is easier for staff to get there for maintenance etc. Some organizations like banks use a third datacenter to store data just to be sure, or use the cloud as a third datacenter to run webservers. If both datacenters are unavailable, at least some critical public facing websites remain available.

Use the Business Continuity Management tools from the cloud (SaaS).
BCM tools offer features like emergency messaging and notification services. They also offer DR plans, business impact analysis, DR strategy, IT DR planning etc. It is very handy to have quick access to these when a disaster has happened… <note from author: I am sure you stored recovery plans in several safe places outside where your production data lives>

The documentation with instructions on how to recover, who to inform, contact details etc. can even be accessed from a mobile phone when using SaaS tools.


Investigate cloud DR for appropriate IT services. Cloud DR is not an all or nothing proposition. You can protect a subset of your applications using cloud DR, and extend the usage once you are comfortable with it.

Test your DR plan regularly. This is what all companies who participated in the Gartner study do. However, most of them did not test a failback.


VMware Site Recovery Manager 5.1.1 and vSphere Replication 5.1.1 released

VMware released Site Recovery Manager 5.1.1 and vSphere Replication 5.1.1 on April 25.
This release does not contain new features but has many bugfixes.

More info at the VMware blog site.

SRM 5.1.1 release notes are here.
vSphere Replication 5.1.1 release notes are here.

VMware vCenter Site Recovery Manager 5.0.2 and 5.1.0.1 released

December 20 has been a busy day at VMware HQ. Many new software releases were made available for download.
VMware released two new versions of vCenter Site Recovery Manager: 5.0.2 and 5.1.0.1

The difference between the versions is the vSphere support. SRM versions are in lockstep with the vSphere releases.

SRM 5.1 supports VMware ESXi 5.1 while SRM 5.0.2 does not support ESXi 5.1.
SRM 5.1 supports vCenter Server 5.1. SRM 5.0.2 is limited to support for vCenter Server 5.0 releases.

SRM-compatibility-matrix

For compatibility see this matrix.

VMware vCenter Site Recovery Manager 5.1.0.1

Release notes can be read here.

SRM 5.1.0.1 provides the following improvements:

  • Resolves critical issues in SRM 5.1. If you have installed SRM 5.1 (build 820150), you must upgrade your installation to SRM 5.1.0.1 (build 941848).
  • Includes vSphere Replication 5.1.0.1. If you have installed vSphere Replication 5.1, after you upgrade SRM to 5.1.0.1, you must also upgrade vSphere Replication to vSphere Replication 5.1.0.1.

VMware vCenter Site Recovery Manager 5.0.2

It offers improvements and bug fixes.
Release notes are here.

VMware vCenter Site Recovery Manager 5.0.2 offers the following improvements:

  • Added support for protection and IP customization of the following guest operating systems:
    • Windows 8 (32-bit and 64-bit)
    • Windows Server 2012 (32-bit and 64-bit)
    • RHEL Server 6.2 and 6.3 (32-bit and 64-bit)
    • Ubuntu 12.04

    NOTE: To protect virtual machines that run the above operating systems, you must upgrade ESXi Server to version 5.0 update 2 on both the protected and recovery sites.

  • The vSphere Replication management server accepts MD5 certificates. See Caveats and Limitations.
  • Upgraded OpenSSL 0.9.8m to 0.9.8t for improved security. This addresses the security advisory that was issued for OpenSSL in January 2012.
  • Auto-generated certificates use RSA keys of 2048 bits.

Two Dutch VMUG Event 2012 presentations online

Two of the many presentations given at the Dutch VMUG Event 2012 have been placed online by the speakers.

Mattias Sundling & Eric Sloof presented  a new version of their famous ‘Mythbusting Goes Virtual’ presentation. This time the myths to be busted are:

1. VMware HA works out-of-the-box
2. VMware snapshots impact performance
3. Disk provisioning type doesn’t affect performance
4. Always use VMware tools to sync the time in your VM

Download the presentation from NTpro.nl

Viktor van den Berg did a presentation titled “vSphere – What option do you choose for Disaster Recovery” (or in Dutch: vSphere – Hoe is uw Disaster Recovery geregeld?). This session presents a general DR framework and discusses two popular DR solutions: vSphere Metro Stretched Cluster and Site Recovery Manager. The slides of this session are available for download below; the presentation and slides are in Dutch. The link will bring you to a blogposting of Viktor with lots of links to resources on Site Recovery Manager and Metro Stretched Clusters.
Presentation for Dutch VMUG Event: vSphere DR – Stretched Cluster versus Site Recovery Manager

VMware vSphere Replication 5.1 and Site Recovery Manager 5.1 available for download

Lots of releases today by VMware: not only vSphere 5.1, vCloud Suite and vCloud Director 5.1.
vSphere Replication 5.1 and Site Recovery Manager 5.1 were also released today and are available for download.

Release notes and download here.

Disaster Recovery as a Service and VMware Site Recovery Manager

At VMworld 2011 USA there were quite a few sessions on Disaster Recovery using Site Recovery Manager 5.0. Also lots of focus on Disaster Recovery As a Service (DRaaS). In this posting I will give information on the DRaaS offering by four USA based providers.

DRaaS is basically outsourcing of the Disaster Recovery infrastructure and operations to a Service Provider. Cloud DRaaS  is another name commonly used in marketing for DRaaS.

An organization which wants to protect its IT infrastructure against disasters (fire, flooding, hurricane, major hardware or software failures) has two options:

  1. build and operate their own failover site;
  2. outsource the failover site to a service provider;

Do it yourself DR is very expensive. Think about having to buy and maintain additional server hardware, storage, networking and software licenses. And having to rent or buy additional datacenter space, cooling and power. And manpower to operate it.

An alternative to DIY/on premises DR is to use the services of a provider/commercial party. Not so many years ago we sent tapes to a recovery site. Once a year organizations performed a test recovery at the recovery site only to find out the recovery failed. Using storage replication made things much easier. However the service provider needed the same brand of storage to be able to replicate the data. This was not very cost effective for the provider, as he needed to have all sorts of storage to be able to replicate data from the customers’ datacenters.

Site Recovery Manager 5.0 changes that. It offers two methods to replicate data:

  1. traditional array based in which the storage array is responsible for the replication.
  2. host based replication or vSphere Replication as it is called by VMware

vSphere Replication is storage agnostic. That means it does not matter which brand of storage array, disks or fabric is used at the protected and recovery site. It can use any storage as long as it is supported by VMware. A component in the ESX kernel takes care of tracking which data has changed and sends this over the network to the DR site. The exact working of vSphere Replication is explained here. Jason Boche wrote an overview of disadvantages of vSphere Replication compared to array based replication in this posting.

The cloud makes disaster recovery much cheaper and more efficient than do it yourself DR. The costs of hardware, software, facilities and management of the DR site are divided over the customers of the service provider offering the DRaaS. It makes sense to share an infrastructure and pay only for the resources you are using (i.e. cloud). DRaaS can be seen as a first step of organizations towards cloud computing.

One of the sessions at VMworld about SRM was titled ‘BCO3336: Disaster Recovery to the Cloud: Service Provider Perspective’. This session was more of a sales session than a deep dive technical session. Four US based service providers were on stage and each was given around 10 minutes to do their sales pitch. All four (FusionStorm, iland.com, hosting.com and VeriStor) are using SRM 5.0 for their DR to the cloud service offering.

I was surprised to learn that DR to the cloud was not yet offered by many service providers in the USA. All four just started the offering. VeriStor was presented with VMware’s “First to Market with Disaster Recovery to the Cloud” award during VMworld. The four providers mentioned are all using VMware SRM 5.0. Some others are considering other tooling. Zerto Virtual Replication got a lot of attention during VMworld.

FusionStorm has two offerings. The standard offering is a low cost protection per virtual machine. Customers do not have access to the SRM console. VM’s are running in a multi-tenant infrastructure. vSphere Replication is used. In the advanced offering the customer has dedicated vSphere hosts and limited access to the SRM console, and both vSphere Replication and 3rd-party replication are available.

VeriStor reports that the costs for on-premises DR are around $320,000 for 36 months. This includes hardware, software, facilities and operations costs for an SMB infrastructure having 5-10 hosts, 25-75 virtual machines and 5-10 TB storage.
Cloud DR promises a cost reduction of 40-60% compared to on-premises DR. The standby fee is $4,000 and the failover fee is $7,700. For 36 months the costs are $144,000.
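The numbers above can be checked with a few lines of arithmetic. Two things in this sketch are my own reading and not stated in the session: that the $4,000 standby fee is charged per month (36 × $4,000 is exactly the quoted $144,000) and that the $7,700 failover fee is charged per failover event.

```python
ON_PREM_36_MONTHS = 320_000  # on-premises DR cost quoted for 36 months
STANDBY_FEE = 4_000          # assumed to be a monthly fee
FAILOVER_FEE = 7_700         # assumed to be charged per failover event
MONTHS = 36


def cloud_dr_cost(months: int, failovers: int = 0) -> int:
    """Total cloud DR cost: standby fees plus optional failover fees."""
    return months * STANDBY_FEE + failovers * FAILOVER_FEE


def saving_pct(cloud_cost: float, on_prem_cost: float) -> float:
    """Cost reduction of cloud DR relative to on-premises DR, in percent."""
    return (on_prem_cost - cloud_cost) / on_prem_cost * 100


standby_only = cloud_dr_cost(MONTHS)
with_one_failover = cloud_dr_cost(MONTHS, failovers=1)

print(standby_only, round(saving_pct(standby_only, ON_PREM_36_MONTHS), 1))
print(with_one_failover, round(saving_pct(with_one_failover, ON_PREM_36_MONTHS), 1))
```

Under these assumptions the saving lands inside the promised 40-60% range, with or without one failover in the 36 months.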

hosting.com shows a self service webportal which can be used to test recovery, initiate a failover or failback. It also gives an overview of the status of the virtual machines level of protection. It shows the progress of a recovery and logfiles can be used to find out why a recovery has failed.

iland.com talked about one of their customers, Psomas, who was able to reduce OPEX by 35% by using iland.com Continuity Cloud.

New releases of VMware Workstation, View and SRM available for download

VMware made VMware Workstation 8 available for download on September 14. Download here

Also released at the same time are:

View 5.0
Fusion 4.0.1
Site Recovery Manager 5.0
To learn what is new in Site Recovery Manager (SRM) 5.0 read this posting.
vFabric 5 Standard/Advanced

VMware vCenter Orchestrator Plug-In for vCenter Server 5.0 More info here. A great overview of Orchestrator at vcoteam.info

Overview of Business Continuity/Disaster Recovery for virtual infrastructures

Business Continuity/Disaster Recovery (BC/DR) gets more attention from IT-management since more and more solutions become available to perform cost-effective DR. Cloud computing is one of the enablers of cost-effective DR. This article will give a high level overview of methods to protect virtual infrastructures (both VMware and Hyper-V) and will provide some solutions which protect the infrastructure when a disaster happens.

Any decision on what method to use for protection of the datacenter can only start when the RTO and RPO of applications are known. The RTO or Recovery Time Objective is the maximum time that may be used to recover the application after a disaster and get it operational again. The RPO or Recovery Point Objective is the maximum amount of data (expressed in time) that is acceptable to lose in case of a disaster. Obviously an RTO and RPO of max 5 minutes will need more expensive components than an RTO and RPO of 48 hours.
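To make the two metrics concrete, here is a minimal sketch that measures what a recovery actually achieved and compares it with the objectives. The function names and the incident timestamps are hypothetical and only serve the illustration.

```python
from datetime import datetime, timedelta


def achieved_rpo(disaster: datetime, last_replica: datetime) -> timedelta:
    """Data written after the last replica is lost: the data loss in time."""
    return disaster - last_replica


def achieved_rto(disaster: datetime, service_restored: datetime) -> timedelta:
    """Time between the disaster and the application being operational again."""
    return service_restored - disaster


def meets_objectives(disaster, last_replica, service_restored, rto, rpo) -> bool:
    """True when both the recovery time and the data loss stay within target."""
    return (achieved_rto(disaster, service_restored) <= rto
            and achieved_rpo(disaster, last_replica) <= rpo)


# Hypothetical incident: last replica 10 minutes before the disaster,
# application back online 90 minutes after it.
disaster = datetime(2011, 10, 1, 12, 0)
last_replica = datetime(2011, 10, 1, 11, 50)
restored = datetime(2011, 10, 1, 13, 30)

relaxed = meets_objectives(disaster, last_replica, restored,
                           rto=timedelta(hours=48), rpo=timedelta(hours=48))
strict = meets_objectives(disaster, last_replica, restored,
                          rto=timedelta(minutes=5), rpo=timedelta(minutes=5))
print(relaxed, strict)
```

The same incident passes the 48 hour objectives and fails the 5 minute ones, which is why the tighter targets need more expensive components.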

Fundamental in DR is that data is copied to another physical location. Keeping a backup in the same datacenter but in another room is an option but not a wise one, as the whole datacenter can be destroyed (fire, earthquake, plane crash, bomb etc).

I will focus on data replication to another disk system. This is by far the fastest, most reliable option to protect the datacenter and its applications and data. Tape is an alternative, but it is slow to recover data from and difficult to test on a regular basis, let alone to test automatically as disk based replication is able to do.

There are two ways to make use of resources used for DR:

  • do it yourself. Own a second datacenter, or rent datacenter capacity (rackspace, power, cooling). Server hardware, storage is all owned by your organisation. 
  • make use of cloud computing. Send data to a cloud provider which is responsible for having the right resources for storage and for compute when a DR is tested or performed.

There are two ways to protect the primary datacenter:

  • using a cold backup. Data is being copied to an alternate location. Virtual machines cannot be started in this alternate location. In case of data loss in the primary datacenter, the data needs to be copied back from the alternate to the primary datacenter.
  • using a warm or hot backup. Data including virtual machine data is copied to an alternate location. Virtual machines and applications can be made operational in the alternate datacenter. A warm backup means a recovery time of hours to days, a hot backup means recovery in an hour or less.
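The distinction between the three kinds of backup can be captured in a tiny classifier. This is only a sketch of the definitions given above; the function name and parameters are hypothetical.

```python
def classify_backup(can_start_vms_at_alternate_site: bool,
                    recovery_hours: float) -> str:
    """Classify a DR setup as cold, warm or hot.

    cold: VMs cannot be started at the alternate location
    hot:  VMs can be started there and recovery takes an hour or less
    warm: VMs can be started there, recovery takes hours to days
    """
    if not can_start_vms_at_alternate_site:
        return "cold"
    return "hot" if recovery_hours <= 1 else "warm"


print(classify_backup(False, 72))  # data only, copy back needed
print(classify_backup(True, 8))
print(classify_backup(True, 0.5))
```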

Replication of data can be performed using various methods:

  • array based replication (hardware or software initiated)
  • guest/os based replication
  • hypervisor based replication /host based replication  
  • application level replication
  • replication of backup data

Compression and data deduplication.
Data sent over WAN connections can be compressed, and quality of service can be set so that replication does not interfere with business application network traffic.

HyperIP of Netex is an example of a WAN optimization solution. It is often used in combination with Veeam Backup & Replication.

Data deduplication technology identifies and eliminates redundant data segments so that backups consume significantly less storage capacity. It lets organizations hold onto months of backup data to ensure rapid restores (better recovery time objective [RTO]) and lets them back up more frequently to create more recovery points (better recovery point objective [RPO]). Companies also save money by using less disk capacity and by optimizing network bandwidth.
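The idea of eliminating redundant data segments can be sketched in a few lines. This is a simplified illustration and not how any particular product works: it cuts the data into fixed-size chunks, identifies each chunk by its SHA-256 hash and stores every unique chunk only once. The chunk size and function names are assumptions of the sketch.

```python
import hashlib


def dedup_store(data: bytes, store: dict, chunk_size: int = 4096) -> list:
    """Split data into chunks, keep each unique chunk once, return the recipe."""
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # only stored if not seen before
        recipe.append(digest)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original data from the list of chunk hashes."""
    return b"".join(store[digest] for digest in recipe)


# Example backup: three identical chunks followed by one different chunk.
backup = b"A" * 4096 * 3 + b"B" * 4096
store = {}
recipe = dedup_store(backup, store)

stored_bytes = sum(len(chunk) for chunk in store.values())
print(len(backup), stored_bytes)
```

Only the unique chunks consume capacity, and only chunks the other side does not have yet would need to cross the WAN.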

There are many solutions on the market which offer deduplication compatible with VMware.

Features you might want to have for DR and testing of your DR procedures.

  • one button automated disaster recovery. By pressing one button an automated and orchestrated failover of the primary datacenter will start. Virtual machines will start in the recovery site using a predefined run-book. High priority VM’s like AD, DNS will start first. Then databases, then application servers. All performed without manual intervention
  • one button failback. When the primary datacenter has been restored virtual machines running in the recovery site are moved back to the primary datacenter automatically.
  • automated, scheduled disaster recovery testing. To make sure the DR procedures actually result in operational virtual machines in the recovery site you want to test this regularly. Reporting about the status is a nice extra feature. Production VM’s are not affected by the test.
  • planned migration. Situations are possible in which a datacenter needs to be evacuated, but not instantly. Think about an expected hurricane, or planned downtime because of maintenance on a core switch or important storage array. In that case you want a clean shutdown of virtual machines which are then moved to the alternate datacenter.
  • application consistent replication. When replication is performed, you want data (databases) to be consistent so no data is lost.
  • guarantee that RTO and RPO are met. Confirmation that a virtual machine is up and running does not mean the application/services are in the state they are expected to be in. You might want to ensure the services are recovered within the RTO and the data loss stays within the RPO.
  • grouping of VM’s. You might want to have the same policies set for a group of protected VM’s. Like replication schedule.
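The run-book idea from the first feature (infrastructure VM’s first, then databases, then application servers) can be sketched as start-up waves. The VM names and priority numbers below are hypothetical; a real DR tool would also wait for each wave to be healthy before starting the next one.

```python
from collections import defaultdict

# Hypothetical run-book: (VM name, priority), lower number starts earlier.
RUNBOOK = [
    ("app01", 3),
    ("dc01", 1),
    ("sql01", 2),
    ("dns01", 1),
    ("app02", 3),
]


def startup_waves(runbook):
    """Group VMs by priority and return the groups in start-up order."""
    waves = defaultdict(list)
    for vm, priority in runbook:
        waves[priority].append(vm)
    return [sorted(waves[priority]) for priority in sorted(waves)]


for number, wave in enumerate(startup_waves(RUNBOOK), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```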

Quite a few solutions are available to perform disaster recovery. This section will mention some. In future blogpostings I will go into more detail about solutions and their features and will compare solutions.

Array-based replication (done by array)

Array-based replication is available in synchronous and a-synchronous form. An example of synchronous replication (no data loss) is the HP StorageWorks P4000 series. Using network RAID10, data is written redundantly to two nodes simultaneously. Each node can be located in a different datacenter. The connection between both datacenters has a requirement of around 2-3 ms latency max. This is an expensive solution because only 50% of the raw storage capacity can be used for data storage, and because of the fiber connection. Also the disks in both sites need to have the same specs (both SAS for instance). A VMware metro-cluster can be created. The advantage is that no manual work needs to be done to present LUNs to VMware hosts in case of a failover. Automation of DR in case of an unplanned failure is not possible. VM’s will be restarted by VMware HA using priority settings configured at the cluster level.

Using a-synchronous replication volumes can be replicated to another location. If the primary site is down, the replicated volume in the recovery site can be made operational. Some data will be lost. Some manual work needs to be done to present LUN’s to hosts and register virtual machines into vCenter Server or SCVMM/Hyper-V manager. Restart needs to be performed all manually.

To automate and orchestrate DR for Hyper-V environments, NetApp has created PowerShell scripts and is using System Center Orchestrator (formerly Opalis) to perform DR automation.

A disadvantage of array-based replication is that the storage array in the primary site needs to be of the same vendor as the storage array in the recovery site. Also replication is configured per LUN. Administrators always need to make sure protected VM’s are on the replicated LUN’s.

Array-based replication (software initiated)

Besides the array replication described above, software solutions are available which use array-based replication but add value to the solution. For VMware environments Site Recovery Manager is well known. Another solution is VirtualSharp ReliableDR. Both use the storage layer features to replicate data. Added to that are features to perform DR testing and automation, configure priority, re-configure the IP configuration of VM’s which are failed over etc.

VirtualSharp goes a step further. It not only checks if a VM is booted in the recovery site. It also checks if an application or service (group of VMs) is functional and if RTO and RPO are met according to the SLA. See this post about what VirtualSharp ReliableDR has to offer for DR.

guest/os based replication
Replication is executed by an agent in the virtual machine or triggered by a backup server on a per VM basis. Examples are Veeam Backup & Replication, Novell PlateSpin Forge and PlateSpin Protect, Veritas Volume Replicator and Double-Take Availability. Most solutions will do for recovery of a single VM or a limited number of VM’s but not for recovery of a large environment. There is no automated and orchestrated recovery possible.

hypervisor replication and host based replication 
These forms of replication are typical for virtual environments. Here an appliance is installed on each VMware host. Zerto Virtual Replication is a new product launched in August 2011. It clones write I/O of virtual machines, compresses it and sends the data to a recovery site. It does not need array-based replication and is much easier to configure than array-based replication solutions.

Starting with version 5.0 of Site Recovery Manager (available in September 2011), replication at host level is also possible. In this scenario replication is not done by the storage array but by a software appliance. This new SRM 5.0 feature is called vSphere Replication. See this post on what is new in SRM 5.0. SRM has some drawbacks, for example it cannot be used on datastores which are part of a Storage DRS datastore cluster.

VirtualSharp ReliableDR also has host based replication.   

Windows Server 8 with Hyper-V R3 will have a feature called Hyper-V Replica. Each VM can be configured to replicate its data to another Hyper-V server which runs in a recovery site.

Application level replication
Probably the easiest method of replication to install and manage. Services like Microsoft Active Directory and Exchange 2010 replicate data themselves to another instance. However most applications do not have replication features, so you will need one of the methods mentioned above.

Replication of backup data
Backup data of virtual machines can be replicated to a recovery site. Microsoft DPM can replicate its backup data. DPM is not VMware aware so it will treat a VMware VM as a physical server. Veeam Backup & Replication backup files (VBK etc) can be replicated using third party solutions. A free solution often used is rsync. Mind that Veeam by default changes the filename of the VBK after each run. To prevent this rename, so that rsync only replicates the changed blocks inside files, set a registry entry explained here.

Disaster Recovery to the cloud
Cloud computing has many advantages. It is cost-effective, scalable and has a pay per use model. Combined with solutions which offer host based replication, Disaster Recovery as a Service or DRaaS is a very appealing option to do your DR.

This posting titled How The Cloud Changes Disaster Recovery by Mike Klein explains conventional disaster recovery versus disaster recovery to the cloud. Good read. It shows the black arrow for conventional and the red arrow for DR in the cloud.

ZDnet.com has a good article titled Disaster Recovery is your transitional Cloud step

Virtualizationreview.com has a great webinar on the subject of DR for VMware sponsored by Virtacore, PHD Virtual, Zerto and Veeam

You can view the event at your convenience until October 19, 2011.

This event explored how to best implement disaster recovery in your VMware environment and virtual infrastructure. Get all the details.

see this link for more info.

What’s new in VMware Site Recovery Manager 5.0

On July 12 VMware announced the soon to be released Site Recovery Manager 5.0 (SRM). In short, SRM 5.0 makes it possible to fully automate a test and an actual failover of a VMware infrastructure to a recovery site. Needed are a protected and a recovery site, a cluster of VMware hosts at each site, storage at both sites and a vCenter Server instance at each site. Also each site needs to have an instance of SRM running.

Lots of interesting new features have been added to the 5.0 release. Also the pricing has been adjusted, with two editions available.
-vSphere Replication
-automated failback
-planned migration
-IPv6 support
-changes to the user interface
-more granular control over VM startup order
-Protection-side API’s

Pricing
Site Recovery Manager 5.0 will be available in two editions. The Standard edition list price is $195 per protected virtual machine. The Enterprise edition is $495 per protected virtual machine. The difference between the two editions is that the Standard edition is limited to protection of 75 virtual machines (per site and SRM instance) while the Enterprise edition has no limit. All features of the Enterprise edition are available in the Standard edition as well.
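A quick way to see what the two editions mean for the license bill is the sketch below. It uses the list prices and the 75 VM limit from the paragraph above; picking Standard whenever the VM count fits is my own simplification, and the per site and SRM instance scope of the limit is ignored.

```python
STANDARD_PRICE = 195    # list price per protected VM
ENTERPRISE_PRICE = 495  # list price per protected VM
STANDARD_LIMIT = 75     # max protected VMs for the Standard edition


def srm_list_price(protected_vms: int):
    """Return (edition, total list price) for a number of protected VMs."""
    if protected_vms <= STANDARD_LIMIT:
        return "Standard", protected_vms * STANDARD_PRICE
    return "Enterprise", protected_vms * ENTERPRISE_PRICE


for vms in (25, 75, 76, 200):
    edition, price = srm_list_price(vms)
    print(f"{vms:>3} VMs: {edition:<10} ${price:,}")
```

The step from 75 to 76 protected VM’s is expensive, because the higher per-VM price then applies to every protected VM.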

Although SRM 5.0 focuses on the SMB market, mind that SRM 5.0 is not supported on Essentials and Essentials Plus kits! You will need vSphere Standard edition or higher to run SRM 5.0!

vSphere Replication (used to be called Host Based Replication during the development).
This enables replication of all kinds of storage without the need for a storage adapter. It is a simple, cost efficient replication for Tier 2 application or for SMB, branch and remote offices. This feature will directly compete with host based replication in  products like Veeam and Vizioncore. The storage based replication we know of SRM 4 can be used for high-performance replication of business critical applications.
The advantage of vSphere Replication is that the storage at the protected and recovery site can be different. With SRM 5.0 replication costs for license at the storage layer are not needed. The RPO can be anything between 15 minutes and 24 hours. Because only changed disk data (changed block tracking is used) is replicated there is a low network utilization. And it can protect 500 VM’s.
vSphere Replication does not support automated failback, fault tolerance, templates, linked clones and physical RDM’s. It also has file level consistency only (no application consistency). vSphere Replication needs ESXi 5. vSphere 4 or lower hosts are not supported!
Storage based replication is the choice when synchronous replication is needed of high data volumes and application consistency is needed.
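The restrictions listed above can be turned into a simple checklist. The sketch below is only an illustration of that idea: the `VmProfile` class, its field names and the function are hypothetical and not part of any VMware API. The rules themselves are the ones mentioned in this post.

```python
from dataclasses import dataclass
from typing import List

MIN_RPO_MINUTES = 15  # lowest RPO mentioned in this post for vSphere Replication


@dataclass
class VmProfile:
    """Hypothetical description of a VM and its protection requirements."""
    rpo_minutes: int
    needs_app_consistency: bool = False
    needs_automated_failback: bool = False
    uses_fault_tolerance: bool = False
    is_template: bool = False
    is_linked_clone: bool = False
    has_physical_rdm: bool = False
    host_runs_esxi5: bool = True


def vsphere_replication_blockers(vm: VmProfile) -> List[str]:
    """Return the reasons why vSphere Replication does not fit this VM."""
    checks = [
        (vm.rpo_minutes < MIN_RPO_MINUTES, "RPO below 15 minutes"),
        (vm.needs_app_consistency, "application consistency required"),
        (vm.needs_automated_failback, "automated failback required"),
        (vm.uses_fault_tolerance, "fault tolerance in use"),
        (vm.is_template, "templates are not supported"),
        (vm.is_linked_clone, "linked clones are not supported"),
        (vm.has_physical_rdm, "physical RDM attached"),
        (not vm.host_runs_esxi5, "host does not run ESXi 5"),
    ]
    return [reason for failed, reason in checks if failed]


if __name__ == "__main__":
    tier2 = VmProfile(rpo_minutes=60)
    database = VmProfile(rpo_minutes=5, has_physical_rdm=True)
    print(vsphere_replication_blockers(tier2))     # empty list: candidate for vSphere Replication
    print(vsphere_replication_blockers(database))  # non-empty: look at storage-based replication
```

An empty list means the VM is a candidate for vSphere Replication. Any entry in the list is a reason to consider storage-based replication instead.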

Managing vSphere Replication will be less complex than managing storage-based replication. For storage-based replication the vSphere admin needs to talk with the storage admin about which LUNs need to be replicated to the recovery site. The vSphere admin needs to be sure which datastores to select for the VMDKs of protected VMs. Setting up SRM can be difficult. VMs that do not need protection need to be moved to another datastore so that they do not waste replication capacity (storage and network).
For management of vSphere Replication just vCenter Server is needed. Protection is done at the VM level.
Current SRM customers do not protect all of their VMs with SRM. Only business-critical ones are protected, mainly because of the cost of protection. Tier 2 and tier 3 apps are protected by backup.

vSphere Replication architecture
The hosts at the protected site run a vSphere Replication agent. This agent sends all the changed blocks (delta) of protected virtual machines to the vSphere Replication Server at the recovery site. The replication process is managed by the vSphere Replication Management Server, which needs to be installed at both sites and is integrated with vCenter Server and Site Recovery Manager. The Replication Manager Server can be stacked. You will need around 1 server to protect 100 VMs. Mind that the replication schedule (which defines the RPO) is set on the ESXi host, so all VMs running on that host will have the same replication schedule!
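As a rough sizing aid, the rule of thumb of about one server per 100 protected VMs can be written down as a small calculation. This is just a sketch: the function name is made up, and the 500 VM ceiling is the figure quoted earlier in this post, not a number taken from official documentation.

```python
import math

VMS_PER_SERVER = 100     # rule of thumb from this post
MAX_PROTECTED_VMS = 500  # vSphere Replication figure quoted in this post


def replication_servers_needed(protected_vms: int) -> int:
    """Estimate how many stacked replication servers a site needs."""
    if protected_vms < 0:
        raise ValueError("protected_vms must not be negative")
    if protected_vms > MAX_PROTECTED_VMS:
        raise ValueError("more VMs than the 500 quoted for vSphere Replication")
    # Round up: 101 VMs already need a second server.
    return math.ceil(protected_vms / VMS_PER_SERVER)


if __name__ == "__main__":
    for count in (40, 100, 250, 500):
        print(count, "VMs ->", replication_servers_needed(count), "server(s)")
```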

-automated failback.
When the original protected site has been restored, a failback can be executed automatically. In previous versions failback had to be done manually. Now the same recovery plan used for the failover can be used for the failback. Mind that the original site needs to be intact to perform an automated failback. If the site is destroyed and new hardware is used to rebuild it, SRM does not know the configuration and a manual failback needs to be performed. Automated failback is not available for vSphere Replication.

-planned migration.
Planned migration is a new workflow that can be applied to any recovery plan. It ensures application-consistent migration of virtual machines with no data loss. Customers use planned migration for various reasons:
-disaster avoidance because of a hurricane, flooding or power failure in the datacenter
-planned maintenance of the datacenter (replacement of a core switch)
-datacenter consolidation (companies merge)
-load balancing of virtual machines over datacenters.
In the case of a planned migration there are hours to prepare the move. Planned migration is a much cleaner process, with no data loss, compared to the disaster recovery plan. In a disaster recovery plan all VMs are migrated as soon as possible, which might lead to data loss because it is not application consistent. The planned migration workflow does a clean shutdown of the VMs at the protected site, syncs the data, stops the replication and presents the LUNs at the recovery site to the ESX hosts there.
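The order of the planned migration steps matters, so here is a small sketch of the workflow as an ordered list. It is not SRM code: the `execute` callback is a hypothetical stand-in for whatever performs a step, and halting at the first failed step is my assumption about how a "clean" migration should behave.

```python
from typing import Callable, List

# Steps in the order described in this post.
PLANNED_MIGRATION_STEPS = [
    "shut down VMs cleanly at the protected site",
    "synchronize remaining data to the recovery site",
    "stop replication",
    "present LUNs to the ESX hosts at the recovery site",
]


def run_planned_migration(execute: Callable[[str], bool]) -> List[str]:
    """Run the steps in order and halt at the first step that fails.

    `execute` is a hypothetical callback that returns True on success.
    """
    completed = []
    for step in PLANNED_MIGRATION_STEPS:
        if not execute(step):
            # Assumption: stop here so that no data is lost.
            raise RuntimeError(f"planned migration halted at: {step}")
        completed.append(step)
    return completed


if __name__ == "__main__":
    def log_and_succeed(step: str) -> bool:
        print("running:", step)
        return True

    run_planned_migration(log_and_succeed)
```

The point of the sketch is that data is synced only after the VMs are shut down, and the LUNs are presented at the recovery site only after replication has stopped.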

VMware vCenter Site Recovery Manager 4.1.1 is released!

VMware released a lot of new products and updated versions of current products in February. And a lot more products are to be released this year. Think about vSphere 5.

Site Recovery Manager version 4.1.1 was released on February 15. It does not have new functionality. VMware vCenter Site Recovery Manager 4.1.1 is a maintenance release that resolves issues identified in previous Site Recovery Manager releases.

More info here

Site Recovery Manager per virtual machine licensing explained

This blog by VMware explains in detail the new per virtual machine licensing of Site Recovery Manager

http://blogs.vmware.com/uptime/2010/09/vmware-vcenter-site-recovery-manager-and-per-vm-licensing.html

VMware vCenter Site Recovery Manager 4.0.2 is now available

Uptime, the VMware.com blog on business continuity, reports that a new maintenance release of SRM is available. Version 4.0.2 has no new features, only bug fixes.

read more here:

http://blogs.vmware.com/uptime/2010/07/vmware-vcenter-site-recovery-manager-402-is-now-available.html

Patch 3 for Site Recovery Manager 1.0 Update 1 released

Patch 3 for SRM 1.0 Update 1 is a cumulative patch that corrects several problems:

  • a problem that prevents protected virtual machines from following recommended Distributed Resource Scheduler (DRS) settings when recovering to more than one DRS cluster.
  • a problem observed at sites that support more than seven ESX hosts. If you refresh inventory mappings when connected to such a site, the display becomes unresponsive for up to ten minutes.
  • a problem that could prevent SRM from computing LUN consistency groups correctly when one or more of the LUNs in the consistency group did not host any virtual machines.
  • a problem that could cause the client user interface to become unresponsive when creating protection groups with over 300 members.
  • several problems that could cause SRM to log an error message vim.fault.AlreadyExists when recomputing datastore groups.
  • a problem that could cause SRM to log an Assert Failed: “ok” @ src/san/consistencyGroupValidator.cpp:64 error when two different datastores match a single replicated device returned by the SRA.
  • a problem that could cause SRM to remove static iSCSI targets with non-test LUNs during test recovery.
  • several problems that degrade the performance of inventory mapping.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1010053

Some Arrays Might Require a Second Rescan During SRM Test and Recovery

New KB Article on SRM

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1008283

DC16 vCenter Site Recovery Manager – Implementation Common Pitfalls and best practices

Disaster recovery is becoming a hot topic nowadays because virtualization makes it much easier to recover from a disaster. Over 500 attendees are present in the room for this presentation on Site Recovery Manager by Lee Dilworth of VMware.

A basic overview of the product is given. Two datacenters each need a vCenter Server and an SRM instance. Replication of data and virtual machines is done by the SAN. SRM is not able to do a fully automated failback yet, but it can be done much quicker than a manual failback.

The next release of SRM, to be released in 2009, will have support for NFS. During a test failover, replication of data will continue. That is a good thing, as you do not want to stop replication during a test: in case of an emergency during testing you would lose your data, and next your job.

Most of the presentation is old news for people already familiar with the product. The best practices presented can be found on the vmware.com website. The advice is to always read the documentation supplied by the vendor of the Storage Replication Adapter (SRA). The SRA is a piece of software made by the SAN supplier which integrates SRM with the SAN software. The SRA can be downloaded from the VMware.com website.
