Live webcast Best Practices: When & How to Use Stretched Clusters

On April 26, 11:00 AM – 12:00 PM EDT (8:00 AM PT / 15:00 GMT), EMC will host a free live webcast on the use of stretched clusters.

Host: Scott Lowe, CTO, VMware Affinity Team, EMC
The abstract:

EMC VPLEX’s AccessAnywhere functionality, when coupled with the industry-leading virtualization solution VMware vSphere, enables new topologies, like stretched clusters. But is a stretched cluster the best solution? In this webcast, we’ll examine stretched clusters and VMware’s own Site Recovery Manager to determine which solution is the right solution for your organization.

Attend this session to:

  • Distinguish between stretched clusters and SRM-based topologies.
  • Gain a better understanding of the technical requirements of each solution.
  • Select the right solution based on your specific business requirement.

Registration at

Citrix and VMware best practices for running XenApp on virtual machines

In March 2011, VMware published a document titled ‘Citrix XenApp on VMware Best Practices Guide’.

Citrix has now published a best practices document as well, titled ‘XenApp Planning Guide: Virtualization Best Practices’.


VMware vMotion/Live Migration and HA/Host based failover clustering supported for Exchange 2010 SP1

On May 16, 2011, Microsoft announced that features such as moving live virtual machines to other hosts, and high availability features like VMware HA, are now fully supported by Microsoft for Exchange 2010 SP1 virtual machines running in a database availability group (DAG). The Unified Messaging server role is also supported in a virtualized environment.

Microsoft also released a document titled ‘Best Practices for Virtualizing Exchange Server 2010 with Windows Server® 2008 R2 Hyper-V™’.

Taken from the posting titled ‘Announcing Enhanced Hardware Virtualization Support for Exchange 2010’:

The Microsoft Exchange team is enhancing positioning by including additional supported scenarios regarding Exchange Server 2010 running under hardware virtualization software. As of today, the following support scenarios are being updated, for Exchange 2010 SP1, and later:

  • The Unified Messaging server role is supported in a virtualized environment.
  • Combining Exchange 2010 high availability solutions (database availability groups (DAGs)) with hypervisor-based clustering, high availability, or migration solutions that will move or automatically failover mailbox servers that are members of a DAG between clustered root servers, is now supported.

Due to improvements we made in Exchange Server 2010 SP1, along with more comprehensive testing of Exchange 2010 in a virtualized environment, we are happy to provide this additional deployment flexibility to our customers. The updated support guidance applies to any hardware virtualization vendor participating in the Windows Server Virtualization Validation Program (SVVP).

There is some confusion about the wording Microsoft used. The confusing text is shown in bold below:

Exchange server virtual machines, including Exchange Mailbox virtual machines that are part of a Database Availability Group (DAG), can be combined with host-based failover clustering and migration technology as long as the virtual machines are configured such that they will not save and restore state on disk when moved or taken offline. All failover activity must result in a cold start when the virtual machine is activated on the target node. All planned migration must either result in shut down and a cold start or an online migration that utilizes a technology such as Hyper-V live migration.

This article explains that the bold text refers to the Quick Migration feature of a previous release of Hyper-V. In that release, the virtual machine was temporarily saved to disk when performing a migration.

Microsoft Exchange Server 2010 on VMware vSphere Best Practices Guide

Updated May 15: new statement from Microsoft on live migration of Exchange mailbox servers that are part of a DAG.

In November 2010, VMware released a white paper titled ‘Microsoft Exchange Server 2010 on VMware vSphere Best Practices Guide’.
Download the whitepaper from this link

Microsoft has some comments on the best practices advised by VMware, specifically the use of HA. Read more about it in the post titled Answering Exchange Virtualization Questions and Addressing Misleading VMware Guidance.

In short: if you are running Exchange Server virtual machines that hold the mailbox role and are members of a DAG, make sure those VMs are not restarted automatically when the host fails or shuts down. Also make sure the VM is not manually or automatically moved to another host while running. All of this can simply be configured, and you are fine!

Just before TechEd 2011, Microsoft released a new whitepaper on virtualizing Exchange 2010 titled ‘Best Practices for Virtualizing Exchange Server 2010 with Windows Server 2008 R2 Hyper-V’.
The policy on live migration of DAG members has changed:
Live migration now seems to be supported for Exchange 2010 SP1 Database Availability Groups. The Microsoft document on virtualizing Exchange Server 2010 states the following on page 29:

“Exchange server virtual machines, including Exchange Mailbox virtual machines that are part of a Database Availability Group (DAG), can be combined with host-based failover clustering and migration technology as long as the virtual machines are configured such that they will not save and restore state on disk when moved or taken offline. All failover activity must result in a cold start when the virtual machine is activated on the target node. All planned migration must either result in shut down and a cold start or an online migration that utilizes a technology such as Hyper-V live migration.”

It seems Microsoft is now admitting VMware was right in the first place, although the whitepaper does not mention whether vMotion is supported as well.

On November 11, 2010, VMware responded to Microsoft's article titled Answering Exchange Virtualization Questions and Addressing Misleading VMware Guidance.
VMware explains that HA is nothing more than an automated restart of a VM. If an Exchange DAG member running on a physical host fails, an administrator will eventually restart the server anyway to solve the problem; HA simply does this automatically. VMware has been using HA on DAG members without any issues.

See my blog post on virtualizing Exchange Server 2010 for how to configure Exchange Server 2010 VMs so that Microsoft supports them.

The scope of the VMware whitepaper is:

•     ESX Host Best Practices for Exchange – This section provides best practice guidelines for properly preparing the vSphere platform for running Exchange Server 2010. This section includes guidance in the areas of CPU, memory, storage, and networking.
•     Exchange Performance on vSphere – This section provides background information on Exchange Server 2010 performance in a virtual machine. It also provides information on official VMware partner testing and guidelines for conducting and measuring internal performance tests.
•     Exchange 2010 Capacity Planning – Sizing Exchange 2010 to run in a virtual machine follows many of the same best practices as sizing on physical servers; however, with the introduction of new Exchange 2010 features (i.e., Database Availability Groups), the Capacity Planning process has changed significantly. This section walks through this new process.
•     Sizing Examples – In this section, we apply the new Capacity Planning process to two sample configurations, one with Database Availability Groups and one without.
•     vSphere Enhancements for Deployment and Operations – This section provides a brief look at vSphere features and add-ons that enhance deployment and management of Exchange 2010.

The following topics are out of scope for this document, but may be addressed in other documentation in this Solution Kit:
•     Design and Sizing Examples – This information can be found in the Microsoft Exchange 2010 on VMware: Design and Sizing Examples document included in this Solution Kit, which expands upon the examples in this guide by showing how the Capacity Planning process works for small, medium, and enterprise configurations.
•     Availability and Recovery Options – Although this document briefly covers VMware features that can enhance availability and recovery, a more in-depth discussion of this subject is covered in the Microsoft Exchange 2010 on VMware: Availability and Recovery Options document included in this Solution Kit.
It is important to note that this and other guides in this Solution Kit are limited in focus to deploying Exchange on vSphere. Exchange deployments cover a wide subject area, and Exchange-specific design principles are beyond the scope of the guides included in this Solution Kit.

Know the performance impact of snapshots used for backup!

A good understanding of VMware snapshots is essential when using backup tools based on snapshot technology.

Snapshots can hurt the performance of virtual machines, and can even cause a VM to freeze or become unresponsive for many minutes.

A traditional backup tool, which is not aware that a server is running as a virtual machine, will make a backup of the files inside the virtual machine. This impacts the vCPU and the network, as all data is transferred over the network to the backup server. It is much better to use a backup tool that can make image-level backups; many such tools are available on the market. An image-level backup copies the virtual disks and saves the characteristics of the virtual machine, such as the number of vCPUs, the amount of internal memory, and so on. This is very important when a full recovery of one or more virtual machines is needed: you do not want to manually re-create virtual machines, boot an operating system and then start recovering files.

To make a crash-consistent backup of a VMware virtual machine, backup software uses VMware's snapshot technology. It simply sends an API call to vCenter Server or the ESX(i) host requesting a snapshot. Once the snapshot is taken, data of the virtual machine is no longer written to its VMDK files; instead, it is written to a temporary snapshot file. This enables a consistent backup of the virtual machine, because the VMDK does not change while the backup runs.

Mind that the backup is crash consistent, which means the virtual machine will boot up when recovered from backup. It does not guarantee that applications using databases will come up fine as well. To make sure databases can be recovered, a VSS snapshot needs to be taken as well when creating the snapshot.

When the backup is done, the backup software deletes the snapshot file. Deleting, or committing, the snapshot means that all data temporarily written to the snapshot file is merged back into the VMDK files. While this snapshot commit is running, a second snapshot file is created to store the data written by the VM during the commit phase.

When a VM is busy with disk writes during a snapshot-based backup (for example a heavily used Exchange server or database server), the commit will take some time and generate a relatively large amount of I/O. Obviously, a lot of data needs to be read and written: VMware ESX(i) reads the VMDK to check which blocks also exist in the snapshot file, and data that exists in the snapshot file but not yet in the VMDK is written into the VMDK.

At the end of the commit, the VM needs to be frozen for a short while to process the last writes without having to create yet another snapshot file. This is called stun/unstun.
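The redirect-and-commit behaviour described above can be sketched in a few lines of Python. This is a toy model of the bookkeeping only, not of how ESX(i) actually handles blocks inside VMFS:

```python
# Toy model of VMware snapshot behaviour during a backup (illustrative only).
# Real ESX(i) snapshots redirect writes at the block level; this sketch
# only mimics the redirect-then-commit bookkeeping described in the text.

class ToyVM:
    def __init__(self):
        self.vmdk = {}     # block id -> data in the base disk
        self.delta = None  # snapshot (delta) file; None means no snapshot

    def create_snapshot(self):
        self.delta = {}    # from now on, writes go to the delta file

    def write(self, block, data):
        target = self.delta if self.delta is not None else self.vmdk
        target[block] = data

    def commit_snapshot(self):
        # "Deleting" the snapshot merges the delta blocks into the base VMDK.
        self.vmdk.update(self.delta)
        self.delta = None

vm = ToyVM()
vm.write(0, "original")
vm.create_snapshot()               # backup starts: base VMDK is now frozen
vm.write(0, "changed")             # guest keeps writing; lands in the delta
assert vm.vmdk[0] == "original"    # backup reads a consistent base disk
vm.commit_snapshot()               # backup done: delta is merged back
assert vm.vmdk[0] == "changed"
```

The more the guest writes while the snapshot is open, the bigger the delta that must be merged back at commit time, which is exactly where the performance hit comes from.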

The whole process is described in an excellent article by Erik Zandboer titled Performance impact when using VMware snapshots.

The commit process can have a negative effect on the performance of virtual machines during backup. I have seen cases where users could not log in to servers because a snapshot commit was being done while the backup job was running. In vCenter Server, the task ‘Remove snapshot’ was shown at 95% and seemed to be stuck there for a long time. As soon as the backup job was aborted, the performance issues were over.

The effect of the snapshot commit is one of the reasons Veeam Backup & Replication processes virtual machine backups serially. A backup job containing 10 VMs starts with the backup of the first VM, then moves to the next when it is ready, and so on. With parallel processing there is a chance the jobs would saturate the SAN: first because the data read from the SAN is sent to the backup storage, and second when the backups have finished and the snapshot commits are done. Another disadvantage of parallel processing in Veeam is that multiple backup jobs need to be created, which reduces the efficiency of the de-duplication done by Veeam B&R.

The effect is even more noticeable when virtual machines are stored on NFS storage. Here is a link to a VMware KB article titled Virtual machine freezes temporarily during snapshot removal on an NFS datastore in a ESX/ESXi 4.1 host.

Here are some links to forum postings about timeouts and freezes of virtual machines during snapshot removal. Again, these issues are not specific to Veeam B&R: timeouts can occur with any backup tool that uses snapshots. They are caused by the way VMware handles snapshots.

VM timeouts during snapshot commit
Guest VM halts during replication snapshot
Snapshot removal issues of a large VM

Besides backing up virtual machines, Veeam B&R is also able to replicate them. This enables a low RTO and RPO, a bit like Site Recovery Manager does. Mind, however, that the replication function uses snapshots! If your replication schedule is set to every 15 minutes during working hours on a write-heavy server, there is a risk of performance issues!
Make sure to test the effect of replication jobs on the performance of virtual machines. If the effect is negative, an alternative solution should be considered. While Veeam does replication at the virtual machine disk level, Microsoft DPM does replication at the application data level. DPM does not create a snapshot; instead, it uses changed block tracking and transfers only the changed data blocks of files. So there are no snapshot commits.
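The DPM-style changed-block approach mentioned above can be sketched as a toy diff between two block maps. The block identifiers and contents here are made up purely for illustration:

```python
# Toy changed-block transfer: only blocks that differ since the last sync
# are sent over the wire, and no snapshot (so no commit) is needed.
def changed_blocks(previous, current):
    """Return only the blocks that were added or modified since `previous`."""
    return {blk: data for blk, data in current.items()
            if previous.get(blk) != data}

last_sync = {0: "a", 1: "b", 2: "c"}
now       = {0: "a", 1: "B", 2: "c", 3: "d"}

delta = changed_blocks(last_sync, now)
assert delta == {1: "B", 3: "d"}   # only two blocks cross the wire
```

This is why the replication load scales with the amount of changed data rather than with the cost of a snapshot commit.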

To avoid performance issues on virtual machines caused by snapshots:

  • Use snapshots only to temporarily save the state of a virtual machine, and delete them as soon as possible. Do not keep a snapshot file that is no longer needed: the snapshot file grows fast on servers with lots of writes, committing it has an impact on performance, and forgetting to delete it can lead to disk-full situations. Use tools like RVTools to quickly see whether a VM has active snapshots.
  • Make sure the performance of the storage area network is known before a full rollout of backup jobs.
  • Preferably run backup jobs during off hours, when the load on the servers is low.
  • If replication is used, make sure you know the impact before applying it in production.
  • To lessen the effect of the snapshot commit, store the snapshot file on SSD drives or spread the I/O over the available LUNs. See this article for more info.
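To get a feel for how fast a forgotten snapshot can grow (the first point above), a rough back-of-the-envelope calculation helps. The write rate below is an assumed figure for illustration, not a measurement:

```python
# Rough estimate of snapshot (delta) file growth. The guest write rate is
# an assumption; substitute a measured value for your own workload.
write_rate_mb_per_s = 5      # assumed average guest write rate (MB/s)
snapshot_age_hours = 24      # snapshot forgotten for a day

growth_gb = write_rate_mb_per_s * 3600 * snapshot_age_hours / 1024
print(f"Snapshot delta after {snapshot_age_hours}h: ~{growth_gb:.0f} GB")
```

A delta of that size must be read and merged back into the VMDK at commit time, which explains both the disk-full risk and the long, I/O-heavy commits described above.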

More info
Timothy Dewin has written a great blog explaining the way snapshots work and why performance can be negatively affected during creation and deletion of snapshots. Recommended read!

Virtualization of an Active Directory domain controller (P2V)

If an infrastructure is going to be converted from physical to virtual servers, you should consider how to handle Active Directory domain controllers. This posting is a reference to some articles published on the internet.

I advise always having at least one domain controller installed on a physical server. This prevents the catch-22 situation where the virtual infrastructure is not available and authentication is not possible because all DCs are virtualized. For disaster recovery, Microsoft DPM server needs Active Directory, which is another reason to always have at least one DC running on physical hardware.

At Microsoft TechEd 2011 North America, one of the sessions was about virtualization of Active Directory Domain Services. See the video of the session here.

Customers are looking to further virtualize their environments: file servers, web servers, DNS servers, and even their domain controllers. It is clear that virtualization provides many benefits in areas such as deployment, disaster-recovery and lowering TCO. However, while virtualization offers many powerful capabilities and greatly simplifies repetitive tasks, it is a technology that must be handled with care when used in conjunction with Active Directory. In this session we review fundamental concepts within Active Directory and the impact of cloning and virtualization upon domain controllers, domain members and Windows in general. We also discuss how to best leverage virtualization and how to both mitigate problems and avoid occurrences in the first place.


There are several conceivable scenarios for virtualizing domain controllers:

1. Install a new virtual machine and install the Active Directory Domain Controller role on it. Transfer the roles of the physical DC to the new DC and run dcpromo on the physical server to remove the DC role. This is by far the best option, as there is no risk of issues in Active Directory. Mind, however, that this involves a change of DNS servers: you might have to change the DNS server references on each server and adjust the DNS servers published by the DHCP servers. Also, some applications running on the same server might depend on the local domain controller.

2. Demote the domain controller role on the physical server, then perform a P2V. After that has finished, run dcpromo on the virtual server to make it a domain controller again if needed. Ideally, this results in a server with no applications, just the domain controller role.

3. P2V the physical server to a virtual machine. Sometimes this needs to be done because of a lack of time. Some organizations decided to install applications on the domain controller, and manually reinstalling the application(s) on a newly created virtual machine can cost a lot of time when documentation, media and licenses cannot be found. An exact copy of the physical server prevents that hassle.
However, the procedure to P2V a domain controller needs some attention.
If PlateSpin Migrate is used to perform the migration, the job will ask for administrator credentials on the source server. As the Active Directory services on the source server are disabled (the server is in AD recovery mode), the password check will fail! Make sure you know the Active Directory Restore password, as the Administrator account with the restore password is used for authentication of the PlateSpin job.

You will also need some experience with and knowledge of Active Directory. There is a chance the P2V'ed domain controller has lost its connection to the domain and needs to be rejoined. This can happen if the DC has been offline for too long, for example.

Read these articles for info:

How to: P2V a domain controller by Ted Steenvoorden
I have performed the procedure described above several times without problems. Make sure all FSMO roles are moved from the server that is to be P2V'ed to another domain controller, as these roles are not available while the server is in Directory Services Restore Mode. Also check the Global Catalog server role.

Virtualizing a domain controller, how hard can it be? by Gabrie van Zanten   

P2V a Domain Controller? Why would you? by Christian Mohn

Converting domain controllers by Duncan Epping

Virtualizing existing domain controllers by VMware

vSphere vStorage: Troubleshooting Performance

VMware employee Nathan Small of Global Support Services published a very good presentation on troubleshooting vSphere vStorage performance.

The document describes how to read the output given by esxtop. It stresses the importance of disk alignment to get the most out of your storage array, discusses SCSI versus SATA drives, and a lot more.

Documents like this give you a lot of knowledge, and I think it is a must-read for everyone involved in the management and design of VMware vSphere infrastructures.

The presentation can be viewed and downloaded at

The agenda is shown below.

Server Virtualization in Microsoft Lync Server 2010

On February 21, Microsoft published a document titled “Server Virtualization in Microsoft Lync Server 2010”.
This document outlines a series of best practice recommendations for running Lync Server 2010 in a virtualized environment.

Download here

Microsoft publishes Hyper-V Cloud Reference Architecture White Paper

On February 11, Microsoft published a new whitepaper on the Hyper-V Cloud Fast Track Program titled “Reference Architecture Technical White Paper”. It can be downloaded here.


It is an interesting document describing the seven principles the Microsoft Hyper-V cloud architecture is based on: resource pooling, elasticity and the perception of infinite capacity, perception of continuous availability, predictability, metering/chargeback, multi-tenancy, and security and identity.
The document also describes design principles for networking, explains Cluster Shared Volumes, SAN design, host design, virtual machine design and a lot more interesting material.

A must-read if your role is to design a Hyper-V infrastructure, whether for a private cloud or just for server virtualization.

More information on the Microsoft Private Cloud TechNet blog


What is Hyper-V Cloud Fast Track?

At TechEd Europe 2010 in Berlin, Microsoft introduced several new initiatives and new solutions that enable customers to start using cloud computing.
Hyper-V Cloud Fast Track is a complete turnkey deployment solution delivered by several server vendors that enables customers to quickly deploy cloud computing with reduced risk of technical issues, by purchasing a virtualized infrastructure designed with the best practices of Microsoft and the hardware vendor. Customers can build the infrastructure themselves based on the reference architecture or use one of the server vendor's many partners.

The solution is based on Microsoft best practices and design principles for Hyper-V and SCVMM, and on partner best practices and design principles for the part of the solution delivered by the partner (storage hardware, blades, enclosures, rack-mounted servers, networking, etc.).
Some parts of the architecture are required by Microsoft (redundant NICs, iSCSI for clustering at the virtual machine level) and some are recommended. There is enough room for server vendors to create added value by delivering their own software solutions with the Fast Track.

The solution is targeted at large infrastructures running at least 1000 virtual machines per cluster. So it is an enterprise solution, not targeted at small and medium businesses.

This posting is a detailed summary of session VIR201 ‘Hyper-V Cloud Fast Track ‘ given at TechEd Europe 2010. The session can be seen and listened to via this link.

Cloud computing is the next revolution in computing. Once every 5 to 10 years there is a dramatic change in the IT landscape. It all started with mainframes and dumb terminals; then we got stand-alone PCs, desktops connected to servers, Server Based Computing, Software as a Service, virtualization, and now (private) cloud computing.

Cloud computing delivers exciting new features to the business consuming IT services, making it possible to respond quickly to new business needs. Self-service portals enable business units to submit change requests (for new virtual machines, additional storage and computing resources) using web-based portals. After a request has been approved by the IT department, resources like virtual machines, CPU, memory or storage are provisioned automatically.

On the producing side (the IT department), cloud computing delivers functionality to keep control over the life cycle of virtual machines, forecast the need for additional resources, monitor and respond to alarms, report, and charge back the costs of computing to the consumer.

If an organization decides to build a private cloud, three options are possible.
One is to build the cloud computing infrastructure yourself on purchased hardware and software located on-premises.
Another option is to use the services of a Hyper-V Cloud Service Provider. Servers are located in an off-premises datacenter, and the service provider makes sure networking, storage and computing power are provided. They also make sure the software that delivers cloud computing functions like chargeback and the self-service portal is ready to use. Doing it yourself takes the longest to implement; using a service provider is the quickest.

There is a third option, in between doing it yourself and outsourcing: Hyper-V Cloud Fast Track. This is a set of Microsoft-validated blueprints and best practices developed by Microsoft Consulting Services and six server vendors. Those six represent over 80% of the server market. Instead of re-inventing the wheel, an organization wanting to jump on cloud computing can obtain proven technology from the six hardware vendors (Dell, HP, IBM, Fujitsu, NEC and Hitachi). See the Microsoft site for more info.
The technology is a set of hardware (servers and storage), software (Hyper-V/SCVMM and Self Service Portal 2.0) and services (experience and knowledge delivered by the hardware vendor).

Choosing the Hyper-V Cloud Fast Track solution has a couple of advantages:
-reduced time to deploy. The hardware vendor has a selected number of configurations and best practices based on proven technology. It is ready to be installed without having to spend much time on inventory and design.
-reduced risk. The configurations are validated by the vendor to work. There is no risk of components not working together, and performance is as designed and expected.
-flexibility and choice. Several configurations can be chosen. Dell, for example, offers iSCSI storage, Fibre Channel storage, blade and rack server configurations.

See a video of the Dell Hyper-V Cloud Fast Track solution.

To me, at the moment Hyper-V Cloud Fast Track seems to be mostly marketing, meant to impress the world with the solutions Microsoft can deliver for cloud computing. Microsoft is far behind VMware in its functional offering for Infrastructure as a Service (IaaS). ESX is superior to Hyper-V as a hypervisor, and the same goes for vCenter Server versus System Center Virtual Machine Manager for management. Self Service Portal 2.0 is far behind in functionality compared to VMware vCloud Director and additional software like vShield App.
While VMware has always been good at delivering superior technology in its features (vMotion, Storage vMotion), which appeals to IT technicians, Microsoft has always been very good at luring IT decision makers and higher management with perfect marketing material that distracts from the functional shortcomings.

The websites of Fujitsu, IBM, Hitachi and NEC only mention Hyper-V Fast Track; there is no reference architecture or detailed information to be found.
Dell has four reference architectures available for download on its website, but none of them even mentions the VMM Self Service Portal 2.0 software! Delivering a self-service portal to business units is what distinguishes cloud computing from server virtualization. It is even a requirement for Hyper-V Cloud Fast Track!
I guess it is only a matter of time before most of the six server vendors offer a real private cloud computing reference architecture.

The Hyper-V Cloud Fast Track solution consists of Hyper-V, System Center and partner software technology. It is an open solution; the partner is free to add software solutions of its own (like management software).

One of the design principles for hardware used in the Hyper-V Cloud Fast Track is that components and access to network and storage must be redundant. Each server needs to have multiple NICs in a team. For iSCSI connections, at least two 10 GbE NICs or HBAs are recommended. For the storage path, MPIO must be used. VLAN trunks need to be used to separate the different types of network traffic and to control the bandwidth usage of each network type by capping bandwidth based on priorities. iSCSI traffic will most likely be given more bandwidth than live migration traffic: on a 10 Gb pipe, iSCSI will typically get 5 Gb while live migration runs perfectly on 1 Gb.

Although both iSCSI and Fibre Channel storage can be used, iSCSI storage is always required as part of the Fast Track solution. That is because clustering needs to be provided at the individual virtual machine level; clustering at the host level (which ensures a VM is restarted on a remaining host if a host fails) is not enough to provide redundancy for cloud computing. Disk volumes can only be made available to multiple virtual machines using iSCSI, as there is no such thing as a virtual Fibre Channel HBA in Hyper-V virtual machines.

If a converged network is used, Quality of Service needs to be applied so that certain types of network traffic can be prioritized, making sure the virtual machines get the guaranteed performance.

Management is an important part of the Hyper-V Cloud Fast Track. Continuous availability is a very important aspect of cloud computing. To deliver that, the infrastructure needs to be monitored, and if a failure is about to happen, action needs to be taken automatically to prevent downtime. For example, if the temperature in a Hyper-V server gets too high, System Center Operations Manager will notice and initiate a live migration of all virtual machines running on that host.

For file systems, Cluster Shared Volumes can be used, but also Melio FS, for example. The server vendor delivering the Hyper-V Cloud Fast Track is free to select the cluster-aware file system.

On the Private Cloud website a lot more information can be found, such as deployment guides.

Software iSCSI Initiator with Jumbo Frames vs Hardware-dependent iSCSI Initiator without Jumbo Frames

A very interesting posting on the use of network interface cards with a TCP Offload Engine versus using the software iSCSI initiator in ESX.
A must-read if you are using Broadcom NetXtreme II 5709 NICs in your ESX hosts.

Virtualization of SharePoint 2010

This poster, created by Microsoft, shows the recommended steps for virtualizing SharePoint 2010 products.

Exchange Server 2010: virtualize or not? That’s the question

<update March 2012> Microsoft does not support running Exchange databases or transport queues on NFS storage, even when the NFS storage is presented to the Exchange VM as block level storage


In Exchange Server 2010, Microsoft introduced the Database Availability Group, or DAG for short. A group of up to 16 Exchange 2010 mailbox servers can be part of a DAG. A DAG offers redundancy at the database level for hardware, network and database failures. Basically, a mailbox store is made redundant by placing the data in at least two mailbox databases located on physically different disk sets.

As Exchange 2010 does not require a lot of disk resources (IOPS), even a couple of low-cost hard disks (JBOD, just a bunch of disks) will in most cases be sufficient. Microsoft recommends installing Exchange Server 2010 on a physical server with local storage (either inside the server or, if more capacity is needed, as direct attached storage).

But what about virtualization? Just as more and more organizations are recognizing the benefits of server virtualization (costs, power, disaster recovery, etc.) and even the most demanding applications are ready for virtualization, Microsoft advises using local storage instead of shared storage, and physical instead of virtual hardware.

What should be considered when deciding whether to use physical hardware for an Exchange 2010 mailbox server that is a member of a DAG:

-server hardware costs
-remote management costs (HP iLO, for example, needs a paid license)
-space in your server room or datacenter
-cooling costs
-power costs
-hardware maintenance costs
-Windows Server license costs
-installation costs (someone needs to install the server in a rack, connect it to the network and storage, install the operating system, etc.)
-scalability: if more disk storage is needed, can storage easily be expanded without additional expenses for extra storage cabinets?
-security of your mail if the server gets stolen or if data is accidentally not removed when the server is phased out

For Exchange 2010 running as a virtual machine you still have hardware costs. Let's say 15 VMs can run on one host; the cost of server hardware per VM is then much less than for a physical server. The same goes for power, cooling, maintenance, etc. If you buy Windows Server Datacenter Edition, the license for your VM is free. The big difference in costs is the cost of shared storage.

Does Microsoft support running Exchange 2010 mailbox servers part of a DAG as a virtual machine?

Yes, they do, but under certain conditions:
1. you need to disable live migration of virtual machines
2. you need to disable automated restart of VMs after a host failure
3. Microsoft does not support Exchange data stored on NAS devices. So storing an Exchange mailbox store on a VMware NFS volume is not supported. It will work perfectly, but Microsoft has not tested this scenario and thus does not support it.

See the Exchange 2010 system requirements. Under hardware virtualization it is stated:


'Clustered root servers' is another term for VMware ESX (or Hyper-V) hosts that are part of a VMware HA (or Microsoft failover) cluster.

To have Microsoft support your Exchange 2010 mailbox server virtual machine, you should disable HA and DRS for it.

Microsoft states in this article that NFS or other NAS-style presentation of data is not supported for Exchange 2010. The same is true for Exchange 2007.

The article states:

The storage used by the Exchange guest machine for storage of Exchange data (for example, mailbox databases or Hub transport queues) can be virtual storage of a fixed size (for example, fixed virtual hard disks (VHDs) in a Hyper-V environment), SCSI pass-through storage, or Internet SCSI (iSCSI) storage. Pass-through storage is storage that’s configured at the host level and dedicated to one guest machine. All storage used by an Exchange guest machine for storage of Exchange data must be block-level storage because Exchange 2010 doesn’t support the use of network attached storage (NAS) volumes. Also, NAS storage that’s presented to the guest as block-level storage via the hypervisor isn’t supported. The following virtual disk requirements apply for volumes used to store Exchange data:

vSphere 4 Update 1 offers new functionality with which HA and DRS can be disabled for an individual virtual machine instead of for the whole ESX/ESXi host. This means you can run Exchange 2010 mailbox server virtual machines on a VMware host and comply with the Microsoft conditions: the Exchange VM will not be restarted in case of a host failure, and it will never be vMotioned to another host.

How to configure an Exchange 2010 virtual machine running on ESX 4.x to comply with the Microsoft support policy?

A couple of steps need to be taken on an ESX 4.x VMware HA/DRS cluster. Mind that when running ESX 4 you need Update 1 to be able to exclude a single VM from restart after a host failure.

1- create an anti-affinity rule to make sure the Exchange 2010 DAG servers never run on the same ESX host. If all DAG members run on one host and that host fails, your DAG is gone.

2- disable DRS for the Exchange 2010 VM

3- disable virtual machine restart for the Exchange 2010 VM when the ESX host fails

Create an anti-affinity rule for all Exchange 2010 mailbox servers

1 In the vSphere Client, display the cluster in the inventory.
2 Right-click the cluster and select Edit Settings.
3 In the left pane of the Cluster Settings dialog box under VMware DRS, select Rules.
4 Click Add.
5 In the Rule dialog box, type a name for the rule. (something like ‘separate Exchange MBX servers’)
6 From the Type drop-down menu, select a rule.
-select Separate Virtual Machines.
7 Click Add, select the two Exchange 2010 virtual machines to which the rule applies, and click OK. If you have more than two Exchange 2010 mailbox servers, create additional anti-affinity rules so that every pair is kept separated: server 1 and 2, server 1 and 3, and server 2 and 3.
8 Click OK to save the rule and close the dialog box.
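With the pairwise approach from step 7, every pair of DAG members needs its own 'Separate Virtual Machines' rule, so the rule count grows quickly with the number of mailbox servers. A small sketch (the VM names are hypothetical) that generates the required pairs:

```python
from itertools import combinations

def anti_affinity_pairs(vms):
    # Every pair of DAG members needs its own
    # 'Separate Virtual Machines' rule.
    return list(combinations(vms, 2))

# Three mailbox servers -> three pairwise rules
pairs = anti_affinity_pairs(["MBX1", "MBX2", "MBX3"])
print(pairs)  # [('MBX1', 'MBX2'), ('MBX1', 'MBX3'), ('MBX2', 'MBX3')]
```

For n mailbox servers this yields n*(n-1)/2 rules, which is one reason to keep the number of DAG members per cluster modest.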

Enable Strict Enforcement of Affinity Rules
To ensure that affinity rules are strictly applied, set an advanced option for VMware DRS. Setting the advanced option ForceAffinePoweron to 1 enables strict enforcement of the affinity rules you create.

1 In the vSphere Client, display the cluster in the inventory.
2 Right-click the cluster and select Edit Settings.
3 In the left pane of the Cluster Settings dialog box, select VMware DRS and click Advanced Options.
4 In the Option column, enter ForceAffinePoweron.
5 In the Value column, enter 1.
6 Click OK.

Disable Automation Level for DRS
You must set the automation level of all Exchange 2010 mailbox servers to Disabled. When you disable the DRS automation level for a virtual machine, vCenter Server will not migrate the virtual machine to another host or provide migration recommendations for it.
1 In the vSphere Client, display the cluster in the inventory.
2 Right-click the cluster and select Edit Settings.
3 In the left pane under VMware DRS, select Virtual Machine Options.
4 Select the Enable individual virtual machine automation levels check box.
5 Change the virtual machine automation level for each Exchange 2010 mailbox server virtual machine in the cluster.
a In the Virtual Machine column, select the virtual machine.
b In the Automation Level column, select Disabled from the drop-down menu.
6 Click OK.

Disable Virtual Machine Restart Priority
Restart priority determines the order in which virtual machines are restarted when a host fails. Because HA does not obey affinity or anti-affinity rules, you must set the VM Restart Priority of the Exchange 2010 VMs to Disabled.
By default, the same restart priority is used for all virtual machines in a cluster. You can override the default setting for specific virtual machines.

1 In the vSphere Client, display the cluster in the inventory.
2 Right-click the cluster and select Edit Settings.
3 In the left pane under VMware HA, click Virtual Machine Options.
4 Change the virtual machine restart priority for each Exchange 2010 virtual machine in the cluster.
a In the Virtual Machine column, select the virtual machine.
b In the VM Restart Priority column, select Disabled from the drop-down menu.
5 Click OK.

How to configure an Exchange 2010 virtual machine running on a Hyper-V R2 failover cluster to comply with the Microsoft support policy?

I have not tested this myself. You might put the Exchange 2010 mailbox servers in a separate resource group in Failover Cluster Manager, then open the properties and disable Auto Start. This ensures the virtual machine will not restart after a host failure.

I am not sure how to disable PRO Tips for individual VMs.

Read more about this on Jonathan Medd's blog:

Mind that there is a limit of 2:1 on the oversubscription ratio of virtual processors versus logical processors when running Exchange 2010 as a virtual machine.

This means, for example, that if your host has four quad-core CPUs (16 logical processors), then to have Microsoft support your configuration you can define a maximum of 32 vCPUs on that host across all virtual machines!
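The 2:1 limit is simple to check in a few lines. The sketch below does not count hyperthreading, matching the example above where four quad-core CPUs equal 16 logical processors:

```python
def max_supported_vcpus(sockets, cores_per_socket, ratio=2):
    # Microsoft's Exchange 2010 support limit: at most `ratio`
    # virtual processors per logical processor on the host.
    logical_processors = sockets * cores_per_socket
    return logical_processors * ratio

# Four quad-core CPUs = 16 logical processors -> 32 vCPUs max
print(max_supported_vcpus(4, 4))  # 32
```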

If you decided to install Exchange 2010 on a virtual machine running on Hyper-V, an error might be shown: "An error occurred with error code '2147504141' and message 'The property cannot be found in the cache.'" To solve this, disable the time synchronization between the Hyper-V host and the VM (see Microsoft KB 980050).

More info:

In November 2010 VMware published a white paper on virtualization of Exchange Server 2010. Read more at this post:

One of the sessions on VMworld 2010 was about virtualizing Exchange Server 2010 mailbox roles. Session EA7849-Design, Deploy, and Optimize Microsoft Exchange Server 2010 was presented at VMworld  2010 USA by Alex Fontana of VMware. See and listen to the presentation by downloading the MPEG4 file from the URL below.,-Deploy,-and-Optimize-Microsoft-Exchange-Server-2010-on-vSphere.mp4.html

A report on the session can be read here

Info on setting up Windows Guest clustering in vSphere

Microsoft Virtualization Best Practices for Exchange 2010. An interesting presentation on virtualization of Exchange 2010.

Aidan Finn blogpost Exchange Support Policy for Virtualization

Exchange 2010 support in vSphere

Gerben Kloosterman has collected a lot of links to info on Exchange Server 2010 running on vSphere.

VMware blog: Exchange 2010 Disk I/O on vSphere

VMware Blog: Exchange 2010 scale up performance on vSphere

VMware blog : Scaleout performance of Exchange 2010 mailbox server vms on vsphere 4

Measuring the Performance Impact of Exchange 2010 DAG Database Copies

Going Virtual with Exchange 2010

The ClearPath blog contains a lot of useful information on Exchange 2010 running on vSphere.
Part 1 of the 'Exchange 2010 on vSphere 4, Best Practices' series discusses proper Exchange 2010 sizing and requirements around the Client Access, Hub Transport, and Mailbox Server roles. Part 2 focuses on the vSphere 4 environment and shows VMware's and Microsoft's support best practices for the ESX cluster and virtual machines. Part 3 adds everything up and walks through an example deployment following the guidelines set in the previous installments.

Exchange 2010 on vSphere 4, Best Practices Part 1 by Ryan Williams, Principal Consultant of Clearpath Solutions Group
Exchange 2010 on vSphere 4, Best Practices Part 2 by Ryan Williams, Principal Consultant of Clearpath Solutions Group
Exchange 2010 on vSphere 4, Best Practices Part 3 by Ryan Williams, Principal Consultant of Clearpath Solutions Group

Cluster Shared Volumes explained, design impact and best practices

Microsoft introduced Cluster Shared Volumes (CSV) in Windows Server 2008 R2. CSV enables an NTFS-formatted disk volume to be accessed simultaneously by multiple Hyper-V hosts, which enables Live Migration of virtual machines and failover clustering of Hyper-V hosts. Very nice. I did some research on what is going on under the hood and will list some best practices for designing CSV volumes. This post also mentions some design issues that appear when you use more than a few Hyper-V nodes in a cluster with CSV.

One important thing: the storage array hosting the CSV volumes needs to support SCSI-3 persistent reservations! The iSCSI solutions from HP, Dell, etc. support this, but small and medium business solutions do not always. Keep that in mind when deciding on your iSCSI storage solution.

First, there is not much best practice information to be found on the net. A lot of articles about CSV and clustering can be found on the internet, but most of them discuss the technology. The only info I found was written by Microsoft and contained very basic and obvious information.

The Coordinator node
While all nodes in a cluster can read and write data on a CSV volume, one node is responsible for changing the metadata. This node is called the Coordinator Node. Each CSV has one Coordinator node. If multiple CSVs are available in a cluster, the Coordinator node roles are spread evenly over the nodes. If the Coordinator node fails, another node automatically takes over the role, which will probably result in a short pause in disk access. I am not sure if virtual machines suffer from this. CSV volumes can only be used by Hyper-V hosts to store virtual disk (VHD) files. Do not try to store any other data on them, because this could lead to data corruption.

Direct and redirect I/O
A CSV volume can be accessed over two networks. The first and obviously preferred network is the iSCSI network: each node has one or more NICs or iSCSI HBAs attached to one or two switches, and the switches are connected to the iSCSI storage array. This is called Direct I/O.
If the iSCSI network is not available, for instance because the NIC used for iSCSI fails or a cable is unplugged, an alternative path is selected. Data is then transferred over the internal cluster network to the Coordinator node, which forwards it over its own iSCSI network to the storage. This method is called Redirected I/O. Depending on the amount of storage I/O, this can lead to some loss of performance.

Copy VHD into CSV
When copying data from regular volumes (network, USB drive, C: drive) to a CSV volume (shortcut located at c:\clusterstorage\Volume#), perform the copy from the node that holds the Coordinator role. This will ensure the copy is done as fast as possible. If done on another node, the data will be routed via the Coordinator node, as a file copy is a metadata transfer. See the post "Copying VHDs onto a CSV Disk? Use the Coordinator Node!" at

Copy VHD from one CSV volume to another
If you are using Dell EqualLogic iSCSI storage and want to copy or move VHD files from one volume to another, the EqlXcp utility might speed things up a bit, because the EqualLogic array transfers the data internally instead of copying it to the Windows server and back to the destination volume. More info at

Best practices for CSV
The recommended size of a CSV is anything between 1 TB and 2 TB; the maximum size is 256 TB. If a large CSV volume fails, it has more impact than if a smaller one fails. On the other hand, more, smaller volumes make administration a bit more complex and could also lead to issues with CSV reservations; more on that later in this article.
To plan the total number of TBs of storage needed, count the data used by the virtual machines' virtual disks and add extra for snapshots, saved state, etc. Something like 15% more than the sum of all VHDs should be okay. Make sure there is at least 512 MB of free disk space available at all times on any volume.
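The sizing rule of thumb above (sum of all VHDs plus roughly 15% overhead, and never less than 512 MB free) can be sketched as follows; the example figures are made up:

```python
def csv_capacity_needed_gb(vhd_sizes_gb, overhead=0.15):
    # Sum of all VHDs plus ~15% extra for snapshots,
    # saved state files, etc.
    return sum(vhd_sizes_gb) * (1 + overhead)

MIN_FREE_MB = 512  # keep at least this much free on any volume

# Three hypothetical VHDs of 200, 350 and 450 GB
print(round(csv_capacity_needed_gb([200, 350, 450]), 1))  # 1150.0
```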

Make sure the folder c:\clusterstorage is excluded from virus scanning; scanning the VHD files located under this shortcut could lead to issues. Also exclude VMMS.exe and VMWP.exe from scanning, and run antivirus inside your virtual machines instead. If you use the Core installation, antivirus on the Hyper-V host might not be needed at all because of its small disk footprint.
The RAID level for the LUN which holds the CSV depends on the available RAID levels, the availability and application requirements, and the budget.

Use an even number of CSV’s (if two controllers are used)

Performance when using two LUNs (one CSV on each LUN) is 2% to 8% better, depending on the number of VMs, than when using only one CSV and LUN. This is because with two LUNs, each is managed by one of the SAN controllers. If only one LUN exists, the controller managing that LUN must service all requests, eliminating the performance benefits of the second controller. Requests sent to the secondary controller are proxied to the managing controller to be serviced, increasing service times.

Multipath IO

Microsoft MPIO (Multipath Input/Output), or storage network load balancing, provides the logical facility for routing I/O over redundant hardware paths connecting server to storage. These redundant hardware paths are made up of components such as cabling, host bus adapters (HBAs), switches, storage controllers and possibly even power. MPIO solutions logically manage these redundant connections so that I/O requests can be rerouted if a component along one path fails.
As more and more data is consolidated on storage area networks (SANs), the potential loss of access to storage resources is unacceptable. To mitigate this risk, high availability solutions like MPIO have become a requirement.

After installing the MPIO framework (Add Feature in Windows Server), either the Microsoft DSM or a storage vendor supplied DSM can be installed. The latter has more knowledge of the underlying storage capabilities and will result in better performance. Mind that not all storage vendors' DSMs support CSV volumes; the HP DSM for EVA has supported CSVs since July 2010.
There is a penalty, however, when selecting MPIO: because MPIO enables more paths to the storage, the storage needs to handle more iSCSI sessions.

-A cluster with two nodes, one NIC per node, not using MPIO has 3 iSCSI sessions (one for the initial connection and one for each NIC)
-A cluster with two nodes, one NIC per node, using MPIO has 5 iSCSI sessions (one for the initial connection and two for each NIC)

When the number of nodes in a cluster increases, the number of iSCSI sessions to the storage increases as well. Having more nodes than the storage can handle will result in CSV volumes becoming unavailable.
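Generalizing the two examples above (one session for the initial connection, plus one per NIC, or two per NIC when MPIO adds a second path) gives a rough per-volume session count. This formula is my own extrapolation from those two data points, so treat it as an estimate:

```python
def iscsi_sessions(nodes, nics_per_node, mpio=False):
    # One session for the initial connection, plus one session
    # per NIC per path (MPIO assumed to add a second path here).
    paths_per_nic = 2 if mpio else 1
    return 1 + nodes * nics_per_node * paths_per_nic

print(iscsi_sessions(2, 1))             # 3 (matches the non-MPIO example)
print(iscsi_sessions(2, 1, mpio=True))  # 5 (matches the MPIO example)
```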
Read more here

This problem is not related to CSV but to the firmware of the iSCSI storage. When you are using VMware with an iSCSI SAN, you can also have problems with lost iSCSI connections. Read the article 'iSCSI connections on Dell EqualLogic PS series', which describes the limits of the various EqualLogic series and how to calculate the limit!

Storage limitations
Your iSCSI storage can have a maximum number of persistent reservations. The Dell EqualLogic PS series with firmware 4.x has a limit of 32 persistent reservations per CSV volume. That will change to 96 in the v5.0 firmware, due out this summer.
Read this article  Microsoft Windows 2008 R2 CSV and Equallogic SAN

and this forum

CSV limits

If you are using HP LeftHand NSM or HP StorageWorks P4000 SAN and having issues with Windows clusters see this HP article with a patch

Microsoft Windows 2008 or Microsoft 2008 R2 clusters might experience resource failures in large configurations. Any combination of Microsoft cluster nodes, Multi-path I/O (MPIO) network interface cards (NICs) per cluster node and storage nodes that results in more than 31 iSCSI sessions per volume is affected by this issue. If the number of Microsoft cluster nodes plus the number of MPIO NICs per cluster node multiplied by the number of HP LeftHand or HP StorageWorks P4000 storage nodes exceeds 31 for a particular volume, the Microsoft cluster will not function and will fail the Microsoft cluster validation test.

Patch 10069-00 addresses this issue by removing the 31-session limit.
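Reading the quoted HP advisory literally (cluster nodes plus MPIO NICs per node multiplied by storage nodes must not exceed 31 per volume), a quick pre-patch sanity check could look like this; the example numbers are hypothetical:

```python
def exceeds_session_limit(cluster_nodes, mpio_nics_per_node,
                          storage_nodes, limit=31):
    # Literal reading of the advisory: nodes + (NICs per node
    # x storage nodes) must stay at or below the limit.
    sessions = cluster_nodes + mpio_nics_per_node * storage_nodes
    return sessions > limit

# 4 cluster nodes, 2 MPIO NICs each, 14 P4000 storage nodes -> 4 + 28 = 32
print(exceeds_session_limit(4, 2, 14))  # True: patch 10069-00 needed
```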

The most important issue to consider in your storage design is the number of SCSI reservations and the number of SCSI registrations. As a CSV can be accessed by multiple hosts at the same time, there needs to be some mechanism to prevent data corruption when multiple hosts want to change data. A reservation is a kind of lock on a file or on the metadata of the volume. Reservations can be persistent (always there) or non-persistent (released after the write finishes). SCSI-3 Persistent Reservation (PR) uses a concept of registration and reservation: systems that participate each register their own key with the SCSI-3 device, and registered systems can then establish a reservation. With this method, blocking write access is as simple as removing a registration from a device.

iSCSI reservations:
For each CSV served out on an iSCSI array, one SCSI reservation is needed. This reservation is held by the Coordinator node. So if you have 20 CSVs in your cluster, your storage array should support 20 reservations.

iSCSI registrations:
The maximum number of SCSI registrations your storage array can handle depends on the type of reservation used by the storage:
-Type 1 storage requires a registration per path
-Type 2 storage requires a registration per initiator

The calculation is nicely described in this article by Hans Bredevoort.

For type 1 storage the number of needed registrations is <number of paths> x <number of initiators> x <number of nodes in the cluster> x <number of CSV volumes available in the cluster>.

For type 2 storage the number of needed registrations is <number of nodes> x <number of initiators>.

So far, an overview of the maximum number of SCSI persistent reservations per storage array is not available.
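The two registration formulas above can be combined into one small calculator. The function and parameter names are mine; the distinction between type 1 (per path) and type 2 (per initiator) storage follows the description above:

```python
def registrations_needed(storage_type, paths, initiators, nodes, csvs):
    if storage_type == 1:
        # Type 1 storage: one registration per path
        return paths * initiators * nodes * csvs
    if storage_type == 2:
        # Type 2 storage: one registration per initiator
        return nodes * initiators
    raise ValueError("unknown storage type")

# 4-node cluster, 2 paths, 1 initiator per node, 5 CSV volumes
print(registrations_needed(1, paths=2, initiators=1, nodes=4, csvs=5))  # 40
print(registrations_needed(2, paths=2, initiators=1, nodes=4, csvs=5))  # 4
```

Comparing the outcome against your array's documented maximum tells you whether the design fits.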

If you run into problems with the maximum number of nodes supported, you might consider not using NTFS with CSV as a filter on top of it. Sanbolic offers Melio 2010, a shared volume file system. It is designed to be accessed by multiple hosts and has many advantages over CSV. See for more info

This whitepaper explains an optimized storage solution for enterprise Hyper-V deployments using Melio FS.

Melio FS offers quality of service for storage. This can be compared to the Storage I/O Control feature introduced in VMware vSphere 4.1. More information on the storage quality of service feature of Melio FS can be read here.

An overview of Sanbolic Melio can be read on the excellent blog of Aidan Finn

Additional reading

This is a very interesting blog on CSV titled Cluster Shared Volume (CSV) Inside Out 

Factors to consider when deciding how many VHDs per CSV. This document attempts to provide some guidance on best practices; providing exact numbers is outside its scope. The goal of the document is to provide a set of questions that need to be answered for each individual deployment in order to decide how many VHDs should be placed per CSV volume.

New Cluster Docs for Cluster Shared Volumes (CSV) & Migration
Recommendations for Using Cluster Shared Volumes in a Failover Cluster in Windows Server 2008 R2

The factors that influence “How many Cluster Shared Volumes [CSV] in a Cluster & VHD’s per CSV”

How do Cluster Shared Volumes work in Windows Server 2008 R2?

Hyper-V Server R2: a few additional thoughts

Best practices for deploying an HP EVA array with Microsoft Hyper-V R2

Best practice: SQL Server on VMware ESX

VMware released a new whitepaper describing SQL Server on ESX best practices:

Taken from the document:

“Microsoft SQL Server is a very popular and widely deployed general purpose database server supported on Windows Server operating systems. Microsoft has consistently invested to ensure that SQL Server provides a comprehensive general purpose database platform that is competitive with Oracle, IBM DB2, and MySQL.

Because SQL Server is such a general purpose database server, applications that use SQL Server have very diverse resource requirements. It is very important that you clearly understand both the specific application requirements across important dimensions (including performance, capacity, headroom, and availability), and the virtualization context (including consolidation, scale-up, and scale-out models) when you deploy SQL Server on VMware® Infrastructure.

This best practices paper is for database administrators and IT administrators who are seriously considering virtualizing SQL Server and need to understand:

How to characterize their SQL Server databases for virtualization,

VMware Infrastructure best practices for SQL Server — best practice guidelines for ensuring your VMware Infrastructure platform is properly configured for running SQL Server, including guidance in the areas of CPU, memory, storage, and networking

Deployment and operations — best practice guidelines for deploying SQL Server on VMware Infrastructure focusing on monitoring, management, backup, and related common operational best practices

The recommendations in this paper are not specific to any particular set of hardware or to the size and scope of the SQL Server implementation. The examples and considerations in this document provide guidance only and do not represent strict design requirements. VMware encourages you to use this guidance but also to pay close attention to your specific workloads and deployment requirements.

To get the most out of this paper, you should have a basic knowledge of VMware Infrastructure 3 and recent versions of SQL Server (2005 or 2008).”
