Cluster Shared Volumes explained: design impact and best practices

Microsoft introduced Cluster Shared Volumes (CSV) in Windows Server 2008 R2. CSV enables an NTFS-formatted disk volume to be accessed simultaneously by multiple Hyper-V hosts. This enables Live Migration of virtual machines and failover clustering of Hyper-V hosts. Very nice. I did some research on what is going on under the hood and will list some best practices for designing CSV volumes. This blog post also covers some design issues that arise when you use more than a few Hyper-V nodes in a cluster with CSV.

One important thing: the storage array hosting the CSV volumes must support SCSI-3 persistent reservations! iSCSI solutions from HP, Dell and similar vendors support this, but small and medium business solutions do not always. Keep that in mind when choosing your iSCSI storage solution.

First, there is not much best practice information to be found on the Net. Plenty of articles about CSV and clustering can be found, but most of them discuss the technology itself. The only guidance I found was written by Microsoft and contained very basic and obvious information.

The Coordinator node
While all nodes in a cluster can read and write data on a CSV volume, only one node is responsible for changing the metadata. This node is called the Coordinator node. Each CSV has one Coordinator node. If you have multiple CSV volumes in a cluster, the Coordinator node roles are spread evenly over the nodes. If the Coordinator node fails, another node automatically takes over the role. This will probably result in a short pause in disk access; I am not sure whether virtual machines will suffer from it. CSV volumes can only be used by Hyper-V hosts to store virtual disk files (VHD). Do not store any other data on them, because this could lead to data corruption.

Direct and redirect I/O
A CSV volume can be accessed over two networks. The first, and obviously preferred, is the iSCSI network. Each node has one or more NICs or iSCSI HBAs attached to one or two switches, and the switches are connected to the iSCSI storage array. This is called Direct I/O.
If the iSCSI network is not available, for instance because the NIC used for iSCSI fails or a cable is unplugged, an alternative path is selected. Data is transferred over the internal cluster network to the Coordinator node, which then forwards it over its own iSCSI network to the storage. This method is called Redirected I/O. Depending on the amount of storage I/O, this can lead to some loss of performance.

Copy VHD into CSV
When copying data from regular volumes (network, USB drive, C: drive) to the CSV volume (shortcut located at c:\clusterstorage\Volume#), perform the copy from the node that holds the Coordinator node role. This will ensure the copy is done as fast as possible. If it is done on another node, the data will be routed via the Coordinator node, because a file copy is a metadata operation. See the post “Copying VHDs onto a CSV Disk? Use the Coordinator Node!” at http://blogs.msdn.com/b/clustering/archive/2009/12/09/9934381.aspx

Copy VHD from one CSV volume to another
If you are using Dell Equallogic iSCSI storage and want to copy or move VHD files from one volume to another, the EqlXcp utility might speed things up a bit. This is because the Equallogic transfers the data internally instead of copying it to the Windows server and then copying it on to the destination volume. More info at http://marcmalotke.net/2010/06/28/equallogic-hit-3-4-0-eqlxcp-command/

Best practices for CSV
The recommended size of a CSV can be anything between 1 TB and 2 TB. The maximum size is 256 TB. If a large CSV volume fails, it will have more impact than if a smaller CSV fails. On the other hand, a larger number of smaller volumes makes administration a bit more complex and could also lead to issues with CSV reservations. More on that later in this article.
To plan the total amount of storage needed, add up the space used by the virtual disks of the virtual machines and add extra for snapshots, saved state etc. Something like 15% more than the sum of all VHDs would be okay. Make sure there is at least 512 MB of free disk space available at all times on any volume.
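
The sizing rule above can be sketched in a few lines of Python. This is only a rough illustration: the VHD sizes are made-up numbers and the function names are hypothetical. The 15% headroom and the 512 MB minimum free space are the rules of thumb mentioned above.

```python
# Rough CSV capacity estimate: sum of VHD sizes plus headroom
# for snapshots, saved state etc.
OVERHEAD_FRACTION = 0.15  # the "about 15% more" rule of thumb
MIN_FREE_MB = 512         # free space that should always remain on a volume


def required_csv_capacity_gb(vhd_sizes_gb, overhead=OVERHEAD_FRACTION):
    """Return the capacity to plan for, in GB, for the given VHD sizes."""
    return sum(vhd_sizes_gb) * (1 + overhead)


def has_enough_free_space(volume_size_gb, used_gb, min_free_mb=MIN_FREE_MB):
    """Return True if the volume still has at least min_free_mb free."""
    free_mb = (volume_size_gb - used_gb) * 1024
    return free_mb >= min_free_mb


# Hypothetical virtual disk sizes in GB, for illustration only.
vhd_sizes_gb = [40, 60, 100, 250]
planned_gb = required_csv_capacity_gb(vhd_sizes_gb)
print(f"Plan for at least {planned_gb:.1f} GB")
```

With these example numbers the VHDs add up to 450 GB, so the plan comes out at roughly 517.5 GB before any RAID or array overhead.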

Make sure the folder c:\clusterstorage is excluded from virus scanning. Scanning the VHD files located under this shortcut could lead to issues. Also exclude VMMS.exe and VMWP.exe from scanning. Run antivirus in your virtual machines instead. If you are using the Core installation, antivirus on the Hyper-V host might not be needed at all because of the small disk footprint.
The RAID level for the LUN which holds the CSV depends on the available RAID levels, the availability and application requirements, and the budget.

Use an even number of CSVs (if two controllers are used)

Performance when using two LUNs (one CSV on each LUN) is 2% to 8% better, depending on the number of VMs, than when using only one CSV and LUN. This is because with two LUNs, each is managed by one of the SAN controllers. If only one LUN exists, the controller managing that LUN must service all requests, eliminating the performance benefits of the second controller. Requests sent to the secondary controller are proxied to the managing controller to be serviced, increasing service times.

Multipath I/O

Microsoft MPIO (Multipath I/O), or storage network load balancing, provides the logical facility for routing I/O over redundant hardware paths connecting server to storage. These redundant hardware paths are made up of components such as the cabling, Host Bus Adapters (HBAs), switches, storage controllers and possibly even power. MPIO solutions logically manage these redundant connections so that I/O requests can be rerouted in the event that a component along one path fails.
As more and more data is consolidated on Storage Area Networks (SANs), the potential loss of access to storage resources is unacceptable. To mitigate this risk, high availability solutions like MPIO have now become a requirement.

After installing the MPIO framework (Add Feature in Windows Server), either the Microsoft DSM or a DSM supplied by the storage vendor can be installed. The latter has more knowledge of the underlying storage capabilities and will result in better performance. Mind that not all storage vendors' DSMs support CSV volumes. The HP DSM for EVA has supported CSVs since July 2010.
There is a penalty to using MPIO, however. Because MPIO enables more paths to the storage, the storage needs to handle more iSCSI sessions.

-A cluster with two nodes and one NIC per node, not using MPIO, has 3 iSCSI sessions (one for the initial connection and one for each NIC).
-A cluster with two nodes and one NIC per node, using MPIO, has 5 iSCSI sessions (one for the initial connection and two for each NIC).

When the number of nodes in a cluster increases, the number of iSCSI sessions to the storage increases as well. Having more nodes than the storage can handle will result in CSV volumes becoming unavailable.
Read more here http://forums13.itrc.hp.com/service/forums/questionanswer.do?admit=109447627+1280478584486+28353475&threadId=1409041
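
The two examples above suggest a simple way to estimate the session count. The Python sketch below extrapolates from them: it assumes one initial session plus one session per path per NIC per node, with one path per NIC without MPIO and two with MPIO. That generalisation is my own assumption, the function names are hypothetical and the session limit in the example is made up, so check the documentation of your storage for the real behaviour.

```python
def iscsi_sessions(nodes, nics_per_node, paths_per_nic=1):
    """Estimate iSCSI sessions: 1 initial + one per path per NIC per node.

    paths_per_nic is 1 without MPIO and 2 with MPIO in the examples above;
    extending this to other configurations is an assumption.
    """
    return 1 + nodes * nics_per_node * paths_per_nic


def max_nodes(session_limit, nics_per_node, paths_per_nic=1):
    """Largest node count whose estimated sessions stay within the limit."""
    return (session_limit - 1) // (nics_per_node * paths_per_nic)


print(iscsi_sessions(2, 1))                   # two nodes, one NIC, no MPIO
print(iscsi_sessions(2, 1, paths_per_nic=2))  # two nodes, one NIC, MPIO

# Hypothetical array limit of 64 sessions, two NICs per node, MPIO enabled.
print(max_nodes(64, nics_per_node=2, paths_per_nic=2))
```

The first two calls reproduce the 3 and 5 sessions from the list above.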

This problem is not related to CSV but to the firmware of the iSCSI storage. You can also run into lost iSCSI connections when using VMware with an iSCSI SAN. The article ‘iSCSI connections on Dell Equallogic PSseries’ at http://www.2vcps.com/2010/02/16/iscsi-connections-eq/ describes the limits of the various Equallogic series and how to calculate the limit.

Storage limitations
Your iSCSI storage can have a maximum number of Persistent Reservations (PR). The Dell Equallogic PS series with firmware 4.x is limited to 32 PRs per CSV volume. That will change to 96 in the v5.0 firmware, due out this summer.
Read the article ‘Microsoft Windows 2008 R2 CSV and Equallogic SAN’
http://blog.wortell.nl/maartenw/microsoft-windows-2008-r2-csv-and-equallogic-san-wortell/

and this forum thread
http://www.delltechcenter.com/thread/4007957/Microsoft+Windows+2008+R2+CSV+and+Equallogic+SAN/

CSV limits

If you are using HP LeftHand NSM or HP StorageWorks P4000 SAN and are having issues with Windows clusters, see the HP advisory below, which includes a patch:

Microsoft Windows 2008 or Microsoft 2008 R2 clusters might experience resource failures in large configurations. Any combination of Microsoft cluster nodes, Multi-path I/O (MPIO) network interface cards (NICs) per cluster node and storage nodes that results in more than 31 iSCSI sessions per volume is affected by this issue. If the number of Microsoft cluster nodes plus the number of MPIO NICs per cluster node multiplied by the number of HP LeftHand or HP StorageWorks P4000 storage nodes exceeds 31 for a particular volume, the Microsoft cluster will not function and will fail the Microsoft cluster validation test.

Patch 10069-00 addresses this issue by removing the 31-session limit.

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c02429763&dimid=1159776690&dicid=alr_jul10&jumpid=em_alerts/us/jul10/all/xbu/emailsubid/mrm/mcc/loc/rbu_category/alerts

The most important issue to consider in your storage design is the number of SCSI reservations and the number of SCSI registrations. As a CSV can be accessed by multiple hosts at the same time, there needs to be some mechanism to prevent data corruption when multiple hosts want to change data. A reservation is a kind of lock on a file or on the metadata of the volume. Reservations can be persistent (always there) or non-persistent (released after the write finishes). SCSI-3 PR (Persistent Reservation) uses a concept of registration and reservation. Each participating system registers its own key with the SCSI-3 device. Registered systems can then establish a reservation. With this method, blocking write access is as simple as removing a registration from the device.

iSCSI reservations:
For each CSV served out on an iSCSI array, one SCSI reservation is needed. This reservation is made by the Coordinator node.
So if you have 20 CSVs in your cluster, your storage array should support 20 reservations.

iSCSI registrations:
The number of SCSI registrations your storage array has to handle depends on the type of registration the storage requires:
Type 1 storage that requires a registration per path
Type 2 storage that requires a registration per initiator

The calculation is nicely described in this article by Hans Vredevoort http://hyper-v.nu/blogs/hans/?p=292

For type 1 storage the number of needed registrations is <number of paths> x <number of initiators> x <number of nodes in cluster> x <number of CSV volumes available in cluster>

For type 2 storage the number of needed registrations is <number of nodes> x <number of initiators>
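
To get a feeling for the numbers, the two formulas above can be written out in Python. This is just a worked example of the multiplications as stated. The function names and the cluster figures (8 nodes, 2 paths, 1 initiator, 10 CSVs) are hypothetical.

```python
def registrations_type1(paths, initiators, nodes, csv_volumes):
    """Type 1 storage: one registration per path."""
    return paths * initiators * nodes * csv_volumes


def registrations_type2(nodes, initiators):
    """Type 2 storage: one registration per initiator."""
    return nodes * initiators


# Hypothetical cluster: 8 nodes, 1 initiator per node, 2 paths, 10 CSVs.
print(registrations_type1(paths=2, initiators=1, nodes=8, csv_volumes=10))
print(registrations_type2(nodes=8, initiators=1))
```

For this example cluster, type 1 storage needs 160 registrations while type 2 storage needs only 8, which shows how quickly the per-path model grows with the number of CSVs.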

So far, no overview is available of the maximum number of SCSI Persistent Reservations supported by each storage system.

Solution:
If you are having problems with the maximum number of nodes supported, you might consider not using NTFS as the filesystem with CSV as a filter above it. Sanbolic offers Melio 2010 as a shared volume filesystem. It is designed to be accessed by multiple hosts and has many advantages over CSV. For more info see http://sanbolic.com/blog/?tag=csv

This whitepaper http://www.sanbolic.com/pdfs/EMC_Sanbolic_MS_POC-Final.pdf describes an optimized storage solution for enterprise Hyper-V deployments using Melio FS.

Melio FS offers quality of service for storage. This can be compared to the Storage I/O Control feature introduced in VMware vSphere 4.1. More information on the storage Quality of Service feature of Melio FS can be read here.

An overview of Sanbolic Melio can be read on the excellent blog of Aidan Finn at http://www.aidanfinn.com/?p=10496

Additional reading

This is a very interesting blog on CSV titled Cluster Shared Volume (CSV) Inside Out 

Factors to consider when deciding how many VHDs per CSV
This document attempts to provide some guidance on best practices; providing exact numbers is outside its scope. Its goal is to provide a set of questions that need to be answered for each individual deployment in order to decide how many VHDs should be placed per CSV volume.

New Cluster Docs for Cluster Shared Volumes (CSV) & Migration
http://blogs.msdn.com/b/clustering/archive/2010/01/26/9953375.aspx
Recommendations for Using Cluster Shared Volumes in a Failover Cluster in Windows Server 2008 R2
http://technet.microsoft.com/en-us/library/ff182320(WS.10).aspx

The factors that influence “How many Cluster Shared Volumes [CSV] in a Cluster & VHD’s per CSV”
http://itinfras.blogspot.com/2010/07/factors-that-influence-how-many-cluster.html

How do Cluster Shared Volumes work in Windows Server 2008 R2?
http://www.windowsitpro.com/article/tips/q-how-do-cluster-shared-volumes-work-in-windows-server-2008-r2-.aspx

Hyper-V Server R2: a few additional thoughts
http://it20.info/blogs/main/archive/2009/03/19/196.aspx

Best practices for deploying an HP EVA array with Microsoft Hyper-V R2

About Marcel van den Berg
I am a technical consultant with a strong focus on server virtualization, desktop virtualization, cloud computing and business continuity/disaster recovery.
