Cluster Shared Volumes explained, design impact and best practices
July 30, 2010
Microsoft introduced Cluster Shared Volumes (CSV) in Windows Server 2008 R2. CSV enables an NTFS-formatted disk volume to be accessed simultaneously by multiple Hyper-V hosts, which makes Live Migration of virtual machines and failover clustering of Hyper-V hosts possible. Very nice. I did some research on what is going on under the hood and will list some best practices for designing CSV volumes. This blog post also mentions some design issues that show up when you use more than a few Hyper-V nodes in a cluster with CSV.
One important thing: the storage array hosting the CSV volumes needs to support SCSI-3 persistent reservations! The iSCSI solutions from HP, Dell and other enterprise vendors support this, but small and medium business solutions do not always. Keep that in mind when choosing an iSCSI storage solution.
First, there is not much best practice information to be found on the net. A lot of articles about CSV and clustering can be found on the internet, but most of them only discuss the technology. The only information I found was written by Microsoft and contained very basic and obvious advice.
The Coordinator node
While all nodes in a cluster can read and write data on a CSV volume, one node is responsible for changing the metadata. This node is called the Coordinator Node. Each CSV has one Coordinator Node. If multiple CSV volumes are available in a cluster, the Coordinator Node roles are evenly spread over the nodes. If a Coordinator Node fails, another node automatically takes over the role. This will probably result in a short pause in disk access; I am not sure whether virtual machines will suffer from this. CSV volumes should only be used by Hyper-V hosts to store virtual disk (VHD) files. Do not try to store any other data on them, because this could lead to data corruption.
Direct and redirected I/O
A CSV volume can be accessed over two networks. The first and obviously preferred network is the iSCSI network: each node has one or more NICs or iSCSI HBAs attached to one or two switches, and the switches are connected to the iSCSI storage array. This is called Direct I/O.
If the iSCSI network is not available, for instance because the NIC used for iSCSI fails or a cable is unplugged, an alternative path is selected. Data is transferred over the internal cluster network to the Coordinator Node, which then forwards the data over its own iSCSI network to the storage. This method is called Redirected I/O. Depending on the amount of storage I/O, this can lead to some loss of performance.
Copy VHD into CSV
When copying data from regular volumes (network, USB drive, C: drive) to the CSV volume (mounted at c:\clusterstorage\Volume#), perform the copy on the node that holds the Coordinator Node role. This will ensure the copy is done as fast as possible. If done on another node, the data is routed via the Coordinator Node, because a file copy is a metadata operation. See the post “Copying VHDs onto a CSV Disk? Use the Coordinator Node!” at http://blogs.msdn.com/b/clustering/archive/2009/12/09/9934381.aspx
Copy VHD from one CSV volume to another
If you are using Dell EqualLogic iSCSI storage and want to copy or move VHD files from one volume to another, the EqlXcp utility might speed things up a bit. This is because the EqualLogic array transfers the data internally instead of copying it to the Windows server and back to the destination volume. More info at http://marcmalotke.net/2010/06/28/equallogic-hit-3-4-0-eqlxcp-command/
Best practices for CSV
The recommended size of a CSV is anything between 1 TB and 2 TB; the maximum size is 256 TB. If a large CSV volume fails, it has more impact than if a smaller CSV fails. On the other hand, a larger number of smaller volumes makes administration a bit more complex and can also lead to issues with CSV reservations. More on that later in this article.
To plan the total number of TBs of storage needed, count the data used by the virtual machines' virtual disks and add extra space for snapshots, saved state and so on. Something like 15% more than the sum of all VHDs should be okay. Make sure there is at least 512 MB of free disk space available at all times on any volume.
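The rule of thumb above can be sketched as a small calculator. The 15% overhead and 512 MB free-space figures follow this post, not an official Microsoft formula, so treat the result as a rough starting point:

```python
def required_csv_capacity_gb(vhd_sizes_gb, overhead=0.15, min_free_gb=0.5):
    """Rough CSV sizing: the sum of all VHD sizes plus ~15% extra for
    snapshots, saved state and configuration files, plus the 512 MB
    (0.5 GB) that should stay free on the volume at all times.
    The percentages mirror the rule of thumb in this post.
    """
    total = sum(vhd_sizes_gb)
    return total * (1 + overhead) + min_free_gb

# Example: ten VMs with a 40 GB VHD each
print(round(required_csv_capacity_gb([40] * 10), 1))  # 460.5 GB
```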
Make sure the folder c:\clusterstorage is excluded from virus scanning; scanning the VHD files located under this path can lead to issues. Also exclude VMMS.exe and VMWP.exe from scanning, and run antivirus inside your virtual machines instead. When using the Core installation, antivirus on the Hyper-V host might not be needed at all because of its small disk footprint.
The RAID level for the LUN which holds the CSV depends on the available RAID levels, the availability requirements of the applications, and budget.
Use an even number of CSVs (if two controllers are used)
Performance when using two LUNs (one CSV on each LUN) is 2% to 8% better, depending on the number of VMs, than when using only one CSV and LUN. This is because with two LUNs, each LUN is managed by one of the SAN controllers. If only one LUN exists, the controller managing that LUN must service all requests, eliminating the performance benefit of the second controller. Requests sent to the secondary controller are proxied to the managing controller to be serviced, which increases service times.
Microsoft MPIO (Multipath I/O), or storage network load balancing, provides the logical facility for routing I/O over redundant hardware paths connecting a server to storage. These redundant paths are made up of components such as cabling, host bus adapters (HBAs), switches, storage controllers and possibly even power. MPIO solutions logically manage these redundant connections so that I/O requests can be rerouted if a component along one path fails.
As more and more data is consolidated on Storage Area Networks (SANs), the potential loss of access to storage resources is unacceptable. To mitigate this risk, high availability solutions like MPIO have become a requirement.
After installing the MPIO framework (Add Feature in Windows Server), either the Microsoft DSM or a storage vendor-supplied DSM can be installed. The latter has more knowledge of the underlying storage capabilities and will result in better performance. Mind that not all storage vendors' DSMs support CSV volumes; the HP DSM for EVA supports CSVs since July 2010.
There is a penalty, however, when using MPIO: because more paths to the storage are enabled, the storage has to handle more iSCSI sessions.
- A cluster with two nodes and one NIC per node not using MPIO has 3 iSCSI sessions (one for the initial connection and one for each NIC).
- A cluster with two nodes and one NIC per node using MPIO has 5 iSCSI sessions (one for the initial connection and two for each NIC).
When the number of nodes in a cluster increases, the number of iSCSI sessions to the storage increases as well. Having more sessions than the storage can handle will result in CSV volumes becoming unavailable.
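The arithmetic behind the two examples above can be sketched as a quick estimator. The formula (one initial connection, plus one session per NIC per node without MPIO, or two per NIC with MPIO) simply mirrors the examples in this post; the exact count depends on your storage vendor and MPIO configuration:

```python
def iscsi_sessions(nodes, nics_per_node, mpio=False):
    """Estimate iSCSI sessions per volume: one initial connection plus
    one session per NIC per node, with MPIO roughly doubling the
    per-NIC sessions. An approximation based on the examples in this
    post, not a vendor-documented formula.
    """
    sessions_per_nic = 2 if mpio else 1
    return 1 + nodes * nics_per_node * sessions_per_nic

# The two examples from this post:
print(iscsi_sessions(2, 1, mpio=False))  # 3 sessions
print(iscsi_sessions(2, 1, mpio=True))   # 5 sessions
```

Running this for larger node counts quickly shows how a modest cluster can approach a storage array's session limit.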
Read more here http://forums13.itrc.hp.com/service/forums/questionanswer.do?admit=109447627+1280478584486+28353475&threadId=1409041
This problem is not related to CSV but to the firmware of the iSCSI storage. Also, when you are using VMware with an iSCSI SAN, you can run into problems with lost iSCSI connections. Read the article ‘iSCSI connections on Dell EqualLogic PS series’ at http://www.2vcps.com/2010/02/16/iscsi-connections-eq/ which describes the limits of the various EqualLogic series and how to calculate the limit!
Your iSCSI storage can have a maximum number of persistent reservations. The Dell EqualLogic PS series with firmware 4.x is limited to 32 persistent reservations (PRs) per CSV volume; this will be raised to 96 in the v5.0 firmware due out this summer.
Also read the article ‘Microsoft Windows 2008 R2 CSV and Equallogic SAN’.
If you are using an HP LeftHand NSM or HP StorageWorks P4000 SAN and have issues with Windows clusters, see this HP article with a patch:
Microsoft Windows 2008 or Windows 2008 R2 clusters might experience resource failures in large configurations. Any combination of Microsoft cluster nodes, MPIO NICs per cluster node and storage nodes that results in more than 31 iSCSI sessions per volume is affected by this issue. If the number of Microsoft cluster nodes plus the number of MPIO NICs per cluster node, multiplied by the number of HP LeftHand or HP StorageWorks P4000 storage nodes, exceeds 31 for a particular volume, the Microsoft cluster will not function and will fail the Microsoft cluster validation test.
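Reading the HP advisory text above literally, the check can be sketched as follows. Note that the parenthesization of the formula is my interpretation of the advisory's wording; verify it against the HP advisory for your exact configuration:

```python
def p4000_sessions_per_volume(cluster_nodes, mpio_nics_per_node, storage_nodes):
    """Sessions per volume, taking the advisory's wording literally:
    (cluster nodes + MPIO NICs per cluster node) x storage nodes.
    This parenthesization is an assumption; check the HP advisory.
    """
    return (cluster_nodes + mpio_nics_per_node) * storage_nodes

def exceeds_session_limit(cluster_nodes, mpio_nics_per_node, storage_nodes, limit=31):
    """True when the configuration is affected by the 31-session limit."""
    return p4000_sessions_per_volume(cluster_nodes, mpio_nics_per_node, storage_nodes) > limit

# Hypothetical example: 6 cluster nodes, 2 MPIO NICs each, 4 storage nodes
print(p4000_sessions_per_volume(6, 2, 4))  # 32 sessions
print(exceeds_session_limit(6, 2, 4))      # True -> affected before the patch
```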
Patch 10069-00 addresses this issue by removing the 31-session limit.