19 controllers removed from VMware Virtual SAN (VSAN) compatibility list

In this KB article VMware announced that 19 disk controllers initially listed on the VSAN compatibility list have been removed from the list. This makes the controllers unsupported for use in VSAN configurations.

According to VMware’s KB article, the reason for this removal is:

As part of VMware’s ongoing testing and certification efforts on Virtual SAN compatible hardware, VMware has decided to remove these controllers from the Virtual SAN compatibility list. While fully functional, these controllers offer too low IO throughput to sustain the performance requirements of most VMware environments. Because of the low queue depth offered by these controllers, even a moderate IO rate could result in IO operations timing out, especially during disk rebuild operations. In this event, the controller may be unable to cope with both a rebuild activity and running Virtual Machine IO causing elongated rebuild time and slow application responsiveness. To avoid issues, such as the one described above, VMware is removing these controllers from the Hardware Compatibility List.

 

One of the controllers removed is the Dell PERC H310. Likely one of the reasons this controller has been removed is a serious issue described at Reddit.com. A VSAN user experienced basically a meltdown of his virtual infrastructure when a VSAN node failed and, after a while, a rebuild started. This caused more IO than the PERC H310 could handle and all VMs came to a standstill.

These are the controllers removed from the VSAN compatibility list:

[Image: list of controllers removed from the VSAN compatibility list]


VMware opens public beta for next version of vSphere

For the first time ever, VMware has opened a vSphere beta program to the public. Until now, participating in the beta was only possible for VMware partners, vExperts and other selected groups.

Now everyone can participate and have a look into the future of vSphere. The next release will most likely be vSphere 6.0, although VMware never communicates version numbers before a product announcement.

Joining is very simple: fill in your name, role, company and country, digitally sign the agreement, and you are in the beta within minutes.

Please remember that this vSphere Beta is private even though it is open to everyone. Do not share information from this Beta Program with those not in the Beta Program. What you learn and see in this Beta is meant to be kept confidential per the Master Software Beta Test Agreement and Program Rules.

The ground rules of the program are here.

This is an interesting change in policy for VMware. Microsoft has always been very open about new versions of, for example, Hyper-V, and has enabled the public to download betas. VMware preferred to keep features and even the release number of vSphere secret until unveiled at VMworld.

I am not sure about the reason for this change in policy. Likely VMware made the beta open to the public for quality reasons: to make sure as many scenarios as possible are tested and teething problems are resolved before the product reaches GA. VMware Virtual SAN ran a beta for a relatively long time for the same reason.

Another, less likely, reason could be that VMware wants to be more open about the future of its products for marketing reasons.

The VMware blog announcing the beta is here

Join the beta here.

VMware releases vCenter Server 5.5 Update 1b

VMware released vCenter Server 5.5 Update 1b on June 12. This update addresses an OpenSSL vulnerability and fixes some other issues.

The release notes are here
Download the bits here.

Resolved issues:

Security

Update to OpenSSL library addresses security issues
OpenSSL libraries have been updated to versions openssl-0.9.8za, openssl-1.0.0m, and openssl-1.0.1h to address CVE-2014-0224.
Server Configuration
Host profile answer file might not be applied when you reboot the ESXi host

After you restart the vCenter services, if you use Auto Deploy to reboot a stateless ESXi host, the host profile answer file might not be applied to the ESXi host and the host configuration might fail. This issue occurs if the reference host is not available.

This issue is resolved in this release.

Virtual SAN
Under certain conditions, Virtual SAN storage providers might not be created automatically after you enable Virtual SAN on a cluster

When you enable Virtual SAN on a cluster, Virtual SAN might fail to automatically configure and register storage providers for the hosts in the cluster, even after you perform a resynchronization operation.

This issue is resolved in this release. You can view the Virtual SAN storage providers after resynchronization. To resynchronize, click the synchronize icon in the Storage Providers tab.

What are the dangers of snapshots and how can you avoid them?

VMware vSphere snapshots can be very useful. A snapshot captures, just like a photo does, the state of a virtual machine at a certain point in time. This capture cannot be modified while the virtual machine is active, as it is read-only. Returning to a state which is known to be good is a matter of a few mouse clicks.

However, snapshots are not that innocent. You can shoot yourself in the foot if you do not realize the side effects of snapshots.

Backup software, because it uses snapshots, is a major culprit in causing issues with virtual machine performance and availability. See my previous post about the impact of snapshots.

It is very important to understand what the impact of snapshots can be on availability and performance of virtual machines:

  • a virtual machine with active snapshot(s) performing many writes to disk can fill up a datastore, causing all VMs on that datastore to crash or pause
  • deleting a snapshot can pause a virtual machine for many minutes. This can, for example, result in an Exchange Server DAG cluster failover or other unwanted side effects.

This post will provide information on snapshot deletions (commit as well as consolidation) and how to prevent pausing of virtual machines. We will focus on VMware snapshots, but much of this post applies to snapshots of other solutions as well.

Some advice:

  • make sure your application supports snapshots and under which conditions
  • a successful backup using snapshots does not automatically mean a successful restore!
  • snapshots are not a replacement for backups!
  • make sure snapshots are only active for a couple of hours at most, then delete them (a scripted check is sketched after this list)
  • be very careful using snapshots on virtual machines which perform many write transactions to disk
  • have a close look at the impact and behaviour of your backup tool on snapshot files
  • make sure applications running in the virtual machine support snapshots. Snapshots of virtual machines running Microsoft Exchange are not supported. Snapshots of SQL Server are supported only when VSS is used.
  • snapshots of virtual machines using in-guest iSCSI drives are not supported.
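The ‘delete snapshots within a few hours’ advice is easy to check in an automated way. Below is a minimal sketch using the pyVmomi Python SDK (not mentioned in this post, so treat it purely as an illustration; the vCenter hostname and credentials are placeholders) that reports virtual machines whose snapshots are older than a given number of hours.

```python
# Sketch: report VMs with snapshots older than a given age, using pyVmomi.
# The vCenter hostname, user and password below are placeholders.
import ssl
from datetime import datetime, timedelta, timezone

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def vms_with_old_snapshots(content, max_age_hours=4):
    """Return (vm name, snapshot creation time) pairs for snapshots older than max_age_hours."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=max_age_hours)
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    old = []
    for vm in view.view:
        if vm.snapshot is None:          # VM has no snapshots at all
            continue
        for root in vm.snapshot.rootSnapshotList:
            if root.createTime < cutoff:
                old.append((vm.name, root.createTime))
    view.DestroyView()
    return old

if __name__ == "__main__":
    ctx = ssl._create_unverified_context()   # lab use only, skips certificate checks
    si = SmartConnect(host="vcenter.example.local",
                      user="administrator@vsphere.local",
                      pwd="secret", sslContext=ctx)
    try:
        for name, created in vms_with_old_snapshots(si.RetrieveContent()):
            print("%s still has a snapshot created %s" % (name, created))
    finally:
        Disconnect(si)
```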

 

Introduction to VMware snapshots

When a snapshot is made, the original VMDK (we call this the parent or base disk) is set to read-only mode. All further writes to the virtual machine disks are stored in a delta disk (also called snapshot disk, child disk, virtual disk redo log or (sparse) delta disk). These delta disks have a <number>-delta.vmdk suffix in the filename. Snapshot delta disks grow in chunks of 16 MB each. Each time a chunk is added, the VMFS volume is briefly locked.
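To get a feel for what the 16 MB growth increments mean, here is a small back-of-the-envelope calculation (plain Python, figures purely illustrative): every 16 MB of new writes to the delta disk triggers one grow operation, and each grow operation briefly locks the VMFS volume.

```python
import math

CHUNK_MB = 16  # a snapshot delta disk grows in increments of 16 MB

def grow_operations(written_gb):
    """Number of 16 MB grow operations (and thus VMFS lock events) for a given amount of new writes."""
    return math.ceil(written_gb * 1024 / CHUNK_MB)

# Example: a VM writing 10 GB of new data while a snapshot is active
print(grow_operations(10))   # 640 grow operations, each briefly locking the VMFS volume
```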

Multiple snapshots can be taken of the same virtual machine.

Snapshots are very useful for making sure a known working situation can be restored. This is because the parent disk does not change after the snapshot is taken (it is read-only).

When a snapshot is deleted (we do not want to revert to the situation at the time the snapshot was made), ESX(i) will merge the data written in the delta file back into the parent disk. A snapshot delete is also called a ‘commit’ or ‘consolidation’.

While this is in progress, another delta disk is created which is used during the commit to store new writes. This is a ‘Consolidate Helper snapshot’. It is created at the moment a snapshot file is being committed to the parent disk. New incoming writes are stored in the consolidate helper snapshot file. Those are committed as well once the initial snapshot file has been successfully committed.

To keep track of snapshot files, ESX(i) uses a .vmsd file which stores information and metadata about snapshots.

If an administrator wants to restore a certain state of a virtual machine (go back in time), this is called a revert.

This is a great article explaining what is happening under the hood of snapshots. This VMware KB article is also very informative.

Microsoft SQL Server and Exchange support for snapshots

Mind that not all applications support snapshots. Microsoft’s policy on snapshots depends on the product. SQL Server supports snapshots that use VSS. This is the support policy for SQL Server:

SQL Server supports virtualization-aware backup solutions that use VSS (volume snapshots). For example, SQL Server supports Hyper-V backup. Virtual machine snapshots that do not use VSS volume snapshots are not supported by SQL Server. Any snapshot technology that does a behind-the-scenes save of a VM’s point-in-time memory, disk, and device state without interacting with applications on the guest using VSS may leave SQL Server in an inconsistent state.

Exchange Server (2010 and 2013) does not support snapshots. The quote below was taken from a Microsoft article on Exchange 2013.

Some hypervisors include features for taking snapshots of virtual machines. Virtual machine snapshots capture the state of a virtual machine while it’s running. This feature enables you to take multiple snapshots of a virtual machine and then revert the virtual machine to any of the previous states by applying a snapshot to the virtual machine. However, virtual machine snapshots aren’t application aware, and using them can have unintended and unexpected consequences for a server application that maintains state data, such as Exchange. As a result, making virtual machine snapshots of an Exchange guest virtual machine isn’t supported.

Care should especially be taken when snapshots are made of Exchange mailbox servers. Snapshots of HUB and CAS roles should be okay in most cases.

If you make a snapshot of an Exchange server and want to revert, there is a chance that after the revert you will notice Exchange errors. If you are unlucky, Exchange might not be able to mount mailbox stores because of corruption.

When the snapshot is committed there is a chance the virtual machine has to be paused for a while. When an Exchange DAG role is installed in that virtual machine, a DAG cluster failover might occur because the heartbeat is temporarily lost.

If you want to make a snapshot of an Exchange server, make sure the virtual machine is shut down first.

Snapshots of Microsoft Active Directory running on Windows Server 2012 are supported on certain versions of the hypervisor. See my post for more info. See this Microsoft post for additional info.

Out of sync situation

Snapshots are used by many backup solutions. However, not all backup solutions clean up the delta disks after the backup of a VM has finished. Some tools just delete the metadata while the delta disks are still being written to.

VMware introduced ‘snapshot consolidation’ in vSphere 5.0. This corrects out-of-sync situations like a leftover snapshot file. Snapshot consolidation commits a chain of snapshot files to the original virtual machine parent file when Snapshot Manager shows that no snapshots exist but the delta files still remain on the datastore.

Snapshot consolidation is a very important task for administrators. Because the leftover snapshot files are still active, they continue to expand and consume disk space until the datastore runs out of space.

How do you know a virtual machine disk needs consolidation? It will be shown in the Summary tab of the vSphere Client.

[Image: consolidation warning on the Summary tab of the vSphere Client]

A small explanation of how to use consolidation is shown in this VMware video.
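The ‘consolidation needed’ flag is also exposed through the vSphere API, so the check can be scripted. A minimal pyVmomi sketch (assuming a connected service instance obtained as in the earlier snippet) that finds such VMs and optionally triggers the same consolidation action as the vSphere Client:

```python
from pyVmomi import vim

def consolidate_where_needed(content, dry_run=True):
    """List VMs flagged as 'consolidation needed' and optionally consolidate their disks."""
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        if vm.runtime.consolidationNeeded:
            print("Consolidation needed for %s" % vm.name)
            if not dry_run:
                # Same operation as the 'Consolidate' action in the vSphere (Web) Client
                vm.ConsolidateVMDisks_Task()
    view.DestroyView()
```

Run it with dry_run=True first to see which virtual machines are affected before actually consolidating.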

Slow or paused virtual machines due to commit
Consolidation and snapshot commits could lead to a situation in which the virtual machine is paused for a few seconds or up to over 30 minutes!

This pausing is called a stun and is in certain circumstances required to be able to commit delta files.
Stunning is likely to happen when the guest operating system is performing more writes to the delta file than ESX(i) can commit to the parent disk. It is like a car with a top speed of 50 mph trying to overtake a car driving an average of 60 mph: the faster car will have to stop or slow down for a while before it can be overtaken.

An ESX(i) stun is a pause of the virtual machine so snapshot files can be committed to the parent disk. More info on stuns can be found in this VMware KB article.

VMware made several enhancements to snapshot commits in various releases of vSphere, but snapshots can still have a severe impact on virtual machines.

ESX(i) will try to commit snapshots without having to stun (pause) the virtual machine. Performing snapshot commits while the virtual machine is running is called asynchronous consolidate.

Initially the commit is performed during a period of 5 minutes. If this commit fails to get rid of all snapshot files because too many writes are coming in, another try is done with a duration of 10 minutes. If this again fails because too many new writes are coming in, the snapshot commit duration is extended to 20 minutes. In total ESX(i) tries a maximum of 10 times (called iterations).

Thereafter the virtual machine will be stunned. This is called a synchronous consolidate. Stunning means no new writes are coming in and ESX(i) is able to commit all snapshot files.
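The behaviour described above can be illustrated with a small simulation (plain Python, all figures purely illustrative): as long as the guest writes faster than ESX(i) can commit, each iteration ends with a larger delta, and after the maximum number of iterations only a stun (synchronous consolidate) can finish the job.

```python
def simulate_async_consolidate(delta_mb, write_mb_s, commit_mb_s,
                               first_iteration_s=300, max_iterations=10):
    """Toy model of iterative (asynchronous) snapshot consolidation.

    Each iteration the host commits commit_mb_s while the guest keeps writing
    write_mb_s; the iteration length starts at 5 minutes and doubles
    (5, 10, 20 minutes, ...), as described in the post.
    """
    duration = first_iteration_s
    for i in range(1, max_iterations + 1):
        delta_mb += (write_mb_s - commit_mb_s) * duration
        if delta_mb <= 0:
            return "committed in iteration %d" % i
        duration *= 2
    return "still %.0f MB left: synchronous consolidate (stun) required" % delta_mb

# Guest writing 60 MB/s while the host can only commit 50 MB/s: the delta keeps growing
print(simulate_async_consolidate(delta_mb=1024, write_mb_s=60, commit_mb_s=50))
# Guest writing 20 MB/s with the same commit rate: the delta drains in the first iteration
print(simulate_async_consolidate(delta_mb=1024, write_mb_s=20, commit_mb_s=50))
```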

Beginning in ESXi 5.0, the snapshot stun times are logged. Each virtual machine’s log file (vmware.log) will contain messages similar to:

2013-03-23T17:40:02.544Z| vcpu-0| Checkpoint_Unstun: vm stopped for 403475568 us

In this example, the virtual machine was stunned for 403475568 microseconds (1 second = 1 million microseconds), which is roughly 403 seconds, or almost 7 minutes.
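Those stun messages are easy to extract from vmware.log. The sketch below (plain Python) pulls out the Checkpoint_Unstun lines and converts the microsecond values to seconds:

```python
import re

STUN_PATTERN = re.compile(r"Checkpoint_Unstun: vm stopped for (\d+) us")

def stun_times(vmware_log_path):
    """Yield (log line, stun duration in seconds) for every Checkpoint_Unstun entry."""
    with open(vmware_log_path) as log:
        for line in log:
            match = STUN_PATTERN.search(line)
            if match:
                yield line.strip(), int(match.group(1)) / 1000000.0

# Example usage against a vmware.log copied from the VM's datastore folder
for line, seconds in stun_times("vmware.log"):
    print("%.1f second stun: %s" % (seconds, line))
```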

Avoiding a stun or keeping the stun duration as short as possible
If you do not want to stun/pause the virtual machine you can set snapshot.maxIterations to 20 (or higher). This means vSphere will make more attempts (iterations) to commit the snapshot files. More information in this KB article. A scripted variant is sketched after the steps below.

Be careful when changing settings and closely monitor the effects.

To do this:

  1. Shut down the virtual machine
  2. Right-click the virtual machine and click Edit Settings.
  3. Click the Options tab.
  4. Under Advanced, click General.
  5. Click Configuration Parameters and add snapshot.maxIterations
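For a larger number of virtual machines the same change can be scripted through the vSphere API. Below is a hedged pyVmomi sketch (the value 20 is just an example and, as noted above, the VM should be powered off first) that adds snapshot.maxIterations as an advanced configuration parameter; snapshot.maxConsolidateTime can be set the same way.

```python
from pyVmomi import vim

def set_snapshot_max_iterations(vm, iterations=20):
    """Add or overwrite the snapshot.maxIterations advanced setting on a powered-off VM."""
    spec = vim.vm.ConfigSpec(extraConfig=[
        vim.option.OptionValue(key="snapshot.maxIterations", value=str(iterations))
    ])
    return vm.ReconfigVM_Task(spec=spec)

# Example (vm obtained via a container view, as in the earlier snippets):
# task = set_snapshot_max_iterations(vm, 20)
```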

However, this could make things worse. Think again about that car (the commit process) trying to chase that other, leading car (the writes from the OS and applications in the guest). If the speed of the leading car remains higher than that of the chasing car, the longer the chase lasts, the bigger the distance becomes.

Alternatively you can set snapshot.maxConsolidateTime to 60 seconds. This means you accept a pause of the virtual machine of 60 seconds to do a synchronous consolidate. This is often a better option than waiting for the snapshot file to grow so big that the virtual machine needs to be stunned for a much longer time.

ESXi 4.1 has an update which added the parameter snapshot.asyncConsolidate.forceSync = “FALSE”, which needs to be added to the VMX file. This setting disables synchronous consolidate and the virtual machine will never be stunned. More info in this KB.

 

Some additional info
VMware published a remarkable number of knowledgebase articles on snapshots. Below just some examples.

VMware KB A snapshot removal can stop a virtual machine for long time (1002836)
VMware KB Virtual machines residing on NFS storage become unresponsive during a snapshot removal operation (2010953)
VMware KB Delete all Snapshots and Consolidate Snapshots feature FAQ (1023657)
VMware KB Commands to monitor snapshot deletion in ESX 2.5/3.x/4.x and ESXi 3.x/4.x/5.x (1007566)
VMware KB Consolidating snapshots in vSphere 5.x (2003638)
VMware KB Configuring VMware vCenter Server to send alarms when virtual machines are running from snapshots (1018029)

VMware releases patch for vSphere 5.5 Update 1 NFS All Paths Down condition

After installation of VMware vSphere 5.5 Update 1, connectivity to NFS storage could be randomly lost, with volumes reporting an All Paths Down state. This issue was noticed around mid April and reported here. VMware documented the issue in KB article 2076392 Intermittent NFS APDs on VMware ESXi 5.5 U1.

On June 10 VMware released a patch for this issue as described in KB 2077360. The patch can be downloaded using VMware Update Manager or from the VMware download page.

VMware vCenter Heartbeat End of Availability starting June 2

VMware vCenter Heartbeat is no longer available for purchase as of June 2. All support and maintenance for the removed versions will be unaffected and will continue per the VMware Life Cycle policy through the published support period until September 19, 2018.

VMware vCenter Server Heartbeat is a software product (OEM of Neverfail) that protects vCenter Server against outages–from application, operating system, hardware and network failures to external events–regardless of whether vCenter Server is deployed on a physical or virtual machine.

vCenter Server Heartbeat creates a clone of both the vCenter Server and the SQL server database used by vCenter, and then keeps both the primary and secondary vCenter Servers in sync through continuous asynchronous replication. Administrators can use vCenter Server Heartbeat in a virtual configuration, physical configuration or a hybrid model.

The reason for the end of life is that VMware believes currently available protections like VMware HA, vMotion and Storage vMotion ensure availability of managed resources.

I *guess* a couple of reasons to stop selling vCenter Heartbeat could be:

1. most customers are running vCenter in a virtual machine. They are happy with HA
2. sales of vCenter Heartbeat are declining because of point 1
3. vCenter Heartbeat is a complex product
4. vCenter Heartbeat was a pretty expensive solution not many customers were interested in.
5. It might give VMware too many support headaches because of upgrade issues from vSphere 5.0 to 5.5 combined with SSO.

6. As of vCenter Server 5.5 in vSphere 5.5, VMware introduced support for using Microsoft SQL Cluster Service as a back-end database. See this KB for instructions.
7. VMware is working on a new way to protect vCenter Server

More information in the VMware blog here. Answers to frequently asked questions are here.

Checking hardware recommendations might prevent a VSAN nightmare

<update June 4>

Jason Gill posted the Root Cause Analysis VMware did on his issue described below. Indeed, the issue was caused by the use of the Dell PERC H310 controller, which has a very low queue depth. A quote:

While this controller was certified and is in our Hardware Compatibility List, its use means that your VSAN cluster was unable to cope with both a rebuild activity and running production workloads. While VSAN will throttle back rebuild activity if needed, it will insist on minimum progress, as the user is exposed to the possibility of another error while unprotected. This minimum rebuild rate saturated the majority of resources in your IO controller. Once the IO controller was saturated, VSAN first throttled the rebuild, and — when that was not successful — began to throttle production workloads.

Read the full Root Cause Analysis here at Reddit

Another interesting observation while reading the thread on Reddit is that the Dell PERC H310 actually is an OEM version of the LSI 2008 card. John Nicholson wrote a very interesting blog about the H310 here.

Dell seems to ship the H310 with old firmware. With the latest firmware the queue depth of the Dell PERC H310 can be increased to 600!

We went from 270 write IOPS at 30 ms of write latency to 3000 write iops at .2ms write latency just by upgrading to the new firmware that took queue depth from 25 to 600

This article explains how to flash a Dell PERC H310 with newer firmware. I am not sure if a flashed PERC H310 is supported by VMware. As an HBA with better specs is not that expensive, I advise flashing a Dell PERC H310 only when it is used in non-production environments.

————————————————————-

June 02, 2014

An interesting post appeared on Reddit. The post, titled My VSAN nightmare, describes a serious issue in a VSAN cluster. When one of the three storage nodes failed displaying a purple screen, initially all seemed fine. VMware HA kicked in and restarted VMs on the surviving nodes (two compute and two storage nodes). The customer was worried about redundancy as storage was now located on just two nodes. So SSD and HDD storage was added to one of the compute nodes. This node did not have local storage before.

However, exactly 60 minutes after adding the new storage, DRS started to move VMs to other hosts, a lot of IO was seen, and all (about 77) VMs became unresponsive and died. VSAN Observer showed that IO latency had jumped to 15-30 seconds (up from just a few milliseconds on a normal day).

VMware support could not solve the situation and basically told the customer: “wait till this I/O storm is over”. About 7 hours later the critical VMs were running again. No data was lost.

At the moment VMware support is analyzing what went wrong to be able to make a Root Cause Analysis.

Issues on VSAN like the one documented on Reddit are very rare. This post provides a look under the covers of VSAN. Hopefully this helps you understand what is going on under the hood of VSAN and might prevent this situation from happening to you as well.

Let’s have a closer look at the VSAN hardware configuration of the customer who wrote about his experiences on Reddit.

VSAN hardware configuration
The customer was using 5 nodes in a VSAN cluster: 2x compute nodes (no local storage) and 3x storage nodes, each with 6x magnetic disks and 2x SSDs, split into two disk groups per node.
Two 10 Gb NICs were used for VSAN traffic. A Dell PERC H310 controller was used, which has a queue depth of only 25. Western Digital WD2000FYYZ HDDs were used: 2 TB, 7200 rpm SATA drives. The SSDs are Intel DC S3700 200 GB.

The Dell PERC H310 is interesting, as Duncan Epping’s post here states:

Generally speaking it is recommended to use a disk controller with a queue depth > 256 when used for VSAN or “host local caching” solutions

VMware VSAN Hardware Guidance also states:

The most important performance factor regarding storage controllers in a Virtual SAN solution is the supported queue depth. VMware recommends storage controllers with a queue depth of greater than 256 for optimal Virtual SAN performance. For optimal performance of storage controllers in RAID 0 mode, disable the write cache, disable read-ahead, and enable direct I/Os.

Dell states about the Dell PERC H310:

 Our entry-level controller card provides moderate performance.
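To put the queue depth figures mentioned in this post side by side, here is a trivial check (plain Python, numbers taken from this post and the Reddit thread, so verify them against vendor documentation) against VMware’s ‘greater than 256’ guidance for Virtual SAN controllers:

```python
RECOMMENDED_QUEUE_DEPTH = 256  # VMware guidance: greater than 256 for Virtual SAN

# Queue depths as mentioned in this post (illustrative only)
controllers = {
    "Dell PERC H310 (stock Dell firmware)": 25,
    "Dell PERC H310 (updated LSI-based firmware)": 600,
}

for name, depth in controllers.items():
    verdict = "OK" if depth > RECOMMENDED_QUEUE_DEPTH else "below VSAN guidance"
    print("%-45s queue depth %4d -> %s" % (name, depth, verdict))
```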

Before we dive into the possible cause of this issue, let’s first provide some basics on VMware VSAN. Both Duncan Epping and Cormac Hogan of VMware wrote some great postings about VSAN. Recommended reads! See the links at the end of this post.

VSAN servers 
There are two ways to install a new VSAN server:

  1. assemble one yourself using components listed in the VSAN Hardware Compatibility Guide
  2. use one of the VSAN Ready Nodes which can be purchased. 16 models are available now from various vendors like Dell and Supermicro.

Dell has 8 different servers listed as VSAN Ready Nodes. One of them is the PowerEdge R720-XD, which is the same server type used by the customer describing his VSAN nightmare. However, the Dell VSAN Ready Node has 1 TB NL-SAS HDDs while the Reddit case used 2 TB SATA drives. So he was likely using servers he assembled himself.

Interestingly, 4 out of the 8 Dell VSAN Ready Node servers use the Dell PERC H310 controller. Again, VMware advises a controller with a queue depth of over 250 while the PERC H310 has 25.

[Image: Dell VSAN Ready Node configurations]

VSAN storage policies
For each virtual machine or virtual disk active in a VSAN cluster, an administrator can set ‘virtual machine storage policies’. One of the available storage policies is named ‘number of failures to tolerate’. When set to 1, virtual machines to which this policy is applied will survive a failure of a single disk controller, host or NIC.

VSAN provides this redundancy by creating one or more replicas of VMDK files and storing these on different storage nodes in a VSAN cluster.

When a replica is lost, VSAN will initiate a rebuild. A rebuild recreates the replicas of the affected VMDKs.
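The ‘number of failures to tolerate’ (FTT) policy translates directly into how many copies and hosts are needed. The commonly cited rule of thumb (replicas = FTT + 1, witnesses = FTT, minimum hosts = 2 x FTT + 1) is general VSAN guidance rather than something stated verbatim in this post, but it is easy to capture in a small helper:

```python
def vsan_requirements(failures_to_tolerate):
    """Rule-of-thumb sizing for the VSAN 'number of failures to tolerate' (FTT) policy."""
    ftt = failures_to_tolerate
    return {
        "replicas": ftt + 1,           # full data copies of each object
        "witnesses": ftt,              # witness components needed for quorum
        "minimum_hosts": 2 * ftt + 1,  # hosts needed to place all components
    }

print(vsan_requirements(1))  # {'replicas': 2, 'witnesses': 1, 'minimum_hosts': 3}
print(vsan_requirements(2))  # {'replicas': 3, 'witnesses': 2, 'minimum_hosts': 5}
```

With FTT=1 this matches the three-host minimum for a VSAN cluster; Cormac Hogan’s post linked at the end of this article covers the host-count math in detail.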

VSAN response to a failure

VSAN’s response to a failure depends on the type of failure.
A failure of an SSD, HDD or the disk controller results in an immediate rebuild. VSAN understands this is a permanent failure which is not caused by, for example, planned maintenance.

A failure of the network or a host results in a rebuild which is initiated after a delay of 60 minutes. This is the default wait. The wait exists because the absence of a host or network could be temporary (maintenance, for example) and it prevents wasting resources. Duncan Epping explains the details in his post How VSAN handles a disk or host failure.

If the failed component returns within 60 minutes, only a data sync will take place. Here only the data changed during the absence will be copied over to the replica(s).

A rebuild however means that a new replica will be created for all VMDK files that are not compliant. This is also referred to as a ‘full data migration’.

To change the delay time, see the VMware KB article Changing the default repair delay time for a host failure in VMware Virtual SAN (2075456).
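The KB changes an ESXi advanced setting on every host in the cluster. As a heavily hedged sketch (the advanced option key VSAN.ClomRepairDelay and the required value type should be verified against KB 2075456; the service instance is obtained as in the earlier snippets), this could be scripted with pyVmomi like so:

```python
from pyVmomi import vim

def set_repair_delay(cluster, minutes=120):
    """Set the VSAN repair delay on all hosts of a cluster (assumed option key: VSAN.ClomRepairDelay)."""
    # Note: verify the option key and value type against KB 2075456 before using this.
    change = vim.option.OptionValue(key="VSAN.ClomRepairDelay", value=minutes)
    for host in cluster.host:
        host.configManager.advancedOption.UpdateOptions(changedValue=[change])
        print("Set repair delay to %d minutes on %s" % (minutes, host.name))
```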

Control and monitor VSAN rebuild progress
At the moment VMware does not provide a way to control and monitor the progress of the rebuild process. In the case described at Reddit, VMware basically advised ‘wait and it will be alright’. There was no way to predict how long the performance of all VMs stored on VSAN would be badly affected by the rebuild. The only way to see the status of a VM is by clicking on the VM in the vSphere Web Client, selecting its storage policies tab, then clicking on each of its virtual disks and checking the list: it will tell you “Active”, “Reconfiguring” or “Absent”.

For monitoring, VSAN Observer provides insight into what is happening.

Also, looking at clomd.log could give an indication of what is going on. This is the log file of the Cluster Level Object Manager (CLOM).

It is also possible to use command line tools for administration, monitoring and troubleshooting. VSAN uses the Ruby vSphere Console (RVC) command line. Florian Grehl wrote a few blogs about managing VSAN using RVC.

The VMware VSAN Quick Troubleshooting and Monitoring Reference Guide has many details as well.

Possible cause
It looks like the VSAN rebuild process, which started exactly 60 minutes after the extra storage was added, initiated the I/O storm. VSAN was correcting a non-compliant storage profile and started to recreate replicas of VMDK objects.

A possible cause for this I/O storm could be that the rebuild of almost all VMDK files in the cluster was executed in parallel. However, according to Dinesh Nambisan, working for the VMware VSAN product team:

 “VSAN does have an inbuilt throttling mechanism for rebuild traffic.”

VSAN seems to use a Quality of Service system for throttling back replication traffic. How this exactly works and whether it is controllable by customers is unclear. I am sure we will soon learn more about this, as it seems key in solving future issues with low-end controllers and HDDs combined with a limited number of storage nodes.

While the root cause has yet to be determined, a combination of configuration choices could have caused this:

1. Only three servers in the VSAN cluster were used for storage. When 1 failed, only two were left. Those two were both active in the rebuild of about 77 virtual machines at the same time.
2. Using SATA 7200 rpm drives as the HDD persistent storage layer. Fine for normal operations when SSD is used for cache, but in a rebuild operation not the most powerful drives, having low queue depths.
3. Using an entry-level Dell PERC H310 disk controller. The queue depth of this controller is only 25 while the advice is to use a controller with a 250+ queue depth.

Some considerations
1. Just to be on the safe side, use controllers with at least a 250+ queue depth.
2. For production workloads use N+2 redundancy.
3. Use NL-SAS drives or better HDDs. These have much higher queue depths (256) compared to SATA HDDs (32).
4. In case of a failure of a VSAN storage node: try to fix the server by swapping memory/components to prevent rebuilds. A sync is always better than a rebuild.

5. It would be helpful if VMware added more control over the rebuild process. When N+2 is used, rebuilds could be scheduled to run only during non-business hours. Also some sort of control over the priority in which replicas are rebuilt first would be nice. Something like this:

in case of N+1: tier 1 VMs rebuild after 60 minutes; tier 2 and 3 rebuild during non-business hours
in case of N+2: all rebuilds only during non-business hours. Tier 1 VMs first, then tier 2, then tier 3, etc.

Some other blogs about this particular case
Jeramiah Dooley Hardware is Boring–The HCL Corollary

Hans De Leenheer VSAN: THE PERFORMANCE IMPACT OF EXTRA NODES VERSUS FAILURE

Some usefull links providing insights into VSAN

Jason Langer : Notes from the Field: VSAN Design–Networking

Duncan Epping and others wrote many postings about VSAN. Here a complete overview.

A selection of those blog posts which are interesting for this case.
Duncan Epping How long will VSAN rebuilding take with large drives?
Duncan Epping 4 is the minimum number of hosts for VSAN if you ask me
Duncan Epping How VSAN handles a disk or host failure
Duncan Epping Disk Controller features and Queue Depth?

Cormac Hogan VSAN Part 25 – How many hosts needed to tolerate failures?

Cormac Hogan Components and objects  and What is a witness disk 

 

Did you know VMware Elastic Sky X (ESX) was once called ‘Scaleable Server’?

VMware has been a vendor of server virtualization software for a long time. This story covers the history of VMware ESX(i) and uncovers the various names used in the history of ESX(i).

VMware was founded in 1998. The first ever product was VMware Workstation released in 1999. Workstation was installed on client hardware.

For server workloads VMware GSX Server was developed. This is an acronym for Ground Storm X. Version 1.0 of GSX Server was available around 2001. It had to be installed on top of Windows Server or Linux, making GSX a Type 2 hypervisor just like Workstation. The product was targeted at small organizations. Remember, we are talking 2001!

GSX 2.0 was released in the summer of 2002. The last version of GSX Server available was 3.2.1, released in December 2005. Thereafter GSX Server was renamed ‘VMware Server’, available as freeware. VMware released version 1.0 of VMware Server on July 12, 2006. General support for VMware Server ended in June 2011.

However VMware realized the potential of server virtualization for enterprises and was working on development of a type 1 hypervisor which could be installed on bare metal.

‘VMware Scaleable Server’ was the first name of the product currently known as ESXi. See this screenshot provided by Chris Romano @virtualirishman on his Twitter feed. This must be from around 1999 or 2000.

After a while the name was changed to ‘VMware Workgroup Server’. This was around the year 2000. Hardly any reference can be found on the internet for these two early names.

In March 2001 ‘VMware ESX 1.0 Server’ was released. ESX is short for Elastic Sky X. A marketing firm hired by VMware to create the product name believed Elastic Sky would be a good name. VMware engineers did not like that and wanted to add an X to make it sound more technical and cool.

VMware employees later started a band named Elastic Sky with John Arrasjid being the most well known member.

ESX and GSX were both available for a couple of years. ESX was targeted at enterprises. It was not until around 2005/2006 that ESX got some traction and organizations started to use the product.

ESX had a Service Console: basically a Linux virtual machine which allowed management of the host and virtual machines. Agents could be installed for backup software or other third-party tools.

Development started on a replacement for ESX. The replacement would not have the Service Console. In September 2004 the development team showed the first fully functional version to VMware internal staff. The internal name was then VMvisor (VMware Hypervisor). This would become ESXi three years later.

At VMworld 2007 VMware introduced VMware ESXi 3.5. Before that the software was called VMware ESX 3 Server ESXi edition, but this was never made available. This screenshot shows the name ‘VMware ESX Server 3i 3.5.0’. ESX and ESXi share a lot of similar code.

ESXi has a much smaller footprint than ESX and can be installed on flash memory. This way it can be almost seen as part of the server. The i in ESXi stands for Integrated.

At the release of vSphere 4.0 (May 2009) the ‘ESX Server’ was renamed to just VMware ESX.

Up to vSphere 4.1 VMware offered two choices for customers: VMware ESX, which has the Linux console, and ESXi, which had and still has the menu to configure the server. Mind that you still have access to a limited command line by pressing Alt-F1.

Since vSphere 5, ESX is no longer available.

Similar to the hypervisor, the management software changed names a couple of times as well. VMware VMcenter is so old that Google cannot find any reference to it. It might also have been used as an internal name only. Here is a screen dump taken from here.

On December 5, 2003, VMware released VirtualCenter. It was used to manage multiple hosts (ESX Server 2.1) and virtual machines.

In May 2009 VMware released ‘vCenter Server 4.0’ as part of vSphere 4.0. From then on, vCenter Server was the new name for VirtualCenter. The last version of VirtualCenter released was 2.5.

Sources used for this blog:

Wikipedia VMware ESX

vladan.fr VMware ESXi was created by a French guy !!!
VM.blog What do ESX and ESXi stand for?
yellow-bricks.com vmware-related-acronyms/

Some more images of old VMware products are here

VMware Virtual SAN Hardware Design Guide released

VMware Virtual SAN (VSAN) uses local server storage and presents it as shared storage. Benefits are lower costs, simplicity, performance and agility. Server hardware which supports Virtual SAN can either be bought (VSAN Ready Nodes) or assembled by the customer. VMware just released a technical whitepaper titled “Virtual SAN Hardware Guidance”. It explains which hardware components should be used when assembling a server yourself. Topics covered in the Virtual SAN Hardware Guidance whitepaper include:

  • Server Form Factors
  • Server Boot Devices
  • Flash Devices
  • Magnetic Hard Disk Drives
  • Storage Controllers
  • Networking

more info in this VMware blog

VMware OpenSSL patches available

Quite a few VMware solutions are affected by the OpenSSL security vulnerabilities.

VMware now has updates available for those solutions.

See links here.

 

 

VMware releases vCenter Converter Standalone 5.5.1. Adds VSAN support

VMware Converter is a software tool used for creating VMware virtual machines from sources like physical servers, other vendors’ virtual machines or disk images. It is a free-to-use tool.
The VMware vCenter Converter Standalone 5.5.1 is an update release that fixes important issues and adds the following new features:

  • Support for vSAN
  • Support for DSA authentication for Linux conversions

Release notes are here

Download here.

 

 

Caution: do not upgrade to VMware vSphere 5.5 Update 1 if using NFS

UPDATE: VMware released a patch for this issue. See my post here for more info.

 

<update> VMware released KB 2076392 ‘Frequent NFS APDs after upgrading ESXi to 5.5 U1 (2076392)’

————————

On Twitter Nick Howell, a NetApp employee, warns VMware vSphere 5.5 customers not to upgrade to ESXi 5.5 Update 1. Because of a vSphere issue, NFS disconnects are likely to happen.

This issue seems to occur with any NFS storage vendor, not just NetApp.

For NetApp customers the issue seems to occur when using Data ONTAP 8.0 7-Mode. NetApp is investigating whether this issue occurs on cDOT as well.

According to Nick Howell, vSphere 5.5 Update 1 has not yet been added to the NetApp Interoperability Matrix Tool, a web-based utility. It enables you to search for information about the configurations for NetApp products that work with third-party products and components that meet the standards and requirements specified by NetApp.

Nothing more is known at the moment which can be shared externally. VMware and NetApp are aware of the issue.

VMware published a Knowledgebase article on this issue. The workaround is to use vSphere 5.5 GA

The error related to vSphere 5.5 Update 1 NFS shown in vCenter Server could be:

Device or filesystem with identifier <xxx> has entered the All Paths Down state.

The error is shown here.

Other bloggers reported as well on this issue.

Michael Webster: Heads Up Alert: vSphere 5.5 U1 NFS Random Disconnection Bug!

Nick Howell (who posted this alert first on Twitter)  NFS Disconnects in VMware vSphere 5.5 U1

Duncan Epping   Alert: vSphere 5.5 U1 and NFS issue!

There are other NetApp NFS connectivity issues on vSphere as described in this KB. This issue is related to MaxQueueDepth. datacenterdan.com has more info on this.

[Image: vSphere 5.5 U1 caution]

[Image: vSphere 5.5 NFS bug error message]

Some issues reported on Twitter:

[Images: tweets reporting the NFS disconnect issue]

 

 

 

Veeam Backup & Replication v8 will have NetApp snapshot support

Veeam announced one of the new features of the upcoming Backup & Replication v8.

Version 8 will support NetApp storage snapshots.

With this feature Veeam extends its support for storage-based snapshots. Veeam already supports HP storage snapshots using Veeam Explorer for Storage Snapshots. It lets you restore VMware VM data directly from native HP StoreServ (3PAR), HP StoreVirtual (LeftHand) and HP StoreVirtual VSA (Virtual Storage Appliance) snapshots.

Combining NetApp snapshots with Veeam B&R is combining the best of two worlds.

NetApp ONTAP, the operating system of the storage, offers several ways to protect data at the storage layer:

  • Snapshot: creates a point-in-time copy of an ONTAP volume for data protection purposes. Creating a storage-based snapshot does not have an impact on the performance of the virtual machines. Creation of snapshots can be fully scheduled and is a first line of defense for data protection. It allows a very low Recovery Point Objective (RPO).
  • SnapMirror: replicates the snapshot to a different NetApp storage device.
  • SnapVault: archives a snapshot.

Advantage of combining Veeam B&R and NetApp snapshots

Veeam Backup & Replication will be able to directly access the ONTAP snapshot data and make a backup of the data included in the snapshot. Backup of virtual machines running on NetApp is then a two-step process.

The advantage of an ONTAP storage snapshot is that it does not have a negative effect on the performance of virtual machines. When VMware-based snapshots are used instead, there can be an impact on performance when snapshots are deleted after the backup has finished. Read about the performance impact of VMware snapshots here.

The reason to use Veeam B&R for backup of the NetApp snapshot is that you want your backup data to be stored on a different physical storage device. Snapshots are not backups! If your NetApp is lost (technical errors, fire) your snapshots (if not mirrored using SnapMirror) are lost as well.

 

Restoring data from NetApp snapshots is a somewhat complicated and time-consuming process. When using Veeam B&R, however, restores are very simple and fast. Veeam is able to restore virtual machines and individual guest files, but also Exchange and SharePoint items, directly from an ONTAP Snapshot, SnapMirror or SnapVault as a source. It is also possible to run a virtual machine from a snapshot using the Veeam Instant VM Recovery technology. There is no need to copy data from the backup source to a production volume. This enables a very low Recovery Time Objective.

You can learn more here http://go.veeam.com/v8-netapp.html or read Luca Dell’Oca’s post http://www.veeam.com/blog/veeam-to-offer-advanced-data-protection-with-netapp.html

There is a webinar on April 17th http://www.veeam.com/videos/veeam-netapp-integration-3999.html

Veeam Backup & Replication v8 is expected to be available in the second half of 2014.

Slides of 112 VMware Partner Exchange breakout sessions available for Partners

VMware just published slide decks of 112 breakout sessions which were presented during the recent VMware Partner Exchange in San Francisco.

These contain a wealth of information with technical, VMware partner network and sales/marketing content.

Examples of popular sessions are:

  • EUC3064 Horizon View Troubleshooting: Looking under the Hood
  • EUC3147 What’s New Horizon View 5.3 – Technical
  • SDDC3444 Virtual SAN – Technical Best Practices
  • SDDC3206 Introduction to VMware NSX
  • HYC3133 VMware Cloud Credits Purchasing Program: How to Increase Value to You and Your Clients on the Journey to Public/Hybrid Cloud
  • HYC3374 BC/DR Options With vCloud Hybrid Service
  • HYC3400 vCloud Hybrid Service 101: The Basics

Also recordings of the two general sessions are available.

Available on VMware Partner Central for VMware partners only.

VMware Virtual SAN (VSAN) is now available for download. Licensed per CPU or user.

VMware VSAN 5.5 is available for download now.

In order to run Virtual SAN, you need to download vSphere 5.5 U1 (or above). There are no additional binaries required. Virtual SAN does require a separate license, and is available for free 60 days evaluation period along with vSphere. Please note if you plan to use Virtual SAN with VMware Horizon View, you’ll need to download a specific Horizon View version 5.3.1 binary that supports Virtual SAN.

At least three vSphere 5.5 ESXi nodes are required to be able to use VSAN. The maximum number of nodes in a VSAN 5.5 cluster is 32.

As you noticed, the first release of Virtual SAN is named  ‘Virtual SAN 5.5’ . It will be available in two editions and one bundle (temporary offer).

Here is the official announcement on the VMware company blog about the GA of Virtual SAN 5.5. It includes pricing details listed below.

  • The Standalone edition is licensed per CPU and costs $2,495. It features a persistent datastore, read/write caching, policy-based management, virtual distributed switch, replication, snapshots and clones. This edition can run any workload, either virtual servers or desktops.
  • VSAN can also be licensed per user for VDI deployments only (VMware or Citrix), concurrent or named. The cost per user is $50. The features are the same as for the per-CPU licensing.
  • There is also a software bundle of Virtual SAN and vSphere Data Protection Advanced (VDPA) which costs $2,875 per CPU. This is a limited offer which expires September 15, 2014. This is a VERY GOOD deal. The list price of VDPA is $1,100, so you save $720 per CPU when purchasing this bundle.

There will be some special offers for VSAN:

  • customers using the beta will receive a discount of 20%.
  • customers using VMware Virtual Storage Appliance (VSA) will get reduced pricing when they upgrade to VSAN.

VSAN can be installed on self-assembled hardware components which are listed on the VSAN HCL. Another option is to buy VSAN Ready Node or VSAN Ready Block hardware configurations. A Ready Node recommended configuration is a single pre-configured server for use with Virtual SAN. A Ready Block recommended configuration is a pre-configured set of servers for use with Virtual SAN.

A list of vendors supplying those is listed here. 

Download vSphere 5.5 Update 1 which includes VSAN here.

A large collection of links to blog posts on VSAN can be found here and here.

VMware released a new edition of the VSAN Design and Sizing Guide (March 2014) which can be downloaded here.

VMware has a free Hands-on Lab (HOL) available which enables you to play and explore with VMware VSAN. There is no need to have hardware, software and licenses. The HOL runs in the cloud.

This is a VSAN license calculator showing costs for vSphere, VSAN and S&S.
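Using the list prices mentioned earlier in this post (and ignoring vSphere and S&S costs), a quick comparison of the two VSAN licensing models is easy to script:

```python
VSAN_PER_CPU = 2495   # Standalone edition, list price per CPU
VSAN_PER_USER = 50    # VDI-only licensing, list price per named/concurrent user

def vsan_license_cost(cpus, vdi_users=None):
    """Return the cheaper VSAN licensing option for a cluster (list prices from this post)."""
    per_cpu = ("per CPU", cpus * VSAN_PER_CPU)
    if vdi_users is None:
        return per_cpu
    per_user = ("per user (VDI only)", vdi_users * VSAN_PER_USER)
    return min(per_cpu, per_user, key=lambda option: option[1])

# Example: a 3-node cluster with 2 CPUs per node running a 100-seat VDI deployment
print(vsan_license_cost(cpus=6, vdi_users=100))  # ('per user (VDI only)', 5000)
```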
