Hyper-V 2012 R2 virtual machines randomly lose network connections. Be careful with Emulex NICs! New driver expected in July
June 16, 2014
<update June 24>
Hyper-V Program Manager Ben Armstrong wrote a blog post about this issue titled “Hyper-V Network Connectivity Issues with Emulex Adapters”.
He gives two reasons for blogging about it:
- “I have been contacted by a number of customers who have hit this, and want people to know about what to do.”
- “It is really good to see a hardware vendor documenting status and workarounds for known issues, and I am glad to see this post going up.”
<update June 20>
Hans Vredevoort of Hyper-V.nu wrote a blog post about the conference call he had with Emulex. Emulex was unaware of this issue for a long time. It seems HP did not inform Emulex about the issue. Once Emulex became aware, they could not reproduce the issue at first, and then found other issues as well.
His blog post: Additional Background on the VMQ Issue with Emulex and HP
<update June 19>
Emulex posted a blog post on the Emulex website explaining the issue described in my post. The workaround is to disable VMQ. We have known about that one for a couple of months.
The good news is that a new driver and firmware are expected to be released in mid-July.
See the blog here.
<update June 17>
I have been in touch with the CEO of Emulex about this issue. He stated: “My team is very aware of this and while you may not have been provided the update you deserve, the issue has not been ignored. I know the team has been very engaged with HP and MSFT on this.”
Let’s hope there is some progress on resolving the issue.
Virtual machines running on Windows Server 2012 R2 Hyper-V can randomly lose their network connection. The only way to restore network connectivity for an affected VM is to perform a Live Migration of that VM to another host or to reboot the Hyper-V host.
To automate this, some IT pros wrote scripts that ping all VMs and perform a Live Migration when no response is received.
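A minimal sketch of what such a watchdog script could look like. This is not one of the scripts mentioned above, just an illustration under assumptions: it runs on the Hyper-V host, the guests report their IP address through the integration services, the VMs are not clustered, and the destination host name HV-HOST-02 is a made-up placeholder.

```powershell
# Sketch only: run elevated on a Hyper-V host. Names below are placeholders.
$destinationHost = 'HV-HOST-02'   # hypothetical target host for the Live Migration

foreach ($vm in Get-VM | Where-Object State -eq 'Running') {
    # IPAddresses is only populated when integration services run in the guest
    $ip = (Get-VMNetworkAdapter -VM $vm).IPAddresses |
        Where-Object { $_ -match '^\d{1,3}(\.\d{1,3}){3}$' } |   # keep IPv4 only
        Select-Object -First 1
    if (-not $ip) { continue }   # nothing to ping, skip this VM

    # -Quiet returns $true or $false instead of ping result objects
    if (-not (Test-Connection -ComputerName $ip -Count 3 -Quiet)) {
        Write-Warning "$($vm.Name) ($ip) does not respond, starting Live Migration"
        # Assumes the VM storage is reachable from the destination host.
        # For clustered VMs, Move-ClusterVirtualMachineRole would be the equivalent.
        Move-VM -Name $vm.Name -DestinationHost $destinationHost
    }
}
```

A VM that blocks ICMP in its guest firewall would be migrated for no reason, so a real script would need a better health check than a plain ping.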
In many cases the issue is seen on Emulex NICs in HP Gen8 blades on which Windows Server 2012 R2 with Hyper-V is installed.
The problem seems to be related to the number of VMQs available in the network interface. If the number of network adapters/virtual NICs in VMs exceeds the number of VMQs available, some virtual machines will lose network connectivity.
The Emulex network adapters have 16 VMQ slots in total. The first 4 slots are taken up by the host OS. The first of the 4 is supposed to be “special” (I’ll get back to that in a bit). The other 3 are regular adapters. The next 12 are regular VM adapters. Each guest VM is assigned one VMQ slot out of 16.
4 + 12 = 16; all VMQ slots are assigned.
When the 13th VM tries to get a VMQ slot, it fails to receive one.
What is supposed to happen is that the hypervisor starts sharing its “first” slot (the special one) with any additional VMs that can’t get VMQ slots (or any that have VMQ disabled).
What actually happens on the Emulex or Broadcom adapters is that the guest OS simply fails to allocate a VMQ slot and fails to get any network connectivity at all. It cannot talk to the host OS (even if it’s on the same VLAN and not communicating through the physical ports).
Basically, the Emulex and Broadcom adapters give you exactly the VMQ slots available, and the “fail-over” mechanism of falling back to vRSS-like queues for the other VMs simply fails to work, so any VM that wasn’t issued a direct VMQ fails to communicate.
The Intel drivers correctly share the first VMQ slot with any additional VMs. It ends up with higher-than-normal CPU usage on the first core, but that’s no different than how Windows Server 2008 R2 (or 2012 R2 with vRSS networking) works anyway.
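To get a rough idea of how close a host is to the 4 + 12 = 16 situation described above, the number of queues the physical adapters advertise can be compared with the number of virtual adapters that ask for one. The sketch below is only an approximation: it counts all virtual adapters on the host together, while queues are handed out per physical adapter or team.

```powershell
# Queues that each VMQ-enabled physical adapter advertises
Get-NetAdapterVmq | Where-Object Enabled |
    Format-Table Name, InterfaceDescription, NumberOfReceiveQueues

# Virtual adapters on this host (management OS and VMs) that request a queue;
# a VmqWeight of 0 means VMQ is switched off for that virtual adapter
$vNics = @(Get-VMNetworkAdapter -All | Where-Object VmqWeight -gt 0)
"Virtual adapters requesting VMQ: $($vNics.Count)"

# Queues that are allocated right now, and to which VM
Get-NetAdapterVmqQueue | Format-Table Name, QueueID, MacAddress, VmFriendlyName
```

If the count of virtual adapters is higher than the number of receive queues, the host is in the range where, according to the explanation above, the 13th VM would fail to get a slot.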
I understand the Emulex adapters currently support up to 30 VMQs on Windows Server 2012.
The workaround that works for many people is to disable VMQ on all NICs of the host using this command:
Get-NetAdapter | Disable-NetAdapterVmq
This blog post by Ben Gelens describes the same issue. Ben solved it by disabling the Virtual Machine Queue (VMQ) on just the management NIC.
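Disabling VMQ on a single adapter instead of on all of them could look like the sketch below. The adapter name Management is a placeholder, not the name used in Ben's post; the real names on a host are listed by Get-NetAdapter.

```powershell
# 'Management' is a placeholder adapter name; list real names with Get-NetAdapter
Disable-NetAdapterVmq -Name 'Management'

# Verify: Enabled should now show False for that adapter
Get-NetAdapterVmq | Format-Table Name, InterfaceDescription, Enabled

# Once a fixed driver and firmware are installed, VMQ can be turned back on:
# Enable-NetAdapterVmq -Name 'Management'
```

Changing the VMQ setting may briefly restart the adapter, so it is safer to do this in a maintenance window.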
It seems to occur mostly when Emulex network interface cards are used. These are used, for example, in HP servers (the HP 554M, HP 554FLB and 554FLC adapters use the Emulex chipset) and in IBM servers. Broadcom NetXtreme 57xx NICs are mentioned as well. The 10 GbE cards are especially suspect.
Emulex driver versions 10.0.430.570, 10.0.430.1003 and 10.0.430.1047 all seem to suffer from this issue. Some information on Emulex adapters in a Hyper-V environment using RSS and VMQ.
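A quick way to see whether a host runs one of the driver versions listed above is to read the driver information from Get-NetAdapter. This is a sketch: the ListedAsAffected column is a made-up name, and the version list is just the three versions from this post, not a complete list from Emulex.

```powershell
# Driver versions named in this post as affected
$suspect = '10.0.430.570', '10.0.430.1003', '10.0.430.1047'

# Show every physical adapter with its driver and flag the listed versions
Get-NetAdapter -Physical |
    Select-Object Name, InterfaceDescription, DriverProvider, DriverVersion,
        @{ Name = 'ListedAsAffected'; Expression = { $suspect -contains $_.DriverVersion } }
```

HP-branded adapters may not carry the word Emulex in their description, which is why the sketch does not filter on the adapter name.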
NICs from Broadcom and Intel are also reported to have this issue, but likely less frequently.
Virtual machines that handle a lot of network traffic seem to be more affected by this issue than virtual machines that do not.
The problem is experienced by many people.
At the moment there is no solution other than waiting for Emulex to release a new driver.
Other network issues are reported on various blogs as well. Some servers get a BSOD, but this could possibly be resolved by applying a Microsoft hotfix. Some other advice that I found on various sites, which might or might not help:
- Disable encapsulated packet task offload with the Disable-NetAdapterEncapsulatedPacketTaskOffload cmdlet
- Disable Large Send Offload v2
- Run Set-NetOffloadGlobalSetting -TaskOffload Disabled
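Written out as commands, the three suggestions above could look like this. The adapter name Ethernet 1 is a placeholder, and using Disable-NetAdapterLso for the Large Send Offload v2 item is my assumption about what the original advice meant.

```powershell
$nic = 'Ethernet 1'   # placeholder adapter name; list real names with Get-NetAdapter

# Encapsulated packet task offload
Disable-NetAdapterEncapsulatedPacketTaskOffload -Name $nic

# Large Send Offload, for IPv4 and IPv6
Disable-NetAdapterLso -Name $nic -IPv4 -IPv6

# Task offload for the whole TCP/IP stack of the host
Set-NetOffloadGlobalSetting -TaskOffload Disabled

# Review the result
Get-NetAdapterLso -Name $nic
Get-NetOffloadGlobalSetting
```

Each of these moves work from the NIC to the CPU, so it makes sense to change one setting at a time and check whether it makes a difference.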