Last week Microsoft published part 1 in an article series about VMQ, detailing how VMQ works and trying to clear up some misconceptions about the technology.
It’s well worth the read, but the main reason I mention it is that a colleague of mine ran into an issue that’s very much related to VMQ.
The customer, a large hosting provider, experienced poor network performance when doing backups and live migrations over their 10 Gbit infrastructure. An important detail: they use a Hyper-V virtual switch to provide vNICs for backup and live migration (LM).
As the Microsoft article states, you won’t get 10 Gbit out of a Hyper-V switch:
Many people have reported that with the creation of a vSwitch they experience a drop in networking traffic from line rate on a 10Gbps card to ~3.5Gbps. This is by design. With RSS you have the benefit of using multiple queues for a single host so you can interrupt multiple processors. The downside of VMQ is that the host and every guest on that system is now limited to a single queue and therefore one CPU to do their network processing in the host. On server-grade systems today, about 3.5Gbps is the amount of traffic a single core can handle.
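If you want to see how this plays out on a given host, Windows Server ships PowerShell cmdlets for inspecting VMQ and RSS state. A minimal diagnostic sketch (output will of course vary per NIC and driver):

```powershell
# List VMQ capability and state per physical adapter: the Enabled
# column shows whether VMQ is active, and NumberOfReceiveQueues
# how many hardware queues the NIC exposes.
Get-NetAdapterVmq

# Show which processor each allocated VMQ queue is mapped to.
# With a vSwitch bound to the adapter, the host's own traffic is
# serviced by a single queue, i.e. one core -- hence the ~3.5 Gbps
# ceiling described above.
Get-NetAdapterVmqQueue

# RSS status for comparison: RSS can spread host traffic across
# multiple cores, but only on adapters not owned by a vSwitch.
Get-NetAdapterRss
```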
Their bandwidth was somewhat lower, around 3 Gbps, but that’s most likely due to having older hardware.
I’m not sure how they’re going to resolve this, but my suggestion was to use separate physical NICs for backup and, if needed, for LM.
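If they do go the dedicated-NIC route, live migration traffic can be kept off the vSwitch-backed vNICs by constraining which networks Hyper-V uses for migration. A sketch of what that could look like, assuming a dedicated LM NIC sitting on a 10.0.1.0/24 subnet (the subnet is a placeholder, not the customer’s actual configuration):

```powershell
# Allow this host to send and receive live migrations.
Enable-VMMigration

# Don't let live migration pick any available network;
# restrict it to the dedicated NIC's subnet instead.
Set-VMHost -UseAnyNetworkForMigration $false
Add-VMMigrationNetwork "10.0.1.0/24"
```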
As I don’t have enough information about how they’ve designed their Hyper-V environment, I’m not sure whether they’ve scaled up or out. If they scale out, bandwidth for LM should be less of an issue since fewer VMs live on each host, but at some point you’re still going to need the bandwidth (unless you plan on patching your hosts continuously).
The takeaway from this is that when designing high-end environments, it pays to know the nuts and bolts of the technology you’re using.