The dark side of server virtualisation
Here's how to deal with network management complications
By Robin Layland | Network World US | Published: 10:19, 18 July 2010
Server virtualisation is a growing reality in data centres. The economics are firmly behind the trend. Server virtualisation reduces the total cost of ownership by reducing the number of physical servers, requiring less cooling and less power while increasing flexibility. This is all good for the business and the server group, but what effect does it have on the management of the network? The truth is that it complicates network management.
There are two big network problems associated with server virtualisation. The first is configuring virtual LANs. Network managers need to make sure the VLAN used by the virtual machine (VM) is configured on the switch port that connects the physical server running the VM.
One solution is for the server virtualisation group to tell the network management team every possible server the VM can be started on, so that the switch ports can be preconfigured. This is not a perfect solution because it can cause the VLAN to be defined on a very large percentage of the switch ports. It can get even more complicated because the server group may not be aware of all the servers that images can be started on, especially during a recovery situation when they are taking emergency measures.
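To see how quickly preconfiguration spreads a VLAN across switch ports, here is a minimal Python sketch. The VM names, host names, port labels and the function `preconfigure_ports` are all hypothetical and exist only to illustrate the bookkeeping; nothing here talks to a real switch.

```python
from collections import defaultdict

def preconfigure_ports(vm_vlan, vm_candidate_hosts, host_port):
    """Return {switch_port: set of VLAN IDs} if every candidate host is preconfigured."""
    port_vlans = defaultdict(set)
    for vm, hosts in vm_candidate_hosts.items():
        for host in hosts:
            # Every port a VM *might* appear behind must carry that VM's VLAN.
            port_vlans[host_port[host]].add(vm_vlan[vm])
    return dict(port_vlans)

# Hypothetical inventory supplied by the server virtualisation group.
vm_vlan = {"vm1": 10, "vm2": 20}
vm_candidate_hosts = {
    "vm1": ["hostA", "hostB", "hostC"],
    "vm2": ["hostB", "hostC"],
}
host_port = {"hostA": "sw1/1", "hostB": "sw1/2", "hostC": "sw2/1"}

plan = preconfigure_ports(vm_vlan, vm_candidate_hosts, host_port)

# Fraction of server-facing ports that end up carrying VLAN 10.
vlan10_share = sum(10 in vlans for vlans in plan.values()) / len(plan)
```

In this toy inventory VLAN 10 lands on every server-facing port, which is the sprawl described above. A host missing from `vm_candidate_hosts` would simply not be configured, which mirrors the recovery scenario where the server group does not know every possible host.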
The second problem is assigning QoS and enforcing network policies, such as access control lists (ACLs). Traditionally this is done in the network switch connected to the server running the application. With server virtualisation there's a software switch running under the hypervisor in the physical server - not the traditional physical network switch that connects to the physical server.
It is still important that policy be enforced in the software switch. For example, suppose two VMs running on the same server are not allowed to communicate with each other. Without enforcement in the soft switch, someone who gained control of VM1 could open connections to VM2 and steal its data. If ACLs are applied by the soft switch in the server then this would be blocked.
Before virtualisation, this was prevented because the applications in VM1 and VM2 would run in different servers and the ACLs defined in the network switch would prevent the communication. Having policies applied in the software switch maintains the security. The issue is how to get the software to apply the policies.
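The VM1/VM2 example can be sketched in a few lines of Python. This is a toy model under stated assumptions, not any vendor's soft switch: the class `SoftSwitch` and its methods `deny` and `forward` are hypothetical names chosen for illustration.

```python
class SoftSwitch:
    """Toy model of a software switch that applies ACLs between local VMs."""

    def __init__(self):
        # Each entry is an unordered pair of VMs that must not communicate.
        self.denied_pairs = set()

    def deny(self, vm_a, vm_b):
        """Install an ACL blocking traffic in both directions between two VMs."""
        self.denied_pairs.add(frozenset((vm_a, vm_b)))

    def forward(self, src, dst):
        """Return True if a frame from src to dst would be forwarded."""
        return frozenset((src, dst)) not in self.denied_pairs

# Without a policy, the soft switch happily connects VM1 to VM2.
unprotected = SoftSwitch()
allowed_without_acl = unprotected.forward("VM1", "VM2")

# With the ACL applied in the soft switch, the same attempt is dropped.
protected = SoftSwitch()
protected.deny("VM1", "VM2")
allowed_with_acl = protected.forward("VM1", "VM2")
```

The point of the sketch is where the check happens: the decision is made inside the server, so it does not depend on the traffic ever reaching the physical network switch.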
Overcoming these two challenges is critical to making server virtualisation work smoothly. It would have been nice if the vendor community had created a uniform standard that works with all the different virtualisation vendors. As is normally the case with rapidly growing new technology, this did not happen. The industry has implemented four ways to address these problems.
Virtualisation vendors' solution
The market-leading virtualisation vendor is EMC VMware but many other virtualisation products exist, including Citrix's Xen, Microsoft's Hyper-V, KVM and offerings from many other smaller vendors. The most widely available products are for VMware, which is why I use it here as the example.
VMware's vCenter controls the virtualisation process and directs where the VMs are started. The hypervisor controls the server and the VMs running on the physical server. vSwitch is a software Layer 2 switch provided by VMware. Each VM has a virtual NIC (vNIC). The vNIC uses a MAC address from either the virtualisation vendor's pool of MAC addresses or one created and assigned by the enterprise.
Step 1 is for the server group to define all network characteristics and policies for the VMs. In Step 2, the operator tells vCenter to start VM2. This process includes multiple messages between vCenter and the hypervisor on the server, one of which pushes the network policy information to the hypervisor. In Step 3, the hypervisor configures the vSwitch with the correct VLAN, QoS and policy information. When the application on VM2 starts to send packets, the policy is applied in the vSwitch.
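The three steps can be modelled roughly as follows. This is a simplified Python sketch, not VMware's API: `NetworkPolicy`, `VCenterModel`, `HypervisorModel` and `VSwitchModel` are hypothetical stand-ins, and the real exchange between vCenter and the hypervisor involves many more messages than the single call shown here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NetworkPolicy:
    """Hypothetical bundle of the network settings defined for one VM."""
    vlan: int
    qos_class: str
    denied_peers: frozenset = frozenset()

class VSwitchModel:
    def __init__(self):
        self.port_policy = {}

    def configure(self, vm, policy):
        self.port_policy[vm] = policy

    def permits(self, src, dst):
        # Traffic from a VM with no configured policy is not forwarded.
        policy = self.port_policy.get(src)
        return policy is not None and dst not in policy.denied_peers

class HypervisorModel:
    def __init__(self):
        self.vswitch = VSwitchModel()
        self.running = set()

    def start_vm(self, vm, policy):
        # Step 3: the hypervisor configures the vSwitch before the VM sends traffic.
        self.vswitch.configure(vm, policy)
        self.running.add(vm)

class VCenterModel:
    def __init__(self):
        self.policies = {}

    def define_policy(self, vm, policy):
        # Step 1: the server group records the network policy for the VM.
        self.policies[vm] = policy

    def start_vm(self, vm, hypervisor):
        # Step 2: starting the VM pushes its policy to the chosen hypervisor.
        hypervisor.start_vm(vm, self.policies[vm])

vcenter = VCenterModel()
host = HypervisorModel()
vcenter.define_policy(
    "VM2", NetworkPolicy(vlan=20, qos_class="gold", denied_peers=frozenset({"VM1"}))
)
vcenter.start_vm("VM2", host)
```

Note what the model leaves out: nothing in it touches the physical switch port, so the VLAN still has to be configured there by some other means.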
This solves the problem of applying policies at the first switch but it does not solve the VLAN configuration problem in the network switch. The virtualisation group needs to tell network management to configure the VLAN on the switch port before the VM starts sending traffic, which requires either quick coordination or a preconfigured switch. The coordination can get more complicated when the virtualisation group moves the VM on the fly. Then the virtualisation group needs to coordinate with the networking group as it moves the VM, and the network group needs to clean up the configuration on the old switch after a successful move.
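The bookkeeping behind a live move can be sketched like this. It is a minimal Python illustration with hypothetical names (`migrate`, `vm_port`, `port_vlans`), assuming the network group tracks which VLANs are configured on which switch port; it is not a real switch configuration interface.

```python
def migrate(vm, new_port, vm_vlan, vm_port, port_vlans):
    """Move a VM to the server behind new_port and keep switch VLANs consistent."""
    old_port = vm_port[vm]
    vlan = vm_vlan[vm]

    # Configure the VLAN on the new port before the VM sends traffic there.
    port_vlans.setdefault(new_port, set()).add(vlan)
    vm_port[vm] = new_port

    # Clean up the old port, but only if no VM left behind it still uses the VLAN.
    still_needed = any(
        port == old_port and vm_vlan[other] == vlan
        for other, port in vm_port.items()
    )
    if not still_needed:
        port_vlans.get(old_port, set()).discard(vlan)

# Hypothetical starting state: three VMs behind the same switch port.
vm_vlan = {"vm1": 10, "vm2": 10, "vm3": 20}
vm_port = {"vm1": "sw1/1", "vm2": "sw1/1", "vm3": "sw1/1"}
port_vlans = {"sw1/1": {10, 20}}

migrate("vm3", "sw2/1", vm_vlan, vm_port, port_vlans)  # VLAN 20 leaves sw1/1
migrate("vm1", "sw2/1", vm_vlan, vm_port, port_vlans)  # VLAN 10 stays: vm2 remains
```

The cleanup check is the part that is easy to get wrong by hand: removing VLAN 10 from the old port after moving vm1 would cut off vm2, which still lives behind it.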
One of the biggest concerns with this approach is the amount of ongoing coordination required between the virtualisation and network groups. The virtualisation group must configure parameters in vCenter that are controlled by the networking group, such as VLAN numbers, QoS and ACLs. Any change in VLANs or policies must be immediately reflected in the virtual server configuration, which introduces another possible failure point.
Another concern is the networking group's lack of visibility into what is going on within a networking component, the vSwitch. The vSwitch is under vCenter's control, not that of traditional network management software. Additionally, the network management team has little visibility into the VM. Several networking vendors have addressed this visibility problem by having vCenter notify the networking team of changes, or by polling for changes, and then displaying this information along with the traditional network data, which greatly helps with problem determination.
Some ways to address the problem
Two vendors address the VLAN problem: Blade Networks has an application that runs on its switch, and Force10 addresses it in the next release of its OS. The switches poll vCenter looking for any changes, or alternatively listen for vCenter to send out a message announcing a change. If the switch finds any changes, it will automatically perform the configuration. The virtualisation operator doesn't have to coordinate the change with network operations, allowing startup of the VM to go smoothly. The polling interval does need to be shorter than the time it takes to start a VM, to make sure the switch sees the change fast enough. In Force10's first release, the only parameter it monitors is the VLAN. Blade Networks goes further by also applying the full range of policies at the network switch based on the vNIC or the VM's UUID. This solution still requires that policies be implemented in the vSwitch.
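The polling approach boils down to comparing two snapshots and configuring whatever changed. The Python sketch below assumes the switch can obtain a snapshot of the form {VM: (port, VLAN)} from the virtualisation manager; that snapshot format and the functions `diff_and_apply` and `poll_interval_is_safe` are hypothetical, and neither vendor's actual implementation is shown here.

```python
def diff_and_apply(previous, current, port_vlans):
    """Configure VLANs for every VM that is new or changed since the last poll."""
    changes = []
    for vm, (port, vlan) in current.items():
        if previous.get(vm) != (port, vlan):
            # The switch configures itself; no operator coordination is needed.
            port_vlans.setdefault(port, set()).add(vlan)
            changes.append((vm, port, vlan))
    return changes

def poll_interval_is_safe(poll_seconds, vm_start_seconds):
    """The poll must fire before a starting VM is ready to send traffic."""
    return poll_seconds < vm_start_seconds

# Two consecutive hypothetical snapshots: vm2 appears between polls.
previous = {"vm1": ("sw1/1", 10)}
current = {"vm1": ("sw1/1", 10), "vm2": ("sw1/2", 20)}
port_vlans = {"sw1/1": {10}}

applied = diff_and_apply(previous, current, port_vlans)
safe = poll_interval_is_safe(poll_seconds=15, vm_start_seconds=60)
unsafe = poll_interval_is_safe(poll_seconds=120, vm_start_seconds=60)
```

The 15, 60 and 120 second figures are made up for the example. The listening alternative described above avoids the interval question entirely, since the switch reacts to an announcement instead of waiting for its next poll.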


