VSPHERE 5.1 WEB CLIENT: VM OVERRIDES - STORAGE DRS AUTOMATION LEVEL OVERVIEW
Overall, the vSphere 5.1 web client attempts to mimic the menu and settings workflows of the (old) vSphere client. When editing the settings of a datastore cluster, the web client offers the same set of editable options as the vSphere client. However, certain overviews and menus have changed in the vSphere 5.1 web client, for example the VM overrides screen. The primary purpose of the VM overrides screen is to display the virtual machines inside the datastore cluster with a deviating Storage DRS automation level.

VM overrides and Virtual Machine Settings screens
The VM overrides screen is located in the storage view: select the datastore cluster, select the Manage tab and click the Settings button. The VM overrides screen replaces the Virtual Machine Settings screen found in the datastore cluster settings of the vSphere client.

Difference in default behavior
As you might have noticed, the web client does not list any virtual machine, while the Virtual Machine Settings overview in the vSphere client shows 5 virtual machines and a VM template. As mentioned in the introduction, the primary purpose of the VM overrides screen has changed compared to the Virtual Machine Settings overview in the vSphere client. The VM overrides screen only displays a virtual machine if it is set to a non-default automation level. To demonstrate the different behavior, I have changed the automation level of VM3, VM4 and VM5. The datastore cluster is configured with the Manual automation mode, therefore the default automation level is Default (Manual). The previous screenshot shows that all virtual machines were configured with the Default (Manual) automation level; VM3 is changed to Fully Automated, VM4 to Manual and VM5 to Disabled. If you want to reproduce this behavior in your own environment, change the automation level in the vSphere client and then go to the VM overrides screen in the web client to see the modified virtual machines listed. The VM overrides screen displays the following: even though VM4 is configured with the same automation level as the datastore cluster, the VM overrides screen displays VM4 because it is not configured with the Default automation level. By changing the automation level back to Default (Manual) via the Edit screen, VM4 is removed from the VM overrides list. To be honest, it took me a while to get used to the new functionality of this screen. I would like to know whether you like this new behavior, or whether you prefer the way the Virtual Machine Settings view in the old vSphere client works.
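For those who prefer to audit these overrides from a script rather than the UI, the same information is exposed through the vSphere API. Below is a minimal pyvmomi sketch, not an official procedure: the vCenter hostname and credentials are placeholders, and it simply walks every datastore cluster and prints the per-VM Storage DRS entries, which only exist for virtual machines with non-default settings.

```python
# Minimal sketch (not an official procedure): list per-VM Storage DRS automation
# overrides for every datastore cluster. Hostname and credentials are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # lab only: skip certificate validation
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(content.rootFolder, [vim.StoragePod], True)
for pod in view.view:
    print("Datastore cluster:", pod.name)
    # vmConfig only holds entries for VMs with per-VM Storage DRS settings,
    # which mirrors what the VM overrides screen shows
    for vm_cfg in pod.podStorageDrsEntry.storageDrsConfig.vmConfig:
        if vm_cfg.vm:
            print("  %-20s behavior=%s enabled=%s" %
                  (vm_cfg.vm.name, vm_cfg.behavior, vm_cfg.enabled))
view.Destroy()
Disconnect(si)
```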
MANUAL STORAGE VMOTION MIGRATIONS INTO A DATASTORE CLUSTER
Frequently I receive questions about the impact of a manual migration into a datastore cluster, especially about the impact on the VM disk file layout. Will Storage DRS take the initial disk layout into account, or will it be changed? The short answer is that the virtual machine disk layout will be changed by the default affinity rule configured on the datastore cluster. This article describes several scenarios of migrating "distributed" and "centralized" disk layout configurations into datastore clusters configured with different affinity rules.

Test scenario architecture
For the test scenarios I've built two virtual machines, VM1-centralized and VM2-distributed. Both virtual machines have an identical configuration, only the datastore location is different. VM1-centralized has a "centralized" configuration, storing all VMDKs on a single datastore, while VM2-distributed has a "distributed" configuration, storing its VMDKs on separate datastores.
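As a side note, the affinity behavior that drives this layout change can be inspected through the API. The sketch below, assuming a pyvmomi connection ('si') established as in the earlier example and a hypothetical datastore cluster name, prints the cluster-wide default VMDK affinity setting and any per-VM overrides; True means "keep VMDKs together", i.e. a centralized layout.

```python
# Minimal sketch: show the default VMDK affinity rule of a datastore cluster and any
# per-VM overrides. 'si' is an existing pyvmomi connection; the pod name is a placeholder.
from pyVmomi import vim

def show_vmdk_affinity(si, pod_name="DatastoreCluster-A"):
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.StoragePod], True)
    pod = next(p for p in view.view if p.name == pod_name)
    view.Destroy()

    pod_cfg = pod.podStorageDrsEntry.storageDrsConfig.podConfig
    # True = keep VMDKs together (centralized), False = VMDKs may be spread out (distributed)
    print("Cluster default intra-VM affinity:", pod_cfg.defaultIntraVmAffinity)

    for vm_cfg in pod.podStorageDrsEntry.storageDrsConfig.vmConfig:
        if vm_cfg.intraVmAffinity is not None and vm_cfg.vm:
            print("Override for %s: intraVmAffinity=%s" % (vm_cfg.vm.name, vm_cfg.intraVmAffinity))
```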
STORAGE DRS AND STORAGE VMOTION BUGS SOLVED IN VSPHERE 5.0 UPDATE 2
Today Update 2 for vSphere ESXi 5.0 and vCenter Server 5.0 was released. I would like to highlight two bugs that have been fixed in this update, one for Storage DRS and one for Storage vMotion.

Storage DRS
vSphere ESXi 5.0 Update 2 contains a fix that should be interesting to customers running Storage DRS on vSphere 5.0. The release notes describe the following bug:
MULTI-NIC VMOTION – FAILOVER ORDER CONFIGURATION
After posting the article "Designing your vMotion network" I quickly received the question which failover order configuration is better: is it better to configure the redundant NIC(s) as standby or as unused? The short answer: always use standby, never unused! Tomas Fojta posted a comment arguing that it does not make sense to place the NICs in standby mode: "In the scenario as depicted in the diagram I prefer to use active/unused. If you think about it the standby option does not give you anything as when one of the NICs fails both vmknics will be on the same NIC which does not give you anything."
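To make the standby approach concrete: with two vMotion port groups on the same vSwitch, the failover orders simply mirror each other. The pyvmomi sketch below is only an illustration of that idea, not a drop-in script: the port group, vSwitch and vmnic names are placeholders, and in practice you would retrieve the existing port group specification first and modify only its teaming policy.

```python
# Minimal sketch: mirrored active/standby failover orders for two vMotion port groups
# on a standard vSwitch. All names are placeholders; 'host' is a vim.HostSystem object.
from pyVmomi import vim

def vmotion_pg_spec(pg_name, vswitch, vlan, active_nic, standby_nic):
    teaming = vim.host.NetworkPolicy.NicTeamingPolicy()
    teaming.policy = "failover_explicit"                         # use the explicit failover order
    teaming.nicOrder = vim.host.NetworkPolicy.NicOrderPolicy(
        activeNic=[active_nic], standbyNic=[standby_nic])        # standby, not unused
    spec = vim.host.PortGroup.Specification()
    spec.name = pg_name
    spec.vswitchName = vswitch
    spec.vlanId = vlan
    spec.policy = vim.host.NetworkPolicy(nicTeaming=teaming)
    return spec

def configure_vmotion_portgroups(host):
    net_sys = host.configManager.networkSystem
    # vMotion-A: vmnic0 active / vmnic1 standby; vMotion-B: the reverse
    net_sys.UpdatePortGroup("vMotion-A",
                            vmotion_pg_spec("vMotion-A", "vSwitch0", 0, "vmnic0", "vmnic1"))
    net_sys.UpdatePortGroup("vMotion-B",
                            vmotion_pg_spec("vMotion-B", "vSwitch0", 0, "vmnic1", "vmnic0"))
```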
THIN OR THICK DISKS? – IT’S ABOUT MANAGEMENT NOT PERFORMANCE
This is my contribution to the debate "Zero or Thick disks – debunking the performance myth". Over the last couple of years, all sorts of VMware engineers have worked very hard to reduce the performance difference between thin disks and thick disks, and many white papers have been written by performance engineers to explain the improvements made to thin disks. Today, therefore, the question whether to use thin-provisioned disks or eager zeroed thick disks is not about the difference in performance but about the difference in management.

When using thin-provisioned VMDKs you need a very clearly defined process. What do you do when the datastore storing the thin-provisioned disks is getting full? You need to define a consolidation ratio, you need to understand which operational processes might be dangerous to your environment (think Patch Tuesday) and you need to define a space utilization threshold at which thin-provisioned disks are migrated to other datastores. Today Storage DRS can help you with many of the aforementioned challenges; for more information please read the article: Avoiding VMDK level over-commitment while using Thin-provisioned disks and Storage DRS. If Storage DRS is not used, thin-provisioned disks require seamless collaboration between the virtualization teams (provisioning and architecture) and the storage administrators. When this is not possible due to organizational or cultural differences, thin provisioning is more of a risk than a blessing.

Zero-out process: Eager zeroed thick disks, on the other hand, might provide a marginal performance increase in some (corner) cases, but the costs involved can outweigh the perceived benefits. First of all, eager zeroed thick disks need to be zeroed out during creation; when your array doesn't support VAAI, this can hurt performance and extend provisioning time. With terabyte-sized disks becoming more common, this impacts provisioning time immensely.

Waste of space: Most virtualized environments contain virtual machines configured with oversized OS disks and over-specced data disks, resulting in wasted space full of zeros. Thin-provisioned disks only occupy the space used for storing data, not zeros.

Migration: Storage vMotion goes out of its way to migrate every little bit of a virtual disk, which means it needs to copy over every zeroed-out block. Combined with oversized disks, you are creating unnecessary overhead on your hosts and storage subsystem by copying and verifying the integrity of zeroed-out blocks. Migrating thin disks only requires migrating the "user data", resulting in faster migration times and less overhead on hosts and the storage subsystem.

In essence, thin-provisioned versus eager zeroed thick is all about resource/time savings versus risk avoidance. Choose wisely.
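If you want to quantify what thin provisioning saves in your own environment, the provisioned versus committed numbers are available per VM through the API. A minimal pyvmomi sketch, assuming an existing connection 'si' and not part of the original article, could look like this:

```python
# Minimal sketch: per-VM committed versus provisioned space, which is where
# thin provisioning pays off. 'si' is an existing pyvmomi connection.
from pyVmomi import vim

def report_vm_space(si):
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        s = vm.summary.storage
        if s is None:
            continue
        committed_gb = s.committed / (1024 ** 3)
        provisioned_gb = (s.committed + s.uncommitted) / (1024 ** 3)
        print("%-30s committed %7.1f GB of %7.1f GB provisioned" %
              (vm.name, committed_gb, provisioned_gb))
    view.Destroy()
```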
VMOTION FUTURES SURVEY
If you have a few spare minutes, please fill out the survey about vMotion futures. We are very interested in how you use vMotion and especially your opinion about use cases for Long-distance vMotion operations. It takes about 10 minutes to get through. http://tinyurl.com/VMwareVMotion2012
DESIGNING YOUR VMOTION NETWORK
A well-designed vMotion network benefits the environment in many ways. Before vSphere 5, designing a vMotion network was relatively easy: select the fastest NIC and assign it to a vMotion vmknic. vSphere 4.x supports both 1GbE and 10GbE networks, and since vSphere 5.0 vMotion is able to leverage multiple NICs. Multi-NIC vMotion in vSphere 5.x makes it a bit more challenging: the combination of NICs, the failover order and the load balancing policy all need to be taken into consideration when configuring your vMotion network.

The benefit of a multi-NIC vMotion network
In vSphere 5.x, vMotion balances vMotion operations across all available NICs, both for a single vMotion operation and for multiple concurrent vMotion operations. By using multiple NICs, the duration of a vMotion operation is reduced. This benefits:
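Regardless of the exact list of benefits, a multi-NIC setup only helps if every host actually has multiple vmkernel interfaces enabled for vMotion. The pyvmomi sketch below, assuming an existing connection 'si', is a quick, unofficial way to verify that.

```python
# Minimal sketch: list the vmkernel interfaces enabled for vMotion on each host,
# to verify a multi-NIC vMotion configuration. 'si' is an existing pyvmomi connection.
from pyVmomi import vim

def list_vmotion_vmknics(si):
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        cfg = host.configManager.virtualNicManager.QueryNetConfig("vmotion")
        selected = set(cfg.selectedVnic or [])
        vmknics = [v.device for v in (cfg.candidateVnic or []) if v.key in selected]
        print("%s: vMotion enabled on %s" % (host.name, ", ".join(vmknics) or "none"))
    view.Destroy()
```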
STORAGE VMOTION AND THE VSPHERE WEB-CLIENT
The new web client of vSphere 5.1 is my weapon of choice when working in my lab. It contains a lot of "hidden" gems; the UI team has spent a lot of time crafting and aligning the user interface to administrator needs. One thing that drove me nuts was the lack of information when running a Storage vMotion operation: the Recent Tasks pane doesn't show anything other than the Storage vMotion operation itself and the target, and when using Storage DRS it only shows the name of the target datastore cluster. Sometimes you just want to know which datastore the virtual machine migrated to.

The new recent tasks window
For example, take a look at the Recent Tasks window in the right-hand corner of the web client. When running a Storage vMotion operation, it displays the Storage vMotion task similar to the vSphere client. However, when clicking on the task itself it shows the event info and task in one view, helping you identify the source and destination datastore of the Storage vMotion process. Compare that to the workflow of the old vSphere client: select the task in the Recent Tasks bar and double-click it. This brings you to the Tasks and Events view of the target datastore cluster. By default the view displays events, therefore you need to select Tasks, then select the task itself and then click on the related events in the bottom view. It may look trivial, but these things do speed up your work. Eradicating unnecessary clicks and wait time before a screen refreshes sure makes your job a little easier.
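If even the improved workflow is too many clicks, scripting the migration sidesteps the question entirely, because you pick the destination datastore yourself. The sketch below is a hedged pyvmomi example (VM and datastore names are placeholders, 'si' is an existing connection), not something taken from the article.

```python
# Minimal sketch: Storage vMotion a VM to an explicit target datastore, so the source
# and destination are known up front. Names are placeholders.
from pyVim.task import WaitForTask
from pyVmomi import vim

def storage_vmotion(si, vm_name, target_ds_name):
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine, vim.Datastore], True)
    vm = next(o for o in view.view if isinstance(o, vim.VirtualMachine) and o.name == vm_name)
    ds = next(o for o in view.view if isinstance(o, vim.Datastore) and o.name == target_ds_name)
    view.Destroy()

    source = [d.name for d in vm.datastore]          # datastores backing the VM before the move
    task = vm.RelocateVM_Task(spec=vim.vm.RelocateSpec(datastore=ds))
    WaitForTask(task)
    print("Moved %s from %s to %s" % (vm_name, ", ".join(source), ds.name))
```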
SIOC ON DATASTORES BACKED BY A SINGLE DATAPOOL
Duncan posted an article today in which he raises the question: should I use many small LUNs or a couple of large LUNs for Storage DRS? In the article he explains the differences between Storage I/O Control (SIOC) and Storage DRS and why they work well together. To re-emphasize: the goal of Storage DRS load balancing is to fix long-term I/O imbalances, while SIOC addresses short-term bursts and loads. SIOC is all about managing the queues, while Storage DRS is all about intelligent placement and avoiding bottlenecks. Julian Wood made an interesting remark, one that both Duncan and I hear regularly when discussing SIOC. Don't get me wrong, I'm not picking on Julian; I'm merely noting that he made a frequently used argument.
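As a practical footnote, you can check in a few lines whether SIOC is actually enabled on the datastores grouped in a datastore cluster and what congestion threshold is configured. The pyvmomi sketch below assumes an existing connection 'si' and a hypothetical datastore cluster name; it is illustrative only.

```python
# Minimal sketch: report the SIOC (Storage I/O Control) state of every datastore in a
# datastore cluster. 'si' is an existing pyvmomi connection; the pod name is a placeholder.
from pyVmomi import vim

def report_sioc(si, pod_name="DatastoreCluster-A"):
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder, [vim.StoragePod], True)
    pod = next(p for p in view.view if p.name == pod_name)
    view.Destroy()
    for ds in pod.childEntity:                       # the datastores grouped in the cluster
        iorm = ds.iormConfiguration
        if iorm is None:
            print("%-25s SIOC information not available" % ds.name)
        else:
            print("%-25s SIOC enabled=%s congestionThreshold=%s ms" %
                  (ds.name, iorm.enabled, iorm.congestionThreshold))
```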
OVERLAPPING DRS VM-HOST AFFINITY RULE IN A VSPHERE STRETCHED CLUSTER
A question on the VMware community forum triggered me to validate the DRS behavior of applying VM-Host group rules. The scenario describes a stretched HA cluster with overlapping DRS group rules, allowing particular VMs to run on hosts in a single site and on a subset of hosts of both sites. How does DRS handle overlapping groups?

The architecture
In this scenario the stretched cluster contains four hosts: ESXi-01 and ESXi-02 are located in Site A, ESXi-03 and ESXi-04 are located in Site B. A collection of virtual machines runs on the hosts in the cluster; two management virtual machines, vCenter and vCenterDB, are part of this group. The storage housing the virtual machines is available to all hosts. Storage architecture is not the focus of this article; for more information about storage configurations and stretched vSphere clusters, please read the white paper: VMware vSphere Metro Storage Cluster Case Study.

Site DRS VM-Host groups
ESXi-01 and ESXi-02 are grouped in the Host DRS group Host-Site-A, ESXi-03 and ESXi-04 are grouped in the Host DRS group Host-Site-B. All virtual machines running in Site A are grouped in the VM DRS group VM-Site-A, all virtual machines running in Site B are grouped in the VM DRS group VM-Site-B. All (site) rules are configured as preferential rules (Should run on).

Management DRS VM-Host group
In the scenario described on the community forum, the virtual machines vCenter and vCenterDB are placed in an additional VM DRS group: MGMT-VMs. This group should run on a select set of hosts from both sites; to simulate similar behavior, the Host DRS group configured in my environment contains ESXi-02 of Site A and ESXi-03 of Site B. The group is named MGMT-Hosts.

Overlapping rule-set
Because the management virtual machines are also part of the VM-Site-A VM group, an overlap of compatible hosts exists. Please note that DRS allows virtual machines to be a member of multiple VM groups. When reviewing the active affinity rules in a DRS cluster, DRS extracts a subset of compatible hosts for each virtual machine and uses this subset for placement and load balancing decisions. A Venn diagram shows the compatible hosts for the VMs vCenter and vCenterDB, and specifically the host(s) listed in the compatibility set. This means that under normal conditions DRS will choose to run the virtual machines on ESXi-02, as it satisfies both rules. By normal conditions I mean that there is no excessive load on any of the hosts and all hosts are configured identically without any hardware failures. Back to the scenario: what happens if there is a host failure or ESXi-02 is placed into maintenance mode?

Maintenance mode
Now what happens if ESXi-02 is placed into maintenance mode? As previously mentioned, DRS determines the set of compatible hosts. As ESXi-02 is placed into maintenance mode, DRS marks this host as a source host for migration. This way DRS knows which virtual machines to select for migration and excludes ESXi-02 as a valid destination for migrations. DRS must select other hosts listed in the Host DRS groups. As ESXi-02 is not a valid destination, DRS needs to select either ESXi-01 or ESXi-03. Which host will it select? This depends on the creation date of the DRS rules; in other words, the newer DRS VM-Host rule is applied first. This is important to understand when applying overlapping VM-Host affinity rules in your environment: the last rule you create is applied first, overruling all other existing rules. In my lab I created the MGMT rule last, and it therefore has the youngest timestamp.
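For readers who want to rebuild this lab setup without clicking through the UI, a preferential VM-Host rule plus its groups can be created in one cluster reconfiguration. The pyvmomi sketch below is a simplified illustration: the object lookups for the cluster, hosts and VMs are assumed to be done elsewhere, and the names mirror the MGMT rule described above.

```python
# Minimal sketch: create a VM group, a host group and a preferential ("should run on")
# VM-Host rule in one cluster reconfiguration. Object lookups are assumed to exist already.
from pyVim.task import WaitForTask
from pyVmomi import vim

def add_should_rule(cluster, rule_name, vm_group_name, host_group_name, vms, hosts):
    spec = vim.cluster.ConfigSpecEx()
    spec.groupSpec = [
        vim.cluster.GroupSpec(operation="add",
                              info=vim.cluster.VmGroup(name=vm_group_name, vm=vms)),
        vim.cluster.GroupSpec(operation="add",
                              info=vim.cluster.HostGroup(name=host_group_name, host=hosts)),
    ]
    rule = vim.cluster.VmHostRuleInfo(name=rule_name, enabled=True,
                                      mandatory=False,              # preferential: "should run on"
                                      vmGroupName=vm_group_name,
                                      affineHostGroupName=host_group_name)
    spec.rulesSpec = [vim.cluster.RuleSpec(operation="add", info=rule)]
    WaitForTask(cluster.ReconfigureComputeResource_Task(spec=spec, modify=True))

# Hypothetical usage:
# add_should_rule(cluster, "MGMT-to-MGMT-Hosts", "MGMT-VMs", "MGMT-Hosts",
#                 [vcenter_vm, vcenterdb_vm], [esxi02, esxi03])
```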
There is no option to show the timestamp in the user interface of either the web client or the vSphere client. I have provided this feedback to the engineering team; hopefully they can include it in a future release. I placed ESXi-02 into maintenance mode and DRS migrated the virtual machines to ESXi-03, regulated by the MGMT VM-Host affinity rule. After placing ESXi-03 into maintenance mode as well, the virtual machines were moved to ESXi-01, as there were no compatible hosts available to satisfy the MGMT affinity rule. As ESXi-01 is listed in the Host-Site-A group of the Site-A affinity rule, DRS had no choice other than moving them to ESXi-01. After resetting the lab and destroying all the rules, I created the same set of host groups, VM groups and affinity rules, but this time I created the Site-A affinity rule last. This resulted in DRS moving the virtual machines to host ESXi-01 after placing ESXi-02 into maintenance mode, as DRS respected the Site-A affinity rule.

Alarms
As DRS supports non-contradicting overlapping affinity rules, no alarm was generated. During the scenario where both ESXi-02 and ESXi-03 were placed into maintenance mode, no alarm was triggered either. I expected to see an alarm indicating that an affinity rule was violated, however after digging through some code and contacting engineering, this appears to be behavior by design. In the current release, the alarm is only triggered when mandatory rules (Must run on) are violated.

HA behavior
All rules are configured as preferential rule sets (Should run on) and HA is not aware of DRS constructs. When a mandatory rule set (Must run on) is created, the hosts listed in the rule set are registered in the compatibility list of the virtual machine itself. Only the hosts registered in the compatibility list are viable destinations. During startup, HA checks the compatibility list and attempts to start the virtual machine on any of the hosts listed. As the virtual machines are part of a preferential affinity rule, all hosts are listed in the compatibility list and therefore HA could place them on a host outside the DRS host group.

Conclusion
If you want certain virtual machines to gravitate to specific hosts or a specific site, please take into account the way DRS sequences the active affinity rules.