MY PUBLIC VMWORLD SCHEDULE

This year will be an action-packed VMworld for me: I am presenting sessions, participating in two panel sessions, hosting a group discussion and am available in two “Meet the Experts” sessions.

Presenting the following sessions:
• INF-STO1545 - Architecting Storage DRS Datastore Clusters
• INF-VSP1683 - VMware vSphere Cluster Resource Pools Best Practices

Panel sessions:
• (TAM Day) - ASK THE EXPERTS
• INF-VSP1504 - Ask the Expert vBloggers

Hosting the GD22 - Resource management (DRS/SDRS) group discussion. I invited Anne Holler (Lead engineer DRS) to host this session together with me.

During Meet the Experts session 13 and session 17 I’m available for short meetings to answer your resource management (DRS/SDRS) questions.

Here is the week schedule of the sessions, events and activities that I will be taking part in. Be sure to sign up if you have not already:

Sunday (TAM Day):
• 14:35 – 15:35 : ASK THE EXPERTS

Monday:
• 14:30 – 15:30 : INF-VSP1504 - Ask the Expert vBloggers
• 16:00 – 17:00 : GD22 – Resource Management

Tuesday:
• 12:30 – 13:30 : INF-STO1545 - Architecting Storage DRS Datastore Clusters (Repeat session)
• 15:00 – 16:00 : INF-VSP1683 - vSphere Cluster Resource Pools Best Practices

Wednesday:
• 08:30 – 09:30 : INF-STO1545 - Architecting Storage DRS Datastore Clusters
• 12:30 – 13:30 : Expert 13

Thursday:
• 12:00 – 12:00 : Expert 17

DRS AND MEMORY BALANCING IN NON-OVERCOMMITTED CLUSTERS

First things first: I normally do not recommend changing advanced settings. Always try to tune system behavior by changing the settings provided by the user interface, or try to understand system behavior and how it aligns with your design.

The “problem”
DRS load balancing recommendations can be sub-optimal when no memory overcommitment is preferred. Some customers prefer not to use memory overcommitment; their clusters contain (just) enough memory capacity to ensure all running virtual machines have their memory backed by physical memory. Nowadays it is not uncommon to see virtual machines with a fairly high amount of allocated (consumed) memory and, due to the use of large pages on hosts with recent CPU architectures, little to no memory is shared. A common scenario with this design is a usual host memory load of 80-85% consumed. In this situation, DRS recommendations may have a detrimental effect on performance, as DRS does not consider consumed memory but active memory.

DRS behavior
When analyzing the requirements of a virtual machine during load balancing operations, DRS calculates the memory demand of the virtual machine. The main memory metric used by DRS to determine the memory demand is active memory. Active memory represents the working set of the virtual machine, which signifies the number of active pages in RAM. By using the working-set estimation, the memory scheduler determines which of the allocated memory pages are actively used by the virtual machine and which allocated pages are idle. To accommodate a sudden rapid increase of the working set, 25% of idle consumed memory is included in the demand. Memory demand also includes the virtual machine’s memory overhead.

Let’s use an 8 GB (8192 MB) virtual machine as an example of how DRS calculates the memory demand. The guest OS running in this virtual machine has touched 50% of its memory size since it was booted, but only 20% of its memory size is active. This means that the virtual machine has consumed 4096 MB and 1638.4 MB is active.

As mentioned, DRS includes a percentage of the idle consumed memory to accommodate a sudden increase in memory use. To calculate the idle consumed memory, the active memory (1638.4 MB) is subtracted from the consumed memory (4096 MB), resulting in a total of 2457.6 MB. By default DRS includes 25% of the idle consumed memory, i.e. 614.4 MB. The virtual machine has a memory overhead of 90 MB. The memory demand DRS uses in its load balancing calculation is therefore: 1638.4 MB + 614.4 MB + 90 MB = 2342.8 MB. This means that DRS will select a host that has 2342.8 MB available for this machine, provided the move to this host improves the load balance of the cluster.

DRS and the cornerstone of virtualization: resource overcommitment
Resource sharing and overcommitment of resources are primary elements of virtualization. When designing a virtual infrastructure it is a challenge to build the environment in such a way that it can handle virtual machine workloads while improving server utilization. Because not every workload is equal, applying resource allocation settings such as shares, reservations and limits can make a distinction in priority. DRS is designed with this cornerstone in mind, and that sometimes makes DRS a hard act to follow. DRS is all about solving imbalance and providing enough resources to the virtual machines, aligned to their demand. This means that DRS balances workload on demand and trusts in its core value that overcommitment is allowed. It then relies on the host-local scheduler to figure out the priority of the virtual machines. This behavior is sometimes not in line with the perception of DRS. A common perception is that DRS is about optimizing performance. This is partially true. As mentioned before, DRS looks at the demand of the VM and will try to mix and match the activity of the virtual machines with the available resources in the cluster.
As it relies on resource allocation settings, it assumes that priority is defined for each virtual machine and that the host-local schedulers can reclaim memory safely. For this reason the DRS memory imbalance metric is tuned to focus on VM active memory, to allow efficient sharing of host memory resources. This makes it possible to run with less cluster memory than the sum of all running virtual machine memory sizes, and to reclaim idle consumed memory from lower-priority virtual machines for the active workloads of other virtual machines.

Unfortunately DRS does not know when the environment is designed to avoid overcommitment. Based on its input, it can place a virtual machine on a host with virtual machines that have lots of idle consumed memory lying around, instigating memory reclamation. In most cases this reclamation is hardly noticeable due to the use of the balloon driver. However, when all hosts are highly utilized, ballooning might not be as responsive as required, forcing the kernel to compress memory and swap. This means that migrations for the sole purpose of balancing active memory are not useful in environments like these. If the target host memory is highly consumed, such a migration can cause a performance impact on the migrating virtual machine, as it waits to obtain memory, and on the other virtual machines on the target host, as they do processing to allow reclamation of their idle memory.

The solution?
You might want to change the 25% idle consumed memory setting. The solution I recommend starting with is to lower the migration threshold by moving the slider to the left. This allows the DRS cluster to have a higher imbalance and makes DRS more conservative when recommending migrations. If this is not satisfactory, then I would suggest changing the DRS advanced option called IdleTax. Please note that this DRS advanced option is not the same setting as the kernel memory setting Mem.IdleTax.
The DRS IdleTax advanced option (default 75) controls how much consumed idle memory should be added to active memory when estimating memory demand. The percentage of idle consumed memory that is added is calculated as 100 - IdleTax. The default calculation is 100 - 75 = 25. This means that the smaller the value of IdleTax, the more consumed idle memory is added to the active memory by DRS for load balancing. Be aware that the value of IdleTax is a heuristic, tuned to facilitate memory overcommitment; tuning it to a lower value is appropriate for environments not using overcommitment. Note that the option is set per cluster and would need to be changed for all DRS clusters as appropriate. Again, try a lower migration threshold setting first and monitor whether it provides satisfying results before setting this advanced option.
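To make the demand calculation and the effect of IdleTax more concrete, here is a minimal Python sketch. It only mirrors the arithmetic described in this article (active memory, plus 100 - IdleTax percent of idle consumed memory, plus overhead). The function name and parameters are hypothetical and for illustration only; this is not how DRS is implemented internally.

```python
def drs_memory_demand_mb(consumed_mb, active_mb, overhead_mb, idle_tax=75):
    """Estimate memory demand the way this article describes it.

    Hypothetical helper: the name and signature are illustrative,
    not a VMware API. idle_tax mirrors the DRS IdleTax option (default 75).
    """
    if not 0 <= idle_tax <= 100:
        raise ValueError("idle_tax must be between 0 and 100")
    # Idle consumed memory: consumed pages that are not part of the working set.
    idle_consumed_mb = max(consumed_mb - active_mb, 0.0)
    # Fraction of idle consumed memory that is added to the demand (100 - IdleTax).
    idle_fraction = (100 - idle_tax) / 100.0
    return active_mb + idle_fraction * idle_consumed_mb + overhead_mb


# Example from the article: 8 GB VM, 50% consumed, 20% active, 90 MB overhead.
vm_size_mb = 8 * 1024
consumed_mb = 0.50 * vm_size_mb
active_mb = 0.20 * vm_size_mb

default_demand = drs_memory_demand_mb(consumed_mb, active_mb, overhead_mb=90)
# IdleTax = 0 counts all idle consumed memory, i.e. demand equals consumed + overhead.
no_overcommit_demand = drs_memory_demand_mb(
    consumed_mb, active_mb, overhead_mb=90, idle_tax=0
)

print(f"Demand with IdleTax=75: {default_demand:.1f} MB")
print(f"Demand with IdleTax=0:  {no_overcommit_demand:.1f} MB")
```

Lowering IdleTax moves the demand estimate from "mostly active memory" towards "all consumed memory", which is why a lower value fits clusters that are designed to avoid overcommitment.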

STORAGE DRS ENABLES SIOC ON DATASTORES ONLY IF I/O LOAD BALANCING IS ENABLED

Lately, I’ve received some comments asking why I don’t include SIOC in my articles when talking about space load balancing. Well, Storage DRS only enables SIOC on each datastore inside the datastore cluster if I/O load balancing is enabled. When you don’t enable I/O load balancing during the initial setup of the datastore cluster, SIOC is left disabled. Keep in mind that when I/O load balancing has been enabled on the datastore cluster and you later disable the I/O load balancing feature, SIOC remains enabled on all datastores within the cluster.

CONSIDERATIONS WHEN MODIFYING THE INDIVIDUAL VM AUTOMATION LEVEL

Recently I received some questions about the behavior of DRS when the automation level of an individual virtual machine is modified. DRS allows customization of the automation levels for individual virtual machines to override the DRS cluster automation level. The most common reason for modifying the automation level is to prevent DRS from moving a virtual machine automatically. Selecting an automation level other than the default cluster automation level or fully automated impacts (daily) operational procedures. It might impact cluster balance and/or resource availability if the operational procedures are not adjusted to align with the “new” behavior of DRS when dealing with non-default automation levels. Before continuing with the impact and caveats of a non-default automation level, let’s zoom in on their behavior.

Level of automation
There are five automation level modes:
• Fully Automated
• Partially Automated
• Manual
• Default
• Disabled

Each automation level behaves differently:

TO WHICH HOST-LEVEL LATENCY STATISTIC IS THE SIOC CONGESTION THRESHOLD RELATED?

Today someone asked which host-level latency statistic the SIOC congestion threshold is related to. Is it the Device Average (DAVG), Kernel Average (KAVG) or Guest Average (GAVG)? Well, actually it’s none of the above. DAVG, KAVG and GAVG are metrics in a host-local centralized scheduler that has complete control over all the requests to the storage system. SIOC’s main purpose is to manage shared storage resources across ESXi hosts, providing allocation of I/O resources independent of the placement of the virtual machines accessing the shared datastore. And because it needs to regulate and prioritize access to shared storage that spans multiple ESXi hosts, the congestion threshold is not measured against a single host-side latency metric. But to which metric is it compared? In essence the congestion threshold is compared with the weighted average of DAVG per host, where the weight is the number of IOPS on that host. Let’s expand on this a bit further.

Average I/O latency
To have an indication of the load of the datastore on the array, SIOC uses the average I/O latency detected by each host connected to that datastore. The average latency across hosts is used to cope with the variety of workloads and the characteristics of the active workloads, such as reads versus writes, I/O size and degree of sequential I/O, in addition to array behavior such as block location, caching policies and I/O scheduling. To calculate and normalize the average latency across hosts, each host writes its average device latency and number of I/Os for that datastore to a file called IORMSTATS.SF, stored on the same datastore.

A common misconception about SIOC is that it’s compute cluster based. The process of determining the datastore-wide average latency reveals the key denominator: the hosts connected to the datastore. All hosts connected to the datastore write to the IORMSTATS.SF file, regardless of cluster membership. Other than for enabling SIOC, vCenter is not necessary for normal operations. Each connected host reads the IORMSTATS.SF file every 4 seconds and locally computes the datastore-wide average to use for managing the I/O stream. Therefore cluster membership is irrelevant.

Datastore-wide normalized I/O latency
Back to the process of computing the datastore-wide normalized I/O latency. The average device latencies of each host are normalized by SIOC based on the I/O request size. As mentioned before, not all storage-related workloads are the same. Workloads issuing I/Os with a large request size result in longer device latencies due to the way storage arrays process these workloads. For example, when using a larger I/O request size such as 256KB, the transfer might be broken up by the storage subsystem into multiple 64KB blocks. This operation can lead to a decline of transfer rate and throughput levels, increasing latency. Normalizing on I/O size allows SIOC to differentiate high device latency caused by large requests from actual I/O congestion at the device itself.

Number of I/O requests completed per second
At this point SIOC has normalized the average latency across hosts based on I/O size. The next step is to determine the aggregate number of IOPS accessing the datastore. As each host reports the number of I/O requests completed per second, this metric is used to compare and prioritize the workloads.

I hope this mini deep dive into the congestion threshold explains why it could never be solely related to a single host-side metric. Because the datastore-wide average latency is a normalized value, the latency observed for the datastore on an individual host may differ from the latency SIOC reports per datastore.
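The IOPS-weighted average described above can be sketched in a few lines of Python. This is only an illustration of the weighting idea: the HostStats structure, the sample values and the 30 ms threshold are assumptions made for this example, and the I/O-size normalization step is deliberately left out because the article does not specify how it is computed.

```python
from dataclasses import dataclass


@dataclass
class HostStats:
    """Per-host values for one datastore (hypothetical structure for illustration)."""
    host: str
    avg_device_latency_ms: float
    iops: float


def datastore_wide_latency_ms(stats):
    """IOPS-weighted average of the per-host device latency."""
    total_iops = sum(s.iops for s in stats)
    if total_iops == 0:
        return 0.0  # no I/O reported, so there is nothing to average
    weighted_sum = sum(s.avg_device_latency_ms * s.iops for s in stats)
    return weighted_sum / total_iops


# Sample values, assumed for this example only.
stats = [
    HostStats("esx01", avg_device_latency_ms=10.0, iops=1000),
    HostStats("esx02", avg_device_latency_ms=40.0, iops=250),
    HostStats("esx03", avg_device_latency_ms=5.0, iops=0),
]

congestion_threshold_ms = 30.0  # assumed threshold value for this example
latency = datastore_wide_latency_ms(stats)
congested = latency > congestion_threshold_ms

print(f"Datastore-wide latency: {latency:.1f} ms, congested: {congested}")
```

Note how esx02 observes 40 ms locally, above the example threshold, while the datastore-wide value stays below it because most of the I/O comes from esx01. This is the reason the per-host latency and the per-datastore latency can differ.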

REMOVING THE HORIZONTAL BAR IN THE FOOTER OF A WORD DOC

Now for something completely different: a tip on how to extend your life by about 5 years, or how to remove the horizontal bar in the footer of a Word document. Unfortunately I have to deal with the mark-up of Word documents quite frequently and am therefore exposed to the somewhat unique abilities of the headers and footers feature of MS Word. During the edit process of the upcoming book, Word voluntarily added a horizontal bar to my footer. Example depicted below. However, Word doesn’t allow you to highlight and select a horizontal bar, and it therefore cannot be easily removed by pressing the delete button. This means you have to explore the fantastic menu of Word.

To remove the bar:
1. Open the footer section by clicking in that area in the document.
2. Go to the menu option Format.
3. Select Borders and Shading.
4. The Borders and Shading menu shows the line that miraculously appeared in my footer. Selecting the option None at the right side of the window removes the horizontal bar from the footer.
5. Click OK.

I hope this short tip helps you to keep the frustration to a minimum.

DISABLING MINGOODNESS AND COSTBENEFIT

Over the last couple of months I’ve seen recommendations popping up on changing the MinGoodness and CostBenefit settings to zero on a DRS cluster (KB1017291). Usually this happens after a maintenance window in which hosts were placed in maintenance mode: the hosts remain unevenly loaded and DRS won’t migrate virtual machines to the less loaded hosts. When these adaptive algorithms are disabled, DRS considers every move and the virtual machines will be distributed aggressively across the hosts. Although this sounds very appealing, the MinGoodness and CostBenefit calculations were created for a reason. Let’s explore the DRS algorithm and see why this setting should only be used temporarily and not as a permanent setting.

DRS load balance objectives
The primary objective of DRS is to provide virtual machines with their required resources. If the virtual machine is getting the resources it requests (its dynamic entitlement), then there is no need to find a better spot. If the virtual machines do not get the resources specified in their dynamic entitlement, then DRS will consider moving the virtual machine, depending on additional factors. This means that DRS allows certain situations where the administrator feels the cluster is unbalanced, such as an uneven virtual machine count on hosts inside the cluster. I’ve seen situations where one host was running 80% of the load while the other hosts were running a couple of virtual machines. This particular cluster was comprised of big hosts, each containing 1TB of memory, while the entire virtual machine memory footprint was no more than 800GB. One host could easily run all virtual machines and provide the resources the virtual machines were requesting. This particular scenario describes the biggest misunderstanding of DRS: DRS is not primarily designed to equally distribute virtual machines across hosts in the cluster. It distributes the load as efficiently as possible across the resources to provide the best performance for the virtual machines.
And this is the key to understanding why DRS does or does not generate migration recommendations. Efficiency! Moving virtual machines around costs CPU cycles, memory resources and, to a smaller extent, datastore operations (stun/unstun of virtual machines). In the most extreme case possible, load balancing itself can be a danger to the performance of virtual machines, by withholding resources from the virtual machines and using them to move virtual machines. This is a worst-case scenario, but the main point is that the load balancing process costs resources that could also be used by virtual machines providing their services, which is the primary reason the virtual infrastructure was created. To manage and contain the resource consumption of load balancing operations, the MinGoodness and CostBenefit calculations were created.

CostBenefit
DRS calculates the cost, benefit (and risk) of a move.
Cost: how many resources does it take to move a virtual machine by vMotion? A virtual machine that is constantly updating its large memory footprint costs more CPU cycles and network traffic than a virtual machine with a medium memory footprint that is idling for a while.
Benefit: how many resources will it free up on the source host, and what will the impact be on the normalized entitlement of the destination host? The normalized entitlement is the sum of the dynamic entitlements of all the virtual machines running on that host, divided by the capacity of the host.
Risk: a prediction of how the workload might change on both the source and destination host, and whether the outcome of the move of the candidate virtual machine is still positive when the workload changes.

MinGoodness
To understand which host the virtual machine must move to, DRS uses the normalized entitlement of the host as the key metric and will only consider hosts that have a lower normalized entitlement than the source host. MinGoodness helps DRS understand what effect the move has on the overall cluster imbalance.
DRS awards every move a CostBenefit and a MinGoodness rating, and these are linked together. DRS will only recommend a move with a negative CostBenefit rating if the move has a highly positive MinGoodness rating. Due to the metrics used, CostBenefit ratings are usually more conservative than MinGoodness ratings, overruling the decision to move a virtual machine to a host with a lower normalized entitlement because of the cost involved or the risk of that particular move.

When MinGoodness and CostBenefit are set to zero, DRS calculates the cluster imbalance and recommends any move* that improves the balance of the normalized entitlement of each host within the cluster, without considering the resource cost involved. In oversized environments, where resource supply is abundant, setting these options temporarily should not create a problem. In environments where resource demand rivals resource supply, setting these options can create resource starvation.

*The number of recommendations is limited by the MaxMovesPerHost calculation. This article contains more information about MaxMovesPerHost.

Recommendation
My recommendation is to use this advanced option sparingly: only when host load is extremely unbalanced and DRS does not provide any migration recommendation, typically after the hosts in the cluster were placed in maintenance mode. Permanently activating this advanced option is similar to lobotomizing the DRS load balancing algorithm. It can do more harm in the long run, as you might see virtual machines in an almost-constant state of vMotion.
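As a small illustration of the normalized entitlement metric and the "only consider hosts with a lower normalized entitlement than the source host" rule described above, here is a Python sketch. The function names, host names and MHz values are made up for this example. The real DRS algorithm additionally weighs CostBenefit and MinGoodness ratings, which are not modeled here.

```python
def normalized_entitlement(vm_entitlements, host_capacity):
    """Sum of the dynamic entitlements of the VMs on a host, divided by host capacity."""
    if host_capacity <= 0:
        raise ValueError("host_capacity must be positive")
    return sum(vm_entitlements) / host_capacity


def candidate_destinations(source, hosts):
    """Hosts with a lower normalized entitlement than the source host.

    hosts maps a host name to (list of VM entitlements, host capacity);
    this layout is a hypothetical structure chosen for the example.
    """
    source_ne = normalized_entitlement(*hosts[source])
    return sorted(
        name
        for name, (entitlements, capacity) in hosts.items()
        if name != source and normalized_entitlement(entitlements, capacity) < source_ne
    )


# Made-up CPU entitlements in MHz for three hosts of equal capacity.
hosts = {
    "esx01": ([4000, 3000, 2000], 10000),  # normalized entitlement 0.90
    "esx02": ([2000, 1000], 10000),        # normalized entitlement 0.30
    "esx03": ([5000, 4500], 10000),        # normalized entitlement 0.95
}

for name, (entitlements, capacity) in hosts.items():
    print(name, normalized_entitlement(entitlements, capacity))

print("Candidates for a VM on esx01:", candidate_destinations("esx01", hosts))
```

In this example esx03 runs fewer virtual machines than esx01 but has a higher normalized entitlement, so it is not a candidate destination. The virtual machine count alone says little about the balance DRS cares about.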

LIMITING THE NUMBER OF STORAGE VMOTIONS

When enabling datastore maintenance mode, Storage DRS will move virtual machines out of the datastore as fast as it can. The number of virtual machines that can be migrated in or out of a datastore at the same time is 8. This is related to the concurrent migration limits of hosts, network and datastores. To manage and limit the number of concurrent migrations, either by vMotion or Storage vMotion, a cost and limit factor is applied. Although the term limit is used, a better description of limit is maximum cost. In order for a migration operation to be able to start, the cost cannot exceed the max cost (limit).

vMotion and Storage vMotion are considered operations. The ESXi host, network and datastore are considered resources. A resource has both a max cost and an in-use cost. When an operation is started, the in-use cost plus the new operation cost cannot exceed the max cost. The operation cost of a Storage vMotion on a host is “4”; the max cost of a host is “8”. If one Storage vMotion operation is running, the in-use cost of the host resource is “4”, allowing one more Storage vMotion process to start without exceeding the host limit.

As a Storage vMotion operation also hits the storage resource cost, the max cost and in-use cost of the datastore need to be factored in as well. The operation cost of a Storage vMotion for datastores is set to 16; the max cost of a datastore is 128. This means that 8 concurrent Storage vMotion operations can be executed on a datastore. These operations can be started on multiple hosts, but no more than 2 Storage vMotions can run from the same host, due to the max cost of a Storage vMotion operation at the host level.

[Figure: Storage vMotion in progress]

How to throttle the number of Storage vMotion operations?
To throttle the number of Storage vMotion operations and reduce the I/O hit on a datastore during maintenance mode, it is preferable to reduce the max cost for provisioning operations to the datastore. Adjusting host costs is strongly discouraged. Host costs are defined as they are due to host resource limitations; adjusting host costs can impact other host functionality, unrelated to vMotion or Storage vMotion processes. Adjusting the max cost per datastore can be done by editing vpxd.cfg or via the advanced settings of the vCenter Server Settings in the administration view. If done via vpxd.cfg, the value vpxd.ResourceManager.MaxCostPerEsx41Ds is added to that file.
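Separately from the configuration entry itself, the cost and max-cost bookkeeping described in this article can be sketched in Python to show why the limits work out to 2 Storage vMotions per host and 8 per datastore. The class and function names are hypothetical. The sketch only models the host and datastore costs quoted above and ignores network limits and any other operation types.

```python
class Resource:
    """A host or datastore with a max cost and an in-use cost (illustrative model)."""

    def __init__(self, name, max_cost):
        self.name = name
        self.max_cost = max_cost
        self.in_use_cost = 0

    def can_admit(self, operation_cost):
        # In-use cost plus the new operation cost may not exceed the max cost.
        return self.in_use_cost + operation_cost <= self.max_cost


# Values quoted in the article.
SVMOTION_HOST_COST = 4
HOST_MAX_COST = 8
SVMOTION_DATASTORE_COST = 16
DATASTORE_MAX_COST = 128


def try_start_storage_vmotion(host, datastore):
    """Start a Storage vMotion only if both resources can absorb its cost."""
    if host.can_admit(SVMOTION_HOST_COST) and datastore.can_admit(SVMOTION_DATASTORE_COST):
        host.in_use_cost += SVMOTION_HOST_COST
        datastore.in_use_cost += SVMOTION_DATASTORE_COST
        return True
    return False


# One datastore in maintenance mode, five hosts, twelve migration attempts.
datastore = Resource("datastore01", DATASTORE_MAX_COST)
hosts = [Resource(f"esx{i:02d}", HOST_MAX_COST) for i in range(1, 6)]

started = 0
for attempt in range(12):
    host = hosts[attempt % len(hosts)]  # spread attempts round-robin over the hosts
    if try_start_storage_vmotion(host, datastore):
        started += 1

print(f"Concurrent Storage vMotions started: {started}")
print("Per host:", {h.name: h.in_use_cost // SVMOTION_HOST_COST for h in hosts})
```

Lowering the datastore max cost in this model, for example to 64, would cap the number of concurrent operations at 4 without touching the host costs, which is the throttling approach the article prefers.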

FAB-FOUR: VMWORLD 2012 SESSIONS APPROVED

This morning I found out that my four sessions have been accepted. I’m really pleased and I am looking forward to presenting each one of them. Two sessions, Architecting Storage DRS Datastore Clusters and vSphere Cluster Resource Pools Best Practices, are also scheduled for VMworld Barcelona.

Session ID: STO1545
Session Title: Architecting Storage DRS Datastore Clusters
Track: Infrastructure
Presenting at: US and Barcelona
Presenting with: Valentin Hamburger

Session ID: VSP1504
Session Title: Ask the Expert vBloggers
Track: Infrastructure
Presenting at: US
Presenting with: Duncan Epping, Scott Lowe, Rick Scherer and Chad Sakac

Session ID: VSP1683
Session Title: vSphere Cluster Resource Pools Best Practices
Track: Infrastructure
Presenting at: US and Barcelona
Presenting with: Rawlinson Rivera

Session ID: CSM1167
Session Title: Architecting for vCloud Allocation Models
Track: Operations
Presenting at: US
Presenting with: Chris Colotti

Can’t wait to attend VMworld 2012! See you there.

VMWARE VSPHERE STORAGE DRS INTEROPERABILITY TECHNICAL PAPER AVAILABLE

Today my second white paper, VMware vSphere Storage DRS Interoperability, has been made available for download at the Technical Resource Center at VMware.com. This white paper presents an overview of best practices for customers considering the implementation of VMware vSphere Storage DRS in combination with advanced storage device features or other VMware products. The document zooms in on Storage DRS interoperability with array-based features such as Auto-Tiering, Thin Provisioning and Deduplication, and also covers VMware products such as Snapshots. A small preview:

VMware vSphere Snapshots
Storage DRS supports virtual machine snapshots. By default, it collocates them with the virtual machine disk file to prevent fragmentation of the virtual machine. Also by default, Storage DRS applies a VMDK affinity rule to each new virtual machine. If it migrates the virtual machine to another datastore, all the files, including the snapshot files, move with it. If the virtual machine is configured with an inter-VMDK affinity setting, the snapshot is placed in the directory of its related disk and is moved to the same destination datastore as when migrated by a Storage vMotion operation. VMware supports the use of vSphere snapshots in combination with Storage DRS.

Go and download it here: http://www.vmware.com/resources/techresources/10286