STORAGE DRS LOAD BALANCE FREQUENCY
Storage DRS load balancing frequency differs from DRS load balance frequency, where DRS runs every 5 minutes to balance CPU and memory resources, Storage DRS uses a far more complex load balancing scheme. Let’s take a closer look at Storage DRS load balancing. Default invocation period The invocation period of Storage DRS is every 8 hours and uses what’s called past-day statistics. Storage DRS triggers a load balancing evaluation process if a datastore exceeds the space utilization threshold. Storage DRS load balances space utilization of the datastores and if I/O metric is enabled, load balances on I/O utilization as well. Space utilization and I/O load on a datastore are two different load patterns, therefore Storage DRS uses different methods to accumulate and analyze IO load patterns and space utilization of the datastores within the datastore cluster. Space load balancing statistic collection Analyzing space utilization is rather straightforward; Storage DRS needs to understand the growth rate of each virtual machine and the utilization of each datastore. It collects information from the vCenter database to understand the disk usage and file structure of each virtual machine. Each ESXi host reports datastore utilization at a frequent interval and this is stored in the vCenter database as well. Storage DRS checks whether the datastore utilization is above the user-set threshold. When generating a load balance recommendation, Storage DRS knows where to move a virtual machine as it knows the current space growth of virtual machines on destination datastores, preventing a threshold violation direct after the placement. I/O load utilization is a different beast. I/O load may grow over time, however the datastore can experience a temporary increase of load. How does Storage DRS handle these spikes? Enter past-day statistics! I/O load balancing statistic collection Storage DRS uses two main information sources for I/O load balancing statistic collection, vCenter and the SIOC injector. vCenter statistics is uses to understand the workload each virtual disk is generating, this is called workload modeling. SIOC injector is used to understand the device performance and this is called device modeling. See “Impact of load balancing on datastore cluster configuration” for more info about device and workload modeling. Data of workload and device modeling is aggregated in a performance snapshot and is used as input for generating migration recommendations. Migrating virtual machine disk files takes time and most of all it impacts the infrastructure, migrating based on peak value is the last thing you want to do when you are introducing long-term high impact workloads. Therefore Storage DRS starts to recommend I/O load-related recommendations once an imbalance persists for some period of time. To avoid being caught out by peak load moments, Storage DRS does not use real-time statistics. It aggregates all the data points collected over a period of time. By using 90th percentile values, Storage DRS filters out the extreme spikes while still maintaining a good view of the busiest period of that day as this value translate to the lowest edge of the busiest period. As workloads shift during the day enough information needs to be collected to make a good assessment of the workloads. Therefore Storage DRS needs at least 16 hours of data before recommendation are made. By using at least 16 hours worth of data Storage DRS has an enough data of the same timeslot so it can compare utilization of datastores for example: datastore 1 to datastore 2 on Monday morning at 11:00. As 16 hour is 2/3 of the day Storage DRS receives enough information to characterize the performance of datastore on that day. But how does this tie in with the 8 hour invocation period? 8-hour invocation period and 16 hours worth of data Storage DRS uses 16 hours of data, however this data must be captured in the current day otherwise the performance snapshot of the previous day is used. How is this combined with the 8-hour invocation periods? This means that technically, the I/O load balancing is done every 16 hours. Usually after midnight, after the day date, the stats are fixed and rolled up. This is called the rollover event. The first invocation period (08:00) after the rollover event uses the 24 hours statistics of the previous day. After 16 hours are passed of the current day, Storage DRS uses the new performance snapshot and may evaluate moves.
WHITEBOARD DESK
As I’m an avid fan of “post your desk topics \ workspace ” forum threads, I thought it might be nice to publish a blog article about my own workspace. I always love to see how other people design their work environment and how they customize furniture to suit their needs. Hopefully you can find some inspiration in mine. Last year I decided to refurbish my home office. To create a space that enables me to do my work in the most optimum way, and of course that is pleasing to the eye. The first thing that came to mind was a whiteboard and a really big one. So I needed to build a wall to hang the whiteboard, as the room didn’t had any wall that could hold a whiteboard big enough. After completion of the wall, a 6 x 3 feet wide whiteboard found its way to my office. Although its roughly 5 to 6 feet away from my desk, I realized I didn’t use it enough due to distance. Sitting behind the desk while on the phone or just using my computer, I found myself scribbling on pieces of paper instead of getting out of my chair and walk over to the whiteboard. Therefor I needed a small whiteboard I could grab and use at my desk. It seemed reasonable, however I like minimalistic designs where clutter is removed as much as possible. I needed to come up with something different; enter the whiteboard desk! Whiteboard desk Instead of buying a mini whiteboard that needs to be stored when not used, I decided to visit my local IKEA and see what’s available. Besides “show your desk” threads I hit ikeahackers.net on a daily basis. While looking at tables, I noticed that the IKEA kitchen department sells customized tabletops. Each dimension is possible in almost every shape. I decided to order a 7 feet by 3 feet high-gloss white tabletop with a stainless steel edge. The Ikea employee asked where to put the sink, she was surprised when I told her that the tabletop was going to function as a desk. I chose to order the 2 inch thick tabletop as I need to have a desktop that is sturdy enough to hold the weight of a 27” I-mac and a 30” TFT screen. The stainless steel edge fits snug around the desk and covers each side; it doesn’t stick out and is not noticeable when typing. It looks fantastic! However the downside is the price, it was more expensive than the tabletop itself. The alternative is a laminate cover that looks like it will be worn out easily. While spending most of my time behind my desk I thought it was worth the investment of buying the real thing. The high-gloss finish acts as the whiteboard surface and works like a charm with any whiteboard markers. I left notes on my desk for multiple days and could be removed without leaving a trace. The tabletop rest on two IKEA Vika Moliden stands, due to the cast of the shadow its very difficult to notice that the color of the stands do not exactly match the color as the stainless steel edge. The whiteboard desk is just an awesome piece of furniture. When on the phone I can take notes on my desk while immediately drawing diagrams next to it. It saves a lot of trees, saves a lot of time scrambling for a piece of paper, and a pen and decreases clutter on the desk. The only thing you need to do when building a whiteboard desk is to banish all permanent markers in your office. :) It would be awesome to see what your workspace looks like. What do you love about your workspace and maybe show your own customizations? I would love to see blogs articles pop up describing the workspace of bloggers. Please post a link to your blog article in the comment section.
AGGREGATING DATASTORES FROM MULTIPLE STORAGE ARRAYS INTO ONE STORAGE DRS DATASTORE CLUSTER.
Combining datastores located on different storage arrays into a single datastore cluster is a supported configuration, such a configuration could be used during a storage array data migration project where virtual machines must move from one array to another array, using datastore maintenance mode can help speed up and automate this project. Recently I published an article about this method on the VMware vSphere blog. But what if multiple arrays are available to the vSphere infrastructure and you want to aggregate storage of these arrays to provide a permanent configuration? What are the considerations of such a configurations and what are the caveats? Key areas to focus on are homogeneity of configurations of the arrays and datastores. When combining datastores from multiple arrays it’s highly recommended to use datastores that are hosted on similar types of arrays. Using similar type of arrays, guarantees comparable performance and redundancy features. Although RAID levels are standardized by SNIA, implementation of RAID levels by different vendors may vary from the actual RAID specifications. An implementation used by a particular vendor may affect the read and write performance and the degree of data redundancy compared to the same RAID level implementation of another vendor. Would VASA (vSphere Storage APIs - Storage Awareness) and Storage profiles be any help in this configuration? VASA enables vCenter to display the capabilities of the LUN/datastore. This information could be leveraged to create a datastore cluster by selecting the datastores that have similar Storage capabilities details, however the actual capabilities that are surfaced by VASA are being left to the individual array storage vendors. The detail and description could be similar however the performance or redundancy features of the datastores could differ. Would it be harmful or will Storage DRS stop working when aggregating datastores with different performance levels? Storage DRS will still work and will load balance virtual machine across the datastores in the datastore cluster. However, Storage DRS load balancing is focused on distributing the virtual machines in such a way that the configured thresholds are not violated and getting the best overall performance out of the datastore cluster. By mixing datastores that provide different performance levels, virtual machine performance could not be consistent if it would be migrated between datastores belonging to different arrays. The article “Impact of load balancing on datastore cluster configuration” explains how storage DRS picks and selects virtual machine to distribute across the available datastores in the cluster. Another caveat to consider is when virtual machines are migrated between datastores of different arrays; VAAI hardware offloading is not possible. Storage vMotion will be managed by one of the datamovers in the vSphere stack. As storage DRS does not identify “locality” of datastores, it does not incorporate the overhead caused by migrating virtual machines between datastores of different arrays. When could datastores of multiple arrays be aggregated into a single datastore if designing an environment that provides a stable and continuous level of performance, redundancy and low overhead? Datastores and array should have the following configuration: • Identical Vendor. • Identical firmware/code. • Identical number of spindles backing diskgroup/aggregate. • Identical Raid Level. • Same Replication configuration. • All datastores connected to all host in compute cluster. • Equal-sized datastores. • Equal external workload (best non at all). Personally I would rather create a multiple datastore clusters and group datastores belonging to a single storage array into one datastore cluster. This will reduce complexity of the design (connectivity), no multiple storage level entities to manage (firmware levels, replication schedules) and will leverage VAAI which helps to reduce load on the storage subsystem. If you feel like I missed something, I would love to hear reasons or recommendations why you should aggregate datastores from multiple storage arrays. More articles in the architecting and designing datastore clusters series: Part1: Architecture and design of datastore clusters. Part2: Partially connected datastore clusters. Part3: Impact of load balancing on datastore cluster configuration. Part4: Storage DRS and Multi-extents datastores. Part5: Connecting multiple DRS clusters to a single Storage DRS datastore cluster.
CONNECTING MULTIPLE DRS CLUSTERS TO A SINGLE STORAGE DRS DATASTORE CLUSTER.
Recently I received the question if you can connect multiple compute (HA and DRS) clusters to a single Storage DRS datastore cluster and specifically how this setup might impact Storage IO Control functionality. Let’s cover sharing a datastore cluster by multiple compute clusters first before diving into details of the SIOC mechanism. Sharing datastore clusters Sharing datastore clusters across multiple compute clusters is a supported configuration. During virtual machine placement the administrator selects which compute cluster the virtual machine will run in, Storage DRS selects the host that can provide the most resources to that virtual machine. A migration recommendation generated by Storage DRS does not move the virtual machine at host level, consequently a virtual machine cannot move from one compute cluster to another compute cluster by any operation initiated by Storage DRS. Maximums Please remember that the maximum supported number of hosts connected to a datastore is 64. Keep this in mind when sizing the compute cluster or connecting multiple compute clusters to the datastore cluster. As the maximum number of datastores inside a datastore cluster is 32 I think that the number of host connected is the first limit you hit in such a design as the total supported number of paths is 1024 and a host can connect up to 255 LUNs. The VAAI-factor If the datastores are formatted with the VMFS, it’s recommended to enable VAAI on the storage Array if supported. One of the important VAAI primitive is the Hardware assisted locking, also called Atomic Test and Set (ATS). ATS replaces the need for hosts to place a SCSI-2 disk lock on the LUN while updating the metadata or growing a file. A SCSI-2 disk lock command locks out other host from doing I/O to the entire LUN, while ATS modifies the metadata or any other sector on the disk without the use of a SCSI-2 disk lock. This locking was the focus of many best practices around the connectivity of datastores. To reduce the amount of locking, the best practice was to reduce the number of host attached. By using newly formatted VMFS5 volumes in combination with a VAAI-enabled storage array, scsi-2 disk lock commands are a thing of the past. Upgraded VMFS5 volumes or VMFS3 volumes will fall back to using SCSI-2 disk locks if the ATS command fails. For more information about VAAI and ATS please read the KB article 1021976. Note: If your array doesn’t support VAAI, be aware that SCSI-2 disk lock commands can impact scaling of the architecture. Storage DRS IO Load balancing and Storage IO Control When enabling the IO Metric on the datastore cluster, Storage DRS automatically enables Storage IO Control (SIOC) on all datastores in the cluster. Storage DRS uses the IO injector from SIOC to determine the capabilities of a datastore, however by enabling SIOC it also provides a method to fairly distribute I/O resources during times of contention. SIOC uses virtual disk shares in order to distribute storage resources fairly and are applied on a datastore wide level. The virtual disk shares of the virtual machine running on that datastore are relative to the virtual disk shares of other virtual machines using that same datastore. To be more specific, SIOC is a host-level module and aggregates the per-host views into a single datastore view in terms of observed latency. If the observed latency exceeds the SIOC level latency threshold, each host sets its own IO queue length based on the total virtual disks shares of the virtual machines in that host using the datastore. As SIOC and its shares are datastore focused cluster membership of the host has no impact on detecting the latency threshold violation and managing the I/O stream to the datastore. Previous articles in the SDRS short series Architecture and design of Datastore clusters: Part1: Architecture and design of datastore clusters. Part2: Partially connected datastore clusters. Part3: Impact of load balancing on datastore cluster configuration. Part4: Storage DRS and Multi-extents datastores.
I/O ANALYZER V1.1
I/O Analyzer v1.1 is now live on the Flings site: http://labs.vmware.com/flings/io-analyzer I/O Analyzer is a virtual appliance tool for measuring storage performance. This version of I/O Analyzer adds the ability to run trace replay – a function which allows a user to replay an I/O trace that was captured elsewhere (with vscsistats) on the target test system. This version also has cool data visualization charts, both for the characteristics of an imported trace, and performance results on the test system. This is really cool stuff, go check it out.
IMPACT OF INTRA VM AFFINITY RULES ON STORAGE DRS
By default Storage DRS applies an Intra-VM affinity rule to all new virtual machines in the datastore cluster. The Intra-VM affinity rule keeps the virtual machine files, such as VMX file, log files, vSwap and VMDK files together on one datastore. Keeping all files together on one datastore allows ease of troubleshooting. However Storage DRS load balance algorithms may benefit from distributing the virtual machine across datastores. Let’s zoom in how Storage DRS handles virtual machine with multiple disks when the Intra-VM affinity rule is removed from the virtual machine. DrmDisk Storage DRS uses the construct “DrmDisk” as the smallest entity it can migrate. A DrmDisk represent a consumer of datastore resources. This means that Storage DRS creates a DrmDisk for each VMDK belonging to the virtual machine. The interesting part is the collection of system files and swap file belonging to virtual machines. Storage DRS creates a single Drmdisk for all the system files, if an alternate swapfile location is specified, the vSwap file is represented as a separate DrmDisk and Storage DRS will be disabled on the swap DrmDisk. More info about alternate swapfile locations can be found here. For example a virtual machine with three VMDK’s and with no alternate swapfile locations configured, Storage DRS creates 4 DrmDisk: • A separate DrmDisk for each Virtual Machine Disk File • A DrmDisk for system files (VMX, Swap, logs, etc) Initial placement recommendation will look similar to this screenshot when the Intra-VM affinity rule is disabled. Notice the separate recommendation for the “virtual machine configuration file”? This is the DrmDisk containing the system files. Initial placement Space load balancing Initial placement and Space load balancing benefit from this increased granularity tremendously. Instead of searching a suitable datastore that can fit the virtual machine as a whole, Storage DRS is able to seek for appropriate datastores for each DrmDisk file separately. Recently I wrote an article about datastore cluster fragmentation and Storage DRS ability to issue prerequisite migrations. You can imagine due to the increased granularity, datastore cluster fragmentation is less likely to happen and if prerequisite migrations are required, the number of migrations is expected to be a lot less. IO load balancing Similar to initial placement and load balancing, I/O load balancing benefit from the deeper level of detail. It can find a better fit for each workload generated by the VMDK files. The system file DrmDisk will not be migrated quite often as it small in size and does not generate a lot of I/O often. Storage DRS analyzes the workload and generates a workload model for each DrmDisk, it then decides which datastore it needs to place the DrmDisk to keep the load balanced within the datastore cluster while offering enough performance for each DrmDisk. You can imagine this becomes a lot harder when Storage DRS is required to keep all the VMDK files together. Usually the datastore chosen is the datastore that provides the best performance for the most demanding workload AND is able to store all the virtual machine disk files and system files. Now let’s dig into this a little deeper, for example the virtual machine used in the previous example has two DrmDisk generating heavy workloads, while the DrmDisks containing the system files and VMDK2 are “cold”. If Intra-VM affinity rules are used, Space balancing is required to find a datastore that has 350+ GB free without exceeding the space utilization threshold. If I/O load balancing is enabled, this datastore also needs to provide enough performance to keep the latency below the I/O latency threshold (by default 15ms) after placing the 4 DrmDisks. You can imagine it’s a lot less complicated when space and I/O load balancing are allowed to place each DrmDisk on a datastore that suits their needs. How to change default datastore cluster behavior? Mentioned before, datastore cluster defaults in applying an Intra-VM affinity rule to each new virtual machine. Recently Duncan published an article on how to change the affinity rules on active virtual machines. Unfortunately there is not User-Interface option available that can disable this behavior, so I turned to my good friend and colleague Alan Renouf and he created some nice PowerCLI code to solve this problem: As I’m not a powerCLI user at all, I’m relaying Alan’s instructions: First you need to run the below code to put the function into memory:
STORAGE DRS I/O LOAD BALANCING AND ARRAY-BASED AUTO-TIERING
In its basic form Storage DRS can be used together with any array, however there are a few combinations of Storage array features and Storage DRS features that don’t mix easily. One of the most sought after question is can Storage DRS work with Array based Auto-tiering? And the answer is yes, yes you can use initial placement and out of space avoidance features that Storage DRS offers, however it is not recommended to enable the I/O metric feature. Modeling The main goal of the I/O metric function, popular called I/O load balancing, is to resolve the imbalance of performance delivered from datastores in the datastore cluster. To avoid hotspots in the datastore cluster and decrease overall latency imbalance, Storage DRS I/O load balancing uses device modeling and virtual machine workload modeling. Device modeling helps Storage DRS to understand the performance characteristics of the devices backing the datastores, while virtual machine workload modeling analyzes virtual machine workload running inside the datastore cluster. Both device and workload modeling assists Storage DRS to asses the improvement of I/O latency that will be achieved after a virtual machine migration. Device modeling and the SIOC injector To understand and learn the performance of the devices backing the datastore, Storage DRS uses the Storage IO Control (SIOC) workload injector. To characterize the datastore, SIOC injector opens and read random blocks of the datastore. As the SIOC injector does not open every block backing the datastore, we cannot ensure that the SIOC injector opens an identical number of blocks of each performance tier to characterize the disk. As multiple performance tiers of disk back the datastore there is a possibility that the SIOC injector might open blocks located on similar speed disks, either slow or fast, while the datastore is primarily backed by disk with a different performance level. Let’s use an example to clarify this further. In the diagram pictured above, SIOC opens random blocks and perform its tests. Unfortunately it doesn’t open blocks on other disks. While most of the blocks backing the datastore are located on faster performing disks, Storage DRS device modeling will characterize this disk with performance similar to 7.2K SATA disks. This inaccurate characterization of datastore performance might lead to an incorrect performance assessment and can lead to Storage DRS withholding a migration recommendation while there is sufficient performance available. Segment migration triggered by auto-tiering algorithms By using SIOC injector Storage DRS evaluate the performance of the disks, however Auto-tiering solutions migrate LUN segments (chunks) to different disk types based on the use pattern. Hot segments (frequently accessed) typically move to faster disks while cold segments move to slower disks. Depending on the array type and vendor there are different kind of policies and threshold for these migrations. By default Storage DRS is invoked every 8 hours and requires performance data over more than 16 hours to generate I/O load balancing decisions. Multiple storage vendors offer auto-tiering solutions, each using different time-cycles to collect and analyze workload before moving LUN segments. Some auto-tiering solutions move chunks based on real-time workload while other arrays move chunks after collecting performance data for 24 hours. This means that auto tiering solutions alter the landscape in which the SIOC injector performs its test. Let’s turn to another scenario for clarification. In this scenario, SIOC is primarily opening blocks located in the Tier-1 diskgroup belonging to the datastore. As the datastore isn’t using these segments that often (cold) the auto tiering solution decides to migrate these segments to a lower tier. In this case the segments are migrated to 15K disks instead of SSD devices. Storage DRS expects that the behavior of the device remains the same for at least 16 hours; it will base its calculation on these facts. Auto tiering solutions might change the underlying structure of the datastore based on its algorithm and timescales, conflicting with Storage DRS its calculation. The misalignment of Storage DRS invocation and auto-tiering algorithms cycles makes it unpredictable when LUN segments may be moved, potentially colliding with the Storage DRS calculations and recommendations. Together with the transparency of auto tiering algorithms to Storage DRS and the non-existing communication between Storage DRS and Auto-tiering algorithms create the basis of the recommendation to disable I/O metric on datastore clusters backed by devices participating in an auto-tiering solution. Always verify these recommendations with your storage vendor. Additional information: Duncan wrote an excellent article about the Storage IO Control workload injector, which can be found here. More info on device modeling and load balancing can be found in the article impact of load balancing on datastore cluster configuration. Note: This article is describing Storage DRS behavior based on vSphere 5.
(STORAGE) DRS (ANTI-) AFFINITY RULE TYPES AND HA INTEROPERABILITY
Lately I have received many questions about the interoperability between HA and affinity rules of DRS and Storage DRS. I’ve created a table listing the (anti-) affinity rules available in a vSphere 5.0 environment. Technology Type Affinity Anti-Affinity Respected by VMware HA DRS VM-VM Keep virtual machines together Separate virtual machines No VM-Host Should run on hosts in group Should not run on hosts in group No Must run on hosts in group Must not run on hosts in group Yes SDRS Intra-VM VMDK affinity VMDK anti-affinity N/A VM-VM Not available VM Anti-Affinity N/A As the table shows, HA will ignore most of the (anti-) affinity rules in its placement operations after a host failure except the “Virtual Machine to Host - Must rules”. Every type of rule is part of the DRS ecosystem and exists in the vCenter database only. A restart of a virtual machine performed by HA is a host-level operation and HA does not consult the vCenter database before powering-on a virtual machine. Virtual machine compatibility list The reason why HA respect the “must-rules” is because of DRS’s interaction with the host-local “compatlist” file. This file contains a compatibility info matrix for every HA protected virtual machine and lists all the hosts with which the virtual machine is compatible. This means that HA will only restart a virtual machine on hosts listed in the compatlist file. DRS Virtual machine to host rule A “virtual machine to hosts” rule requires the creation of a Host DRS Group, this cluster host group is usually a subset of hosts that are member of the HA and DRS cluster. Because of the intended use-case for must-rules, such as honoring ISV licensing models, the cluster host group associated with a must-rule is directly pushed down in the compatlist. Note Please be aware that the compatibility list file is used by all types of power-on operations and load-balancing operations. When a virtual machine is powered-on, whether manual (admin) or by HA, the compatibility list is checked. When DRS performs a load-balancing operation or maintenance mode operation, it checks the compatibility list. This means that no type of operation can override must- type affinity rules. For more information about when to use must and should rules, please read this article: Should or Must VM-Host affinity rules. Contraint violations After HA powers-on a virtual machine, it might violate any VM-VM or VM-host should (anti-) affinity rule. DRS will correct this constraint violation in the first following invocation and restore “peace” to the cluster. Storage DRS (anti-) affinity rules When HA restarts a virtual machine, it will not move the virtual machine files. Therefore creation of Storage DRS (anti-) affinity rules do not affect virtual machine placement after a host failure.
RETROSPECT OF 2011 CONTENT DUE TO BLOGGERS SURVEY
vSphere-land.com is running it’s annual Top 25 virtualization blog survey again and I’m really interested to see who are picked this year. Like previous year, some great bloggers disappear while other new great ones emerge. One guy I want to mention by name is Chris Colotti, his blog is a great source of information about vCloud Director. If you haven’t visited his blog yet, go do that right away!. Last year I’ve been pretty busy writing, shaping, designing, wrestling with publishers in order to get our (@DuncanYB) book “vSphere 5 Clustering technical deepdive out to the public. This meant it cut down on research time, which resulted in a smaller number of blogs being released than previous years. So after seeing other people’s blog about their top 10, I was curious to see what I’ve done last year. The articles that are listed are the ones I’m proud of, spending a lot of time on researching them, but most of all, enjoyed the most writing them. Storage DRS initial placement and datastore cluster defragmentation Impact of Load Balancing on datastore cluster configuration Partially Connected datastore clusters Mem minfreepct sliding scale function Upgrading vmfs datastores and Storage DRS Multi NIC vMotion support in vSphere 5.0 Contention on lightly Utilized Hosts Restart vCenter results in DRS load balancing IP-HASH versus Load Based Teaming Setting correct percentage of cluster resources reserved AMD Magny Cours and ESX Please take 5 minutes of your time and vote for your favorite blogger. I hope they will announce the winner like they did last year. 90 minutes of nerve wrecking but oh-so-enjoyable videoshow! Cast your vote now!
STORAGE DRS INITIAL PLACEMENT AND DATASTORE CLUSTER DEFRAGMENTATION
Recently an interesting question was raised about what happens if enough free space is available in the datastore cluster but not enough space is available per datastore during placement of a virtual machine. This scenario is often referred as a defragmented datastore cluster. The short answer is that if not enough space available on any given datastore, then Storage DRS starts to consider migrating existing virtual machines from the datastore to free up space. This article zooms in on the process of generating such an initial placement recommendation. Rules and boundaries within a datastore cluster Storage DRS will not violate the configured space utilization and IO latency threshold of the datastore cluster. This means that Storage DRS will place virtual machines that consume space up to the configured space utilization threshold, for example setting the space utilization threshold to 80% on a 1000GB datastore will allow Storage DRS to place virtual machines that consume space up to 800 GB. Be aware of this when monitoring free space available on the datastores in the cluster. When creating or moving a virtual machine in the datastore, the first thing to consider is the affinity rules. By default virtual machine files are kept together in the working directory of the virtual machine. If the virtual machine needs to be migrated, all the files inside the virtual machines’ working directory are moved. This article features the use of the default affinity rule, however if the default affinity rule is disabled, Storage DRS will move the working directory and virtual disks separately allowing Storage DRS to distribute the virtual disk files on a more granular level. Prerequisite migrations During initial placement, if no datastore with enough space is available in the datastore cluster, Storage DRS starts by searching alternative locations for the existing virtual machines in the datastores and attempts to place the virtual machines to other datastores one by one. As a result Storage DRS may generate sets of migration recommendations of existing virtual machines that allow placement of the new virtual machine. These migrations generated are called prerequisite migrations and combined with the placement operations is called a recommendation set. Depth of recursion Storage DRS uses a recursive algorithm for searching alternative placements combinations. To keep Storage DRS from trying an extremely high number of combinations of virtual machine migrations, the “depth of recursion” is limited to 2 steps. What defines a steps and what counts towards a step? A step can be best defined as a set of migrations out of a datastore in preparation of (or to make room for) another migration into that same datastore. This step can contain one vmdk, but can also contain multiple virtual machines with multiple virtual disks attached. In some cases, room must be created on that target datastore first by moving a virtual machine out to another datastore, which results in an extra step. The following diagram visualizes the process. Storage DRS has calculated that a new virtual machine can be placed in Datastore 1 if VM2 and VM3 are migrated to Datastore 2, however, placing these two virtual machines on datastore 2 will violate the space utilization, therefore room must be created. VM4 is moved out of Datastore2 as part of a step of creating space. This results in Step 1, moving out to Datastore 3, followed by Step 2, moving VM2 and VM3 to Datastore 2 to finally placing the new virtual machine on Datastore 1. Storage DRS stops its search if there are no 2-step moves to satisfy the storage requirement of an initial placement. An advanced setting can be set to change the number of steps used by the search. As always, it is strongly discouraged to change the defaults, as many hours of testing has been invested in researching the setting that offers good performance while minimizing the impact of the operation. If you have a strong case of changing the number of steps, set the advanced configuration option “MaxRecursionDepth”. The default value is 1 the maximum value is 5. Because the algorithm starts counting at 0, default value of 1 allows 2 steps. Goodness value Storage DRS will cycle through all the datastores in the datastore cluster and initiates a search for space on each datastore. A search generates a set of prerequisites migration if it can provide space that allows the virtual machine placement within the depth of recursion. Storage DRS evaluates the generated sets and award each set a goodness value. The set with the least amount of cost (i.e. migrations) is the preferred migration recommendation and shown at the top of the list. Let’s explore this a bit more by using a scenario with 3 datastores. Scenario The datastore cluster contains 3 datastores; each datastore has a size of 1000GB and contains multiple virtual machines with various sizes. The space consumed on the datastores range from 550GB to 650GB, while the space utilization threshold is set to 80%. At this point the administrator creates a virtual machine that requests 350GB of space. Although the datastore cluster itself contains 1225GB of free space, Storage DRS will not go forward and place the virtual machine on any of the three datastores, because placing the virtual machine will violate the space utilization threshold of the datastores. Search process As each ESXi host provide information about the overall datastore utilization and the vmdk statistics, Storage DRS has a clear overview of the most up to date situation and will use these statistics as input for its search. In the first step it will simulate all the necessary migrations to fit VM10 in Datastore 1. The prerequisite migration process with least number of migrations to fit the virtual machine on to Datastore 1 looks as follows: