ADJUST TIMEOUT VALUE ESXI EMBEDDED HOST CLIENT
I love to use the ESXi Embedded Host Client next to vCenter in my lab. It’s quick, it provides most of the functionality, and best of all it has a functioning VM console when accessing it from a Mac. The ESXi Embedded Host Client session timeout defaults to 15 minutes, but you can adjust this setting. On the right side of the menu bar there is a drop-down menu next to the IP address or DNS name of your ESXi server. Open it and adjust the timeout setting from there.
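If you prefer to script it, recent Host Client builds also expose the timeout as an advanced host setting. The sketch below is a hedged PowerCLI example; the setting name UserVars.HostClientSessionTimeout is an assumption based on the bundled Host Client and may not exist on older fling versions, so verify it on your build first.

    # Connect to the host directly (hostname and account are placeholders).
    Connect-VIServer -Server esxi01.lab.local -User root

    # Read the Host Client session timeout (in seconds) and set it to one hour.
    # A value of 0 disables the timeout entirely.
    Get-AdvancedSetting -Entity (Get-VMHost) -Name "UserVars.HostClientSessionTimeout" |
        Set-AdvancedSetting -Value 3600 -Confirm:$false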
DVD STORE, THE PERFECT HOMELAB WORKLOAD TOOL
DVD Store 2.1 is a magnificent tool for all aspiring VCP/VCAP candidates, a great tool for home lab enthusiasts to understand performance metrics, and a fantastic tool to understand the behavior of an application stack in a virtual datacenter.

WHAT IS DVD STORE? According to the official site, DVD Store Version 2.1 (DS2) is a complete open source online e-commerce test application, with a backend database component, a web application layer, and driver programs. The goal in designing the database component as well as the midtier application was to utilize many advanced database features (transactions, stored procedures, triggers, referential integrity) while keeping the database easy to install and understand. The DS2 workload may be used to test databases or as a stress tool for any purpose. Thanks to Todd Muirhead and Dave Jaffe for creating this!

However, there is a slight challenge in installing it properly. You can install it on Windows or on Linux and use many different database programs. I like to use Windows for this. Unfortunately, the instruction video on YouTube I tried to follow was lacking some crucial details to get it deployed successfully. Therefore I started to document the steps involved to get it deployed on a Windows Server 2012 system using SQL Server 2014 SP1, ending with a load run like the one sketched below. Please note that you can run DVD Store on Linux as well, and it might be even better (leaner and meaner than a Windows install) for home labs. If you have a detailed, 100% reproducible write-up of a working DVD Store deployment on Linux, please share the link to your article in the comments.
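Once the database is loaded, the workload itself is generated by the DS2 driver program. As a rough, hedged illustration of what a run looks like (the target name is a placeholder, and the exact parameter names and values should be verified against the readme files that ship with the DVD Store 2.1 download), a SQL Server run is started from a PowerShell prompt along these lines:

    # Hypothetical example of a 10-minute DS2 run against a SQL Server VM.
    # Verify parameter names against the DS2 documentation before use.
    .\ds2sqlserverdriver.exe --target=sql2014-vm01 --n_threads=8 `
        --db_size=20GB --warmup_time=1 --run_time=10 --think_time=0.1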
YOU DO NOT HAVE PERMISSIONS TO VIEW THIS OBJECT ERROR AFTER UPDATING VCSA TO 6.0 UPDATE 1B
Today I updated my vCenter Server Appliance with the VC-6.0.0U1b-Appliance.ISO in my lab. After rebooting I was surprised to see the error “You do not have permissions to view this object” on almost every object in the inventory screen. Unfortunately a reboot of the DC (home lab, I do not run an elaborate AD here) did not solve it. Time to google, and it seems that a lot of other people have hit this bug. After googling some more I found the VMware KB article KB 2125229. Problem is, this article is solely focused on the Windows version of vCenter and not on solving the problem on the VCSA. Although I can log in and see the inventory when using my admin account (Lab\vAdmin), I can’t access the objects. Maybe a permission problem? When checking the global permissions, the vAdmin user is still listed as an administrator. Administrators should be able to access all objects, but as I found out, a refresh of the permission is required. Here is how I solved it:

1. Log out of vCenter and log in with the default admin account administrator@vsphere.local
2. In the Home view, select “Administration” from the menu
3. Go to Global Permissions and remove the user (in my case vAdmin)
4. Click on “Add Permission”
5. Select your AD domain and select the correct user
6. Click on OK
7. Check the list to see whether your user is added with the correct role (Administrator)
8. Log out and log in with the correct AD user
9. Back to work. Time for me to power on these servers again.

Follow Frank on Twitter @frankdenneman
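As a footnote for those who prefer the command line: vanilla PowerCLI has no cmdlet for global permissions, but a similar remove-and-re-add cycle can be approximated at the root of the vCenter inventory with propagation. A hedged sketch only; the principal name is a placeholder and the real global permission still has to be managed through the Web Client as described above.

    # Hedged sketch: recreate a propagating Administrator permission at the inventory root.
    Connect-VIServer -Server vcsa.lab.local -User "administrator@vsphere.local"

    $root = Get-Folder -NoRecursion   # the hidden root folder of the inventory
    Get-VIPermission -Entity $root -Principal "LAB\vAdmin" | Remove-VIPermission -Confirm:$false
    New-VIPermission -Entity $root -Principal "LAB\vAdmin" -Role (Get-VIRole -Name Admin) -Propagate:$true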
INSIGHTS INTO VM DENSITY
For the last 3 months my main focus within PernixData has been (and still is) the PernixCloud program. In short, PernixData Cloud is the next logical progression of PernixData Architect and provides visibility into and analytics of virtual datacenters, their infrastructure, and their applications. By providing facts on the various elements of the virtual infrastructure, architects and administrators can design their environment in a data-driven way. The previous article “Insights into CPU and Memory configurations of ESXi hosts” zoomed in on the compute configuration of 8,000 ESXi hosts and helped us understand which is the most popular system in today’s datacenter running VMware vSphere. The obvious next step was to determine the average number of virtual machines on these systems. Since that time the dataset has expanded and it now contains data on more than 25,000 ESXi hosts. An incredible dataset to explore, I can tell you, and it’s growing each day. Learning how to deal with these vast quantities of data is incredibly interesting. Extracting various metrics from a dataset this big is challenging. Most commercial tools are not designed to cope with this amount of data, thus you have to custom build everything. And with the dataset growing at a rapid pace, you are constantly exploring the boundaries of what’s possible with software and hardware.

VM Density
After learning what system configurations are popular, you immediately wonder how many virtual machines are running on those systems. But what level of detail do you want to know? Will this info be usable for architects and administrators to compare their systems, and practical enough to help them design their new datacenter? One of the most sought-after questions is the virtual CPU to physical CPU ratio. A very interesting one, but unfortunately, to get a result that is actually meaningful you have to take multiple metrics into account. Sure, you can map out the vCPU to pCPU ratio, but how do you deal with the oversizing of virtual machines that has been happening since the birth of virtualization? What about all these countless discussions whether the system only needs a single or double CPU because it’s running a single-threaded program? How many times have you heard the remark that the vendor explicitly states that the software requires at least 8 CPUs? Therefore you need to add CPU utilization to get an accurate view, which in turn leads to the question of what timeframe you need to use to understand whether the VM is accurately sized or whether the vCPUs are just idling most of the time. You are now mixing static data (inventory) and transient data (utilization). The same story applies to memory. In consequence I focused just on the density of virtual machines per host. The whole premise of virtualization is to exploit the variation in activity of applications; combined with distribution mechanisms such as DRS and VMTurbo, you can argue that virtual and physical compute configurations will be matched properly. Therefore it’s interesting to see how far datacenters stretch their systems and understand the consolidation ratio of virtual machines. Can we determine a sweet spot for the number of virtual machines per host?

The numbers
As discovered earlier, dual socket systems are the most popular system configuration in virtual datacenters, therefore I focused on these systems only. With the dataset now containing more than 25,000 ESXi hosts, it’s interesting to see what the popular CPU types are.
The popular systems contained 12, 16, 20 and 24 cores in total. Therefore the popular CPUs of today have 6, 8, 10 and 12 cores. But since we typically see a host as a “closed” system and trust the host-local CPU scheduler to distribute the vCPUs amongst the available pCPUs, all charts use the total cores per system instead of a per-CPU count. For example, a 16-core system is an ESXi host containing two 8-core CPUs. Before selecting a subset of CPU configurations, let’s determine the overall distribution of VM density. Interesting to see that it’s all across the board, with VM density ranging from 0-10 VMs per host up to more than 250. There were some outliers, but I haven’t included them. One system runs over 580 VMs; this system contains 16 cores and 192 GB of memory. Let’s dissect the VM density per CPU config.

Dissecting it per CPU configuration
Instead of focusing on all dual socket CPU configurations, I narrowed it down to three popular configurations: the 16-core config as it’s the most popular today, and the 20- and 24-core configs as I expect these to be the default choice for new systems this year. This allows us to compare the current systems in today’s datacenter to the average numbers and helps you understand what VM density possible future systems will run.

Memory
Since host memory is an integral part of providing performance to virtual machines, it’s only logical to determine VM density based on both CPU and memory configurations. What is the distribution of memory configurations of dual socket systems in today’s virtual datacenters?

Memory config and VM density of 16-core systems
30% of all 16-core ESXi hosts are equipped with 384GB of memory. Within this configuration, 21 to 30 VMs is the most popular VM density.

Memory config and VM density of 20-core systems
50% of all 20-core ESXi hosts are equipped with 256GB of memory. Within this configuration, 31 to 40 VMs is the most popular VM density. Interesting to see that these systems, on average, have to cope with less memory per core than the 16-core systems (24GB per core versus 12.8GB per core).

Memory config and VM density of 24-core systems
39% of all 24-core ESXi hosts are equipped with 384GB of memory. Within this configuration, 101 to 150 VMs is the most popular VM density. 101 to 150 VMs sounds like VDI platform usage. Are these systems the sweet spot for virtual desktop environments?

Conclusion
Not only do we have actual data on VM density now, other interesting facts were discovered as well. When I was crunching these numbers, one thing that stood out to me was the memory configurations used. Most architects I speak with tend to configure the hosts with as much memory as possible and swap out the systems when their financial lifespan has ended. However, I saw some interesting facts, for example memory configurations such as 104 GB or 136 GB per system. How do you even get 104GB of memory in such a system? Did someone actually find a 4GB DIMM lying around and decide to stick it in the system? More memory = better performance, right? Please read my Memory Deep Dive series on how such configurations hurt your overall performance. But I digress. Another interesting fact is that 4% of all 24-core systems in our database are equipped with 128GB of memory. That is an average of 5.3 GB per core, 64GB per NUMA node. Which immediately raises questions such as average host memory per VM or VM density per NUMA node. The more we look at data, the more questions arise. Please let me know what questions you have!
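If you want to see where your own environment sits in these distributions, the raw numbers are easy to gather. A minimal PowerCLI sketch, assuming an existing Connect-VIServer session, that lists the VM count, total core count and memory size per host:

    # VM density and compute configuration per ESXi host.
    Get-VMHost | Select-Object Name,
        @{N='VMCount';   E={ @(Get-VM -Location $_).Count }},
        @{N='TotalCores';E={ $_.NumCpu }},
        @{N='MemoryGB';  E={ [math]::Round($_.MemoryTotalGB) }} |
        Sort-Object VMCount -Descending | Format-Table -AutoSize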
INSIGHTS INTO CPU AND MEMORY CONFIGURATION OF ESXI HOSTS
Recently Satyam Vaghani wrote about PernixData Cloud. In short, PernixData Cloud is the next logical progression of PernixData Architect and provides visibility into and analytics of virtual datacenters, their infrastructure and applications. As a former architect I love it. The most common question asked by customers around the world was how other companies are running and designing their virtual datacenters. Which systems do they use and how do these systems perform with similar workloads? Many architects struggle with justifying their bill of materials when designing their virtual infrastructure. Or even worse, getting the budget. Who hasn’t heard the reply when suggesting their hardware configuration: “you want to build a Ferrari, a Mercedes is good enough”. With PernixData Cloud you will be able to show trends in the datacenter, the popularity of particular hardware, and application details. It lets you start ahead of the curve, aligned with current datacenter trends instead of trailing them. Of course I can’t go into detail as we are still developing the solution, but I can occasionally provide a glimpse of what we are seeing so far. For the last couple of days I’ve been using a part of the dataset and queried 8,000 hosts on their CPU, memory and ESXi build configuration to get insight into the popularity of particular host configurations.

CPU socket configuration
I was curious about the distribution of CPU socket configurations. After analyzing the dataset it is clear that dual socket CPU configurations are the most popular setup. Although single socket configurations are more common than quad socket configurations in the dataset, quad socket systems are more geared towards running real-world workloads, while single CPU configurations are typically test/dev/lab servers. Therefore the focus will primarily be on dual CPU socket systems and partially on quad CPU socket systems. The outlier of this dataset is the 8-socket servers. Interestingly enough, some of these are chock-full of options. Some of them were equipped with 15-core CPUs: 120 CPU cores per host, talk about CPU power!

CPU core distribution
What about CPU core popularity? The most popular configuration is 16 cores per ESXi host, but without the context of CPU sockets one can only guess which CPU configuration is the most popular.

Core distribution of dual CPU socket ESXi hosts
When zooming in on the dataset of dual CPU socket ESXi hosts, it becomes clear that 8-core CPUs are the most popular. I compared it with an earlier dataset and quad and six core systems are slowly declining in popularity. Six-core CPUs were introduced in 2010; presumably most will be up for a refresh in 2016. I intend to track the CPU configurations to provide trend analysis on popular CPU configurations in 2016.

Core count of quad socket CPU systems
What about quad socket CPU systems? Which CPU configuration is the most popular? It turns out that CPUs containing 10 cores are the sweet spot when it comes to configuring a quad socket CPU system.

Memory configuration
Getting insight into the memory configuration of the servers gives us a clear picture of the compute power of these systems. What is the most popular memory configuration of a dual socket server? As it turns out, 256 and 384 GB are the most popular memory configurations. Today’s servers are getting beefy! Zooming into the dataset and querying the memory configuration of dual socket 8-core servers, the memory configuration distribution is as follows: What about the memory configuration of quad CPU socket servers?
NUMA
512 GB is the most popular memory configuration for quad CPU socket ESXi hosts. Assuming the servers are configured properly, this configuration provides the same amount of memory to each NUMA node of the system. The most popular NUMA node is configured with 128 GB in both dual and quad CPU socket systems.

ESXi version distribution
I was also curious about the distribution of ESXi versions amongst the dual and quad CPU socket systems. It turns out that 5.1.0 is the most popular ESXi version for dual CPU socket systems, while most quad CPU socket machines have ESXi 5.5 installed.

More To Come
Satyam and I hope to publish more results from our dataset in the coming months. The dataset is expanding rapidly, increasing the insight into datacenters around the globe. And we hope to cover other dimensions like applications and the virtualization layer itself. Please feel free to send me interesting questions you might have for the planet’s datacenters and we’ll see what we can do. Follow me on Twitter @frankdenneman
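If you want to compare your own hosts against these numbers, the facts used in this post (sockets, cores, memory and ESXi build) can be pulled with a few lines of PowerCLI. A minimal sketch, assuming an existing vCenter connection:

    # Socket/core/memory/build inventory per ESXi host.
    Get-VMHost | Select-Object Name,
        @{N='Sockets'; E={ $_.ExtensionData.Hardware.CpuInfo.NumCpuPackages }},
        @{N='Cores';   E={ $_.ExtensionData.Hardware.CpuInfo.NumCpuCores }},
        @{N='MemoryGB';E={ [math]::Round($_.MemoryTotalGB) }},
        Version, Build | Format-Table -AutoSize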
WHEN YOUR HOME LAB TURNS INTO A HOME DC
A little bit over a year ago I decided to update my lab and build two servers. My old lab had plenty of compute power, but the hosts were lacking bandwidth: 3 Gbit/s SATA and 1 Gbit/s network connections. I turned to one of the masters of building a home lab, Erik Bussink, and we thought that the following configuration was sufficient to handle my needs.

Overview
Component      Type                                 Cost
CPU            Intel Xeon E5 1650 v2                540 EUR
CPU Cooler     Noctua NH-U9DX i4                     67 EUR
Motherboard    SuperMicro X9SRH-7TF                 482 EUR
Memory         Kingston ValueRAM KVR16R11D4/16HA    569 EUR
SSD            Intel DC S3700 100GB                 203 EUR
Power Supply   Corsair RM550                         90 EUR
Case           Fractal Design Define R4              95 EUR
Price per server (without disks): 1843 EUR

The systems are great, but really quickly I started to hit some limitations. Limitations that I have addressed in the last year, and that are interesting enough to share.

Adding a third host
As FVP is a scale-out clustered platform, having two hosts to test with simply doesn’t cut it. For big scale-out testing I use nested ESXi, but for simple tests I just needed one more host. The challenge I faced was the dilemma of investing in “old” tech or going with new hardware. Intel updated their Xeon line to version 3; the Intel Xeon v3 has more cache (15MB instead of 12MB), more memory bandwidth, increased maximum memory support up to 768GB, and uses DDR4 memory (see the Intel ARK comparison). New shiny hardware might be better, but the main goal is to expand my cluster, and one of the things I believe in is a uniform host configuration within a cluster. Time is a precious resource and the last thing I want to spend time on is troubleshooting behavior that is caused by using non-uniform hardware. You might win some time by having a little bit more cache or more memory bandwidth, but once you need to troubleshoot weird behavior you lose a lot more. A dilemma wouldn’t be a dilemma if you didn’t go back and forth between the options, thus I researched whether mixing was possible. The most predominant difference is the change in CPU microarchitecture. The v3 is part of the new Intel Haswell microarchitecture, while the Xeon v2 is built upon Ivy Bridge. That means that the cluster has to run in EVC mode Ivy Bridge. The EVC dialog box of the cluster indicates that Haswell chips are supported in this EVC mode, thus DRS functionality remains available if I go for the Haswell chip. The Haswell chip uses DDR4 memory, and that means different memory timings and different memory bandwidth. FVP can use memory as a storage I/O acceleration resource and a lot of testing will be done with memory. That means that applications can behave differently when FVP decides to replicate fault tolerant writes to the DDR4 host or vice versa. In itself that is a very interesting test, thus again another dilemma is faced. However, these tests are quite unique and I’d rather have uniform performance across the cluster and avoid any troubleshooting caused by hardware disparity. Due to the difference in memory type, a new motherboard is required too. That meant that I had to find a motherboard that contains the same chipsets and network configuration. The SuperMicro X9SRH-7TF rocks; onboard 10 GbE is excellent. Some other users in the community have reported overheating problems. Erik Bussink was hit hard by the overheating problem and bought another board just to get rid of weird errors caused by the overheating. That by itself made me wonder whether I should buy another X9SRH-7TF or go for a newer Supermicro board and buy a separate Intel X540-T2 dual-port 10GbE NIC to get the same connectivity levels.
After weighing the pros and cons I decided to go for the uniform cluster configuration, primarily because testing and understanding software behavior is hard enough. Second-guessing whether behavior is caused by hardware disparity is a time-sucking beast and, even worse, it typically kills a lot of the joy in your work. Contrary to popular belief, prices of older hardware do not decline forever; due to the availability of newer hardware and remaining stock, prices go up. The third host was almost 500 euros more expensive than the price I previously paid.

Networking
Networking is interesting as I changed a lot during the last year. The hosts are now equipped with an Intel PRO/1000 PT dual-port NIC with the 82571 chip. Contrary to my initial post, these are supported by vSphere 5.5. However, network behavior is a large part of understanding scale-out architectures, thus more NIC ports are needed. An additional HP NC365T quad-port Ethernet server adapter was placed in each server. The HP NIC is based on the Intel 82580 chipset but is a lot cheaper than buying Intel-branded cards. Each host has one NIC port dedicated to IPMI, two 10GbE ports and six 1GbE ports. In hindsight, I would rather go for two quad-port NIC cards as that allows me to set up different network configurations without having to tear them down each time. With the introduction of the third host I had to buy a 10GbE switch. The two hosts were directly connected to each other, however this configuration is not possible with three hosts. Thus I had to look for a nice, cheap 10GbE switch that doesn’t break the bank and is quiet. Most 10GbE switches are made for the datacenter, where noise isn’t really a big issue. My home lab is located in my home office, and spending most of my day with something that sounds like a jet plane is not my idea of fun. The NETGEAR ProSafe Plus XS708E 8-port 10-Gigabit switch fit most of my needs. Eight ports for less than 900 euros; it’s kind of a steal compared to the alternatives. However, I wasn’t really impressed by the noise levels (and spending 900 euros, but that’s a different story). Again my main go-to guy for all hardware-related questions, Erik Bussink, provided the solution: the Noctua NF-A4x10 FLX fans. Designed to fit into 1U boxes, they were perfect. But as you can see, the design of the Netgear is a bit weird. The fans are positioned at the far end of the PCB with all the heatsinks. When the switch is properly loaded, the thing emits a lot of heat, regardless of what type of internal fan is used. To avoid heat buildup in the switch I used simple physics, but I will come to that later. Now having three hosts with seven 1 GbE connections each, two storage systems eating up three ports and an uplink to the rest of the network, I needed a proper switch. Lesson learned in that area: research thoroughly before pressing the buy button. I started off with buying an HP 1810-24G v2 switch. Silent, 24 ports, VLAN support. Awesome! No, not awesome, because it couldn’t route between VLANs. And to the observant reader: 25 ports required, 24 ports offered. A VCDX-esque constraint. To work around the 24-port limitation I changed my network design and wrote some scripts to build and tear down different network configurations. Not optimal, but dealing with home labs is almost like the real world. While testing network behavior and hitting the VMkernel network stack routing problem, I decided it was time to upgrade my network with some proper equipment. I asked around in the community and a lot were using the Cisco SG300 series switch.
Craig Kilborn on Twitter blogged about his HP v1910 24G and told me that it was quite noisy. A Noctua hack might do the trick, but I actually wanted more than 24 ports. Erik pointed out the Cisco SG500-28-K9-G5 switches, which are stackable and fanless. Perfect! I could finally use all the NICs in my servers and have room for some expansion.

Time for a new rack
So from this point on I have three 19” switches; the IKEA Lack hack table was nice, but these babies deserved better. The third host didn’t fit the table, therefore new furniture had to be bought anyway. After spending countless hours looking at 19” racks I came across a 6U patch case. This case has a lockable glass door (kids) and removable side panels, perfect for my little physics experiment. Just place the case in an upright position, remove the side panels and let the heat escape from the top. The fans will suck in “cold” air from the bottom. The dimensions of the patch case were perfect as it fitted exactly in my setup. The case is an Alfaco 19-6406. With all this networking equipment I feel that my home lab is slowly turning into a #HomeDC. With all this compute and network power I wanted to see what you can do when you have enterprise-grade flash devices. I’m already using the Intel DC S3700 SSDs and I’m very impressed by their consistently high performance. However, Intel has released the Intel SSD DC P3700 PCIe card that uses NVMe. I turned to Intel and they were so generous as to loan me three of these beasts for a couple of months. The results are extremely impressive; soon I will post some cool test results, but imagine seeing more than 500,000 IOPS in your #HomeDC on a daily basis.

Management server
To keep the power bill as low as possible, all three hosts are shut down after testing, but I would like to have the basic management VMs running. In order to do this, I used a Mac Mini. William Lam wrote extensively about how to install ESXi on a Mac; if you are interested I recommend checking out his work: http://www.virtuallyghetto.com/apple. Unfortunately 16GB is quite limited when you are running three Windows VMs with SQL databases, therefore I might expand my management cluster by adding another Mac Mini. Time to find me some additional sponsors. :)
BALLOONING, QUEUE DEPTHS AND OTHER BACK PRESSURE FEATURES REVISITED
Recently I’ve been involved in a couple of conversations about ballooning, QoS and queue depths. Remarks like “ballooning is bad”, “increase the queue depth” and “use QoS” are just the sound bites that spark the conversation. What I learn from these conversations is that we might have lost track of the original intention of these features.

Hypervisor resource management
Features such as ballooning and queue depths were invented to bridge the gap between resource demand and resource availability. When a system experiences a state where resource demand exceeds the resources it controls, the system has a problem. This is especially true in a system such as a hypervisor, where you cannot control the demand directly. A guest operating system or an application in a virtual machine can demand a lot of resources. The resource management schedulers inside the hypervisor are tasked with fulfilling the demand of that particular machine while at the same time satisfying the resource demand of other virtual machines. Typically the guest OS resource schedulers are not integrated with the hypervisor resource schedulers, and this can lead to a situation in which the administrator resorts to taking draconian measures: measures such as disabling ballooning, or increasing the queue depth to the digital equivalent of the Mariana Trench. Sometimes it is taken for granted, but resource management inside the hypervisor is actually a tough challenge to solve. Much research is done on solving this problem; a lot of research papers trying to find an answer to this challenge are published on a monthly basis. Let’s step back, take a look from an engineer’s perspective (or should I say developer’s?) and see what the problem is and how to solve it in the most elegant way. It’s the architect’s job to understand that this functionality is not a replacement for a proper design. Let’s start by understanding the problem.

Load-shedding or back pressure
When dealing with the situation where resource demand exceeds resource availability, you can do two things. Well, if you don’t do anything, you are likely to encounter a system failure that can affect a lot more than only that particular resource or virtual machine. Overall you don’t design for system failure; you want to avoid it, and to do so you can either drop the load or apply some form of back pressure. I think we all agree that dropping load, sometimes referred to as load-shedding, is not the most elegant way of dealing with temporary overload; that’s why a lot of effort goes into back pressure features. A back pressure feature that everyone is familiar with is the memory balloon driver. Guest OS memory schedulers deal with used and free memory in a way that is not visible to the hypervisor. When the hypervisor is running out of physical machine memory it needs to figure out a way to reclaim memory. By using the balloon driver, the hypervisor asks the guest OS memory scheduler to provide a list of pages that it doesn’t use or doesn’t deem as important. After getting that information, the hypervisor proceeds to free up the physical memory pages to be able to satisfy incoming memory requests. Instead of dropping the new incoming workload it applies back pressure in the most elegant way. I don’t know why people still talk about ballooning as bad. The feature is awesome; it’s the architect’s or sysadmin’s job to come up with a plan to avoid back pressure in the system. Again, the back pressure feature is not a substitute for proper design and management.
But the most misunderstood back pressure feature could be queue depths. Sometimes I hear people refer to queue depths as a performance enhancement, and this is not true. It just allows you to temporarily deal with I/O overload. The best way to get a clear understanding of queue depths is to use the bathroom sink analogy. The drain is the equivalent of the data path leading to the storage array, the sink itself is the queue sitting on top of the data path / drain, and the faucet represents the virtual machine workloads. Typically you open up the faucet to a level that allows the drain to cope with the flow of water. The same applies to virtual machine workloads and the underlying storage system: you run an amount of workload that is suitable for your storage system. The moment you open up the faucet further, your sink will fill up and at some point your sink will overflow. Thus you need some back pressure mechanism. In the bathroom sink world this is typically done by letting the water flow back into a second sink. In the I/O scheduler world this typically results in a queue-full condition. This bogs down the performance of the virtual machine so much that many admins/architects respond by increasing the queue depth, because it allows them to avoid the queue-full state (temporarily). But in essence you have just replaced your bathroom sink with a bigger sink, or, when people go overboard, with the digital equivalent of a full-size bathtub. This bathtub impacts a lot of other workloads, as many workloads now end up at the top of the queue instead of in the deeper part, waiting their turn to go through the drain to the storage system. Result: latency increases in all applications due to improperly designed systems. And remember, when the bathtub overflows you typically have a bigger mess to deal with. Back pressure features are not a substitute for proper design; therefore think about implementing a bigger drain, or even better, multiple drains. More bandwidth or just more data paths to the same storage system only briefly delay the same back pressure problem; it just occurs at a different level. Typically when a storage controller fills up its cache, it sends a queue-full to all the connected systems, so the problem has now evolved from a system-wide problem to a cluster-wide problem. This is one of the big reasons why scale-out storage systems are a great fit in the virtual datacenter: you create drains to different sewer systems, typically in a more plannable manner. If you are looking for more information about this topic, I published a short series on the challenges of traditional storage architectures in virtual datacenters. Quality of Service faces the same predicament. QoS is a great feature for dealing with temporary overload, but again, it is not a substitute for a proper design. Back pressure features are great; they allow the system to deal with resource contention while avoiding system failures. These features are indispensable in the dynamic virtual datacenter of today. When you detect that these features are activated on a frequent basis, you must review the virtual datacenter architecture, the current workloads and future workloads. I think overall it all boils down to understanding the workload in your system and having an accurate view of the capabilities of your systems. Proper monitoring and analytics tools are therefore indispensable.
Not only for daily operations but also for architects dealing with maintaining a proper service level for their current workloads while architecting an environment that can deal with unknown future workloads.
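As a starting point for that kind of monitoring, the hypervisor already exposes whether the memory back pressure mechanisms are firing. A minimal PowerCLI sketch (assuming an existing vCenter connection) that lists VMs with ballooned or swapped memory; queue behavior itself is best observed with esxtop's disk adapter and device views:

    # List VMs where the balloon driver or hypervisor swap is currently active.
    Get-VM | Select-Object Name,
        @{N='BalloonedMB'; E={ $_.ExtensionData.Summary.QuickStats.BalloonedMemory }},
        @{N='SwappedMB';   E={ $_.ExtensionData.Summary.QuickStats.SwappedMemory }} |
        Where-Object { $_.BalloonedMB -gt 0 -or $_.SwappedMB -gt 0 } |
        Sort-Object BalloonedMB -Descending | Format-Table -AutoSize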
VMWARE TOOLS IS OUT OF DATE ON THIS VIRTUAL MACHINE WHILE SUMMARY STATES CURRENT
For some reason all my virtual machines show an alert that VMware Tools is out of date, while the summary states that it’s running the current version. When trying to upgrade VMware Tools, all options are grayed out. It appears to be a cosmetic error, but I still wanted to know why it shows the alert on my virtual machines. As it turns out, I created my templates on my management cluster host (lab), and that host runs a newer build of vSphere 5.5 (2068190). My workload cluster hosts run a slightly older build of vSphere 5.5, which uses a different version of VMware Tools. The VMware Tools version list helps you identify which version of VMware Tools installed in your virtual machine belongs to which ESXi version: https://packages.vmware.com/tools/versions. Hope this clarifies this weird behavior of the UI for some.
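To compare the Tools versions of your own templates and VMs against that list, a short PowerCLI sketch (assuming an existing vCenter connection) is enough:

    # Report the VMware Tools version and status as reported by each VM's guest info.
    Get-VM | Select-Object Name,
        @{N='ToolsVersion'; E={ $_.ExtensionData.Guest.ToolsVersion }},
        @{N='ToolsStatus';  E={ $_.ExtensionData.Guest.ToolsVersionStatus }} |
        Sort-Object ToolsVersion | Format-Table -AutoSize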
HAMMER, MAGICBANDS AND CHALLENGING STATUS QUO
One of my favorite business books is Michael Hammer’s famous “Reengineering the Corporation”. The theme of the book is to take a hard look at business processes and radically change these old, existing processes. Hammer states that companies typically speed up their processes by implementing a newer iteration of existing technology. Many processes date from before the advent of computers, and just automating such a process only optimizes performance marginally. Embedding computers in archaic processes cannot address their fundamental performance challenges. By understanding what the process is trying to achieve, one can break away from the existing design principles of the process. A great example, and one of my favorite technologies, is Disney’s MagicBand and the way it’s used to radically change processes. (Photo by Adam Voorhes / wired.com)

Disney’s MagicBand
The MagicBand is what is commonly referred to as a wearable. Inside the bracelet are an RFID chip and a radio. The parks have long-range and short-range scanners along with sensors to interact with the MagicBand bracelet. One of the perks of my job is to speak to people who deliver cutting-edge technology, and as you can imagine I was very thrilled to speak to some of the team members who worked on the MagicBand platform. In essence the Disney MagicBand replaces every transaction between the customer and cast members or Disney parks and resorts. The MagicBand becomes your key to everything: it allows access to the park, access to the resorts and automatic payment. Its goal is to create a frictionless experience for the customer, increasing satisfaction, which of course will increase spending. Instead of tinkering with existing processes, Disney overhauled a lot of processes and many more are to follow. A great example is the overhaul of the check-in process, completely in line with the Hammer doctrine. Instead of buying newer, faster desktop computers to speed up the check-in process, customers can go directly to their hotel room, bypassing the check-in process altogether. Disney sends the MagicBand to your home, and prior to your visit the resort informs you via email which room is yours for your stay. Just walk up to your room and unlock the door by tapping your MagicBand against the sensor on the door. MagicBands allow Disney to get rid of the ancient turnstiles that are the first point of anxiety for new parents and their strollers; instead of being funnelled into narrow, cramped aisles, the entrance is now an inviting V shape with incredible processing speeds. Just hold your MagicBand to the access point and wait to be greeted by a welcoming green glow; from then on it’s off to your favorite experience. The MagicBand platform allows for new experiences as well. What about having a more personalized interaction with cast members? What if Cinderella greets your daughter by name and tells her that she knows she is her favorite princess, or wishes her a happy birthday? Just mind-blowing, and an experience she will never forget. Range scanners, sensors, Wi-Fi, smartphone apps, user profiles and an insane amount of data crunching make this happen. Customers use the smartphone app to access their schedules and their user profiles. Countless short- and long-range sensors scattered across the park pick up the signals of the bracelets.
These systems are connected to each other, they collect data, and they use the captured data to optimize the experience of the customers. Data on visitor traffic flow, food orders and waiting times can be used to realign internal resources. Ever heard of the Internet of Things? This is the poster child of the Internet of Things.

Stop! Hammer Time
Circling back to the opening statement: using a newer iteration of existing technology only provides marginal performance increases. Advances like the MagicBand require new technologies and new ways to operationalize these technologies in datacenters. Not every company is the size of Disney, but one thing is certain: most companies face the same challenges Disney does. How to reduce cost, increase efficiency and provide a new experience that makes them unique in a highly competitive market? A lot of brilliant people are trying to solve these problems by creating technical solutions, and it’s up to the IT team to understand whether these suit their operating models. How can you reengineer your corporation and create a new service offering while your IT is stuck in the past? Stuck using systems that are designed to work in infrastructures dating back to the early ’70s, when they had just found out that the market was bigger than five computers? Everything has to align: when the business changes its models, the IT team should not take anything for granted; they too need to aim for quantum leaps in performance. Markets shift rapidly; IT needs to be able to respond almost in a way that anticipates their needs. Maybe even in a way that it doesn’t feel remarkable at all; high service standards are the norm! Within the realm of virtualized datacenters, two technology advancements can create an experience that might feel ubiquitous to the business but provide the magic to the IT team: scale-out storage and object-based storage.

Scale-out storage and object-based storage
Proper scale-out storage systems allow you to operationalize new advancements in storage technology as soon as they are available. They allow virtual datacenters to cater to any performance requirement possible at any time, while, and this is very important, not impacting current workloads that are using the same platform. Many application vendors are moving away from a monolithic application architecture; why keep holding on to the relics of the past by using a monolithic storage architecture for performance requirements? Object-based storage, such as VMware VVols, allows you to fundamentally change how data services are provided to systems (virtual machines). Instead of creating management constructs and aligning data services to these logical layers (LUNs and datastores), data services can be applied directly to the specific machine. Virtual machines become first-class citizens on storage systems, allowing IT teams to cater to requirements of both the business and the IT team.

Challenge the status quo
Hammer stated: don’t ask “How can we do what we do faster?” but ask “Why do we do it the way we do?” In essence, challenge the status quo if you want to keep moving forward in a time that introduces new technologies and application landscapes on a daily basis!
DB DEEPDIVE PART 6: QUERY PLANS, INTERMEDIATE RESULTS, TEMPDB AND STORAGE PERFORMANCE
Welcome to part 6 of the Database workload characteristics series. Databases are considered to be one of the biggest I/O consumers in the virtual infrastructure. Database operations and database design are a study in themselves, but I thought it might be interesting to take a small peek underneath the surface of database design land. I turned to our resident database expert Bala Narasimhan, PernixData’s VP of Products, to provide some insights about database designs and their I/O preferences. Previous instalments of the series:

Part 1 - Database Structures
Part 2 - Data pipelines
Part 3 - Ancillary structures for tuning databases
Part 4 - NoSQL platforms
Part 5 - Query Execution Plans

In a previous article I introduced the database query optimizer and described how it works. I then used a TPC-H like query and the SQL Server database to explain how to understand the storage requirements of a query via the query optimizer. In today’s article we will deep dive into a specific aspect of query execution that severely impacts storage performance: intermediate results processing. For today’s discussion I will use the query optimizer within the PostgreSQL database. The reason I do this is that I want to show you that these problems are not database specific. Instead, they are storage performance problems that all databases run into. In the process I hope to make the point that these storage performance problems are best solved at the infrastructure level as opposed to doing proprietary infrastructure tweaks or rewrites within the database. After a tour of the PostgreSQL optimizer we will go back to SQL Server and talk about a persistent problem regarding intermediate results processing in SQL Server: tempdb. We’ll discuss how users have tried to overcome tempdb performance problems to date and introduce a better way.

What are intermediate results?
Databases perform many different operations such as sorts, aggregations and joins. To the extent possible a database will perform these operations in RAM. Many times the data sets are large enough and the amount of RAM available is limited enough that these operations won’t fully fit in RAM. When this happens these operations will be forced to spill to disk. The data sets that are written to and subsequently read from disk as part of executing operations such as sorts, joins and aggregations are called intermediate results. In today’s article we will use sorting as an example to drive home the point that storage performance is a key requirement for managing intermediate results processing.

The use case
For today’s example we will use a table called BANK that has two columns, ACCTNUM and BALANCE. This table tracks the account numbers in a bank and the balance within each account. The table is created as shown below:

Create Table BANK (AcctNum int, Balance int);

The query we are going to analyze is one that computes the number of accounts that have a given balance and then provides this information in ascending order by balance. This query is written in SQL as follows:

Select count(AcctNum), Balance from BANK GROUP BY Balance ORDER BY Balance;

The ORDER BY clause is what will force a sort operation in this query. Specifically, we will be sorting on the Balance column. I used the PostgreSQL database to run this query. I loaded approximately 230 million rows into the BANK table and made sure that the cardinality of the Balance column is very high. Below I have a screenshot from the PostgreSQL optimizer for this query.
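As a hedged aside for readers who want to reproduce the plan themselves: the information in the screenshot can be generated with PostgreSQL's EXPLAIN ANALYZE, which reports whether the sort ran in memory or spilled to disk. The database name below is a placeholder; invoked via the psql client from PowerShell it looks roughly like this:

    # Run EXPLAIN (ANALYZE, BUFFERS) against the BANK table; look for the
    # "Sort Method: external merge  Disk: ..." line in the output to see the spill.
    & psql -d bankdb -c "EXPLAIN (ANALYZE, BUFFERS) SELECT count(AcctNum), Balance FROM BANK GROUP BY Balance ORDER BY Balance;"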
Note that the query will do a disk-based merge sort and will consume approximately 4 GB of disk space to do this sort. A good chunk of the query execution time was spent in the sort operation. A disk-based sort, and other database operations that generate intermediate results, is characterized by large writes of intermediate results followed by reads of those results for further processing. IOPS is therefore a key requirement. What is especially excruciating about the sort operation is that it is a materialization point: the query cannot make progress until the sort is finished. You’ve essentially bottlenecked the entire query on the sort and the intermediate results it is processing. There is no better validation of the fact that storage performance is a huge impediment to good query times.

What is tempdb?
tempdb is a system database within SQL Server that is used for a number of purposes, including the processing of intermediate results. This means that if we run the query above against SQL Server, the sorting operation will spill intermediate results into tempdb as part of processing. It is no surprise then that tempdb performance is a serious consideration in SQL Server environments. You can read more about tempdb here.

How do users manage storage performance for intermediate results, including tempdb?
Over the last couple of years I’ve talked to a number of SQL Server users about tempdb performance. This is a sore point as far as SQL Server performance goes. One thing I’ve seen customers do to remediate the tempdb performance problem is to host tempdb alone on arrays that have fast media, such as flash, in the form of either hybrid arrays or All Flash Arrays (AFA). The thought process is that while the ‘fast array’ is too expensive to standardize on, it makes sense to carve out tempdb alone from it. In this manner, customers look at the ‘fast array’ as a performance band-aid for tempdb issues. On the surface this makes sense, since an AFA or a hybrid array can provide a performance boost for tempdb. Yet it comes with several challenges. Here are a few: