REPEAT SESSION VSPHERE CLUSTERING DEEP DIVE AT VMWORLD EUROPE

Good news for the VMworld attendees who could no longer sign up for the vSphere Clustering Deep Dive session on Tuesday. I’m happy to announce that the VMworld team scheduled a repeat of the vSphere Clustering Deep Dive session on Thursday 08 November from 10:30 to 11:30.

Session Outline

In this session, Duncan and Frank will take you through the trenches of VMware vSphere Distributed Resource Scheduler (DRS) and vSphere High Availability (HA). Find out about options to optimize your DRS settings for your specific requirements and goals, such as whether you should load-balance on active or consumed memory, what has recently changed in the DRS algorithm, and whether it will impact DRS behavior. For vSphere HA, you will learn when it restarts virtual machines (VMs), what kind of restart times to expect, and where you can find evidence that a VM (or multiple VMs) has been restarted. You will find out about all of these items and more. Prepare to dive deep, as the basics will not be covered. Don’t wait too long to register; VMworld Europe room sizes max out at 400 people. We hope to see you there!

COMPUTE POLICY IN VMWARE CLOUD ON AWS

The latest update of VMware Cloud on AWS introduced a new feature called compute policies. In its initial release, compute policies provide the ability to configure affinity rules and mobility control based on declarative policies and vSphere tags.

Management of affinity rules

Historically, affinity rules are part of the cluster configuration. Within VMware Cloud on AWS, cluster configuration is controlled by VMware, and thus customers cannot set affinity rules for virtual machines running within the SDDC. Instead of merely pulling the affinity rules configuration outside the cluster configuration, we decided to improve the affinity functionality and work towards a more uniform and consistent experience across multiple clouds.

The road to declarative policies

Within a declarative system, you describe what you want to happen. This is the opposite of imperative operations, where you specify actions. Declarative commands define state, and to some extent affinity rules are declarative statements. Let’s take VM anti-affinity rules as an example. You want to keep VM1 and VM2 separated in different fault domains. Instead of the imperative actions of pinning VM1 to host A and pinning VM2 to host B, you create an anti-affinity rule with VM1 and VM2 as members. You state that these two VMs should not run on the same ESXi host. vCenter (DRS) controls placement and takes the necessary actions to solve any violations of this intent. We want to apply this model to other features. Instead of logging into vCenter to deal with configuration issues and manually correcting the situation, we want vCenter to manage these functions on your behalf. The way you interact with vCenter, in this more declarative way, is with policies. Instead of specifying detailed imperative actions, you declare your intent, and the only thing you monitor after that is whether the policy is compliant or not.
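The declarative model can be illustrated with a minimal sketch. This is hypothetical Python, not the actual compute policy API: the policy structure, the `is_compliant` helper, and the host/VM names are all assumptions made for illustration. The point is that you declare the intent (members that must not share a host) and the control plane checks compliance, rather than you issuing pinning actions.

```python
# Illustrative sketch of declarative intent vs. imperative actions.
# The policy structure below is hypothetical, not the real VMC API.

def is_compliant(policy, placement):
    """Check whether a VM-VM anti-affinity policy is satisfied.

    policy: dict describing the member VMs that must not share a host.
    placement: dict mapping VM name -> ESXi host it currently runs on.
    """
    hosts = [placement[vm] for vm in policy["members"]]
    # The intent holds when no two member VMs share the same host.
    return len(hosts) == len(set(hosts))

# Declare the intent: keep VM1 and VM2 on different ESXi hosts.
anti_affinity = {"type": "vm-vm-anti-affinity", "members": ["VM1", "VM2"]}

print(is_compliant(anti_affinity, {"VM1": "host-a", "VM2": "host-b"}))  # True
print(is_compliant(anti_affinity, {"VM1": "host-a", "VM2": "host-a"}))  # False
```

In a declarative system, a `False` result is not an error you fix by hand; it is a violation the control plane (DRS) resolves by migrating a member VM.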
We have to start somewhere, thus we concentrated on affinity rules (VM-VM and VM-Host) and anti-mobility (vMotion disabled) policies. Once we have this more abstract way of interacting with vCenter Server, it provides more advantages. One of them is an additional level of abstraction, and abstraction allows for a more uniform and consistent experience across multiple clouds. With today’s on-prem setup, you configure your cluster for a particular workload, and this could inhibit the ability to move your workload to another cluster, on-prem or even to the cloud. To make sure you can easily burst out to VMware Cloud environments, you want this to be seamless. The direction we are going in is that you should not need configurations that are specific to on-prem clusters versus in-cloud or at-edge clusters. Ideally, you express what you want, and it is the job of the cloud control plane, such as vCenter, to push this configuration to the environment the workload is presently in, whether that is an on-prem cluster or an in-cloud cluster.

Compute policies are active at vCenter level

Due to this model, the rules are decoupled from the cluster level and are now managed at the vCenter level. If you configure a VM-VM anti-affinity rule and move the VMs to another cluster, the policy remains active. At the time of writing, VMware Cloud on AWS allows the customer to create 10 clusters per SDDC. Clusters can span multiple AWS availability zones (AZs). The VM-Host affinity ruleset allows customers to tag the hosts per AZ and tag the VMs that need to remain in that availability zone. You can move the VMs between clusters to hosts within the same AZ; the compute policy remains active while vCenter ensures the compliance of the rule.

Introduction of firm rules

An interesting fact is that the VM-Host rules are firm rules. These firm rules differ from the traditional soft (“should run on”) and hard (“must run on”) rules; they sit in between the two.
DRS cannot violate these rules; the only exception is when a host is placed in maintenance mode. This ensures that during normal operations the rules are never broken, while still providing VMware the ability to service the SDDC. The only time a host is placed into maintenance mode in VMware Cloud on AWS is during upgrades, which are handled by VMware and communicated well before the service window. This allows the customer to plan a strategy for these virtual machines well ahead of the service window. In the next article, I will go through the steps of creating a compute policy.
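The difference between soft, firm, and hard rules can be sketched as a small decision function. This is an assumption-laden illustration of the semantics described above, not actual DRS code; the function name and parameters are invented for the example.

```python
# Hypothetical sketch of how a "firm" VM-Host rule differs from soft/hard
# rules: DRS may only violate a firm rule when the source host enters
# maintenance mode (e.g., during a VMware-managed upgrade).

def may_migrate(rule_type, target_host_in_rule, source_in_maintenance):
    """Decide whether DRS may place a VM on the proposed target host."""
    if target_host_in_rule:
        return True                   # placement inside the host group is always fine
    if rule_type == "soft":
        return True                   # "should run on": DRS may deviate
    if rule_type == "firm":
        return source_in_maintenance  # violated only to evacuate a host
    if rule_type == "hard":
        return False                  # "must run on": never violated
    raise ValueError(f"unknown rule type: {rule_type}")

print(may_migrate("firm", False, False))  # False: rule holds during normal operations
print(may_migrate("firm", False, True))   # True: host evacuation for maintenance
```

Under this sketch, a firm rule behaves like a hard rule day-to-day, but like a soft rule during host evacuation, which is exactly the "in between" behavior the post describes.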

MY NEW ROLE

A couple of months ago I joined the Office of the CTO of the Cloud Platform Business Unit and started reporting directly to the CTO, Kit Colbert. Kit asked me to select a few areas to focus on. One of these areas is running Kubernetes on vSphere. I’ve increased my focus on Kubernetes, as this architecture becomes increasingly important in the datacenter. When talking to customers, two questions I ask are: what is the current ratio of VMs to containers in your data center, and what is the most popular format of deployment today? The common responses are, respectively, 90% VMs and 90% containers for net-new deployments. Today’s trend moves away from installing shrink-wrapped software and towards revenue-critical applications custom-built by in-house development teams. The standard tool for developers is container-based infrastructure. Kubernetes is the de facto choice for orchestrating containers and offers many infrastructure-focused options. The operations that interest me are the high availability and resource management operations. At first glance, these operations appear to replace the HA and DRS processes, but on closer inspection they strongly augment each other. At VMworld in Las Vegas, Michael Gasch and I presented the session “Deep Dive: The Value of Running Kubernetes on vSphere” (CNA1553BU). If you are not going to VMworld Europe, I recommend watching the video recording; if you are going, I recommend signing up. One thing you can expect from me is more Kubernetes-focused articles. One of the things I noticed is that many articles are written by cloud natives for cloud natives, i.e., they rely on extensive previous exposure to this ecosystem. I’m trying to cover some of the challenges I have faced and the quirks I notice as a newcomer.

HELP US MAKE VMOTION EVEN BETTER

The vMotion product team is looking for input on how to improve vMotion. vMotion has proven to be a paradigm shift in datacenter management. Workload mobility is a must-have requirement in today’s datacenter operational model. vMotion handles the majority of workloads flawlessly. However, there are some corner cases that introduce challenges. The vMotion product team is interested in these corner cases in order to improve the vMotion architecture, bringing workload mobility to all workloads everywhere.

TERMINAL AFFINITY POLL

We are looking into the combination of licensed workloads and hard affinity rules (“must run on” rules). If you deploy this in your environment right now, how do you deal with it during maintenance hours? Your input helps shape future features. (Scroll down in the survey window to access the Done button and submit your response.)

SIX INTERESTING KUBERNETES SESSIONS AT VMWORLD 2018

This year VMworld provided a broad selection of talks focusing on various forms of Kubernetes, which is not surprising at all. Many organizations are moving away from buying and installing shrink-wrapped software and towards in-house built custom applications. And what is the modern developer tool of choice? For many, it is the container. 1.5 billion containers are expected to have shipped by the end of 2021. Containers are nothing more than a new format of virtualized workload. Michael Gasch explains it very well in our session Deep Dive: The Value of Running Kubernetes on vSphere (CNA1553BU): containers are task structs in the Linux kernel, not very different from executing an ls command. Well, a bit more than that, as containers require CPU, memory, network, storage, and security. Containers satisfy the developers’ need for speed, and they remove dependencies on underlying operating systems. When deploying massive amounts of containers, you need a container management platform, and Kubernetes is clearly the de facto standard in the industry (source: Cloud Native Computing Foundation, https://www.cncf.io/blog/2017-06-28-survey-shows-kubernetes-leading-orchestration-platform/). For the infrastructure team, running Kubernetes can provide a way to create an infrastructure-agnostic platform; that is, it can run on any cloud. VMware is fully invested in making this happen: you can run containers natively (VIC), or containers and Kubernetes in Linux VMs on vSphere. Pivotal Container Service (PKS), on-prem or in-cloud, helps customers deploy and operationalize a day 1 and day 2 Kubernetes solution, and VMware Kubernetes Engine (VKE) offers Kubernetes as a Service for organizations who want to consume Kubernetes without owning, building, or operationalizing any infrastructure. I’ve selected a few VMworld sessions that cover these container consumption models. There are many more; please check them out at the VMworld On-Demand Video Library.
Container and Kubernetes 101 for vSphere Admins (CNA1564BU)

A very popular session at VMworld was the 101 session for vSphere admins. Nathan Ness and Sachin Thatte go over the basics of containers, Kubernetes, and Pivotal Container Service. A very helpful primer for the rest of the listed videos. (Watch here)

Running Kubernetes on vSphere

Deep Dive: The Value of Running Kubernetes on vSphere (CNA1553BU)

Michael Gasch (resident Kubernetes expert at VMware) and I go over the reasons why vSphere and Kubernetes are better together. We provide guidelines on how to successfully run your Kubernetes environment. (Watch here)

A Deep Dive on Why Storage Matters in a Cloud-Native World (HCI1813BU)

Since 7 out of 10 applications that run in containers are stateful applications (source: Datadog), you want to provide persistent storage. Myles and Tushar talk about project Hatchway and provide a preview of the upcoming Cloud Native Storage (CNS) control plane. (Watch here)

Operating and Managing Kubernetes on Day 2 with PKS (CNA1075BU)

If you are planning to run large-scale Kubernetes deployments on-prem, you should consider Pivotal Container Service (PKS). PKS allows you to deploy multiple Kubernetes clusters quite easily. Thomas Kraus and Merlin Glynn show how to tackle day 2 operations and review SDDC products, such as vRealize and Wavefront, that integrate with PKS. (Watch here)

VMware Kubernetes Engine

VMware Kubernetes Engine (VKE) offers a turn-key solution of managed Kubernetes clusters that run natively on AWS. Not in VMware Cloud on AWS, not on vSphere, pure native EC2! Plans are to run VKE at multiple cloud providers, allowing you to create environments that no single cloud provider can offer on its own. Think of an HA cluster spanning both AWS and Azure. We are not there yet, but it is interesting to take a look at what VKE is and how Smart Clusters will change the way you operate Kubernetes.
Intro to VMware Kubernetes Engine-Managed K8s Service on Public Cloud (CNA2084BU)

Tom and Valentina go over the concepts and customer value of VKE, including a nice demo. (Watch here)

Deep Dive: VMware Kubernetes Engine-K8s as a Service on Public Cloud (CNA3124BU)

After getting familiar with VKE, I recommend watching Tom and Alain’s session. They dive deeper into the concept of Smart Clusters. (Watch here)

I hope you enjoy watching these sessions. Please leave a comment about sessions you think are worth watching.

TECH PAPER DRS ENHANCEMENTS IN VSPHERE 6.7

During VMworld, the DRS performance team released a new tech paper covering the DRS enhancements in vSphere 6.7. It’s a short white paper that details the interesting improvements made to DRS. Download it here.

CATCH ME AT VMWORLD 2018

Two weeks left before the biggest VMware show happens again, and I can’t wait for it to start. For the last eight years I’ve been going to both the US and European shows, and both have their own charm. But there is one thing that every VMware community member should experience, and that is the US welcome reception in the Solutions Exchange on Sunday night. Almost every attendee in one big room; the buzz is just phenomenal. I recently joined the team of Kit Colbert, CTO of the Cloud Platform business unit. In my new role, I work on upcoming products and influence their strategy. One project I focus on is how VMware can help customers run Kubernetes successfully on vSphere. Please reach out to me at VMworld if you have ideas or feedback. I will be presenting a few sessions this year as well, and I hope to see you there:

VIN1249BU vSphere Clustering Deep Dive, Part 1: vSphere HA and DRS - 2018-08-27 12:30 PM

The legendary session is back: Duncan and I talking about vSphere 6.7 HA and DRS. There is so much to tell, but we are hoping to keep some time open for questions.

CNA1553BU Deep Dive: The Value of Running Kubernetes on vSphere - 2018-08-27 3:30 PM

I’m really looking forward to this session, together with Michael Gasch, our resident Kubernetes expert and popular KubeCon speaker. In this session, we will go over the reasons why vSphere and Kubernetes are better together and provide you with some guidelines on how to successfully run your Kubernetes environment.

VIN2256BU Tech Preview: The Road to a Declarative Compute Control Plane - 2018-08-28 12:30 PM

I tweeted about every session on this list except this one. The reason I had to keep quiet about this session is that we are showing some NDA material. In this session, Maarten Wiggers and I look at the changes that are happening in the industry. Most companies develop their strategic apps in-house, impacting the role of the VI admin.
We will go over the transformation from VI admin to site reliability engineering. With new technologies and different life cycle management strategies, different ways of managing applications and infrastructure are necessary. We go over the change from an infrastructure that responds to imperative statements to an environment that is controlled by declarative statements. Within the software-defined data center (SDDC), VMware vSphere offers two declarative control planes: one for networking and one for storage. However, there is no declarative control plane for compute yet. We will tech-preview the capabilities introduced in the VMware Cloud SDDC as a path to achieving that goal.

VIN1738BU vSphere Host Resources Deep Dive: Part 3 - 2018-08-29 2:00 PM

The third edition of the vSphere Host Resources Deep Dive. The vSphere platform is designed to run most workloads at near bare-metal performance, more than enough for over 95% of workloads. But what if you need to squeeze out that last bit of performance? How can you do it, and how will it impact the rest of the system? Please join Niels and me on Wednesday at 2:00 PM.

VSPHERE 6.X DEEP DIVE RESOURCE KIT COMPLETED

The new version of the vSphere clustering deep dive is available on Amazon. The vSphere 6.7 Clustering Deep Dive is the fourth edition of the best-selling series. Over 50,000 clustering deep dive books have been distributed, and I hope this version will find its way onto your desk. The new version of the clustering deep dive covers HA, DRS, Storage DRS, Storage I/O Control, and Network I/O Control. In the last part of the book, we bring all the theory together and apply it to create and describe a stretched cluster configuration. Now, why am I using the title vSphere 6.x Deep Dive Resource Kit? Because we believe that when you pair this book with the vSphere 6.5 Host Resources Deep Dive, you get a bundle that allows you to understand the core of your virtual infrastructure.

Changing the Game

When Duncan and I set out to write the 4.1 HA and DRS deep dive, we wanted to change the content of technical books. Instead of a collection of screenshots paired with next-next-finish instructions, we wanted to provide a thorough explanation of what happens under the covers: when you push this button, this happens in the code. By uncovering the inside, we arm the administrator and architect with the knowledge to create or troubleshoot any architecture anywhere. Combined, these books create a real end-to-end guide for your architecture. For example, in the DRS section, we explain how the cluster determines the resource entitlement of the VMs in a resource pool. In the vSphere 6.5 Host Resources Deep Dive, we describe the inner workings of the memory and CPU schedulers and how they allocate physical resources based on the resource entitlement of the VM.

Back Side of the Book

When releasing the host resources deep dive, we came up with a cool little logo of a diver’s helmet. If you want to get deep, you need more than a snorkel. One diver’s helmet to explore the host; but in the cluster deep dive, we cover multiple hosts, grouped in a cluster.
What do you need when you need a lot of people to explore the deep? You need a submarine! ;) It might even end up on a T-shirt.

New Name on the Cover

As you might have noticed, a new name appears on the cover. We asked Niels Hagoort to help us cover the quality-of-service aspects of the book. Niels dove into the depths of Storage I/O Control and Network I/O Control and created an excellent addition to the book.

Foreword

And last but not least, the foreword. In the previous books, industry luminaries generously provided us with amazing forewords. This time we looked to the community and asked Chris Wahl to write it. Chris has been an early supporter of the book series and has helped the community in many ways, so we asked him to share his point of view. I hope you enjoy the book as much as we enjoyed writing it.

HOTDOG-NOT HOTDOG: THE SDDC OF VMWARE CLOUD ON AWS

Yesterday, Kenneth Hui was on stage at the VTUG providing his personal opinion about VMware Cloud on AWS. The reason I say personal is that he forgot to remove the Rubrik logos from his slides (I checked with Rubrik). On one slide he mentions that the SDDC, the Software-Defined Data Center provided by VMware Cloud on AWS (VMC), is not an SDDC out of the box. To me, that sounds a bit weird. Let’s go over the process of spinning up an SDDC. First, you log onto vmc.vmware.com and sign up for the service. In the console you define the number of hosts for deployment and click Apply. If you select a multi-host deployment (by default an SDDC cluster contains 4 hosts), VMC deploys four physical hosts (for more info: Dedicated Hardware in a Public Cloud World) on the AWS infrastructure. It installs and configures vSphere, vSAN, and NSX for you automatically. After roughly two hours you are the sole owner of dedicated hardware with a fully software-defined data center running on top of it. Just log into your in-cloud vCenter and start deploying your workload. So to reiterate: you just clicked a button on a website, and a fully functional data center is deployed for you. https://twitter.com/kenhuiny/status/1019995175735758848 Ok, so what about day 2 operations? Let’s define this a bit more clearly, because there are multiple definitions available. DZone provides the following definition: Once “something” goes into operations, “day 2 operations” is the remaining time period until this “something” is killed or replaced with “something else.” We built a cloud management platform in AWS to deal with day 2 operations. VMware provides the service; we will keep the lights on for you, troubleshoot, and maintain your environment. This CMP platform allows us to provide services like automated hardware remediation.
If a component inside an ESXi host fails, such as a NIC or an NVMe device, the backend will detect this and initiate a process to replace the faulty host with a fully operational one. The customer won’t have to do a thing. Elastic DRS allows the cluster to respond to workload utilization automatically. It allows for automatic scale-out and scale-in, without the need for human intervention. Stretched Clusters protect the workload in the Cloud SDDC from AZ outages. If something happens, HA detects the failed VMs and restarts them on different physical servers in the remaining AZ without manual human involvement. Content Library allows the customer to subscribe the in-cloud SDDC to a template repository that automatically provides VM templates to the in-cloud SDDC (read William’s post for more info). Disaster Recovery as a Service: just go to the console, enable the add-on, and the in-cloud components for SRM and vSphere Replication are automatically deployed and configured. Connect them to your on-prem components and you can build your DR runbooks. And there are many more functions that cover the lights-on, maintenance, housekeeping, and optimization tasks of day 2. Now with that explained, the stories continued and a debate broke out on Twitter. Some said VMC needs a form of CMP (e.g., vRealize) for operating the SDDC. https://twitter.com/KenNalbone/status/1020106642287886337 This is an interesting observation: for which operation? Not for life-cycle or infrastructure management; we will take care of that for you. VMC is a fully managed service by VMware. VMware is responsible for the uptime and the lifecycle of the SDDC. We have built a CMP platform on the AWS infrastructure that allows us to manage VMC. In a presentation by Chris Wegner (one of the principal engineers of VMC), the architecture is explained.
The blue box is the actual SDDC. The green box is a custom-built CMP that allows VMware to identify customers, bill customers, and provide support to customers (as a VMC customer, you only deal with VMware), but most importantly for this story, it allows VMware to deploy hardware and software (fleet management). The next image provides a more detailed view of the green box. This is what you need to support hundreds of SDDCs across multiple regions (Oregon, N. Virginia, London, Frankfurt). Here you can see the bits for provisioning management, dealing with AWS services, acquiring hardware, configuring all the software, and of course the ability to troubleshoot. You as a customer no longer need to worry about ripping and replacing hardware because it failed or because it’s nearing the end of support. You only need to care about deploying your workload. And because we took the conscious decision to use vCenter as the management structure, you can use your on-prem vRealize suite to deploy your workload on-prem or in-cloud. Using vRealize to deploy workloads is the way forward: 80% of our customers have a hybrid cloud strategy, so an on-prem deployment is expected, and it makes sense to run your tooling on-premises. With VMware Cloud on AWS, your responsibility shifts from managing hardware to managing the consumption of resources.
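As a closing illustration, the Elastic DRS behavior mentioned earlier can be sketched as a simple threshold decision. This is a hypothetical Python sketch; the thresholds, cluster limits, and function name are assumptions for illustration, not the actual VMware Cloud on AWS algorithm.

```python
# Illustrative sketch of a threshold-based scale-out/scale-in decision,
# in the spirit of Elastic DRS. The thresholds and limits are invented
# for this example, not the real VMC algorithm.

def edrs_decision(utilization, host_count, min_hosts=4, max_hosts=16,
                  high=0.80, low=0.40):
    """Return 'scale-out', 'scale-in', or 'hold' for a cluster."""
    if utilization > high and host_count < max_hosts:
        return "scale-out"   # add a host to relieve resource pressure
    if utilization < low and host_count > min_hosts:
        return "scale-in"    # remove a host to reduce cost
    return "hold"            # within the comfort zone, do nothing

print(edrs_decision(0.90, 4))   # scale-out
print(edrs_decision(0.30, 6))   # scale-in
print(edrs_decision(0.30, 4))   # hold: already at the minimum cluster size
```

The key point is that the customer only declares the bounds of acceptable behavior; the service continuously evaluates utilization and acts, which is exactly the shift from managing hardware to managing the consumption of resources.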