CONSUMER GRADE SSD VERSUS ENTERPRISE GRADE SSD, WHICH ONE TO PICK?
Should I use consumer grade SSD drives or should I use enterprise grade SSD drives? This is a very popular question and I receive it almost on a daily basis. Lab or production environment, my answer is always the same: enterprise grade without a doubt! Why? Enterprise grade drives have a higher endurance level, they contain power loss data protection features, and they consistently provide a high level of performance. All of these align with a strategy that ensures reliable and consistent performance. Let’s expand on these three key features.

Endurance
Recently a lot of information has been released about the endurance levels of consumer grade SSDs, and tests show that they operate well beyond their claimed endurance levels. Exciting news, as it shows how much progress has been made in the last few years. But be aware that vendors test their consumer grade SSDs with client workloads, while enterprise grade SSDs are tested with worst-case data center workloads. The interesting question is whether the SSD vendor rates a drive’s DWPD (drive writes per day) in a conservative manner or an aggressive manner. As I don’t want to gamble with customers’ data, I’m not planning to find out the hard way that a consumer SSD wasn’t able to sustain high levels of continuous data center load. I believe vSphere architectures have high endurance requirements; therefore use enterprise drives, as they are specifically designed and tested for this use.

Power loss data protection features
Not often highlighted, but most enterprise SSDs contain power loss data protection features. These SSDs typically contain a small buffer or cache in which data is stored before it’s written to disk. Enterprise SSDs leverage various on-board capacitance solutions to provide enough energy for the SSD to move the data from the cache to the drive itself, protecting both the drive and the data. It protects the drive because if a sector is partially written it becomes unreadable.
This can lead to performance problems, as the drive will perform time-consuming error recovery on that sector. Select enterprise drives with power loss data protection features; this avoids erratic performance levels or even drive failure after a power loss.

Consistent performance
Last but certainly not least is the fact that enterprise SSDs are designed to provide a consistent level of performance. SSD vendors expect their enterprise disks to be used intensively for an extended period of time. This means that the possibility of a full disk increases dramatically compared to a consumer grade SSD. As data can only be written to a cell that is in an erased state, high levels of write amplification are expected. Please read this article to learn more about write amplification (write amp). Write amp impacts the ratio of drive writes to host writes: when write amp occurs, the number of writes the drive needs to make in order to execute the host writes increases considerably. One way to reduce this strain is to “over-provision” the drive. Vendors, such as Intel, allocate a large amount of flash resources to allow the drive to absorb these write amp operations. This results in a more consistent and predictable rate of IOPS.

Impact on IOPS and Latency
I’ve done some testing in my lab and used two enterprise flash drives, an Intel DC S3700 and a Kingston E-100, along with two different consumer grade flash devices. I refrain from listing the type and vendor name of these consumer disks. The first test ran from 11:30 to 11:50 on an enterprise grade SSD drive; the rate of IOPS was consistent and predictable. The VM was then migrated to the host with the consumer grade SSD and the same test was run again; at no point did the disk provide a steady rate of I/Os.
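The write amp effect described above can be made concrete with a quick back-of-the-envelope calculation. A minimal sketch, with purely illustrative numbers (not vendor specifications):

```python
def effective_host_writes_tb(rated_nand_writes_tb, write_amp_factor):
    """The NAND can absorb a fixed amount of writes over its lifetime.
    Every host write costs write_amp_factor NAND writes, so the
    endurance visible to the host shrinks by that factor."""
    return rated_nand_writes_tb / write_amp_factor

# A drive whose NAND is rated for 7300 TB of writes:
print(effective_host_writes_tb(7300, 1.1))  # ~6636 TB of host writes
print(effective_host_writes_tb(7300, 4.0))  # only 1825 TB of host writes
```

Over-provisioning lowers the write amp factor, which is exactly why over-provisioned enterprise drives keep both their endurance and their IOPS rate steady.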
Anandtech.com performed similar tests and witnessed similar behaviour; they published their results in the article “Exploring the Relationship Between Spare Area and Performance Consistency in Modern SSDs”. An excellent read, highly recommended.

Picture courtesy of Anandtech.com

Click on the different drive sizes to view their default performance and the impact of spare flash resources on the ability to provide consistent performance. The next step was to determine latency behaviour. Both enterprise grade SSDs provided extremely predictable latency. To try to create an even playing field I ran read tests instead of write-centric tests. The first graph was a read test on the Kingston E-100. Latency was consistent, providing predictable and consistent application response times. The consumer grade drive performance charts were not as pretty. The virtual machine running the read test was the only workload hitting the drive, and yet the drive had trouble providing steady response times. Please note that the tests were run multiple times and the graphs shown are the most positive ones for the consumer grade drives. Multiple (enterprise-level) controllers were used to avoid any impact from that layer. As more and more SSD drives hit the market, we decided to help determine which drives fit in a strategy ensuring reliable and consistent performance. Therefore PernixData started the PernixDrive initiative, in which we test and approve flash devices.

Conclusion
Providing consistent performance is key for predictable application behaviour. This applies to many levels of operation. First of all it benefits day-to-day customer satisfaction and helps you reduce time spent troubleshooting application performance. Power loss data protection features help you cope with short-term service loss and avoid continuous performance loss, as the drive can survive power loss situations.
Reverting applications to a non-accelerated state due to the complete loss of an SSD drive can result in customer dissatisfaction or a breached SLA. Higher levels of drive writes per day help you create and ensure high levels of consistent performance over the long term. In short, use the correct tool for the job and go for enterprise SSD drives.
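The drive-writes-per-day figure translates directly into total rated endurance over the warranty period. A hypothetical comparison (capacities, DWPD ratings and warranty lengths here are assumptions for illustration, not datasheet quotes):

```python
def rated_endurance_tb(capacity_gb, dwpd, warranty_years):
    """Total host writes (TB) a drive is rated for:
    capacity x drive writes per day x days in the warranty period."""
    return capacity_gb / 1000 * dwpd * warranty_years * 365

# Enterprise drive: 800 GB, 10 DWPD, 5-year warranty
print(rated_endurance_tb(800, 10, 5))    # 14600.0 TB
# Consumer drive: 512 GB, 0.3 DWPD, 3-year warranty
print(rated_endurance_tb(512, 0.3, 3))   # ~168 TB
```

The two orders of magnitude between these figures is what separates a drive designed for continuous data center load from one designed for a desktop.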
WHO TO VOTE FOR?
This week Eric Siebert opened up the 2014 edition of the top virtualization blog contest. For the industry this is one of the highlights, and I applaud the effort Eric and his team of volunteers put in to make this work. I cannot wait to watch the show in which they unveil this year’s top 25 winners. A big thank you to Eric and the team! Most of the time you will see blog articles that highlight this year’s efforts, and I think they are great. As there are so many great bloggers writing and sharing their thoughts and ideas, it’s very easy to miss out on some brilliant posts. A quick scan of these posts helps to (re)discover the wealth of information that is out there. Last year I was voted number 2; however, this year the frequency (hopefully not the quality) of my blog articles went down. This was due to my career change and the new responsibilities my job role encompasses. Plus, creating the vSphere design book took a lot of time and effort. For this year’s VMworld we have planned something even better, so please stay tuned for this year’s VMworld book! But this post is not about me as a blogger and my material; it is meant to highlight some of the bloggers that help the community understand the product better, comprehend the behavior of the complex systems we work with every day, and share the insights they gain by spending a lot of their (spare) time writing and creating these great articles. By voting for them you will help them understand that their time and effort is well spent! First of all, guys like Duncan Epping, Cormac Hogan, William Lam and Eric Sloof relentlessly churn out great collateral, whether it is a written article, podcast or video. It keeps the community well fed when it comes to quality information. Writing a great article is a challenge; doing this on a continuous basis is even more impressive! But I would like to highlight some of the guys that are considered the “new” guys. They are all industry veterans, but they decided to pick up blogging recently.
I would like to highlight these guys, but there are many more of course.

Pete Koehler - vmpete.com
Pete writes a lot about PernixData, but that’s not the reason I want to highlight him. His articles are quite in-depth and I love reading them, as I learn something every time Pete decides to post his most recent insights. For example, in the article “Observations of PernixData in a Production environment” he covers the IOPS, throughput and latency relationship in great detail. In this exercise he discovers that applications do not use a static block size, something you don’t read that often. He correlates specific output and explains how each metric interacts with the others, educating you along the way and helping you do a better and more effective job in your own environment.

Josh Odgers - joshodgers.com
Josh is listed both on the general blogging list and as a newcomer, and I think he deserves to be “rookie of the year”. Josh’s insights are very valuable and it’s always a joy to read his articles. His VCDX articles are top notch and a must read for every aspiring VCDX candidate. Just too bad he decided to join Nutanix ;).

Luca Dell’Oca – virtualtothecore.com
Dropping knowledge both in English and Italian, Luca covers new technologies as well as insightful tips and tricks on a frequent basis, ranging from reclaiming space on a Windows 2012 installation to a complete write-up on how to create a valuable I/O test virtual machine. A blog that should be visited regularly.

Willem ter Harmsel - willemterharmsel.nl
Not your average virtualization blog; Willem covers the startup world by interviewing CEOs and CTOs of the hottest and newest startups this world currently has to offer. Willem provides insights into upcoming technology and allows his readers to place and compare different technologies.
A welcome change of pace after spending a day knee-deep in the bits and bytes. If you consume those stories and articles on a daily basis and they help you in your daily work, please show your appreciation and vote today for your favorite blogs! Thanks! Please vote now!
VCDX DEFENCE: ARE YOU PLANNING TO USE A FICTITIOUS DESIGN?
This week the following tweet caught my eye: https://twitter.com/VirtualSnook/status/432274972992487424 Apparently Marc Brunstad (VCDX program manager) stated this fact during the PEX VCDX workshop. But what does this stat mean, and to what extent do you need to take it into account when submitting your own design? During my days as a panel member, I saw only a handful of fictitious designs, and although they were technically sound, the reasoning and defense were usually not that strong. Be aware that the VCDX program wasn’t born into existence to find the best design ever. It determines whether the candidate has aligned the technical functionality with the customer’s requirements, the constraints provided by the environment, and the assumptions the team made about, for example, future workloads or organizational growth. But does that mean that you shouldn’t use any fictitious element in your design? Are fictitious elements inherently bad? I don’t think so. Speaking from my own experience, I made some adjustments to the design I submitted. My submitted design was largely based on an environment that I worked on for a couple of years. At that time the customer used rack-based systems; my design contained a blade architecture. I changed this because it allowed me to demonstrate my knowledge of the HA stack featured in vSphere 4.1. Some might argue that I deliberately made my design more complex, but I was comfortable enough to defend my choices and explain High Availability primary and secondary node interaction and how to mitigate risk. Moreover, it allowed me to demonstrate the pros and cons of such a design on various levels, such as the impact it had on operational processes, the influence on scalability, and the alignment of availability policies to org-defined failure domains. Did I have these discussions in real life? Yes, with many other customers, just not with the specific customer that this design was based on.
And that’s why completely fictitious designs fail and why most of their reasoning is incomplete. The candidate only focuses on the alignment of technical specs and workload, not the “softer” side of things. Arguing that a design element was just the wish of a customer doesn’t cut it. Sure, we have all met customers that were set on having a particular setting configured in the way they saw fit, but it’s your responsibility to explain to the panel which steps you took to inform the customer about the risk and potential impact that setting had. Try to explain which setting you would have used and why. Demonstrate your knowledge of feasible alternatives. My recommendation to future candidates: when incorporating a specific fictitious design element in your design, make sure you have had a conversation with a customer about that element at least once. You can easily align this with the main design, and it helps you recollect the specifics during your defense.
INSTALLING EXCHANGE JETSTRESS WITHOUT FULL INSTALLATION MEDIA
I believe in testing environments with applications that will be used in the infrastructure itself. Pure synthetic workloads, such as Iometer, are useful to push hardware to its theoretical limit, but that’s about it. Using a real-life workload, common to your infrastructure, will give you a better understanding of the performance and behavior of the environment you are testing. However, it can be cumbersome to set up the full application stack to simulate that workload, and it might be difficult to simulate future workloads. Simulators made by the application vendor, such as the SQLIO Disk Subsystem Benchmark Tool or Exchange Server Jetstress, provide an easy way to test system behaviour and simulate workloads that might be present in the future. One of my favourite workload simulators is MS Exchange Server Jetstress; however, it’s not a turn-key solution. After installing Exchange Jetstress you are required to install the ESE binary files from an Exchange server. It can happen that you don’t have the MS Exchange installation media available or a live MS Exchange system installed. Microsoft recommends downloading the trial version of Exchange, installing the software and then copying the files from its directory. Fortunately you can save a lot of time by skipping these steps and extracting the ESE files straight from an Exchange Service Pack. Added bonus: you immediately know you have the latest versions of the files. I want to use Jetstress 2010, and therefore I downloaded Microsoft Exchange Server Jetstress 2010 (64 bit) and Microsoft Exchange Server 2010 Service Pack 3 (SP3). To extract the files directly from the .exe file, I use the 7-Zip file archiver. The ESE files are located in the following directory:
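The extraction step can also be scripted. A minimal sketch that drives the 7-Zip command line tool from Python; the Service Pack file name and ESE file patterns below are placeholders, so adjust them to match what you actually downloaded:

```python
import subprocess

def build_7z_command(sp_exe, patterns, dest="ese-files"):
    """Assemble a 7z command that pulls matching files out of the
    Service Pack .exe without running the installer.
    'e' extracts, -o sets the output dir, -r searches recursively,
    -y answers yes to any prompts."""
    return ["7z", "e", sp_exe, f"-o{dest}", "-r", "-y"] + list(patterns)

def extract_ese_files(sp_exe, patterns):
    # Requires the 7z binary to be available on the PATH
    return subprocess.run(build_7z_command(sp_exe, patterns))

# Hypothetical usage:
# extract_ese_files("Exchange2010-SP3-x64.exe", ["ese.dll", "eseperf.*"])
```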
VSPHERE 5.5 VCENTER SERVER INVENTORY 0
After logging into my brand spanking new vCenter 5.5 server I was treated to a vCenter server inventory count of 0. Interesting to say the least, as I installed vCenter on a new Windows 2008 R2 machine, connected to a fresh MS Active Directory domain. I installed vCenter with a user account that is domain admin, local admin and has all the appropriate local rights (Member of the Administrators group, Act as part of the operating system, and Log on as a service). The install process went like a breeze, no error messages whatsoever, and yet the vCenter server object was mysteriously missing after I logged in. A mindbender! Being able to log into the vCenter server and finding no trace of this object whatsoever felt like someone answering the door and saying he’s not home. I believed I did my due diligence; I read the topic “Prerequisites for Installing vCenter Single Sign-On, Inventory Service, and vCenter Server” and followed every step. However, it appeared I did not RTFM enough.

administrator@vsphere.local only
Apparently vSphere will only attach the permissions and assign the administrator role to the default account administrator@vsphere.local, and you have to log on with this account after the installation is complete. See “How vCenter Single Sign-On Affects Log In Behavior” for the following quote:
VCDX DEFEND CLINIC: CHOOSING BETWEEN MULTI-NIC VMOTION AND LBT
A new round of VCDX defenses will kick off soon, and I want to wish everyone that participates in the panel sessions good luck. Usually when VCDX panels are near, I receive questions on how to prepare for a panel. One recommendation I usually provide is: “Know why you used a specific configuration of a feature and, especially, know why you haven’t used the available alternatives”. Let’s have some fun with this and go through a “defend clinic”. The point of this clinic is to provide you with an exercise model that you can use for any configuration, not only for a vMotion configuration. It helps you understand the relationships between the information you provide throughout your documentation set, and helps you explain how you arrived at every decision in your design. To give you some background: when a panel member is provided with the participant’s documentation set, he enters a game of connecting the dots. This set of documents is his only view into your world while you were creating the design and dealing with your customer. He needs to take your design and compare it to the requirements of the customer, the uncertainties you dealt with in the form of assumptions, and the constraints that were given. Reviewing the design on technical accuracy is only a small portion of the process. That’s basically just checking to see if you are using your tools and materials correctly; the remaining part is to understand whether you built the house to the specification of the customer while dealing with regional laws and the available space and layout of the land. Building a 90,000 square foot single-floor villa might provide you the most easily accessible space, but if you want to build that thing in downtown Manhattan you’re gonna have a bad time. ;)

Structure of the article
This exercise lists the design goals and their influencers, requirements, constraints and assumptions.
The normally printed text is the architect’s (technical) argument, while the paragraphs displayed in italics can be seen as questions or thoughts of a reviewer/panel member. Is this a blueprint on how to beat the panel? No! It’s just an exhibition of how to connect and correlate certain statements made in various documents. Now let’s have some fun exploring and connecting the dots in this exercise.

Design goal and influencers
Your design needs to contain a vMotion network, as the customer wants to leverage DRS load balancing, maintenance mode and overall enjoy the fantastic ability of VM mobility. How will you design your vMotion network? In your application form you have stated that the customer wants to see a design that reduces complexity, increases scalability and provides the best performance possible. The financial budget and the number of available IP addresses are constraints, and the level of expertise of the virtualization management team is an assumption.

Listing the technical requirements
Since you are planning to use vSphere 5.x, you have the choice to create a traditional single vMotion-enabled VMKnic, a Multi-NIC vMotion setup, or a vMotion configuration that uses the “Route based on physical NIC load” load balance algorithm (commonly known as LBT) to distribute vMotion traffic amongst multiple active NICs. As the customer prefers not to use link aggregation, IP-hash based / EtherChannel configurations are not valid. First let’s review the newer vMotion configurations and how they differ from the traditional vMotion configuration, where you have one single VMKnic with a single IP address, connected to a single portgroup which is configured to use an active and a standby NIC.
Multi-NIC vMotion
• Multiple VMKnics required
• Multiple IP addresses required
• Consistent configuration of NIC failover order required
• Multiple physical NICs required

Route based on physical NIC load
• Distributed vSwitch required
• Multiple physical NICs required

It goes without saying that the goal of providing the best performance possible leads you into considering multiple NICs to increase bandwidth. But which one will be better? A simple performance test will determine that.

VCDX application form: Requirements
In your application document you stated that one of the customer requirements was “reducing complexity”. Which of the two configurations do you choose now, and what are your arguments? How do you balance or prioritize performance over complexity reduction? If Multi-NIC vMotion beats the LBT configuration in performance, leading to faster maintenance mode operations, better DRS load balance operations and an overall reduction in lead time of a manual vMotion process, would you still choose the simpler configuration over the complex one? Simplicity is LBT’s forte: just enable vMotion on a VMKnic, add multiple uplinks, set them to active and you’re good to go. Multi-NIC vMotion consists of more intricate steps to get a proper configuration up and running. Multiple vMotion-enabled VMKnics are necessary, each with their own IP-range configuration. Secondly, vMotion requires deterministic path control, meaning that it wants to know which path it selects to send traffic across. As the vMotion load balancing process sits higher up in the stack, NIC failover orders are transparent to vMotion. It selects a VMKnic and assumes it represents a different physical path than the other available VMKnics. That means it’s up to the administrator to provide these unique and deterministic paths. Are they capable of doing this?
You mentioned the level of expertise of the admin team as an assumption; how do you guarantee that they can execute this design, properly manage it for a long period, and expand the design without the use of external resources?

Automation to the rescue
Complexity of technology by itself should not pose a problem; it’s how you (are required to) interact with it that can lead to challenges. As mentioned before, Multi-NIC vMotion requires multiple IP addresses to function. On a side note, this could put pressure on the IP ranges, as all vMotion-enabled VMKnics inside the cluster are required to be part of the same network. Unfortunately, routed vMotion is not supported yet. Every vMotion VMKnic needs to be configured properly. Pair this with availability requirements, and the active and standby NIC configuration of each VMKnic can cause headaches if you want to have a consistent and identical network configuration across the cluster. PowerCLI and Host Profiles can help tremendously in this area.

Supporting documents
Now, have you included these scripts in your documentation? Have you covered the installation steps on how to configure vMotion on a distributed switch? Make sure that these elements are included in your supporting documents! What about the constraints and limitations?

Licensing
Unfortunately LBT is only available on distributed vSwitches, resulting in a top-tier licensing requirement if LBT is selected. The LBT configuration might be preferred over the Multi-NIC vMotion configuration because it provides the least amount of complexity increase over the traditional configuration. But how does this intersect with the listed budget constraint if the customer is not able – or willing – to invest in enterprise licenses?

IP4 pressure
One of the listed constraints in the application form is the limited number of IP addresses in the IP range destined for the virtual infrastructure. This could impact your decision on which configuration to select.
Would you “sacrifice” the extra IP addresses to get better vMotion performance and all the related improvements in the dependent features, or is scalability and future expansion of your cluster more important? Remember, scalability is also listed in the application form as a requirement.

Try this at home!
These are just examples of questions that can be asked during a defense. Try to find the answers when preparing for your VCDX panel. When finalizing the document set, try to do this exercise. Even better, find a group of your peers and try to review each other’s designs while reviewing the application form and the supporting set of documents. At the Nordic VMUG, Duncan and I spoke with a group of people that are setting up a VCDX study group. I think this is a great way not only of preparing for a VCDX panel but of learning and improving a skill set you can use in your daily profession.
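The IP pressure trade-off is easy to quantify. A small sketch comparing the address consumption of both configurations; the cluster size and VMKnic count are assumptions for illustration:

```python
def vmotion_ips_required(hosts, vmknics_per_host):
    """Each vMotion-enabled VMKnic needs its own IP address in the
    shared vMotion network, on every host in the cluster."""
    return hosts * vmknics_per_host

# 32-host cluster: LBT uses a single vMotion VMKnic per host,
# while this Multi-NIC vMotion example uses two per host.
print(vmotion_ips_required(32, 1))  # 32 addresses (LBT)
print(vmotion_ips_required(32, 2))  # 64 addresses (Multi-NIC vMotion)
```

Double the VMKnics means double the addresses carved out of a constrained range, which is exactly the tension between the performance requirement and the IP constraint.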
MY LAB AND THE BIRTH OF THE PORTABLE IKEA LACK 19” DATACENTER RACK
Currently, topics about labs are hot, and when meeting people at VMUGs or other tech conferences, I get asked a lot about my lab configuration. I’m a big fan of labs, and I think everybody who works in IT needs a lab, whether it’s at home or in a centralized location. At PernixData, we have two major labs, one on the east coast and one on the west coast of the U.S. Both these labs are shared, so you cannot do everything you like. However, sometimes you want to break stuff. You want to pull cables and disks and kill an entire server or array, just to see what happens. For these reasons, having a lab that is 4,000 miles away doesn’t work. Reason enough to build a small lab at home.
SEND F11 KEY TO NESTED ESXI ON MAC
I only use a Mac at home; most of the time it’s great, sometimes it’s not. For example, when installing or configuring your remote lab. I have a Windows server installed on a virtual machine that runs vCenter and the vSphere client. When I’m installing a new nested ESXi server, I connect with a remote desktop session to the Windows machine and use the VMware vSphere client. During the ESXi install process, you are required to press the F11 key to continue. However, F11 isn’t mapped by the vSphere client automatically, and there isn’t a menu option in the vSphere client to send it to the virtual machine.
VCPU CONFIGURATION: PERFORMANCE IMPACT OF VIRTUAL SOCKETS VERSUS VIRTUAL CORES?
A question that I frequently receive is whether there is a difference in virtual machine performance when the virtual machine is created with multiple cores instead of multiple sockets.

Single core CPU
VMware introduced the multi-core virtual CPU in vSphere 4.1 to avoid socket restrictions imposed by operating systems. In vSphere, a vCPU is presented to the operating system as a single core CPU in a single socket, which limits the number of vCPUs that can be used by the operating system. Typically the OS vendor only restricts the number of physical CPUs (sockets) and not the number of logical CPUs (better known as cores).
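The workaround is simple arithmetic: present the same vCPU count with fewer sockets and more cores per socket. A sketch of picking a topology under a socket limit (the 4-socket cap below is an example; check your OS edition for its actual restriction):

```python
def pick_topology(total_vcpus, max_sockets):
    """Return (sockets, cores_per_socket) that presents total_vcpus
    to the guest without exceeding the OS socket restriction."""
    for cores in range(1, total_vcpus + 1):
        sockets, remainder = divmod(total_vcpus, cores)
        if remainder == 0 and sockets <= max_sockets:
            return sockets, cores
    raise ValueError("no valid topology")

# 8 vCPUs for an OS licensed for at most 4 sockets:
print(pick_topology(8, 4))   # (4, 2): 4 sockets x 2 cores each
# Without the restriction, 8 single-core sockets would be presented:
print(pick_topology(8, 8))   # (8, 1)
```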
ONLY TWO DAYS LEFT TO SIGN UP FOR YOUR VMWORLD SPEAKER SHIRT!
During our advisory call with CloudPhysics, a great idea was born: why not provide all the speakers at VMworld a cool speaker shirt? Unfortunately, in 2011 VMworld made the decision to stop providing speaker shirts to the people on stage, so most people started wearing older speaker shirts or even RUN DRS shirts ☺. For most speakers this move by VMworld was disappointing, as the speaker shirts made them more recognizable, but the shirt also served as a cool badge of honor. I think CloudPhysics stepped up big time and gave us back that cool badge of honor. If you are a speaker this year, go register here before tomorrow evening, as the deadline is Tuesday August 13th, end of day (PST).