vLore Blog

September 16, 2014

LACP, LAG, Etherchannel and vSphere 5.5 – a simple explanation

I have often stumbled when trying to explain the differences and the relationships between Etherchannel, LACP, and IEEE 802.3ad. I began stumbling more when I learned that vSphere 5.5 supports Enhanced LACP and LAGs. Here is my best attempt to clarify.

Etherchannel: an Etherchannel is a logical channel formed by bundling together two or more links to aggregate bandwidth and provide redundancy. Another acceptable name for Etherchannel (an IOS term) is port channel (an NX-OS term). Another acceptable name is Link Aggregation Group (LAG).

LACP: a standards-based negotiation protocol used to build an Etherchannel (LAG) dynamically. It is known as the IEEE 802.1AX (or IEEE 802.3ad) Link Aggregation Control Protocol (LACP). LAGs (Etherchannels) can also be built statically without using LACP.

IEEE 802.1AX: the IEEE standard that defines link aggregation (the technology behind port channels and EtherChannels) and LACP. Originally, link aggregation was defined in IEEE 802.3ad, but in 2008 it was moved to 802.1AX.

IEEE 802.3ad: the original IEEE specification for link aggregation. Although it has been superseded by 802.1AX, referring to IEEE 802.3ad is typically acceptable, so references to IEEE 802.3ad LACP are common.

vSphere prior to version 5.1: the standard virtual switches and distributed virtual switches provided natively by VMware vSphere 5.0 and earlier do not support LACP (dynamic LAG / Etherchannel creation); however, they support statically built LAGs (these may be called static LAGs or static Etherchannels).

vSphere 5.1: the distributed virtual switches provided natively by VMware vSphere 5.1 support LACP (dynamic LAG / Etherchannel creation). The support is limited to one LAG per ESXi host and one LAG per dvSwitch.

vSphere 5.5: the distributed virtual switches provided natively by VMware vSphere 5.5 support LACP (dynamic LAG / Etherchannel creation). It supports 64 LAGs per ESXi host and 64 LAGs per dvSwitch.
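
The version differences above can be summarized as a small lookup table. Here is a minimal Python sketch for illustration only: the LACP_SUPPORT table and the lag_limits function are hypothetical names (not part of any VMware API), and the values are simply the ones described above.

```python
# Illustrative lookup of the LACP support described above.
# All names here are hypothetical; this is not a VMware API.

# version -> (LACP supported on dvSwitch, max LAGs per host, max LAGs per dvSwitch)
LACP_SUPPORT = {
    "5.0": (False, 0, 0),   # 5.0 and earlier: static LAGs only
    "5.1": (True, 1, 1),
    "5.5": (True, 64, 64),
}


def lag_limits(version: str) -> str:
    """Return a one-line summary of dynamic LAG support for a vSphere version."""
    supported, per_host, per_dvs = LACP_SUPPORT[version]
    if not supported:
        return f"vSphere {version}: static LAGs only (no LACP)"
    return (f"vSphere {version}: LACP on dvSwitch, "
            f"{per_host} LAG(s) per host, {per_dvs} per dvSwitch")


for v in ("5.0", "5.1", "5.5"):
    print(lag_limits(v))
```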

September 14, 2014

VMworld 2014 Souvenirs

I attended my 10th consecutive VMworld this year. Here are some of my main take-aways.

VMware vSphere 5.5 Update 2: is now GA and available for download. Some features are:

Support for ESXi hosts with up to 6TB RAM
VMware vShield Endpoint driver is bundled with VMware Tools and called Guest Introspection
VMware vCenter Server now supports these databases: Oracle 12c, MS SQL Server 2014. It drops support for IBM DB2.
vCenter Server Appliance meets high-governance compliance standards through the enforcement of DISA STIG.
Resolves several known issues
For details, see the VMware Blog.

Additionally, VMware now offers a very attractive edition of vSphere 5.5 for remote offices and branch offices (ROBO). This edition is aimed at distributed deployments, where the Essentials and Essentials Plus editions have previously been implemented. The licensing for this new edition is offered in packs of 25 VMs. Details are on the VMware Blog.

VMware vSphere 6.0 Beta: as a reminder, on June 30th, VMware announced the availability of the VMware vSphere 6.0 Beta program that it opened to the entire VMware community.

VMware acquires CloudVolumes: CloudVolumes’ technology, which is focused on virtualization above the OS, is sort of a hybrid of other technologies including application virtualization, layering, and containers. It installs applications in virtual disks (VMDK or VHD), records dependencies in an AppStack volume, and provides the VMDK/VHD as a read-only volume that can be instantly assigned to multiple VMs. Details are here on VMware blog.

Fault Tolerance (FT) in vSphere 6 supports VMs with up to four vCPU cores. The underlying code has been rewritten. It uses checkpointing instead of record / replay. The secondary VM has its own virtual disk, allowing FT to protect against datastore failures as well as host failures. It no longer requires eager-zeroed thick virtual disks. To get started, here is a good article on WoodITWork (not a VMware provided article) to get familiar.

VMware EVO: Rail Deployment Configuration and Management (VMware EVO Rail): combines compute, networking, storage, and software into a hyper-converged infrastructure appliance. It is a scalable Software Defined Data Center (SDDC) building block that includes VMware vSphere Enterprise Plus, VMware vCenter Server, VMware Virtual SAN, VMware Log Insight, and VMware EVO Rail deployed within a 2U 4-node hardware platform provided by a VMware qualified partner. Each node, which is optimized for VMware EVO Rail, provides:

Two 6-core CPUs
192 GB RAM
Three SAS 1.2 TB HDD and one 400 GB SSD for VMware Virtual SAN
Two 10 GE NIC ports

The VMware EVO Rail greatly simplifies the deployment, configuration, and management of SDDC. It enables you to create your first VM within minutes following the initial power-on of the solution. It is sized to run approximately 100 average sized, general purpose VMs or 250 virtual desktops (provided by VMware View), but naturally, the VM density depends on each use case.
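
To put the per-node figures in perspective, here is a rough Python sketch of the raw totals for one 4-node appliance. It only multiplies out the numbers listed above; the variable names are mine, and the per-VM RAM figure is a naive division that ignores hypervisor overhead, failover reserve, memory overcommitment, and usable (versus raw) Virtual SAN capacity.

```python
# Raw totals for one 4-node appliance, using the per-node figures listed above.
NODES = 4
CORES_PER_NODE = 2 * 6          # two 6-core CPUs
RAM_GB_PER_NODE = 192
HDD_TB_PER_NODE = 3 * 1.2       # three 1.2 TB SAS HDDs
SSD_GB_PER_NODE = 400           # one 400 GB SSD

total_cores = NODES * CORES_PER_NODE
total_ram_gb = NODES * RAM_GB_PER_NODE
total_hdd_tb = round(NODES * HDD_TB_PER_NODE, 1)
total_ssd_gb = NODES * SSD_GB_PER_NODE

print(f"Cores: {total_cores}, RAM: {total_ram_gb} GB")
print(f"Raw HDD: {total_hdd_tb} TB, SSD: {total_ssd_gb} GB")

# Naive physical RAM per VM at the quoted densities.
for vm_count in (100, 250):
    print(f"{vm_count} VMs -> about {total_ram_gb / vm_count:.1f} GB RAM each")
```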

See the Introduction to VMware EVO: RAIL

VMware EVO: Rack: This is built on the same concepts as VMware EVO Rail, but is aimed at a different customer base. VMware EVO Rack is aimed at private clouds for medium to large enterprises, where VMware EVO Rail is aimed at mid-size companies, remote office / branch office (ROBO) and VDI solutions. EVO Rack includes VMware NSX. For details, see the note from the CTO on the VMworld 2014 announcement of the Tech Preview of VMware EVO Rack.

My TV Debut … Actually, this is my interview on VMware TV on the Official VCAP5-DCA Cert Guide that I wrote with Steve Baca. I am pleased that the guide was the 2nd best seller at VMworld.

VMware Hybrid Cloud is now VMware vCloud Air: VMware vCloud Air, which is built upon a vSphere foundation, allows you to integrate your private cloud with a public cloud and allows you to easily migrate workloads between the two clouds. In effect, it provides you a hybrid cloud where you can easily deploy, manage, and migrate workloads that are running on-premises and off-premises. Details at vCloud.vmware.com.

VMware Integrated OpenStack (Beta): VMware Integrated OpenStack is designed for enterprises that want to provide an environment that is similar to public clouds to the developers that are actually using a private VMware virtual infrastructure. VMware Integrated OpenStack provides cloud-style APIs in an infrastructure built on VMware vSphere. The main goals are to allow VMware customers to successfully deploy OpenStack while leveraging their existing VMware investments and to allow them to confidently deliver production-grade OpenStack with full support from VMware. See details on the VMware Integrated OpenStack Beta at http://www.vmware.com/products/openstack.

VMware NSX 6.1: VMware NSX, which allows you to configure virtual networks (logical switches, logical routers, logical firewalls, logical load balancers, logical VPN, etc.) in software, independently of the physical network, has been upgraded to version 6.1. VMware NSX implements network layers 2 through 7 components in software and uses the physical network as a transport mechanism. Although it is integrated with VMware vSphere, VMware vCloud Director, and VMware vCloud Automation Center, it can also be deployed in multi-hypervisor environments, such as those that utilize Xen Server and KVM. See this URL to get familiar with NSX: http://www.vmware.com/products/nsx/

Some new features in NSX 6.1 include:

Highly available NSX Edge clusters
DHCP Relay
Improved load balancing that now includes UDP and FTP load balancing, which can support services such as NTP and DNS
See the NSX 6.1 Release Notes for more details on what’s new in NSX 6.1.

VMware vCloud Suite 5.8: this version includes these new features:

Support Assistant, which is a tool that can be configured to automatically, proactively collect log bundles and transmit them to VMware support.
Expanded big data support, which now includes Hadoop 2 distributions.
Policy-based provisioning in vCloud Automation Center’s (vCAC) blueprints for DR protection tiers that are provided by VMware Site Recovery Manager (SRM) via a vCenter Orchestrator plug-in.
Other DR improvements involving better SRM integration and scalability
See the Release Notes for more details.

VMware Realize Air: is basically vCloud Automation Center (vCAC) presented as a SaaS-based application. To get started, see these links in the specified order:

What does VMware Realize Air Automation really mean by Nick Colyer
VMware Realize Air Automation website
Introducing VMware vCloud Automation Center (vCAC) 6.1

VMware Virtual Volumes: This feature allows you to use a Storage Policy Based Management (SPBM) mechanism per virtual machine, or actually per virtual disk. Each storage system can automatically present a unique set of storage capabilities to vSphere, which can be used to apply storage policies per VM. The concept is similar to the one used by VMware Virtual SAN, whose capabilities and policies apply to local storage, but VMware Virtual Volumes extends the concept to your Fibre Channel, iSCSI and NAS storage. Instead of configuring logical units (LUNs) in your SAN, you will simply present a pool of array based storage to vSphere and let vSphere do the work. Typically, the policies provided to vSphere using the VMware APIs for Storage Awareness (VASA) are focused on performance (such as disk stripes), redundancy (similar to RAID parity), and replication. Here are the details on the VMware Virtual Volumes Public Beta.

VMware Certification: VMware recently announced a new certification track in Network Virtualization, which includes two advanced levels: the VMware Certified Implementation Expert – Network Virtualization (VCIX-NV) and the VMware Certified Design Expert – Network Virtualization (VCDX-NV). Details are at http://mylearn.vmware.com/portals/certification/.

VMware vSphere Client: is not going away quite yet. Instead, it has been improved in vSphere 5.5 U2 to support VM hardware version 10. Be careful though: the vSphere Client 5.5 U2 can be used to edit VMs that use VM hardware version 10, but it can only change features that are available in VM hardware version 8. See the VMware Blog for details.

VMware Workspace Suite: a suite that combines AirWatch and VMware Horizon to provide a virtual workspace that unifies mobile, desktop and data. See the Workspace Suite Introduction on the VMware blog.

NVIDIA GRID vGPU on vSphere: NVIDIA and VMware announced an early access program for NVIDIA GRID™ vGPU on VMware vSphere. See details on the program and see the video on the technology.

VMware and Docker partnership: they see a world where VMs and containers play nicely together. See Docker’s announcement.

VMware Authorized Training Center (VATC) changes: VMware Education recently changed their VATC program, such that only three VATCs remain, who can offer the vSphere Install Configure Manage class and Horizon View Install Configure Manage class for open enrollment. These VATCs can still offer any authorized VMware class for private delivery. VMware Education still recognizes several VMware education distributors, who can provide all authorized classes for open enrollment and VMware education resellers, who can resell classes for the distributors. Generally speaking, VMware customers should still be able to reach out to their current VMware training providers for guidance.

June 20, 2014

Addressing the Short Password Expiration in vCenter Server Appliance

Many vSphere administrators have learned the hard way that with default settings, the root account in the vCenter Server Appliance 5.5 expires after 90 days. The VMware KB article 2069041 addresses how to change the root account password after it has expired, which requires rebooting the appliance, modifying the grub boot parameters and using the passwd command.

To avoid this issue, you could consider modifying the appliance’s password expiration policy, such that it forces the user to change the root account password whenever it expires rather than locking the root account. The KB article 2069041 also discusses how to adjust the expiration policy.
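
A simple way to stay ahead of the 90-day default is to track when the password was last changed and count down. Here is a minimal Python sketch of that arithmetic. The function name and the example dates are made up for illustration; it does not query the appliance, so the last-changed date has to come from your own records.

```python
from datetime import date, timedelta

MAX_PASSWORD_AGE_DAYS = 90  # default expiration described above


def days_until_expiry(last_changed: date, today: date,
                      max_age_days: int = MAX_PASSWORD_AGE_DAYS) -> int:
    """Days left before the password expires (negative if already expired)."""
    expires_on = last_changed + timedelta(days=max_age_days)
    return (expires_on - today).days


# Example with made-up dates.
remaining = days_until_expiry(date(2014, 3, 1), today=date(2014, 5, 20))
print(f"Root password expires in {remaining} day(s)")
```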

Likewise, some vSphere administrators have unexpectedly experienced situations where the VMware vSphere Single Sign-On (SSO) administrator account is locked due to password expiration. See VMware KB article 2034608 for details on resolving the issue in SSO 5.1 and SSO 5.5.

Naturally, you can avoid certain issues by configuring a solid SSO Password Policy by implementing the procedure found here.

May 22, 2014

Tips for VCP5-DCV Exam Preparation

I recently submitted a white paper on Tips for VCP5-DCV Preparation that contains a lot of details that I expect VCP candidates will find useful, such as:

Benefits and requirements of VCP5-DCV
Differences in the VCP510 and VCP550 exams
General preparation tips
URLs to unofficial practice exams
Link to Measure-Up, the official source for practice exams
Special considerations for the new VCP550 exam
Details on about 50 items (objectives, skills, knowledge) that deserve special attention
Information on related, official VMware courses

If you are preparing for the VCP5-DCV exam, please take a look and provide feedback.

March 29, 2014

Proposal for Providing AD and DNS for SRM Test Failover

This is an idea that I have implemented for DR test purposes for a couple of customers. I have not seen this idea documented anywhere in the community, so I thought I would post it here for discussion.

Scenario:

For some of my VMware SRM customers, the network allows production VLANs and subnets to be extended to the recovery site, which allows us to keep the IP settings of VMs during a planned migration or disaster recovery. This certainly simplifies our DR planning. For example, DNS records do not have to be updated during a disaster recovery or planned migration, because the IP addresses of the VMs will not be changed. We can run a set of Active Directory (AD) controllers and Domain Name Service (DNS) servers at the recovery site, where they stay synchronized with their counterparts that run at the protected site. This means that current AD and DNS data is available at the recovery site and that SRM does not have to recover AD and DNS during a real disaster recovery.

However, some challenges may exist while performing non-disruptive test recoveries in this scenario, which typically requires isolated test networks. The first concern is whether or not the IP settings for VMs must be changed during non-disruptive tests, to allow the original VMs and applications to continue to run undisturbed. Preferably, the test networks will allow us to keep the original IP settings of the VMs without being visible to the production network, where the IP addresses are currently in use. If this can be achieved, then the next issue is how to provide the required AD controllers and DNS servers for test purposes. Ideally, the AD controllers and DNS servers would provide current data (current at the moment the test began), would run in the test network with no concern that they can be seen by the production network, and would be easily removed after the test is complete.

Proposal:

To facilitate non-disruptive testing, we could use vSphere Replication (VR), a recovery plan, and the Test Recovery option in SRM to fail over AD, DNS, and other required infrastructure servers to the test network, prior to using Test Recovery to fail over any other test plan. This ensures that the test network has access to recently updated AD, DNS and other infrastructure servers, which are isolated from production networks during the test, but are available to the VMs being tested. These infrastructure servers can harmlessly be modified in any manner during the test period. After the test is complete, the Cleanup operation in SRM can be used to clean up the VMs involved in both recovery plans.
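
The ordering in this proposal can be sketched in a few lines of Python. This is only an illustration of the sequence, not an SRM API call: the function name and the plan names are hypothetical, and cleaning up the infrastructure plan last is my own assumption (the proposal only requires that both plans are cleaned up).

```python
from typing import List, Tuple


def build_test_sequence(infra_plan: str,
                        app_plans: List[str]) -> List[Tuple[str, str]]:
    """Return (operation, recovery plan) steps for a non-disruptive test."""
    # 1. Bring up AD, DNS and other infrastructure VMs in the test network first.
    steps = [("Test Recovery", infra_plan)]
    # 2. Then test the application recovery plans against that infrastructure.
    steps += [("Test Recovery", plan) for plan in app_plans]
    # 3. Clean up the application plans, then the infrastructure plan
    #    (assumed order: infrastructure stays up until nothing depends on it).
    steps += [("Cleanup", plan) for plan in reversed(app_plans)]
    steps.append(("Cleanup", infra_plan))
    return steps


for operation, plan in build_test_sequence("Infra-Test-Plan", ["App-Plan-1"]):
    print(f"{operation}: {plan}")
```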

NOTE-1:

The infrastructure (AD, DNS, DHCP, etc.) VMs should not be recovered when performing actual Disaster Recovery migrations. Instead, peers of these servers, which are kept consistent via the application (such as AD synchronization), should reside at the recovery site.

NOTE-2:

In some cases, the state of the AD controller could be inconsistent and produce errors when it is brought up at the recovery site during a Test Recovery operation. This could occur if the AD controller was in the midst of an AD synchronization when the VR replication occurred. In this rare case, just use the Cleanup operation and repeat the Test Recovery operation.

NOTE-3:

This proposal should only be implemented if the Test Network is completely isolated from the Production Network. It is acceptable that the test network be comprised of multiple networks (VLAN / subnets) that communicate with each other, as long as none of these networks can communicate with any production network.

NOTE-4:

In most cases, only a single AD controller should be included in the test recovery. VR does not quiesce multiple VMs simultaneously, even if they are part of the same protection group, so two or more domain controllers may not be in sync if recovered together.

March 15, 2014

Sample Patching Policy – VMware Update Manager – ESXi Hosts

I typically recommend that administrators establish a policy for using VMware Update Manager to patch and update their ESXi hosts. Frequently, I help them write such a policy. The policy tends to vary greatly from one environment to the next. Sometimes, it varies from one ESXi cluster to the next within a single environment. The policy depends on many factors. Several of my customers are required to install new operating system patches (including ESXi patches) within 14 days of their release. Several of my larger customers have one or more clusters dedicated to development and test, where they are free to immediately install and test new patches without concern of impacting production services. Other customers only have three or four ESXi hosts, which are all running critical VMs. Some customers are very concerned about patching aggressively due to fear of vulnerabilities. Some customers have little interest in patching and they ask, “If it works, why risk breaking it?” Some customers seldom patch, except immediately after installing an update or performing an upgrade.

Typically, my goal is to help the customer create a Patch Policy that suits them well and to help them develop the specific procedure for implementing the policy. Here is a sample of a policy and procedure that I recently helped develop for a customer. The customer uses two vSphere clusters to run an application, whose SLA requires 99.99% or better availability. The application utilizes active and passive sets of virtual machines. The Active set of VMs runs in Cluster-A and the Passive set of VMs runs in Cluster-B. The administrators can instantly fail the application from Cluster-A to Cluster-B using a simple user interface provided by the application. They view vSphere simply as a solid, resilient platform to run this application. They make very few changes to the environment. They are very concerned that changing anything may disrupt the application or introduce new risk. Each cluster is composed of multiple blades and blade chassis.

In this particular use case, we developed the following policy and procedure:

Policy: Plan to patch once per quarter and only install any missing Critical patches that are at least 30 days old. Initially, apply new patches to a single ESXi host in Cluster B. The next day, apply new patches to a second host in the same chassis. The third day, apply the new patches to the remaining hosts in the chassis. On the fourth day, apply the patches to the remaining hosts in the entire cluster. On the following day, apply the new patches to all the hosts in one chassis in Cluster A. On the final day, apply the new patches to the remaining hosts in Cluster A.
Procedure:
- Download all available patches from VMware’s website and manually copy the zip file to a location that is accessible from the vCenter Server.
- Use the Import Patches link on the Update Manager configuration tab to import all patches from the zip file.
- Create a new Dynamic baseline. Set the Severity to Critical, check On or Before, and the Release Date to the specific date that is 30 days prior to the current date.
- Attach the Baseline to Cluster B and Scan the entire cluster for compliance with the baseline.
- Select one non-compliant ESXi host to upgrade first. Select Enter Maintenance Mode on that host.
- Edit the DRS Settings in the Cluster and change the Automation Level to Manual.
- Remediate the host to install the missing patches.
- Restart the host. Examine its Events and logs and verify no issues exist.
- Migrate a single, non-critical VM to the host. Test various administration functions, such as console interaction, power on, and vMotion.
- Select the cluster and the DRS tab. Use the Run DRS link to generate recommendations immediately. If any recommendations appear, use the Apply Recommendations button to start the migrations.
- Following the order and schedule that is established in the policy, continue patching the remaining hosts in Cluster B.
- After all hosts in Cluster B are patched, change the DRS Automation Level back to Fully Automated.
- Update Cluster A by applying the previous steps.
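
The policy's six-day rollout order and the 30-day release-date cutoff used for the dynamic baseline are easy to get wrong by hand, so here is a minimal Python sketch of both. The function names, chassis names and host names are hypothetical, and nothing here talks to Update Manager; it only computes the cutoff date and the order in which hosts would be patched.

```python
from datetime import date, timedelta
from typing import Dict, List


def baseline_cutoff(today: date, min_age_days: int = 30) -> date:
    """Release-date cutoff for the dynamic baseline (On or Before this date)."""
    return today - timedelta(days=min_age_days)


def rollout_waves(cluster_b: Dict[str, List[str]],
                  cluster_a: Dict[str, List[str]]) -> List[List[str]]:
    """Split hosts into the six daily waves described in the policy.

    Each argument maps a chassis name to its hosts. The first chassis
    listed in each cluster is treated as the one to patch first.
    """
    b_first, *b_rest = cluster_b.values()
    a_first, *a_rest = cluster_a.values()
    return [
        b_first[:1],                       # day 1: one host in Cluster B
        b_first[1:2],                      # day 2: second host, same chassis
        b_first[2:],                       # day 3: rest of that chassis
        [h for c in b_rest for h in c],    # day 4: rest of Cluster B
        list(a_first),                     # day 5: one chassis in Cluster A
        [h for c in a_rest for h in c],    # day 6: rest of Cluster A
    ]


# Hypothetical inventory for illustration.
cluster_b = {"B-chassis-1": ["b1", "b2", "b3", "b4"], "B-chassis-2": ["b5", "b6"]}
cluster_a = {"A-chassis-1": ["a1", "a2"], "A-chassis-2": ["a3", "a4"]}

print("Baseline release date on or before:", baseline_cutoff(date(2014, 3, 15)))
for day, hosts in enumerate(rollout_waves(cluster_b, cluster_a), start=1):
    print(f"Day {day}: {', '.join(hosts)}")
```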

February 23, 2014

VMware SRM Custom Install – Shared Recovery Site

In a few of my customers’ VMware Site Recovery Manager (SRM) implementations, we needed to configure a single recovery site to support two protected sites. SRM does permit this, but it requires a custom installation. Early in my SRM engagements, I take steps to determine if a shared recovery site is needed or may be needed. In either case, I perform the custom installation that permits the shared recovery site. Here are a few keys to configuring a shared recovery site in SRM.

Planning

The first key is planning. The main difference in planning for a shared recovery site versus a standard recovery site is that SRM must be installed twice at the recovery site (once for each protected site). SRM must be installed on separate Windows servers at the shared recovery site. These two SRM instances represent a single site, but will have unique SRM-IDs. The SRM-ID can be thought of as the name of an SRM instance at the shared recovery site. A common convention is to set each SRM-ID value to a string that combines the name of the recovery site and the name of the site that the instance protects.

For example, consider a case where the shared recovery site is called Dallas, which protects two sites called Denver and Seattle. At the Dallas site, SRM must be installed in two Windows servers. One SRM instance will be used to protect the Denver site and the other instance will protect the Seattle site. In this case, a sensible choice for the SRM-IDs may be DAL-DEN and DAL-SEA.
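
The naming convention is simple enough to express as a one-line helper. Here is a small Python sketch; the srm_id function and the abbreviation table are hypothetical and exist only to show how the DAL-DEN and DAL-SEA values in the example are formed.

```python
# Hypothetical site abbreviations for the example above.
SITE_ABBREVIATIONS = {"Dallas": "DAL", "Denver": "DEN", "Seattle": "SEA"}


def srm_id(recovery_site: str, protected_site: str) -> str:
    """Combine the recovery site and protected site abbreviations."""
    return f"{SITE_ABBREVIATIONS[recovery_site]}-{SITE_ABBREVIATIONS[protected_site]}"


ids = [srm_id("Dallas", site) for site in ("Denver", "Seattle")]
print(ids)
```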

Custom Installation

The second key is to perform the custom installation. To perform the custom installation:

Using one of the Windows servers at the shared recovery site, where an SRM instance will be implemented to protect one specific site, download the SRM installer.
In a command prompt, change the default directory to the location of the installer.
Run this command to launch the wizard for the custom installer: VMware-srm-5.1.1-1082082.exe /V"CUSTOM_SETUP=1"
The custom installation wizard should look much like the standard installation wizard, except that it includes some extra pages and options. The first additional page is the VMware SRM Plugin Identifier page. On this page, select Custom_SRM Plugin Identifier.
The second additional page prompts the user to provide the SRM ID, Organization, and Description. The critical value to provide is the SRM ID, which should be set to the value that is planned for one of the SRM instances at the shared recovery site (for example, DAL-DEN).
The remainder of the installation process is identical to a standard installation. Be sure to repeat these steps for the second SRM instance at the shared recovery site.

Connecting the Protected Site to the Correct SRM Instance

The third key is to connect each protected site to the correct SRM instance at the shared recovery site. The main difference in connecting each protected site to the shared recovery site versus connecting to a standard recovery site is the need to select the appropriate SRM-ID. Begin by performing the typical steps to connect the protected site to the recovery site, which requires using the SRM plugin for the vSphere Client to select the first protected site and click Configure Connection. In the Connection wizard, an extra page will appear after selecting the vCenter Server at the recovery site. The extra page identifies the two SRM instances at the recovery site by displaying a list providing the SRM-ID, Organization, and Description of each SRM instance. Choose the SRM-ID that corresponds to the SRM instance that should be used to protect the first site. Naturally, this process should be repeated for the second protected site.

February 21, 2014

New Exam for VCP5-Cloud Certification – VCPC550

VMware just released a new exam that can be used to qualify for VCP5-Cloud. It is the VCPC550 exam. It covers vCloud Director 5.5 and vCloud Automation Center (vCAC) 5.2. Previously, VCP5-Cloud candidates had to pass the VCPC510 exam, which covers vCloud Director 5.1. Now, candidates for VCP5-Cloud have a choice. They can pass either the new VCPC550 exam or the original VCPC510 exam. In either case, the candidate will earn the same certification, VCP5-Cloud.

For details on the certification, such as exam choices and blueprint, see the VCP-Cloud Certification webpage.

February 8, 2014

Welcome to vLoreBlog 2.0!

After eighteen months of dedicating this blog to students to provide supplemental data and certification preparation advice, I decided to expand its goals. This year, I plan to include categories of articles related to professional services focused on VMware.

In addition to having years of experience teaching official VMware training classes, I also have years of experience in delivering professional services on vSphere, View, vCloud and other VMware technologies. In addition to being a VMware Certified Instructor (VCI) Level 2, I also am a VMware Certified Professional (VCP) on datacenter virtualization (VCP-DCV), desktop (VCP-DT) and cloud (VCP-Cloud). I am also a VMware Certified Advanced Professional (VCAP) on datacenter design (VCAP-DCD), desktop design (VCAP-DTD), cloud design (VCAP-CID), and datacenter administration (VCAP-DCA). In addition to providing training via the VMware Authorized Training Center (VATC) program, I also provide professional services via the VMware Authorized Consultant (VAC) program. VMware utilizes me, as well as other authorized consulting partners, to deliver professional services in the field to their customers. VMware works hard to ensure that any sub-contractor delivering their professional services is as capable as their own engineers. In my case, I have worked longer at delivering VMware focused professional services than most of the engineers who are directly employed by VMware. I have trained several of these engineers.

I plan to begin using vLoreBlog to share advice and experience related to my professional services deliveries. This will include areas such as:

New features, products, and announcements from VMware
3rd party products and technologies
Architecture and design advice
Real field examples including challenges, decision justification, and gotchas

The articles will concentrate on details from my actual experience with the intent of providing details that may be lacking in the community.

Today, I posted my first article under the new category: Professional Services Tips.

I hope you find it useful.

February 7, 2014

Custom vCenter Server Alarms and Actions

As part of many of my vSphere related professional services engagements, such as jumpstarts, designs, upgrades and health-checks, I typically address the alarms provided by VMware vCenter Server. Frequently, I recommend creating some custom alarms and configuring specific actions on some alarms to meet customer needs. Although my recommendations are unique for each customer, they tend to have many similarities. Here I am providing a sample of the recommendations that I provided to a customer in Los Angeles, whose major focus is to ensure high availability. In this scenario, the customer does not use an SNMP management system, so we decided to use the option to send emails to the administration team, instead of sending SNMP traps. Also, in this scenario, the customer planned to configure Storage DRS in Manual mode, instead of Automatic mode.

vCenter Alarms and Email Notifications

Configure the Actions for the following pre-defined alarms to send email notifications. I consider each of these alarms to be unexpected and worthy of immediate attention if they trigger in this specific vSphere environment. Unless otherwise stated, configure the Action to occur only when the alarm changes to the Red state.

Host connection and power state (alerts if host connection state = “not responding” and host power state is NOT = Standby)
Host battery status
Host error
Host hardware fan status
Host hardware power status (HW Health tab indicates UpperCriticalThreshold = 675 Watts, UpperThresholdFatal=702 Watts)
Host hardware system board status
Host hardware temperature status
Host hardware voltage
Status of other host hardware object
vSphere HA host status
Cannot find vSphere HA master agent
vSphere HA failover in progress
vSphere HA virtual machine failover failed
Insufficient vSphere HA failover resources
Storage DRS Recommendation (if the decision is made to configure Storage DRS in a Manual Mode)
Datastore cluster is out of space
Datastore usage on disk (Red state is triggered at 85% usage)
Cannot connect to storage (triggered if host loses connectivity to a storage device)
Network uplink redundancy degraded
Network uplink redundancy lost
Health status monitoring (triggers if changes occur to overall vCenter Service status)
Virtual Machine Consolidation Needed status (triggered if a Delete Snapshot task failed for a VM)
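
The rule of acting only when an alarm changes to the Red state amounts to a transition check. Here is a minimal Python sketch of that logic; the function name and the state strings are my own and this is not how vCenter Server implements its alarm actions.

```python
def should_send_email(previous_state: str, new_state: str) -> bool:
    """Send an email only on a transition into the red state."""
    return new_state == "red" and previous_state != "red"


# Walk through a made-up sequence of alarm states.
states = ["green", "yellow", "red", "red", "yellow", "red"]
for before, after in zip(states, states[1:]):
    action = "send email" if should_send_email(before, after) else "no action"
    print(f"{before} -> {after}: {action}")
```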

Consider creating these custom alarms on the folders where critical VMs reside. Optionally, define email actions on some of these.

Datastore Disk Provisioned (%) (set yellow trigger to 100%, where the provisioned disk space meets or exceeds the capacity.)
VM Snapshot size (set to trigger at 2 GB)
VM Max Total Disk Latency (set trigger at 20 ms for 1 minute)
VM CPU Ready Time – assign these to individual VMs or folders, depending on the number of vCPUs (total virtual cores) assigned to each VM
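
To make the thresholds above concrete, here is a rough Python sketch of how each condition could be evaluated against collected values. Everything here is illustrative: the function names are hypothetical, the 20-second sample interval assumes vCenter real-time statistics, and the CPU ready conversion uses the commonly cited summation-to-percentage formula. I have not suggested a CPU ready threshold here, since it depends on the number of vCPUs per VM.

```python
from typing import List

SNAPSHOT_LIMIT_GB = 2        # VM Snapshot size trigger
LATENCY_LIMIT_MS = 20        # VM Max Total Disk Latency trigger
LATENCY_WINDOW_S = 60        # condition must hold for 1 minute
PROVISIONED_YELLOW_PCT = 100 # Datastore Disk Provisioned (%) yellow trigger


def provisioned_pct(provisioned_gb: float, capacity_gb: float) -> float:
    """Provisioned space as a percentage of datastore capacity."""
    return provisioned_gb * 100 / capacity_gb


def provisioned_alarm(provisioned_gb: float, capacity_gb: float) -> bool:
    """Yellow when provisioned space meets or exceeds capacity."""
    return provisioned_pct(provisioned_gb, capacity_gb) >= PROVISIONED_YELLOW_PCT


def snapshot_alarm(snapshot_gb: float) -> bool:
    return snapshot_gb > SNAPSHOT_LIMIT_GB


def latency_alarm(samples_ms: List[float], sample_interval_s: int = 20) -> bool:
    """True if every sample in the last full minute is above the limit."""
    needed = LATENCY_WINDOW_S // sample_interval_s
    recent = samples_ms[-needed:]
    return len(recent) == needed and all(s > LATENCY_LIMIT_MS for s in recent)


def cpu_ready_pct(ready_ms: float, interval_s: int = 20, vcpus: int = 1) -> float:
    """Convert a CPU ready summation (ms) to a per-vCPU percentage."""
    return ready_ms * 100 / (interval_s * 1000 * vcpus)


print(provisioned_alarm(1200, 1000))    # over-provisioned datastore
print(snapshot_alarm(1.5))              # snapshot still under 2 GB
print(latency_alarm([25, 30, 22]))      # above 20 ms for a full minute
print(cpu_ready_pct(2000, vcpus=4))     # ready time spread over 4 vCPUs
```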