Custom vCenter Server Alarms and Actions

February 7, 2014

As part of many of my vSphere-related professional services engagements, such as jumpstarts, designs, upgrades and health-checks, I typically address the alarms provided by VMware vCenter Server. Frequently, I recommend creating some custom alarms and configuring specific actions on some alarms to meet customer needs. Although my recommendations are unique for each customer, they tend to have many similarities. Here I am providing a sample of the recommendations that I gave to a customer in Los Angeles, whose major focus is to ensure high availability. In this scenario, the customer does not use an SNMP management system, so we decided to send emails to the administration team instead of sending SNMP traps. Also, in this scenario, the customer planned to configure Storage DRS in Manual mode instead of Automatic mode.

vCenter Alarms and Email Notifications

Configure the Actions for the following pre-defined alarms to send email notifications. I consider each of these alarms to be unexpected and worthy of immediate attention if they trigger in this specific vSphere environment. Unless otherwise stated, configure the Action to occur only when the alarm changes to the Red state.

Host connection and power state (alerts if host connection state = “not responding” and host power state is not Standby)
Host battery status
Host error
Host hardware fan status
Host hardware power status (HW Health tab indicates UpperCriticalThreshold = 675 Watts, UpperThresholdFatal = 702 Watts)
Host hardware system board status
Host hardware temperature status
Host hardware voltage
Status of other host hardware object
vSphere HA host status
Cannot find vSphere HA master agent
vSphere HA failover in progress
vSphere HA virtual machine failover failed
Insufficient vSphere HA failover resources
Storage DRS Recommendation (if the decision is made to configure Storage DRS in a Manual Mode)
Datastore cluster is out of space
Datastore usage on disk (Red state is triggered at 85% usage)
Cannot connect to storage (triggered if host loses connectivity to a storage device)
Network uplink redundancy degraded
Network uplink redundancy lost
Health status monitoring (triggers if changes occur to overall vCenter Service status)
Virtual Machine Consolidation Needed status (triggered if a Delete Snapshot task failed for a VM)
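
Two of the trigger conditions noted in the list above (the host connection and power state alarm, and the 85% datastore usage Red state) can be sketched as plain logic. This is only an illustration in Python, not how vCenter Server evaluates alarms internally. The function names and the state strings are hypothetical, the 75% Yellow threshold is an assumed value that does not come from this customer's configuration, and treating "at 85%" as inclusive is also an assumption.

```python
def host_connection_alarm(connection_state: str, power_state: str) -> bool:
    """Return True when the host is not responding and is not in standby.

    The state strings used here are illustrative labels (assumption).
    """
    return connection_state == "notResponding" and power_state != "standBy"


def datastore_usage_state(used_gb: float, capacity_gb: float,
                          red_pct: float = 85.0,
                          yellow_pct: float = 75.0) -> str:
    """Classify datastore usage as green, yellow or red.

    red_pct = 85 matches the list above; yellow_pct = 75 is an assumed value.
    """
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    used_pct = used_gb * 100 / capacity_gb
    if used_pct >= red_pct:      # inclusive comparison is an assumption
        return "red"
    if used_pct >= yellow_pct:
        return "yellow"
    return "green"


if __name__ == "__main__":
    # A host in standby that stops responding should not raise the alarm.
    print(host_connection_alarm("notResponding", "standBy"))
    # A 1000 GB datastore with 900 GB used falls in the Red state.
    print(datastore_usage_state(900, 1000))
```

The standby exclusion matters in clusters that use power management, where hosts are expected to drop out of the responding state on purpose.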

Consider creating these custom alarms on the folders where critical VMs reside. Optionally, define email actions on some of these.

Datastore Disk Provisioned (%) (set yellow trigger to 100%, where the provisioned disk space meets or exceeds the capacity.)
VM Snapshot size (set to trigger at 2 GB)
VM Max Total Disk Latency (set trigger at 20 ms for 1 minute)
VM CPU Ready Time – assign these to individual VMs or folders, depending on the number of vCPUs (total virtual cores) assigned to each VM
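
The reason the CPU Ready Time alarm depends on the vCPU count can be shown with a little arithmetic. The sketch below assumes the alarm is based on the VM-level CPU Ready summation value in milliseconds, which adds up the ready time of all vCPUs, and that it is sampled at the 20-second real-time interval. The function names are hypothetical, and the 5% per-vCPU target in the usage example is an illustrative value, not a recommendation from this engagement.

```python
REALTIME_INTERVAL_S = 20  # real-time performance chart sample interval, in seconds


def cpu_ready_threshold_ms(vcpus: int, ready_pct_per_vcpu: float,
                           interval_s: int = REALTIME_INTERVAL_S) -> float:
    """Convert a per-vCPU ready percentage into a summation threshold in ms.

    Assumes the VM-level value aggregates ready time across all vCPUs.
    """
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    return vcpus * interval_s * 1000 * ready_pct_per_vcpu / 100


def cpu_ready_percent(ready_ms: float, vcpus: int,
                      interval_s: int = REALTIME_INTERVAL_S) -> float:
    """Convert a summation value in ms back to an average per-vCPU percentage."""
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    return ready_ms * 100 / (interval_s * 1000 * vcpus)


if __name__ == "__main__":
    # Hypothetical 5% per-vCPU target applied to VMs of different sizes.
    for vcpus in (1, 2, 4, 8):
        print(vcpus, "vCPU:", cpu_ready_threshold_ms(vcpus, 5), "ms")
```

Under these assumptions, the same per-vCPU target produces a different millisecond trigger for each VM size, which is why grouping VMs into folders by vCPU count and assigning one alarm per folder keeps the number of alarm definitions manageable.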