4 Node VSAN Cluster: RAID-1 vs RAID-5

February 3, 2020

In the past few years, I have several times encountered the same scenario: a customer wants to reduce VSAN storage consumption in a 4 node cluster by migrating VMs from the RAID-1 (Mirroring) policy to a RAID-5 (Erasure Coding) policy. Here is a brief statement summarizing my opinion on the topic.

You should reexamine the requirements and decisions that were made during the design of the cluster. The decision to configure a 4 node cluster with a specific set of cache drives and capacity drives is typically based on requirements to deliver a specific amount of usable storage with a specific level of availability. It usually coincides with a decision to apply a VSAN RAID-1 policy to the VMs.

The VSAN RAID-1 policy means that Failures to Tolerate (FTT) = 1 and Fault Tolerance Method = Performance. With VSAN RAID-1, each data item written to the capacity drives in one ESXi host has a duplicate placed on a second host. The minimum number of hosts required in a VSAN RAID-1 cluster is three, because the two copies need a third host to hold the witness component that maintains quorum. VMware recommends having N+1 nodes (4) in a VSAN cluster to allow you to rebuild data (VSAN self-healing) in case of a host outage or extended maintenance. In other words, whenever a host is offline for a significant amount of time, you can rebuild data on the remaining three hosts and be protected in case of the failure of another host.

You can elect to use VSAN RAID-5 (Erasure Coding) on all or some VMs in the cluster to reduce the used VSAN space. VSAN RAID-5 means FTT = 1 and Fault Tolerance Method = Capacity. Its required minimum number of nodes is 4, which your cluster has. However, VMware recommends at least 5 (N+1) nodes to allow you to rebuild data in case of a host outage or extended maintenance. Your cluster does not meet VMware's recommendation for RAID-5.
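To put a rough number on the space reduction, here is a minimal sketch of the arithmetic. It assumes the standard FTT = 1 layouts (RAID-1 keeps two full copies, RAID-5 stripes data as 3 data + 1 parity segments) and ignores witness components, swap objects, and slack space. The function names are my own for illustration and are not part of any VMware tool.

```python
from fractions import Fraction

# Raw-capacity multipliers for FTT=1.
# Assumption: RAID-1 stores two full copies (2x);
# RAID-5 stores 3 data + 1 parity segments (4/3, about 1.33x).
OVERHEAD = {
    "RAID-1": Fraction(2, 1),
    "RAID-5": Fraction(4, 3),
}


def raw_consumed_gb(vm_data_gb, policy):
    """Return the raw VSAN capacity (GB) consumed by vm_data_gb of VM data."""
    return float(vm_data_gb * OVERHEAD[policy])


def savings_gb(vm_data_gb):
    """Return the raw capacity (GB) freed by moving from RAID-1 to RAID-5."""
    return raw_consumed_gb(vm_data_gb, "RAID-1") - raw_consumed_gb(vm_data_gb, "RAID-5")


if __name__ == "__main__":
    for policy in OVERHEAD:
        print(policy, raw_consumed_gb(900, policy), "GB raw")
    print("Saved:", savings_gb(900), "GB raw")
```

Under these assumptions, 900 GB of VM data consumes 1800 GB of raw capacity with RAID-1 and 1200 GB with RAID-5, so the migration frees about a third of the raw space those VMs use today.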

If you do elect to use VSAN RAID-5 in the 4-node cluster, be aware of the risk during periods of a host outage or extended maintenance. Whenever a host is offline for a significant amount of time, only three hosts remain, which is fewer than the four that RAID-5 requires, so you will not be able to rebuild data and you are not protected in the event of the failure of another host. If you decide that the risk is acceptable for some subset of your VMs and not for others, you can apply the VSAN RAID-1 and RAID-5 policies accordingly. If you want the benefit of reduced storage consumption but want to maintain the current level of availability, consider adding a 5th node to the cluster prior to implementing VSAN RAID-5.
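The host-count reasoning above can be sketched as a simple check. This is only an illustration: the function name is hypothetical, the minimum host counts are the ones stated in this post, and the check ignores whether the surviving hosts have enough free capacity to hold the rebuilt components.

```python
# Minimum hosts needed to place all components of an FTT=1 object,
# as stated in this post: 3 for RAID-1, 4 for RAID-5.
MIN_HOSTS = {"RAID-1": 3, "RAID-5": 4}


def can_rebuild_after_host_loss(total_hosts, policy, hosts_offline=1):
    """Return True if the remaining hosts still meet the policy's minimum,
    so that data can be rebuilt to full FTT=1 compliance.

    Assumption: free capacity on the surviving hosts is not considered.
    """
    remaining = total_hosts - hosts_offline
    return remaining >= MIN_HOSTS[policy]


if __name__ == "__main__":
    for hosts in (4, 5):
        for policy in MIN_HOSTS:
            ok = can_rebuild_after_host_loss(hosts, policy)
            print(f"{hosts} hosts, {policy}: rebuild possible = {ok}")
```

For a 4-node cluster the check returns True for RAID-1 and False for RAID-5. With a 5th node it returns True for both policies, which is why adding a node preserves the current level of availability.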

Reference:  https://blogs.vmware.com/virtualblocks/2018/05/24/vsan-deployment-considerations/
