4 Node VSAN Cluster: RAID-1 vs RAID-5

February 3, 2020

In the past few years, I have encountered a specific scenario several times: customers looking to reduce VSAN storage consumption in a 4 node cluster by migrating VMs from the RAID-1 (Mirroring) policy to a RAID-5 (Erasure Coding) policy. Here is a brief statement summarizing my opinion on the topic.

You should reexamine the requirements and decisions that were made during the design of the cluster.  The decision to configure a 4 node cluster with a specific set of cache drives and capacity drives is typically based on requirements to deliver a specific amount of usable storage with a specific level of availability.  It coincides with a decision to apply a VSAN RAID-1 policy to the VMs.

The VSAN RAID-1 policy means that Failures to Tolerate (FTT) = 1 and Fault Tolerance Method = Performance.  VSAN RAID-1 means that for each data item written to capacity drives in one ESXi host, a duplicate is placed on a second host.  The minimum number of hosts required in a VSAN RAID-1 cluster is three: two hosts hold the data copies and a third holds a witness component, which provides quorum.  VMware recommends having N+1 nodes (4) in a VSAN cluster to allow you to rebuild data (vSAN self-healing) in case of a host outage or extended maintenance.  In other words, whenever a host is offline for a significant amount of time, you can rebuild data and be protected in case of the failure of another host.
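
As a rough sketch of the host-count arithmetic above: I am assuming the commonly cited rule that mirroring with FTT=n needs 2n+1 hosts (n+1 data copies plus witnesses), and I am treating the rebuild headroom as one extra node. The function names are my own illustration and are not part of any VMware tool.

```python
def raid1_min_hosts(ftt: int) -> int:
    """Minimum hosts for a mirrored (RAID-1) policy, assuming the 2n+1 rule."""
    if ftt < 1:
        raise ValueError("ftt must be at least 1 for a mirrored policy")
    return 2 * ftt + 1


def raid1_recommended_hosts(ftt: int) -> int:
    """Minimum hosts plus one spare node so data can be rebuilt after a host loss."""
    return raid1_min_hosts(ftt) + 1


if __name__ == "__main__":
    for ftt in (1, 2, 3):
        print(ftt, raid1_min_hosts(ftt), raid1_recommended_hosts(ftt))
```

For FTT=1 this gives a minimum of 3 hosts and a recommended 4, which matches the 4 node design discussed here.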

You can elect to use VSAN RAID-5 (Erasure Coding) on all or some VMs in the cluster to reduce the used VSAN space.  VSAN RAID-5 means FTT=1 and Fault Tolerance Method = Capacity.  Its required minimum number of nodes is 4, which your cluster has.  However, VMware recommends at least 5 (N+1) nodes to allow you to rebuild data after a host outage or during extended maintenance.  Your cluster does not meet VMware's recommendation for RAID-5.
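
To make the space savings concrete, here is a small sketch. It assumes the usual raw-capacity multipliers for FTT=1 (2x for RAID-1 mirroring, 4/3x for RAID-5 with its 3 data + 1 parity layout) and ignores slack space, deduplication, and compression. The names are hypothetical and the numbers are only an estimate, not sizing advice.

```python
from fractions import Fraction

# Assumed raw-capacity multipliers for FTT=1 policies.
RAW_MULTIPLIER = {
    "RAID-1": Fraction(2, 1),  # two full copies of the data
    "RAID-5": Fraction(4, 3),  # 3 data segments + 1 parity segment
}


def raw_capacity_gb(vm_data_gb: int, policy: str) -> float:
    """Estimate the raw VSAN capacity consumed by vm_data_gb under a policy."""
    return float(vm_data_gb * RAW_MULTIPLIER[policy])


if __name__ == "__main__":
    for policy in RAW_MULTIPLIER:
        print(policy, raw_capacity_gb(300, policy))
```

Under these assumptions, 300 GB of VM data consumes about 600 GB of raw capacity with RAID-1 and about 400 GB with RAID-5.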

If you do elect to use VSAN RAID-5 in the 4-node cluster, be aware of the risk during periods of a host outage or extended maintenance.  In other words, whenever a host is offline for a significant amount of time, you will not be able to rebuild data and you are not fully protected in the event of the failure of another host. If you decide that the risk is acceptable for some subset of your VMs and not for others, you can apply the VSAN RAID-1 and RAID-5 policies accordingly.  If you want the benefit of reduced storage consumption but want to maintain the current level of availability, consider adding a 5th node to the cluster prior to implementing VSAN RAID-5.
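
The rebuild question can be reduced to a simple check, sketched here under the assumption that a policy can only be rebuilt to full compliance if the surviving hosts still meet its minimum host count (3 for RAID-1, 4 for RAID-5, both at FTT=1). The sketch deliberately ignores free capacity, which also has to be sufficient in practice, and the names are my own.

```python
# Assumed minimum host counts for FTT=1 policies.
MIN_HOSTS = {"RAID-1": 3, "RAID-5": 4}


def can_rebuild(total_hosts: int, policy: str, hosts_offline: int = 1) -> bool:
    """True if the remaining hosts still satisfy the policy's minimum host count."""
    return total_hosts - hosts_offline >= MIN_HOSTS[policy]


if __name__ == "__main__":
    for hosts in (4, 5):
        for policy in MIN_HOSTS:
            print(hosts, policy, can_rebuild(hosts, policy))
```

With 4 hosts and one offline, the check passes for RAID-1 and fails for RAID-5; with 5 hosts it passes for both, which is the reason for suggesting a 5th node.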

Reference:  https://blogs.vmware.com/virtualblocks/2018/05/24/vsan-deployment-considerations/


7 Comments
  1. Jose

    Hi, if you have a cluster with, for instance, 10 ESXi hosts, a RAID-1 policy with FTT=1, and a lot of free space, and one host fails, the components will be rebuilt on another host in the cluster. In that case, can the cluster tolerate the failure of another ESXi host? Thanks

    • Yes. In your scenario, after the rebuild, assuming you now have 9 surviving hosts and plenty of space, your VMs should be fully protected again and ready to survive another single host failure.

  2. Jose

    Thanks johnnyadavis. So, how many host failures can the cluster support? Is there a way to calculate this?

    • The number of supported host failures is a per-VM consideration. At any moment, in a healthy environment, all of your VSAN-stored VMs should be compliant with their failures to tolerate (FTT) settings. (You can use the vSphere Client to check compliance and make use of associated alarms that are triggered when an object is not in compliance.) In this case, you can expect each VM protected with FTT=1 to tolerate a single host failure. You can expect each VM protected with FTT=2 to tolerate the simultaneous failure of two hosts. You should NOT expect VMs protected with FTT=1 to tolerate the simultaneous failure of two hosts. (Depending on which hosts failed and some other factors, some FTT=1 VMs can survive the failure of two specific hosts, but you should not expect it.)

      You should first size and configure your cluster to support the FTT levels that you need for your VMs, prior to deploying the VMs. Then, as you deploy your VMs, you can configure a different FTT per VM.

      For example, I used the vSAN Sizer today for 50 VMs with a consistent, specific set of specs, chose Dell hardware, and ran it multiple times, changing the RAID type and FTT. I got the following results:
      RAID-1 – FTT=1: results = 6 hosts
      RAID-1 – FTT=2: results = 7 hosts
      RAID-1 – FTT=3: results = 9 hosts
      RAID-5 – FTT=1: results = 6 hosts
      RAID-6 – FTT=2: results = 8 hosts

      NOTE: FTT=3 requires RAID-1

      If you build a cluster that meets or exceeds the vSAN recommendations, then you should expect each VM that is compliant with its storage policy to be protected against host failures per its FTT setting. For example, if a VM is set with FTT=2 and the VM is compliant with its storage policy, then that VM should survive 2 simultaneous host failures.

      Details: https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vsan-planning.doc/GUID-1EB40E66-1FBD-48A6-9426-B33F9255B282.html
      vSAN Sizer: https://storagehub.vmware.com/t/vmware-r-vsan-tm-design-and-sizing-guide-2/vsan-sizing-tool/
      new vSAN Sizer (requires a login): https://vsansizer.vmware.com/login?src=%2Fhome

      When you attempt to place a host into maintenance mode, the wizard will prompt you concerning evacuating data. If you choose full evacuation and it completes successfully, then all your VMs should still be fully protected at their FTT level, but the operation could take a long time. If you choose not to fully evacuate, the wizard warns you concerning the number of VMs that will become inaccessible and the number of VMs that will become non-compliant.

  3. Jose

    Many thanks! You have solved my doubt. Very good explanation!

  4. Pablo

    Thanks johnnyadavis.

    We have a vSAN cluster with 5 nodes. We have two vSAN policies, RAID 1 and RAID 5, but we need more storage, and we are thinking about changing all VMs to the RAID 5 policy. Is the performance (I/O) similar in vSAN RAID 1 and vSAN RAID 5? Do you think it is a good idea?

    Thanks

    • Well, it at least seems reasonable to consider your proposal to migrate some VMs from RAID 1 to RAID 5, but I do not want to give professional advice based on a small amount of data. For many of my customers, whenever multiple storage policies meet the performance and other requirements for a specific VM, we typically apply another factor (such as storage space usage or cost) when making the policy decision.
