Proposal for Providing AD and DNS for SRM Test Failover
This is an idea that I have implemented for DR test purposes for a couple of customers. I have not seen this idea documented anywhere in the community, so I thought I would post it here for discussion.
For some of my VMware SRM customers, the network allows production VLANs and subnets to be extended to the recovery site, so VMs can keep their IP settings during a planned migration or disaster recovery. This certainly simplifies our DR planning. For example, DNS records do not have to be updated during a disaster recovery or planned migration, because the IP addresses of the VMs do not change. We can run a set of Active Directory (AD) controllers and Domain Name System (DNS) servers at the recovery site, where they stay synchronized with their counterparts at the protected site. As a result, current AD and DNS data is already available at the recovery site, and SRM does not have to recover AD and DNS during a real disaster recovery.
However, performing non-disruptive test recoveries in this scenario can be challenging, because such tests typically require isolated test networks. The first question is whether the IP settings of the VMs must be changed during non-disruptive tests so that the original VMs and applications can continue to run undisturbed. Preferably, the test networks allow us to keep the original IP settings of the VMs without being visible to the production network, where those IP addresses are currently in use. If this can be achieved, the next issue is how to provide the AD controllers and DNS servers that the tested applications require. Ideally, these AD controllers and DNS servers would provide current data (current as of the moment the test began), would run in the test network with no risk of being seen by the production network, and would be easy to remove after the test is complete.
To facilitate non-disruptive testing, we can use vSphere Replication (VR), a dedicated recovery plan, and the Test Recovery option in SRM to fail over AD, DNS, and other required infrastructure servers to the test network before running Test Recovery on any other recovery plan. This ensures that the test network has recently updated AD, DNS, and other infrastructure servers, which are isolated from the production networks during the test but are available to the VMs being tested. These infrastructure servers can be modified harmlessly in any manner during the test period. After the test is complete, the SRM Cleanup operation can be used to clean up the VMs involved in both recovery plans.
The infrastructure VMs (AD, DNS, DHCP, etc.) should not be recovered during an actual disaster recovery or planned migration. Instead, peers of these servers, kept consistent by the application itself (such as AD synchronization), should already reside at the recovery site.
In some cases, the AD controller could be in an inconsistent state and produce errors when it is brought up at the recovery site during a Test Recovery operation. This could occur if the AD controller was in the middle of an AD synchronization when VR replicated it. In this rare case, simply run the Cleanup operation and repeat the Test Recovery operation.
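The test sequence described above can be sketched in Python to make the ordering explicit: the infrastructure plan is tested first, cleaned up and retried if AD comes up inconsistent, and only then are the application plans tested. This is a hypothetical sketch, not SRM automation: `test_recovery`, `cleanup`, and `ad_is_healthy` are placeholder callables standing in for the SRM Test Recovery and Cleanup operations and for whatever AD health check you use, and the plan names are made up. The retry limit and the reverse cleanup order are also assumptions, not SRM behavior.

```python
from typing import Callable, List

# Hypothetical placeholders: in practice these steps are run from the SRM UI
# or its API. Each callable takes a recovery plan name; none is a real SRM call.
PlanOp = Callable[[str], None]


def run_test(infra_plan: str, app_plans: List[str],
             test_recovery: PlanOp, cleanup: PlanOp,
             ad_is_healthy: Callable[[], bool],
             max_attempts: int = 3) -> List[str]:
    """Test the infrastructure plan first, then the application plans.

    Returns the ordered list of operations performed, for logging.
    """
    steps: List[str] = []
    for _ in range(max_attempts):
        test_recovery(infra_plan)
        steps.append(f"test {infra_plan}")
        if ad_is_healthy():
            break
        # Replica caught AD mid-synchronization: clean up and try again.
        cleanup(infra_plan)
        steps.append(f"cleanup {infra_plan}")
    else:
        # Loop finished without a healthy AD controller.
        raise RuntimeError(f"{infra_plan}: AD still unhealthy after {max_attempts} attempts")

    # Only now test the application plans, which depend on AD/DNS.
    for plan in app_plans:
        test_recovery(plan)
        steps.append(f"test {plan}")
    return steps


def cleanup_all(infra_plan: str, app_plans: List[str], cleanup: PlanOp) -> List[str]:
    """Clean up application plans first and the infrastructure plan last (assumed order)."""
    order = list(reversed(app_plans)) + [infra_plan]
    for plan in order:
        cleanup(plan)
    return [f"cleanup {p}" for p in order]


if __name__ == "__main__":
    health = iter([False, True])  # first attempt inconsistent, second healthy
    steps = run_test("Infra-AD-DNS", ["App-A", "App-B"],
                     test_recovery=lambda plan: None,
                     cleanup=lambda plan: None,
                     ad_is_healthy=lambda: next(health))
    print(steps)
    # ['test Infra-AD-DNS', 'cleanup Infra-AD-DNS', 'test Infra-AD-DNS', 'test App-A', 'test App-B']
    print(cleanup_all("Infra-AD-DNS", ["App-A", "App-B"], cleanup=lambda plan: None))
    # ['cleanup App-B', 'cleanup App-A', 'cleanup Infra-AD-DNS']
```

Keeping the SRM operations as injected callables keeps the ordering logic separate from however you actually drive SRM.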
This proposal should be implemented only if the test network is completely isolated from the production network. The test network may consist of multiple networks (VLANs/subnets) that communicate with each other, as long as none of them can communicate with any production network.
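The isolation rule above can be checked mechanically once the links between networks (routes, firewall rules, trunked VLANs) are listed. The Python sketch below treats every allowed link as two-way and any multi-hop path as a leak, which is a conservative assumption; the network names are hypothetical, and building the link list from your actual routing and firewall configuration is left to you.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Link = Tuple[str, str]  # two networks that can exchange traffic (assumed two-way)


def build_graph(links: Iterable[Link]) -> Dict[str, Set[str]]:
    """Adjacency sets, adding each link in both directions."""
    graph: Dict[str, Set[str]] = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable(start: str, graph: Dict[str, Set[str]]) -> Set[str]:
    """Breadth-first search: every network reachable from `start`, including itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def isolation_violations(test_nets: Set[str], prod_nets: Set[str],
                         links: Iterable[Link]) -> Set[Link]:
    """(test network, production network) pairs connected by any path."""
    graph = build_graph(links)
    return {(t, p) for t in test_nets for p in reachable(t, graph) & prod_nets}


if __name__ == "__main__":
    test = {"test-vlan-10", "test-vlan-20"}  # hypothetical names
    prod = {"prod-vlan-10", "prod-vlan-20"}
    links = [("test-vlan-10", "test-vlan-20")]  # test VLANs may talk to each other
    print(isolation_violations(test, prod, links))  # set(): isolated
    links.append(("test-vlan-20", "mgmt"))
    links.append(("mgmt", "prod-vlan-10"))
    print(sorted(isolation_violations(test, prod, links)))
    # [('test-vlan-10', 'prod-vlan-10'), ('test-vlan-20', 'prod-vlan-10')]
```

The second check shows why multi-hop paths matter: neither test VLAN links to production directly, but a shared management network would still let recovered VMs with duplicate IP addresses reach the production VLAN.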
Note: In most cases, only a single AD controller should be included in the test recovery. VR replicates each VM independently and does not quiesce multiple VMs at the same point in time, even if they belong to the same protection group, so two or more domain controllers recovered together may not be in sync.