Domain Controllers and Snapshots

I was sharing some tips on Active Directory yesterday when the person I was speaking to mentioned that they would take a snapshot of their domain controller before applying the changes we were discussing.  The conversation came to a dead stop.   I could actually hear the robot from Lost in Space wailing “Danger Will Robinson! Danger!” 

My friend is running his DCs in VMware, as am I.  That wasn’t the problem.  I actually prefer to run DCs as virtual servers, but there is one golden rule you must always keep in mind if you’re going to do that.  NEVER take a snapshot of a domain controller.

As you would expect, he was alarmed by my reaction, so I explained the problem. 

Update Sequence Numbers

You see, some directory services use timestamps to track changes that need to be replicated to other systems.  The newest timestamp wins when there is contention over which change should apply.  Active Directory uses a different approach.  Instead of timestamps, AD uses Update Sequence Numbers (USNs).

An exhaustive explanation of the process would be pretty interesting, but it isn’t really germane to this conversation.  If you’re interested in reading up on it, you can find a great explanation here.  For now, I’ll provide a high-level overview.  Basically, there are three key components of the replication process you should understand: the High-Watermark value, the Up-to-Dateness Vector, and the Database Identity.

High-Watermark Value

Each DC keeps track of changes using a local USN counter.  Whenever a change is made to an object, the counter is incremented and the new value is stored in the object’s uSNChanged attribute.  When another DC (the destination) requests an update from the source DC, the latest USN from the source is passed along to the destination.  The destination records this USN as its High-Watermark value for that source.  The next time the destination requests an update, it sends that High-Watermark value back to the source DC.  The source DC only sends over information newer than the High-Watermark USN.
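To make that concrete, here is a rough sketch of the High-Watermark handshake in Python.  It is a toy model, not how AD is actually implemented: the SourceDC class and its write and changes_since methods are names I made up for illustration, and real DCs track far more state than a dictionary of objects.

```python
from dataclasses import dataclass, field


@dataclass
class SourceDC:
    """Toy source DC: a local USN counter plus objects stamped with uSNChanged."""
    usn: int = 0
    objects: dict = field(default_factory=dict)  # name -> (value, usn_changed)

    def write(self, name, value):
        # Every local change bumps the counter and stamps the object with it.
        self.usn += 1
        self.objects[name] = (value, self.usn)

    def changes_since(self, high_watermark):
        # Return only objects changed after the caller's watermark, plus the
        # new watermark the caller should remember for next time.
        changed = {
            name: value
            for name, (value, usn_changed) in self.objects.items()
            if usn_changed > high_watermark
        }
        return changed, self.usn


dc1 = SourceDC()
dc1.write("alice", "phone=111")
dc1.write("bob", "phone=222")

# First sync: the destination has no watermark yet, so it asks from 0.
first_batch, watermark = dc1.changes_since(0)

# One more change on DC1, then the destination asks again from its watermark.
dc1.write("alice", "phone=333")
second_batch, watermark2 = dc1.changes_since(watermark)

print(first_batch, watermark)
print(second_batch, watermark2)
```

The second request only returns alice, because bob’s uSNChanged is not newer than the watermark the destination sent back.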

Up-to-Dateness Vector

The destination DC also keeps track of a USN value for each DC that it has ever replicated with, recording how current it is with the changes that originated on that DC.  This set of values, the Up-to-Dateness Vector, is also passed back to the source DC when replication is requested.  After reducing the scope of objects to replicate using the High-Watermark value, the source DC can further reduce the replication set by using the Up-to-Dateness Vector to determine which attributes in that set should be replicated.
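The two-stage filtering can be sketched like this.  Again, this is a simplified illustration under my own assumptions: the Change tuple, its field names, and the filter_changes function are hypothetical, and the DC names and USN values are made up.

```python
from typing import NamedTuple


class Change(NamedTuple):
    attribute: str
    originating_dc: str   # DC where the change was first made
    originating_usn: int  # USN the change received on that DC
    local_usn: int        # USN assigned when the source DC stored the change


def filter_changes(changes, high_watermark, up_to_dateness):
    # Stage 1: keep only changes newer than the destination's High-Watermark.
    candidates = [c for c in changes if c.local_usn > high_watermark]
    # Stage 2: drop changes the destination already holds, judged by the
    # highest originating USN it has recorded for each originating DC.
    return [
        c for c in candidates
        if c.originating_usn > up_to_dateness.get(c.originating_dc, 0)
    ]


# Changes stored on DC1. The middle one originated on DC3 and was
# replicated into DC1, where it was assigned local USN 51.
dc1_changes = [
    Change("alice.phone", "DC1", 50, 50),
    Change("bob.title", "DC3", 7, 51),
    Change("carol.mail", "DC1", 52, 52),
]

# DC2 has seen DC1 up to USN 49, and already got DC3's USN 7 from DC3 itself.
dc2_up_to_dateness = {"DC1": 49, "DC3": 7}

to_send = filter_changes(dc1_changes, 49, dc2_up_to_dateness)
print([c.attribute for c in to_send])
```

All three changes pass the High-Watermark check, but bob.title is dropped in the second stage because DC2 already has it by another path.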

Database Identity

Each DC has its own server identity, but each instance of the AD database also has a Database Identity, stored as an InvocationID.  The server identity never changes, but the InvocationID does change IF AD is properly restored from backup.  For now, suffice it to say that the destination DC keeps track of the source DC’s InvocationID.

USN Rollback (a.k.a. Why Snapshots are evil)

Here’s where things go awry.  Let’s say DC1 makes changes to a user account.  Those changes are tracked using a USN.  Now let’s say that DC2 requests replication from DC1.  It passes back the High-Watermark value and Up-to-Dateness Vector that it has on record for DC1.  DC1 uses that information to determine a replication set and passes that set on to DC2.  For the sake of argument we will say that DC2 now has a High-Watermark value of 10, an Up-to-Dateness value of 100, and an InvocationID value of X on record for DC1 (that’s an oversimplification, but is good enough for this explanation).

At this point, we go off and take a snapshot of DC1.  Everything seems OK, so we make some more changes, and those changes are replicated.  Let’s assume that DC2 now has a High-Watermark value of 20, an Up-to-Dateness value of 200, and an InvocationID of X on record for DC1.

Now let’s say we revert DC1 to the snapshot.  This is where the InvocationID comes into play.  When we revert to the older snapshot, DC1’s own USN values go back to 10 and 100 respectively, while DC2 still has 20 and 200 on record for it.  If we had restored AD correctly, the USN values would still have reverted, but the InvocationID would have changed.  DC2 would detect the change and replicate everything correctly.  By reverting to a snapshot, we circumvent that process.  The values are rolled back, but the InvocationID stays the same.  We are now in a USN Rollback state, described in greater detail here.  Once the problem is detected during replication with DC2, DC1 is isolated from replicating data to the rest of the domain to preserve database integrity.
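If you want to see the failure mode play out, here is a small simulation of the scenario above.  It is a deliberately crude model built on my own assumptions: the DC class, is_usn_rollback, and the single-watermark bookkeeping are hypothetical stand-ins, and the real detection logic in AD is more involved than one comparison.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class DC:
    invocation_id: str
    usn: int = 0
    log: list = field(default_factory=list)  # (usn, description) pairs

    def write(self, description):
        self.usn += 1
        self.log.append((self.usn, description))

    def changes_since(self, watermark):
        return [entry for entry in self.log if entry[0] > watermark]


def is_usn_rollback(source, partner_records):
    # The partner has seen a higher USN under this same InvocationID than
    # the source currently has: the source's database went back in time.
    return partner_records.get(source.invocation_id, 0) > source.usn


dc1 = DC(invocation_id="X")
for i in range(10):
    dc1.write(f"change {i}")
dc2_records = {"X": dc1.usn}       # DC2 has replicated up to USN 10

snapshot = copy.deepcopy(dc1)      # stands in for the hypervisor snapshot

for i in range(10, 20):
    dc1.write(f"change {i}")
dc2_records["X"] = dc1.usn         # DC2 has now replicated up to USN 20

# Revert to the snapshot: USN drops back to 10, InvocationID is still X.
dc1 = copy.deepcopy(snapshot)
dc1.write("new user created after revert")   # reuses USN 11

missed = dc1.changes_since(dc2_records["X"]) # DC2 asks for anything > 20
rollback = is_usn_rollback(dc1, dc2_records)

# Proper restore: same data, but a fresh InvocationID that DC2 has no record of.
restored = copy.deepcopy(snapshot)
restored.invocation_id = "Y"
restored.write("new user created after restore")
sent = restored.changes_since(dc2_records.get(restored.invocation_id, 0))

print(len(missed), rollback, len(sent))
```

After the revert, the new user is written under a USN that DC2 believes it already has, so nothing is sent and the change would never replicate.  With a fresh InvocationID, DC2 has no stale record to compare against and the new change goes out.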

Practically speaking, there is really only one fix in this situation.  Demote DC1 and then promote it again.

Conclusion

And that, my friends, is why we don’t snapshot our DCs, but there is a light at the end of the tunnel.  Server 2012 will be implementing a solution to this problem called the VM GenerationID that will trigger a reset of the InvocationID after reverting to a snapshot.  It looks like it will only be available out of the gate for Hyper-V, but they are supposedly working with other vendors on implementing the solution.  For now, however, don’t do it.  It’s a really, really, really bad idea.

    • Omar
    • June 10th, 2012

    Nice post
