Missing management NIC in Citrix XenServer

After a lengthy proof of concept, we finally started our production deployment of Citrix XenDesktop.  I’m a VMware fanboy, but I really like anything that has to do with virtualization, so I was excited to get started on it.  To top it off, we’re deploying it in a Cisco UCS environment, which is pretty cool stuff.

I started by deploying the Citrix XenServers themselves, along with the web and licensing servers, and built out my clusters.  In Citrix they are referred to as resource pools, but they’re really clusters.  Shortly after completing that, however, we realized that we had to upgrade a lot of the firmware on the UCS chassis and blades. 

I assumed there would be some risk, but the results surprised me: after completing the firmware updates, I found that the Citrix resource pools/clusters were inaccessible.  Going in through the UCS KVM manager, I found that my management NICs had disappeared!  After a little research in the forums, and some trial and error, I found that I could recover the pools using the following method.  I thought I would record the process for posterity’s sake.

First, I needed to recover the master server in each pool.  To do this I logged in through the command line on the first pool’s master and tried to perform an emergency transition to master.  I found that I couldn’t, however, because HA was enabled.  To disable HA on the host I executed the following command:

xe host-emergency-ha-disable --force

Note the “--force” switch.  Also note that the command itself warns that this is a dangerous thing to do.  It doesn’t actually say why it’s dangerous though, and it did allow me to move on, so I’m taking the warning with a grain of salt.  I tried the emergency transition to master again:

xe pool-emergency-transition-to-master
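For anyone who wants to script the two master-side commands above, here is a rough sketch.  The helper names (run, recover_master) and the DRY_RUN variable are my own inventions for illustration, not part of xe.  By default it only prints the commands, so nothing touches the host until DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only: recover a pool master after its management NIC disappears.
# DRY_RUN defaults to 1, so the xe commands are printed, not executed.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_master() {
    # HA blocks the emergency transition, so disable it on this host first.
    run xe host-emergency-ha-disable --force || return 1
    # Then force this host to take the master role.
    run xe pool-emergency-transition-to-master
}

recover_master
```

Running it with DRY_RUN=0 on the master's console would execute both commands in order and stop if the HA disable fails.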

This time it worked.  I was able to see the management NIC again and view the host in the XenCenter console.  On to the second host (there were three in all).  This time I started by disabling HA.

xe host-emergency-ha-disable --force

That worked as expected, so I moved on to the master transition:

xe pool-emergency-transition-to-master

That worked as well, but when checking the management NIC I found that it now had an old IP I had accidentally used earlier.  Weird.  I wonder where this got cached?  Oh well.  Move on.  I tried to change the IP, but found that I couldn’t because external authentication was enabled.  This was because I had enabled AD authentication earlier.  Another thing to disable.  Fortunately, there is a command for that as well:

xe pool-disable-external-auth

This took a while to run and reported errors because it couldn’t disable external authentication on the other systems it knew to be in the same cluster, but it actually did disable it on the host in question.  Now I could change the IP on the second host.  I still couldn’t join the pool, so I had to perform the emergency master transition again:

xe pool-emergency-transition-to-master
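The IP change itself isn't shown above as a command.  One way it could be done from the command line, assuming the standard xe PIF commands (pif-list and pif-reconfigure-ip), is sketched below.  The UUID, addresses, and helper names are hypothetical placeholders, and the script only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only: set a static IP on the management PIF.
# Every value below is a hypothetical placeholder; substitute your own.

PIF_UUID="${PIF_UUID:-<pif-uuid>}"
NEW_IP="${NEW_IP:-192.0.2.12}"
NETMASK="${NETMASK:-255.255.255.0}"
GATEWAY="${GATEWAY:-192.0.2.1}"

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

set_management_ip() {
    # Arguments: PIF uuid, new IP, netmask, gateway.
    run xe pif-reconfigure-ip uuid="$1" mode=static \
        IP="$2" netmask="$3" gateway="$4"
}

# List the PIFs first so the management interface's uuid can be picked out.
run xe pif-list params=uuid,device,IP,management
set_management_ip "$PIF_UUID" "$NEW_IP" "$NETMASK" "$GATEWAY"
```

The pif-list output should show which interface has management set to true; that UUID is the one to pass in as PIF_UUID.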

Now the second host thought it was the master, so it wouldn’t join the cluster… err… pool.  To fix this, I had to tell it to point to another master instead.  I found a command to do this, but it didn’t work because it couldn’t resolve the hostname.  After some testing it seemed to be network related, so I tried restarting the management agent (xapi):

xe-toolstack-restart

I gave it 30 minutes to do its thing, then tried to reset the master again.

xe pool-emergency-reset-master master-address=x.x.x.x

Woot!  That worked!  The second host appeared in XenCenter as a part of the cluster… umm… pool.  I repeated the same thing on the third host in the pool, and then the entire process on the second pool and I was back up and running.
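Pulled together, the sequence I ran on each non-master host looks roughly like the sketch below.  The function name, the DRY_RUN switch, and the MASTER_IP variable are my own labels for illustration; x.x.x.x stays a placeholder for the recovered master's address.  It prints the commands by default rather than running them.

```shell
#!/bin/sh
# Sketch only: the per-host steps from this post, in the order I ran them.
# DRY_RUN defaults to 1, so commands are printed instead of executed.

MASTER_IP="${MASTER_IP:-x.x.x.x}"   # placeholder for the pool master's IP

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rejoin_member() {
    master="$1"
    run xe host-emergency-ha-disable --force
    run xe pool-emergency-transition-to-master
    # May report errors for pool members it cannot reach; that was
    # harmless in my case, so failures here do not abort the sequence.
    run xe pool-disable-external-auth
    # (Fix the management IP here if it came back wrong.)
    run xe pool-emergency-transition-to-master
    # Restart xapi and give it time to settle before the next step.
    run xe-toolstack-restart
    run xe pool-emergency-reset-master master-address="$master"
}

rejoin_member "$MASTER_IP"
```

I would still run these one at a time by hand on a broken pool, since each step's output tells you whether the next one makes sense.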


  1. June 11th, 2012
