So, about 2 years ago I was with a customer who had opted to purchase UCS over their incumbent HP hardware for their private cloud build. As a first step, we upgraded the firmware on the UCS system. What I did not know at the time was that the mgmt0 cable plugged into the “B” Fabric Interconnect (FI) was showing link but was not on the right VLAN (or wasn’t passing traffic). When it came time in the upgrade to fail over the management instance of UCSM to the “B side”, we completely lost access to UCS Manager. This, along with other events that seemed related but in hindsight were totally unrelated, led me to believe that UCSM had failed in some manner and started me down a multi-hour troubleshooting session that I really wish had never happened. I opened an enhancement request to allow UCSM to detect this situation in the future and move UCSM back to the originating FI if it is unable to reach the default gateway.
Had I known the trick that I am about to tell you concerning the UCS shells, I might have been smart enough to get out of my situation much faster. The sad thing is I actually did know it; it was just knowledge from so early on in my UCS learning curve that I didn’t fully absorb its importance. So, now is your chance to start absorbing…
If you have spent any time around UCS (and if you are reading this, you probably have), you know that there is a command-line interface in addition to the provided GUI. The actual “UCS” command line is the starting-point “shell” that you land in automatically when you SSH to the UCSM Virtual IP (VIP). We’ll refer to this as the root shell for the purposes of this post. Although root is the main shell, there are many sub-shells available to you in UCS that accomplish various tasks. This post focuses on accessing two specific sub-shells, local-mgmt and NXOS. It assumes you already know what each of these sub-shells is for and will not discuss them in detail; instead, it explains how to navigate from the root shell into them.
It helps if you think of the shells in a hierarchical manner (such as the graphic above). As I mentioned, there are additional sub-shells beyond those listed above, but NXOS and local-mgmt are by far the most used, and they are unique in how you can access them. Because the root shell sits above the sub-shells of both fabrics, it allows you to access either sub-shell on either fabric (assuming you are connected to the UCSM VIP and not an individual FI). For instance:
Notice that I started out on Fabric B because that was the controlling instance (FI) of UCSM (you can flip the controlling instance back and forth without data plane disruption – a post for another day). While on Fabric B, I typed connect local-mgmt A. The UCSM root shell then connected me to the local-mgmt sub-shell on fabric A. Had I typed just connect local-mgmt (omitting the “A”), it would default to the fabric that the VIP is currently on (in this case, B). From the root shell, you can do the same type of connection to the NXOS sub-shell on either fabric as well. You cannot jump from a sub-shell to any other sub-shell. You must “exit” back to the root shell to enter any sub-shell.
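The navigation just described can be sketched as a short session. This is only an illustration: the system name "ucs" is hypothetical, the prompts are approximate, and login banners and command output are omitted.

```text
! Illustrative sketch only. "ucs" is a hypothetical system name;
! the VIP is assumed to be on Fabric B, as in the text above.
ucs-B# connect local-mgmt A
ucs-A(local-mgmt)# exit
ucs-B# connect local-mgmt
ucs-B(local-mgmt)# exit
ucs-B# connect nxos A
ucs-A(nxos)# exit
ucs-B#
```

Each sub-shell is entered from the root shell and left with exit before the next one is opened, which matches the rule that you cannot hop directly from one sub-shell to another.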
Back to my bad day story… had I remembered this trick, how would I have avoided the issue? Well, I could always access the A Fabric Interconnect. From there, I could have run connect local-mgmt B, accessed UCSM (which was running just fine on Fabric Interconnect B), and flipped UCSM back to Fabric Interconnect A using local-mgmt commands. The success in doing that would have instantly led me to the mgmt0 connection on the B fabric. Things like this are much easier to spot the second time around, though, and I saw it again at a customer in production who had a faulty connection to FI-B. In that instance, fixing it was really easy (and they thought I was really smart; no, I didn’t tell them the truth).
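As a rough sketch of that recovery, assuming an SSH session directly to FI-A's own management address and the local-mgmt commands show cluster state and cluster lead (the system name "ucs" is again hypothetical, and output is omitted), the sequence might look like this:

```text
! Illustrative sketch only. "ucs" is a hypothetical system name.
! Assumes FI-B currently holds the primary (controlling) role.
ucs-A# connect local-mgmt B
ucs-B(local-mgmt)# show cluster state
ucs-B(local-mgmt)# cluster lead a
```

Here show cluster state is used to confirm which FI is primary before changing anything, and cluster lead a asks the cluster to make Fabric A the primary. Check the exact syntax and any prerequisites against the CLI reference for your UCSM release before relying on it, and expect that your management session may drop when the lead changes.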
That’s pretty much all there is to it. If you want to play around with the various other shells, you can type connect ? at the root shell and it will return all the possible devices you can connect to.
P.S. Ironically, the same day I wrote this article, I got a call from a co-worker who “could not connect back to UCSM after the primary FI rebooted during a firmware upgrade”. We used this trick (which he thought was way cool) and then discovered later that he had a flaky Ethernet cable in mgmt0 in the (formerly) subordinate FI. If you’re curious about why the enhancement I referenced above didn’t help here, it’s because the enhancement (mgmt0 interface monitoring) is enabled by default on all NEW installations but left at the previous setting on any UPGRADES (because change is a bad thing). I believe that change went into the 2.0 release.
Thanks for your time.