54 68 65 20 43 61 73 65 20 6f 66 20 74 68 65 20 4d 79 73 74 65 72 69 6f 75 73 20 53 79 73 74 65 6d 20 44 65 6c 61 79 73
Posted by William Diaz on November 14, 2010
The complaint: Excel would hang while opening. When it finally did come to life, it would keep getting hung doing almost any task. Often, these issues are troubleshot in a predictable manner: Repair Office, Uninstall Office, Reinstall Office, and, finally, take a shotgun to the user’s Windows profile and blow it away. Personally, I frown on these approaches; they only serve to make the user start from scratch while doing nothing in revealing the problem. Another downside to this approach is you may be dealing with something impacting several users/workstations, each being handled independently of an identical issue by the tech in the next cubicle. And, being that the problem was related to the user profile, specifically in HKCU, it is likely each one would have resulted in a Windows Profile recreation. If you have had to recreate a profile, these can be timely since not all setting’s in today’s large enterprise environments are copied into the roaming profile (think folder exemptions, for example).
I started my methodical approach after hours from home by connecting remotely to the user’s system, logging in as her, and copying an arsenal of troubleshooting tools to the user desktop, which included SysInternals Process Explorer, Process Monitor, and Autoruns. Unless I am going to use ADPlus to dump hung processes, I normally do not install the Windows Debugging Tools on a user’s system. To dump the hung process I would use Process Explorer. I could then copy the file to my workstation and examine it there with the WinDbg from the Windows Debugging Tools.
It wasn’t soon before I noticed other problems on the workstation:right-clicking the Start menu to open Windows Explorer would cause the operating system to hang for more than a minute.To make sure I was not dealing with a user session environment that might not be updated, patched, or was fragmented, I wanted to log the user off to make sure I could recreate the issue after a clean logon. The reason I do this is because I have seen several instances of user systems who’s only problem was they had not logged off or rebooted their systems in days or weeks because they either work so late that the scheduled nightly reboots are aborted, their PCs sit inside the No Reboot Container in Active Directory, or because they always lock their systems, preventing whatever waiting updates from being pushed to them. This is where I would notice another hiccup: Selecting Log Off from the Start menu was also lagging and did nothing for at least two minutes.
In short, I was dealing with 3 types of system hangs.Hangs opening and using Excel, hangs opening right-click context Start menu (which also applied to right-clicking files and folders), and delays trying to log off. I bet that I would see the same delay when I logged on as the user, which I did. These symptoms were probably all the result of a common cause.
To start, when troubleshooting ambiguous Windows issues I like to look for the obvious first. SysInternal Process Explorer is a good start. I am looking for any suspicious processes or resource hogs. I don’t see anything abnormal. So I quickly moved on anddecided to tackle the issue with the right-click context menu handlers (I never get tired of saying that, it sounds impressive). In short, context menus are shell extensions, which are part of the Windows shell, explorer.exe. You can see all the various context menu handlers loaded by opening Autoruns from SysInternals.Autoruns does more than look for startup items that reside in the Run key of HKCU, HKLC and the Startup folder (msconfig shows you only these). What I was looking for was any unapproved 3rd party context menu items. Autoruns organizes these by the registry startup location they are loaded from:
All the menu handlers here were either Microsoft or 3rd party that were part of our standard images so identifying the problem item would not be straight forward. To troubleshoot this we could uncheck handlers one by one to disable them and see which one(s) did not lead to a right-click hang, but there are a lot of handlers and each right-click leaves us hanging. Another faster way of finding the problem handler would be to recreate the hang and from Process Explorer right-click the explorer.exe process and select Create Dump. After I did this, I took the mini-dump I created and copied to my desktop where I opened with WinDbg. Hangs can be analyzed the same way as crashes but you want to add an additional switch so the command output is !analyze –v -hang:
The stack is pretty definitive. In this dump, I produced the hang by right-clicking the start menu, which starts with explorer (note its function as StartMenuContextMenu) calling another function in shell32.dll, which then calls WSContext.dll and function CRightClick. The only thing missing here were those impressive sounding words, “context menu handler”. I already knew the module being called here is for our document comparison software, Workshare Compare (DeltaView), which allows you to select documents and compare directly from the context menu. However, if you are not sure, use the lmvm command in Windbg to look up info on the module (or click the blue hyperlink module name):
To confirm this was our problem, I disabled the context menu handler (DeltaView) with AutoRuns:
I tested by right-clicking and the hang no longer surfaced. I re-enabled it again, the hang came back. Since it wouldn’t be practical to leave this disabled, the next course of action taken here to correct was to reinstall the application, which ended up resolving the right-click folder-file-Start menu hangs.
But the same was not true for Excel, itwas still hanging excessively. To get an idea of what was happening when Excel was getting hung, I went back to Process Explorer. The CPU was not taking a hit, which told me that Excel was waiting for something. To find out what, I double-clicked Excel.exe to view the properties and selected the Threads tab. There was a single thread started in Excel. I wanted to see what this thread was waiting on. To find out I selected the thread and then clicked the Stack button. Here is what I saw each time Excel became unresponsive:
You can see Mup.sys several times in the stack1. What is Mup.sys? To find out I searched the local drive, found it in c:\windows\system32\drivers, right-clicked and viewed the Properties.:
This is very revealing because it tells us that this sys file is used for Universal Naming Conventions, i.e. Excel was waiting for acknowledgement from across the network from some other system. What system was it trying to contact? To find out, I turned to another SysInternals tool, TCPView. After downloading and running (btw, if you don’t already know, you don’t need to install any of these tools, they are all “self-contained” exe’s) I opened Excel and waited for a new connection to open. New connections are denoted in green as they start and red as they close so it was easy to spot the connection:
I opened a command shell and attempted to ping ny3871a-xp. There was no reply. I checked Active Directory and could not find the system so it seems to have been decommissioned. Notice the state of the connection attempt, SYN_SENT–this is our hung workstation sending a “handshake” request to the remote system. It waits for a reply known as an ACK, or acknowledgement. Each of these connection requests was waiting about 20 seconds before closing and there were 5 connection attempts, one after the other, contributing to the 1-2 minute delay. When they closed, Excel was back to normal. To further test, I launched Excel again and this time manually closed the connection request by right-clicking and closing in TCPView, which took Excel out of it’s hung state.
I assumed the registry was the best place to look so I started here. No doubt I was going to find several references to ny3871a-xp. I didn’t want to guess where was the best place to look so I simply ran a search for the PC name in the registry. I found dozens of hits and methodically went through and deleted each one. After deleting all references (including searches for the IP address the PC name tried to resolve to) I rebooted.
To my dismay, during log off and logon, I noticed the same delays. I looked at the System Process in the TCP/IP tab of Process Explorer and saw the same connection attempts to ny3871a-xp:
Next I directed my attention to the file system. Maybe some program had created a local file that was telling Windows and Excel to connect to the remote system. I had taken screen shots of some of the not so common locations where the registry found these references, mainly a 3rd party search program.
A little bit of research revealed this program also searched remote systems that its host connected to and indexed files and folders. With that, I found a local cache of search index files in the user’s local profile folder in xml format that pointed to our non-existent remote system:
Convinced this had to be the problem, I deleted all the index and search files and even went into the application interface itself and deleted all the local cache files it pointed to. There was even a nice little GUI that had a button that allowed you to clear all registry keys, which I clicked. Amazingly, there were still noticeable shutdown and logon delays and Excel was still getting hung. In an attempt to get Excel responsive at a minimum, I uninstalled office completely and reinstalled … but to no avail.
Next, I tested the issue as test domain account and was happy, at least, to see the issue did not occur there, so I knew we were dealing with a profile issue and there had to be something I missed in HKCU. I turned to Process Monitor to assist, set a filter for excel.exe, but interestingly I didn’t see any signs of what was being queried in the registry when Excel was getting hung, there were only network operations showing connection attempts to ny3871a-xp . I tried again without the filter, looking at all processes except the ones that procmon already ignores by default, but still no luck. Later, I would realize why Process Monitor was failing me2.
At this point, I was close to assigning the issue back to our help desk and having them blow away the network and local profile of the user. But the thought of this really began to bother me. Like I said earlier, it does nothing but mask the problem and without ascertaining what created the problem in the first place, there is always the chance the user may end up recreating it. I waited a few hours and approached I the problem again and while researching the issue on the Internet, I came across a random registry image in an unrelated search hit, that of a REG_BINARY value in the registry. If you poke around a lot in the Windows registry, you may have run across these a few times. I had seen these many times while troubleshooting all flavors of issues in past troubleshooting experiences but never really paid attention to them, mostly because they are just a series of hexadecimal numbers. But every now and then, you see one of these data values with a column of plain text next to the hex data. Often, the text is a program name, file name or path, and even network paths to servers and, yes, you guessed it, other PCs you communicate with.
Now this is important because it turns out that the Windows Registry does not actually search text inside the data of a REG_BINARY itself. But what about the actual hex data in the value? Perhaps I could locate the non-existent system by its hex name. To find out, I needed to convert the PC name to hex. I used a hex to ASCII converter from the Internet and typed in the name of the missing computer name. Hex data is case sensitive so I made sure to convert the actual name found in the registry earlier, which was Ny3871a-xp. Here is what this name looks like when converted:
So I searched the registry for 4e 79 33 38 37 31 61 2d 78 70. No luck, the registry did not return a search hit for this hex data string! This is kind of misleading because the search box in the registry gives you this impression:
I was going to have to find the computer name by manually looking into the REG_BINARY data for its name3. The big question was “Where?” Now this may sound daunting considering the size of the registry. I could narrow this down though with some deductive knowledge. The issue could not be recreated under a test domain account so I knew I was dealing with something that resided in HKEY_CURRENT_USER. And if we go back to the beginning where we had delays with right-click context menu handlers and the file menu in Windows Explorer, then a good place to start would be the Explorer key under HKCU\Software\Microsoft\Windows\CurrentVersion. But there are a lot of subkeys under the Explorer key. You have to be a bit experienced (or lucky) figure out where to look then. Excel was likely tying to connect to a file on the remote system so file streams came to mind. Looking in the Streams\Desktop subkey of Explorer I found this:
There was the pesky offender! Or at least one of them. What was it doing in the Taskbar? It’s not uncommon for users to drag files into the Quick Launch for quick access; perhaps it was dragged from Explorer via a UNC or some other unknown application. The odd thing is that entries in this binary should not cause this behavior on their own. Something else likely was involved.
Further down in the data we can see the name of a file:
You can actually edit hex data inside the open binary value by simply highlighting the lines it is on and deleting, but this a somewhat forensic approach. At this point, I really didn’t care for preserving the data of one minor value so I just deleted the Desktop key altogether;iIt would do no harm and would be recreated. Additionally, since I were also dealing with shell extensions, why not clean house? I decided to next look up one level above inside the Shell key in HKCU\Software\Microsoft\Windows\. The key of my interest here was BagMRU. MRU translates into Most Recently Used. You see these all over the registry for all programs that you open and create files with.Within the BagMRU key are literally dozens of subkeys with dozens of other subkeys. I wanted to clean the bases, so I simply deleted all keys under BagMRU. They would recreate themselves as the user began working normally in Windows again.
The issue withwith the Workshare context menu handlers kind of made some sense now: it must have been looking at its own MRUs (or whatever equivalent) in the registry that pointed to Ny3871a-xp. When Workshare was reinstalled, these registry keys or values were recreated, overwriting the existing values in the registry.
With my work done in the registry, I opened TCPView and monitored for any attempted connection attempts to ny3871a-xp wile opening Excel and testing the Log Off from the Start menu, along with various other apps and was delighted to see the problem not return. In all I spent several hours on this one problem. Yes a profile recreation would have resolved all the same but nothing would have been gained (except some extra sleep) and this proved to be an excellent learning experience.As it turned out, the day after writing this, we did, in fact, run into a few more user workstation complaints reporting this issue in the same office, all attempting to connect to Ny3871a-xp. Over the course of the next two weeks a few more calls trickled in, and all the users were part of a group collaborating on the same project and used a 3rd party storage, search, and indexing application to access files.
Upon investigating further, the closest thing I could find to this MRU for Workshare Compare is a key called Right Click Files that contains a sub key to a file path. The key name ties itself to whatever file is right-clicked on where its context menu handlers are present:
I didn’t get a chance to examine this key on any more than one of the affected profiles on the workstations, however, I was unable to expand this key and to reveal the file path. I suspect that this is where the reference to Ny3871a-xp was coming from. I can only assume the above registry key was corrupted and incapable of being scanned. Deleting the key cleared up the right-click delays but I elected to reinstall the app afterwards to make sure any unseen residue problems were corrected.
1There is an old Mark Russinovich blog on a similar issue where mup.sys and some shell routines tipped him off to a similar problem. Read it here.
2The reason Process Monitor was not revealing the responsible operations was because the process initiating the operations was System. The System process itself is filtered out by default when Process Monitor is started, along with various other common file operations that you normally do not want in a trace.
3Since coming across other workstations exhibiting the same behavior, I discovered RegScanner from Nirsoft, which does search for the text data in Hex data.