Process Monitor & the BSOD
Posted by William Diaz on August 24, 2010
Very rarely do I experience a Blue Screen of Death. In fact, I can’t recall the last time I did, so it was worth taking a photo of:
Normally, I look forward to troubleshooting system or application crashes. I like it even more when I can reproduce the crash on my workstation. I have many troubleshooting tools at my disposal and can start digging into the problem without needing to set anything up. This time, however, one of those tools, Process Monitor, was itself bug checking my workstation shortly after I started it.
Sometimes you get lucky and simply searching the Internet for the Stop code (0x0000008E…) turns up an answer. No luck this time; my search came up blank. I dived into C:\Windows, located the MEMORY.DMP file and double-clicked it to launch WinDbg*. When you open a kernel dump, the !analyze -v command is waiting for you as a clickable hyperlink:
I waited for a few seconds for the symbols to load:
When reading a dump, gather some basic information. Start with the type of bug check, e.g. KERNEL_MODE_EXCEPTION_NOT_HANDLED_M in the excerpt above. Use WinDbg help to gather some details about the bug check. In this case, the bug check “indicates that a kernel-mode program generated an exception which the error handler did not catch.” I suppose this means that WinDbg’s heuristics cannot definitively isolate what caused the system to bug check.
Going down further, we have the exception code:
c0000005 tells us this is an access violation, which occurs when code attempts to read, write or execute a memory address that is invalid or that it does not have permission to access. This can be caused by any number of things, from bad RAM to buggy drivers and applications.
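Exception codes like this one are NTSTATUS values, and the bit layout is regular enough to decode by hand: the top two bits are the severity, the next bit marks customer-defined codes, and the low 16 bits are the code itself. Here is a minimal Python sketch of that decoding. The function name and the small lookup table are my own for illustration, and the table is nowhere near complete:

```python
# A handful of well-known NTSTATUS values; deliberately tiny, not exhaustive.
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0x80000003: "STATUS_BREAKPOINT",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC0000094: "STATUS_INTEGER_DIVIDE_BY_ZERO",
}

# The top two bits of an NTSTATUS value encode its severity.
SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def decode_ntstatus(text):
    """Split an NTSTATUS value (hex string, with or without 0x) into its fields."""
    value = int(text, 16) & 0xFFFFFFFF
    return {
        "value": f"0x{value:08X}",
        "severity": SEVERITY[value >> 30],
        "customer": bool(value & 0x20000000),  # bit 29: customer-defined code
        "facility": (value >> 16) & 0x0FFF,    # bits 16-27
        "code": value & 0xFFFF,                # bits 0-15
        "name": KNOWN_STATUS.get(value, "unknown"),
    }


if __name__ == "__main__":
    for field, val in decode_ntstatus("c0000005").items():
        print(f"{field:9}: {val}")
```

For c0000005 this reports an error-severity status with code 5, which is the access violation the debugger named.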
Further down, we have the call stack:
You can see that PROCMON20 (the Process Monitor .sys file) is called right before passing off to nt to bug check. All the other modules on the stack were Microsoft’s, so it didn’t seem to be an issue with any third-party device drivers or application components. Further down, the debugger names memory corruption as the faulty module:
When analyzing, be wary of false positives. You will run into them regularly. Although memory can go bad at any time for any reason, this was not likely the cause. To be sure, though, I replaced the RAM sticks and tested the issue again. After starting Process Monitor, the system crashed again and I opened the dump created in C:\Windows\Minidump. Stack-wise, it was pretty much the same as the previous dump, but with the offender listed this time as Process Monitor:
To confirm I didn’t have a bad image of Process Monitor on my workstation, I ran it live from the SysInternals site and the system still crashed. With two false positives, I would need to dig deeper to find the cause.
Several days had passed since I first encountered the problem. I would need to go back to the day I first started seeing it, find out what changes had been made on my workstation and isolate the culprit. Luckily, I had just the tool for this, the Change Analysis Diagnostic for Windows XP. I started the utility and selected the 18th of August as the date to go back to and scan for changes. Very little had changed since then, and here is what I first noticed:
You can see that on the 19th a new file called SysTrace.sys was installed in the windows\system32\drivers folder. It is part of the so-called “MSN Flight Data Recorder” (yes, that’s a real Microsoft typo: “Microsof”):
Remembering that I had seen that name in the stack earlier, I went back to the dump and opened it:
Curious still, I went back to the Change Analysis report and scanned further up to gather more details and saw this:
If you are not familiar with Process Monitor, you should know it works with filters and plugs into the Windows system filter driver, fltmgr.sys. You can see this in the call stack tab of the properties of any operation:
It looks like the SysTrace.sys filter was conflicting with the filter driver Process Monitor uses, and that was causing the system to bug check.
Where did SysTrace.sys and MSN Flight Data Recorder come from? Ironically enough, from this earlier blog post: Windows System State Analyzer. I had installed this utility to test it as a troubleshooting tool. It installs under the guise of the Software Certification Toolkit.
I uninstalled the toolkit from Add/Remove Programs but, annoyingly, I was still running into the BSOD. It turns out that the SysTrace.sys file stays in place even when you uninstall the package it came with. To correct the issue, I had to manually delete the file.
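If you don’t have a change-analysis tool handy, a rough version of the same check (which driver files changed after a given date, and is a leftover still sitting there?) can be sketched in a few lines of Python. This is only an approximation and not what the Change Analysis Diagnostic does internally. The function name is made up for this example, the drivers path is the assumed default location, and a file’s modified time can predate the day it was actually installed, so treat a miss as inconclusive:

```python
import os
from datetime import datetime


def files_changed_since(folder, since, suffix=".sys"):
    """Return sorted (modified_time, path) pairs for files changed on or after `since`."""
    hits = []
    for entry in os.scandir(folder):
        if not entry.is_file() or not entry.name.lower().endswith(suffix):
            continue
        modified = datetime.fromtimestamp(entry.stat().st_mtime)
        if modified >= since:
            hits.append((modified, entry.path))
    return sorted(hits)


if __name__ == "__main__":
    # Assumed default location of the drivers folder; adjust for your system.
    system_root = os.environ.get("SystemRoot", r"C:\Windows")
    drivers = os.path.join(system_root, "System32", "drivers")
    if os.path.isdir(drivers):
        for modified, path in files_changed_since(drivers, datetime(2010, 8, 18)):
            print(modified.strftime("%Y-%m-%d %H:%M"), path)
```

Running it again after an uninstall is a quick way to see whether a driver file such as SysTrace.sys was left behind.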
*Tip: you can associate all dumps with the Windows Debugging Tools by going to the program directory for the Windows Debugging Tools and running windbg -IA from the command shell.
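A related convenience, if you would rather skip the GUI: the console debugger kd that ships with the Debugging Tools can open a dump with -z and run startup commands with -c, so the same !analyze -v pass can be scripted. The sketch below wraps that in Python. It assumes kd is on your PATH and that your symbol path is already configured; the helper name is my own, and the dump path is simply the one from this post:

```python
import shutil
import subprocess


def analyze_command(dump_path, debugger="kd"):
    """Build a command line that opens a dump, runs !analyze -v, then quits."""
    return [debugger, "-z", dump_path, "-c", "!analyze -v; q"]


if __name__ == "__main__":
    cmd = analyze_command(r"C:\Windows\MEMORY.DMP")
    if shutil.which(cmd[0]):
        # Capture the debugger's console output so it can be saved or searched.
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout)
    else:
        print("Debugger not found on PATH; would have run:", " ".join(cmd))
```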