Troubleshooting Excessive Interrupts & DPCs
Posted by William Diaz on May 14, 2012
After logging onto my main home PC and opening IE, I noticed lag while repositioning the window around the screen. I opened the Task Manager, sorted by the CPU column and saw no single process reporting excessive usage:
Nor was the hard disk light blinking or solid. However, looking at the Performance tab revealed two of the CPU cores hovering around 100%:
This is not actually being caused by a phantom process; the real culprit is excessive interrupts, which means some hardware resource is spending too much time trying to get the attention of the CPU. To see this, you can use Process Explorer:
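If Process Explorer is not at hand, the same symptom can be confirmed from the command line. The following is only a minimal sketch, assuming the built-in typeperf tool and the standard "% Interrupt Time" and "% DPC Time" processor counters; the function names are my own and not part of any tool. It builds the typeperf command and averages the counter columns of the CSV text that typeperf prints.

```python
import csv
import io

# Standard counters of the Windows "Processor" performance object.
COUNTERS = [
    r"\Processor(_Total)\% Interrupt Time",
    r"\Processor(_Total)\% DPC Time",
]


def typeperf_command(samples=10, interval_s=1):
    """Return the argv that samples both counters (hypothetical helper)."""
    return ["typeperf", *COUNTERS, "-si", str(interval_s), "-sc", str(samples)]


def average_counters(csv_text):
    """Average each counter column of typeperf's CSV output.

    Lines that are not data rows (blank lines, status messages) are skipped.
    Returns a dict mapping the counter name from the header to its mean.
    """
    header = None
    sums, count = [], 0
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        if header is None:
            # The header row starts with the "(PDH-CSV 4.0)" marker.
            if row[0].startswith("(PDH-CSV"):
                header = row
                sums = [0.0] * (len(row) - 1)
            continue
        if len(row) != len(header):
            continue  # status text such as "Exiting, please wait..."
        try:
            values = [float(cell) for cell in row[1:]]
        except ValueError:
            continue  # a sample with a missing or non-numeric value
        sums = [s + v for s, v in zip(sums, values)]
        count += 1
    if header is None or count == 0:
        return {}
    return {name: total / count for name, total in zip(header[1:], sums)}
```

To use it, run the command returned by typeperf_command() (for example with subprocess.run(..., capture_output=True, text=True)) and pass the captured standard output to average_counters(). Interrupt and DPC time that stays far above a few percent on an idle system points to the kind of problem described here.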
The odd thing here was that I didn’t make any hardware changes to the system and, as far as I knew, nothing had been installed since the last time I used it (although the kids are known offenders when it comes to clicking Yes to everything they encounter on the Internet). Earlier, though, I did move a couple of memory modules to different slots, but moving them back did not correct the problem I was encountering.
This is a Windows 7 OS, so to find the culprit, I turned to the Windows Performance Toolkit [1]; it's part of the Microsoft Windows SDK for Windows 7 and can be downloaded from http://www.microsoft.com/en-us/download/details.aspx?id=8279. When running the setup, check the option to install the toolkit. I already had it installed since I make it a practice to install the Windows Debugging Tools on all my systems via the SDK package and usually elect this option. Also known as Xperf, the performance toolkit is a command-line tool that uses Event Tracing for Windows (ETW) to collect kernel and application data that can then be analyzed graphically via the Windows Performance Analyzer.
I’ve actually only used the WPT a couple of times, so I needed to pore over some of the documentation to find the specific command-line syntax for tracing kernel events. Luckily, MSDN already documents stack walking for kernel events, with samples. To start tracing, open an elevated command prompt and type the following to begin the trace:
xperf -on latency -stackwalk Profile
You will encounter the following dialog after running this command if the DisablePagingExecutive registry value is not enabled:
I don’t know if it is really necessary to change this or not (read here for a short MSDN article on DisablePagingExecutive). The trace will still run if you elect not to make this change. If you do edit the registry as outlined above, don’t forget to change it back when done.
If the Interrupt-DPC usage is fairly intense and consistent, the trace only needs to be run for a short time. In my case, I left it running for about 40 seconds. Close any unnecessary applications to mitigate background “noise”. To stop the trace, use the following command:
xperf -stop -d Interrupt.etl
This stops the trace and saves the trace file as Interrupt.etl. By default, the trace is saved in the root of C: but you can create a folder elsewhere and save your trace files there instead, either by changing to the desired directory before running the trace or by specifying the full path when you stop the trace. When done, navigate to the location of the etl file and double-click it to open it in the Windows Performance Analyzer GUI. Before starting, as with Procexp, Procmon, and WinDbg, configure WPA to point to the Microsoft symbol server and/or a local symbol cache if you already have symbols stored locally. You can do this from Trace > Configure Symbol Paths > _NT_SYMBOL_PATH. MS Symbol server path:
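For reference, a symbol path that combines a local cache with the Microsoft public symbol server uses the srv*cache*server form. The snippet below is just a small sketch of how that string is put together; the C:\Symbols cache folder and the helper names are assumptions of mine, so substitute whatever folder you already use.

```python
import os

# Microsoft's public symbol server.
MS_SYMBOL_SERVER = "https://msdl.microsoft.com/download/symbols"


def symbol_path(cache_dir=r"C:\Symbols", server=MS_SYMBOL_SERVER):
    """Build an srv*<local cache>*<server> symbol path.

    cache_dir is an assumed local folder; symbols fetched from the
    server are stored there and reused on later runs.
    """
    return f"srv*{cache_dir}*{server}"


def env_with_symbol_path(cache_dir=r"C:\Symbols", base_env=None):
    """Return a copy of the environment with _NT_SYMBOL_PATH set.

    The copy can be handed to subprocess.run(..., env=...) when launching
    a tool that reads _NT_SYMBOL_PATH, without touching the real environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["_NT_SYMBOL_PATH"] = symbol_path(cache_dir)
    return env
```

The string returned by symbol_path() is also what you would paste into the _NT_SYMBOL_PATH field of the dialog.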
When the trace opens, a couple of passes will be run. By default, the trace included 17 different provider types. I could have just limited the trace to Interrupts and DPCs but I might want to look at some other providers later. If you did want to just limit the trace to Interrupts and DPCs, you can look up their names (kernel flags) by running the following command xperf -providers kf:
You can then use Xperf.exe -start "NT Kernel Logger" -on INTERRUPT+DPC -f filename.etl -stackwalk Profile in place of the earlier command (Xperf documentation is hard to come by, so use the xperf command-line help for more info or search MSDN).
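If you capture traces often, the start and stop commands above can be wrapped in a small script. What follows is a rough sketch rather than a polished tool: it only assembles the xperf command lines already shown in this post, and the function names and the 40 second default are my own choices. It has to be run from an elevated prompt with xperf on the PATH.

```python
import subprocess
import time


def start_command(kernel_flags="latency", stackwalk="Profile", etl_file=None):
    """Build the xperf command that starts a kernel trace.

    With no etl_file this is the short form (xperf -on latency ...).
    With an etl_file it is the long form that names the kernel logger
    and the file it logs to, e.g. kernel_flags="INTERRUPT+DPC".
    """
    cmd = ["xperf"]
    if etl_file is None:
        cmd += ["-on", kernel_flags]
    else:
        cmd += ["-start", "NT Kernel Logger", "-on", kernel_flags, "-f", etl_file]
    cmd += ["-stackwalk", stackwalk]
    return cmd


def stop_command(output_etl="Interrupt.etl"):
    """Build the xperf command that stops the trace and saves it."""
    return ["xperf", "-stop", "-d", output_etl]


def capture(seconds=40, output_etl="Interrupt.etl", **start_options):
    """Start a trace, wait, then stop it and save to output_etl."""
    subprocess.run(start_command(**start_options), check=True)
    try:
        time.sleep(seconds)
    finally:
        # Always stop the kernel logger, even if the wait is interrupted.
        subprocess.run(stop_command(output_etl), check=True)
```

Calling capture() reproduces the 40 second trace I took; capture(kernel_flags="INTERRUPT+DPC", etl_file="filename.etl") uses the narrower set of kernel flags instead.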
Once open, start by removing all the providers except Interrupts and DPCs from the Graphs menu option or by clicking the left side arrow and unchecking the unwanted providers:
When done, right-click in either graph and select Summary Table. This will generate a report tree listing all the hardware resources utilizing the CPU. In my case, the summary shows a lot of USB activity:
This is somewhat of a disappointment because all that I have attached is a mouse, keyboard, and headphones. I was not moving the mouse during the trace so this could not explain the high count for USBPORT.SYS; and even if I were, this would not cause such a high number of interrupts. To troubleshoot, I removed each of the devices but the two CPU cores remained pegged at 100%. I also tried disabling the USB controllers and hubs via the Device Manager, to no avail (re-enable each item before disabling the next one; otherwise you can end up without both mouse and keyboard at the same time and won't be able to re-enable anything).
With nothing else to go on, I went back into the case and started poking around. To make some room for my hands, I unplugged one of the case fans from the three-pronged case fan connector on the motherboard, reseated all the other cables except for the case fan, and started up the system. To my surprise, the CPU was back to normal. Loose cable? I thought so, so I shut the system down, plugged the fan back into the motherboard, powered the system back on, and thought all was fixed. Instead, the problem was back.
Ack! But then I realized what was causing the problem: the case fan and where I was plugging it in. In my rush to button up the case when I had moved the memory modules earlier, I had plugged the case fan into a different motherboard fan connector, CS_FAN. Up until then, I was actually using the power fan connector, PWR_FAN, because it was easier to access and my power supply doesn’t use this anyway.
For the heck of it, here is a picture of the problem connector:
The one I had been using up until earlier in the day when I opened the case to move the memory modules to different slots:
The interesting thing here is how something so benign as plugging a case fan into a certain motherboard connector could cause excessive CPU interrupts. Although bad USB hardware could be at fault, this case demonstrates that you can’t necessarily blame a USB device on its own (or even the USB subsystem) without taking into consideration all the other hardware components involved, including a lowly case fan. Luckily for me, I was able to notice this only because I remembered the other available case fan connector, but you can imagine how frustrating this might be for someone who is focusing only on a USB issue.
Now, if this were a new system I had just put together, I may have taken a different approach by, for example, updating the BIOS first (which I should do anyway). But even with all that I know from building my own systems for years, this is one thing I would not have considered in the first place, and it could have left me with a mystery on my hands for several days. So, the lesson here is to treat all hardware as suspect even when your troubleshooting tools might point to some completely different driver or hardware subsystem.
By the way, I should mention this is an ASUS M3A motherboard and it has been nothing but pain for me since putting it together a few weeks ago. Yeah, I know it's old but I recycle as many computer parts as I can.
1 Windows XP users can use the Kernrate Viewer.