Your Hard Disk is Failing
Posted by William Diaz on October 5, 2011
Sometimes a BSOD is not a sign of a software issue but instead points to a hardware problem, and it can help explain symptoms of poor system performance. That was the case recently when a user complained that she was having trouble logging on. The workstation was amazingly slow (can slow be amazing?) and then blue-screened on her at random. My co-worker was handling this, but he happens to sit right next to me, and I jumped in when I heard the words “blue screen”. I unkindly interjected with “Let’s get a minidump.” While he chatted her up, I got her IP, connected to the machine over a UNC path, went into C:\Windows\Minidump, and grabbed the last two minidumps for that day.
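If you grab dumps this way often, the step can be scripted. What follows is only a rough sketch of the idea, not what I actually ran: the function names are made up for illustration, and it assumes the remote Minidump folder is reachable as an ordinary directory path (for example through an administrative share) and that the newest files by modification time are the ones you want.

```python
import shutil
from pathlib import Path


def newest_minidumps(dump_dir, count=2):
    """Return the `count` most recently modified .dmp files, newest first."""
    dumps = [p for p in Path(dump_dir).glob("*.dmp") if p.is_file()]
    dumps.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return dumps[:count]


def grab_minidumps(dump_dir, dest_dir, count=2):
    """Copy the newest dumps into dest_dir and return the copied paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in newest_minidumps(dump_dir, count):
        # copy2 keeps the timestamps, which helps when correlating crashes later
        copied.append(Path(shutil.copy2(src, dest / src.name)))
    return copied
```

A call might look like grab_minidumps(r"\\10.0.0.25\C$\Windows\Minidump", r"C:\Dumps"), where both the IP address and the destination folder are placeholders.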
Minidumps excite me. To understand why, you need to have come across a great many support calls that end up stumping first-tier technical support. Oftentimes, these issues are too vague to narrow down if you don’t know how to handle a BSOD, and the incident remains open longer than it needs to because it can’t be explained or reproduced immediately. A minidump can sometimes quickly resolve what might otherwise become an unexplained system problem.
Minidumps are small, too. At between 64 and 256KB, they record only the smallest set of useful information that could help identify why the system stopped unexpectedly, so there is no problem copying them over a WAN. Once they were copied to my workstation, I opened them in WinDbg and clicked the !analyze -v hyperlink command. Both dumps produced identical results:
A process or thread crucial to system operation has unexpectedly exited or been
Several processes and threads are necessary for the operation of the
system; when they are terminated (for any reason), the system can no
Arg1: 00000003, Process
Arg2: 8a760670, Terminating object
Arg3: 8a7607e4, Process image file name
Arg4: 805d29b4, Explanatory message (ascii)
unable to get nt!KiCurrentEtwBufferOffset
EXCEPTION_RECORD: ba1e79d8 -- (.exr 0xffffffffba1e79d8)
EXCEPTION_CODE: (NTSTATUS) 0xc0000006 - The instruction at 0x%p referenced memory at 0x%p. The required data was not placed into memory because of an I/O error status of 0x%x.
ERROR_CODE: (NTSTATUS) 0xc0000006 - The instruction at 0x%p referenced memory at 0x%p. The required data was not placed into memory because of an I/O error status of 0x%x.
IO_ERROR: (NTSTATUS) 0xc000000e - A device which does not exist was specified.
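The lines that matter in that output are the NTSTATUS codes: 0xc0000006 is STATUS_IN_PAGE_ERROR (Windows could not page the data it needed back into memory) and 0xc000000e is STATUS_NO_SUCH_DEVICE (the I/O underneath failed because the device was not there). If you triage a lot of dumps, a small script can pull those codes out of saved !analyze -v text. This is a sketch only: the helper names are hypothetical, the lookup table covers just the two codes seen here, and the "in-page error plus an IO_ERROR line" check is my own rule of thumb, not something WinDbg reports.

```python
import re

# Only the two codes from this dump; names are the standard ntstatus.h ones.
NTSTATUS_NAMES = {
    0xC0000006: "STATUS_IN_PAGE_ERROR",
    0xC000000E: "STATUS_NO_SUCH_DEVICE",
}

# Matches lines such as "IO_ERROR: (NTSTATUS) 0xc000000e - ..."
STATUS_LINE = re.compile(r"^(\w+):\s*\(NTSTATUS\)\s*(0x[0-9a-fA-F]{8})")


def extract_statuses(analyze_output):
    """Map each field name (EXCEPTION_CODE, IO_ERROR, ...) to its NTSTATUS value."""
    found = {}
    for line in analyze_output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            found[match.group(1)] = int(match.group(2), 16)
    return found


def describe(code):
    """Return the symbolic name if known, otherwise the raw hex value."""
    return NTSTATUS_NAMES.get(code, f"0x{code:08x}")


def looks_like_paging_io_failure(statuses):
    """Heuristic (an assumption): an in-page error reported alongside an IO_ERROR."""
    in_page = 0xC0000006 in (statuses.get("EXCEPTION_CODE"), statuses.get("ERROR_CODE"))
    return in_page and "IO_ERROR" in statuses
```

Feed extract_statuses the text saved from the debugger and pass the result to looks_like_paging_io_failure; a hit is a hint to go look at the disk, not a diagnosis.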
BSODs of this type, especially when they happen more than once in a short period (along with sluggish system performance), are usually a clear sign of a failing or failed hard drive. Don’t necessarily rely on the Windows Event Viewer to warn you of this. In this case, there were none of the common disk errors you see with bad blocks. Hard drives can fail for any number of reasons, not only because of surface issues on the platter; read/write heads and actuator arms are other points of failure. SMART monitoring may also alert you if it is running. When you encounter this, waste no time in retrieving the important data and replacing the drive. In fact, you are probably better off removing the drive and attaching it to an enclosure to retrieve the data than trying to get it past a Windows reboot and logon.