Solving Windows 7 crashes with debugging tools
Tackle that blue screen of death with free Microsoft software
By Dirk A. D. Smith | Network World US | Published: 14:12, 18 April 2011
WinDbg is a very capable tool with hundreds of commands to control it. Fortunately... we only need one. To get fancy, we'll use two more, bringing the total to three.
They are !analyze -v, lmv, and lmvm. If you want to sound like this is not the first time you've used a debugger, here is how you pronounce the first command: "bang analyze dash vee".
Type !analyze -v on the command line at the bottom of the Command window (note the space between the command and the "-v"). The "v" or verbose switch tells WinDbg that you want all the details. The explanation it gives is a combination of English and programmer-speak, but it is nonetheless a great start. In fact, in many cases you may not need to go any further. If you recognise the cause of the crash, you're probably done.
Here's an example for the analysis of our crash using the NotmyFault driver.
An important feature of the debugger's output from !analyze -v is the stack text. Whenever you look at a dump file, always check the far right end of the stack for any third-party drivers. In this case we see myfault. Note that the sequence of events runs chronologically from the bottom to the top: as the system performs each new task, it shows up at the top, pushing the previous actions down.
In this rather short stack you can see that myfault was active, then a page fault occurred, and the system declared a BugCheck, which is when the system stopped (Blue Screened). Note that some data was removed to fit this exhibit on a page, as indicated by the "truncated" comments.
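The habit of reading the right-hand column for third-party drivers can be sketched in a few lines of Python for stack text that has been copied out of the debugger. This is only an illustration: the list of Windows modules is an assumption and far from complete, and the sample stack (including its addresses and offsets) is made up to mirror the myfault case rather than taken from a real dump.

```python
# Modules treated as part of Windows itself. This short list is an
# assumption for illustration; a real list would be much longer.
WINDOWS_MODULES = {"nt", "hal", "win32k", "ntfs", "fltmgr", "ndis", "tcpip"}


def third_party_modules(stack_text):
    """Return non-Windows module names found in the call-site column,
    in the order they appear from the top of the stack down."""
    found = []
    for line in stack_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        call_site = parts[-1]  # the call site is the right-most column
        # Call sites look like "module!Function+0x10" or "module+0x10";
        # anything else (headers, "truncated" notes) is skipped.
        if "!" not in call_site and "+0x" not in call_site:
            continue
        module = call_site.split("!")[0].split("+")[0].lower()
        if module and module not in WINDOWS_MODULES and module not in found:
            found.append(module)
    return found


# Made-up stack text in the general shape WinDbg prints (newest call first).
SAMPLE_STACK = """\
Child-SP          RetAddr           Call Site
fffff880`0a1b2c30 fffff800`02a8b1e9 nt!KeBugCheckEx
fffff880`0a1b2d70 fffff880`04e51356 nt!KiPageFault+0x260
fffff880`0a1b2f00 fffff880`04e51758 myfault+0x1356
"""

print(third_party_modules(SAMPLE_STACK))
```

Any name this returns is a suspect to look up, not a verdict.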
Analysis with lmv
The next step is to confirm the suspect's existence and find any details about him. Typing lm in the command line displays the loaded modules; adding v instructs the debugger to output in verbose (detail) mode, showing all known details for the modules.
Don't worry if, after running the command lmv, you see the message *BUSY* in the bottom left of WinDbg's interface. The debugger is gathering detailed information for the modules that were loaded when the system failed, and this may take a couple of minutes. When it is done, you will see kd> back where *BUSY* was.
This is a lot of information. Locating the driver of interest can take a while, so simplify the process by selecting:
Edit > Find
and enter the suspect driver, in this case myfault. The amount of information you see depends upon the driver vendor. Some vendors put little information in their files, others such as Microsoft tend to be thorough.
Analysis with lmvm
A great way to get right to a specific module is the lmvm command. In this case, enter lmvm myfault and the debugger will only return data specific to that module.
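If you find yourself typing the same commands for every dump, they can also be run in one go. The sketch below assumes that kd.exe, the console debugger that ships alongside WinDbg in the Debugging Tools for Windows, is on your PATH and that symbols are already configured; it uses the -z option to name the dump file and -c to pass the commands. The function names and the dump path are hypothetical.

```python
import subprocess


def build_kd_command(dump_path, module):
    """Build a kd.exe command line that opens a dump, runs !analyze -v and
    lmvm for one module, then quits. Assumes kd.exe is on PATH."""
    commands = f"!analyze -v; lmvm {module}; q"
    return ["kd.exe", "-z", str(dump_path), "-c", commands]


def run_analysis(dump_path, module):
    """Run the debugger and return everything it printed as text."""
    result = subprocess.run(
        build_kd_command(dump_path, module),
        capture_output=True,
        text=True,
    )
    return result.stdout


# The path below is a placeholder, not a location taken from the article.
print(build_kd_command(r"C:\dumps\MEMORY.DMP", "myfault"))
```

The output is the same text you would see in WinDbg's Command window, so it can be saved to a file and sent to the vendor.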
After you find the vendor's name, go to its website and check for updates, knowledge base articles and other supporting information. If such items do not exist or do not resolve the problem, contact them. They may ask you to send along the debugging information (it is easy to copy the output from the debugger into an email message or Word document) or they may ask you to send them the memory dump (zip it up first, both to compress it and protect data integrity).
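Any zip tool will do for packing up the dump. For those who prefer to script it, here is a minimal sketch using Python's standard zipfile module; it compresses the file and then re-reads the archive so the stored checksum is verified before you send it. The function name zip_dump is hypothetical.

```python
import zipfile
from pathlib import Path


def zip_dump(dump_path, zip_path=None):
    """Compress a memory dump into a zip file and verify the archive."""
    dump = Path(dump_path)
    target = Path(zip_path) if zip_path else dump.with_suffix(".zip")

    # Write the dump into the archive using deflate compression.
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(dump, arcname=dump.name)

    # Re-open the archive and check each member's CRC.
    with zipfile.ZipFile(target) as archive:
        bad_member = archive.testzip()
        if bad_member is not None:
            raise zipfile.BadZipFile(f"CRC check failed for {bad_member}")
    return target
```

Calling zip_dump on a file named MEMORY.DMP writes MEMORY.zip next to it.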
The other third
Fortunately, in about two out of three cases you'll know the cause as soon as you open a dump file. But sometimes the information it provides is misleading or insufficient. What do you do then?
Sometimes it is the hardware
If you have recurring crashes but no clear or consistent reason, it may be a memory problem. Download the free test tool, Memtest86. This simple diagnostic tool is quick and works well. Many people discount the possibility of a memory problem because such problems account for a small percentage of system crashes. However, they are often the cause that keeps you guessing the longest.
Is Windows the culprit?
Sorry... this is NOT likely! As surprising as it may seem, the operating system is rarely at fault. If ntoskrnl.exe (Windows core) or win32k.sys (the driver that is most responsible for the "GUI" layer on Windows) is named as the culprit, and they often are, don't be too quick to accept it. It is far more likely that some errant third-party device driver called upon a Windows component to perform an operation and passed a bad instruction, such as telling it to write to non-existent memory. So, while the operating system certainly can err, exhaust all other possibilities before you blame Microsoft.
Wrong driver named
Often you will see an antivirus driver named as the cause. For instance, after using !analyze -v, the debugger reports a driver for your antivirus program at the line "IMAGE_NAME". This may well be the case, but bear in mind that such a driver can be named more often than it is guilty.
Here's why: For antivirus code to work it must watch all file openings and closings. To accomplish this, the code sits at a low layer in the operating system and is constantly working. In fact, it is so busy it will often be on the stack of function calls that was active when the crash occurred, even if it did not cause it. Because any third party driver on that stack immediately becomes suspect, it will often get named. From a mathematical standpoint it is easy to see how it will so often be on the stack whether it actually caused a problem or not.
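One way to keep this in perspective is to tally which driver gets named across several crashes and compare that against what else the dumps have in common. The sketch below assumes you have saved the text of each !analyze -v run and pulls out the IMAGE_NAME line from each; the function names and the driver name exampleav.sys are made up for illustration.

```python
import re
from collections import Counter

# Matches a line such as "IMAGE_NAME:  myfault.sys" in saved debugger output.
IMAGE_NAME_RE = re.compile(r"^IMAGE_NAME:\s*(\S+)", re.MULTILINE)


def named_image(analysis_text):
    """Return the driver named on the IMAGE_NAME line, or None if absent."""
    match = IMAGE_NAME_RE.search(analysis_text)
    return match.group(1).lower() if match else None


def tally_named_images(analysis_texts):
    """Count how often each driver is named across several saved analyses."""
    names = (named_image(text) for text in analysis_texts)
    return Counter(name for name in names if name)
```

A driver that tops the tally is worth a closer look, but as noted above, a busy driver can be named often without being the cause.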
Missing vendor information?
Some driver vendors don't take the time to include sufficient information with their modules. So if lmv doesn't help, try looking at the subdirectories on the image path (if there is one). Often one of them will be the vendor name or a contraction of it. Another option is to search Google. Type in the driver name and/or folder name. You'll probably find the vendor as well as others who have posted information regarding the driver.
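Picking out the folder names worth searching for can be sketched as follows. The set of generic folder names is an assumption, and the example path uses Contoso, a placeholder vendor name, with a made-up driver file.

```python
from pathlib import PureWindowsPath

# Folder names that say nothing about the vendor. This set is an
# assumption for illustration and can be extended as needed.
GENERIC_FOLDERS = {
    "windows", "systemroot", "system32", "syswow64", "drivers",
    "program files", "program files (x86)",
}


def vendor_hints(image_path):
    """Return the folders on a driver's image path that may name the vendor."""
    folder = PureWindowsPath(image_path).parent
    return [
        part for part in folder.parts
        if part != folder.anchor and part.lower() not in GENERIC_FOLDERS
    ]


# Hypothetical path; "Contoso" and "cbkflt.sys" are placeholders.
print(vendor_hints(r"C:\Program Files\Contoso\Backup\cbkflt.sys"))
# prints ['Contoso', 'Backup']
```

Each name returned is a candidate search term alongside the driver's file name.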
Now that you have taken the time to prepare for the next BSOD, remember that in most cases you will be able to open the dump file and know the cause in less than one minute. To nail the cause of two out of three critical failures that fast and that easily is gratifying, especially to your users.