Windows Explored

Everyday Windows Desktop Support, Advanced Troubleshooting & Other OS Tidbits

Troubleshooting Office 2007/2010 Files

Posted by William Diaz on December 13, 2010

The native file format in Office 2007/2010 is now XML-based. What does this mean for troubleshooters? Earlier versions of Office (Word, for example) saved all the formatting, text, images, etc. in a single binary file. With the latest Office offering, those document “elements” are stored as separate XML files, which are then compressed into a single file. To see this, take any docx, pptx, or xlsx file and change the extension to zip. When you open it with Windows compressed folders or WinZip, you will see several folders of XML files that hold the formatted text, along with pictures or other media. Here is an example of one of my blog articles, written in Word 2010, dissected:
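If you would rather not rename the file, the same listing can be produced with a few lines of Python, since the standard zipfile module reads the archive regardless of its extension. This is only a minimal sketch: the function name list_package_parts and the file name in the usage comment are illustrative, not anything Office provides.

```python
import zipfile


def list_package_parts(path):
    """Return (name, uncompressed_size) for every file inside an Office 2007/2010 document."""
    # A docx/xlsx/pptx file is an ordinary zip archive, so zipfile can read it
    # without the extension being changed first.
    with zipfile.ZipFile(path) as package:
        return [(info.filename, info.file_size) for info in package.infolist()]


# Usage (hypothetical file name):
# for name, size in list_package_parts("article.docx"):
#     print(f"{size:>10}  {name}")
```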

Here, document.xml (in the word folder) is the main body of the docx file. By default, XML files open in IE. Here is a sample of what it looks like in its raw form:

The images inside are stored under the media folder:
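When the goal is simply to rescue the pictures from a document that will not open, they can be copied out of the media folder in one pass. The sketch below assumes the usual layout, where images sit in a folder named media (word/media in a docx); extract_media and its parameters are hypothetical names chosen for illustration.

```python
import os
import zipfile


def extract_media(path, out_dir):
    """Copy every file stored in a 'media' folder of the document into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    with zipfile.ZipFile(path) as package:
        for info in package.infolist():
            parts = info.filename.split("/")
            # Match entries such as word/media/image1.png; skip folder entries.
            if len(parts) >= 3 and parts[-2] == "media" and parts[-1]:
                # Use only the base name so nothing is written outside out_dir.
                target = os.path.join(out_dir, parts[-1])
                with open(target, "wb") as out:
                    out.write(package.read(info))
                saved.append(target)
    return saved
```

Calling extract_media("broken.docx", "recovered") would leave the images in a recovered folder, even if Word itself refuses to open the file, provided the zip container is still readable.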

This is helpful for recovering text or images from a document that fails to open. You may also get some clues to the problem when you try to open the file. For example, you may encounter the following error message: “The file filename.docx cannot be opened because there are problems with the contents.”

Clicking Details reveals where to look for the problem. You can then use an XML-capable editor such as Notepad++ to attempt a correction. In this case, I went to Line 1, Column 55. You don’t need to be well versed in XML to figure out what may be at fault here. For this demonstration, I intentionally “sabotaged” the XML by removing a closing “greater than” (>) symbol:
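The same kind of line and column hint can be obtained without Word by running each XML file in the document through a parser. The following is a rough sketch using Python's standard library; find_xml_errors is a made-up name, and the position Python reports may not match the one in Word's dialog exactly (Python's parser counts columns from 0).

```python
import zipfile
import xml.etree.ElementTree as ET


def find_xml_errors(path):
    """Return (part_name, line, column, message) for each XML part that fails to parse."""
    problems = []
    with zipfile.ZipFile(path) as package:
        for name in package.namelist():
            # Only the XML parts are checked; images and other media are skipped.
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                ET.fromstring(package.read(name))
            except ET.ParseError as err:
                line, column = err.position
                problems.append((name, line, column, str(err)))
    return problems
```

An empty list only means every part is well-formed XML. A document can still fail to open for other reasons, which this check will not catch.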

After inserting the missing “>”, save the XML file, drag it back into the zip, and change the extension back to docx. The document should open now. Of course, the vast majority of document problems will not be that easy to recognize. Here’s an example of a document where the error is unspecified, and making sense of where column 0 is or what it means is beyond me:
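Dragging the repaired file back into the zip can also be scripted. Python's zipfile module cannot overwrite a file inside an existing archive, so the sketch below copies everything into a new archive and swaps in the corrected XML along the way. The name replace_part and its parameters are assumptions made for this example, and the output path must differ from the input path.

```python
import zipfile


def replace_part(src_path, dst_path, part_name, new_bytes):
    """Write a copy of the document at dst_path with one internal file replaced."""
    # Every entry is copied as-is except part_name, which gets the repaired content.
    # dst_path must not be the same file as src_path.
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = new_bytes if info.filename == part_name else src.read(info)
            dst.writestr(info.filename, data)


# Usage (hypothetical file names):
# with open("document.xml", "rb") as f:
#     replace_part("broken.docx", "fixed.docx", "word/document.xml", f.read())
```

Writing to a new file also leaves the damaged original untouched, which is worth having if the repair turns out to be wrong.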

In this case, a workaround was to open the file with the latest OpenOffice Writer (free), which supports these XML documents, save it as Word 2007/2010, then open it in Word 2007/2010 and reconstruct it, albeit with some loss of formatting. Another thing you can try (although it did not work in the last example) is to change the extension to doc and/or open the file with Word 2003 and hope the Office converter pack can recover it as a single binary file. Lastly, after encountering this issue again, I was able to recover the document with the Windows 7 version of WordPad, which now supports this XML format. From there, it could be saved in a binary file format.

