Zipity Do Da .Docx
Posted by William Diaz on December 28, 2010
It’s a mystery how some files get their file extensions changed or removed. I have no doubt that some users think that manually changing the extension makes the file compatible with whatever program they want to open it in. We often get files that have a .doc extension but fail to open in Word. Sometimes they are corrupted, sometimes they were saved in a newer version of Word, and sometimes they are not Word documents at all. When you run into one of these, the easiest way to determine what kind of file you have is to open it in Notepad (or any text editor, for that matter) and look for readable text in the data. Here is an example:
If you assumed this is a TIFF image file, you would be wrong. The PK at the start of the data actually tells us this is a zip file (old-schoolers will recognize the initials of Phil Katz, the creator of PKZIP and the zip format). You will find this PK signature in all zip files.
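The same check can be scripted instead of eyeballed in Notepad. The following is a minimal sketch, not a general file-type detector: the names `SIGNATURES`, `sniff_bytes`, and `sniff_file_type` are made up for illustration, and the table only covers the formats mentioned in this post.

```python
# Leading byte signatures ("magic numbers") for the formats discussed here.
SIGNATURES = [
    (b"PK\x03\x04", "zip archive"),            # normal zip, starts with a local file header
    (b"PK\x05\x06", "zip archive"),            # empty zip, only an end-of-directory record
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy .doc)"),
    (b"II*\x00", "TIFF image"),                # little-endian TIFF
    (b"MM\x00*", "TIFF image"),                # big-endian TIFF
]


def sniff_bytes(head):
    """Guess a file type from its first few bytes."""
    for magic, description in SIGNATURES:
        if head.startswith(magic):
            return description
    return "unknown"


def sniff_file_type(path):
    """Read the first 8 bytes of a file and guess its type, ignoring the extension."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(8))
```

Calling `sniff_file_type` on a misnamed file like the one above would report a zip archive no matter what the extension claims.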
Replacing the doc extension with zip allows us to open the file and reveal, among other files, a tiff image:
Looking at the plist file and the XML file revealed that this was compressed on a Mac. Perhaps further encoding afterwards changed the file extension.
By the way, Word 2007/2010 compresses files that are saved in the docx format: a docx is really a zip archive of XML files. To see this, change the docx extension to zip, open the file with WinZip or the built-in Windows compressed folders feature, and you can see all the different elements of the document. You can also change the extension to xml; after it opens in IE (or tries to), you will see the PK signature in the data.
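If Python is handy, the contents can be listed without renaming anything, because the standard `zipfile` module looks at the data rather than the extension. This is a small sketch; the function name `list_zip_members` is hypothetical.

```python
import zipfile


def list_zip_members(path):
    """Return the member names of a zip archive, whatever its extension says."""
    # is_zipfile inspects the file's contents, not its name.
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a zip archive")
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

Run against a docx saved from Word, the returned list should include entries such as `[Content_Types].xml` and `word/document.xml`. Run against the mislabeled doc from earlier, it would show the TIFF image and the Mac metadata files.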