Saturday, January 19, 2008

Microsoft Office Binary File Format Specifications Coming to a Download Near You...

Brian Jones: Open XML Formats - Mapping documents in the binary format (.doc; .xls; .ppt) to the Open XML format

"I wanted to call everyone's attention to a few interesting developments in Ecma's proposed disposition document related to the Office binary formats. There were a few comments from national bodies that asked about the documentation of the Office binary formats and the availability of those documents. We had already been talking about these issues in TC45 where there were a number of existing experts in the binary formats (including Apple, Novell, and Microsoft). Based on the feedback from the national bodies, Microsoft decided last week to take some additional steps in this area.

The first issue National Bodies were interested in was easier availability of the documentation of the binary formats (.doc; .xls; .ppt). It sounded like the main concern here was around the extra steps required to get the binary documentation. The current form of the documentation has been available since 2006, where anyone could get the documentation by sending an email to Microsoft as described at http://support.microsoft.com/kb/840817/en-us. The documents were available royalty-free under RAND-Z....

...

  • Initiate a Binary Format-to-ISO/IEC JTC 1 DIS 29500 Translator Project on the open source software development web site SourceForge (http://sourceforge.net/) in collaboration with independent software vendors. The Translator Project will create software tools, plus guidance, showing how a document written using the Binary Formats can be translated to DIS 29500. The Translator will be available under the open source Berkeley Software Distribution (BSD) license, and anyone can use the mapping, submit bugs and feedback, or contribute to the Project. The Translator Project will start on February 15, 2008.
  • Make it even easier to get access to the Binary Formats documentation by posting it and making it available for a direct download on the Microsoft web site no later than February 15, 2008. The Binary Formats have been under a covenant not to sue and Microsoft will also make them available under its Open Specification Promise (see www.microsoft.com/interop/osp) by the time they are posted.

..."

The fact that the MS Office Binary File Format docs were just an email away was news to me... sigh... Still, even better: come the middle of next month I won't even have to send Microsoft an email to get those docs.

I'll be interested to see the scope of this documentation. In a perfect world, it will give me enough information to write code that spelunks .doc, .xls and .ppt files without Office, and to do so in an "official", non-reverse-engineering-the-file-format kind of way.

Kind of like taking the "openness" of OOXML (i.e. unzip it and it's all there... no hidden stuff) and extending that to the binary formats (with a ton of work of course, since a good bit of code will need to be written to parse and understand the binary files, which is where the Translator project comes in... :)
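The "unzip it and it's all there" versus "binary blob" contrast is easy to see in a few lines of code. Below is a minimal Python sketch of my own (not anything from Microsoft or the Translator project), using only the standard library; the function names are hypothetical. It assumes the well-known container signatures: OOXML files are ZIP packages (a ZIP with at least one entry starts with the local file header bytes "PK\x03\x04"), while the legacy .doc/.xls/.ppt files live inside the OLE2 Compound File Binary container, whose 8-byte signature is D0 CF 11 E0 A1 B1 1A E1. Note that this only tells you which kind of container you have; actually parsing the streams inside a binary file is the hard part the documentation and Translator are meant to help with.

```python
import io
import zipfile
from typing import List

# Signature of the OLE2 Compound File Binary container used by legacy .doc/.xls/.ppt.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header signature at the start of a non-empty ZIP archive (e.g. an OOXML package).
ZIP_SIGNATURE = b"PK\x03\x04"


def sniff_office_container(data: bytes) -> str:
    """Hypothetical helper: classify raw bytes as 'ooxml-zip', 'binary-cfb', or 'unknown'."""
    if data.startswith(ZIP_SIGNATURE):
        return "ooxml-zip"
    if data.startswith(CFB_SIGNATURE):
        return "binary-cfb"
    return "unknown"


def list_package_parts(data: bytes) -> List[str]:
    """Hypothetical helper: list the part names inside an OOXML package (it's just a ZIP)."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()


if __name__ == "__main__":
    # Build a tiny stand-in "package" in memory; a real .docx has many more parts.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<w:document/>")
    fake_docx = buf.getvalue()

    print(sniff_office_container(fake_docx))  # ooxml-zip
    print(list_package_parts(fake_docx))      # ['[Content_Types].xml', 'word/document.xml']
```

With a real .docx you'd read the file's bytes from disk instead of building the fake package; a real .doc would come back as "binary-cfb", and that's where the ZIP-style "everything in plain view" approach stops.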

Added to my mental watch list...

(via Doug Mahugh - Binary documentation and translator project)
