Monday, April 23, 2012

Using native OpenXML to create an XlsX (Which provides an example of why I highlight tools that make OpenXML easier...)

CodeProject - Creating basic Excel workbook with Open XML

The purpose of this article is to describe how to create an Excel workbook using solely DocumentFormat.OpenXml.dll (namespace is DocumentFormat.OpenXml).

In order to test the samples you have to download and install the Open XML SDK 2.0 from Download Center.

The demo is created for both C# and Visual Basic.

These standards define the structure and the elements for the Office files. The Office files (like xlsx for Excel) themselves are zipped files that contain a specific directory and file structure. The files that hold the content of a spreadsheet are xml files like any other xml files.

In case of Excel files a basic xlsx file contains for example following files:

  • /[Content_Types].xml: Defines parts and extensions for the spreadsheet
  • /xl/workbook.xml: For e xample sheets that are included in the workbook
  • /xl/styles.xml: Styles used in the worksheets
  • /xl/sharedStrings.xml: Strings that are shared among cells
  • /xl/worksheets/sheet1.xml...: The actual worksheets

The actual package contains more files but in the scope of this article these are the most interesting ones. The demo projects included show few operations that are done to produce and modify these files.

About the project

The project itself is very simple. It consists of two classes: MainWindow class and a static Excel Class. The Excel class is responsible of all the operations done against the Excel spreadsheet. It's kinda utility class, but note that it's nowhere near ready. It's supposed to be used as a learning tool or a seed to an actual implementation.

When writing this demo I found out that Excel is very picky on the XML files. One surprise was that the order of the elements in XML files is very important. For example elements in style sheet such as fonts, fills, borders, cellStyleXfs, cellXfs etc must be in specific order. Otherwise the document is interpreted as corrupted.

Another observation was that the indexes of the elements are quite often used (for example the index of a shared string). However there is no support in the library to fetch the indexes so the collections have to be looped in order to calculate the index of a desired element.

So one of the best tools when building this was a utility to extract data from the xlsx (=zip) file to see what is the actual content.

image

Yes, that creates an XlsX. Stuff that anyone who has used the Excel Object Model will cause a minor brain explosion. Parts, Packaging, ShareStringTables, oh my...

This is a great example of why I keep my eyes open for examples and wrappers that make the promise of Open XML a little more accessible to mere mortals.

 

Related Past Post XRef:
Generating Xlsx's on the Server? You're using OpenXML, right? With help from the PowerTools for OpenXML?

No comments: