##########
openxmllib
##########

openxmllib is a set of tools that deal with the new ECMA-376 office
file formats known as OpenXML.

http://www.ecma-international.org/publications/standards/Ecma-376.htm

The OpenXML format is currently used by Microsoft Office 2007. Apple
iWork '08 and OpenOffice 2.2 also have filters for this format.

Features
########

Tested features
===============

* Extract words from a document for indexing purposes.
* Get metadata from a document.

Planned features
================

* Transform a document to HTML

Public API
##########

  >>> import openxmllib
  >>> doc = openxmllib.openXmlDocument('office.docx')
  >>> # Raises a ValueError on unsupported office files.
  >>> doc.mimeType
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
  >>> doc.coreProperties # Keys may depend on application
  {'title': u'blah...', u'creator': u'John Doe', ...}
  >>> doc.extendedProperties # Keys may depend on application
  {'Words': u'312', 'Application': u'Your favorite word processor', ...}
  >>> doc.customProperties # May return an empty mapping
  {'My property': u'My value', ...}
  >>> doc.allProperties # Merges core+extended+custom properties (see above)
  {...}
  >>> doc.indexableText(include_properties=False)
  u'all the words of that document body'
  >>> doc.indexableText(include_properties=True)
  u'all the words of that document body and all properties values'

Copying and License
###################

Copyright (c) 2008 Gilles Lenfant

This software is subject to the provisions of the GNU General Public
License, Version 2.0 (GPL).  A copy of the GPL should accompany this
distribution.  THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL
EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF TITLE, MERCHANTABILITY,
AGAINST INFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE

More details in the ``COPYING`` file included in this package.

Status
######

This software is of alpha quality and has been tested only on Mac OS X
with Python 2.4 and lxml 1.3.6.

It should work on other platforms and with Python 2.5, and perhaps
with other versions of lxml.

Requirements
############

* lxml 1.3.6: get lxml with easy_install, e.g.::

    $ easy_install lxml==1.3.6

Warning: openxmllib is untested with the new lxml 2 (in alpha state at
the time of writing). It may or may not work with lxml 2, but please
don't report bugs found in that situation until lxml 2 is officially
required here.

Installation
############

Install with::

  $ python setup.py install

From now on you can ``import openxmllib`` in your Python apps and use
the ``openxmlinfo.py`` command-line utility.

Gotchas
#######

Be aware that most text data coming from the various openxmllib
services may be either plain us-ascii strings or Unicode objects. This
is a side effect of lxml (bug or feature?), which returns a plain
string when the text is pure ASCII and a Unicode object otherwise.
It's up to your application to convert these texts to the appropriate
charset.

TODO: File this to lxml tracker or ML
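
The mixed string types described above can be normalized in the
calling application. The following is a minimal sketch, not part of
openxmllib: ``to_text`` and ``normalize_properties`` are hypothetical
helper names, and the ``ascii`` default is an assumption based on the
behaviour described in this section.

```python
def to_text(value, encoding="ascii"):
    """Return `value` as a text (Unicode) object.

    Hypothetical helper, not part of openxmllib. Byte strings are
    decoded with `encoding`; anything else is returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def normalize_properties(properties, encoding="ascii"):
    """Return a copy of a properties mapping with text keys and values."""
    return dict(
        (to_text(key, encoding), to_text(val, encoding))
        for key, val in properties.items()
    )
```

With such a helper, ``to_text(doc.indexableText()).encode('utf-8')``
would give the body text as UTF-8 bytes, whatever type the library
returned.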

We do not currently handle exceptions caused by malformed XML or
unexpected structures. You should handle these potential problems in a
``try ... except`` block in your application.
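
Such a guard could look like the sketch below. ``safe_indexable_text``
is a hypothetical wrapper, not part of openxmllib, and the factory is
passed in as an argument so the sketch does not depend on the package
being installed. Only ``ValueError`` is documented in this README;
catching ``IOError`` and ``zipfile.BadZipfile`` is an assumption based
on OpenXML files being ZIP packages, and your application may need to
catch lxml parsing errors as well.

```python
import zipfile


def safe_indexable_text(path, open_document):
    """Return the indexable text of `path`, or None if it cannot be read.

    `open_document` is the factory to call, for example
    openxmllib.openXmlDocument.
    """
    try:
        doc = open_document(path)
        return doc.indexableText(include_properties=True)
    except ValueError:
        # Documented above: raised for unsupported office files.
        return None
    except (IOError, zipfile.BadZipfile):
        # Assumption: unreadable files or broken ZIP containers.
        return None
```

A caller would then write
``safe_indexable_text('office.docx', openxmllib.openXmlDocument)`` and
skip the file when the result is ``None``.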

Testing
#######

Note that running the tests does not require installing the package::

  $ cd tests
  $ python runalltests.py

Credits
#######

Gilles Lenfant <gilles dot lenfant at gmail dot com>
