=============
Pygenx Manual
=============

:author: Michael Twomey
:contact: mick@translucentcode.org
:copyright: Michael Twomey 2004
:license: http://software.translucentcode.org/pygenx/LICENSE (MIT style)
:version: 0.6
:Date: 2005-08-22

.. contents::

Overview
========

Pygenx_ is a Python_ wrapper for the Genx_ library. It is intended to
be a lightweight way of generating correct, canonical XML with the
minimum of fuss.

Installation
============

Installation uses the normal Python distutils mechanism. After
downloading and unpacking the source tarball, install pygenx with the
following commands::

  $ cd pygenx-0.6
  $ python setup.py build
  $ sudo python setup.py install

If you aren't using sudo, perform the install step as root. If you are
installing into a Python installation you can write to, omit the sudo
altogether.

Basic Usage
===========

Pygenx makes it easy to generate simple XML.

A simple example:

.. include:: basic_example.py
   :literal:

When run this should produce output like the following:

.. include:: basic_example.xml
   :literal:

In the above example a `genx.Writer`_ object is being created, then
elements are started, text written, and elements closed, before the
document itself is closed.

This example restricts itself to using the
`genx.Writer.startElementLiteral`_ method to create its elements.

More Advanced Usage
===================

Using `genx.Writer.startElementLiteral`_ is fine for simple cases, but
when you have multiple namespaces or many elements it is both
inefficient and tedious. A better method is to pre-declare
namespaces, attributes and elements for later use. This allows genx
to perform in a more optimised manner.
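
The pre-declaration approach can be sketched as a small helper. This is
an illustration rather than part of pygenx: ``write_links``, the element
and attribute names, the namespace URI and the ``ex`` prefix are all made
up for the example. The only Writer methods it calls are the ones
documented under `genx.Writer`_. The writer is passed in as a parameter,
and every declaration is made once and then re-used inside the loop::

  def write_links(writer, out, links):
      """Write (url, title) pairs using pre-declared genx objects.

      `writer` is assumed to be a genx.Writer and `out` a file object
      accepted by startDocFile.
      """
      # Declare everything up front, before the document is started.
      ns = writer.declareNamespace("http://example.com/ns", "ex")
      list_elem = writer.declareElement("links", ns)
      link_elem = writer.declareElement("link", ns)
      href_attr = writer.declareAttribute("href")

      writer.startDocFile(out)
      writer.startElement(list_elem)
      for url, title in links:
          # Re-use the declared objects instead of passing literal names.
          writer.startElement(link_elem)
          writer.addAttribute(href_attr, url)
          writer.addText(title)
          writer.endElement()
      writer.endElement()
      writer.endDocument()

With pygenx installed, a call would look something like
``write_links(genx.Writer(), fp, [("http://example.com/", "Example")])``,
where ``fp`` is an open file object.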

Classes
=======

genx.Writer
-----------

All pygenx operations centre on the `genx.Writer`_ class, which
typically represents a single document, though nothing stops you from
starting a new document once you have finished the previous one. The
only restriction is that a Writer instance works on a single document
at a time.

You could create a `genx.Writer`_ instance and configure the various
namespaces and elements, then re-use it for different documents. This
reduces the overhead required when writing the documents.

genx.Writer.addAttribute
~~~~~~~~~~~~~~~~~~~~~~~~

:params: `genx.Attribute`_ attribute, `String`_ value
:returns: `None`_

This adds the given `genx.Attribute`_ object, with the given value, to
the currently active element.

A simple example:

>>> import genx
>>> w = genx.Writer()
>>> attr = w.declareAttribute("href")
>>> fp = file("/tmp/text.xml", "w")
>>> w.startDocFile(fp)
>>> w.startElementLiteral("a")
>>> w.addAttribute(attr, "http://example.com/")
>>> w.endElement()
>>> w.endDocument()
>>>

This produces the following output::

  <a href="http://example.com/"></a>

genx.Writer.addAttributes
~~~~~~~~~~~~~~~~~~~~~~~~~

:params: `dict`_ attributes
:returns: `None`_

This is a simple convenience function which iterates over the items in the
given dictionary, adding each one using ``genxAddAttributeLiteral`` with no
namespace.

This method is currently experimental. It should be faster than adding
attributes one at a time from Python code, as the loop is executed in C.
It doesn't support namespaces or use any declared Attribute objects.

A simple example:

>>> import genx
>>> w = genx.Writer()
>>> fp = file("/tmp/text.xml", "w")
>>> w.startDocFile(fp)
>>> w.startElementLiteral("a")
>>> w.addAttributes({"href": "http://example.com/"})
>>> w.endElement()
>>> w.endDocument()
>>>

This produces the following output::

  <a href="http://example.com/"></a>
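
For comparison, the loop that ``addAttributes`` is described as running
in C might look like this when written in Python. ``add_attributes`` is
a hypothetical helper, not a pygenx function. It assumes ``writer`` is a
`genx.Writer`_ with an element currently open::

  def add_attributes(writer, attributes):
      """Add each name/value pair as a literal attribute, one call each."""
      for name, value in attributes.items():
          # No namespace is passed, matching the description above.
          writer.addAttributeLiteral(name, value)

Each iteration here is a separate Python-level method call, which is the
overhead the built-in method is meant to avoid.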


genx.Writer.addAttributeLiteral
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:params: `String`_ name, `String`_ value, `String`_ namespace = `None`_
:returns: `None`_

This adds an attribute with the given name and value to the currently
active element. An optional namespace string can be passed in
too. This method is slower than using `genx.Writer.addAttribute`_
with a defined `genx.Attribute`_ object.

Typical use:

>>> import genx
>>> writer = genx.Writer()
>>> fp = file("/tmp/test.xml", "w")
>>> writer.startDocFile(fp)
>>> writer.startElementLiteral("elem")
>>> writer.addAttributeLiteral("attr", "value")
>>> writer.endElement()
>>> writer.endDocument()
>>> 

And the content of test.xml::

  <elem attr="value"></elem>

genx.Writer.addNamespace
~~~~~~~~~~~~~~~~~~~~~~~~

:params: `genx.Namespace`_ namespace, `String`_ prefix = `None`_
:returns: `None`_

Adds the given `genx.Namespace`_ object to the currently active
element, using an optional prefix.

A simple example with an element:

>>> import genx
>>> writer = genx.Writer()
>>> fp = file("/tmp/test.xml", "w")
>>> ns = writer.declareNamespace("http://example.com", "foo")
>>> writer.startDocFile(fp)
>>> writer.startElementLiteral("elem")
>>> writer.addNamespace(ns)
>>> writer.endElement()
>>> writer.endDocument()
>>> 

And the output in test.xml::

  <elem xmlns:foo="http://example.com"></elem>

genx.Writer.addText
~~~~~~~~~~~~~~~~~~~

:params: `String`_ text
:returns: `None`_

Adds the specified text to the currently active element. The text
will be encoded as `UTF-8`_, so ensure that the text has the correct
encoding (this can usually be achieved when reading in the string 
using the decode method, e.g. ``s = fp.read().decode('ISO-8859-1')``).

Basic usage:

>>> import genx
>>> w = genx.Writer()
>>> fp = file("/tmp/test.xml", "w")
>>> w.startDocFile(fp)
>>> w.startElementLiteral("elem")
>>> w.addText("some text")
>>> w.endElement()
>>> w.endDocument()
>>> 

The output::

  <elem>some text</elem>
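
As a sketch of the decoding step mentioned above, the following turns
ISO-8859-1 bytes into a Unicode string and shows the UTF-8 bytes that
result from encoding it again. The sample bytes are invented for the
illustration and no pygenx calls are involved::

  # "cafe" with an e-acute, stored as ISO-8859-1 (one byte per character).
  raw = b"caf\xe9"

  # Decode using the encoding the data was actually written in.
  text = raw.decode("ISO-8859-1")

  # The same text as UTF-8: the e-acute becomes two bytes.
  utf8 = text.encode("UTF-8")

Decoding with the wrong source encoding is the usual cause of garbled
output, so it is worth checking where the input bytes came from.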

genx.Writer.checkText
~~~~~~~~~~~~~~~~~~~~~

:params: `String`_ text
:returns: `int`_ status

This is a function for sanity checking strings. It checks to see if
the given string is valid UTF-8 and if it contains any invalid XML
characters.

The return codes are based on genx's error codes. These codes are not
currently exposed as named constants, so the three relevant return
values are:

0
  The text is ok.

1
  The text is invalid UTF-8.

2
  The text is invalid XML.

Simple usage:

>>> import genx
>>> w = genx.Writer()
>>> w.checkText("This is a plain string")
0
>>> w.checkText("This is an invalid unicode string. \xff\x01")
1
>>> w.checkText("This is an invalid XML string\x01")
2
>>> 
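
The two checks can be approximated in pure Python, which may help when
reasoning about what will be rejected. This is a sketch written for
Python 3, not the genx implementation: ``is_xml_char`` and
``check_text`` are hypothetical names, and the character ranges are
taken from the ``Char`` production of the XML 1.0 specification. The
return values mirror the three codes listed above::

  def is_xml_char(code_point):
      """Return True if the code point is allowed by XML 1.0's Char rule."""
      return (code_point in (0x9, 0xA, 0xD)
              or 0x20 <= code_point <= 0xD7FF
              or 0xE000 <= code_point <= 0xFFFD
              or 0x10000 <= code_point <= 0x10FFFF)

  def check_text(data):
      """Classify a byte string: 0 = ok, 1 = bad UTF-8, 2 = non-XML char."""
      try:
          text = data.decode("utf-8")
      except UnicodeDecodeError:
          return 1
      for ch in text:
          if not is_xml_char(ord(ch)):
              return 2
      return 0

Note the order of the checks: the bytes must decode as UTF-8 before
individual characters can be examined.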

genx.Writer.comment
~~~~~~~~~~~~~~~~~~~

:params: `String`_ comment
:returns: `None`_

This adds an XML comment (e.g. ``<!-- my comment -->``) in the
generated XML.

For example:

>>> import genx
>>> w = genx.Writer()
>>> fp = file("/tmp/test.xml", "w")
>>> w.startDocFile(fp)
>>> w.comment("A comment") 
>>> w.startElementLiteral("elem")
>>> w.comment("Another comment")
>>> w.endElement()
>>> w.endDocument()
>>> 

test.xml::

  <!--A comment-->
  <elem><!--Another comment--></elem>

genx.Writer.PI
~~~~~~~~~~~~~~

:params: `String`_ target, `String`_ text
:returns: `None`_

Adds an XML Processing Instruction (e.g. ``<?foo bar?>``) to the file.

For example:

>>> import genx
>>> w = genx.Writer()
>>> fp = file("/tmp/test.xml", "w")
>>> w.startDocFile(fp)
>>> w.PI("foo", "bar")
>>> w.startElementLiteral("elem")
>>> w.endElement()
>>> w.endDocument()
>>> 

Produces::

  <?foo bar?>
  <elem></elem>

genx.Writer.declareAttribute
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:params: `String`_ name, `genx.Namespace`_ namespace = None
:returns: `genx.Attribute`_

This creates a `genx.Attribute`_ object with the given name, and an
optional `genx.Namespace`_ object. This object can then be used with
`genx.Writer.addAttribute`_ calls to add attributes to the current
document.

A simple example:

>>> import genx
>>> w = genx.Writer()
>>> attr = w.declareAttribute("attr")
>>> ns = w.declareNamespace("http://example.com/ns")
>>> attr2 = w.declareAttribute("attr2", ns)
>>>

genx.Writer.declareElement
~~~~~~~~~~~~~~~~~~~~~~~~~~

:params: `String`_ name, `genx.Namespace`_ namespace = None
:returns: `genx.Element`_

Declare a new element, using the optional `genx.Namespace`_ object.

A trivial example:

>>> import genx
>>> writer = genx.Writer()
>>> elem = writer.declareElement("element")
>>> ns = writer.declareNamespace("http://example.com/ns")
>>> elem_with_ns = writer.declareElement("anotherelem", ns)
>>>

genx.Writer.declareNamespace
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:params: `String`_ uri, `String`_ prefix = None
:returns: `genx.Namespace`_

Declare a new `genx.Namespace`_ object, using the given namespace URI
and an optional prefix. Use a prefix of ``""`` to declare the
default namespace.

A simple example:

>>> import genx
>>> w = genx.Writer()
>>> ns = w.declareNamespace("http://example.com/ns")
>>> ns2 = w.declareNamespace("http://example.com/ns2", "myprefix")
>>>

genx.Writer.endDocument
~~~~~~~~~~~~~~~~~~~~~~~

:params: none
:returns: `None`_

Finish writing the current document. When this is called all the
elements should have been previously closed using
`genx.Writer.endElement`_ calls. After this is called the
`genx.Writer`_ instance can be re-used with another file.

An example of re-using a `genx.Writer`_ instance:

>>> import genx
>>> w = genx.Writer()
>>> fp = file("/tmp/test.xml", "w")
>>> elem = w.declareElement("elem")
>>> w.startDocFile(fp)
>>> w.startElement(elem)
>>> w.endElement() 
>>> w.endDocument()
>>> fp2 = file("/tmp/test2.xml", "w")
>>> w.startDocFile(fp2)
>>> w.startElement(elem)
>>> w.startElement(elem)
>>> w.endElement()
>>> w.endElement()
>>> w.endDocument()
>>> 

The content of test.xml::

  <elem></elem>

The content of test2.xml::

  <elem><elem></elem></elem>

genx.Writer.endElement
~~~~~~~~~~~~~~~~~~~~~~

:params: none
:returns: `None`_

Finish writing the current element. This needs to be called to close
each corresponding element created with `genx.Writer.startElement`_ or
`genx.Writer.startElementLiteral`_ calls.

genx.Writer.scrubText
~~~~~~~~~~~~~~~~~~~~~

:params: `String`_ text
:returns: `String`_

This silently scrubs any invalid characters out of the given string.

A simple example:

>>> import genx
>>> writer = genx.Writer()
>>> writer.scrubText("A string")
'A string'
>>> writer.scrubText("A string |\x01|")
'A string ||'
>>> 
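
A rough pure-Python equivalent, for strings that are already decoded,
simply drops every character outside the XML 1.0 ``Char`` ranges. This
is a sketch written for Python 3 with hypothetical helper names, and it
makes no claim about how genx itself treats invalid UTF-8 byte
sequences::

  def is_xml_char(code_point):
      """Return True if the code point is allowed by XML 1.0's Char rule."""
      return (code_point in (0x9, 0xA, 0xD)
              or 0x20 <= code_point <= 0xD7FF
              or 0xE000 <= code_point <= 0xFFFD
              or 0x10000 <= code_point <= 0x10FFFF)

  def scrub_text(text):
      """Return a copy of text with non-XML characters removed."""
      return "".join(ch for ch in text if is_xml_char(ord(ch)))

Because characters are removed silently, scrubbing suits display text
better than data where a dropped character would change the meaning.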

genx.Writer.startDocFile
~~~~~~~~~~~~~~~~~~~~~~~~

:params: `File`_ file
:returns: `None`_

This starts a new document using the given `File`_ object. It should
be called only once for each document. If called again on an active
document you will get a `genx.SequenceError`_.

.. Note::

   The object passed in can be a standard Python `File`_ object, in
   which case the C FILE pointer it contains is passed to genx's
   ``genxStartDocFile``.

   Any other Python object must provide a ``write`` method and a
   ``flush`` method, which perform buffer-style operations. An example
   of such an object is `StringIO.StringIO`_.
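
A minimal object meeting the ``write``/``flush`` requirement could look
like the following. ``ChunkSink`` is a hypothetical class, and the
sketch assumes that ``write`` receives ordinary strings. The calls at
the end stand in for what a writer would do, so the example runs without
pygenx::

  class ChunkSink(object):
      """Collect written chunks in memory and count flush calls."""

      def __init__(self):
          self.chunks = []
          self.flush_count = 0

      def write(self, data):
          # Store each chunk as it arrives.
          self.chunks.append(data)

      def flush(self):
          # Nothing is buffered elsewhere, so just record the call.
          self.flush_count += 1

      def getvalue(self):
          return "".join(self.chunks)

  # Simulate the calls a writer would make on the object.
  sink = ChunkSink()
  sink.write("<elem>")
  sink.write("</elem>")
  sink.flush()

An instance would be handed to ``startDocFile`` in place of a file
object, and ``getvalue`` read once the document has been ended.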

genx.Writer.startElement
~~~~~~~~~~~~~~~~~~~~~~~~

:params: `genx.Element`_
:returns: `None`_

Starts a new XML element using the given `genx.Element`_ object. This
is the preferred way to write elements into an XML document, as it is
faster to re-use premade `genx.Element`_ objects than to use
`genx.Writer.startElementLiteral`_ calls.

A simple example:

>>> import genx
>>> w = genx.Writer()
>>> elem = w.declareElement("foo")
>>> fp = file("/tmp/test.xml", "w")
>>> w.startDocFile(fp)
>>> w.startElement(elem)
>>> w.endElement()
>>> w.endDocument()
>>>

This writes::

  <foo></foo>

genx.Writer.startElementLiteral
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:params: `String`_ name, `String`_ namespace = None
:returns: `None`_

Starts a new XML element with the given name and an optional
namespace URI string. This is the most straightforward way to create
elements, but it isn't the fastest, and can get unwieldy when
compared to `genx.Writer.startElement`_.

A variation of the example in `genx.Writer.startElement`_:

>>> import genx
>>> w = genx.Writer()
>>> fp = file("/tmp/test.xml", "w")
>>> w.startDocFile(fp)
>>> w.startElementLiteral("foo")
>>> w.endElement()
>>> w.endDocument()
>>>

This writes::

  <foo></foo>

genx.Writer.unsetDefaultNamespace
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:params: none
:returns: `None`_

This clears the default namespace declaration. The effect is easier
to demonstrate than to explain:

.. include:: unset_default_namespace_example.py
   :literal:

Using xmllint and diff to compare the outputs::

  $ xmllint --format /tmp/test2.xml >test2.xml
  $ xmllint --format /tmp/test.xml >test.xml
  $ diff -u test.xml test2.xml 
  --- test.xml    2004-05-31 23:44:50.000000000 +0100
  +++ test2.xml   2004-05-31 23:44:45.000000000 +0100
  @@ -1,7 +1,7 @@
   <?xml version="1.0"?>
   <edef xmlns="http://default" xmlns:pref="http://pref">
     <e xmlns=""/>
  -  <pref:epref xmlns="">
  -    <e/>
  +  <pref:epref>
  +    <e xmlns=""/>
     </pref:epref>
   </edef>

The lines prefixed with ``-`` come from test.xml (with the
unsetDefaultNamespace call) and the lines prefixed with ``+`` come
from test2.xml (without the call). As you can see, the
unsetDefaultNamespace call forcibly resets the namespace.

genx.Attribute
--------------

This represents an XML attribute, which can be attached to any
`genx.Element`_. This is created using
`genx.Writer.declareAttribute`_.

genx.Element
------------

This represents an XML element, which can be used with
`genx.Writer.startElement`_. This is created using
`genx.Writer.declareElement`_.

genx.Namespace
--------------

This represents an XML namespace, which can be used with
`genx.Element`_ and `genx.Attribute`_ objects. This is created using
`genx.Writer.declareNamespace`_.


Functions
=========

genx.get_version
----------------

:params: none
:returns: `String`_

This returns the version of genx as reported by genx's
`genxGetVersion`_ function.

Exceptions
==========

These are normally thrown based on the status codes genx returns.

genx.AttributeInDefaultNamespaceError
-------------------------------------

This occurs when an attribute is declared or used in the default
namespace.

For example:

>>> import genx
>>> w = genx.Writer()
>>> ns = w.declareNamespace("http://example.com/ns", "")
>>> a = w.declareAttribute("a", ns)
Traceback (most recent call last):
  ...
AttributeInDefaultNamespaceError: 'Attribute cannot be in default namespace'
>>>

genx.BadDefaultDeclarationError
-------------------------------

Can't say it better than Tim:

  You tried to declare some namespace to be the default on an element
  which is in no namespace.

To trigger this:

>>> import genx
>>> w = genx.Writer()
>>> fp = file("/tmp/text.xml", "w")
>>> ns = w.declareNamespace("http://example.com/ns", "")
>>> elem = w.declareElement("elem")
>>> w.startDocFile(fp)
>>> w.startElement(elem)
>>> w.addNamespace(ns)
Traceback (most recent call last):
...
BadDefaultDeclarationError: 'Declared a default namespace on an element which is in no namespace'
>>>

genx.BadNameError
-----------------

This occurs when an invalid XML name is used.

For example:

>>> w.startElementLiteral("<foo")
Traceback (most recent call last):
...
BadNameError: 'Bad NAME'
>>>

genx.BadNamespaceNameError
--------------------------

This is raised when you try to declare a `genx.Namespace`_ using None
or an empty string.

Some examples:

>>> w.declareNamespace("")
Traceback (most recent call last):
...
BadNamespaceNameError: 'Bad namespace name'
>>> w.declareNamespace(None)
Traceback (most recent call last):
...
BadNamespaceNameError: None is an invalid namespace
>>>

genx.BadUTF8Error
-----------------

This is raised when invalid UTF-8 is passed to a genx call. However,
it is unlikely to be raised in practice, as Python raises its own
encoding errors before genx is reached.

genx.DuplicateAttributeError
----------------------------

This happens when you try to add an attribute with the same name to
an element more than once.

For example:

>>> w.addAttributeLiteral("a", "foo")
>>> w.addAttributeLiteral("a", "foo")
Traceback (most recent call last):
...
DuplicateAttributeError: 'Same attribute specified more than once'
>>>

genx.DuplicateNamespaceError
----------------------------

This occurs when you add the same namespace to an element more than once.

For example:

>>> w = genx.Writer()
>>> fp = file("/tmp/text.xml", "w")
>>> w.startDocFile(fp)
>>> w.startElementLiteral("a")
>>> ns2 = w.declareNamespace("http://example.com/2", "ns2")
>>> w.addNamespace(ns2)
>>> ns3 = w.declareNamespace("http://example.com/2", "ns3")
>>> w.addNamespace(ns3)
Traceback (most recent call last):
...
DuplicateNamespaceError: 'Declared namespace twice with different prefixes on one element.'
>>>

genx.DuplicatePrefixError
-------------------------

This is raised when two namespaces are declared with the same prefix.

For example:

>>> ns1 = w.declareNamespace("http://example.com/ns1", "ns1")
>>> ns2 = w.declareNamespace("http://example.com/ns2", "ns1")
Traceback (most recent call last):
...
DuplicatePrefixError: 'Duplicate prefix'
>>>

genx.GenxError
--------------

This is a catch-all error. If pygenx can't find an exception matching
the error code genx returns, this is raised with the error string
included.

genx.IOError
------------

This usually occurs when genx has problems writing to the file. The
most common cause is some other part of the Python code closing the
file object.

A typical example:

>>> import genx
>>> w = genx.Writer()
>>> fp = file("/tmp/test.xml", "w")
>>> w.startDocFile(fp)
>>> fp.close()
>>> w.startElementLiteral("example")
>>> w.addText("foo")
Traceback (most recent call last):
...
IOError: 'I/O error'
>>>  

In the above example the exception isn't raised until the
`genx.Writer.addText`_ call, because that is the first point at which
genx tries to write to the file.

genx.MalformedPIError
---------------------

This is raised when an invalid string is passed to `genx.Writer.PI`_,
usually when there is a "?>" in the string.

For example:

>>> w = genx.Writer()
>>> fp = file("/tmp/text.xml", "w")
>>> w.startDocFile(fp)
>>> w.startElementLiteral("a")
>>> w.PI("foo", "bar?>")
Traceback (most recent call last):
...
MalformedPIError: '?> in PI'
>>>

genx.NonXMLCharacterError
-------------------------

This is raised when a character which violates the XML 1.0 Character
rules is passed into genx. The string can be perfectly valid UTF-8
but still be invalid XML.

For example:

>>> w.addAttributeLiteral("bar", "text \x01")
Traceback (most recent call last):
...
NonXMLCharacterError: 'Non XML Character'
>>>

genx.SequenceError
------------------

This is the most commonly seen error. It occurs when calls are made
in an incorrect order.

For example, this code closes the document before closing the
element:

>>> import genx
>>> w = genx.Writer()
>>> fp = file("/tmp/text.xml", "w")
>>> w.startDocFile(fp)
>>> w.startElementLiteral("foo")
>>> w.endDocument()
Traceback (most recent call last):
...
SequenceError: 'Call out of sequence'
>>>
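
One way to make this mistake harder is to pair each start call with its
end call in a context manager. The ``element`` helper below is a
hypothetical convenience, not something pygenx provides. It relies only
on `genx.Writer.startElementLiteral`_ and `genx.Writer.endElement`_::

  from contextlib import contextmanager

  @contextmanager
  def element(writer, name):
      """Open an element and always close it when the block exits."""
      writer.startElementLiteral(name)
      try:
          yield writer
      finally:
          # Runs on normal exit and when the body raises.
          writer.endElement()

Used as ``with element(w, "foo"):``, the element is closed when the
block is left, so a following ``w.endDocument()`` no longer depends on
remembering a matching ``endElement`` call.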

Developing pygenx
=================

Pygenx is written as a `Pyrex`_ wrapper to `Genx`_. I'm using setuptools_ to 
package everything up (it greatly simplifies things like handling pyrex).

About this manual
=================

This manual is written in reStructuredText and converted to HTML
using Docutils_. Docutils has proven to be a joy to use, especially
for Python programming. I'd recommend it to anyone.

.. _Pygenx: http://software.translucentcode.org/pygenx/
.. _Python: http://www.python.org/
.. _Genx: http://tbray.org/ongoing/When/200x/2004/02/20/GenxStatus
.. _genxGetVersion: http://tbray.org/ongoing/genx/docs/Guide.html#genxGetVersion
.. _UTF-8: http://www.utf-8.com/
.. _Docutils: http://docutils.sourceforge.net/
.. _Pyrex: http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/
.. _setuptools: http://peak.telecommunity.com/DevCenter/setuptools

..
  Python references

.. _String: http://docs.python.org/lib/typesseq.html#l2h-153
.. _None: http://docs.python.org/lib/node34.html#l2h-320
.. _int: http://docs.python.org/lib/typesnumeric.html#l2h-117
.. _File: http://docs.python.org/lib/bltin-file-objects.html#l2h-229
.. _StringIO.StringIO: http://docs.python.org/lib/module-StringIO.html
.. _dict: http://docs.python.org/lib/typesmapping.html

