                                 Howto
                                 =====

   XIST is an extensible HTML/XML generator written in Python. It was
   developed as a replacement for an HTML preprocessor named HSC and
   borrows some features and ideas from it. It also borrows the basic
   ideas (XML/HTML elements as Python objects) from HTMLgen or
   HyperText.

   (If you're impatient, there's also a list of examples that shows
   what can be done with XIST.)

Overview
========

   XIST can be used as a compiler that reads an input XML file and
   generates a transformed output file, or it could be used for
   generating XML dynamically inside a web server (but note that
   handling object trees is slower than simply sending string
   fragments). In either case generating the final HTML or XML output
   requires the following three steps:

     * Generating a source XML tree: This can be done either by
       parsing an XML file, or by directly constructing the tree --
       as HTMLgen and HyperText do -- as a tree of Python objects.
       XIST provides a very natural and pythonic API for that.
     * Converting the source tree into a target tree: This target
       tree can be an HTML tree, an SVG tree, an XSL-FO tree or any
       other XML tree you like. Every node class provides a convert
       method for performing this conversion. For your own XML
       element types you have to define your own element classes and
       implement an appropriate convert method. This is possible for
       processing instructions and entity references too.
     * Publishing the target tree: For generating the final output a
       Publisher object is used that generates the encoded byte
       string fragments that can be written to an output stream (or
       yielded from a WSGI application, etc.).
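
   The three steps above can be sketched without XIST at all. The
   following is a minimal standalone illustration in Python 3; the
   classes Text, Element, B and Cool and the function publish are
   hypothetical stand-ins, not XIST's real API:

```python
# Standalone sketch of the build -> convert -> publish pipeline.
# All names here are hypothetical; this is not how XIST is implemented.
from html import escape

class Text:
    def __init__(self, value):
        self.value = value
    def convert(self):
        return self  # text needs no conversion
    def fragments(self):
        yield escape(self.value, quote=False)

class Element:
    name = None
    def __init__(self, *content):
        # plain strings become Text nodes
        self.content = [Text(c) if isinstance(c, str) else c for c in content]
    def convert(self):
        # default: keep the element type, convert the children recursively
        return type(self)(*[c.convert() for c in self.content])
    def fragments(self):
        yield "<%s>" % self.name
        for child in self.content:
            yield from child.fragments()
        yield "</%s>" % self.name

class B(Element):
    name = "b"

class Cool(Element):
    name = "cool"
    def convert(self):
        # map the custom element to a target element, then recurse
        return B(*self.content, " is cool!").convert()

def publish(node, encoding="utf-8"):
    # yield encoded byte string fragments
    for fragment in node.fragments():
        yield fragment.encode(encoding)

source = Cool("Python")              # step 1: build the source tree
target = source.convert()            # step 2: convert to the target tree
output = b"".join(publish(target))   # step 3: publish
```

   Keeping conversion and publishing separate means the same target
   tree can be written to a file or yielded fragment by fragment.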

Constructing XML trees
======================

   Like any other XML tree API, XIST provides the usual classes:

     * Element for XML elements;
     * Attr for attributes;
     * Attrs for attribute mappings;
     * Text for text data;
     * Frag for document fragments (a Frag object is simply a list
       of nodes);
     * Comment for XML comments (e.g. <!-- the comment -->);
     * ProcInst for processing instructions (e.g. <?php echo
       $spam;?>);
     * Entity for entity references (e.g. &parrot;) and
     * DocType for document type declarations (e.g. <!DOCTYPE html
       PUBLIC ...>).

  XML trees as Python objects
  ---------------------------

   XIST works somewhat differently from a normal DOM API. Instead of
   only one element class, XIST has one class for every element type.
   All the elements from different XML vocabularies known to XIST are
   defined in modules in the ll.xist.ns subpackage. (Of course it's
   possible to define additional namespaces for your own XML
   vocabulary.) The definition of HTML can be found in
   ll.xist.ns.html, for example.

   Every element class has a constructor of the form:

 __init__(self, *content, **attrs)

   Positional arguments (i.e. items in content) will be the child
   nodes of the element node. Keyword arguments will be attributes.
   You can pass most of Python's builtin types to such a constructor.
   Strings (str and unicode) and integers will be automatically
   converted to Text objects. Constructing an HTML element works like
   this:

 from ll.xist.ns import html

 node = html.div(
    "Hello ",
    html.a("Python", href="http://www.python.org/"),
    " world!"
 )

   For attribute names that collide with Python keywords or are not
   legal identifiers (most notably class in HTML) the attribute name
   must be slightly modified, so that it's a legal Python identifier
   (for class an underscore is appended):

 node = html.div(
    "Hello world!",
    class_="greeting"
 )

   (Don't worry: This modified attribute name will be mapped to the
   real official attribute name once the output is generated.)
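
   To make the idea of such a name mapping concrete, here is a small
   standalone Python 3 sketch. The helper names xml_attr_name and
   render_attrs are made up for this illustration, and the rule used
   (drop one trailing underscore if the rest is a Python keyword) is
   an assumption of the sketch, not a description of how XIST does it:

```python
import keyword

def xml_attr_name(python_name):
    # Assumption of this sketch: class_ -> class, for_ -> for, etc.
    if python_name.endswith("_") and keyword.iskeyword(python_name[:-1]):
        return python_name[:-1]
    return python_name

def render_attrs(**attrs):
    # Render keyword arguments as XML attributes, sorted by Python name.
    return " ".join(
        '%s="%s"' % (xml_attr_name(name), value)
        for (name, value) in sorted(attrs.items())
    )

rendered = render_attrs(class_="greeting", id="x")
```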

   You can pass attributes as a dictionary too:

 node = html.div(
    "Hello world!",
    {
       "class_": "greeting",
       "id": 42,
       "title": "Greet the world"
    }
 )

  Generating XML trees from XML files
  -----------------------------------

   XML trees can also be generated by parsing XML files. For this the
   module ll.xist.parsers provides several functions:

 def parseString(text, base=None, sysid=None, **parserargs)
 def parseURL(url, base=None, sysid=None, **parserargs)
 def parseFile(filename, base=None, sysid=None, **parserargs)
 def parse(stream, base=None, sysid=None, **parserargs)

   parseString is for parsing strings (str and unicode) and parseURL
   is for parsing resources from URLs. With parseFile you can parse
   local files and with parse you can directly parse from a file-like
   object.

   All four functions create a parser internally, parse the supplied
   source document and return the resulting object tree.

   For example, parsing a string can be done like this:

 from ll.xist import parsers
 from ll.xist.ns import html

 node = parsers.parseString(
    '<p>Hello <a href="http://www.python.org/">Python</a> world!</p>'
 )

   For further info about the rest of the arguments to the parsing
   functions, see the documentation for ll.xist.parsers.Parser.

Defining new elements and converting XML trees
==============================================

   To be able to parse an XML file, you have to provide an element
   class for every element type that appears in the file. These
   classes either come from modules provided by XIST or you can
   define your own. Defining your own element class for an element
   named cool works like this:

 from ll.xist import xsc
 from ll.xist.ns import html

 class cool(xsc.Element):
    def convert(self, converter):
       node = html.b(self.content, u" is cool!")
       return node.convert(converter)

   You have to derive your new class from xsc.Element. The name of
   the class will be the element name. For element type names that
   are invalid Python identifiers, you can use the class attribute
   xmlname in the element class to overwrite the element name.

   To be able to convert an element of this type to a new XML tree
   (probably HTML in most cases), you have to implement the convert
   method. In this method you can build a new XML tree from the
   content and attributes of the object.

   Using this new element is simple:

 >>> node = cool("Python")
 >>> print node.conv().asBytes()
 <b>Python is cool!</b>

   conv simply calls convert with a default converter argument. We'll
   come to converters in a minute. asBytes is a method that converts
   the node to a byte string. This method will be explained when we
   discuss the publishing interface.

   Note that it is vital for your own convert methods that you
   recursively call convert on your own content, because otherwise
   some unconverted nodes might remain in the tree. Let's define a
   new element:

 class python(xsc.Element):
    def convert(self, converter):
       return html.a(u"Python", href=u"http://www.python.org/")

   Now we can do the following:

 >>> node = cool(python())
 >>> print node.conv().asBytes()
 <b><a href="http://www.python.org/">Python</a> is cool!</b>

   But if we forget to call convert for our own content, i.e. if the
   element cool was written like this:

 class cool(xsc.Element):
    def convert(self, converter):
       return html.b(self.content, " is cool!")

   we would get:

 >>> node = cool(python())
 >>> print node.conv().asBytes()
 <b><python /> is cool!</b>

   Furthermore convert should never modify self, because convert
   might be called multiple times for the same node.

  Converters
  ----------

   conv is a convenience method that creates a default converter for
   you and calls convert. This converter is created once and is
   passed to all convert calls. It is used to store parameters for
   the conversion process and it allows elements to pass information
   to other nodes. You can also call convert yourself, which would
   look like this:

 from ll.xist import converters
 from ll.xist.ns import html

 node = cool(python())
 node = node.convert(converters.Converter())

   You can pass the following arguments to the Converter constructor:

   root
           root (which defaults to None) is the root URL for the
           conversion process. When you want to resolve a link in
           some of your own convert methods, the URL must be
           interpreted relative to this root URL (You can use
           URLAttr.forInput for that).

   mode
           mode (which defaults to None) works the same way as modes
           in XSLT. You can use this for implementing different
           conversion modes.

   stage
           stage (which defaults to "deliver") allows you to
           implement multi stage conversion: Suppose that you want to
           deliver a dynamically constructed web page with XIST that
           contains results from a database query and the current
           time. The data in the database changes infrequently, so it
           doesn't make sense to do the query on every request. The
           query is done every few minutes and the resulting HTML
           tree is stored in the servlet (using any of the available
           Python servlet technologies). For this conversion the
           stage would be "cache" and your database XML element would
           do the query when stage=="cache". Your time display
           element would do the conversion when stage=="deliver" and
           simply return itself when stage=="cache", so it would
           still be part of the cached XML tree and would be
           converted to HTML on every request.

   target
           target (which defaults to ll.xist.ns.html) specifies what
           the output should be. Values must be namespace subclasses
           (see below for an explanation of namespaces).

   lang
           lang (which defaults to None) is the language in which the
           result tree should be. This can be used in the convert
           method to implement different conversions for different
           languages, e.g.:

 class note(xsc.Element):
    def convert(self, converter):
       if converter.lang==u"de":
          title = u"Anmerkung"
       elif converter.lang==u"en":
          title = u"Note"
       else:
          title = u"???"
       node = xsc.Frag(
          html.h1(title),
          html.div(self.content)
       )
       return node.convert(converter)

   Additional arguments are passed when a converter is created in the
   context of a make script.
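
   The multi stage conversion described for the stage argument can
   be sketched in isolation. This Python 3 example is only an
   illustration: Converter, QueryResult, CurrentTime and convert_all
   are hypothetical names, and the "database query" is a fixed
   string:

```python
import time

class Converter:
    def __init__(self, stage="deliver"):
        self.stage = stage

class QueryResult:
    def convert(self, converter):
        if converter.stage == "cache":
            # stand-in for an expensive database query
            return "<ul><li>row 1</li></ul>"
        return self

class CurrentTime:
    def convert(self, converter):
        if converter.stage == "deliver":
            return time.strftime("%H:%M:%S")
        return self  # stay in the cached tree until delivery

def convert_all(nodes, converter):
    # strings are already final output, everything else gets converted
    return [n if isinstance(n, str) else n.convert(converter) for n in nodes]

page = [QueryResult(), CurrentTime()]
cached = convert_all(page, Converter(stage="cache"))         # every few minutes
delivered = convert_all(cached, Converter(stage="deliver"))  # on every request
```

   After the "cache" stage the query result is plain output, while
   the time element is still a node that gets converted on delivery.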

  Attributes
  ----------

   Setting and accessing the attributes of an element works via the
   dictionary interface:

 >>> node = html.a(u"Python", href=u"http://www.python.org/")
 >>> print node[u"href"].asBytes()
 http://www.python.org/
 >>> del node[u"href"]
 >>> print node[u"href"].asBytes()

 >>> node[u"href"] = u"http://www.python.org/"
 >>> print node[u"href"].asBytes()
 http://www.python.org/

   All attribute values are instances of subclasses of the class
   Attr. Available subclasses are:

     * TextAttr, for normal text attributes;
     * URLAttr, for attributes that are URLs;
     * BoolAttr, for boolean attributes (for such an attribute only
       its presence is important; its value will always be the same
       as the attribute name when publishing);
     * IntAttr, for integer attributes;
     * ColorAttr, for color attributes (e.g. #ffffff).

   IntAttr and ColorAttr mostly serve as documentation of the
   attribute's purpose. Neither class adds any functionality.

   Attr itself is derived from Frag so it is possible to use all the
   sequence methods on an attribute.

   Unset attributes will be treated like empty ones so the following
   is possible:

 del node["spam"]
 node["spam"].append("ham")

   This also means that after del node["spam"][:] the attribute will
   be empty again and will be considered to be unset. Such attributes
   will be skipped when publishing.

   The main purpose of this is to allow you to construct values
   conditionally and then use those values as attribute values:

 import random

 if random.random() < 0.5:
    class_ = None
 else:
    class_ = u"foo"

 node = html.div(u"foo", class_=class_)

   In 50% of the cases the generated div element will not have a
   class attribute.
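
   The effect of skipping unset attributes can be shown with a tiny
   standalone Python 3 sketch. The function start_tag is a made-up
   helper for this illustration and has nothing to do with XIST's
   publishing code:

```python
def start_tag(name, **attrs):
    parts = [name]
    for (key, value) in sorted(attrs.items()):
        if value is None or value == "":
            continue  # unset or empty attributes are not published
        parts.append('%s="%s"' % (key, value))
    return "<%s>" % " ".join(parts)

with_title = start_tag("div", title="foo")
without_title = start_tag("div", title=None)
```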

    Defining attributes
    -------------------

   When you define a new element you have to specify the attributes
   allowed for this element. For this use the class attribute Attrs
   (which must be a class derived from xsc.Element.Attrs) and define
   the attributes by deriving them from one of the existing attribute
   classes. We could extend our example element in the following way:

 class cool(xsc.Element):
    class Attrs(xsc.Element.Attrs):
       class adj(xsc.TextAttr): pass

    def convert(self, converter):
       node = xsc.Frag(self.content, u" is")
       if u"adj" in self.attrs:
          node.append(u" ", html.em(self[u"adj"]))
       node.append(u" cool!")
       return node.convert(converter)

   and use it like this:

 >>> node = cool(python(), adj=u"totally")
 >>> print node.conv().asBytes()
 <a href="http://www.python.org/">Python</a> is <em>totally</em> cool!

    Default attributes
    ------------------

   It is possible to define default values for attributes via the
   class attribute default:

 class cool(xsc.Element):
    class Attrs(xsc.Element.Attrs):
       class adj(xsc.TextAttr):
          default = u"absolutely"

    def convert(self, converter):
       node = xsc.Frag(self.content, u" is")
       if u"adj" in self.attrs:
          node.append(u" ", html.em(self[u"adj"]))
       node.append(u" cool!")
       return node.convert(converter)

   Now if we instantiate the class without specifying adj we'll get
   the default:

 >>> node = cool(python())
 >>> print node.conv().asBytes()
 <a href="http://www.python.org/">Python</a> is <em>absolutely</em> cool!

   If we want a cool instance without an adj attribute, we can pass
   None as the attribute value:

 >>> node = cool(python(), adj=None)
 >>> print node.conv().asBytes()
 <a href="http://www.python.org/">Python</a> is cool!

    Attribute value sets
    --------------------

   It's possible to specify that an attribute has a fixed set of
   allowed values. This can be done with the class attribute values.
   We could extend our example to look like this:

 class cool(xsc.Element):
    class Attrs(xsc.Element.Attrs):
       class adj(xsc.TextAttr):
          default = u"absolutely"
          values = (u"absolutely", u"totally", u"very")

    def convert(self, converter):
       node = xsc.Frag(self.content, u" is")
       if u"adj" in self.attrs:
          node.append(u" ", html.em(self[u"adj"]))
       node.append(u" cool!")
       return node.convert(converter)

   These values won't be checked when we create our cool instance.
   A warning is issued only when such a node is parsed:

 >>> s = '<cool adj="pretty"><python/></cool>'
 >>> node = parsers.parseString(s)
 /home/walter/pythonroot/ll/xist/xsc.py:1665: IllegalAttrValueWarning: Attribute value 'pretty' not allowed for __main__:cool.Attrs.adj.
   warnings.warn(errors.IllegalAttrValueWarning(self))

   The warning will also be issued if we publish such a node, but
   note that for warnings Python's warning framework is used, so the
   warning will be printed only once (but of course you can change
   that with warnings.filterwarnings):

 >>> node = cool(python(), adj=u"pretty")
 >>> print node.asBytes()
 /home/walter/pythonroot/ll/xist/xsc.py:1665: IllegalAttrValueWarning: Attribute value 'pretty' not allowed for __main__:cool.Attrs.adj.
   warnings.warn(errors.IllegalAttrValueWarning(self))
 <cool adj="pretty"><python /></cool>
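
   How the warning filter changes this behaviour can be tried with
   the standard warnings module alone. In this Python 3 sketch the
   class IllegalValueWarning and the function check are hypothetical
   stand-ins for XIST's own warning and validation code:

```python
import warnings

class IllegalValueWarning(UserWarning):
    """Hypothetical stand-in for a validation warning."""

def check(value, allowed=("absolutely", "totally", "very")):
    if value not in allowed:
        warnings.warn(
            IllegalValueWarning("Attribute value %r not allowed" % value)
        )

with warnings.catch_warnings(record=True) as caught:
    # "always" reports every occurrence instead of only the first one
    warnings.simplefilter("always", IllegalValueWarning)
    check("pretty")
    check("pretty")
    check("very")  # allowed value, no warning
```

   Using the action "error" instead of "always" turns such warnings
   into exceptions, which can be handy in test suites.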

    Required attributes
    -------------------

   Finally it's possible to specify that an attribute is required.
   This again will only be checked when parsing or publishing. To
   specify that an attribute is required simply add the class
   attribute required with the value True. The attribute alt of the
   class ll.xist.ns.html.img is such an attribute, so we'll get:

 >>> from ll.xist.ns import html
 >>> node = html.img(src="eggs.png")
 >>> print node.asBytes()
 /home/walter/pythonroot/ll/xist/xsc.py:2046: RequiredAttrMissingWarning: Required attribute 'alt' missing in ll.xist.ns.html:img.Attrs.
   warnings.warn(errors.RequiredAttrMissingWarning(self, attrs.keys()))
 <img src="eggs.png" />

  Namespaces
  ----------

   Now that you've defined your own elements, you have to tell the
   parser about them, so they can be instantiated when a file is
   parsed. This is done with namespace classes.

   Namespace classes can be thought of as object oriented versions of
   XML namespaces. Two class attributes can be used to configure the
   namespace: xmlname specifies the default namespace prefix to use
   for the namespace and xmlurl is the namespace name. All element
   classes nested inside the namespace class belong to the namespace.

   It's also possible to define your namespace class without any
   nested element classes and later add those classes to the
   namespace by attribute assignment or with the class method update,
   which expects a dictionary as an argument. All objects found in
   the values of the dictionary will be added to the namespace as
   attributes. So you can put the namespace class at the end of your
   Python module after all the element classes are defined and add
   all the objects from the local scope to the namespace. Your
   complete namespace might look like this:

 class python(xsc.Element):
    def convert(self, converter):
       return html.a(
          u"Python",
          href=u"http://www.python.org/"
       )

 class cool(xsc.Element):
    def convert(self, converter):
       node = html.b(self.content, u" is cool!")
       return node.convert(converter)

 class __ns__(xsc.Namespace):
    xmlname = "foo"
    xmlurl = "http://www.example.com/foo"
 __ns__.update(vars())

   All defined namespace classes will be registered with the parser
   automatically, so all elements belonging to the namespace will be
   used when parsing files.
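
   The trick of collecting classes from the local scope can be
   sketched with plain Python 3. Element, Namespace and foo_ns below
   are hypothetical stand-ins, and restricting update to Element
   subclasses is an assumption of this sketch:

```python
# Standalone sketch; not the classes from ll.xist.xsc.
class Element:
    __ns__ = None

class Namespace:
    @classmethod
    def update(cls, mapping):
        # Assumption: only Element subclasses are collected here.
        for value in list(mapping.values()):
            if (isinstance(value, type) and issubclass(value, Element)
                    and value is not Element):
                setattr(cls, value.__name__, value)  # reachable as attribute
                value.__ns__ = cls                   # element knows its namespace

class python(Element):
    pass

class cool(Element):
    pass

class foo_ns(Namespace):
    xmlname = "foo"
    xmlurl = "http://www.example.com/foo"

# Put this at the end of the module, after all element classes.
foo_ns.update(vars())
```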

    Namespaces as modules
    ---------------------

   It is convenient to define all classes that belong to one
   namespace in one Python module. However this means that the
   resulting module will only contain one "interesting" object: the
   namespace class. To make using this class more convenient, it's
   possible to turn the namespace class into a module by using the
   class method makemod instead of update:

 class python(xsc.Element):
    def convert(self, converter):
       return html.a(
          u"Python",
          href=u"http://www.python.org/"
       )

 class cool(xsc.Element):
    def convert(self, converter):
       node = html.b(self.content, u" is cool!")
       return node.convert(converter)

 class __ns__(xsc.Namespace):
    xmlname = "foo"
    xmlurl = "http://www.example.com/foo"
 __ns__.makemod(vars())

   Suppose that the above code is in the file foo.py. Doing an import
   foo will then give you the namespace class instead of the module:

 >>> import foo
 >>> foo
 <foo:__ns__ namespace name=u"foo" url=u"http://www.example.com/foo" with 2 elements from "foo.py" at 0x87654321>
 >>> foo.python
 <foo:python element at 0x87654321>

    Global attributes
    -----------------

   You can define global attributes belonging to a certain namespace
   in the same way as defining local attributes belonging to a
   certain element type: Define a nested Attrs class inside the
   namespace class (derived from ll.xist.xsc.Namespace.Attrs):

 from ll.xist import xsc

 class __ns__(xsc.Namespace):
    xmlname = "foo"
    xmlurl = "http://www.example.com/foo"

    class Attrs(xsc.Namespace.Attrs):
       class foo(xsc.TextAttr): pass
 __ns__.makemod(vars())

   Setting and accessing such an attribute can be done like this:

 >>> from ll.xist.ns import html
 >>> import foo
 >>> node = html.div(u"foo", {(foo, u"foo"): u"bar"})
 >>> str(node[foo, u"foo"])
 'bar'

   An alternate way of specifying a global attribute in a constructor
   looks like this:

 >>> from ll.xist.ns import html
 >>> import foo
 >>> node = html.div(u"foo", foo.Attrs(foo=u"baz"))
 >>> str(node[foo, u"foo"])
 'baz'

    Subclassing namespaces
    ----------------------

   Each element class that belongs to a namespace can access its
   namespace via the class attribute __ns__. When you're subclassing
   namespace classes, the elements in the base namespace will be
   automatically subclassed too. Of course you can explicitly
   subclass an element class too. The following example shows the
   usefulness of this feature. Define your base namespace like this
   and put it into navns.py:

 from ll.xist import xsc
 from ll.xist.ns import html

 languages = [
    (u"Python", u"http://www.python.org/"),
    (u"Perl", u"http://www.perl.org/"),
    (u"PHP", u"http://www.php.net/"),
    (u"Java", u"http://java.sun.com/")
 ]

 class navigation(xsc.Element):
    def convert(self, converter):
       node = self.__ns__.links()
       for (name, url) in languages:
          node.append(self.__ns__.link(name, href=url))
       return node.convert(converter)

 class links(xsc.Element):
    def convert(self, converter):
       node = self.content
       return node.convert(converter)

 class link(xsc.Element):
    class Attrs(xsc.Element.Attrs):
       class href(xsc.URLAttr): pass

    def convert(self, converter):
       node = html.div(html.a(self.content, href=self[u"href"]))
       return node.convert(converter)

 class __ns__(xsc.Namespace):
    xmlname = "nav"
    xmlurl = "http://www.example.com/nav"
 __ns__.makemod(vars())

   This namespace defines a navigation element that generates divs
   with links to various homepages for programming languages. We can
   use it like this:

 >>> import navns
 >>> print navns.navigation().conv().asBytes()
 <div><a href="http://www.python.org/">Python</a></div>
 <div><a href="http://www.perl.org/">Perl</a></div>
 <div><a href="http://www.php.net/">PHP</a></div>
 <div><a href="http://java.sun.com/">Java</a></div>

   (Of course the output will all be on one line.)

   Now we can define a derived namespace (in the file nav2ns.py) that
   overwrites the element classes links and link to change how the
   navigation looks:

 from ll.xist import xsc
 from ll.xist.ns import html

 import navns

 class __ns__(navns):
    class links(navns.links):
       def convert(self, converter):
          node = html.table(
             self.content,
             border=0,
             cellpadding=0,
             cellspacing=0,
             class_=u"navigation",
          )
          return node.convert(converter)

    class link(navns.link):
       def convert(self, converter):
          node = html.tr(
             html.td(
                html.a(
                   self.content,
                   href=self[u"href"],
                )
             )
          )
          return node.convert(converter)
 __ns__.makemod(vars())

   When we use the navigation element from the derived namespace
   we'll get the following output:

 >>> import nav2ns
 >>> print nav2ns.navigation().conv().asBytes()
 <table border="0" cellpadding="0" cellspacing="0" class="navigation">
 <tr><td><a href="http://www.python.org/">Python</a></td></tr>
 <tr><td><a href="http://www.perl.org/">Perl</a></td></tr>
 <tr><td><a href="http://www.php.net/">PHP</a></td></tr>
 <tr><td><a href="http://java.sun.com/">Java</a></td></tr>
 </table>

   (again all on one line.)

   Notice that we automatically got an element class
   nav2ns.navigation, that this element class inherited the convert
   method from its base class and that the call to convert on the
   derived class instantiated the link classes from the derived
   namespace.

    Namespaces as conversion targets
    --------------------------------

   The converter argument passed to the convert method has an
   attribute target which is a namespace class and specifies the
   target namespace to which self should be converted.

   You can check which conversion is wanted with issubclass. Once
   this is determined you can use element classes from the target to
   create the required XML object tree. This makes it possible to
   customize the conversion by passing a derived namespace to the
   convert method. To demonstrate this, we change our example
   namespace to use the conversion target like this:

 import navns

 class __ns__(navns):
    class links(navns.links):
       def convert(self, converter):
          node = converter.target.table(
             self.content,
             border=0,
             cellpadding=0,
             cellspacing=0,
             class_=u"navigation",
          )
          return node.convert(converter)

    class link(navns.link):
       def convert(self, converter):
          target = converter.target
          node = target.tr(
             target.td(
                target.a(
                   self.content,
                   href=self[u"href"],
                )
             )
          )
          return node.convert(converter)

   What we might want to do is have all links (i.e. all
   ll.xist.ns.html.a elements) generated with an attribute
   target="_top". For this we derive a new namespace from
   ll.xist.ns.html and overwrite the a element:

 from ll.xist.ns import html

 class __ns__(html):
    class a(html.a):
       def convert(self, converter):
          node = html.a(self.content, self.attrs, target=u"_top")
          return node.convert(converter)

   Now we can pass this namespace as the conversion target and all
   links will have a target="_top".

  Validation and content models
  -----------------------------

   When generating HTML you might want to make sure that your
   generated code doesn't contain any illegal tag nesting (i.e.
   something bad like <p><p>Foo</p></p>). The module ll.xist.ns.html
   does this automatically:

 >>> from ll.xist.ns import html
 >>> node = html.p(html.p(u"foo"))
 >>> print node.asBytes()
 /home/walter/pythonroot/ll/xist/sims.py:238: WrongElementWarning: element <ll.xist.ns.html:p> may not contain element <ll.xist.ns.html:p>
   warnings.warn(WrongElementWarning(node, child, self.elements))
 <p><p>foo</p></p>

   For your own elements you can specify the content model too. This
   is done by setting the class attribute model inside the element
   class. model must be an object that provides a checkvalid method.
   This method will be called during parsing or publishing with the
   element as an argument. When a validation violation is detected,
   the Python warning framework should be used to issue a warning.

   The module ll.xist.sims contains several classes that provide
   simple validation methods: Empty can be used to ensure that the
   element doesn't have any content (like br and img in HTML). Any
   allows any content. NoElements will warn about elements from
   the same namespace (elements from other namespaces will be OK).
   NoElementsOrText will warn about elements from the same namespace
   and non-whitespace text content. Elements will only allow the
   elements specified in the constructor. ElementsOrText will only
   allow the elements specified in the constructor and text.

   None of these classes will check the number of child elements or
   their order.

   For more info see the sims module.
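
   A content model along these lines can be sketched in a few lines
   of standalone Python 3. Node, OnlyElements and WrongElementWarning
   in this example are hypothetical and much simpler than the classes
   in ll.xist.sims; only the idea of a checkvalid method that issues
   warnings is taken from the text above:

```python
import warnings

class WrongElementWarning(UserWarning):
    pass

class Node:
    model = None
    def __init__(self, *content):
        self.content = list(content)

class OnlyElements:
    """Hypothetical model that allows only the given child classes."""
    def __init__(self, *allowed):
        self.allowed = allowed
    def checkvalid(self, node):
        for child in node.content:
            # text (plain strings) is ignored; other node types are checked
            if isinstance(child, Node) and not isinstance(child, self.allowed):
                warnings.warn(WrongElementWarning(
                    "%s may not contain %s"
                    % (type(node).__name__, type(child).__name__)
                ))

class li(Node):
    pass

class p(Node):
    pass

class ul(Node):
    model = OnlyElements(li)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    good = ul(li("spam"))
    good.model.checkvalid(good)   # no warning
    bad = ul(p("eggs"))
    bad.model.checkvalid(bad)     # one warning
```

   Like the classes described above, this model checks neither the
   number of children nor their order.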

  Entities
  --------

   In the same way as defining new element types, you can define new
   entities. But to be able to use the new entities in an XML file
   you have to use a parser that supports reporting undefined
   entities to the application via skippedEntity (SGMLOPParser and
   ExpatParser in the module ll.xist.parsers do that). The following
   example is from the module ll.xist.ns.abbr:

 from ll.xist import xsc
 # imported under an alias, because the entity class is named html too
 from ll.xist.ns import html as html_

 class html(xsc.Entity):
    def convert(self, converter):
       return html_.abbr(
          u"HTML",
          title=u"Hypertext Markup Language",
          lang=u"en"
       )

   You can use this entity in your XML files like this:

 <cool adj="very">&html;</cool>

  Processing instructions
  -----------------------

   Defining processing instructions works just like elements and
   entities. Derive a new class from ll.xist.xsc.ProcInst and
   implement convert. The following example implements a processing
   instruction that returns an uppercase version of its content as a
   text node.

 class upper(xsc.ProcInst):
    def convert(self, converter):
       return xsc.Text(self.content.upper())

   It can be used in an XML file like this:

 <cool><?upper foo?></cool>

   There are namespaces containing processing instruction classes
   that don't provide a convert method. These processing instruction
   objects will then be published as XML processing instructions. One
   example is the namespace ll.xist.ns.php.

   Other namespaces (like ll.xist.ns.jsp) contain processing
   instruction classes, but they will be published in a different
   (not XML compatible) format. For example
   ll.xist.ns.jsp.expression("foo") will be published as <%= foo %>.

Publishing XML trees
====================

   After creating the XML tree and converting the tree into its final
   output form, you have to write the resulting tree to a file. This
   can be done with the publishing API. Three methods that use the
   publishing API are bytes, asBytes and write. bytes is a generator
   that will yield the complete 8-bit XML string in fragments.
   asBytes returns the complete 8-bit XML string.

   You can specify the encoding with the parameter encoding (with
   "utf-8" being the default). Unencodable characters will be escaped
   with character references when possible (i.e. inside text nodes,
   for comments or processing instructions you'll get an exception):

 >>> from ll.xist import xsc
 >>> from ll.xist.ns import html
 >>> s = u"A\xe4\u03a9\u8a9e"
 >>> node = html.div(s)
 >>> print node.asBytes(encoding="ascii")
 <div>A&#228;&#937;&#35486;</div>
 >>> print node.asBytes(encoding="iso-8859-1")
 <div>Aä&#937;&#35486;</div>
 >>> print xsc.Comment(s).asBytes(encoding="ascii")
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
   File "/home/walter/pythonroot/ll/xist/xsc.py", line 600, in asBytes
     publisher.publish(stream, self, base)
   File "/home/walter/pythonroot/ll/xist/publishers.py", line 205, in publish
     self.node.publish(self)
   File "/home/walter/pythonroot/ll/xist/xsc.py", line 1305, in publish
     publisher.write(self.content)
   File "/usr/local/lib/python2.3/codecs.py", line 178, in write
     data, consumed = self.encode(object, self.errors)
 UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-4: ordinal not in range(128)
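
   The same escaping can be reproduced with Python's standard codec
   machinery, which makes it easy to experiment with. This Python 3
   sketch uses the error handler "xmlcharrefreplace"; whether XIST
   uses this handler internally is not claimed here:

```python
s = "A\xe4\u03a9\u8a9e"

# Unencodable characters become numeric character references.
ascii_bytes = s.encode("ascii", "xmlcharrefreplace")
latin1_bytes = s.encode("iso-8859-1", "xmlcharrefreplace")

# Without such a handler (as for comments) encoding fails.
try:
    s.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

   In iso-8859-1 the character U+00E4 is encodable, so only the other
   two characters are replaced by character references.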

   When you include an XML header or an HTML meta header, XIST will
   automatically insert the correct encoding when publishing:

 >>> from ll.xist import xsc
 >>> from ll.xist.ns import xml, meta
 >>> e = xsc.Frag(xml.XML10(), u"\n", meta.contenttype())
 >>> print e.conv().asBytes(encoding="iso-8859-15")
 <?xml version='1.0' encoding='iso-8859-15'?>
 <meta content="text/html; charset=iso-8859-15" http-equiv="Content-Type" />

   Another useful parameter is xhtml, which specifies whether you
   want pure HTML or XHTML as output:

   xhtml==0
           This will give you pure HTML, i.e. no final / for elements
           with an empty content model, so you'll get e.g. <br> in
           the output. Elements that don't have an empty content
           model, but are empty will be published with a start and
           end tag (i.e. <div></div>).

   xhtml==1
           This gives HTML compatible XHTML. Elements with an empty
           content model will be published like this: <br /> (This is
           the default).

   xhtml==2
           This gives full XML output. Every empty element will be
           published with an empty tag (without an additional space):
           <br/> or <div/>.
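
   The three modes can be summarized in a few lines of plain Python.
   This is a sketch, not XIST's publishing code: the function
   publish_empty and its parameters are hypothetical, and the result
   for an empty div with xhtml==1 is an assumption (the description
   above only covers elements with an empty content model for that
   mode):

```python
def publish_empty(name, empty_model, xhtml=1):
    """Serialize an element that has no content (hypothetical helper).

    empty_model tells whether the element type has an empty content
    model (like br) or merely happens to be empty (like an empty div).
    """
    if xhtml == 2:
        # Full XML: every empty element becomes an empty tag, no space.
        return "<%s/>" % name
    if not empty_model:
        # Start and end tag. Documented for xhtml==0; assumed for xhtml==1.
        return "<%s></%s>" % (name, name)
    if xhtml == 1:
        # HTML compatible XHTML: space before the slash.
        return "<%s />" % name
    # Pure HTML: no final slash.
    return "<%s>" % name
```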

   Writing a node to a file can be done with the method write:

 >>> from ll.xist.ns import html
 >>> node = html.div(u"", html.br(), u"")
 >>> node.write(open("foo.html", "wb"), encoding="ascii")

   All these methods use the method publish internally. publish gets
   passed an instance of ll.xist.publishers.Publisher.

Searching trees
===============

   There are two methods available for iterating through an XML tree
   and finding nodes in the tree: The walk method and XFind
   expressions.

  The walk method
  ---------------

   The method walk is a generator. You pass a callable object to walk
   which is used for determining which part of the tree should be
   searched and which nodes should be returned.

   ll.xist.xsc provides several useful predefined classes for
   specifying what should be returned from walk: FindType will search
   only the first level of the tree and will return any node that is
   an instance of one of the classes passed to the constructor. So if
   you have an instance of ll.xist.ns.html.ul named node you could
   search for all ll.xist.ns.html.li elements inside with the
   following code:

 for cursor in node.content.walk(xsc.FindType(html.li)):
    print unicode(cursor.node)

   FindTypeAll can be used when you want to search the complete tree.
   The following example extracts all the links on the Python home
   page:

 from ll.xist import xsc, parsers
 from ll.xist.ns import html

 node = parsers.parseURL("http://www.python.org/", tidy=True)

 for cursor in node.walk(xsc.FindTypeAll(html.a)):
    print cursor.node[u"href"]

   This gives the output:

 http://www.python.org/
 http://www.python.org/search/
 http://www.python.org/download/
 http://www.python.org/doc/
 http://www.python.org/Help.html
 http://www.python.org/dev/
 ...

   The following example will find all external links on the Python
   home page:

 from ll.xist import xsc, parsers
 from ll.xist.ns import html

 node = parsers.parseURL("http://www.python.org/", tidy=True)

 def isextlink(cursor):
    if isinstance(cursor.node, html.a) and not unicode(cursor.node[u"href"]).startswith(u"http://www.python.org"):
       return (True, xsc.entercontent)
    return (xsc.entercontent,)

 for cursor in node.walk(isextlink):
    print cursor.node[u"href"]

   This gives the output:

 http://www.jython.org/
 http://sourceforge.net/tracker/?atid=105470&group%5fid=5470
 http://sourceforge.net/tracker/?atid=305470&group%5fid=5470
 http://sourceforge.net/cvs/?group%5fid=5470
 http://www.python-in-business.org/
 http://www.europython.org/
 mailto:webmaster@python.org
 ...

   The callable (isextlink in the example) will be called for each
   node visited. The cursor argument has an attribute node that is
   the node in question. For the other attributes see the Cursor
   class.

   The callable must return a sequence with the following entries:

   ll.xist.xsc.entercontent
           enter the content of this element and continue searching;

   ll.xist.xsc.enterattrs
           enter the attributes of this element and continue
           searching;

   boolean value
           If true, the node will be part of the result.

   The sequence will be "executed" in the order you specify. To
   change the top down traversal from our example to a bottom up
   traversal we could change isextlink to the following (note the
   swapped tuple entries):

 def isextlink(cursor):
    if isinstance(cursor.node, html.a) and not unicode(cursor.node[u"href"]).startswith(u"http://www.python.org"):
       return (xsc.entercontent, True)
    return (xsc.entercontent,)

   Note that the cursor yielded from walk will be reused by
   subsequent next calls, so you should not modify the cursor and you
   can't rely on attributes of the cursor after reentry to walk.
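
   To make the protocol concrete, here is a standalone sketch of a
   walk-like generator. Node, Cursor and entercontent below are
   simplified stand-ins defined only for this example (attributes and
   enterattrs are left out); they are not the XIST classes:

```python
entercontent = object()  # sentinel: "descend into the children now"

class Node(object):
    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)

class Cursor(object):
    def __init__(self):
        self.node = None

def walk(node, decide, cursor=None):
    if cursor is None:
        cursor = Cursor()
    cursor.node = node
    # Ask the callable once per node, then execute its answer in order.
    for option in decide(cursor):
        if option is entercontent:
            for child in node.children:
                for result in walk(child, decide, cursor):
                    yield result
        elif option:
            cursor.node = node  # the children may have moved the cursor
            yield cursor        # the same cursor object is reused

def topdown(cursor):
    return (True, entercontent)

def bottomup(cursor):
    return (entercontent, True)

tree = Node("a", Node("b", Node("c")), Node("d"))
names_topdown = [cursor.node.name for cursor in walk(tree, topdown)]
names_bottomup = [cursor.node.name for cursor in walk(tree, bottomup)]
```

   In this sketch the top down order is a, b, c, d and the bottom up
   order is c, b, d, a. Collecting the cursors themselves in a list
   would give the same object several times, which is why the cursor
   should be read immediately inside the loop.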

  XFind expressions
  -----------------

   A second method exists for iterating through a tree: XFind
   expressions. An XFind expression looks somewhat like an XPath
   expression, but is implemented as a pure Python expression
   (overloading the division operators).

   Our example from above that searched for lis inside uls can be
   rewritten as follows:

 for child in node/html.li:
    print unicode(child)

   An XFind expression returns an iterator for certain parts of the
   XML tree. In an XFind expression a/b, a must be either a node or
   an iterator producing nodes (note that an XFind expression itself
   is such an iterator, so a itself might be an XFind expression). b
   must be an XFind operator.

   Every subclass of ll.xist.xsc.Node is an XFind operator. If b is
   such a subclass, a/b will produce all child nodes of the nodes
   from a that are instances of b. If b is an attribute class, you
   will get attribute nodes instead of child nodes. Other XFind
   operators can be found in the module ll.xist.xfind. The all
   operator will produce every node in the tree (except for
   attributes):

 from ll.xist import xfind
 from ll.xist.ns import html

 node = html.div(
    html.div(
       html.div(id=3),
       html.div(id=4),
       id=2,
    ),
    html.div(
       html.div(id=6),
       html.div(id=7),
       id=5,
    ),
    id=1
 )

 for child in node/xfind.all:
    print child["id"]

   The output of this is:

 1
 2
 3
 4
 5
 6
 7

   The following example demonstrates how to find all links on the
   Python homepage via an XFind expression:

 from ll.xist import xfind, parsers
 from ll.xist.ns import html

 node = parsers.parseURL("http://www.python.org/", tidy=True)
 for link in node/xfind.all/html.a:
    print link["href"]

   An all operator in the middle of an XFind expression can be
   abbreviated. The XFind expression from the last example
   (node/xfind.all/html.a) can be rewritten like this: node//html.a.
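
   How such expressions can be built from overloaded division
   operators is sketched below. All classes here (Node, Finder, ul,
   li, a) are hypothetical stand-ins written for this example; the
   real implementation lives in ll.xist.xsc and ll.xist.xfind and may
   be structured differently:

```python
class Node(object):
    def __init__(self, *children):
        self.children = list(children)

    def __truediv__(self, cls):   # node / cls
        return Finder([self]) / cls
    __div__ = __truediv__         # Python 2 name of the same hook

    def __floordiv__(self, cls):  # node // cls
        return Finder([self]) // cls

class ul(Node): pass
class li(Node): pass
class a(Node): pass

def descendants(node):
    """Yield the node itself, then everything below it (depth first)."""
    yield node
    for child in node.children:
        for item in descendants(child):
            yield item

class Finder(object):
    """An iterable of nodes that can be extended with / and //."""
    def __init__(self, nodes):
        self.nodes = nodes

    def __iter__(self):
        return iter(self.nodes)

    def __truediv__(self, cls):
        # Children of the produced nodes that are instances of cls.
        return Finder(c for n in self.nodes for c in n.children
                      if isinstance(c, cls))
    __div__ = __truediv__

    def __floordiv__(self, cls):
        # a//b is shorthand for a/all/b.
        return Finder(d for n in self.nodes for d in descendants(n)) / cls

tree = ul(li(a(), a()), li(a()))
direct_items = list(tree / li)
direct_links = list(tree / a)
all_links = list(tree // a)
```

   Here tree/li finds the two list items, tree/a finds nothing because
   the links are not direct children, and tree//a finds all three
   links.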

   Another XFind operator is contains. It acts as a filter, i.e. the
   nodes produced by a/xfind.contains(b) are a subset of the nodes
   produced by a, those that contain child nodes of type b. Searching
   for all links on the Python home page that contain images can be
   done like this:

 >>> from ll.xist import xfind, parsers
 >>> from ll.xist.ns import html
 >>> node = parsers.parseURL("http://www.python.org/", tidy=True)
 >>> for link in node//html.a/xfind.contains(html.img):
 ...     print link["href"]
 ...
 http://www.python.org/
 http://www.python.org/psf/donations.html
 http://www.opensource.org/

   Note that using the all operator twice in an XFind expression
   currently won't give you the expected result, as nodes might be
   produced twice.

   Calling __getitem__ on an XFind operator gives you an item
   operator. Such an item operator only returns a specific item (or
   slice) of those nodes returned by the base iterator. An example:

 >>> from ll.xist.ns import html
 >>> e = html.table(html.tr(html.td(j) for j in xrange(i, i+3)) for i in xrange(1, 10, 3))
 >>> print e.pretty().asBytes()
 <table>
    <tr>
       <td>1</td>
       <td>2</td>
       <td>3</td>
    </tr>
    <tr>
       <td>4</td>
       <td>5</td>
       <td>6</td>
    </tr>
    <tr>
       <td>7</td>
       <td>8</td>
       <td>9</td>
    </tr>
 </table>
 >>> # Every cell
 >>> for td in e/html.tr/html.td:
 ...     print td
 ...
 1
 2
 3
 4
 5
 6
 7
 8
 9
 >>> # Every first cell in each row
 >>> for td in e/html.tr/html.td[0]:
 ...     print td
 1
 4
 7
 >>> # Every cell in the first row
 >>> for td in e/html.tr[0]/html.td:
 ...     print td
 1
 2
 3
 >>> # The first of all cells
 >>> for td in e/(html.tr/html.td)[0]:
 ...     print td
 1
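
   The difference between the three indexed expressions is where the
   index is applied. The same distinction can be shown with nested
   Python lists; this is only an analogy using a stand-in for the
   table above, not XFind itself:

```python
from itertools import chain, islice

rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # stand-in for the table

# Like e/html.tr/html.td[0]: the index is applied within each row.
first_cell_of_each_row = [row[0] for row in rows]

# Like e/html.tr[0]/html.td: the index selects a row, then all its cells.
cells_of_first_row = list(rows[0])

# Like e/(html.tr/html.td)[0]: the index is applied to the combined
# stream of all cells.
first_of_all_cells = list(islice(chain.from_iterable(rows), 0, 1))
```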

Manipulating trees
==================

   XIST provides many methods for manipulating an XML tree.

   The method withsep can be used to put a separator node between the
   child nodes of an Element or a Frag:

 >>> from ll.xist import xsc
 >>> from ll.xist.ns import html
 >>> node = html.div(*xrange(10))
 >>> print node.withsep(", ").asBytes()
 <div>0, 1, 2, 3, 4, 5, 6, 7, 8, 9</div>
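
   The effect of withsep corresponds to interleaving a separator into
   a sequence. A minimal sketch on plain Python lists (the function
   below is a stand-in for illustration, not the XIST method):

```python
def withsep(items, separator):
    """Return a new list with separator placed between the items."""
    result = []
    for index, item in enumerate(items):
        if index:  # no separator before the first item
            result.append(separator)
        result.append(item)
    return result

joined = "".join(str(item) for item in withsep(range(10), ", "))
```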

   The method shuffled returns a shuffled version of the Element or
   Frag:

 >>> from ll.xist import xsc
 >>> from ll.xist.ns import html
 >>> node = html.div(*xrange(10))
 >>> print node.shuffled().withsep(", ").asBytes()
 <div>8, 1, 3, 6, 7, 5, 2, 9, 4, 0</div>

   There are methods named reversed and sorted that return a reversed
   or sorted version of an element or fragment:

 >>> from ll.xist import xsc
 >>> from ll.xist.ns import html
 >>> def key(n):
 ...    return unicode(n)
 >>> node = html.div(8,4,2,1,9,6,3,0,7,5)
 >>> print node.sorted(key=key).reversed().withsep(",").asBytes()
 <div>9,8,7,6,5,4,3,2,1,0</div>

   The method mapped recursively walks the tree and generates a new
   tree, where all the nodes are mapped through a function. An
   example: To replace Python with Parrot in every text node on the
   Python page, do the following:

 from ll.xist import xsc, parsers

 def p2p(node, converter):
    if isinstance(node, xsc.Text):
       node = node.replace(u"Python", u"Parrot")
       node = node.replace(u"python", u"parrot")
    return node

 node = parsers.parseURL("http://www.python.org/", tidy=True)
 node = node.mapped(p2p)
 node.write(open("parrot_index.html", "wb"))

   The function must either return a new node, in which case this new
   node will be used instead of the old one, or return the old node
   to tell mapped that it should recursively continue with the
   content of the node.
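
   The return value convention can be illustrated on a tree made of
   nested lists and strings. This is a simplified sketch: the mapped
   function below is a stand-in written for this example, and the
   converter argument of the real callback is omitted:

```python
def mapped(node, function):
    """Map function over a tree of nested lists (elements) and strings."""
    result = function(node)
    if result is not node:
        return result  # a new node: use it and do not look inside
    if isinstance(node, list):
        # The old node came back: continue recursively with its content.
        return [mapped(child, function) for child in node]
    return node

def p2p(node):
    if isinstance(node, str):
        return node.replace("Python", "Parrot")
    return node

tree = ["Python ", ["is not ", "Python 3000"], "!"]
new_tree = mapped(tree, p2p)
```

   The original tree is left unchanged; new_tree contains the replaced
   strings in the same nested structure.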

URLs
====

   For URL handling XIST uses the module ll.url. Refer to its
   documentation for the basic functionality (especially regarding
   the methods __div__ and relative).

   When XIST parses an XML resource it uses a so-called "base" URL.
   This base URL can be passed to all parsing functions. If it isn't
   specified it defaults to the URL of the resource being parsed.
   This base URL will be prepended to all URLs that are read during
   parsing:

 >>> from ll.xist import parsers
 >>> from ll.xist.ns import html
 >>> node = parsers.parseString('<img src="eggs.png"/>', base="root:spam/index.html")
 >>> print node.asBytes()
 <img src="root:spam/eggs.png" />

   For publishing, a base URL can be specified too. URLs will be
   published relative to this base URL with the exception of relative
   URLs in the tree. This means:

     * When you have a relative URL (e.g. #top) generated by a
       convert call, this URL will stay the same when publishing.
     * Base URLs for parsing should never be relative: Relative base
       URLs will be prepended to all relative URLs in the file, but
       this will not be reverted for publishing. In most cases the
       base URL should be a root URL when you parse local files.
     * When you parse remote web pages you can either omit the base
       argument, so that it defaults to the URL being parsed and
       links, images, etc. on the page still point back to their
       original location, or use the empty URL URL() as the base, so
       that you get all URLs in the page as they are.
     * When XIST is used as a compiler for static pages, you're going
       to read source XML files, do a conversion and write the result
       to a new target file. In this case you should probably use the
       URL of the target file for both parsing and publishing. Let's
       assume we have a URL #top in the source file. When we use the
       "real" file names for parsing and publishing like this:

 node = parsers.parseFile("spam.htmlxsc", base="root:spam.htmlxsc")
 node = node.conv()
 node.write(open("spam.html", "wb"), base="root:spam.html")

       the following will happen: The URL #top will be parsed as
       root:spam.htmlxsc#top. After conversion this will be written
       to spam.html relative to the URL root:spam.html, which results
       in spam.html#top, which works, but is not what you want.

       When you use root:spam.html both for parsing and publishing,
       #top will be written to the target file as expected.
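
   The round trip with identical bases for parsing and publishing can
   be imitated with the standard library. This sketch uses http URLs
   because the root: scheme is specific to ll.url, and publish_url is
   a hypothetical, deliberately simplified relativization that only
   handles URLs pointing into the base document or its directory:

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2

def parse_url(url, base):
    """Resolve a URL found in the source against the parsing base."""
    return urljoin(base, url)

def publish_url(url, base):
    """Make url relative to base again (simplified, see lead-in)."""
    document, _, fragment = url.partition("#")
    base_document = base.partition("#")[0]
    if document == base_document:
        # Same document: only the fragment remains.
        return "#" + fragment if fragment else ""
    base_dir = base_document.rsplit("/", 1)[0] + "/"
    if url.startswith(base_dir):
        return url[len(base_dir):]
    return url

base = "http://example.org/spam/index.html"
image = parse_url("eggs.png", base)
anchor = parse_url("#top", base)
published_anchor = publish_url(anchor, base)
published_image = publish_url(image, base)
```

   With the same base on both sides, #top and eggs.png come out
   exactly as they went in.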

Pretty printing XML
===================

   The method pretty can be used for pretty printing XML. It returns
   a new version of the node, with additional white space between the
   elements:

 from ll.xist.ns import html
 node = html.html(
    html.head(
       html.title(u"foo"),
    ),
    html.body(
       html.div(
          html.h1(u"The ", html.em(u"foo"), u" page!"),
          html.p(u"Welcome to the ", html.em(u"foo"), u" page."),
       ),
    ),
 )

 print node.pretty().asBytes()

   This will print:

 <html>
    <head>
       <title>foo</title>
    </head>
    <body>
       <div>
          <h1>The <em>foo</em> page!</h1>
          <p>Welcome to the <em>foo</em> page.</p>
       </div>
    </body>
 </html>

   Element content will only be modified if it doesn't contain Text
   nodes, so mixed content will not be touched.
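
   The rule about mixed content can be sketched as follows. The tree
   representation (a tuple of tag name and child list, with plain
   strings as text) and both functions are assumptions made for this
   example and are unrelated to how XIST implements pretty:

```python
def serialize(node):
    """Write a node on one line, without extra white space."""
    if isinstance(node, str):
        return node
    name, children = node
    content = "".join(serialize(child) for child in children)
    return "<%s>%s</%s>" % (name, content, name)

def pretty(node, indent="   ", level=0):
    pad = indent * level
    if isinstance(node, str):
        return pad + node
    name, children = node
    if not children or any(isinstance(child, str) for child in children):
        # Empty or mixed content: leave the content untouched.
        return pad + serialize(node)
    lines = [pad + "<%s>" % name]
    lines.extend(pretty(child, indent, level + 1) for child in children)
    lines.append(pad + "</%s>" % name)
    return "\n".join(lines)

node = ("div", [
    ("h1", ["The ", ("em", ["foo"]), " page!"]),
    ("p", ["Welcome"]),
])
text = pretty(node)
```

   The div only has element children and is spread over several
   lines, while the h1 with its mixed content stays on a single line.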

Automatic generation of image size attributes
=============================================

   The module ll.xist.ns.htmlspecials contains an element autoimg
   that extends ll.xist.ns.html.img. When converted to HTML via the
   convert method the size of the image will be determined and the
   height and width attributes will be set accordingly (if those
   attributes are not set already).

Embedding Python code
=====================

   It's possible to embed Python code into XIST XML files. For this
   XIST supports two new processing instructions: pyexec and pyeval
   (in the module ll.xist.ns.code). The content of pyexec will be
   executed when the processing instruction node is converted.

   The result of a call to convert for a pyeval processing
   instruction is whatever the Python code in the content returns.
   The processing instruction content is treated as the body of a
   function, so you can put multiple return statements there. The
   converter is available as the parameter converter inside the
   processing instruction. For example, consider the following XML
   file:

 <?pyexec
    # sum
    def gauss(top=100):
       sum = 0
       for i in xrange(top+1):
          sum += i
       return sum
 ?>
 <b><?pyeval return gauss()?></b>

   Parsing this file and calling convert results in the following:

 <b>5050</b>
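
   One way to get "the content is the body of a function" semantics
   is to wrap the content in a function definition before executing
   it. The following standalone sketch shows the idea; the helper
   names are hypothetical and this is not necessarily how
   ll.xist.ns.code does it:

```python
import textwrap

def pyexec(code, namespace):
    """Run code for its side effects, e.g. function definitions."""
    exec(textwrap.dedent(code), namespace)

def pyeval(code, namespace, converter=None):
    """Treat code as the body of a function taking converter."""
    lines = textwrap.dedent(code).splitlines()
    body = "\n".join("    " + line for line in lines)
    exec("def _pyeval_body(converter):\n" + body + "\n", namespace)
    return namespace["_pyeval_body"](converter)

namespace = {}
pyexec("""
    def gauss(top=100):
        total = 0
        for i in range(top + 1):
            total += i
        return total
""", namespace)
result = pyeval("return gauss()", namespace)
```

   Because the content becomes a function body, return statements are
   legal in it, and names defined by an earlier pyexec are visible
   through the shared namespace.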