========================
 The Docutils Publisher
========================

:Author: David Goodger
:Contact: goodger@python.org
:Date: $Date: 2008-06-25 08:54:03 +0200 (Mit, 25 Jun 2008) $
:Revision: $Revision: 5576 $
:Copyright: This document has been placed in the public domain.

.. contents::


The ``docutils.core.Publisher`` class is the core of Docutils,
managing all the processing and relationships between components.  See
`PEP 258`_ for an overview of Docutils components.

The ``docutils.core.publish_*`` convenience functions are the normal
entry points for using Docutils as a library.

See `Inside A Docutils Command-Line Front-End Tool`_ for an overview
of a typical Docutils front-end tool, including how the Publisher
class is used.

.. _PEP 258: ../peps/pep-0258.html
.. _Inside A Docutils Command-Line Front-End Tool: ./cmdline-tool.html


Publisher Convenience Functions
===============================

Each of these functions set up a ``docutils.core.Publisher`` object,
then call its ``publish`` method.  ``docutils.core.Publisher.publish``
handles everything else.  There are several convenience functions in
the ``docutils.core`` module:

:_`publish_cmdline`: for command-line front-end tools, like
  ``rst2html.py``.  There are several examples in the ``tools/``
  directory.  A detailed analysis of one such tool is in `Inside A
  Docutils Command-Line Front-End Tool`_

:_`publish_file`: for programmatic use with file-like I/O.  In
  addition to writing the encoded output to a file, also returns the
  encoded output as a string.

:_`publish_string`: for programmatic use with string I/O.  Returns
  the encoded output as a string.

:_`publish_parts`: for programmatic use with string input; returns a
  dictionary of document parts.  Dictionary keys are the names of
  parts, and values are Unicode strings; encoding is up to the client.
  Useful when only portions of the processed document are desired.
  See `publish_parts Details`_ below.

  There are usage examples in the `docutils/examples.py`_ module.

:_`publish_doctree`: for programmatic use with string input; returns a
  Docutils document tree data structure (doctree).  The doctree can be
  modified, pickled & unpickled, etc., and then reprocessed with
  `publish_from_doctree`_.

:_`publish_from_doctree`: for programmatic use to render from an
  existing document tree data structure (doctree); returns the encoded
  output as a string.

:_`publish_programmatically`: for custom programmatic use.  This
  function implements common code and is used by ``publish_file``,
  ``publish_string``, and ``publish_parts``.  It returns a 2-tuple:
  the encoded string output and the Publisher object.

.. _Inside A Docutils Command-Line Front-End Tool: ./cmdline-tool.html
.. _docutils/examples.py: ../../docutils/examples.py


Configuration
-------------

To pass application-specific setting defaults to the Publisher
convenience functions, use the ``settings_overrides`` parameter.  Pass
a dictionary of setting names & values, like this::

    overrides = {'input_encoding': 'ascii',
                 'output_encoding': 'latin-1'}
    output = publish_string(..., settings_overrides=overrides)

Settings from command-line options override configuration file
settings, and they override application defaults.  For details, see
`Docutils Runtime Settings`_.  See `Docutils Configuration Files`_ for
details about individual settings.

.. _Docutils Runtime Settings: ./runtime-settings.html
.. _Docutils Configuration Files: ../user/tools.html


Encodings
---------

The default output encoding of Docutils is UTF-8.  If you have any
non-ASCII in your input text, you may have to do a bit more setup.
Docutils may introduce some non-ASCII text if you use
`auto-symbol footnotes`_ or the `"contents" directive`_.

.. _auto-symbol footnotes:
   ../ref/rst/restructuredtext.html#auto-symbol-footnotes
.. _"contents" directive:
   ../ref/rst/directives.html#table-of-contents


``publish_parts`` Details
=========================

The ``docutils.core.publish_parts`` convenience function returns a
dictionary of document parts.  Dictionary keys are the names of parts,
and values are Unicode strings.

Each Writer component may publish a different set of document parts,
described below.  Not all writers implement all parts.


Parts Provided By All Writers
-----------------------------

_`encoding`
    The output encoding setting.

_`version`
    The version of Docutils used.

_`whole`
    ``parts['whole']`` contains the entire formatted document.


.. _HTML writer:

Parts Provided By the HTML Writer
---------------------------------

_`body`
    ``parts['body']`` is equivalent to parts['fragment_'].  It is
    *not* equivalent to parts['html_body_'].

_`body_prefix`
    ``parts['body_prefix']`` contains::

        </head>
        <body>
        <div class="document" ...>

    and, if applicable::

        <div class="header">
        ...
        </div>

_`body_pre_docinfo`
    ``parts['body_pre_docinfo]`` contains (as applicable)::

        <h1 class="title">...</h1>
        <h2 class="subtitle" id="...">...</h2>

_`body_suffix`
    ``parts['body_suffix']`` contains::

        </div>

    (the end-tag for ``<div class="document">``), the footer division
    if applicable::

        <div class="footer">
        ...
        </div>

    and::

        </body>
        </html>

_`docinfo`
    ``parts['docinfo']`` contains the document bibliographic data, the
    docinfo field list rendered as a table.

_`footer`
    ``parts['footer']`` contains the document footer content, meant to
    appear at the bottom of a web page, or repeated at the bottom of
    every printed page.

_`fragment`
    ``parts['fragment']`` contains the document body (*not* the HTML
    ``<body>``).  In other words, it contains the entire document,
    less the document title, subtitle, docinfo, header, and footer.

_`head`
    ``parts['head']`` contains ``<meta ... />`` tags and the document
    ``<title>...</title>``.

_`head_prefix`
    ``parts['head_prefix']`` contains the XML declaration, the DOCTYPE
    declaration, the ``<html ...>`` start tag and the ``<head>`` start
    tag.

_`header`
    ``parts['header']`` contains the document header content, meant to
    appear at the top of a web page, or repeated at the top of every
    printed page.

_`html_body`
    ``parts['html_body']`` contains the HTML ``<body>`` content, less
    the ``<body>`` and ``</body>`` tags themselves.

_`html_head`
    ``parts['html_head']`` contains the HTML ``<head>`` content, less
    the stylesheet link and the ``<head>`` and ``</head>`` tags
    themselves.  Since ``publish_parts`` returns Unicode strings and
    does not know about the output encoding, the "Content-Type" meta
    tag's "charset" value is left unresolved, as "%s"::

        <meta http-equiv="Content-Type" content="text/html; charset=%s" />

    The interpolation should be done by client code.

_`html_prolog`
    ``parts['html_prolog]`` contains the XML declaration and the
    doctype declaration.  The XML declaration's "encoding" attribute's
    value is left unresolved, as "%s"::

        <?xml version="1.0" encoding="%s" ?>

    The interpolation should be done by client code.

_`html_subtitle`
    ``parts['html_subtitle']`` contains the document subtitle,
    including the enclosing ``<h2 class="subtitle">`` & ``</h2>``
    tags.

_`html_title`
    ``parts['html_title']`` contains the document title, including the
    enclosing ``<h1 class="title">`` & ``</h1>`` tags.

_`meta`
    ``parts['meta']`` contains all ``<meta ... />`` tags.

_`stylesheet`
    ``parts['stylesheet']`` contains the embedded stylesheet or
    stylesheet link.

_`subtitle`
    ``parts['subtitle']`` contains the document subtitle text and any
    inline markup.  It does not include the enclosing ``<h2>`` &
    ``</h2>`` tags.

_`title`
    ``parts['title']`` contains the document title text and any inline
    markup.  It does not include the enclosing ``<h1>`` & ``</h1>``
    tags.


Parts Provided by the PEP/HTML Writer
`````````````````````````````````````

The PEP/HTML writer provides the same parts as the `HTML writer`_,
plus the following:

_`pepnum`
    ``parts['pepnum']`` contains


Parts Provided by the S5/HTML Writer
````````````````````````````````````

The S5/HTML writer provides the same parts as the `HTML writer`_.

Parts Provided by the LaTeX2e Writer
````````````````````````````````````

__`head_prefix`
    ``parts['head_prefix']`` contains the LaTeX document class, package 
    and stylesheet inclusion.

__`head`
    ``parts['head']`` currently is empty. 

__`body_prefix`
    ``parts['body_prefix']`` contains the LaTeX ``\begin{document}`` and
    title page. 

__`body`
    ``parts['body']`` contains the LaTeX document content, less
    the ``\begin{document}`` and ``\end{document}`` tags themselves.

__`body_suffix`
    ``parts['body_suffix']`` contains the LaTeX ``\end{document}``.
    

