===============
Unicode Support
===============

Since Pygments 0.6, the lexers use unicode strings internally. As a consequence,
you may occasionally run into a `UnicodeDecodeError` if you pass in strings that
use the wrong encoding.
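
As a minimal illustration that uses only the Python 3 standard library (no
Pygments calls), the snippet below shows the two ways a wrong encoding can
surface. Decoding with ``latin1`` never raises, because every byte maps to some
character, so a mismatch against a ``latin1`` setting tends to show up as garbled
text rather than as an exception. The sample string is arbitrary.

.. sourcecode:: python

    # Python 3, standard library only: what "the wrong encoding" looks like.
    text = "café"
    utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9'
    latin1_bytes = text.encode("latin-1")  # b'caf\xe9'

    # Decoding UTF-8 bytes as Latin-1 never fails, but silently garbles the text.
    garbled = utf8_bytes.decode("latin-1")
    print(garbled)  # cafÃ©

    # Decoding Latin-1 bytes as UTF-8 fails loudly instead.
    error = None
    try:
        latin1_bytes.decode("utf-8")
    except UnicodeDecodeError as exc:
        error = exc
    print(type(error).__name__)  # UnicodeDecodeError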

By default, all lexers have their `encoding` option set to `latin1`. If you pass
a lexer a plain string object (not a unicode object), it tries to decode the data
using this encoding. You can override it with the `encoding` lexer option. If you
have the `chardet`_ library installed and set the encoding to ``chardet``, the
lexer will analyse the text and pick the most likely encoding automatically:

.. sourcecode:: python

    from pygments.lexers import PythonLexer
    lexer = PythonLexer(encoding='chardet')
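
The decoding step described above can be sketched as a small function. This is a
hypothetical helper (``decode_source`` is not part of Pygments) written in
Python 3, assuming the lexer behaves exactly as the text describes: unicode input
is used as-is, byte strings are decoded with the ``encoding`` option, and the
value ``chardet`` delegates the choice to the third-party package's
``chardet.detect()``.

.. sourcecode:: python

    def decode_source(data, encoding="latin1"):
        """Hypothetical sketch of the lexer's input decoding, not Pygments' code."""
        if isinstance(data, str):
            # Already unicode: there is nothing to decode, so nothing can go wrong.
            return data
        if encoding == "chardet":
            # Optional dependency, imported only when this branch is used.
            import chardet
            # detect() may report None when it cannot decide; fall back to latin1.
            guessed = chardet.detect(data)["encoding"] or "latin1"
            return data.decode(guessed)
        # Plain byte string: decode with the configured encoding.
        return data.decode(encoding)

    print(decode_source(b"caf\xe9"))                       # café (latin1 default)
    print(decode_source("café".encode("utf-8"), "utf-8"))  # café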

The most reliable approach is to pass unicode objects to Pygments. In that case
no decoding step is needed, so a wrong encoding cannot produce unexpected output.
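
One way to follow this advice is to decode your input yourself, once, with an
encoding you know to be correct, for example by opening source files in text mode
with an explicit ``encoding``. The helper name ``read_source`` and the UTF-8
default are assumptions for illustration; the returned ``str`` is what you would
then hand to a lexer.

.. sourcecode:: python

    import os
    import tempfile

    def read_source(path, encoding="utf-8"):
        """Hypothetical helper: decode a file once, at the boundary."""
        with open(path, encoding=encoding) as f:
            return f.read()

    # Usage: write some UTF-8 bytes to a temporary file, then read them back.
    with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as tmp:
        tmp.write("name = 'café'\n".encode("utf-8"))
    code = read_source(tmp.name)
    os.remove(tmp.name)
    print(type(code).__name__)  # str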

If you don't set an encoding, the formatters write unicode objects to the output
stream. To have them write encoded byte strings instead, pass the formatter an
`encoding` option:

.. sourcecode:: python

    from pygments.formatters import HtmlFormatter
    f = HtmlFormatter(encoding='utf-8')
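
In practice this decides which kind of stream you write to. The sketch below uses
a hypothetical ``write_output`` helper (not a Pygments function) to mimic the
behaviour described above: without an encoding the result is unicode and belongs
in a text stream; with an encoding it is bytes and belongs in a binary stream,
such as a file opened in ``'wb'`` mode.

.. sourcecode:: python

    import io

    def write_output(stream, text, encoding=None):
        """Hypothetical sketch: write unicode as-is, or bytes if encoding is set."""
        if encoding is None:
            stream.write(text)                   # text stream receives str
        else:
            stream.write(text.encode(encoding))  # binary stream receives bytes

    text_out = io.StringIO()
    write_output(text_out, "<span>café</span>")

    bytes_out = io.BytesIO()
    write_output(bytes_out, "<span>café</span>", encoding="utf-8")

    print(text_out.getvalue())   # <span>café</span>
    print(bytes_out.getvalue())  # b'<span>caf\xc3\xa9</span>'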

.. _chardet: http://chardet.feedparser.org/
