object --+
|
Cleaner
Instances clean the document of each of the possible offending
elements. The cleaning is controlled by attributes; you can
override attributes in a subclass, or set them in the constructor.
``scripts``:
Removes any ``<script>`` tags.
``javascript``:
Removes any JavaScript, like an ``onclick`` attribute.
``comments``:
Removes any comments.
``style``:
Removes any style tags or attributes.
``links``:
Removes any ``<link>`` tags.
``meta``:
Removes any ``<meta>`` tags.
``page_structure``:
Removes structural parts of a page: ``<head>``, ``<html>``, ``<title>``.
``processing_instructions``:
Removes any processing instructions.
``embedded``:
Removes any embedded objects (flash, iframes).
``frames``:
Removes any frame-related tags.
``forms``:
Removes any form tags.
``annoying_tags``:
Removes tags that aren't *wrong*, but are annoying: ``<blink>`` and ``<marquee>``.
``remove_tags``:
A list of tags to remove.
``allow_tags``:
A list of tags to include (default include all).
``remove_unknown_tags``:
Remove any tags that aren't standard parts of HTML.
``safe_attrs_only``:
If true, only include 'safe' attributes (specifically the list
from `feedparser
<http://feedparser.org/docs/html-sanitization.html>`_).
``add_nofollow``:
If true, then any ``<a>`` tags will have ``rel="nofollow"`` added to them.
``host_whitelist``:
A list or set of hosts that you can use for embedded content
(for content like ``<object>``, ``<link rel="stylesheet">``, etc).
You can also implement/override the method
``allow_embedded_url(el, url)`` or ``allow_element(el)`` to
implement more complex rules for what can be embedded.
Anything that passes this test will be shown, regardless of
the value of (for instance) ``embedded``.
Note that this parameter might not work as intended if you do not
make the links absolute before doing the cleaning.
``whitelist_tags``:
A set of tags that can be included with ``host_whitelist``.
The default is ``iframe`` and ``embed``; you may wish to
include other tags like ``script``, or you may want to
implement ``allow_embedded_url`` for more control. Set to None to
include all tags.
This modifies the document *in place*.
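
The ``host_whitelist`` and ``whitelist_tags`` rules described above can be sketched with the standard library alone. This is an illustration of the idea, not the library's own code: the module-level names ``HOST_WHITELIST`` and ``WHITELIST_TAGS`` and the example host are assumptions, and the function takes a tag name rather than an element. It also shows why relative links defeat the check: a relative URL carries no host to compare.

```python
from urllib.parse import urlsplit

# Hypothetical stand-ins for the ``host_whitelist`` and ``whitelist_tags``
# attributes; the host below is only an example.
HOST_WHITELIST = {"www.youtube.com"}
WHITELIST_TAGS = {"iframe", "embed"}


def allow_embedded_url(tag, url):
    """Return True if ``url`` on a ``tag`` element may be kept (sketch only)."""
    # Only tags named in WHITELIST_TAGS qualify; None would mean "all tags".
    if WHITELIST_TAGS is not None and tag not in WHITELIST_TAGS:
        return False
    parts = urlsplit(url)
    # A relative URL has no scheme or host, so it can never match the whitelist.
    if parts.scheme not in ("http", "https"):
        return False
    return parts.hostname in HOST_WHITELIST
```

Under these assumptions, ``allow_embedded_url("iframe", "/embed/abc")`` is rejected even if the page itself lives on a whitelisted host, which is why links should be made absolute before cleaning.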
scripts = True
javascript = True
comments = True
style = True
links = True
meta = True
page_structure = True
processing_instructions = True
embedded = True
frames = True
forms = True
annoying_tags = True
remove_tags = None
allow_tags = None
remove_unknown_tags = True
safe_attrs_only = True
add_nofollow = True
host_whitelist =
whitelist_tags =
_tag_link_attrs =
_decomment_re = re.compile(r'
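
The two ways of changing these defaults (overriding them in a subclass, or passing them to the constructor) can be sketched as follows. ``MiniCleaner`` and ``KeepComments`` are hypothetical stand-ins that show only the configuration pattern; they do no cleaning and are not the real class.

```python
class MiniCleaner:
    """Hypothetical stand-in showing only the configuration pattern."""

    # Class-level defaults, mirroring a few of the values listed above.
    scripts = True
    comments = True
    add_nofollow = True
    remove_tags = None

    def __init__(self, **kw):
        for name, value in kw.items():
            # Reject names that are not known attributes, so typos fail loudly.
            if not hasattr(self, name):
                raise TypeError(f"Unknown parameter: {name}={value!r}")
            setattr(self, name, value)


class KeepComments(MiniCleaner):
    # Overriding a default in a subclass affects every instance of it.
    comments = False


# Overriding a default in the constructor affects this instance only.
per_instance = MiniCleaner(add_nofollow=False)
```

Keeping the defaults as class attributes means a subclass can change policy for a whole application, while keyword arguments cover one-off cases without touching the class.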
Depending on the browser, stuff like ``e x p r e s s i o n(...)`` or ``expre/* stuff */ssion(...)`` can get interpreted. This checks for attempts to do stuff like this. Typically the response will be to kill the entire style; if you have just a bit of JavaScript in the style, another rule will catch that and remove only the JavaScript from the style; this catches the sneakier attempts.
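
One way to approach such a check is to normalise the style text before searching it. The following is a minimal sketch under that assumption, not the library's implementation: the helper name ``has_sneaky_javascript``, the two regular expressions, and the choice of ``javascript:`` and ``expression(`` as the searched markers are all illustrative.

```python
import re

# Matches CSS comments such as /* stuff */ (non-greedy, across newlines).
_CSS_COMMENT = re.compile(r"/\*.*?\*/", re.S)
_WHITESPACE = re.compile(r"\s+")


def has_sneaky_javascript(style):
    """Return True if ``style`` hides script-like content (hypothetical helper)."""
    style = _CSS_COMMENT.sub("", style)  # expre/* stuff */ssion -> expression
    style = _WHITESPACE.sub("", style)   # e x p r e s s i o n -> expression
    style = style.lower()
    return "javascript:" in style or "expression(" in style
```

A caller following the policy described above would drop the whole style when this returns true, rather than trying to repair it.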
| Generated by Epydoc 3.0beta1 on Fri Jan 11 16:02:42 2008 | http://epydoc.sourceforge.net |