Class HTMLParser
     object --+
              |
    _BaseParser --+
                  |
        _FeedParser --+
                      |
                     HTMLParser
The HTML parser. This parser allows reading HTML into a normal XML
tree. By default, it can read broken (non well-formed) HTML, depending
on the capabilities of libxml2. Use the 'recover' option to switch this
off.
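A minimal sketch of the default recovering behaviour described above. It assumes lxml is installed and that the class is reached as `lxml.etree.HTMLParser`; the sample markup is invented for illustration.

```python
from lxml import etree

# Hypothetical input: the <p> elements are never closed and </html> is missing.
broken = "<html><body><p>first<p>second</body>"

# recover defaults to True, so the parser repairs the markup where libxml2 can.
parser = etree.HTMLParser()
root = etree.fromstring(broken, parser)

# Collect the text of every <p> element found in the repaired tree.
paragraphs = [p.text for p in root.iter("p")]
print(root.tag, paragraphs)
```

How much repair actually happens depends on the installed libxml2, so treat the resulting tree shape as indicative rather than guaranteed.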
Available boolean keyword arguments:
* recover - try hard to parse through broken HTML (default: True)
* no_network - prevent network access for related files (default: True)
* remove_blank_text - discard empty text nodes
* remove_comments - discard comments
* remove_pis - discard processing instructions
* compact - save memory for short text content (default: True)
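The boolean options above are passed as keyword arguments to the constructor. The sketch below illustrates `remove_comments` only, assuming lxml is installed; the markup is made up, and the other flags are passed the same way.

```python
from lxml import etree

# Hypothetical document containing one comment inside <body>.
html = "<html><body><!-- internal note --><p>text</p></body></html>"

# Default parser: comments stay in the tree.
default_root = etree.fromstring(html, etree.HTMLParser())
kept = len(list(default_root.iter(etree.Comment)))

# remove_comments=True: comments are dropped while parsing.
clean_root = etree.fromstring(html, etree.HTMLParser(remove_comments=True))
remaining = len(list(clean_root.iter(etree.Comment)))

print(kept, remaining)
```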
Other keyword arguments:
* encoding - override the document encoding
* target - a parser target object that will receive the parse events
* schema - an XMLSchema to validate against
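As an illustration of `encoding` and `target`, the sketch below first forces UTF-8 decoding of a byte string, then collects start-tag names through a parser target. `TagCollector` is a hypothetical helper written for this example, not part of lxml; the method names `start`, `end`, `data`, and `close` follow lxml's parser target interface.

```python
from lxml import etree

# encoding: decode the bytes as UTF-8 regardless of what the document declares.
raw = "<html><body><p>café</p></body></html>".encode("utf-8")
root = etree.fromstring(raw, etree.HTMLParser(encoding="utf-8"))
text = root.findtext(".//p")


class TagCollector:
    """Hypothetical parser target that records the name of every start tag."""

    def __init__(self):
        self.tags = []

    def start(self, tag, attrib):
        self.tags.append(tag)

    def end(self, tag):
        pass

    def data(self, data):
        pass

    def close(self):
        # The value returned here becomes the result of parser.close().
        tags, self.tags = self.tags, []
        return tags


# target: the parser sends events to the target instead of building a tree.
parser = etree.HTMLParser(target=TagCollector())
parser.feed("<html><body><p>Hi</p></body></html>")
tags = parser.close()

print(text, tags)
```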
Note that you should avoid sharing parsers between threads for
performance reasons.
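One way to follow this advice is to give each thread its own parser instance. The sketch below uses `threading.local` for that; `get_parser` and `parse_title` are hypothetical helper names, and this is only one possible arrangement.

```python
import threading

from lxml import etree

_local = threading.local()


def get_parser():
    """Return a parser owned by the calling thread, creating it on first use."""
    parser = getattr(_local, "parser", None)
    if parser is None:
        parser = _local.parser = etree.HTMLParser()
    return parser


def parse_title(html):
    root = etree.fromstring(html, get_parser())
    return root.findtext(".//title")


results = {}


def worker(i):
    html = f"<html><head><title>doc {i}</title></head><body></body></html>"
    results[i] = parse_title(html)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```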
__init__(...)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
__new__(...)
a new object with type S, a subtype of T
Inherited from _FeedParser:
close,
feed
Inherited from _BaseParser:
copy,
makeelement,
setElementClassLookup,
set_element_class_lookup
Inherited from object:
__delattr__,
__getattribute__,
__hash__,
__reduce__,
__reduce_ex__,
__repr__,
__setattr__,
__str__
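The inherited `feed` and `close` methods allow a document to be parsed incrementally. A minimal sketch, assuming lxml is installed; the chunk boundaries are chosen arbitrarily for illustration.

```python
from lxml import etree

parser = etree.HTMLParser()

# Hypothetical chunks, e.g. as they might arrive from a network stream.
chunks = ["<html><body><h1>Ti", "tle</h1><p>bo", "dy text</p></body></html>"]
for chunk in chunks:
    parser.feed(chunk)

# close() finishes parsing and returns the root element of the tree.
root = parser.close()
heading = root.findtext(".//h1")
print(root.tag, heading)
```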
__init__(...)
(Constructor)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
- Overrides:
object.__init__
__new__(...)
- Returns: a new object with type S, a subtype of T
- Overrides:
object.__new__