| Trees | Index | Help |
|---|
| Module orchid :: Class OrchidExtractor |
|
| Method Summary | |
|---|---|
| __init__(self) | Creates a new link extractor. |
| extract(self) | Extracts all the links in the page according to the patterns specified in LINK_PATTERNS. |
| getLinks(self) | Returns a map from link type to a list of links of that type that appeared in the page. |
| getParsedContent(self) | Returns the BeautifulSoup data structure of the HTML of the site that was set using setSite. |
| getRawContent(self) | |
| setSite(self, stringUrl, content) | Sets the current site URL and content for the extractor. |
| Class Variable Summary | |
|---|---|
| list | LINK_PATTERNS = [('regular', <_sre.SRE_Pattern object at... |
| Method Details |
|---|
| __init__(self): Creates a new link extractor. Should be followed by a call to setSite. |
| extract(self): Extracts all the links in the page according to the patterns specified in LINK_PATTERNS. The links are stored in a map (link type -> URL list) called links, accessible as extractor.links, where extractor is an instance of HtmlLinkExtractor. |
| getLinks(self): Returns a map from link type to a list of links of that type that appeared in the page. |
| getParsedContent(self): Returns the BeautifulSoup data structure of the HTML of the site that was set using setSite. |
| setSite(self, stringUrl, content): Sets the current site URL and content for the extractor. |
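
The call order described in the method details (construct, setSite, extract, then getLinks) can be illustrated with a small stand-in. This is only a sketch in modern Python: SketchExtractor is a hypothetical class that mirrors the documented method names, not the orchid implementation. The regular expressions and the 'image' link type are assumptions (the source shows only a truncated 'regular' entry), and getParsedContent is omitted because it would require BeautifulSoup.

```python
import re


class SketchExtractor:
    """Hypothetical stand-in mirroring the documented interface; not orchid's code."""

    # Assumed shape: a list of (link type, compiled pattern) pairs.
    # Both patterns and the 'image' type are illustrative guesses.
    LINK_PATTERNS = [
        ("regular", re.compile(r'<a\s[^>]*href="([^"]+)"', re.IGNORECASE)),
        ("image", re.compile(r'<img\s[^>]*src="([^"]+)"', re.IGNORECASE)),
    ]

    def __init__(self):
        # Nothing to extract from until setSite is called.
        self.url = None
        self.content = None
        self.links = {}

    def setSite(self, stringUrl, content):
        # Store the page and discard links from any previous page.
        self.url = stringUrl
        self.content = content
        self.links = {}

    def extract(self):
        if self.content is None:
            raise ValueError("call setSite before extract")
        # Build the map: link type -> list of URLs captured by that pattern.
        self.links = {
            link_type: pattern.findall(self.content)
            for link_type, pattern in self.LINK_PATTERNS
        }

    def getLinks(self):
        return self.links

    def getRawContent(self):
        return self.content


extractor = SketchExtractor()
extractor.setSite(
    "http://example.com/",
    '<a href="/about">About</a> <img src="/logo.png"> <a class="x" href="/faq">FAQ</a>',
)
extractor.extract()
links = extractor.getLinks()
```

After extract runs, the same map is available both from getLinks() and from the links attribute, which matches the description of extract above. Regex-based matching is used here only to keep the sketch dependency-free; it is not a robust way to parse HTML.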
| Class Variable Details |
|---|
| Generated by Epydoc 2.1 on Mon Dec 12 14:30:34 2005 | http://epydoc.sf.net |
|---|---|