Metadata-Version: 1.0
Name: leaf
Version: 0.4.3
Summary: Simple Python library for HTML parsing
Home-page: https://github.com/penpen/Leaf
Author: Roman Koblov
Author-email: pingu.g@gmail.com
License: MIT
Description: Leaf
        ====
        What is this?
        -------------
        This is a simple wrapper around lxml that adds a few convenience features
        to make working with lxml easier. This library covers all my needs in
        HTML parsing.
        
        Dependencies
        ------------
        `lxml <http://lxml.de/>`_ obviously :3
        
        Features
        --------
         * Nice jQuery-like CSS selectors
         * Simple access to element attributes
         * An easy way to convert HTML to other formats (BBCode, Markdown, etc.)
         * A few nice functions for working with text
         * And, of course, all the original features of lxml are preserved
        
        Description
        -----------
        The main function of the module (in my opinion) is leaf.parse. It takes a string of
        HTML as an argument and returns a leaf.Parser object, which wraps an lxml object.
        With this object you can do anything you want, like this::
        
        	document = leaf.parse(sample)
        	links = document('div#menu a') # get links in div with id menu through css selectors
        
        Or you can do this::
        
        	link = document.get('div#menu a') # get first link or return None
        
        And you can get attributes from these results like this::
        
        	print(link.onclick)
        
        You can still use standard lxml methods such as object.xpath, and they return results
        wrapped in leaf.Parser.
        My favorite feature is converting HTML into BBCode (Markdown, etc.)::
        
        	# Let's define a simple formatter that passes text through
        	# and wraps links in [url][/url] (like BBCode)
        	def omgcode_formatter(element, children):
        	    # Replace the <br> tag with a line break
        	    if element.tag == 'br':
        	        return '\n'
        	    # Wrap links in [url][/url]
        	    if element.tag == 'a':
        	        return u"[url={link}]{text}[/url]".format(link=element.href, text=children)
        	    # Return only the children for other elements
        	    if children:
        	        return children
        
        This function is called recursively with the element and its children (a string holding
        the result of formatting the children).
        Now let's call this formatter on some leaf.Parser object::
        
        	document.parse(omgcode_formatter)
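        
        As a rough illustration of the recursion described above, here is a sketch that uses the
        standard library's xml.etree.ElementTree instead of leaf. The ``render`` helper and
        ``sketch_formatter`` are hypothetical names, and this is not necessarily how
        leaf.Parser.parse is implemented. It only shows children being formatted first, with the
        joined string then passed to the formatter::
        
        	import xml.etree.ElementTree as ET
        	
        	def render(element, formatter):
        	    # Hypothetical helper (not part of leaf): format the children first,
        	    # then hand the joined text to the formatter for this element.
        	    parts = [element.text or '']
        	    for child in element:
        	        parts.append(render(child, formatter) or '')
        	        parts.append(child.tail or '')
        	    return formatter(element, ''.join(parts))
        	
        	def sketch_formatter(element, children):
        	    # Same rules as omgcode_formatter, but ElementTree has no
        	    # attribute shortcut, so the link is read with element.get('href').
        	    if element.tag == 'br':
        	        return '\n'
        	    if element.tag == 'a':
        	        return u"[url={link}]{text}[/url]".format(link=element.get('href'), text=children)
        	    if children:
        	        return children
        	
        	root = ET.fromstring('<p>See <a href="/docs">the docs</a><br/>now</p>')
        	result = render(root, sketch_formatter)
        
        With the sample markup above, ``result`` is ``See [url=/docs]the docs[/url]``, followed by
        a line break and ``now``.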
        
        More detailed examples are available in the tests.
        
        Finally, this library has some nice functions for working with text:
        
        *to_unicode* -- Convert string to unicode string
        
        *strip_accents* -- Strip accents from a string
        
        *strip_symbols* -- Strip ugly unicode symbols from a string
        
        *strip_spaces* -- Strip excess spaces from a string
        
        *strip_linebreaks* -- Strip excess line breaks from a string
        
        Change log
        ==========
        
        0.4.4
        -----
         - fix inner_html method
         - added ``**kwargs`` to the parse function, added inner_html method to the Parser class
         - cssselect in deps
        
        0.4.2
        -----
         - Node attribute modification via node.href = '/blah'
         - Custom default value for get: document.get(selector, default=None)
         - Get element by index: document.get(selector, index)
        
        0.4.1
        -----
         - bool(node) returns True if element exists and False if element is None
         
        
        0.4
        ---
        First public version
        
Keywords: html,parsing,web scraping
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Topic :: Scientific/Engineering :: Information Analysis
