A simple way to transform an HTML file or URL into structured data.
For example:
html = """<!DOCTYPE html><html lang="en"><head></head>
    <body>
        <h1><b>Title</b></h1>
        <div class="description">This is not a valid HTML
    </body>
</html>"""

config = {
    'map': [
        ['body_title', u'//h1/b/text()'],
        ['description', u'//div[@class="description"]/text()'],
    ]
}

>handler = html2data()
>received_obj = handler.load(html=html, config=config)
>print(received_obj)
{
    'body_title': 'Title',
    'description': 'This is not a valid HTML'
}
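
To illustrate what a 'map' entry does, here is a minimal sketch that applies the same key/XPath pairs using lxml directly. It is not html2data's own implementation: the names `extract`, `HTML` and `CONFIG` are hypothetical, and joining the matched text nodes and stripping whitespace is an assumption about how values are cleaned up.

```python
from lxml import html as lxml_html

# Same sample document as above; the <div> is intentionally left unclosed.
HTML = """<!DOCTYPE html><html lang="en"><head></head>
    <body>
        <h1><b>Title</b></h1>
        <div class="description">This is not a valid HTML
    </body>
</html>"""

# Same shape as the html2data config: a list of [key, xpath] pairs.
CONFIG = {
    'map': [
        ['body_title', '//h1/b/text()'],
        ['description', '//div[@class="description"]/text()'],
    ]
}


def extract(html_text, config):
    """Hypothetical helper: evaluate each XPath and collect the text found."""
    # lxml's HTML parser is lenient, so the unclosed <div> is repaired.
    tree = lxml_html.fromstring(html_text)
    result = {}
    for key, xpath in config['map']:
        # text() expressions return a list of strings; join them and
        # strip surrounding whitespace (an assumption, see above).
        nodes = tree.xpath(xpath)
        result[key] = ''.join(nodes).strip()
    return result


result = extract(HTML, CONFIG)
print(result)
```

For the sample document, `result` should hold the same two values as the html2data output shown above, because lxml closes the unclosed `<div>` while parsing.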


To use it you will need:
    - lxml 2.0+
    - httplib2