Metadata-Version: 1.1
Name: linkGrabber
Version: 0.2.6
Summary: Scrape links from a single web page
Home-page: https://github.com/detroit-media-partnership/linkGrabber
Author: Eric Bower
Author-email: neurosnap@gmail.com
License: The MIT License (MIT)
====================

Copyright (c) 2014 Detroit Media Partnership

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

Description: ============
        Link Grabber
        ============
        
        Link Grabber provides a quick and easy way to grab links from
        a single web page. This Python package is a simple wrapper
        around `BeautifulSoup <http://www.crummy.com/software/BeautifulSoup/>`_
        that focuses on grabbing the HTML hyperlink tag, ``a``.
        
        Dependencies:
        
        *  BeautifulSoup
        *  Requests
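        
        To illustrate what grabbing the ``a`` tag means, here is a minimal
        sketch that collects the attributes and text of each link from an HTML
        string. It is not linkGrabber's implementation: it uses the standard
        library's ``html.parser`` in place of BeautifulSoup, and the names
        ``AnchorCollector`` and ``collect_links`` are hypothetical.
        
        .. code:: python
        
            from html.parser import HTMLParser
        
        
            class AnchorCollector(HTMLParser):
                """Hypothetical helper: gather attributes and text of <a> tags."""
        
                def __init__(self):
                    super().__init__()
                    self.links = []
                    self._current = None  # the <a> tag currently being read
        
                def handle_starttag(self, tag, attrs):
                    if tag == "a":
                        self._current = dict(attrs)
                        self._current["text"] = ""
        
                def handle_data(self, data):
                    # Only keep text that appears inside an open <a> tag
                    if self._current is not None:
                        self._current["text"] += data
        
                def handle_endtag(self, tag):
                    if tag == "a" and self._current is not None:
                        self.links.append(self._current)
                        self._current = None
        
        
            def collect_links(html):
                """Return one dictionary per <a> tag found in the HTML string."""
                parser = AnchorCollector()
                parser.feed(html)
                parser.close()
                return parser.links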
        
        How-To
        ======
        
        .. code:: bash
        
            $ python setup.py install
        
        OR
        
        .. code:: bash
        
            $ pip install linkGrabber
        
        Quickie
        =======
        
        .. code:: python
        
            import re
            import linkGrabber
        
            links = linkGrabber.Links("http://www.google.com")
            links.find()
            # limit the number of "a" tags to 5
            links.find(limit=5)
            # filter the "a" tag href attribute
            links.find(href=re.compile("plus.google.com"))
        
        Documentation
        =============
        
        find
        ----------
        
        Parameters: 
         *  filters (dict): Beautiful Soup's filters as a dictionary
         *  limit (int): Limits the number of links returned, in sequential order
         *  reverse (bool): Reverses the order of the list of <a> tags
         *  sort (function): A function that returns the key to sort on
            for each link in the list
        
        Find all links that have a style containing "11px":
        
        .. code:: python
        
            import re
            from linkGrabber import Links
        
            links = Links("http://www.google.com")
            links.find(style=re.compile("11px"), limit=5)
        
        Reverse the sort before limiting links:
        
        .. code:: python
        
            from linkGrabber import Links
        
            links = Links("http://www.google.com")
            links.find(limit=2, reverse=True)
        
        Sort by a link's attribute:
        
        .. code:: python
        
            from linkGrabber import Links
        
            links = Links("http://www.google.com")
            links.find(limit=3, sort=lambda key: key['text'])
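        
        How ``sort``, ``reverse`` and ``limit`` combine can be sketched on
        plain dictionaries. This is only an illustration: ``order_links`` is a
        hypothetical helper, and the order of operations (sort, then reverse,
        then limit) is an assumption based on the "reverse the sort before
        limiting" wording above, not a copy of what ``find`` does internally.
        
        .. code:: python
        
            def order_links(links, limit=None, reverse=False, sort=None):
                """Hypothetical helper: sort, then reverse, then limit."""
                # Sort only when a key function is supplied
                ordered = sorted(links, key=sort) if sort is not None else list(links)
                if reverse:
                    ordered.reverse()
                # Limiting comes last so it applies to the final order
                return ordered[:limit] if limit is not None else ordered
        
        
            sample = [{"text": "b"}, {"text": "a"}, {"text": "c"}]
            first_two = order_links(sample, limit=2, sort=lambda link: link["text"])
            last_two = order_links(sample, limit=2, reverse=True,
                                   sort=lambda link: link["text"])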
        
        Exclude text:
        
        .. code:: python
        
            import re
        
            from linkGrabber import Links
        
            links = Links("http://www.google.com")
            links.find(exclude=[{ "text": re.compile("Read More") }])
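        
        One way to read the ``exclude`` list is that a link is dropped when it
        matches every pattern in any one of the dictionaries. The sketch below
        shows that interpretation on plain dictionaries. The matching rule,
        the assumption that attribute values are strings, and the helper names
        ``is_excluded`` and ``apply_exclude`` are all hypothetical, so check
        the package source for the actual behaviour.
        
        .. code:: python
        
            import re
        
        
            def is_excluded(link, exclude):
                """Hypothetical rule: True if the link matches any criteria dict."""
                for criteria in exclude:
                    # Every key in one dict must match for the link to be excluded
                    if criteria and all(
                        key in link and re.search(pattern, link[key])
                        for key, pattern in criteria.items()
                    ):
                        return True
                return False
        
        
            def apply_exclude(links, exclude):
                """Return only the links that match none of the criteria."""
                return [link for link in links if not is_excluded(link, exclude)]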
        
        Remove duplicate URLs and make the output pretty:
        
        .. code:: python
        
            from linkGrabber import Links
        
            links = Links("http://www.google.com")
            links.find(duplicates=False, pretty=True)
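        
        Removing identical URLs can be pictured as keeping the first link seen
        for each ``href``. The sketch below works on plain dictionaries and is
        not taken from linkGrabber; ``drop_duplicate_urls`` is a hypothetical
        helper, and keeping the first occurrence (and keeping links that have
        no ``href``) are assumptions.
        
        .. code:: python
        
            def drop_duplicate_urls(links):
                """Hypothetical helper: keep the first link for each href."""
                seen = set()
                unique = []
                for link in links:
                    href = link.get("href")
                    # Skip links whose URL has already been collected
                    if href is not None and href in seen:
                        continue
                    if href is not None:
                        seen.add(href)
                    unique.append(link)
                return unique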
        
        Link Dictionary
        ---------------
        
        All attrs from BeautifulSoup's Tag object are available in the dictionary
        as well as a few extras:
        
        *  text (the text between the <a> and </a> tags)
        *  seo (the text after the last "/" in the URL, parsed into a more human-readable form)
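        
        The idea behind the ``seo`` value can be sketched as follows. This is
        an illustration only, not the package's code: ``seo_text`` is a
        hypothetical helper, and dropping the query string, the file extension
        and the ``-`` and ``_`` separators are assumptions about what "human
        readable" means here.
        
        .. code:: python
        
            import re
        
        
            def seo_text(url):
                """Hypothetical helper: readable text from the last URL segment."""
                # Ignore any query string or fragment
                path = url.split("?", 1)[0].split("#", 1)[0]
                # Take everything after the last "/"
                slug = path.rsplit("/", 1)[-1]
                # Drop a trailing file extension such as ".html"
                slug = re.sub(r"\.\w+$", "", slug)
                # Turn hyphens and underscores into spaces
                return " ".join(part for part in re.split(r"[-_]+", slug) if part)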
        
        
        =========
        Changelog
        =========
        
        v0.2.6 (06/25/2014)
        -------------------
        
        * Exclude parameter is now a list of dictionaries
        * Added pretty property
        * Added duplicates property which will remove any identical URLs
        * Added more tests
        * Added better docs
        
        v0.2.5 (06/23/2014)
        -------------------
        
        * Added exclude parameter to Links.find() which removes
          links that match certain criteria
        
        v0.2.4 (06/10/2014)
        -------------------
        
        * Updated documentation to read better on PyPI
        * Removed scrape.py and moved it to __init__.py
        * Now using nose for unit testing
        
        v0.2.3 (05/22/2014)
        -------------------
        
        * Updated setup.py file and some verbiage
        
        v0.2.2 (05/19/2014)
        -------------------
        
        * linkGrabber.Links.find() now responds with all Tag.attrs
          from BeautifulSoup4 as well as 'text' and 'seo' keys
        
        v0.2.1 (05/18/2014)
        -------------------
        
        * Added more tests
        
        v0.2.0 (05/17/2014)
        -------------------
        
        * Modified naming convention, reduced codebase, more readable structure
        
        v0.1.9 (05/17/2014)
        -------------------
        
        * Python 3.4 compatibility
        
        v0.1.8 (05/16/2014)
        -------------------
        
        * Changed parameter names to better reflect functionality
        
        v0.1.7 (05/16/2014)
        -------------------
        
        * Update README
        
        v0.1.6 (05/16/2014)
        -------------------
        
        * Update README with more examples
        
        v0.1.5 (05/16/2014)
        -------------------
        
        * Updated find_links to accept link_reverse=(bool) and link_sort=(function)
        
        v0.1.0 (05/16/2014)
        -------------------
        
        * Initial release.
        
Keywords: linkgrabber,beautifulsoup,scraper,html,parser,hyperlinks
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: License :: OSI Approved :: MIT License
