Metadata-Version: 1.1
Name: speedparser
Version: 0.2.0
Summary: feedparser but faster and worse
Home-page: https://github.com/hiidef/speedparser/
Author: Jason Moiron
Author-email: jason@hiidef.com
License: MIT
Description: speedparser
        -----------
        
        Speedparser is a black-box-style reimplementation of the `Universal Feed
        Parser <http://code.google.com/p/feedparser/>`_.  It uses some feedparser code
        for date and author handling, but mostly re-implements its data normalization
        algorithms based on feedparser output.  It uses ``lxml`` for feed parsing and
        for optional HTML cleaning.  Its compatibility with ``feedparser`` is very good
        for a strict subset of fields, but poor for fields outside that subset.  See
        ``tests/speedparsertests.py`` for more information on which fields are more or
        less compatible and which are not.
        
        On an Intel(R) Core(TM) i5 750, running on only one core, ``feedparser`` manages
        ``2.5 feeds/sec`` on the test feed set (roughly 4200 "feeds" in
        ``tests/feeds.tar.bz2``), while ``speedparser`` manages around ``65 feeds/sec``
        with HTML cleaning on and ``200 feeds/sec`` with cleaning off.
        
        installing
        ----------
        
        ``pip install speedparser``
        
        usage
        -----
        
        Usage is similar to feedparser::
        
            >>> import speedparser
            >>> result = speedparser.parse(feed)  # HTML cleaning on
            >>> result = speedparser.parse(feed, clean_html=False)  # cleaning off
        
        differences
        -----------
        
        There are a few interface differences and many result differences between
        speedparser and feedparser.  The main similarities are that both return a
        ``FeedParserDict()`` object (with keys accessible as attributes), both set
        the ``bozo`` key when an error is encountered, and various aspects of the
        ``feed`` and ``entries`` keys are likely to be identical *or* very similar.
        
        ``speedparser`` uses different data cleaning algorithms than ``feedparser``
        (in some cases weaker ones, or none at all; buyer beware).  When cleaning is
        enabled, lxml's ``html.clean`` module is used to clean HTML, giving similar
        but not identical protection against various attributes and elements.  If you
        pass your own ``Cleaner`` instance as the ``clean_html`` kwarg, ``speedparser``
        will use it to clean the various attributes of the feed and entries.
        
        ``speedparser`` does not attempt to fix character encoding by default because
        this processing can take a long time for large feeds.  If the encoding value of
        the feed is wrong, or if you want this extra level of error tolerance, you
        can either use the ``chardet`` module to detect the encoding from the
        document, or pass ``encoding=True`` to ``speedparser.parse``, which makes it
        fall back to encoding detection if it encounters encoding errors.
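        
        A minimal sketch of the first option, guessing the encoding up front.  The
        helper ``guess_encoding`` and its ``min_confidence`` threshold are hypothetical
        and not part of ``speedparser``; ``detect`` is assumed to be a callable that
        behaves like ``chardet.detect`` and returns a dict with ``encoding`` and
        ``confidence`` keys::
        
            def guess_encoding(raw, detect, min_confidence=0.5):
                """Return a likely encoding name for raw bytes, or None."""
                # A chardet-style detector returns a dict along the lines of
                # {"encoding": "utf-8", "confidence": 0.99}; either value may be None.
                guess = detect(raw)
                encoding = guess.get("encoding")
                confidence = guess.get("confidence") or 0.0
                # The 0.5 default threshold is an arbitrary choice for illustration.
                if encoding and confidence >= min_confidence:
                    return encoding
                return None
        
        With ``chardet`` installed, this could be called as
        ``guess_encoding(raw_bytes, chardet.detect)``.  How the detected name is then
        applied to the document before parsing depends on the application.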
        
        If your application is using ``feedparser`` to consume many feeds at once and
        CPU is becoming a bottleneck, you might want to try out ``speedparser`` as an
        alternative (using ``feedparser`` as a backup).  If you are writing an
        application that does not ingest many feeds, or where CPU is not a problem,
        you should use ``feedparser`` as it is flexible with bad or malformed data and
        has a much better test suite.
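        
        One way the "backup" arrangement might look is sketched below.  The helper
        ``parse_with_fallback`` is hypothetical and not part of either library; it
        takes the two parse functions as arguments and relies only on the ``bozo``
        key that both parsers set on error::
        
            def parse_with_fallback(document, fast_parse, slow_parse):
                """Try the fast parser first; use the slow one if it fails."""
                try:
                    result = fast_parse(document)
                except Exception:
                    # The fast parser raised outright, so hand over to the backup.
                    return slow_parse(document)
                # A truthy ``bozo`` value signals that the parser hit an error.
                if result.get("bozo"):
                    return slow_parse(document)
                return result
        
        In an application this might be called as
        ``parse_with_fallback(data, speedparser.parse, feedparser.parse)``, so that
        only feeds ``speedparser`` cannot handle pay the cost of the slower parser.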
        
        
        
Keywords: feedparser rss atom rdf lxml
Platform: UNKNOWN
Classifier: License :: OSI Approved :: MIT License
Classifier: Intended Audience :: Developers
Classifier: Operating System :: POSIX
Classifier: Development Status :: 4 - Beta
