Metadata-Version: 1.1
Name: ngram
Version: 3.3.0
Summary: The NGram class extends the Python 'set' class with efficient

Home-page: http://github.com/gpoulter/python-ngram
Author: Graham Poulter
Author-email: UNKNOWN
License: LGPLv3+
Download-URL: http://pypi.python.org/pypi/ngram
Description: fuzzy search for members by means of an N-gram similarity measure.
        It also has static methods to compare a pair of strings.
        
        The N-grams are character based not word-based, and the class does not
        implement a language model, merely searching for members by string similarity.
        
        The `documentation`_ and `tutorial`_ are on the
        PyPI package documentation site.
        
        Installation
        ============
        
        Install python-ngram from `PyPI`_ using `pip installer`_::
        
           pip install ngram
        
        It should run on Python 2.6, Python 2.7 and Python 3.2
        
        How does it work?
        =================
        
        The set stores arbitrary items, but for non-string items a `key` function
        (such as `str`) must be specified to provide a string represenation.  The key
        function can also be used to normalise string items (e.g. lower-casing) prior
        to N-gram indexing.
        
        To index a string it pads the string with a specified dummy character, then
        splits it into overlapping substrings of N (default N=3) characters in length
        and associates each N-gram to the items that use it.
        
        To find items similar to a query string, it splits the query into N-grams,
        collects all items sharing at least one N-gram with the query,
        and ranks the items by score based on the ratio of shared to unshared
        N-grams between strings.
        
        History
        =======
        
        In 2007, Michel Albert (exhuma) wrote the python-ngram module based on Perl's
        `String::Trigram`_ module by Tarek Ahmed, and committed the code for 2.0.0b2 to
        a now-disused `Sourceforge`_ subversion repo.
        
        Since late 2008, Graham Poulter has maintained python-ngram, initially refactoring
        it to build on the `set` class, and also adding features, documentation, tests,
        performance improvements and Python 3 support.
        
        Primary development takes place on `GitHub`_, but changes are also pushed
        to the earlier repo on `Google Code`_.
        
        .. _documentation: http://packages.python.org/ngram/
        .. _tutorial: http://packages.python.org/ngram/tutorial.html
        .. _PyPI: http://pypi.python.org/pypi/ngram
        .. _pip installer: http://www.pip-installer.org
        .. _String::Trigram: http://search.cpan.org/dist/String-Trigram/
        .. _Sourceforge: https://sourceforge.net/projects/python-ngram/
        .. _GitHub: http://github.com/gpoulter/python-ngram
        .. _Google Code: http://code.google.com/p/python-ngram/
        
Keywords: ngram set string text similarity
Platform: any
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU Lesser General Public License v3 or later (LGPLv3+)
Classifier: License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)
Classifier: License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)
Classifier: Natural Language :: English
Classifier: Topic :: Text Processing
Classifier: Topic :: Text Processing :: Indexing
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.2
