Metadata-Version: 1.0
Name: NoAho
Version: 0.9
Summary: Fast, non-overlapping simultaneous multiple keyword search
Home-page: http://pypi.python.org/pypi/noaho
Author: Jeff Donner
Author-email: jeffrey.donner@gmail.com
License: UNKNOWN
Download-URL: http://pypi.python.org/packages/source/n/noaho/noaho-0.9.tar.gz
Description: Non-Overlapping Aho-Corasick Trie
        
        Features:
        - 'short' and 'long' (longest key) searches, both one-off and iteration
          over all in some text (non-overlapping). (Short probably isn't useful
          without overlap.)
        - Works with both unicode and str in Python 2, and unicode in Python 3
          (it's all UCS4 under the hood).
        - Allows you to associate an arbitrary Python object payload with each
          keyword, and supports dict operations len, [], and 'in' for the
          keywords (though no del or traversal).
        - Does the 'compilation' (generation of Aho-Corasick failure links) of
          the trie on-demand; you can mix adding keywords and searching text
          freely.
        - Can be used commercially, it has a minimal license.
        
        Anti-Features:
        - Does not support overlapping keywords, unless you move along the
          string manually, one character at a time, which would defeat the
          purpose.
        - find[all]_short is kind of useless.
        - Lacks key iteration and deletion from the mapping (dict) protocol
        - memory leaking untested (should be ok but ...)
        - No /testcase/ for unicode in Python 2 (did manual test however)
        - Unicode chars represented as ucs4, and, each character has its own
          hashtable, so it's relatively memory-heavy.
        - Requires a C++ compiler.
        
        Bug reports and patches welcome of course!
        
        
        
        To build and install, use either
          # Python 2
          python setup.py install # (or build, and copy the .so to where you want it)
        or
          # Python 3
          python3 setup.py install # (or build, and copy the .so to where you want it)
        
        
        'text' below applies to str and unicode in Python 2, or unicode in Python 3 (all there is)
            t.add(key_text, optional payload)
            (start, end, key_payload) = t.find_short(text_to_search)
            (start, end, key_payload) = t.find_long(text_to_search)
            (start, end, key_payload) = t.findall_short(text_to_search)
            (start, end, key_payload) = t.findall_long(text_to_search)
        
        Examples:
            >>> a = NoAho()
            >>> a.add('ms windows')
            >>> a.add('ms windows 2000', "this is canonical")
            >>> a.add('windows', None)
            >>> a.add('windows 2000')
            >>> text = 'windows 2000 ms windows 2000 windows'
            >>> for k in a.findall_short(text):
            ...     print text[k[0]:k[1]]
            ...
            windows
            ms windows
            windows
            >>> for k in a.findall_long(text):
            ...     print text[k[0]:k[1]]
            ...
            windows 2000
            ms windows 2000
            windows
        
        Mapping (dictionary) methods:
            trie = NoAho()
            trie['apple'] = apple_handling_function
            trie['orange'] = Orange()
            trie.add('banana') # payload will be None
            trie['pear'] # will give key error
            assert isinstance(trie['orange'], Orange)
            assert 'banana' in trie
            len(trie)
            # No del;
            # no iteration over keys
        
        The 'find[all]_short' forms are named as long and awkwardly as they are,
        to leave plain 'find[all]' free if overlapping matches are ever implemented.
        
        
        Deep Rebuilding:
        Written in C++ and Cython.
        
        You should not need to use Cython to build and use this module, but if
        you want to change it and regenerate, there's a build command line at
        the top of noaho.pyx.  To allow the generated noaho.cpp file to be
        used in both Py2 and Py3, use a Python 3-built Cython. This was built
        with Cython-0.15.1, but even as far back as 0.13 should work (for that
        version though, you'd have to manually use a c++ compiler to do the
        last step, of linking).
        
        
        For the fullest spec of what the code will and will not so, check out
        test-noaho.py (python[3] test-noaho.py)
        
        Untested: whether the payload handling is complete, ie that there are no
        memory leaks. It should be correct though.
        
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: License :: OSI Approved :: MIT License
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: C++
Classifier: Programming Language :: Cython
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.0
Classifier: Programming Language :: Python :: 3.1
Classifier: Programming Language :: Python :: 3.2
Classifier: Operating System :: OS Independent
Classifier: Environment :: Console
Classifier: Topic :: Text Processing
