Metadata-Version: 1.1
Name: whoosh-igo
Version: 0.7
Summary: Japanese-language tokenizers for the Whoosh full-text search library
Home-page: https://github.com/hideaki-t/whoosh-igo/
Author: Hideaki Takahashi
Author-email: mymelo@gmail.com
License: Apache License, Version 2.0
Description: ================================
         Japanese Tokenizers for Whoosh
        ================================
        
        About
        =====
        
        Tokenizers for the Whoosh full-text search library, designed for the Japanese language.
        This package contains three tokenizers.
        
        * IgoTokenizer
        
         + requires igo-python (http://pypi.python.org/pypi/igo-python/) and its dictionary.
        
        * TinySegmenterTokenizer
        
         + requires TinySegmenter in Python (https://code.google.com/p/mhagiwara/source/browse/trunk/nltk/jpbook/tinysegmenter.py)
        
        * MeCabTokenizer
        
         + requires the MeCab Python binding (http://mecab.sourceforge.net/bindings.html)
        
        
        How To Use
        ==========
        
        IgoTokenizer::

         import igo.Tagger
         import whooshjp
         from whooshjp.IgoTokenizer import IgoTokenizer
         from whoosh.fields import Schema, TEXT, ID

         tk = IgoTokenizer(igo.Tagger.Tagger('ipadic'))
         scm = Schema(title=TEXT(stored=True, analyzer=tk),
                      path=ID(unique=True, stored=True),
                      content=TEXT(analyzer=tk))
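
        The resulting schema plugs into Whoosh's usual indexing and search APIs.
        A minimal end-to-end sketch (the index directory name and the sample
        document are illustrative, not part of this package)::

         import os
         from whoosh.index import create_in
         from whoosh.qparser import QueryParser

         if not os.path.exists('indexdir'):  # 'indexdir' is an arbitrary example path
             os.mkdir('indexdir')
         ix = create_in('indexdir', scm)

         writer = ix.writer()
         writer.add_document(title=u'例', path=u'/doc1', content=u'全文検索のテスト')
         writer.commit()

         with ix.searcher() as searcher:
             query = QueryParser('content', ix.schema).parse(u'検索')
             for hit in searcher.search(query):
                 print(hit['path'])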
        
        
        TinySegmenterTokenizer::

         import tinysegmenter
         import whooshjp
         from whooshjp.TinySegmenterTokenizer import TinySegmenterTokenizer
         from whoosh.fields import Schema, TEXT, ID

         tk = TinySegmenterTokenizer(tinysegmenter.TinySegmenter())
         scm = Schema(title=TEXT(stored=True, analyzer=tk),
                      path=ID(unique=True, stored=True),
                      content=TEXT(analyzer=tk))
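
        MeCabTokenizer follows the same pattern. A sketch only: the constructor is
        assumed here to take a MeCab tagger instance, mirroring IgoTokenizer::

         import MeCab
         import whooshjp
         from whooshjp.MeCabTokenizer import MeCabTokenizer
         from whoosh.fields import Schema, TEXT, ID

         # assumption: MeCabTokenizer wraps a MeCab.Tagger the way
         # IgoTokenizer wraps an igo.Tagger.Tagger
         tk = MeCabTokenizer(MeCab.Tagger())
         scm = Schema(title=TEXT(stored=True, analyzer=tk),
                      path=ID(unique=True, stored=True),
                      content=TEXT(analyzer=tk))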
        
        
        
        Changelog for Japanese Tokenizers for Whoosh
        ============================================
        
        2011-02-19 -- 0.1
            * first release.
        
        2011-02-21 -- 0.2
            * add TinySegmenterTokenizer
            * change module name
        
        2011-02-24 -- 0.3
            * add FeatureFilter
        
        2011-02-27 -- 0.4
            * add MeCabTokenizer
            * add a mode that does not pickle the Igo tagger, to minimize index size
        
        2011-04-17 -- 0.5
            * correct char offsets
        
        2011-04-17 -- 0.6
            * correct char offsets (TinySegmenterTokenizer)
        
        2012-04-14 -- 0.7
            * rename package (WhooshJapaneseTokenizer to whooshjp)
            * no longer import submodules automatically
            * Python 3 compatibility (3.2, 3.3)
            * drop Python 2.5 support
        
        
        
Keywords: japanese,tokenizer
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: Japanese
Classifier: Operating System :: OS Independent
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.2
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Linguistic
