Metadata-Version: 1.1
Name: natto-py
Version: 0.0.9
Summary: A Tasty Python Binding with MeCab (FFI-based, no SWIG or compiler necessary)
Home-page: https://bitbucket.org/buruzaemon/natto-py
Author: Brooke M. Fujita
Author-email: buruzaemon@gmail.com
License: BSD
Description: natto-py
        ========
        
        What is natto-py?
        -----------------
        A package leveraging FFI (foreign function interface), ``natto-py`` combines
        the Python_ programming language with MeCab_, the part-of-speech and
        morphological analyzer for the Japanese language. No compiler is necessary, as
        it is **not** a C extension. ``natto-py`` will run on Mac OS, Windows and
        \*nix.
        
        You can learn more about `natto-py at Bitbucket`_.
        
        Requirements
        -------------
        ``natto-py`` requires the following:
        
        - An existing installation of `MeCab 0.996`_
        - A system dictionary, like `mecab-ipadic`_ or `mecab-jumandic`_
        - `cffi 0.8.6`_ or greater
        
        The following Python versions are supported:
        
        - `Python 2.7.8`_
        - `Python 3.2.5`_
        - `Python 3.3.5`_
        - `Python 3.4.2`_
        
        Installation
        ------------
        Install ``natto-py`` as you would any other Python package::
        
            $ pip install natto-py
        
        This will automatically install the ``cffi`` package, which ``natto-py`` uses
        to bind to the ``mecab`` library.
        
        Configuration
        -------------
        As long as the ``mecab`` (and ``mecab-config`` for \*nix and Mac OS)
        executables are on your ``PATH``, ``natto-py`` should not require any explicit
        configuration. 
        
        * On \*nix and Mac OS, it queries ``mecab-config`` to discover the path to the ``libmecab.so`` or ``libmecab.dylib``, respectively.
        * On Windows, it queries the Windows Registry to locate the MeCab installation folder.
        * In order to convert character encodings to/from Unicode, ``natto-py`` will examine the charset of the ``mecab`` system dictionary.
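        
        The discovery step on \*nix and Mac OS can be sketched roughly as follows.
        This is a minimal illustration and not ``natto-py``'s actual implementation:
        the helper names ``lib_dir_from_flags`` and ``find_mecab_lib_dir`` are
        hypothetical, and it assumes that ``mecab-config --libs`` prints linker flags
        that include a ``-L<dir>`` entry::
        
            import subprocess
        
            def lib_dir_from_flags(flags):
                """Return the directory from the first -L<dir> linker flag, or None."""
                for token in flags.split():
                    if token.startswith('-L') and len(token) > 2:
                        return token[2:]
                return None
        
            def find_mecab_lib_dir():
                """Ask mecab-config for its linker flags and extract the library dir."""
                out = subprocess.check_output(['mecab-config', '--libs'])
                return lib_dir_from_flags(out.decode('utf-8'))
        
            # parsing a typical flags string (no MeCab installation needed for this)
            lib_dir = lib_dir_from_flags('-L/usr/local/lib -lmecab -lstdc++')
            # lib_dir is '/usr/local/lib'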
        
        Explicit configuration via MECAB_PATH and MECAB_CHARSET
        -------------------------------------------------------
        If ``natto-py`` for some reason cannot locate the ``mecab`` library,
        or if it cannot determine the correct charset used internally by
        ``mecab``, then you will need to set the ``MECAB_PATH`` and ``MECAB_CHARSET``
        environment variables. 
        
        * Set the ``MECAB_PATH`` environment variable to the exact name/path to your ``mecab`` library.
        * Set the ``MECAB_CHARSET`` environment variable if you compiled ``mecab`` and the related dictionary to use a non-default character encoding.
        
        e.g., for Mac OS::
        
            export MECAB_PATH=/usr/local/Cellar/mecab/0.996/lib/libmecab.dylib
            export MECAB_CHARSET=utf8
        
        e.g., for bash on UNIX/Linux::
        
            export MECAB_PATH=/usr/local/lib/libmecab.so
            export MECAB_CHARSET=euc-jp
        
        e.g., on Windows::
        
            set MECAB_PATH=C:\Program Files\MeCab\bin\libmecab.dll
            set MECAB_CHARSET=shift-jis
        
        e.g., from within a Python program::
        
            import os
        
            os.environ['MECAB_PATH']='/usr/local/lib/libmecab.so'
            os.environ['MECAB_CHARSET']='utf-16'
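        
        When setting these variables from Python, a small helper can avoid overriding
        values that are already present in the environment. This is only a sketch:
        ``configure_mecab`` is a hypothetical helper and not part of ``natto-py``, and
        it assumes that the variables are set before a ``MeCab`` instance is created::
        
            import os
        
            def configure_mecab(lib_path, charset, env=os.environ):
                """Set MECAB_PATH and MECAB_CHARSET unless they are already defined."""
                env.setdefault('MECAB_PATH', lib_path)
                env.setdefault('MECAB_CHARSET', charset)
                return env['MECAB_PATH'], env['MECAB_CHARSET']
        
            # call this before creating a MeCab instance
            configure_mecab('/usr/local/lib/libmecab.so', 'utf8')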
        
        Usage
        -----
        Here's a very quick guide to using ``natto-py``.
        
        Instantiate a reference to the ``mecab`` library, and display some details::
        
            from natto import MeCab
        
            nm = MeCab()
            print(nm)
        
            # displays details about the MeCab instance
            <natto.mecab.MeCab
             pointer=<cdata 'mecab_t *' 0x000000000037AB40>,
             libpath="/usr/local/lib/libmecab.so",
             options={},
             dicts=[<natto.dictionary.DictionaryInfo
                     pointer=<cdata 'mecab_dictionary_info_t *' 0x00000000003AC530>,
                     filepath="/usr/local/lib/mecab/dic/ipadic/sys.dic",
                     charset=utf8,
                     type=0],
             version=0.996>
        
        ----
        
        Display details about the ``mecab`` system dictionary used::
        
            sysdic = nm.dicts[0]
            print(sysdic)
        
            # displays the MeCab system dictionary info
            <natto.dictionary.DictionaryInfo
             pointer=<cdata 'mecab_dictionary_info_t *' 0x00000000003AC530>,
             filepath="/usr/local/lib/mecab/dic/ipadic/sys.dic",
             charset=utf8,
             type=0>
        
        ----
        
        Parse Japanese text and send the MeCab result as a string to ``stdout``::
        
            print(nm.parse('ピンチの時には必ずヒーローが現れる。'))
        
            # MeCab result as a single string
            ピンチ    名詞,一般,*,*,*,*,ピンチ,ピンチ,ピンチ
            の      助詞,連体化,*,*,*,*,の,ノ,ノ
            時      名詞,非自立,副詞可能,*,*,*,時,トキ,トキ
            に      助詞,格助詞,一般,*,*,*,に,ニ,ニ
            は      助詞,係助詞,*,*,*,*,は,ハ,ワ
            必ず    副詞,助詞類接続,*,*,*,*,必ず,カナラズ,カナラズ
            ヒーロー  名詞,一般,*,*,*,*,ヒーロー,ヒーロー,ヒーロー
            が      助詞,格助詞,一般,*,*,*,が,ガ,ガ
            現れる  動詞,自立,*,*,一段,基本形,現れる,アラワレル,アラワレル
            。      記号,句点,*,*,*,*,。,。,。
            EOS
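        
        If you do need to work with the single-string result, it can be split into
        per-morpheme records. The sketch below is an illustration only:
        ``split_result`` is a hypothetical helper, and it assumes MeCab's default
        output format, in which each line holds the surface and the feature string
        separated by a tab and the result ends with ``EOS``::
        
            def split_result(result):
                """Split a MeCab result string into (surface, features) tuples."""
                rows = []
                for line in result.splitlines():
                    if not line or line == 'EOS':
                        continue
                    surface, _, feature = line.partition('\t')
                    rows.append((surface, feature.split(',')))
                return rows
        
            sample = 'の\t助詞,連体化,*,*,*,*,の,ノ,ノ\nEOS\n'
            rows = split_result(sample)
            # rows[0][0] is the surface; rows[0][1][0] is the part-of-speech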
        
        ----
        
        Next, try parsing the text with MeCab node parsing. A generator yielding the
        MeCab nodes lets you efficiently iterate over the output, without first
        materializing each and every resulting MeCab node instance. The MeCab nodes 
        yielded allow access to more detailed information about each morpheme.
        
        Here we use a `Python with statement`_ to automatically clean up after we
        finish node parsing with the MeCab tagger. This is the recommended approach if
        you are using ``natto-py`` in a production environment::
        
            # use a Python with statement
            # to ensure mecab_destroy is invoked
            with MeCab() as nm:
                for n in nm.parse('ピンチの時には必ずヒーローが現れる。', as_nodes=True):
                    if not n.is_eos():
                        print("{}\t{}".format(n.surface, n.cost))
        
            # surface and cost of each node
            ピンチ    3348
            の        3722
            時        5176
            に        5083
            は        5305
            必ず    7525
            ヒーロー   11363
            が       10508
            現れる   10841
            。        7127
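        
        Because the nodes come from a generator, they can be fed straight into
        ordinary Python iteration. As a small sketch, the hypothetical helper
        ``surfaces`` below collects the surface of every non-EOS node; it relies only
        on the ``surface`` attribute and the ``is_eos()`` method shown above.
        ``FakeNode`` is a stand-in so that the snippet runs without MeCab installed::
        
            class FakeNode(object):
                """Stand-in for a MeCab node, for illustration only."""
                def __init__(self, surface, eos=False):
                    self.surface = surface
                    self._eos = eos
        
                def is_eos(self):
                    return self._eos
        
            def surfaces(nodes):
                """Return the surfaces of all nodes except the end-of-sentence node."""
                return [n.surface for n in nodes if not n.is_eos()]
        
            nodes = [FakeNode('ピンチ'), FakeNode('の'), FakeNode('時'),
                     FakeNode('', eos=True)]
            tokens = surfaces(nodes)
        
            # with natto-py itself, the same helper would be used like this:
            # with MeCab() as nm:
            #     tokens = surfaces(nm.parse(text, as_nodes=True))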
        
        ----
        
        MeCab output formatting is extremely flexible, and is highly recommended for
        any serious natural language processing task. Rather than obtaining MeCab's
        output as a large, single string and then parsing that, try using MeCab's 
        ``--node-format`` option to customize the node's feature value.
        
        This example formats the node feature so that it contains the following as
        comma-separated values:
        
        * morpheme surface
        * part-of-speech
        * part-of-speech ID
        * pronunciation
        
        The ``-F`` short form of the ``--node-format`` option is used here::
        
            # -F    ... short-form of --node-format
            # %m    ... morpheme surface
            # %f[0] ... part-of-speech
            # %h    ... part-of-speech id (ipadic)
            # %f[8] ... pronunciation
            with MeCab('-F%m,%f[0],%h,%f[8]') as nm:
                for n in nm.parse('ピンチの時には必ずヒーローが現れる。', as_nodes=True):
                    if not n.is_eos():
                        print(n.feature)
        
            # formatted feature of each node
            ピンチ,名詞,38,ピンチ
            の,助詞,24,ノ
            時,名詞,66,トキ
            に,助詞,13,ニ
            は,助詞,16,ワ
            必ず,副詞,35,カナラズ
            ヒーロー,名詞,38,ヒーロー
            が,助詞,13,ガ
            現れる,動詞,31,アラワレル
            。,記号,7,。
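        
        Each formatted feature can then be turned into a record with a plain split.
        This is a sketch under assumptions: ``to_record`` and its field names are
        hypothetical, and it assumes that the surface itself contains no ASCII
        comma::
        
            FIELDS = ('surface', 'pos', 'pos_id', 'pronunciation')
        
            def to_record(feature):
                """Map a '-F%m,%f[0],%h,%f[8]' feature string onto named fields."""
                values = feature.split(',')
                if len(values) != len(FIELDS):
                    raise ValueError('unexpected feature: {!r}'.format(feature))
                record = dict(zip(FIELDS, values))
                record['pos_id'] = int(record['pos_id'])
                return record
        
            record = to_record('ピンチ,名詞,38,ピンチ')
            # record['pos'] is the part-of-speech; record['pos_id'] is an int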
        
        ----
        
        Learn More
        ----------
        * You can read more about ``natto-py`` on the `project Wiki`_.
        * `API documentation on Read the Docs`_.
        
        Contributing to natto-py
        ------------------------
        - Use mercurial_ and `check out the latest code at Bitbucket`_ to make sure the
          feature hasn't been implemented or the bug hasn't been fixed yet.
        - `Browse the issue tracker`_ to make sure someone hasn't already requested it
          and/or contributed it.
        - Fork the project.
        - Start a feature/bugfix branch.
        - Commit and push until you are happy with your contribution.
        - Make sure to add tests for it. This is important so I don't break it in a
          future version unintentionally. I use unittest_ as it is very natural
          and easy-to-use.
        - Please try not to mess with the ``setup.py``, ``CHANGELOG``, or version
          files. If you must have your own version, that is fine, but please isolate
          it in its own commit so I can cherry-pick around it.
        
        Changelog
        ---------
        Please see the ``CHANGELOG`` for the release history.
        
        Copyright
        ---------
        Copyright |copy| 2014-2015, Brooke M. Fujita. All rights reserved. Please see
        the LICENSE file for further details.
        
        .. _Python: http://www.python.org/
        .. _MeCab: http://mecab.googlecode.com/svn/trunk/mecab/doc/index.html
        .. _mecab-ipadic: https://mecab.googlecode.com/files/mecab-ipadic-2.7.0-20070801.tar.gz
        .. _mecab-jumandic: https://mecab.googlecode.com/files/mecab-jumandic-5.1-20070304.tar.gz
        .. _natto-py at Bitbucket: https://bitbucket.org/buruzaemon/natto-py
        .. _MeCab 0.996: http://code.google.com/p/mecab/downloads/list
        .. _cffi 0.8.6: https://bitbucket.org/cffi/cffi
        .. _Python 2.7.8: https://www.python.org/download/releases/2.7.8/
        .. _Python 3.2.5: https://www.python.org/download/releases/3.2.5/
        .. _Python 3.3.5: https://www.python.org/download/releases/3.3.5/
        .. _Python 3.4.2: https://www.python.org/downloads/release/python-342/
        .. _Python with statement: https://www.python.org/dev/peps/pep-0343/
        .. _project Wiki: https://bitbucket.org/buruzaemon/natto-py/wiki/Home
        .. _API documentation on Read the Docs: http://natto-py.readthedocs.org/en/latest/
        .. _mercurial: http://mercurial.selenic.com/
        .. _check out the latest code at Bitbucket: https://bitbucket.org/buruzaemon/natto-py/src
        .. _Browse the issue tracker: https://bitbucket.org/buruzaemon/natto-py/issues?status=new&status=open
        .. _unittest: http://pythontesting.net/framework/unittest/unittest-introduction/
        .. |copy| unicode:: 0xA9 .. copyright sign
        
Keywords: MeCab 和布蕪 納豆 Japanese morphological analyzer NLP 形態素解析 自然言語処理 FFI binding バインディング
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Natural Language :: Japanese
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: POSIX :: BSD
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.2
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
