Metadata-Version: 1.1
Name: google-ngram-downloader
Version: 4.0.0
Summary: Streaming access to the Google ngram data.
Home-page: https://github.com/dimazest/google-ngram-downloader
Author: Dmitrijs Milajevs
Author-email: dimazest@gmail.com
License: MIT License
Description: =========================
         Google ngram downloader
        =========================
        
        .. image:: https://travis-ci.org/dimazest/google-ngram-downloader.png?branch=master
            :target: https://travis-ci.org/dimazest/google-ngram-downloader
        
        .. image:: https://coveralls.io/repos/dimazest/google-ngram-downloader/badge.png?branch=master
            :target: https://coveralls.io/r/dimazest/google-ngram-downloader?branch=master
        
        .. image:: https://requires.io/github/dimazest/google-ngram-downloader/requirements.png?branch=master
           :target: https://requires.io/github/dimazest/google-ngram-downloader/requirements/?branch=master
           :alt: Requirements Status
        
        .. image:: https://pypip.in/v/google-ngram-downloader/badge.png
            :target: https://crate.io/packages/google-ngram-downloader/
            :alt: Latest PyPI version
        
        .. image:: https://pypip.in/d/google-ngram-downloader/badge.png
            :target: https://crate.io/packages/google-ngram-downloader/
            :alt: Number of PyPI downloads
        
        `The Google Books Ngram Viewer dataset`__ is a freely available resource under
        a `Creative Commons Attribution 3.0 Unported License`__ which provides ngram
        counts over books scanned by Google.
        
        __ http://storage.googleapis.com/books/ngrams/books/datasetsv2.html
        __ http://creativecommons.org/licenses/by/3.0/
        
        The dataset is so large that storing it locally is impractical. However,
        sometimes you only need aggregate data over the dataset, for example to build
        a co-occurrence matrix.
        
        This package provides an iterator over the dataset stored at Google. It
        decompresses the data on the fly and gives you access to the underlying
        data.
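        
        To illustrate what decompressing on the fly means, here is a minimal sketch
        built on the standard library ``zlib`` module. It is not necessarily how this
        package is implemented. The names ``iter_lines``, ``chunks`` and ``payload``
        are hypothetical, and a small in-memory gzip payload stands in for the network
        response.
        
        ```python
        import gzip
        import zlib
        
        
        def iter_lines(chunks):
            """Yield decoded text lines from an iterable of gzip-compressed byte chunks."""
            # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
            decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
            tail = b''
            for chunk in chunks:
                data = tail + decompressor.decompress(chunk)
                lines = data.split(b'\n')
                # The last piece may be an incomplete line; keep it for the next chunk.
                tail = lines.pop()
                for line in lines:
                    yield line.decode('utf-8')
            # Emit whatever is left once the compressed stream ends.
            tail += decompressor.flush()
            for line in tail.split(b'\n'):
                if line:
                    yield line.decode('utf-8')
        
        
        # Assumption: a tiny gzip payload cut into 7-byte chunks replaces the download.
        payload = gzip.compress(b'a b\t1860\t1\t1\na c\t1861\t2\t1\n')
        chunks = [payload[i:i + 7] for i in range(0, len(payload), 7)]
        lines = list(iter_lines(chunks))
        ```
        
        Only one chunk and one partial line are held in memory at a time, which is
        what makes it feasible to process a file that is too large to store.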
        
        Features
        ========
        
        * Download ngrams of various lengths and languages.
        * Access a subset of the ngrams, e.g. those that start with an 'a'.
        
        Installation
        ============
        
        ::
        
            pip install google-ngram-downloader
        
        
        The command line tool
        =====================
        
        The package also provides a simple command line tool,
        ``google-ngram-downloader``, to download the ngrams. Refer to its help to see
        the available actions::
        
            google-ngram-downloader help
            usage: google-ngram-downloader <command> [options]
        
            commands:
        
             cooccurrence  Write the cooccurrence frequencies of a word and its contexts.
             download      Download The Google Books Ngram Viewer dataset version 20120701.
             help          Show help for a given help topic or a help overview.
             readline      Print the raw content.
        
        
        Example use of the API
        ======================
        
        >>> from google_ngram_downloader import readline_google_store
        >>>
        >>> fname, url, records = next(readline_google_store(ngram_len=5))
        >>> fname
        'googlebooks-eng-all-5gram-20120701-0.gz'
        >>> url
        'http://storage.googleapis.com/books/ngrams/books/googlebooks-eng-all-5gram-20120701-0.gz'
        >>> next(records)
        Record(ngram=u'0 " A most useful', year=1860, match_count=1, volume_count=1)
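        
        Because each record carries counts for a single year, a common next step is to
        sum them per ngram. The following is a minimal sketch under stated
        assumptions: ``total_match_counts`` is a hypothetical helper that is not part
        of this package, and a stand-in ``Record`` namedtuple with the same fields as
        the record shown above is used so that the example runs without network
        access.
        
        ```python
        from collections import Counter, namedtuple
        
        # Assumption: a stand-in with the same fields as the Record shown above.
        Record = namedtuple('Record', 'ngram year match_count volume_count')
        
        
        def total_match_counts(records):
            """Sum match_count over all years for every ngram in records."""
            counts = Counter()
            for record in records:
                counts[record.ngram] += record.match_count
            return counts
        
        
        # Made-up sample data, for illustration only.
        sample = [
            Record('a b', 1860, 1, 1),
            Record('a b', 1861, 3, 2),
            Record('a c', 1860, 2, 1),
        ]
        totals = total_match_counts(sample)
        ```
        
        With the real dataset, the ``records`` iterator returned by
        ``readline_google_store`` could be passed to the helper in place of
        ``sample``.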
        
        Changes
        =======
        
        Version 4.0.0
        -------------
        
        * Added 'indices' keyword. Thanks to neocortex.
        * Added 'language' flag. Thanks to Ray Powell (rpowellgit).
        
        Version 3.1.1
        -------------
        
        * Non-unique contexts are taken into account inside of an ngram.
        
        Version 3.1
        -----------
        
        * The ``cooccurrence`` command does not perform any ngram modification.
        
        Version 3.0
        -----------
        
        * `download`, `readline` and `cooccurrence` subcommands.
        * `readline_google_store` transforms lines to `Record` in several processes.
        
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Topic :: Utilities
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
