Metadata-Version: 1.0
Name: gnip-historical
Version: 0.4.0
Summary: Gnip Historical library and command scripts.
Home-page: http://pypi.python.org/pypi/gnip-historical/
Author: Scott Hendrickson, Brian Lehman, Josh Montague
Author-email: scott@drskippy.net
License: LICENSE.txt
Description:                                  Python Library
                                              and
                                     Command Line Utilities
                               for Gnip Historical PowerTrack API
        
        
        The process for launching a historical job and retrieving its data
        requires only a few steps:
          1) create job
          2) retrieve and review job quote
          3) accept or reject job
          4) download data files list
          5) download data
        
        Utilities are included to assist with each step.
        
        SETUP UTILITY
        =============
        First, set up your Gnip credentials.  There is a simple utility to create the local credential 
        file named ".gnip".
        
        $ ./setup_gnip_creds.py 
        Username: shendrickson@gnip.com    
        Password: 
        Password again: 
        Endpoint URL. Enter your Account Name (eg https://historical.gnip.com:443/accounts/<account name>/): shendrickson
        Done creating file ./.gnip
        Be sure to run:
        chmod og-w .gnip
            
        $ chmod og-w .gnip
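        
        For reference, "chmod og-w .gnip" clears the write permission for group and others on the
        credential file. If you script your setup, a minimal Python equivalent might look like this
        (restrict_creds_file is a hypothetical helper, not part of these utilities):
        
        ```python
        import os
        import stat
        
        
        def restrict_creds_file(path=".gnip"):
            """Clear group/other write bits, like "chmod og-w <path>"; return the new mode."""
            mode = stat.S_IMODE(os.stat(path).st_mode)
            new_mode = mode & ~(stat.S_IWGRP | stat.S_IWOTH)
            os.chmod(path, new_mode)
            return new_mode
        ```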
        
        If you use the example JSON job description, be sure to change the "serviceUsername"
        field to your own; for Twitter, use your Twitter handle.
        
        You will likely wish to run these utilities from other directories, so be sure to export an
        updated PYTHONPATH:
        
        $ export PYTHONPATH=${PYTHONPATH}:path-to-gnip-python-historical-utilities
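        
        If you would rather not change your shell environment, a comparable effect can be had
        inside your own Python script by extending sys.path before importing the library. A
        minimal sketch; the directory name is a placeholder for wherever you keep the utilities,
        and add_utils_to_path is a hypothetical helper:
        
        ```python
        import sys
        
        # Placeholder: replace with the real location of your checkout.
        UTILS_DIR = "path-to-gnip-python-historical-utilities"
        
        
        def add_utils_to_path(utils_dir=UTILS_DIR):
            """Put utils_dir at the front of the module search path, only once.
        
            PYTHONPATH entries are likewise searched before the standard library.
            """
            if utils_dir not in sys.path:
                sys.path.insert(0, utils_dir)
            return sys.path
        ```
        
        Call add_utils_to_path() before "import gnip_historical"; the import then succeeds if
        gnip_historical.py lives in that directory.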
        
        CREATE JOB
        ==========
        Create a job description by editing the example JSON file provided ("bieber_job1.json").
        
        You will end up with a single JSON record like this (see the Gnip documentation for option
        details). The fromDate and toDate are in the format YYYYmmddHHMM:
        
        {
            "dataFormat" : "activity-streams",
            "fromDate" : "201201010000",
            "publisher" : "twitter",
            "rules" : 
            [
                {
                    "tag" : "bestRuleEver",
                    "value" : "bieber"
                }
            ],
            "serviceUsername" : "PUT_YOUR_TWITTER_HANDLE_HERE",
            "streamType" : "track",
            "title" : "BieberJob1",
            "toDate" : "201201010001"
        }
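        
        Because fromDate and toDate must be in YYYYmmddHHMM form, you may prefer to generate the
        job file rather than edit it by hand. A minimal sketch that builds the same record as
        above; make_job is a hypothetical helper, and the check that toDate is later than fromDate
        is a sanity check added here, not a documented API rule:
        
        ```python
        from datetime import datetime
        
        DATE_FMT = "%Y%m%d%H%M"  # YYYYmmddHHMM, as required for fromDate/toDate
        
        
        def make_job(title, rules, from_dt, to_dt, service_username):
            """Build a job description dict with the same fields as bieber_job1.json.
        
            rules is a list of (tag, value) pairs; from_dt/to_dt are datetimes.
            """
            if to_dt <= from_dt:
                raise ValueError("toDate must be later than fromDate")
            return {
                "dataFormat": "activity-streams",
                "fromDate": from_dt.strftime(DATE_FMT),
                "publisher": "twitter",
                "rules": [{"tag": tag, "value": value} for tag, value in rules],
                "serviceUsername": service_username,
                "streamType": "track",
                "title": title,
                "toDate": to_dt.strftime(DATE_FMT),
            }
        ```
        
        For example, make_job("BieberJob1", [("bestRuleEver", "bieber")], datetime(2012, 1, 1, 0, 0),
        datetime(2012, 1, 1, 0, 1), "your_handle") gives the record shown above with your handle
        filled in, ready to save with json.dump and pass to create_job.py with -f.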
        
        To create the job,
        
        $ ./create_job.py -f./bieber_job1.json -t "Social Data Phenoms - Bieber"
        
        The response is the JSON record returned by the server. It describes the job (including
        the JobID and JobURL) or reports any error messages.
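        
        Under the hood, creating a job amounts to an authenticated HTTP POST of the job JSON. The
        sketch below builds such a request with the standard library but does not send it. The
        endpoint path (/accounts/<account>/jobs.json), the use of HTTP Basic authentication and the
        helper name are assumptions for illustration; check create_job.py or the Gnip documentation
        for the exact details. It also mirrors the -t option, where a title given on the command
        line replaces the one in the file.
        
        ```python
        import base64
        import json
        import urllib.request
        
        # Assumed endpoint; "account" is your Gnip account name (as entered during setup).
        CREATE_URL = "https://historical.gnip.com:443/accounts/{account}/jobs.json"
        
        
        def build_create_job_request(account, job, username, password, title=None):
            """Return an unsent POST request that would submit job (a dict)."""
            job = dict(job)  # copy so the caller's dict is not modified
            if title is not None:
                job["title"] = title  # like -t: the command-line title wins over the file
            token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
            return urllib.request.Request(
                CREATE_URL.format(account=account),
                data=json.dumps(job).encode("utf-8"),
                headers={
                    "Content-Type": "application/json",
                    "Authorization": "Basic " + token,  # assumed auth scheme
                },
                method="POST",
            )
        ```
        
        Passing the result to urllib.request.urlopen would send it; the server's reply would be the
        JSON job record described above.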
        
        To get help,
        
        $ ./create_job.py -h
        Usage: create_job.py [options]
        
        Options:
          -h, --help            show this help message and exit
          -u URL, --url=URL     Job url.
          -l, --prev-url        Use previous Job URL (only from this configuration
                                file.).
          -v, --verbose         Detailed output.
          -f FILENAME, --filename=FILENAME
                                File defining job (JSON)
          -t TITLE, --title=TITLE
                                Title of project, this title supercedes title in file.
        
        
        LIST JOBS, GET JOB QUOTES AND GET JOB STATUS
        ============================================
        $ ./list_jobs.py -h
        Usage: list_jobs.py [options]
        
        Options:
          -h, --help            show this help message and exit
          -u URL, --url=URL     Job url.
          -l, --prev-url        Use previous Job URL (only from this configuration
                                file.).
          -v, --verbose         Detailed output.
          -d SINCEDATESTRING, --since-date=SINCEDATESTRING
                                Only list jobs after date, (default
                                2012-01-01T00:00:00)
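        
        The --since-date option is a simple date filter. A sketch of that filtering, assuming each
        job record carries a request timestamp in a field named "requestedAt" in the same
        YYYY-mm-ddTHH:MM:SS form as the default shown above (both the field name and its format are
        assumptions; jobs_since is a hypothetical helper):
        
        ```python
        from datetime import datetime
        
        SINCE_FMT = "%Y-%m-%dT%H:%M:%S"
        DEFAULT_SINCE = "2012-01-01T00:00:00"  # default shown by list_jobs.py -h
        
        
        def jobs_since(jobs, since=DEFAULT_SINCE, field="requestedAt"):
            """Keep jobs whose timestamp in field is strictly after since.
        
            field is an assumed key name. Only the first 19 characters of each
            timestamp are parsed, so a trailing "Z" or fractional seconds are ignored.
            """
            cutoff = datetime.strptime(since, SINCE_FMT)
            return [job for job in jobs
                    if datetime.strptime(job[field][:19], SINCE_FMT) > cutoff]
        ```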
        
        For example, I have three completed jobs (a Gnip job, a Bieber job and an SXSW
        job) for which data is available:
        
        $  ./list_jobs.py 
        #########################
        TITLE:     GNIP2012
        STATUS:    finished
        PROGRESS:  100.0 %
        JOB URL:   https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historical/track/jobs/eeh2vte64.json
        #########################
        TITLE:     Justin Bieber 2009
        STATUS:    finished
        PROGRESS:  100.0 %
        JOB URL:   https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historical/track/jobs/j5epx4e5c3.json
        #########################
        TITLE:     SXSW2010-2012
        STATUS:    finished
        PROGRESS:  100.0 %
        JOB URL:   https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historical/track/jobs/sbxff05b8d.json
        
        
        To see detailed information or download data filelist, 
        specify URL with -u or add -v flag (data_files.txt contains 
        only URLs from last job in list)
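        
        Each block in the listing shows a handful of fields from a job's JSON record. A sketch that
        reproduces the layout above from a job dict; the key names "title", "status",
        "percentComplete" and "jobURL" are assumptions about the server's JSON rather than details
        taken from list_jobs.py, and format_job_summary is a hypothetical helper:
        
        ```python
        SEPARATOR = "#" * 25
        
        
        def format_job_summary(job):
            """Render one job record in the same layout as the list_jobs.py listing."""
            return "\n".join([
                SEPARATOR,
                f"TITLE:     {job['title']}",
                f"STATUS:    {job['status']}",
                f"PROGRESS:  {float(job['percentComplete']):.1f} %",
                f"JOB URL:   {job['jobURL']}",
            ])
        ```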
        
        DOWNLOAD URLS OF FILES CONTAINING DATA
        ======================================
        To retrieve the locations of the data files this job created on S3, pass the job URL
        with the -u flag (or, if you used -u for this job previously, just use -l; see help):
        
        $  ./list_jobs.py -u  https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historical/track/jobs/sbxff05b8d.json
        #########################
        TITLE:     SXSW2010-2012
        STATUS:    finished
        PROGRESS:  100.0 %
        JOB URL:   https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historical/track/jobs/sbxff05b8d.json
        
        RESULT:
         Job completed at ........ 2012-09-01 04:35:23
         No. of Activities ....... -1
         No. of Files ............ -1
         Files size (MB) ......... -1
         Data URL ................ https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historical/track/jobs/sbxff05b8d/results.json
        DATA SET:
         No. of URLs ............. 131,211
         File size (bytes)........ 2,151,308,466
         Files (URLs) ............ https://archive.replay.historicals.review.s3.amazonaws.com/historicals/twitter/track/activity-streams/shendrickson/2012/08/28/20100101-20120815_sbxff05b8d/2010/01/01/00/00_activities.json.gz?AWSAccessKeyId=AKIAJ7O2S22DN2NDN7UQ&Expires=1349066046&Signature=hDSc0a%2BRQeG%2BknaSAWpzSUoM1F0%3D
        https://archive.replay.historicals.review.s3.amazonaws.com/historicals/twitter/track/activity-streams/shendrickson/2012/08/28/20100101-20120815_sbxff05b8d/2010/01/01/00/10_activities.json.gz?AWSAccessKeyId=AKIAJ7O2S22DN2NDN7UQ&Expires=1349066046&Signature=DOZlXKuMByv5uKgmw4QrCOpmEVw%3D
        https://archive.replay.historicals.review.s3.amazonaws.com/historicals/twitter/track/activity-streams/shendrickson/2012/08/28/20100101-20120815_sbxff05b8d/2010/01/01/00/20_activities.json.gz?AWSAccessKeyId=AKIAJ7O2S22DN2NDN7UQ&Expires=1349066046&Signature=X4SFTxwM2X9Y7qwgKCwG6fH8h7w%3D
        https://archive.replay.historicals.review.s3.amazonaws.com/historicals/twitter/track/activity-streams/shendrickson/2012/08/28/20100101-20120815_sbxff05b8d/2010/01/01/00/30_activities.json.gz?AWSAccessKeyId=AKIAJ7O2S22DN2NDN7UQ&Expires=1349066046&Signature=WVubKurX%2BAzYeZLX9UnBamSCrHg%3D
        https://archive.replay.historicals.review.s3.amazonaws.com/historicals/twitter/track/activity-streams/shendrickson/2012/08/28/20100101-20120815_sbxff05b8d/2010/01/01/00/40_activities.json.gz?AWSAccessKeyId=AKIAJ7O2S22DN2NDN7UQ&Expires=1349066046&Signature=OG9ygKlXNxFvJLlAEWi3hes5yyw%3D
        ...
        
        Writing files to data_files.txt...
        
        The URLs of the 131K files created on S3 by the job have been written to a file in
        the local directory, ./data_files.txt.
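        
        The file list comes from the job's Data URL (the results.json link in the output above). A
        minimal sketch of turning that response into data_files.txt, assuming the JSON carries the
        URLs as a list under a "urlList" key (an assumption; inspect your own results.json to
        confirm). write_data_files is a hypothetical helper:
        
        ```python
        import json
        
        
        def write_data_files(results_json_text, out_path="data_files.txt"):
            """Write one download URL per line; return the number of URLs written."""
            results = json.loads(results_json_text)
            urls = results["urlList"]  # assumed key name for the list of S3 URLs
            with open(out_path, "w") as f:
                for url in urls:
                    f.write(url + "\n")
            return len(urls)
        ```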
        
        DOWNLOAD DATA
        =============
        
        To retrieve this data, use the utility:
        
        $ ./get_data_files.bash
        ...
        
        This will launch up to 8 simultaneous cURL connections to S3 to download the files
        into a local ./data/year/month/day/hour... directory tree (see name_mangle.py for details).
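        
        Where bash or cURL is unavailable, the same idea can be sketched in Python: map each S3 URL
        to a local path under ./data, then fetch up to 8 files at once with a thread pool. The path
        mapping simply keeps the last five path segments (year/month/day/hour/file) and is an
        illustrative assumption, not the logic in name_mangle.py. local_path and download_all are
        hypothetical helpers; fetch defaults to urllib.request.urlretrieve but can be swapped out.
        
        ```python
        import os
        import urllib.request
        from concurrent.futures import ThreadPoolExecutor
        from urllib.parse import urlsplit
        
        MAX_CONNECTIONS = 8  # same limit as get_data_files.bash
        
        
        def local_path(url, root="data"):
            """Map .../YYYY/MM/DD/HH/file.json.gz?signature to root/YYYY/MM/DD/HH/file.json.gz."""
            segments = urlsplit(url).path.strip("/").split("/")  # query string is dropped
            return os.path.join(root, *segments[-5:])
        
        
        def download_all(urls, root="data", fetch=urllib.request.urlretrieve):
            """Download every URL into the local tree using a pool of threads; return the paths."""
            def fetch_one(url):
                dest = local_path(url, root)
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                fetch(url, dest)
                return dest
        
            with ThreadPoolExecutor(max_workers=MAX_CONNECTIONS) as pool:
                return list(pool.map(fetch_one, urls))
        ```
        
        Reading ./data_files.txt with open("data_files.txt").read().split() gives the list of URLs
        to pass to download_all.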
        
        ACCEPT/REJECT JOB
        =================
        After a job is quoted, you can accept or reject the job.  The job will not start until it is accepted.
        
        $ ./accept_job.py -u https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historical/track/jobs/c9pe0day6h.json
        
        or 
        
        $ ./reject_job.py -u https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historical/track/jobs/c9pe0day6h.json
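        
        Accepting or rejecting changes the job's status on the server. A sketch of such a request,
        assuming it is an HTTP PUT of {"status": "accept"} or {"status": "reject"} to the job URL
        with Basic authentication (assumptions to verify against the Gnip documentation;
        build_decision_request is a hypothetical helper). The request is built but not sent:
        
        ```python
        import base64
        import json
        import urllib.request
        
        VALID_DECISIONS = ("accept", "reject")
        
        
        def build_decision_request(job_url, decision, username, password):
            """Return an unsent PUT request that would accept or reject a quoted job."""
            if decision not in VALID_DECISIONS:
                raise ValueError(f"decision must be one of {VALID_DECISIONS}")
            token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
            return urllib.request.Request(
                job_url,
                data=json.dumps({"status": decision}).encode("utf-8"),
                headers={"Content-Type": "application/json",
                         "Authorization": "Basic " + token},  # assumed auth scheme
                method="PUT",
            )
        ```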
        
        The module gnip_historical.py provides additional functionality that you can access programmatically.
        
        ==
        Gnip-Python-Historical-Utilities by Scott Hendrickson is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/.
        
Platform: UNKNOWN
