Metadata-Version: 1.1
Name: datagristle
Version: 0.45
Summary: A toolbox and library of ETL & data analysis tools
Home-page: http://github.com/kenfar/DataGristle
Author: Ken Farmer
Author-email: kenfar@gmail.com
License: BSD
Download-URL: http://github.com/downloads/kenfar/DataGristle/DataGristle-0.45.tar.gz
Description: Datagristle is a toolbox of tough and flexible data connectors and analyzers.
        It is an interactive mix of ETL and data analysis, optimized for rapid
        analysis and manipulation of a wide variety of data.
        
        It's neither an enterprise ETL tool nor an enterprise analysis, reporting,
        or data mining tool.  It's intended to be an easily adopted tool for technical
        analysts that combines the most useful subset of data transformation and
        analysis capabilities necessary to do 80% of the work.  Its open source Python
        codebase allows it to be easily extended with custom code to handle that
        always-challenging last 20%.
        
        Current Status:  Strong support for easy analysis and simple transformations of
        csv files. 
        
        ###Next Steps:  
        
           * attractive PDF output of gristle_determinator.py
           * metadata database population
        
        ###Its objectives include:
        
           * multi-platform (Unix, Linux, Mac OS, Windows with effort)
           * multi-language (primarily Python)
           * free - no cripple-licensing
           * primary audience is programming data analysts - not non-technical analysts
           * primary environment is command-line rather than Windows, graphical desktop,
             or Eclipse
           * extensible
           * allow a bi-directional iteration between ETL & data analysis
           * can quickly perform initial data analysis prior to longer-duration, deeper
             analysis with heavier-weight tools.
        
        ###Dependencies
        
           * Python 2.6 or Python 2.7
        
        ###Mature Existing utilities:
        
           * gristle_determinator.py 
               - Identifies file formats, generates metadata, prints file analysis report
               - This is the most mature utility, and the other utilities also use it, so
                 you generally do not need to enter file structure info.
           * gristle_freaker.py
               - Produces a frequency distribution of multiple columns from input file.
           * gristle_slicer.py
               - Used to extract a subset of columns and rows out of an input file.
           * gristle_viewer.py
               - Shows one record from a file at a time - formatted based on metadata. 
        
        ###Immature Existing utilities:
        
           * gristle_differ.py
               - Shows differences between two files
           * gristle_file\_converter.py
               - Converts a csv from one dialect to another.  Can handle multi-character
                 field delimiters as well as record delimiters.
           * gristle_filter.py 
               - Applies simple filter logic to file.
           * gristle_scalar.py
               - Performs scalar operations (min, max, avg, count unique, etc.) on a file
           * gristle_validator.py
               - Validates a file - currently just confirms number of fields for each row.
        
        ###Future utilities:
        
           * gristle_metadata.py 
               - Manages metadata - allows users to query, add, update, delete
                 file, field, transformation, reporting descriptions.
           * gristle_generator
               - Generates test data based on gristle metadata
           * gristle_validator 
               - Confirms validity of database and file structure and contents.
           * gristle_file\_joiner.py
               - Joins two files on their common keys and produces a new file.
           * gristle_grouper.py
               - Reads a file, aggregates on a given set of fields, produces a new file.
           * gristle_db\_loader.py
               - Loads a file into a database.
           * gristle_db\_extractor.py
               - Extracts data from a database into a file.
           * gristle_field\_merge.py
               - Prints the matched values from multiple files side by side along with counts.
        
        ###Licensing
        
           * Gristle uses the BSD license - see the separate LICENSE file for further 
             information
        
        
        
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python
Classifier: Operating System :: POSIX
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing
Classifier: Topic :: Utilities
