============
csvvalidator
============

This module provides simple utilities for validating data contained in CSV
files or other similar row-oriented data sources.

Note that the `csvvalidator` module is intended to be used in combination with 
the standard Python `csv` module. The `csvvalidator` module **will not** 
validate the *syntax* of a CSV file. Rather, the `csvvalidator` module can be 
used to validate any source of row-oriented data, such as is provided by a 
`csv.reader` object.

In other words, to validate data from a CSV file, first construct a CSV reader
using the standard Python `csv` module, specifying the appropriate dialect,
and then pass the CSV reader as the source of data to either the
`CSVValidator.validate` or the `CSVValidator.ivalidate` method.
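
As a minimal sketch of the first step, the snippet below builds a `csv.reader`
for tab-delimited data using only the standard library. The in-memory
`io.StringIO` buffer and its sample rows are illustrative stand-ins for a real
open file, and the rows are collected into a list only to make them easy to
inspect::

    import csv
    import io

    # Hypothetical sample data standing in for an open tab-delimited file.
    sample = io.StringIO(
        "study_id\tpatient_id\tgender\n"
        "1\t100\tM\n"
        "1\t101\tF\n"
    )

    # The reader yields each row as a list of strings, header row first.
    reader = csv.reader(sample, delimiter='\t')
    rows = list(reader)

A reader built this way from a real file is the kind of row-oriented source
that would be handed to the validator.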

The `CSVValidator` class is the foundation for all validator objects that are 
capable of validating CSV data. 

You can use the `CSVValidator` class to construct a validator dynamically, e.g.::

    import sys
    import csv
    from csvvalidator import *

    field_names = (
                   'study_id', 
                   'patient_id', 
                   'gender', 
                   'age_years', 
                   'age_months',
                   'date_inclusion'
                   )

    validator = CSVValidator(field_names)
    
    # basic header and record length checks
    validator.add_header_check('EX1', 'bad header')
    validator.add_record_length_check('EX2', 'unexpected record length')
    
    # some simple value checks
    validator.add_value_check('study_id', int, 
                              'EX3', 'study id must be an integer')
    validator.add_value_check('patient_id', int, 
                              'EX4', 'patient id must be an integer')
    validator.add_value_check('gender', enumeration('M', 'F'), 
                              'EX5', 'invalid gender')
    validator.add_value_check('age_years', number_range_inclusive(0, 120, int), 
                              'EX6', 'invalid age in years')
    validator.add_value_check('date_inclusion', datetime_string('%Y-%m-%d'),
                              'EX7', 'invalid date')
    
    # a more complicated record check
    def check_age_variables(r):
        age_years = int(r['age_years'])
        age_months = int(r['age_months'])
        # age_months is the total age in months, so it must fall within
        # the twelve months covered by age_years
        valid = (age_years * 12 <= age_months < (age_years + 1) * 12)
        if not valid:
            raise ValueError(age_years, age_months)
    validator.add_record_check(check_age_variables,
                               'EX8', 'invalid age variables')

    # validate the data and write problems to stdout
    # (csv.reader needs an open file or other iterable of lines, not a path)
    with open('/path/to/data.csv') as f:
        data = csv.reader(f, delimiter='\t')
        problems = validator.validate(data)
        write_problems(problems, sys.stdout)
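
The age consistency rule behind the record check can also be looked at in
isolation. The helper below is a hypothetical standalone function
(`ages_consistent` is not part of `csvvalidator`). It assumes that `age_months`
holds the total age in months, and it expresses the rule as a range comparison,
which stays well defined when `age_years` is 0::

    def ages_consistent(age_years, age_months):
        """Return True if the total age in months falls within the given year of age."""
        lower = age_years * 12          # first month of that year of age
        upper = (age_years + 1) * 12    # first month of the following year
        return lower <= age_months < upper

    checks = [
        ages_consistent(0, 5),     # infant: 5 months, 0 full years
        ages_consistent(20, 250),  # 20 years and 10 months
        ages_consistent(20, 260),  # 260 months is already 21 years
    ]

A record check could call such a helper and raise an error whenever it returns
`False`.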

For more complex use cases you can also sub-class `CSVValidator` to define 
re-usable validator classes for specific data sources.

The source code for this module lives at: 

    https://github.com/alimanfoo/csvvalidator

For a complete account of all of the functionality available from this module, 
see the example.py and tests.py modules in the source code repository.

