A small API to read and analyze CSV files by inferring types for each column of data.
Currently, only int, float and string types are supported.
def cast(table)
cast converts every value in table to the type of its column, as
recorded in the table's types field.
The only special case here is missing values or NULL columns. If a
value is missing or a column has type NULL (i.e., all values are
missing), then the value is replaced with None.
N.B. cast is idempotent, i.e., cast(x) = cast(cast(x)).
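The behaviour described for cast can be illustrated with a minimal stand-in. This is a sketch, not the module's implementation: cast_sketch is a hypothetical name, and it assumes that a NULL column is marked by a type of None and that a missing value is either None or the empty string.

```python
from collections import namedtuple

# Stand-in for the Table namedtuple described in this document.
Table = namedtuple('Table', ['types', 'names', 'rows'])


def cast_sketch(table):
    """Hypothetical re-implementation of the behaviour described for cast."""
    new_rows = []
    for row in table.rows:
        new_row = []
        for typ, cell in zip(table.types, row):
            # Assumption: NULL columns have type None; missing values are
            # None or the empty string. Both become None.
            if typ is None or cell is None or cell == '':
                new_row.append(None)
            else:
                new_row.append(typ(cell))
        new_rows.append(new_row)
    return Table(table.types, table.names, new_rows)


raw = Table([int, float, str, None],
            ['id', 'score', 'name', 'notes'],
            [['1', '2.5', 'ann', ''], ['2', '', 'bob', '']])
once = cast_sketch(raw)
twice = cast_sketch(once)  # casting again changes nothing
```

Because already-converted values and None pass through unchanged, the second call returns a table equal to the first, which is the idempotence property noted above.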
def cell_str(cell_contents)
cell_str is a convenience function for converting cell contents
to a string when there are still NULL values.
N.B. If you choose to work with data while keeping NULL values, you will likely need to write more functions similar to this one.
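A helper of this kind might look roughly as follows. The name cell_str_sketch is hypothetical, and rendering NULL as the empty string is an assumption; the original may choose a different placeholder.

```python
def cell_str_sketch(cell_contents):
    """Hypothetical NULL-aware string conversion."""
    # Assumption: NULL cells are represented by None and are rendered
    # as the empty string.
    if cell_contents is None:
        return ''
    return str(cell_contents)


# Joining a row that still contains a NULL value.
line = ','.join(cell_str_sketch(c) for c in [1, None, 'ann'])
```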
def column(table, colname)
column returns a named tuple Column of the column in
table with name colname.
def columns(table)
columns returns a list of all columns in the data set, where each
column has type Column.
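As a rough illustration of how column and columns relate to the row-oriented Table, the sketch below pulls one field out of every row. The names column_sketch and columns_sketch are hypothetical, and the namedtuples are local stand-ins with the field order given at the end of this document.

```python
from collections import namedtuple

# Local stand-ins for the namedtuples described in this document.
Table = namedtuple('Table', ['types', 'names', 'rows'])
Column = namedtuple('Column', ['type', 'name', 'cells'])


def column_sketch(table, colname):
    """Hypothetical: build a Column from the cells under colname."""
    i = table.names.index(colname)  # raises ValueError for unknown names
    return Column(table.types[i], colname, [row[i] for row in table.rows])


def columns_sketch(table):
    """Hypothetical: one Column per header, in header order."""
    return [column_sketch(table, name) for name in table.names]


people = Table([str, int], ['name', 'age'], [['ann', 31], ['bob', 45]])
ages = column_sketch(people, 'age')
all_cols = columns_sketch(people)
```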
def convert_columns(table, **kwargs)
convert_columns executes converter functions on specific columns,
where the parameter names for kwargs are the column names, and
the parameter values are functions of one parameter that return a
single value.
For example,
convert_columns(table, colname=lambda s: s.lower())
would convert all values in the column with name colname to
lowercase.
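The keyword-argument dispatch can be sketched with a self-contained stand-in. convert_columns_sketch is a hypothetical name. Two further assumptions: that a new Table is returned rather than the input being modified, and that the converter is called on every cell of the named column, including any NULL cells.

```python
from collections import namedtuple

# Stand-in for the Table namedtuple described in this document.
Table = namedtuple('Table', ['types', 'names', 'rows'])


def convert_columns_sketch(table, **kwargs):
    """Hypothetical: apply kwargs[name] to every cell in column name."""
    new_rows = []
    for row in table.rows:
        new_row = []
        for name, cell in zip(table.names, row):
            f = kwargs.get(name)  # None when no converter was given
            new_row.append(cell if f is None else f(cell))
        new_rows.append(new_row)
    return Table(table.types, table.names, new_rows)


people = Table([str, int], ['city', 'age'],
               [['Boston', 31], ['DENVER', 45]])
lowered = convert_columns_sketch(people, city=lambda s: s.lower())
```

One consequence of this calling convention is that only columns whose names are valid Python identifiers can be passed directly as keywords; other names would need the `**{'col name': f}` form.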
def convert_missing_cells(table, dstr='', dint=0, dfloat=0.0)
convert_missing_cells changes the values of all NULL cells to the
values specified by dstr, dint and dfloat. For example, all
NULL cells in columns with type str will be replaced with the
value given to dstr.
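A minimal sketch of the per-type replacement follows. It assumes NULL cells are None and column types are the Python types str, int and float; convert_missing_cells_sketch is a hypothetical name, not the module's own function.

```python
from collections import namedtuple

# Stand-in for the Table namedtuple described in this document.
Table = namedtuple('Table', ['types', 'names', 'rows'])


def convert_missing_cells_sketch(table, dstr='', dint=0, dfloat=0.0):
    """Hypothetical: replace None cells with a per-type default."""
    defaults = {str: dstr, int: dint, float: dfloat}
    new_rows = []
    for row in table.rows:
        new_row = []
        for typ, cell in zip(table.types, row):
            if cell is None and typ in defaults:
                new_row.append(defaults[typ])
            else:
                # Present values, and cells in columns of any other
                # type, are left untouched.
                new_row.append(cell)
        new_rows.append(new_row)
    return Table(table.types, table.names, new_rows)


sparse = Table([str, int, float], ['name', 'n', 'x'],
               [[None, None, None], ['a', 1, 2.0]])
filled = convert_missing_cells_sketch(sparse, dstr='?', dint=-1)
```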
def convert_types(table, fstr=None, fint=None, ffloat=None)
convert_types works just like convert_columns, but on
types instead of specific columns.
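To make the contrast with convert_columns concrete, here is a sketch that selects the converter by column type rather than by column name. convert_types_sketch is a hypothetical name, and the sketch assumes column types are the Python types str, int and float.

```python
from collections import namedtuple

# Stand-in for the Table namedtuple described in this document.
Table = namedtuple('Table', ['types', 'names', 'rows'])


def convert_types_sketch(table, fstr=None, fint=None, ffloat=None):
    """Hypothetical: apply one converter per column type."""
    converters = {str: fstr, int: fint, float: ffloat}
    new_rows = []
    for row in table.rows:
        new_row = []
        for typ, cell in zip(table.types, row):
            f = converters.get(typ)  # None when no converter applies
            new_row.append(cell if f is None else f(cell))
        new_rows.append(new_row)
    return Table(table.types, table.names, new_rows)


mixed = Table([str, int], ['s', 'n'], [['Ab', 1], ['cD', 2]])
converted = convert_types_sketch(mixed, fstr=str.upper,
                                 fint=lambda n: n + 1)
```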
def frequencies(column)
frequencies returns a dictionary where the keys are unique values
in the column, and the values correspond to the frequency of each
value in the column.
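The counting itself can be sketched with collections.Counter from the standard library. frequencies_sketch is a hypothetical name, and counting None as an ordinary value is an assumption about how NULL cells are treated.

```python
from collections import Counter, namedtuple

# Stand-in for the Column namedtuple described in this document.
Column = namedtuple('Column', ['type', 'name', 'cells'])


def frequencies_sketch(column):
    """Hypothetical: map each distinct cell value to its count."""
    return dict(Counter(column.cells))


colors = Column(str, 'color', ['red', 'blue', 'red', None])
counts = frequencies_sketch(colors)
```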
def map_data(table, f)
map_data executes f on every cell in table with five
arguments, in order: column type, column name, row index, column
index, contents. The result of the function is placed in the
corresponding cell location.
A new Table is returned with the converted values.
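The five-argument callback can be illustrated as follows. This is a sketch using the hypothetical names map_data_sketch and double_ints, built on a local stand-in for Table; it follows the argument order given above.

```python
from collections import namedtuple

# Stand-in for the Table namedtuple described in this document.
Table = namedtuple('Table', ['types', 'names', 'rows'])


def map_data_sketch(table, f):
    """Hypothetical: call f(type, name, row index, column index, contents)."""
    new_rows = []
    for r, row in enumerate(table.rows):
        new_rows.append([f(table.types[c], table.names[c], r, c, cell)
                         for c, cell in enumerate(row)])
    return Table(table.types, table.names, new_rows)


def double_ints(typ, name, row_index, col_index, contents):
    # Only touch non-NULL cells in int columns; return the rest as-is.
    if typ is int and contents is not None:
        return contents * 2
    return contents


data = Table([int, str], ['n', 's'], [[1, 'a'], [None, 'b']])
doubled = map_data_sketch(data, double_ints)
```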
def map_names(table, f)
map_names executes f on every column header in table, with
three arguments, in order: column type, column index, column
name. The result of the function is placed in the corresponding
header location.
A new Table is returned with the new column names.
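A header-renaming callback might be used along these lines. map_names_sketch is a hypothetical stand-in that follows the argument order given above (type, index, name).

```python
from collections import namedtuple

# Stand-in for the Table namedtuple described in this document.
Table = namedtuple('Table', ['types', 'names', 'rows'])


def map_names_sketch(table, f):
    """Hypothetical: call f(column type, column index, column name)."""
    new_names = [f(typ, i, name)
                 for i, (typ, name) in enumerate(zip(table.types,
                                                     table.names))]
    return Table(table.types, new_names, table.rows)


data = Table([int, str], ['n', 's'], [[1, 'a']])
# Prefix each header with its index and upper-case it.
tagged = map_names_sketch(data,
                          lambda typ, i, name: '%d:%s' % (i, name.upper()))
```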
def print_data_table(table)
print_data_table is a convenience function for pretty-printing
the data in tabular format, including header names and type
annotations.
def read(fname, delimiter=',', skip_header=False)
read loads cell data, column headers and type information
for each column given a file path to a CSV formatted file. A
Table namedtuple is returned with fields types,
names and rows.
All cells have left and right whitespace trimmed.
All rows must be the same length.
delimiter is the string that separates each field in a row.
If skip_header is set, then no column headers are read, and
column names are set to their corresponding indices (as strings).
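One plausible way to infer column types while reading is sketched below. Everything here is an assumption rather than the module's actual algorithm: read_sketch and infer_type are hypothetical names; the sketch takes an open file object instead of a path so that it is self-contained; empty cells are treated as missing; a column is int if every present value parses as int, float if every present value parses as float, NULL (None) if no values are present, and str otherwise. It also assumes that skip_header means the first line is data rather than a header to discard.

```python
import csv
import io
from collections import namedtuple

# Stand-in for the Table namedtuple described in this document.
Table = namedtuple('Table', ['types', 'names', 'rows'])


def infer_type(cells):
    """Hypothetical: pick the narrowest type that fits all present cells."""
    present = [c for c in cells if c != '']
    if not present:
        return None  # assumption: None marks a NULL column
    for typ in (int, float):
        try:
            for c in present:
                typ(c)
            return typ
        except ValueError:
            continue  # at least one cell did not parse; try the next type
    return str


def read_sketch(f, delimiter=',', skip_header=False):
    """Hypothetical: read CSV text from file object f into a Table."""
    rows = [[cell.strip() for cell in row]
            for row in csv.reader(f, delimiter=delimiter)]
    if skip_header:
        # Assumption: every line is data; names are the column indices.
        names = [str(i) for i in range(len(rows[0]))]
    else:
        names, rows = rows[0], rows[1:]
    if any(len(row) != len(names) for row in rows):
        raise ValueError('all rows must be the same length')
    types = [infer_type(col) for col in zip(*rows)]
    return Table(types, names, rows)


text = "id, score, name, notes\n1, 2.5, ann,\n2, , bob,\n"
table = read_sketch(io.StringIO(text))
headerless = read_sketch(io.StringIO("1;2\n3;4\n"),
                         delimiter=';', skip_header=True)
```

In this sketch the cells stay as trimmed strings after reading; converting them to the inferred types is left to a separate step such as cast.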
class Column
Column(type, name, cells)
class Table
Table(types, names, rows)
Documentation generated by pdoc.