*****
Usage
*****

==========
Invocation
==========

``DataMatrix`` requires a file or file-like object. A typical invocation is::

	import datamatrix
	matrix = datamatrix.DataMatrix(open("somefile"), header=True)

Aside from the file object, which is mandatory, a number of optional parameters are available. The ``header`` parameter tells ``DataMatrix`` whether the file has a header; if it does, the header is used to name the columns, otherwise each column is identified by its number. To specify the column that holds the row names, use the ``row_names`` parameter: ::

	matrix = datamatrix.DataMatrix(open("somefile"), header=True, row_names=1)

In this case, row names are obtained from the first column in the file. 

If you are loading a file whose header has an empty first element (as is the case with files saved by R), you must set the ``fixR`` parameter to ``True`` to work around the issue; otherwise you will obtain unpredictable results.

``DataMatrix`` uses the ``csv`` module for parsing, so you can pass additional parameters to define the format of your data, such as ``delimiter`` (the separator between fields), ``lineterminator`` and ``quoting`` (how to deal with non-numeric fields). See the :mod:`csv` module documentation for additional details.
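
To illustrate what these format parameters mean, here is a minimal sketch that calls the standard ``csv`` module directly, without ``DataMatrix``. The sample lines are invented for the example, and it is an assumption that ``DataMatrix`` hands such keywords to the ``csv`` module unchanged:

```python
import csv

# Made-up sample data: semicolon-separated, with quoted text fields.
lines = [
    '"id";"name";"score"',
    '1;"Albert";9.5',
    '2;"Groucho";7.25',
]

# delimiter sets the field separator; QUOTE_NONNUMERIC makes the reader
# convert every unquoted field to a float and leave quoted fields as text.
reader = csv.reader(lines, delimiter=";", quoting=csv.QUOTE_NONNUMERIC)
rows = list(reader)

print(rows[1])
```

With these settings the quoted header fields stay strings, while the unquoted values in the data rows come back as floats.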

Notice that since the ``csv`` module does not support Unicode input, using Unicode text with ``DataMatrix`` may give unpredictable results.

Lastly, you can tell the initializer to skip a certain number of lines using the ``skip`` parameter.

.. versionadded:: 0.9
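
The idea of skipping lines before parsing can be sketched with the standard library alone. The ``read_skipping`` helper and the sample lines below are hypothetical and are not part of ``DataMatrix``; the sketch also assumes that the skipped lines are counted from the start of the file:

```python
import csv
from itertools import islice

# Made-up input: two comment lines precede the real header.
lines = [
    "# exported by a hypothetical tool",
    "# second comment line",
    "name,surname",
    "Albert,Einstein",
    "Groucho,Marx",
]


def read_skipping(iterable, skip=0):
    """Parse CSV rows after discarding the first `skip` lines."""
    return list(csv.reader(islice(iterable, skip, None)))


rows = read_skipping(lines, skip=2)
header, data = rows[0], rows[1:]
print(header)
```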

.. seealso::

   Module :mod:`csv`
      Documentation of the :mod:`csv` standard module.

================
Basic operations
================

If you print a ``DataMatrix`` instance, you'll get some basic information: ::

	>>> print matrix
	    File name:
	    Column with identifier names: None (numeric)
	    No. of rows: 2
	    No. of columns: 2
	    Columns: Name, surname

With the ``columns`` attribute you can view the columns as a list: ::

	>>> print matrix.columns
	    ['Name', 'surname']

Row names can be printed instead with the ``rownames`` attribute.

You can access specific rows with the ``getRow`` method: ::

	>>> matrix.getRow(1)
	    ['1', 'Albert', 'Einstein']

Or specific columns with a dictionary-like syntax: ::

	>>> matrix["surname"]
	    ['Einstein', 'Marx']

.. versionchanged:: 0.8
        Versions of ``DataMatrix`` prior to 0.8 used the ``getColumn`` method for this purpose. That method is now deprecated and will be removed in a future version.

To get a representation of your data, there is the ``view`` method: ::

	>>> matrix.view()
	    1 Albert Einstein
	    2 Groucho Marx

===========================
Row and column manipulation
===========================

Columns and rows can be appended with the ``append`` and ``appendRow`` methods, respectively. In both cases, the item to be appended must be a sequence (list or tuple) that is as long as the existing columns (when appending a column) or covers all the columns (when appending a row): ::

	>>> profession = ["scientist", "comedian"] # new column
	>>> matrix.append(profession, "Job")

	>>> entry = ["Isaac", "Asimov", "writer"] # new row
	>>> matrix.appendRow(entry,"3")

Notice that when you append a column or a row you must also pass its name to the method, as the examples above show. The sequence you are appending must have the same length as the columns (or rows) already present in the ``DataMatrix`` instance.

Alternatively, you can insert columns and rows at a specified position using the ``insert`` (for columns) and ``insertRow`` (for rows) methods. They behave exactly like the ``append*`` methods, except that you must also supply an integer argument (1 or greater) giving the column or row number: ::

	>>> matrix.insert(profession,"Job",2)
	>>> matrix.insertRow(entry,"3",1)

.. versionadded:: 0.7

If the number is greater than the number of columns or rows available, the method automatically defaults to the append variant. Again, rows and columns must be of the same length as the ones already present in the instance.
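
The positional rule described above can be sketched on a plain list. The ``insert_at`` helper is hypothetical and is not the ``DataMatrix`` implementation; in particular, it assumes that position ``N`` means "become the N-th item", which the text above does not spell out:

```python
def insert_at(items, item, position):
    """Insert `item` at a 1-based `position`; append when out of range."""
    if position < 1:
        raise ValueError("position must be 1 or greater")
    if position > len(items):
        # Beyond the last item: fall back to a plain append.
        items.append(item)
    else:
        # Convert the 1-based position to a 0-based list index.
        items.insert(position - 1, item)
    return items


columns = ["Name", "surname"]
insert_at(columns, "Job", 2)      # becomes the second column
insert_at(columns, "Country", 9)  # beyond the end, so it is appended
print(columns)
```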

.. versionadded:: 0.9
        You can bind multiple ``DataMatrix`` instances using the ``cbind`` and ``rbind`` functions, which join matrices by columns and by rows, respectively.

An example::

        >>> new_matrix = datamatrix.cbind(matrix1, matrix2)
        >>> new_matrix = datamatrix.rbind(matrix1, matrix2)


Attempting to bind matrices of unequal lengths (rows or columns, depending on the function used) raises a ``ValueError`` exception.
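
The length requirement is easier to see on plain lists of rows. The following is only a sketch of the concept: ``bind_columns`` and ``bind_rows`` are hypothetical helpers written for this example, not the module's ``cbind`` and ``rbind``:

```python
def bind_columns(first, second):
    """Join two row lists side by side; both need the same number of rows."""
    if len(first) != len(second):
        raise ValueError("matrices have different numbers of rows")
    return [left + right for left, right in zip(first, second)]


def bind_rows(first, second):
    """Stack two row lists; every row needs the same number of columns."""
    widths = set(len(row) for row in first + second)
    if len(widths) > 1:
        raise ValueError("matrices have different numbers of columns")
    return first + second


names = [["Albert", "Einstein"], ["Groucho", "Marx"]]
jobs = [["scientist"], ["comedian"]]
extra = [["Isaac", "Asimov"]]

by_columns = bind_columns(names, jobs)  # two rows, three columns
by_rows = bind_rows(names, extra)       # three rows, two columns
```

Calling ``bind_rows(names, jobs)`` in this sketch raises ``ValueError``, because the two inputs do not have the same number of columns.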

==========
Subsetting
==========

.. versionadded:: 0.9

You can generate subsets of your matrices with the ``subset`` function, passing a list of column names as a parameter. The result is a new ``DataMatrix`` instance::

        new_matrix = datamatrix.subset(old_matrix, ["Supplier", "Price"])
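
Conceptually, subsetting by column name amounts to looking up each requested name in the header and keeping only those positions in every row. The sketch below shows this on plain lists; ``subset_columns`` and the sample data are hypothetical and do not reflect how ``DataMatrix`` stores its data:

```python
def subset_columns(header, rows, wanted):
    """Return (header, rows) restricted to the `wanted` column names."""
    # list.index raises ValueError if a requested name is not in the header.
    indices = [header.index(name) for name in wanted]
    return list(wanted), [[row[i] for i in indices] for row in rows]


header = ["Supplier", "Item", "Price"]
rows = [
    ["Acme", "anvil", "120"],
    ["Initech", "stapler", "15"],
]

new_header, new_rows = subset_columns(header, rows, ["Supplier", "Price"])
print(new_rows)
```

The original ``header`` and ``rows`` are left untouched, mirroring the fact that ``subset`` returns a new instance.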

===========================================
Further manipulation of DataMatrix objects
===========================================

.. versionadded:: 0.8

For some special uses, a number of functions have been provided. ``elementApply`` applies a function to the whole matrix, ``matrixApply`` applies a function to either rows or columns, giving a single result, while ``filterMatrix`` can be used to filter rows depending on the content of a specific column. For further information, refer to the documentation strings of those functions.
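
The three kinds of operation can be sketched on a plain list of rows. The helpers below are hypothetical stand-ins written for this example; their names and signatures are assumptions and do not match ``elementApply``, ``matrixApply`` or ``filterMatrix``, whose exact arguments are described in their documentation strings:

```python
rows = [
    ["Albert", "Einstein", "1879"],
    ["Groucho", "Marx", "1890"],
]


def element_apply(rows, func):
    """Apply `func` to every single element, keeping the matrix shape."""
    return [[func(value) for value in row] for row in rows]


def rows_apply(rows, func):
    """Apply `func` to each whole row, giving one result per row."""
    return [func(row) for row in rows]


def filter_rows(rows, column_index, predicate):
    """Keep only the rows whose value in one column satisfies `predicate`."""
    return [row for row in rows if predicate(row[column_index])]


upper = element_apply(rows, str.upper)
joined = rows_apply(rows, " ".join)
later = filter_rows(rows, 2, lambda year: int(year) > 1880)
```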

You can also transpose the matrix (swap its rows and columns) with the help of the ``transpose`` function.

Also, two convenience functions have been provided to quickly calculate the mean of rows or columns: they are ``meanRows`` and ``meanColumns``, respectively.
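
As a rough illustration of what row and column means are, the sketch below computes both on a plain list of rows. It does not use ``DataMatrix``; the ``mean`` helper is hypothetical, and the values are kept as strings on the assumption that they come straight from a parsed text file:

```python
def mean(values):
    """Arithmetic mean of a sequence of numbers or numeric strings."""
    numbers = [float(value) for value in values]
    return sum(numbers) / len(numbers)


rows = [
    ["1", "2", "3"],
    ["4", "5", "9"],
]

row_means = [mean(row) for row in rows]
# zip(*rows) transposes the rows, so each item is one column.
column_means = [mean(column) for column in zip(*rows)]

print(row_means)     # [2.0, 6.0]
print(column_means)  # [2.5, 3.5, 6.0]
```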

.. versionadded:: 0.9
        The ``apply`` function can be used to apply a function to a specific column, either to each element, or to the column as a whole.

=========================
Saving DataMatrix objects
=========================

You can write ``DataMatrix`` objects to files or file-like objects with the ``writeMatrix`` function present in the module::

	fh = open("somefile.txt","w")
	datamatrix.writeMatrix(matrix,fh)

Output formatting is again controlled through options passed on to the ``csv`` module. Optionally, you can save only some of the columns, specified as a list: ::

	datamatrix.writeMatrix(matrix, fh, columns = ["Name","Job"])

If you want the header (column names) to be included, you need to set the ``header`` parameter to ``True``: ::

	datamatrix.writeMatrix(matrix, fh, header = True)
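
To show how a column list, a header switch and ``csv`` formatting options fit together, here is a minimal standard-library sketch written for Python 3. The ``write_rows`` helper, its parameter names and the sample data are hypothetical; it is not the implementation of ``writeMatrix``:

```python
import csv
import io

header = ["Name", "surname", "Job"]
rows = [
    ["Albert", "Einstein", "scientist"],
    ["Groucho", "Marx", "comedian"],
]


def write_rows(handle, header, rows, columns=None, write_header=False, **fmt):
    """Write `rows` to `handle`, optionally keeping only `columns`."""
    names = list(columns) if columns is not None else list(header)
    indices = [header.index(name) for name in names]
    # Extra keyword arguments are passed straight to csv.writer.
    writer = csv.writer(handle, **fmt)
    if write_header:
        writer.writerow(names)
    for row in rows:
        writer.writerow([row[i] for i in indices])


# An in-memory buffer stands in for a real file object.
buffer = io.StringIO()
write_rows(buffer, header, rows, columns=["Name", "Job"],
           write_header=True, delimiter="\t", lineterminator="\n")
output = buffer.getvalue()
```

Here ``output`` holds a tab-separated header line followed by one line per row, with only the two requested columns.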

