==================================================
pyfasta: pythonic access to fasta sequence files.
==================================================


:Author: Brent Pedersen (brentp)
:Email: bpederse@gmail.com
:License: MIT


Implementation
==============

Requires Python >= 2.5. pyfasta stores a flattened copy of the fasta file, with
headers and whitespace removed, alongside a pickled index that records the
start and stop offsets of each header's sequence in the flattened file (used
internally for seeking).
FastaRecords also support the numpy array interface.
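
The flatten-and-index idea can be sketched in plain Python. This is a minimal
illustration written for Python 3, not pyfasta's actual code: the ``flatten``
helper and the in-memory buffer standing in for the flat file are assumptions
made for the example.

::

    import io
    import pickle

    def flatten(fasta_lines):
        """Hypothetical helper: return (flat, index), where flat is the
        sequence with headers and line breaks removed and index maps
        each header to its (start, stop) offsets in flat."""
        chunks, index = [], {}
        name, start, pos = None, 0, 0
        for line in fasta_lines:
            line = line.strip()
            if not line:
                continue
            if line.startswith('>'):
                if name is not None:
                    index[name] = (start, pos)  # close the previous record
                name, start = line[1:], pos
            else:
                chunks.append(line)
                pos += len(line)
        if name is not None:
            index[name] = (start, pos)          # close the last record
        return ''.join(chunks), index

    flat, index = flatten(['>chr1', 'ACTG', 'ACTG', '>chr2', 'AAAA'])

    # the index survives a pickle round trip, so it can be saved on disk
    index = pickle.loads(pickle.dumps(index))

    # seek straight to one record without reading the ones before it;
    # a BytesIO buffer stands in for the flat file here
    handle = io.BytesIO(flat.encode('ascii'))
    start, stop = index['chr2']
    handle.seek(start)
    chr2 = handle.read(stop - start).decode('ascii')

Because every record is a contiguous byte range, fetching a sequence costs one
seek and one read, however large the fasta file is.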


Usage
=====

::

    >>> from pyfasta import Fasta

    >>> f = Fasta('tests/data/three_chrs.fasta')
    >>> sorted(f.keys())
    ['chr1', 'chr2', 'chr3']

    >>> f['chr1']
    FastaRecord('tests/data/three_chrs.fasta.flat', 0..80)

Slicing
-------
::

    >>> f['chr1'][:10]
    'ACTGACTGAC'

    # get the 1st basepair in every codon (it's python yo)
    >>> f['chr1'][::3]
    'AGTCAGTCAGTCAGTCAGTCAGTCAGT'


    # the index maps each header to the start and stop offsets of its
    # sequence in the flat file.
    # (you should never need this)
    >>> f.index
    {'chr3': (160, 3760), 'chr2': (80, 160), 'chr1': (0, 80)}


    # can query by a 'feature' dictionary
    >>> f.sequence({'chr': 'chr1', 'start': 2, 'stop': 9})
    'CTGACTGA'

    # with reverse complement for - strand
    >>> f.sequence({'chr': 'chr1', 'start': 2, 'stop': 9, 'strand': '-'})
    'TCAGTCAG'
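
The feature queries above behave as if ``start`` and ``stop`` were 1-based and
inclusive, with the ``-`` strand returned as the reverse complement. The sketch
below reproduces that behaviour on a plain string. It is written for Python 3,
and ``feature_sequence`` is a hypothetical helper, not part of pyfasta; the
coordinate convention is an assumption inferred from the output shown.

::

    # translation table for complementing bases, case preserved
    COMPLEMENT = str.maketrans('ACGTacgt', 'TGCAtgca')

    def feature_sequence(seq, feature):
        """Hypothetical helper: slice seq by a feature dictionary."""
        # assumed convention: start/stop are 1-based and inclusive,
        # so shift start by one for Python's 0-based slicing
        sub = seq[feature['start'] - 1:feature['stop']]
        if feature.get('strand') == '-':
            # complement each base, then reverse the result
            sub = sub.translate(COMPLEMENT)[::-1]
        return sub

    chr1_start = 'ACTGACTGAC'  # first 10 bases of chr1, as shown above
    plus = feature_sequence(chr1_start, {'start': 2, 'stop': 9})
    minus = feature_sequence(chr1_start,
                             {'start': 2, 'stop': 9, 'strand': '-'})

Here ``plus`` is ``'CTGACTGA'`` and ``minus`` is ``'TCAGTCAG'``, matching the
two queries above.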


Numpy Array Interface
---------------------
::

    # FastaRecords support the numpy array interface.
    >>> import numpy as np
    >>> a = np.array(f['chr2'])
    >>> a.shape[0] == len(f['chr2'])
    True

    >>> a[10:14]
    array(['A', 'A', 'A', 'A'], 
          dtype='|S1')


    # cleanup (though for real use these will remain for faster access)
    >>> import os
    >>> os.unlink('tests/data/three_chrs.fasta.gdx')
    >>> os.unlink('tests/data/three_chrs.fasta.flat')
