===========
 Sunflower
===========
by Michael Hoffman <mmh1 at washington dot edu>

.. contents::

Description
===========
Sunflower models the simultaneous binding of transcription factors to
DNA. It uses a hidden Markov model that resembles a sunflower.

Quick start
===========
.. note:: Do not include the leading indents when copying the code below.

After you have the prerequisites listed below set up, try::

  python setup.py build
  python setup.py install

  sunflower --resource human test/data/brca_tss.fna brca.h5
  sunreport --include=BRCA2 brca.h5 > brca2.sunreport.tab

If you want to look at the output in R, try this::

  library(lattice)
  d.brca2 = read.delim("brca2.sunreport.tab")
  xyplot(prob ~ pos | state, d.brca2, type = "l")
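
If you would rather inspect the report from Python, here is a minimal
sketch that groups the probabilities by state. It assumes, based only
on the R example above, that the report is tab-delimited with a header
row naming ``pos``, ``prob``, and ``state`` columns, and that ``pos``
and ``prob`` are numeric. The function name ``read_sunreport`` is made
up for illustration and is not part of Sunflower. It is written for
Python 3 rather than the Python 2.5.1 listed under the prerequisites::

  import csv
  from collections import defaultdict


  def read_sunreport(filename):
      """Return {state: [(pos, prob), ...]} from a tab-delimited report."""
      series = defaultdict(list)
      with open(filename, newline="") as infile:
          for row in csv.DictReader(infile, delimiter="\t"):
              # Column names are assumed from the R example above.
              point = (float(row["pos"]), float(row["prob"]))
              series[row["state"]].append(point)
      return dict(series)

For example, ``read_sunreport("brca2.sunreport.tab")`` would return a
dictionary with one list of (position, probability) pairs per state.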

Prerequisites
=============
You must have the following installed before installing Sunflower:

- a C99 compiler (I use GCC_ 3.4.4)
- HDF5_ 1.6.5
- Python_ 2.5.1

Additionally, the Sunflower setup script will automatically install
the following packages or newer versions:

- NumPy_ 1.0.5
- path.py 2.2
- PyTables_ 2.0
- setuptools 0.6c7
- textinput 0.1.1

In order for automatic prerequisite installation to work, you need
either write access to your Python installation's site-packages
directory, or setuptools configured to install into your home
directory automatically (see the hints below).
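
To see which of the Python packages listed above are already
importable, something like the following sketch may help. The import
names ``numpy``, ``tables`` (PyTables), and ``setuptools`` are the
usual ones; ``path`` and ``textinput`` are assumptions about how
path.py and textinput are imported. The helper ``missing_modules`` is
hypothetical, and the code is written for Python 3::

  import importlib.util

  # Import names to look for; "path" and "textinput" are assumed names.
  MODULES = ["numpy", "path", "tables", "setuptools", "textinput"]


  def missing_modules(names):
      """Return the names that cannot be found on the import path."""
      return [name for name in names
              if importlib.util.find_spec(name) is None]

Calling ``missing_modules(MODULES)`` returns the modules that the
setup script would still have to install. It does not check version
numbers.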

.. _GCC: http://gcc.gnu.org/
.. _HDF5: http://www.hdfgroup.org/
.. _NumPy: http://numpy.scipy.org/
.. _Python: http://www.python.org/
.. _PyTables: http://www.pytables.org/

Optional
--------
- Poly 0.1.1, for running on distributed systems
- Pyrex 0.9.5.1a, for hacking Pyrex source files
- docsql and MySQLdb, for using a MySQL database for output

Hints on installing the prerequisites in your home directory
------------------------------------------------------------
To install the prerequisites, the easiest thing to do is use your
system's package manager. This may require you to be root. If you
aren't, here are some hints on how to install things within your home
directory. Check to make sure the packages aren't already available on
your system first.

Configuration and directory setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The steps below create directories in which to store your own
libraries. In case you are using a multiplatform system, binary
libraries go into a platform-specific directory under your home
directory. For example, on a Linux x86_64 system, platform-specific
code goes in ``~/arch/Linux-x86_64``.

You'll need something like this in your ``~/.bashrc`` to determine
your platform and set up appropriate paths::

  export ARCH=$(uname)-$(uname -m)

  export PATH=${HOME}/arch/${ARCH}/bin:${HOME}/arch/${ARCH}/opt/python-2.5.1/bin:${HOME}/bin:${PATH}

  unset PYTHONHOME

  PYTHON_VERSION=2.5
  export PYTHONPATH=${HOME}/arch/${ARCH}/lib/python${PYTHON_VERSION}:${HOME}/lib/python${PYTHON_VERSION}

  export HDF5_DIR=${HOME}/arch/${ARCH}/opt/hdf5-1.6.6

You'll also need to load this configuration for the rest of these
hints, and create the appropriate directories. Run this from the bash
prompt::

  source ~/.bashrc
  mkdir -p ${HOME}/arch/${ARCH}/lib/python${PYTHON_VERSION} \
    ${HOME}/lib/python${PYTHON_VERSION} ${HOME}/arch/${ARCH}/bin \
    ${HOME}/bin ${HOME}/arch/${ARCH}/opt
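
If you want to double-check the result from Python, this sketch
rebuilds the platform string and reports which directories are still
missing. The five directories are my reading of what this section is
meant to create (two library directories, two ``bin`` directories, and
``opt``); the function names are hypothetical, and the code is written
for Python 3 on a UNIX-like system::

  import os


  def platform_tag():
      """Return the same string as $(uname)-$(uname -m)."""
      uname = os.uname()  # only available on UNIX-like systems
      return "%s-%s" % (uname[0], uname[4])


  def missing_dirs(home, arch, python_version):
      """Return the expected directories that do not exist yet."""
      lib = "python" + python_version
      expected = [
          os.path.join(home, "arch", arch, "lib", lib),
          os.path.join(home, "lib", lib),
          os.path.join(home, "arch", arch, "bin"),
          os.path.join(home, "bin"),
          os.path.join(home, "arch", arch, "opt"),
      ]
      return [path for path in expected if not os.path.isdir(path)]

``missing_dirs(os.path.expanduser("~"), platform_tag(), "2.5")``
returns an empty list once all five directories exist.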

HDF5
~~~~
Download the HDF5 source tarball, build, test, and install it::

  wget ftp://ftp.hdfgroup.org/HDF5/current/src/hdf5-1.6.6.tar.gz
  tar zxvf hdf5-1.6.6.tar.gz
  cd hdf5-1.6.6
  ./configure --prefix=${HDF5_DIR}
  make
  make check
  make install

Python
~~~~~~
If you're using MacOS or Windows, there are binaries available at
<http://www.python.org/download/>. If not, you can download the source
tarball, build, test, and install it::

  wget http://www.python.org/ftp/python/2.5.1/Python-2.5.1.tar.bz2
  tar jxvf Python-2.5.1.tar.bz2
  cd Python-2.5.1
  ./configure --prefix=${HOME}/arch/${ARCH}/opt/python-2.5.1
  LD_RUN_PATH=${HDF5_DIR}:${LD_RUN_PATH} make
  make test
  make install

Setuptools
~~~~~~~~~~
To get setuptools to automatically install packages within your home
directory, put this in ``~/.pydistutils.cfg``::

  [install]
  prefix = ~
  exec_prefix = ~/arch/$ARCH
  install_platlib = $platbase/lib/python$py_version_short
  install_purelib = $base/lib/python$py_version_short
  install_scripts = $platbase/bin

  [easy_install]
  install_dir = $platbase/lib/python$py_version_short
  script_dir = $platbase/bin

You must set ``$ARCH`` as above in your ``~/.bashrc``. The other
variables will be determined by Python.
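
To make the expansion concrete, here is a rough sketch of how the
``[install]`` entries above resolve for a given home directory,
platform, and Python version. It imitates the substitution with
``string.Template`` instead of calling distutils, and it assumes that
``$base`` takes the value of ``prefix`` and ``$platbase`` the value of
``exec_prefix``. The function name ``resolve_install_dirs`` is made up
for illustration, and the code is written for Python 3::

  import posixpath
  from string import Template

  # The [install] directory entries from ~/.pydistutils.cfg above.
  INSTALL_CONFIG = {
      "install_platlib": "$platbase/lib/python$py_version_short",
      "install_purelib": "$base/lib/python$py_version_short",
      "install_scripts": "$platbase/bin",
  }


  def resolve_install_dirs(home, arch, py_version_short):
      """Expand the configured directories for one home and platform."""
      variables = {
          "base": home,  # prefix = ~
          # exec_prefix = ~/arch/$ARCH
          "platbase": posixpath.join(home, "arch", arch),
          "py_version_short": py_version_short,
      }
      return dict((key, Template(value).substitute(variables))
                  for key, value in INSTALL_CONFIG.items())

With ``home="/home/me"``, ``arch="Linux-x86_64"``, and
``py_version_short="2.5"``, the two library directories come out as
``/home/me/arch/Linux-x86_64/lib/python2.5`` and
``/home/me/lib/python2.5``, which are the same two entries that the
``PYTHONPATH`` setting in ``~/.bashrc`` lists.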

Supported environments
----------------------
We test Sunflower on Linux-i386 and Linux-x86_64. Other UNIX-like
environments may work. Cygwin does not work at the moment, but
probably will in the future.

Program synopses
================
You should be able to run all these programs with ``--help`` to get some
of the available options.

Preparation
-----------
``pwm2sfl``
  convert JASPAR PWMs to .sfl format

``sunrecompose``
  adjust unbound state to fit provided base composition

``sunscramble``
  scramble transcription factor states randomly

Simulation
----------
``sunflower``
  simulate TF binding to the reference sequence or simulated
  mutations, and store results in an HDF5 file

Reporting
---------
``sunreport``
  report results, either aggregated or for an individual sequence

Utilities
---------
``h5attr``
  set attributes on HDF5 files

``h5cat``
  concatenate multiple HDF5 files

Included data
=============
I've included a dataset from JASPAR CORE 2006, generated using the
script in ``build_data.sh``. It includes only human and general
transcription factors, and is recomposed against human genome build
NCBI36.

Acknowledgments
===============
Thanks to Alison Meynert (pwm2sfl) and Guy Slater (fastacomp) for code
contributions.

JASPAR data included by kind permission of Boris Lenhard. If you use
this data, please cite:

  Vlieghe D, Sandelin A, De Bleser PJ, Vleminckx K, Wasserman WW, van
  Roy F, Lenhard B. "A new generation of JASPAR, the open-access
  repository for transcription factor binding site profiles." *Nucleic
  Acids Res.* 2006 January 1; 34(Database issue): D95-D97.

This material is based upon work supported under a National Science
Foundation Graduate Research Fellowship. Any opinions, findings,
conclusions or recommendations expressed in this publication are those
of the author(s) and do not necessarily reflect the views of the
National Science Foundation.

License
=======
Sunflower is available under the GPLv2 license.

Contact
=======
Please let me <hoffman+sunflower@ebi.ac.uk> know if you have any
comments on the installation or use of Sunflower. I would love to know
if you get it working on a system not listed above.

