Metadata-Version: 1.1
Name: mempamal
Version: 0.1.5
Summary: MEMPAMAL: Means for EMbarrassingly PArallel MAchine Learning
Home-page: https://github.com/BenoitDamota/mempamal
Author: Benoit Da Mota
Author-email: damota.benoit@gmail.com
License: BSD 3-clause
Description: .. -*- mode: rst -*-
        
        MEmPaMaL
        ========
        
        Means for EMbarrassingly PArallel MAchine Learning.  
        
        *Même pas mal!* is a french exclamation standing for *Did not hurt!*.
        So, if your computer cannot manage your machine learning load, just
        respond MEmPaMaL and give it a try.
        
        MEmPaMaL is a Python module for some machine learning workflows built on top of
        Scikit-learn and distributed under the 3-Clause BSD license.
        
        Purpose
        -------
        
        MEmPaMaL is a set of python helpers to produce and run some 
        embarrassingly parallel machine learning workflows. The goal of
        MEmPaMaL is to produce a list of commands and dependencies. We state
        that today, commands and files are a very portable and effective
        approach for quick and dirty data exploration or new algorithms
        development. *If your workflow runs on your personal computer but takes
        too many times, MEmPaMal can help you to scale out!*
        
        For the machine learning description it relies on the Scikit-learn [1]
        design (pipelines, estimators with fit/predict, transformers with
        fit/transform, etc.) and can accept estimators that respect the
        Scikit-learn conventions. One possible back-end to execute the list of
        commands and dependencies is Soma-Workflow [2]. This approach was
        successfully applied to personal multicore computers, clusters (with
        more than 2,000 cores) and cloud. To see more on the origins and
        motivations of this project, you can read [3].
        
        So, what is this set of machine learning workflows and the restrictions?
        
        Restrictions 
        ------------ 
        
        - **Suited for supervised learning**. For a given set of samples we
          measure features, an array denoted X of shape (n_samples,
          n_features), and targets, an array y of shape (n_samples,
          n_targets). The goal is to construct a predictive model and to
          measure its goodness of fit on unseen data (see examples for more
          insight).
        
        - **Did you say Big Data?** MEmPaMaL don't care about this marketing
          term, it just some scripts to help you distribute your computation
          as seamlessly as possible. If your code and computing infrastructure
          is able to perform *Big Data* analysis with simple files and
          commands, it should work but it is NOT the purpose of MEmPaMaL.
        
        - **Need your knowledge**. You need do understand what you do. For
          instance, most of the times multi-targets are inappropriate for a
          given estimator. Knowing about your data is probably more important
          than being able to compute many models.
        
        - **It's not magic!** You need the computing resources corresponding
          to your load then MEmPaMaL can help you to scale out.
        
        - **It's really not magic!** If your algorithm is not efficient or
          consume too many memory, MEmPaMaL will just help you to run it many
          times in parallel. So, be careful of the snowball effect and if not
          sure, try it at small scale first.
        
        - **It's definitely not magic!** Some codes are already parallel
          (multiprocessing, OpenMP, etc.) and you need to understand the
          implications and make choices. For instance, you can limit OpenMP to
          one thread: ``export OMP_NUM_THREADS=1``. In some cases, you would
          like to play with the limits of your system and for that you must
          deeply understand the underlying technologies and the possibilities
          of the execution back-end.
        
        Examples
        --------
        
        A simple example:
        http://nbviewer.ipython.org/github/BenoitDamota/mempamal/blob/master/mempamal/examples/MEmPaMaL_first_example.ipynb
        
        An example with model selection:
        http://nbviewer.ipython.org/github/BenoitDamota/mempamal/blob/master/mempamal/examples/MEmPaMaL_second_example.ipynb
        
        An example with ParsimonY (another Python machine learing library):
        http://nbviewer.ipython.org/github/BenoitDamota/mempamal/blob/master/mempamal/examples/MEmPaMaL_third_example.ipynb
        
        An example with nested parallelism:
        http://nbviewer.ipython.org/github/BenoitDamota/mempamal/blob/master/mempamal/examples/MEmPaMaL_fourth_example.ipynb
        
        References
        ----------
        
        [1] Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR
        12, pp. 2825-2830, 2011. (http://scikit-learn.org)
        
        [2] Soma-workflow: a unified and simple interface to parallel
        computing resources, S. Laguitton et al., in MICCAI Workshop on High
        Performance and Distributed Computing for Medical Imaging,
        2011. (http://neurospin.github.io/soma-workflow/)
        
        [3] Machine learning patterns for neuroimaging-genetic studies in the cloud,
        B. Da Mota et al., in Frontiers in neuroinformatics, 2014.
        
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: Unix
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python
Classifier: Topic :: Software Development
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Libraries :: Python Modules
