Metadata-Version: 1.1
Name: django-analyze
Version: 0.4.21
Summary: A general purpose framework for training and testing classification algorithms.
Home-page: https://github.com/chrisspen/django-analyze
Author: Chris Spencer
Author-email: chrisspen@gmail.com
License: LGPL
Description: Django-Analyze - Framework for managing classifiers
        ===================================================
        
        Overview
        --------
        
        There are tons of amazing algorithms and machine learning tools for
        detecting patterns in data. However, what most of these lack is a useful
        framework and UI for managing the often complicated setup of the data
        flow and predictions.
        
        This package provides several tools for utilizing Django's admin
        interface and ORM to help organize and manage machine learning setups.
        
        The framework revolves around two basic objects:
        
        1. A problem, which organizes solutions to achieve some prediction goal.
           This is mainly implemented as a genetic algorithm.
        2. A predictor, which organizes a specific solution to either guess a
           numeric value (i.e. regression) or a label (i.e. classification).
        
        I made this separation to help myself with maintenance over the
        lifetime of an application. Often, I'd want to monitor the accuracy of a
        solution, but also evaluate other potential solutions without
        interrupting the solution used for production predictions. Once a
        superior solution was found, then I'd want to push it into production
        use with as little effort as possible. By explicitly representing
        different solutions as different records in the database, I found I
        could easily monitor them and slip them in and out of use as needed.
        
        Problem
        -------
        
        The ``problem`` represents a domain where we're attempting to solve some
        prediction task, by either guessing a number or guessing a label. In the
        code, this is referred to as the ``Genome``. A record in the ``Genome``
        table represents a distinct problem domain and stores all the parameters
        used to control and manage the search for solutions.
        
        From the ``Genome`` you define ``Genes``, which are parameters available
        for use when attempting to solve the problem.
        
        Specific solutions to the problem are represented by the ``Genotype``
        model, which contains a list of genes and their associated values as
        key/value pairs.
        
        To search for the best solution to a problem, you first implement a
        custom evaluation function, which takes a genotype as an argument
        and returns a non-negative number, called the fitness, representing its
        overall suitability in solving the problem. By default, a value of 0 is
        interpreted as the worst possible fitness, with increasing values
        representing increasing levels of suitability. Personally, I find it
        convenient and intuitive to bound fitness between 0 and 1, but this is
        not strictly enforced.
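        
        As a rough sketch of what such an evaluator might look like, the
        function below scores a candidate by its accuracy on a few toy samples.
        It assumes, purely for illustration, that the gene values are available
        as a plain ``dict``; the name ``evaluate_genotype``, the ``threshold``
        gene and the sample data are hypothetical and not part of this package.
        
        ::
        
            def evaluate_genotype(gene_values):
                """Return a fitness in [0, 1]; 0 is worst, larger is better.
        
                ``gene_values`` is assumed to be a dict of gene name -> value.
                """
                # Hypothetical gene: a decision threshold between 0 and 1.
                threshold = float(gene_values.get('threshold', 0.5))
        
                # Toy (score, label) pairs standing in for real validation data.
                samples = [(0.1, False), (0.4, False), (0.6, True), (0.9, True)]
        
                # Fitness is plain accuracy, which naturally lies in [0, 1].
                correct = sum(
                    1 for score, label in samples
                    if (score >= threshold) == label
                )
                return correct / float(len(samples))
        
        With these toy samples, a ``threshold`` of 0.5 classifies every sample
        correctly and yields a fitness of 1.0, while a ``threshold`` of 0.95
        yields 0.5.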
        
        You then set this function in the ``evaluator`` field of your ``Genome``
        and run the management command:
        
        ::
        
            python manage.py evolve_population --genome=<genome_id>
        
        Depending on the other settings in the genome, this will run for a
        predetermined maximum number of iterations or until improvement of the
        fitness has stalled. From the genome's admin change page, you can browse
        the list of generated genotypes and inspect their fitness, possibly
        selecting one for production use.
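        
        The two stopping conditions can be pictured with the minimal loop
        below. It is a simple mutate-and-keep-the-best search, not the genetic
        algorithm used by the package, and ``evolve``, ``max_iterations`` and
        ``stall_limit`` are hypothetical names chosen for the sketch rather than
        actual genome settings.
        
        ::
        
            def evolve(evaluate, mutate, seed_values,
                       max_iterations=100, stall_limit=10):
                """Return (best_values, best_fitness) found by the search.
        
                Stops after ``max_iterations`` iterations, or earlier once
                ``stall_limit`` consecutive iterations fail to improve
                the best fitness seen so far.
                """
                best_values = dict(seed_values)
                best_fitness = evaluate(best_values)
                stalled = 0
                for _ in range(max_iterations):
                    candidate = mutate(best_values)
                    fitness = evaluate(candidate)
                    if fitness > best_fitness:
                        # Improvement: keep the candidate, reset the counter.
                        best_values, best_fitness = candidate, fitness
                        stalled = 0
                    else:
                        stalled += 1
                        if stalled >= stall_limit:
                            break
                return best_values, best_fitness
        
        Here ``evaluate`` plays the role of the evaluator function and
        ``mutate`` is any callable that returns a modified copy of the gene
        values.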
        
        For example, a simple genome might consist of a single gene called
        ``algorithm``, which contains one of several algorithm names (e.g.
        'Bayesian', 'LinearSVC' or 'RandomForest'). You would write your
        evaluation function to read this string and instantiate the appropriate
        class associated with the name. You could then add additional genes
        representing parameters common to multiple algorithms or unique to only
        a few. The ``Genotype`` model will generate a unique hash based on which
        genes it contains, and use this to avoid creating duplicate genotypes.
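        
        Both ideas can be sketched as follows, under stated assumptions. The
        classes are empty stand-ins for real classifier classes, the gene
        values are assumed to be a plain ``dict``, and ``build_classifier`` and
        ``fingerprint`` are hypothetical helpers. Hashing the sorted gene/value
        pairs with SHA-1 is only one plausible scheme and may differ from what
        the ``Genotype`` model actually does.
        
        ::
        
            import hashlib
            import json
        
            class Bayesian(object):
                """Stand-in for a real classifier class."""
        
            class LinearSVC(object):
                """Stand-in for a real classifier class."""
        
            class RandomForest(object):
                """Stand-in for a real classifier class."""
        
            # Maps the value of the ``algorithm`` gene to a class.
            ALGORITHMS = {
                'Bayesian': Bayesian,
                'LinearSVC': LinearSVC,
                'RandomForest': RandomForest,
            }
        
            def build_classifier(gene_values):
                """Instantiate the class named by the ``algorithm`` gene."""
                name = gene_values['algorithm']
                if name not in ALGORITHMS:
                    raise ValueError('Unknown algorithm: %r' % (name,))
                return ALGORITHMS[name]()
        
            def fingerprint(gene_values):
                """Hash the gene/value pairs independently of their order."""
                canonical = json.dumps(gene_values, sort_keys=True)
                return hashlib.sha1(canonical.encode('utf-8')).hexdigest()
        
        Two sets of gene values with the same fingerprint can then be treated
        as duplicates and evaluated only once.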
        
        Predictor
        ---------
        
        todo
        
        Usage
        -----
        
        todo
        
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Web Environment
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Framework :: Django
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
