"""
Methods for performing matrix factorization.  See
graphlab.matrix_factorization.create for additional documentation.
"""

import graphlab.connect as _mt
import graphlab as _graphlab
from graphlab.toolkits.recommender.recommender import RecommenderModel
import logging

DEFAULT_HYPER_PARAMETER_RANGE = {
    'n_factors': range(2, 25),
    'regularization': [0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
}

def create(*args, **kwargs):
    """
    *Deprecated*. Please use

    ``graphlab.recommender.create(..., method="matrix_factorization")``

    instead.
    """

    _mt._get_metric_tracker().track('toolkit.recsys.matrix_factorization.create')

    logging.warning("matrix_factorization.create will be deprecated soon. Please switch to recommender.create")

    kwargs["method"] = "matrix_factorization"

    return _graphlab.recommender.create(*args, **kwargs)


class MatrixFactorizationModel(RecommenderModel):
    r"""
    A matrix factorization model learns a set of parameters for each user
    and for each item, and scores a (user, item) pair based on the
    similarity of the user and the item.  In other words, the method
    decomposes a set of user-item ratings (when represented as a matrix)
    into a number of latent factors for each item and corresponding
    factors for each user.

    This model can be created using
    :func:`create(..., method='matrix_factorization') <graphlab.recommender.create>`.
    Do NOT instantiate this model class directly.



    **Examples**


    The factorization models, matrix_factorization and factorization_model,
    both find a mix of linear and interaction terms that attempt to predict the
    ratings of user and item pairs as accurately as possible.  The models
    differ in how user and item side features are treated -- the
    factorization_model fits interaction factors to these side features, while
    the matrix_factorization only fits the linear terms.

    The following examples are given for "matrix_factorization", but
    "factorization_model" works in exactly the same way.

    The basic use is simple::

       >>> gl.recommender.create(data, "user", "item", "rating",
       ...                       method="matrix_factorization")

    To penalize high scores for user-item pairs that are not in the training
    set, set `unobserved_rating_regularization` to a value greater than 0.  This
    expresses the idea that if a user is not observed interacting with an
    item, then that user has implied a weak preference against that item.  The
    algorithm attempts to find a model that predicts all unseen user-item pairs
    to score below `unobserved_rating_value`, with
    `unobserved_rating_regularization` controlling how it balances this
    objective with predicting the actual scores in the training data.  When
    `unobserved_rating_regularization == 1`, penalizing unseen items
    scoring higher than `unobserved_rating_value` is given equal weight to
    fitting the actual scores.

    For example, if the ratings are given between 1 and 5, you might
    want to push unseen items to somewhere close to the mean::

       >>> gl.recommender.create(data, "user", "item", "rating",
       ...                       method="matrix_factorization",
       ...                       unobserved_rating_regularization=0.1,
       ...                       unobserved_rating_value=3)


    **Model Definition**


    Like :class:`FactorizationModel <graphlab.recommender.FactorizationModel>`
    and :class:`LinearRegressionModel <graphlab.recommender.LinearRegressionModel>`,
    `MatrixFactorizationModel` trains a model capable of predicting a score for
    each possible combination of users and items.  The internal coefficients of
    the model are learned from known scores of users and items.
    Recommendations are then based on these scores.

    In the two factorization models, users and items are represented by weights
    and factors.  These model coefficients are learned during training.
    Roughly speaking, the weights, or bias terms, account for a user's or
    item's bias towards higher or lower ratings.  For example, an item that is
    consistently rated highly would have a higher weight coefficient associated
    with it.  Similarly, an item that consistently receives below average
    ratings would have a lower weight coefficient to account for this bias.

    The factor terms model interactions between users and items.  For example,
    if a user tends to love romance movies and hate action movies, the factor
    terms attempt to capture that, causing the model to predict lower scores
    for action movies and higher scores for romance movies.  Learning good
    weights and factors is controlled by several options outlined below.

    More formally, the predicted score for user :math:`i` on item :math:`j` is
    given by

        .. math::
           \operatorname{score}(i, j) = \mu + w_i + w_j + {\mathbf u}_i^T {\mathbf v}_j

    where :math:`\mu` is a global bias term that centers the scores,
    :math:`w_i` is the user weight term, and :math:`w_j` is the item weight
    term.  The latent factors, which are vectors of length ``n_factors``, are
    given by :math:`{\mathbf u}_i` and :math:`{\mathbf v}_j`.
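
    As a minimal numeric sketch of the score formula above (the coefficient
    values are made up for illustration, and ``predict_score`` is a
    hypothetical helper, not part of the GraphLab API):

    ```python
    def predict_score(mu, w_user, w_item, u, v):
        # score(i, j) = mu + w_i + w_j + u_i^T v_j
        interaction = sum(a * b for a, b in zip(u, v))
        return mu + w_user + w_item + interaction

    # Made-up coefficients for one user and one item, with two latent factors.
    mu = 3.5            # global bias
    w_user = 0.2        # this user rates slightly above average
    w_item = -0.4       # this item is rated slightly below average
    u_i = [0.5, 1.0]    # user latent factors
    v_j = [1.0, -0.25]  # item latent factors

    score = predict_score(mu, w_user, w_item, u_i, v_j)
    ```

    Here the interaction term contributes ``0.5 * 1.0 + 1.0 * -0.25 = 0.25``
    on top of the three bias terms.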


    **Side information**


    Both the matrix factorization model and the full factorization model can
    easily work with side information. If given, all scores and recommendations
    take this information into account.  Side information is provided by
    supplying ``item_data`` and ``user_data`` to
    :func:`graphlab.recommender.create`.

    ``MatrixFactorizationModel`` and :class:`FactorizationModel
    <graphlab.recommender.FactorizationModel>` differ only in how they handle
    additional columns in the training observation data or side information.
    ``MatrixFactorizationModel`` learns interaction factors for only the user
    and item terms and assigns only linear weights to all the side information
    terms.  In contrast, the full :class:`FactorizationModel
    <graphlab.recommender.FactorizationModel>` learns interaction factors for
    all additional columns in the data.

    Typically, :class:`FactorizationModel
    <graphlab.recommender.FactorizationModel>` outperforms plain matrix
    factorization in this case, but may require a longer training time.


    **Parameters for Model Creation and Training**


    A `MatrixFactorizationModel` is created by calling
    :func:`graphlab.recommender.create` with ``method="matrix_factorization"``.
    It is also the default model when a target column is specified.

    Several parameters govern the training of the model and can have a
    significant impact on the quality of the recommendations.  The primary
    concern when training the model is that the model should not *overfit*.
    When a trained model overfits, it captures random noise in the data instead
    of useful patterns that can be generalized beyond the specific dataset used
    to build the model.  The parameters below control this aspect of the
    training. 

    n_factors : integer, default = 8.

       ``n_factors`` controls the dimension of the latent factors used to
       capture the interactions between users and items.  Increasing this value
       allows the model to fit more complex interactions at the expense of
       training time and possible overfitting.

    regularization : float, default = 1.0.

       ``regularization`` helps control the complexity in the user and item
       factors.  Increasing this value can significantly help the model to not
       overfit.

    unobserved_rating_regularization : float, default = 0.

       When ``unobserved_rating_regularization`` is larger than zero, the model
       attempts to learn factors that both score known user-item pairs
       accurately and also score user-item pairs not observed in the training
       set as close to or below ``unobserved_rating_value`` as possible.  This
       parameter controls the balance of these two objectives, with larger
       values suppressing the unobserved user-item scores at the expense of
       accurately predicting the known user and item scores. Enabling this
       option can substantially improve the precision and recall of any of
       the factorization models.

       NOTE: this option is not compatible with ``binary_targets = True``.

    unobserved_rating_value : float, default = None.

       When ``unobserved_rating_regularization`` is greater than 0, the model
       attempts to learn coefficients that both predict the scores in the
       training data correctly and predict the score for unobserved user and
       item interactions as close to `unobserved_rating_value` as possible.  If
       None, it is set to the mean of the ratings.

    linear_regularization : float, default = 0.

       ``linear_regularization`` is similar to ``regularization``, but applies
       to the linear weights instead of the factors.  This typically does not
       have as much effect on preventing the model from overfitting as
       ``regularization``, but increasing this can encourage the model to
       explain the scores using primarily interaction terms instead of the user
       and item weights.


    **Parameters Governing Model Behavior**


    Several other parameters control the internal structure of the model.

    nmf : boolean, default = False.

       If True, then the intercept and linear terms are disabled, and the
       matrix factors are constrained to be non-negative.  (Note: This option
       is not available for the :class:`FactorizationModel
       <graphlab.recommender.FactorizationModel>` model.)

    binary_targets : boolean, default = False.

       If True, then the targets given in the observation data must be either 0
       or 1.  In this case, the score function above uses the logistic function
       to train the weights, which can yield better results.
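
    One plausible reading of the ``binary_targets`` option, sketched under
    the assumption that the raw score is mapped to a probability with the
    logistic (sigmoid) function and trained with log loss.  The exact loss
    used internally is not stated here, and both helper names are
    hypothetical:

    ```python
    import math

    def sigmoid(score):
        # Numerically stable logistic function mapping a raw score to (0, 1).
        if score >= 0:
            return 1.0 / (1.0 + math.exp(-score))
        z = math.exp(score)
        return z / (1.0 + z)

    def logistic_loss(score, target):
        # Log loss of sigmoid(score) against a 0/1 target, in a stable form:
        # max(s, 0) - s * y + log(1 + exp(-|s|)).
        return max(score, 0.0) - score * target + math.log1p(math.exp(-abs(score)))

    raw_score = 1.2  # made-up value of mu + w_i + w_j + u_i^T v_j
    probability = sigmoid(raw_score)
    loss_if_positive = logistic_loss(raw_score, 1)
    loss_if_negative = logistic_loss(raw_score, 0)
    ```

    A positive raw score yields a probability above one half, so the loss is
    smaller when the target is 1 than when it is 0.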



    **Mathematical Details**



    Formally, the objective function being minimized is:


           .. math::
              \min_{ \mu, \mathbf{w}, \mathbf{V}, \mathbf{U}}
              \sum_{(i,j,y) \in \mathcal{D}}
              \mathcal{L}(\operatorname{score}(i, j), y)
              + \lambda_w \lVert {\mathbf w} \rVert^2_2
              + \lambda_{UV} \left(\lVert {\mathbf U} \rVert^2_2
                                   + \lVert {\mathbf V} \rVert^2_2 \right)


    where :math:`\mathcal{D}` is the training data,
    :math:`{\mathbf U} = ({\mathbf u}_1, {\mathbf u}_2, ...)` denotes the user
    latent factors and :math:`{\mathbf V} = ({\mathbf v}_1, {\mathbf v}_2, ...)`
    denotes the item latent factors.  The penalty weights :math:`\lambda_w` and
    :math:`\lambda_{UV}` correspond to ``linear_regularization`` and
    ``regularization``, respectively.  The loss function
    :math:`\mathcal{L}(\hat{y}, y)` is :math:`(\hat{y} - y)^2` by default.

    When ``unobserved_rating_regularization`` is nonzero, the objective
    above gets an additional term.  Let :math:`\lambda_{\text{urr}}` represent
    the value of ``unobserved_rating_regularization``, and let
    :math:`v_{\text{ur}}` represent ``unobserved_rating_value``.  Then the
    objective we attempt to minimize is:

           .. math::
              \min_{ \mu, \mathbf{w}, \mathbf{V}, \mathbf{U}}
              \sum_{(i,j,y) \in \mathcal{D}}
              \mathcal{L}(\operatorname{score}(i, j), y)
              + \lambda_w \lVert {\mathbf w} \rVert^2_2
              + \lambda_{UV} \left(\lVert {\mathbf U} \rVert^2_2
                                   + \lVert {\mathbf V} \rVert^2_2 \right)
              + \frac{\lambda_{\text{urr}}}{\text{const}}
              \sum_{(i,j) \notin \mathcal{D}}
              \mathcal{L}\left(\operatorname{score}(i, j), v_{\text{ur}}\right)

    (Evaluating this objective exactly is infeasible, because it sums over
    every unobserved user-item pair, so a sampling approximation is used
    internally.)
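
    The objective can be evaluated exhaustively on a toy problem, which may
    help make the individual terms concrete.  The sketch below is not the
    internal implementation: it assumes squared-error loss, assumes the norm
    of :math:`{\mathbf w}` covers both user and item weights, treats the
    unspecified ``const`` as a plain argument defaulting to 1, and uses
    hypothetical helper names and made-up coefficients:

    ```python
    def score(params, i, j):
        # score(i, j) = mu + w_i + w_j + u_i^T v_j
        dot = sum(a * b for a, b in zip(params["U"][i], params["V"][j]))
        return params["mu"] + params["w_user"][i] + params["w_item"][j] + dot

    def squared_norm(rows):
        return sum(x * x for row in rows for x in row)

    def objective(observed, params, lambda_w, lambda_uv,
                  lambda_urr=0.0, v_ur=0.0, const=1.0):
        # Squared-error loss on the observed (user, item, rating) triples.
        total = sum((score(params, i, j) - y) ** 2 for i, j, y in observed)
        # L2 penalties on the linear weights and on the latent factors.
        weights = params["w_user"] + params["w_item"]
        total += lambda_w * sum(w * w for w in weights)
        total += lambda_uv * (squared_norm(params["U"]) + squared_norm(params["V"]))
        if lambda_urr > 0:
            # Exhaustive sum over unobserved pairs; only feasible for toy data.
            seen = set((i, j) for i, j, _ in observed)
            n_users, n_items = len(params["U"]), len(params["V"])
            unseen_loss = sum((score(params, i, j) - v_ur) ** 2
                              for i in range(n_users) for j in range(n_items)
                              if (i, j) not in seen)
            total += lambda_urr / const * unseen_loss
        return total

    # Two users, two items, one latent factor; all values are made up.
    params = {"mu": 0.0, "w_user": [0.5, 0.0], "w_item": [0.0, 0.0],
              "U": [[1.0], [2.0]], "V": [[1.0], [0.5]]}
    observed = [(0, 0, 1.5), (1, 1, 3.0)]

    plain = objective(observed, params, lambda_w=0.1, lambda_uv=0.5)
    with_unobserved = objective(observed, params, lambda_w=0.1, lambda_uv=0.5,
                                lambda_urr=0.5, v_ur=1.0)
    ```

    The extra term only adds to the objective when an unobserved pair is
    scored away from ``v_ur``, which is what pushes unseen scores toward
    that value during training.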


    **Parameters for Optimization**


    The optimization used to train the model is a carefully engineered version
    of stochastic gradient descent (SGD).  The step size is chosen
    automatically by analyzing a number of small runs on sampled subsets of
    the data, and then the convergence of the rest of the full run is
    monitored to ensure that the result is free of numerical issues.

    A number of parameters control this optimization. The default values of
    these parameters should work in most cases.

    sgd_step_size : double, default = 0

        The starting step size to use for the SGD tuning.  If zero (default),
        the step size is chosen automatically.

    max_iterations : integer, default = 100.

        The maximum number of passes through the data allowed.

    step_size_decrease_rate : double, default = 0.75

        The rate at which the step size decreases.

    sgd_convergence_threshold : double, default = 1e-6.

        Convergence is tested using variation in the training objective.  The
        variation in the training objective is calculated by dividing the
        difference between the maximum and minimum objective values over the
        past ``sgd_convergence_interval`` steps by the value of the objective.
        When this value falls below ``sgd_convergence_threshold``, then the
        optimization stops.  Set this value to 0 to prevent stopping before
        ``max_iterations`` passes through the data.

    sgd_convergence_interval : integer, default = 8.

        The number of most recent passes through the data over which the
        variation in the training objective is measured.  Optimization stops
        when that variation falls below ``sgd_convergence_threshold``.


    sgd_trial_sample_proportion : double, default = 0.01.

        The proportion of the original data to use to build a trial dataset for
        automatically setting the step size.

    sgd_trial_sample_minimum_size : integer, default = 10000.

        The minimum number of samples from the original data to use to build a
        trial dataset for automatically setting the step size.  If there are
        fewer than 10000 observations in the training data, we simply use the
        full dataset.

    sgd_max_trial_iterations : integer, default = 30.

        The maximum number of passes through the data to allow for in testing a
        step size on the trial dataset.
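
    Two of the rules described above can be sketched in plain Python.  Both
    functions are hypothetical illustrations rather than the internal
    implementation.  The convergence check assumes the variation is
    normalized by the most recent objective value, and the trial size
    assumes the proportional and minimum sizes are combined by taking the
    larger one, capped at the full dataset:

    ```python
    def has_converged(objective_history, threshold=1e-6, interval=8):
        # Wait until a full window of passes is available.
        if len(objective_history) < interval:
            return False
        window = objective_history[-interval:]
        # Assumption: normalize by the most recent objective value.
        scale = abs(window[-1]) or 1.0  # avoid dividing by zero
        variation = (max(window) - min(window)) / scale
        # With threshold == 0 this is never true, so training would run
        # until max_iterations.
        return variation < threshold

    def trial_sample_size(n_observations, proportion=0.01, minimum_size=10000):
        # Assumption: use the larger of the proportional and minimum sizes,
        # but never more than the full dataset.
        proportional = int(round(proportion * n_observations))
        return min(n_observations, max(minimum_size, proportional))

    # Made-up objective values, one per pass through the data.
    history = [10.0, 5.0, 3.0, 2.5, 2.4, 2.39, 2.389, 2.3889]
    stop_short_window = has_converged(history, threshold=0.01, interval=4)
    stop_long_window = has_converged(history, threshold=0.01, interval=8)
    ```

    With the made-up history above, the last four values vary by well under
    one percent, so the short window reports convergence, while the
    eight-pass window still includes the early, rapidly changing values and
    does not.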


    References
    ------------------------------------------------------------

    See this paper for details about the model: Koren, Yehuda, Robert Bell, and
    Chris Volinsky. "Matrix Factorization Techniques for Recommender Systems."
    Computer 42, no. 8 (2009): 30-37.


    """

    def __init__(self, model_proxy):
        '''__init__(self)'''
        self.__proxy__ = model_proxy

    def _get_wrapper(self):
        def model_wrapper(model_proxy):
            return MatrixFactorizationModel(model_proxy)
        return model_wrapper
