.. py:currentmodule:: Orange.data

=============================
Data instances (``Instance``)
=============================

Class :obj:`Orange.data.Instance` holds a data instance. Each data instance
corresponds to a domain, which defines the instance's length, the data types
of its values, and the values for symbolic indices.

--------
Features
--------

The data instance is described by a list of features defined by the
domain descriptor (:obj:`Orange.data.Domain`). Instances support indexing
with integer indices, strings (variable names) or variable descriptors.

Since "age" is the first attribute in the dataset lenses, the
statements below are equivalent::

    >>> import Orange
    >>> data = Orange.data.Table("lenses")
    >>> age = data.domain["age"]
    >>> example = data[0]
    >>> print example[0]
    young
    >>> print example[age]
    young
    >>> print example["age"]
    young

Negative indices do not count from the end as they usually do in
Python; they refer to the values of meta attributes instead.

The last element of a data instance is the class label,
if the domain has a class. It should be accessed using
:obj:`~Orange.data.Instance.get_class` and
:obj:`~Orange.data.Instance.set_class`.

The list has a fixed length that equals the number of variables.
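
The indexing rules above can be pictured with a small pure-Python model.
This is only a sketch: ``ToyVariable`` and ``ToyInstance`` are hypothetical
names invented for illustration and are not part of Orange, and the real
class is implemented differently.

```python
# Toy model of instance indexing; NOT Orange's implementation.
class ToyVariable(object):
    """Hypothetical stand-in for a variable descriptor."""
    def __init__(self, name):
        self.name = name


class ToyInstance(object):
    """Hypothetical fixed-length instance indexed by position, name or descriptor."""
    def __init__(self, variables, values):
        if len(variables) != len(values):
            raise ValueError("one value per variable is required")
        self.variables = list(variables)
        self.values = list(values)

    def _position(self, key):
        if isinstance(key, int):
            if key < 0:
                # Negative integers denote meta ids, not offsets from the end.
                raise KeyError("negative keys denote meta attributes")
            return key
        if isinstance(key, str):
            names = [var.name for var in self.variables]
            return names.index(key)
        # Otherwise the key is assumed to be a descriptor object.
        return self.variables.index(key)

    def __getitem__(self, key):
        return self.values[self._position(key)]

    def __len__(self):
        return len(self.values)


age = ToyVariable("age")
prescription = ToyVariable("prescription")
example = ToyInstance([age, prescription], ["young", "myope"])
```

In this model ``example[0]``, ``example["age"]`` and ``example[age]`` all
resolve to the same position, while ``example[-1]`` is rejected instead of
returning the last value.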

---------------
Meta attributes
---------------

Meta attributes provide a way to attach additional information to data
instances, such as the id of a patient or the number of times
the instance was misclassified during some test procedure. The most
common additional information is the instance's weight. These attributes
do not appear in induced models.

Instances from the same domain do not need to have the same meta
attributes. Meta attributes are hence not addressed by position
but by their ids, which are represented by negative indices. Ids are
generated by the function :obj:`Orange.feature.Descriptor.new_meta_id`. Ids can
be reused for multiple domains.

A domain descriptor can, but need not, know about
meta descriptors. See the documentation on :obj:`Orange.data.Domain` for
more on that.

If a descriptor is associated with the meta attribute
in the domain, the descriptor or its name can also be used for
indexing. When registering meta attributes with domains, it is
recommended to use the same id for the same attribute in all domains.

Meta values can also be loaded from files in tab-delimited format.

Meta attributes are often used as weights. Many procedures, such as
learning algorithms, accept the id of the meta attribute defining the
weights of instances as an additional argument.

The following example adds a meta attribute with a random value to
each data instance.

.. literalinclude:: code/instance-metavar.py
    :lines: 1-

The code prints out::

    ['young', 'myope', 'no', 'reduced', 'none'], {-2:0.84}

(except for a different random value). The data instance now consists of
two parts: ordinary features, which
resemble a list since they are addressed by position (e.g. the first
value is "young"), and meta values, which behave more like a dictionary
in which the id (-2) is a key and 0.84 is a value (of type
:obj:`Orange.data.Value`).
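
The two-part layout can be modelled in plain Python as a list plus a
dictionary keyed by negative ids. The names ``new_toy_meta_id`` and
``ToyInstance`` below are hypothetical and exist only in this sketch; real
ids come from :obj:`Orange.feature.Descriptor.new_meta_id` and need not
start at -1.

```python
import itertools
import random

# Hypothetical id source handing out -1, -2, -3, ...; an assumption of
# this sketch, not the way Orange allocates ids.
_meta_ids = itertools.count(-1, -1)


def new_toy_meta_id():
    return next(_meta_ids)


class ToyInstance(object):
    def __init__(self, values):
        self.values = list(values)  # ordinary features, addressed by position
        self.metas = {}             # meta values, addressed by negative id

    def __getitem__(self, key):
        if key < 0:
            return self.metas[key]
        return self.values[key]

    def __setitem__(self, key, value):
        if key < 0:
            self.metas[key] = value
        else:
            self.values[key] = value


weight_id = new_toy_meta_id()
rng = random.Random(0)  # seeded so that repeated runs give the same weights
data = [ToyInstance(["young", "myope", "no", "reduced", "none"]),
        ToyInstance(["young", "myope", "no", "normal", "soft"])]
for inst in data:
    inst[weight_id] = rng.random()
```

Adding the meta value leaves the list of ordinary features untouched, which
is why the fixed length of the instance does not change.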

To tell the learning algorithm to use the weights, the id needs to be
passed along with the data::

    bayes = Orange.classification.bayes.NaiveLearner(data, id)

Many other functions accept weights in similar fashion.

Code ::

    print orange.getClassDistribution(data)
    print orange.getClassDistribution(data, id)

prints out ::

    <15.000, 5.000, 4.000>
    <9.691, 3.232, 1.969>

where the first line is the actual distribution and the second a
distribution with random weights assigned to the instances.
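
What a weighted distribution means can be shown without Orange: each
instance contributes its weight, rather than 1, to the total of its class.
The helper ``class_distribution`` and the row layout below are assumptions
made for this sketch only, and the weights are made up.

```python
from collections import defaultdict


def class_distribution(rows, weight_key=None):
    """Sum the weights per class label; each row counts 1.0 without a weight_key."""
    totals = defaultdict(float)
    for row in rows:
        weight = 1.0 if weight_key is None else row["metas"][weight_key]
        totals[row["class"]] += weight
    return dict(totals)


# Illustrative rows: a class label plus a meta dictionary holding a weight.
rows = [
    {"class": "none", "metas": {-2: 0.5}},
    {"class": "none", "metas": {-2: 0.25}},
    {"class": "soft", "metas": {-2: 1.0}},
    {"class": "hard", "metas": {-2: 0.75}},
]
plain = class_distribution(rows)         # every instance counts once
weighted = class_distribution(rows, -2)  # instances count by their weight
```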

Registering the meta attribute using :obj:`Orange.data.Domain.add_meta`
changes how the data instance is printed out and how it can be
accessed::

    w = Orange.feature.Continuous("w")
    data.domain.add_meta(id, w)

The meta attribute can now be indexed just like ordinary features. The
following three statements are equivalent::

    print data[0][id]
    print data[0][w]
    print data[0]["w"]

Another consequence of registering the meta attribute is that it
allows for conversion from Python native types::

    ok = Orange.feature.Discrete("ok?", values=["no", "yes"])
    ok_id = Orange.feature.Descriptor.new_meta_id()
    data.domain.add_meta(ok_id, ok)
    data[0][ok_id] = "yes"

The last line fails unless the attribute is registered since Orange
does not know which variable descriptor to use to convert the string
"yes" to an attribute value.
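
Why registration matters for conversion can be sketched in a few lines.
``ToyDiscrete`` and ``ToyMetaStore`` are hypothetical classes written for
this illustration; they only mimic the idea that a string can be turned
into a value when a descriptor is known for the id.

```python
class ToyDiscrete(object):
    """Hypothetical discrete descriptor mapping symbolic values to indices."""
    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def convert(self, raw):
        if isinstance(raw, str):
            return self.values.index(raw)  # ValueError for an unknown symbol
        return int(raw)


class ToyMetaStore(object):
    def __init__(self):
        self.registered = {}  # meta id -> descriptor
        self.metas = {}       # meta id -> stored value (an index here)

    def register(self, meta_id, descriptor):
        self.registered[meta_id] = descriptor

    def set_meta(self, meta_id, raw):
        if isinstance(raw, str):
            if meta_id not in self.registered:
                # Without a descriptor there is no way to interpret the string.
                raise TypeError("no descriptor registered for meta id %d" % meta_id)
            raw = self.registered[meta_id].convert(raw)
        self.metas[meta_id] = raw


store = ToyMetaStore()
ok = ToyDiscrete("ok?", ["no", "yes"])
try:
    store.set_meta(-3, "yes")  # fails: id -3 is not registered yet
    failed = False
except TypeError:
    failed = True
store.register(-3, ok)
store.set_meta(-3, "yes")      # succeeds: "yes" is converted to index 1
```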

-------
Hashing
-------

Data instances compute hashes using CRC32 and can thus be used as
keys in dictionaries or collected in Python sets.
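
The idea can be illustrated with ``zlib.crc32`` from the standard library.
``HashableInstance`` is a hypothetical class for this sketch; how Orange
serializes values before hashing is not described here, so joining the
string forms of the values is purely an assumption.

```python
import zlib


class HashableInstance(object):
    """Hypothetical immutable instance hashed by the CRC32 of its values."""
    def __init__(self, values):
        self.values = tuple(values)

    def __eq__(self, other):
        return isinstance(other, HashableInstance) and self.values == other.values

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        # Join the values with a separator that is unlikely to occur in them.
        payload = "\x1f".join(str(v) for v in self.values).encode("utf-8")
        return zlib.crc32(payload) & 0xFFFFFFFF


a = HashableInstance(["young", "myope", "no"])
b = HashableInstance(["young", "myope", "no"])
c = HashableInstance(["young", "myope", "yes"])

seen = {a, b, c}  # equal instances collapse into one set element
counts = {}
for inst in (a, b, c):
    counts[inst] = counts.get(inst, 0) + 1
```

As with any hashable Python object, changing the values of an instance
after it has been used as a key would make it unreachable in the
dictionary; the sketch avoids the issue by storing a tuple.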

.. class:: Instance

    .. attribute:: domain

        The domain to which the data instance corresponds. This
        attribute is read-only.

    .. method:: __init__(domain[, values])

        Construct a data instance with the given domain and initialize
        the values. Values are given as a list of
        objects that can be converted into values of the corresponding
        variables: strings and integer indices (for discrete variables),
        strings or numbers (for continuous variables), or instances of
        :obj:`Orange.data.Value`.

        If values are omitted, they are set to unknown.

        :param domain: domain descriptor
        :type domain: Orange.data.Domain
        :param values: A list of values
        :type values: list

        The following example loads data on lenses and constructs
        another data instance from the same domain.

        .. literalinclude:: code/instance-construct.py
            :lines: 1-5

        The same can be done using other representations of values:

        .. literalinclude:: code/instance-construct.py
            :lines: 7-8

    .. method:: __init__([domain,] instance)

        Construct a new data instance as a shallow copy of the
        original. If a domain descriptor is given, the instance is
        converted to another domain.

        :param domain: domain descriptor
        :type domain: Orange.data.Domain
        :param instance: Data instance
        :type instance: :obj:`Instance`

        The following example constructs a reduced domain and a data
        instance in this domain. ::

            domain_red = Orange.data.Domain(["age", "lenses"], domain)
            inst_red = Orange.data.Instance(domain_red, inst)

    .. method:: __init__(domain, instances)

        Construct a new data instance for the given domain, where the
        feature values are found in the provided instances using
        both their ordinary features and meta attributes that are
        registered with their corresponding domains. The new instance
        also includes the meta attributes that appear in the provided
        instances and whose values are not used for the instance's
        features.

        :param domain: domain descriptor
        :type domain: Orange.data.Domain
        :param instances: data instances
        :type instances: list of Orange.data.Instance

        .. literalinclude:: code/instance_merge.py
                :lines: 3-

        The new domain consists of variables from `data1` and `data2`:
        `a1`, `a3` and `m1` are ordinary features, and `m2` and `a2`
        are meta attributes in the new domain. `m2` has the
        same meta attribute id as it has in `data1`, while `a2` gets a
        new meta id. In addition, the new domain has two new
        attributes, `n1` and `n2`.

        Here is the output::

            First example:  [1, 2], {"m1":3, "m2":4}
            Second example:  [1, 2.5], {"m1":3, "m3":4.5}
            Merge:  [1, 2.5, 3, ?], {"a2":2, "m2":4, -5:4.50, "n2":?}


        Since attributes `a1` and `m1` appear in the domains of both
        original instances, the new instance can only be constructed if
        their values match. `a3` comes from the second instance, and
        meta attributes `a2` and `m2` come from the first one. The
        meta attribute `m3` is also copied from the second instance;
        since it is not registered within the new domain, it is
        printed out with an id (-5) instead of with a name. Values of
        the two new attributes are left undefined.

    .. method:: native([level])

        Convert the instance into an ordinary Python list. If the
        optional argument `level` is 1 (default), the result is a list of
        instances of :obj:`Orange.data.Value`. If it is 0, it contains
        pure Python objects, that is, strings for discrete variables
        and numbers for continuous ones.

    .. method:: compatible(other, ignore_class=False)

        Return ``True`` if the two instances are compatible, that
        is, equal in all features which are not missing in one of
        them. The optional second argument can be used to omit the
        class from comparison.

    .. method:: get_class()

        Return the instance's class as :obj:`Orange.data.Value`.

    .. method:: get_classes()

        Return the values of multiple classes as a list of
        :obj:`Orange.data.Value`.

    .. method:: set_class(value)

        Set the instance's class.

        :param value: the instance's new class
        :type value: :obj:`Orange.data.Value`, number or string

    .. method:: set_classes(values)

        Set the values of multiple classes.

        :param values: a list of values; the length must match the number of multiple classes
        :type values: list

    .. method:: get_metas([key_type])

        Return a dictionary containing the meta values of the data
        instance. The argument ``key_type`` can be ``int`` (default),
        ``str`` or :obj:`Orange.feature.Descriptor` and
        determines whether
        the dictionary keys are meta ids, variable names or
        variable descriptors. In the latter two cases, only registered
        attributes are returned. ::

            data = Orange.data.Table("inquisition2")
            example = data[4]
            print example.get_metas()
            print example.get_metas(int)
            print example.get_metas(str)
            print example.get_metas(Orange.feature.Descriptor)

        :param key_type: the key type; either ``int``, ``str`` or :obj:`~Orange.feature.Descriptor`
        :type key_type: ``type``

    .. method:: get_metas(optional, [key_type])

        Similar to the above, but return a dictionary that contains
        only non-optional attributes (if ``optional`` is 0) or
        only optional attributes (otherwise).

        :param optional: tells whether to return optional or non-optional attributes
        :type optional: ``bool``
        :param key_type: the key type; either ``int``, ``str`` or :obj:`~Orange.feature.Descriptor`
        :type key_type: ``type``

    .. method:: has_meta(attr)

        Return ``True`` if the data instance has the specified meta
        attribute.

        :param attr: meta attribute
        :type attr: :obj:`id`, ``str`` or :obj:`~Orange.feature.Descriptor`

    .. method:: remove_meta(attr)

        Remove the specified meta attribute.

        :param attr: meta attribute
        :type attr: :obj:`id`, ``str`` or :obj:`~Orange.feature.Descriptor`

    .. method:: get_weight(attr)

        Return the value of the specified meta attribute. The
        attribute's value must be continuous and is returned as ``float``.

        :param attr: meta attribute
        :type attr: :obj:`id`, ``str`` or :obj:`~Orange.feature.Descriptor`

    .. method:: set_weight(attr, weight=1)

        Set the value of the specified meta attribute to ``weight``.

        :param attr: meta attribute
        :type attr: :obj:`id`, ``str`` or :obj:`~Orange.feature.Descriptor`
        :param weight: weight of instance
        :type weight: ``float``
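
The rule behind :obj:`Instance.compatible` can be written out for plain
lists. The function below is a hypothetical stand-alone sketch, not the
method itself: it assumes that ``None`` marks a missing value and that the
class is the last element of each list.

```python
def compatible(first, second, ignore_class=False):
    """Return True if the two value lists agree wherever both values are known.

    None marks a missing value; the class is assumed to be the last element.
    """
    if len(first) != len(second):
        raise ValueError("instances must come from the same domain")
    if ignore_class:
        first, second = first[:-1], second[:-1]
    return all(a is None or b is None or a == b
               for a, b in zip(first, second))


x = ["young", None, "no", "soft"]
y = ["young", "myope", "no", "hard"]

with_class = compatible(x, y)                        # classes differ
without_class = compatible(x, y, ignore_class=True)  # missing value is skipped
```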
