Module orngCI implements classes and functions that support constructive induction (CI). It includes minimal-complexity function decomposition and a much more general minimal-error function decomposition. Both can be used as feature induction algorithms or as a standalone learning algorithm, HINT (Zupan, 1997; Zupan et al., 1998). In addition, the module provides an interface to a generalized Kramer's method (Kramer, 1994) and several other CI-related algorithms.
Constructive induction takes a subset of attributes, which we call bound attributes, and creates a new attribute whose values are computed from the values of the bound attributes. Orange supports so-called data-driven constructive induction, where learning examples are used to decide how the new attribute is computed from the bound ones.
Feature constructors (orange objects that construct features) are thus given examples, a list of bound attributes, and, optionally, an id of the weight attribute. They construct a new attribute and return a tuple with the new feature and a value that estimates its quality.
Complex feature constructors, such as those for minimal-error
decomposition and for Kramer's algorithm, show the flexibility of
Orange's component-based approach. On the other hand, the hierarchy of
components that need to be put together to construct a workable
feature inducer is too complex even for an experienced user. The main
mission of orngCI is thus to help the user by
"flattening" the hierarchy. Defaults are provided for all the
components so the user only needs to give the specific parts of the
hierarchy.
For instance, let fim be an instance of
orange.FeatureByIM, a feature constructor for minimal-error
function decomposition. Its most crucial parameter, m, is
"hidden" in
fim.clustersFromIM.columnAssessor.m. Parameter m
is an attribute of an m-estimate based column assessor component of
the column quality assessor based algorithm for column clustering. To
save the user from having to remember this, orngCI offers
its own class, FeatureByIM. Now, if fim is an
instance of orngCI.FeatureByIM, the user only needs to set
fim.m. Before the actual feature construction takes
place, orngCI.FeatureByIM constructs an appropriate
orange.FeatureByIM and sets its
clustersFromIM.columnAssessor.m to
fim.m. Constructing a structure and putting the
user-supplied components at the right places is called
deflattening.
A general principle is that if you define a component, you also
need to define its subcomponents. The default
columnAssessor is
orange.ColumnAssessor_m. If you leave it alone, all its
subcomponents are set as well. However, if you set
columnAssessor to
orange.ColumnAssessor_Measure, it is your responsibility
to set its attribute "measure", either by specifying it as a "measure"
argument to FeatureByIM or by setting it directly in
columnAssessor.measure.
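For illustration, here is a minimal sketch of the two ways of supplying the measure (it assumes, as described above, that FeatureByIM accepts keyword arguments with these names; orange.MeasureAttribute_gainRatio is just an example measure):

    import orange, orngCI

    # pass the measure as a constructor argument ...
    fim = orngCI.FeatureByIM(columnAssessor=orange.ColumnAssessor_Measure(),
                             measure=orange.MeasureAttribute_gainRatio())

    # ... or set it on the component directly
    fim = orngCI.FeatureByIM()
    fim.columnAssessor = orange.ColumnAssessor_Measure()
    fim.columnAssessor.measure = orange.MeasureAttribute_gainRatio()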
The rest of this section is intended for advanced users.
The function that performs deflattening is always called createInstance. It constructs and returns an instance of the orange class that performs the feature construction; the instance is also stored in self's field instance.
Deflattening is initiated by the call operator. The call operator
is given examples, bound attributes and, optionally, the id of the
weight attribute, and returns a constructed attribute and an
assessment of its quality. To do its job, the call operator needs an
instance of a corresponding Orange feature constructor. Constructing
such instances at each invocation of call would be too costly,
therefore a kind of caching is implemented. When the call operator is
invoked for the first time, createInstance is called to
deflatten the structure and the result is stored in
self.instance. In further calls, the stored
self.instance is used again. However, when any of the class's
attributes that affect the feature construction (such as
fim.m, see above example) are modified,
self.instance is set to None and a new
instance will be constructed at next call.
Now, if you want to, say, tune fim.m, you can change
it and cause reinstantiations of corresponding orange
classes.
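A minimal sketch of such tuning (assuming data is an ExampleTable holding the learning examples):

    import orange, orngCI

    fim = orngCI.FeatureByIM()
    bound = [data.domain[0], data.domain[1]]

    attrs = []
    for m in [0.1, 0.5, 1.0, 2.0]:
        fim.m = m                           # resets fim.instance
        attrs.append(fim(data, bound)[0])   # keep the feature, discard the quality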
This will construct features from the first two attributes of the
domain, using different values of m. (It would be a blasphemy not to
publish a shorter way to write the above code fragment: the last four
lines can be replaced by attrs = [fim(data, bound)[0] for fim.m
in [0.1, 0.5, 1.0, 2.0]].) However, at each call to
fim, createInstance is called and a new set
of components is constructed, since setting m resets
them. To make the program faster, you should set m on the already
constructed components instead of on fim itself, for instance as
sketched below.
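A hedged sketch of the faster variant: after the first call has built the deflattened instance, m is changed in place through the component path described at the beginning of this section, so fim.instance is not reset:

    fim = orngCI.FeatureByIM()
    bound = [data.domain[0], data.domain[1]]
    fim(data, bound)                        # the first call builds fim.instance

    attrs = []
    for m in [0.1, 0.5, 1.0, 2.0]:
        fim.instance.clustersFromIM.columnAssessor.m = m
        attrs.append(fim(data, bound)[0])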
To construct a feature, you need to call a constructor. The call
operator expects two or three arguments: examples, bound-attributes
and, optionally, an id of a weight meta attribute. It returns a tuple
containing the attribute and its quality estimation. This is the same
for all feature constructors in orngCI, although some may
ignore some arguments or return useless quality assessments.
The method of quality estimation depends upon the algorithm used - it can be a decrease of error, an increase of impurity, etc. Qualities returned by different feature constructors are therefore not comparable. The quality can be either negative or positive; in any case, higher numbers mean better attributes. If the underlying measure is one where lower numbers would mean better attributes, it is negated, so such qualities carry a negative sign.
The famous Monk 1 data set (Thrun et al, 1991) has the goal concept y := a=b or e=1. When joining attributes 'a' and 'b', the new attribute should express equality of a and b. Let us see if the minimal-complexity function decomposition can find the appropriate function.
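A minimal sketch of this experiment (it assumes the Monk 1 examples are available as a tab-delimited file named monk1; the variable names are used in the text below):

    import orange, orngCI

    data = orange.ExampleTable("monk1")
    ab, quality = orngCI.FeatureByMinComplexity()(data, ["a", "b"])
    print ab.name, ab.values, quality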
Note that we specified the bound attributes by names. This is not the only way: you can also give attribute descriptors (e.g. [data.domain["a"], data.domain["b"]]) or, knowing that "a" and "b" are the first two attributes, even their indices in the domain (e.g. [0, 1]), or any combination of names, descriptors and indices.
Now for the results. The names of the new attribute's values show which combinations of values of the bound attributes they correspond to. The new attribute is binary: its first value is named '1-1+2-2+3-3' and the second '1-2+1-3+2-1+2-3+3-1+3-2'. This is obviously the correct feature. Its quality is -2; minimal-complexity function decomposition prefers attributes with fewer values, so the more values an attribute has, the more negative (and thus lower) its quality.
It makes sense to rename the values of the constructed feature. In our case, we shall call the first value "eq" and the second "diff".
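One possible way to do this (a sketch; it assumes the value names of the constructed EnumVariable can simply be reassigned in place):

    ab.values = ["eq", "diff"]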
If any of bound attributes have values that include '-' or '+', the constructed attribute's values will be named "c1", "c2", "c3"...
Constructed features can be added to the domain and used in further processing of the data, such as, for instance, learning. If that is what you want, you do not need to read this section any further: if you have followed the above example, you can conclude it by
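(A sketch; addAttribute is the helper function described near the end of this document, and its exact name and argument order are assumed from that description.)

    data2 = orngCI.addAttribute(ab, data)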
This adds the attribute to the domain and returns a new example table.
The feature knows how to compute itself from the values of the attributes it is constructed from. The usual Orange mechanism is used for this: the new feature has a getValueFrom pointing to a classifier that computes the feature's value. In short, when some method has an example and needs a value of a particular attribute, but this attribute is not in the example's domain, it can check whether the attribute has a getValueFrom defined. If it has, the method calls it, passing the example as an argument. The getValueFrom either computes the attribute's value or returns don't know.
For most of the features constructed by methods in orngCI, getValueFrom will contain some variant of orange.ClassifierByLookupTable. In the above case, it is a classifier by lookup table based on two attributes. Values of the new attribute are read from its lookupTable. Each entry in the table corresponds to a combination of values of the bound attributes; the first corresponds to (1, 1), the second to (1, 2), then (1, 3), (2, 1), (2, 2)..., where the first value in the pair belongs to variable1 and the second to variable2. The value "eq" occurs at the first, the fifth and the ninth element of the table, which correspond to (1, 1), (2, 2) and (3, 3). This is exactly what we expected for the Monk 1 dataset.
If you need to manually correct the feature, you can change the lookupTable:
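(A hedged sketch; constructing an orange.Value explicitly avoids relying on automatic string conversion when assigning into the table.)

    lookup = ab.getValueFrom.lookupTable
    lookup[0] = orange.Value(ab, "diff")    # combination (1, 1) would now map to "diff"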
This would work for most cases, but it would fail when you start to mess with probability distributions of the new feature. The complete description of classifiers by lookup table is, however, out of scope of this text.
Let us show an example of what getValueFrom can do. We shall compute the class distributions for different values of the new attribute:
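(A sketch using orange.ContingencyAttrClass, which computes the values of ab through its getValueFrom while counting.)

    cont = orange.ContingencyAttrClass(ab, data)
    for value in ab.values:
        print value, cont[value]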
This works even though the new attribute, ab, is not in the domain: its values are computed on the fly, while counting, and are not stored anywhere. The results say that for the first value ("eq"), all examples are in the second class ("1"), while when the values of a and b differ, 216 examples (exactly 3/4) are in the first class and the remaining 72 in the second. Similarly, you could assess the quality of the new attribute by using information gain, for instance. Note that this automatic computation won't always work; it is disabled as a precaution for measures that would require many recomputations of the value on one and the same example - ReliefF, for instance, is one such measure. In such cases, add the new attribute to the domain and compute a new ExampleTable, as shown above.
Feature inducers can be used to remove (or merge) redundant attribute values. In Monk 1, the attribute "e" has four values, but the goal concept only asks whether the value is "1" or not. Thus, values "2", "3" and "4" are equivalent. If minimal complexity decomposition is called to construct a new attribute from "e", it will easily remove the redundancies:
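(A sketch, reusing the constructor from the Monk 1 example above.)

    e2, quality = orngCI.FeatureByMinComplexity()(data, ["e"])
    print e2.values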
We can check the lookupTable again.
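(Again a sketch; the table now maps the four values of "e" onto the values of the new feature.)

    print e2.getValueFrom.lookupTable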
orngCI.FeatureByMinComplexity has few settable components. The graph coloring component is not worth changing: since only one such component (orange.ColorIG_MCF) is available, there is no point in replacing it (more accurately, there is nothing to replace it with). The default completion is orngCI.FeatureByMinComplexity.CompletionByBayes; the values are determined using a Bayesian classifier. The alternatives are orngCI.FeatureByMinComplexity.CompletionByDefault, where the most frequent value is used for imputation, and orngCI.FeatureByMinComplexity.NoCompletion, where the values are left undefined.
FeatureByIM implements minimal-error function decomposition. The algorithm builds an incompatibility matrix (also called a partition matrix) and merges its columns until a stopping criterion is reached. In Zupan's method, columns are compared by how the merge would affect the average m-error estimate over the elements of the matrix. Merging is stopped when the error stops decreasing. In Orange, different similarity measures and stopping criteria can be used.
Since FeatureByIM relates to minimal-error decomposition in the same way as FeatureByMinComplexity relates to minimal-complexity decomposition, orngCI.FeatureByMinError is provided as an alias for orngCI.FeatureByIM. No class with this name appears in orange, though.
orngCI.FeatureByIM has a number of settable components. Their actual positions after deflattening are given in parentheses.
IMconstructor: the component that builds the incompatibility matrix; if None, orange.IMBySorting will be used.

clustersFromIM: the component that clusters the columns of the matrix; the default is orange.ClustersFromIMByAssessor, which uses a columnAssessor to evaluate distances between the columns and merges the most similar columns until the stopCriterion says it's enough.

stopCriterion (clustersFromIM.stopCriterion): the default is orange.StopIMClusteringByAssessor_noProfit, which stops the merging when the gain is negative or when it is smaller than the prescribed proportion. Alternatives are orange.StopIMClusteringByAssessor_binary, which stops merging when only two columns (future values) are left, and orange.StopIMClusteringByAssessor_n, which stops at n columns.
minProfitProportion: the minimal relative gain required for merging to continue (used with the "no profit" criterion); the default is 0.

n: if n is specified, the default stopCriterion is orange.StopIMClusteringByAssessor_n.

binary: if set, orange.StopIMClusteringByAssessor_binary is used as the stopping criterion.
columnAssessor (clustersFromIM.columnAssessor): the default is orange.ColumnAssessor_m, which uses m-error estimation. Alternatives are:

  orange.ColumnAssessor_Laplace, which minimizes the Laplace estimate of the error of matrix elements;

  orange.ColumnAssessor_Measure, which optimizes an attribute quality measure (such as information gain); this would usually be used in conjunction with orange.StopIMClusteringByAssessor_binary or orange.StopIMClusteringByAssessor_n;

  orange.ColumnAssessor_Kramer, which optimizes Kramer's impurity measure. It is defined for binary classes only. Since the profit is non-positive, the default stopping criterion is changed to orange.StopIMClusteringByAssessor_binary; you may replace it by orange.StopIMClusteringByAssessor_n;

  orange.ColumnAssessor_Relief, which uses a measure similar to ReliefF. It has a local extreme, so it can be used with any stopping criterion.

measure (clustersFromIM.columnAssessor.measure): an attribute quality measure, used when columnAssessor is of type orange.ColumnAssessor_Measure. If this option is given, the default columnAssessor is orange.ColumnAssessor_Measure.

completion: the default is orngCI.FeatureByIM.CompletionByBayes; the values are determined using a Bayesian classifier. The alternatives are orngCI.FeatureByIM.CompletionByDefault, where the most frequent value is used for imputation, and orngCI.FeatureByIM.NoCompletion, where the values are left undefined.
Take care to use sensible combinations of arguments. If you, for instance, combine the column assessor based on Kramer's impurity measure with the ordinary "no profit" stop criterion (by setting both explicitly), you will probably get features that are simply Cartesian products of the bound attributes, since Kramer's impurity measure is non-decreasing. orngCI does not warn about nonsensical settings.
As an example, let us check how minimal-error decomposition performs on Monk 1. We shall check its performance with different values of m, and for each we shall print out the values of the constructed feature.
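A sketch of such an experiment (the particular values of m are arbitrary):

    fim = orngCI.FeatureByIM()
    for m in [0.1, 1.0, 10.0, 50.0]:
        fim.m = m
        ab, quality = fim(data, ["a", "b"])
        print m, ab.values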
Results can vary due to random decisions in the procedure. In general, lower m's give correct features. This is not surprising: through m the user may inform the algorithm about the level of noise in the data. With higher m's, the algorithm will tend to oversimplify the feature, in our case, it will merge more than needed.
Let us see whether Kramer's impurity measure can compete with this.
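(A sketch, assuming the columnAssessor can be passed as a keyword argument, as described above.)

    fim = orngCI.FeatureByIM(columnAssessor=orange.ColumnAssessor_Kramer())
    ab, quality = fim(data, ["a", "b"])
    print ab.values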
It cannot. But what if we allow it to build a four-valued attribute?
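(Again a sketch; n=4 switches the stopping criterion to stop at four columns, as described in the list of components above.)

    fim = orngCI.FeatureByIM(columnAssessor=orange.ColumnAssessor_Kramer(), n=4)
    ab, quality = fim(data, ["a", "b"])
    print ab.values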
As a more interesting test, let us see what happens if we construct an attribute with optimization of information gain.
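(A sketch; passing measure implicitly selects orange.ColumnAssessor_Measure, and binary=1 asks for a two-valued feature.)

    fim = orngCI.FeatureByIM(measure=orange.MeasureAttribute_info(), binary=1)
    ab, quality = fim(data, ["a", "b"])
    print ab.values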
(In this setting, orange.MeasureAttribute_info() estimates entropies in the elements of the incompatibility matrix and observes how the entropy changes with potential merges. This is not the standard information gain of an attribute but rather a form of non-myopic information gain measure.)
FeatureByKramer implements Kramer's algorithm for feature construction (Kramer, 1994). Basically, it computes a distribution of classes for each combination of values of the bound attributes and merges these combinations, just like minimal-error decomposition merges columns of the incompatibility matrix. In the case of Monk 1, it extracts a 0:36 class distribution for the combinations 1-1, 2-2 and 3-3, and 36:12 for the other combinations. It is straightforward for any clustering algorithm to see that the former three combinations belong to one group and the remaining ones to another.
This method is somewhat similar to minimal-error decomposition. If you sum all rows of the incompatibility matrix into a single row, you get a list of distributions corresponding to combinations of values of the bound attributes - and that is exactly the data that Kramer's algorithm operates on. The remaining differences between the algorithms are only in the choice of components; while minimal-error estimation uses the change of the m-error estimate as a measure of distance between two values of the new attribute, Kramer's algorithm uses a special impurity measure (which can be, as Kramer himself states, replaced by any other impurity measure or even by some measure of similarity between two distributions).
Due to the similarity between the two methods, the structure of their components is similar. The data structure that corresponds to the IM (incompatibility matrix) is called ExampleDist. There is no component analogous to IMconstructor.
The clustering component (analogous to clustersFromIM in FeatureByIM): the default is orange.ClustersFromDistributionsByAssessor, which uses a distributionAssessor to evaluate differences between the distributions and merges the most similar values until the stopCriterion says it's enough.

distributionAssessor: the default is orange.DistributionAssessor_Kramer, which uses Kramer's measure of impurity. Alternatives are:

  orange.DistributionAssessor_Laplace, which minimizes the Laplace estimate of the error of distributions;

  orange.DistributionAssessor_Measure, which optimizes an attribute quality measure (such as information gain);

  orange.DistributionAssessor_m, which assesses the quality of distributions using the m-estimate of error;

  orange.DistributionAssessor_Relief, which uses a measure similar to ReliefF. It has a local extreme, so it can be used with any stopping criterion.

m: setting m replaces the default distributionAssessor with orange.DistributionAssessor_m, as in the orngCI.FeatureByIM class. This also influences the default stopCriterion (see below).
stopCriterion: the defaults are analogous to those in orngCI.FeatureByIM: when no n, minProfitProportion or binary is given, the stopCriterion depends upon the distribution assessor. When Kramer's assessor is used, the default is orange.StopDistributionClustering_binary, which stops when only two values are left. Otherwise, orange.StopDistributionClustering_noProfit, which stops when the gain is negative or when it is smaller than the prescribed proportion, is used. The third alternative is orange.StopDistributionClustering_n, which stops at n values.
minProfitProportion: the default is 0.

n: if n is specified, the default stopCriterion is orange.StopDistributionClusteringByAssessor_n.

binary: if set, the binary stopping criterion (see above) is used.
measure: an attribute quality measure, used when distributionAssessor is of type orange.DistributionAssessor_Measure. If this option is given, the default distributionAssessor is orange.DistributionAssessor_Measure.

completion: the default is orngCI.FeatureByKramer.CompletionByBayes; the values are determined using a Bayesian classifier. The alternatives are orngCI.FeatureByKramer.CompletionByDefault, where the most frequent value is used for imputation, and orngCI.FeatureByKramer.NoCompletion, where the values are left undefined.
The class usage is essentially the same as that of orngCI.FeatureByIM except for different defaults for distributionAssessor and stopCriterion. This invokes the default Kramer's algorithm that uses Kramer's measure of impurity for distribution quality assessment and induces binary concepts:
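(A sketch of the default usage, continuing the Monk 1 example.)

    fkr = orngCI.FeatureByKramer()
    ab, quality = fkr(data, ["a", "b"])
    print ab.values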
Setting m will replace the distribution estimator by orange.DistributionAssessor_m and, consequentially, orange.StopDistributionClustering_noProfit will be used as the stop criterion:
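(A sketch; the particular value of m is arbitrary, and passing m at construction is assumed to be equivalent to setting it afterwards.)

    fkr = orngCI.FeatureByKramer(m=2.0)
    ab, quality = fkr(data, ["a", "b"])
    print ab.values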
The random feature constructor constructs features with a given number of values. The grouping of combinations of bound values is random, and the estimated quality of the feature is a random value between 0 and 100, inclusive. The class is useful for comparisons - if a "real" method of constructive induction does not outperform it, it is of no use.
As an example, we shall induce 30-valued random features:
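(A sketch only; the class name FeatureByRandom and the n argument are assumptions, so check the module for the actual spelling.)

    frnd = orngCI.FeatureByRandom(n=30)     # hypothetical name and argument
    ab, quality = frnd(data, ["a", "b"])
    print len(ab.values), quality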
FeatureByCartesianProduct constructs an attribute that represents the Cartesian product of the bound attributes. Its quality is estimated by a given attribute quality estimator of type orange.MeasureAttribute; if none is given, all qualities are 0.
Let us construct a Cartesian product attribute and assess its quality using information gain:
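(A sketch; the keyword name measure is assumed.)

    fcp = orngCI.FeatureByCartesianProduct(measure=orange.MeasureAttribute_info())
    ab, quality = fcp(data, ["a", "b"])
    print ab.values, quality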
orngCI.FeatureGenerator is a class that constructs a list of features using the given feature construction algorithm. Besides the feature inducer, you can also specify a subsets generator - an object that generates the bound sets.
featureInducer: the feature construction algorithm that is called for each bound set.

subsetsGenerator: the default is orange.SubsetsGenerator_constSize(2), which returns all pairs of attributes. Orange provides several alternatives, the most useful being orange.SubsetsGenerator_minMaxSize(min, max), which generates all subsets within the specified minimal and maximal number of elements. You can also define your own generators in Python, if you like.

The core of the class's code is really trivial. The class first checks whether the two fields are specified and provides the default for subsetsGenerator if needed. Then it essentially runs a loop over all generated bound sets, as sketched below.
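(A conceptual sketch only, under the assumption that a subsets generator can be called with a list of attributes and iterated over; check your Orange version for the exact calling convention.)

    featureInducer = orngCI.FeatureByKramer()
    subsetsGenerator = orange.SubsetsGenerator_constSize(2)

    # call the feature inducer once for every bound set
    features = [featureInducer(data, bound)
                for bound in subsetsGenerator(data.domain.attributes)]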
We will use the class to induce new features from all attribute pairs in Monk 1 using Kramer's algorithm.
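(A sketch; the field name featureInducer is taken from the description of StructureInducer below, and the result is assumed to be a list of (feature, quality) pairs, one per bound set.)

    fgen = orngCI.FeatureGenerator(featureInducer=orngCI.FeatureByKramer())
    for feature, quality in fgen(data):
        print feature.name, feature.values, quality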
StructureInducer induces a hierarchy of attributes, as proposed by Zupan et al. (1998). At each step, it induces attributes from subsets of the current attribute set. If any attributes are found, one of them is selected and inserted into the domain, and its bound attributes are removed. This is repeated until there is only one attribute left, or until the selected attribute includes all remaining attributes in its bound set (i.e., if new attributes are induced from pairs of existing attributes, the induction stops when only two attributes are left). The result of calling StructureInducer is a function that computes the class attribute from the remaining attributes. Using boundset() and getValueFrom, it is possible to descend down the hierarchy.
featureInducer: a feature constructor from this module (such as FeatureByMinError or FeatureByKramer) or another Python function or class with the same behaviour. StructureInducer constructs a FeatureGenerator and initializes it with featureInducer and subsetsGenerator.

A redundancy remover (for instance, orngCI.AttributeRedundanciesRemover): if given, it is called prior to structure induction (but not during induction).

keepDuplicates: duplicates are removed by default (unless you set keepDuplicates to 1).

This example shows how to induce a structure using Kramer's method for feature construction, where data stores the examples for the Monk 1 dataset.
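(A sketch of such a run; printing of the resulting hierarchy is left to the hierarchy-printing function described below.)

    inducer = orngCI.StructureInducer(featureInducer=orngCI.FeatureByKramer())
    root = inducer(data)    # a classifier that computes the class from the remaining attributes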
Class HINT can be used as a Learner.
Basically, HINT is just a wrapper around StructureInducer. The user can specify the type of decomposition (minimal complexity, minimal error) and the size of bound sets. Alternatively, HINT can use defaults. When minimal error decomposition is used, HINT fits parameter m for error estimation by performing a 5-fold cross validation to find the value that gives the maximal classification accuracy.
"complexity", "error", or "auto". This, basically, sets the StructureInducer's featureInducer to FeatureByMinError or FeatureByMinComplexity. When type is "auto", HINT tries to determine whether the domain contains noise or not by observing whether there are examples with the same values of attributes but different classes. If there are no such examples, minimal-complexity is used; if there are, it switches to minimal-error.
m: the parameter for m-error estimation. FeatureByMinError is used if m is given, and FeatureByMinComplexity if it is not.

addAttribute adds an attribute to the domain. A new domain (and a new ExampleTable) is constructed; the original is left intact. Values of the new attribute are computed through its getValueFrom function.
There is also a similar function that behaves like addAttribute but also removes the bound attributes from the domain.
Another function prints out a hierarchy starting with a given attribute or a classifier. In the latter case, the classifier must support the boundset method; classifiers returned by StructureInducer and HINT have this method.
You've seen an example of how to use this function a while ago.
Finally, there is a function that writes a hierarchy, starting with a given attribute or a classifier, to a file for the tree-drawing program dot from the GraphViz package. If a classifier is given, it must support the boundset method; classifiers returned by StructureInducer and HINT have this method. The file can be given as a string, in which case a new file is created, or as an open file, in which case the tree is appended to it.
S. Kramer (1994). CN2-MCI: A Two-Step Method for Constructive Induction. Proc. ML-COLT '94 Workshop on Constructive Induction and Change of Representation. (http://www.hpl.hp.com/personal/Tom_Fawcett/CICR/kramer.ps.gz)
S. B. Thrun et al. (1991). A performance comparison of different learning algorithms. Technical report CMU-CS-91-197, Carnegie Mellon University.
B. Zupan (1997). Machine learning based on function decomposition. PhD thesis, University of Ljubljana.
B. Zupan, M. Bohanec, J. Demsar and I. Bratko (1998). Feature transformation by function decomposition. IEEE Intelligent Systems and Their Applications, vol. 13, pp. 38-43.