It can often happen that the definition of a discrete attribute (EnumVariable) declares values that do not actually appear in the data, either originally or as a consequence of some preprocessing. Such anomalies are taken care of by a class that, given an attribute and the data, determines whether there are any unused values and reduces the attribute if needed. There are four possible cases.
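None of the attribute's values appear in the data. The result is None.
Only one of the attribute's values appears in the data. The result depends on the flag removeOneValued which is False by default, so such attributes are retained unless explicitly specified otherwise.
All of the attribute's values appear in the data. The original attribute is returned.
Some values do not appear in the data. A new attribute is constructed without the unused values, and its values are computed from the original attribute (ClassifierByLookupTable1 is used for mapping).

Attributes
removeOneValued: decides whether attributes with only one value appearing in the data are removed or retained (default: False).

Let us show the use of the class on a simple dataset with three examples, given by the following tab-delimited file.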
The script below constructs a list newattrs which contains, for each attribute from the original dataset, either the original attribute, None or a reduced attribute.
part of unusedValues.py (uses unusedValues.tab)
And here's the script's output.
Attributes a and y are OK and are left alone. In b, value 1 is not used and is removed (not in the original attribute, of course; a new attribute is created). c is useless and is removed altogether. d is retained since removeOneValued was left at False; if we set it to True, this attribute would be removed as well.
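The decision described above can be sketched in plain Python. This is only an illustration of the four cases, not the library's implementation: the function reduce_values, the dictionary declared and the toy rows are all hypothetical, and the data merely mimics the situation described for a, b, c, d and y rather than reproducing unusedValues.tab.

```python
def reduce_values(declared, column, remove_one_valued=False):
    """Return None, the original list of values, or a new reduced list.

    declared -- values listed in the attribute's definition (hypothetical)
    column   -- values observed in the data; None marks an unknown value
    """
    present = set(column)
    used = [v for v in declared if v in present]
    if not used:
        return None            # no declared value occurs in the data
    if len(used) == 1 and remove_one_valued:
        return None            # one-valued attribute, removal requested
    if len(used) == len(declared):
        return declared        # everything is used: keep the original
    return used                # some values unused: build a reduced copy


# Toy data (an assumption, not the contents of unusedValues.tab)
declared = {
    "a": ["0", "1"],
    "b": ["0", "1", "2"],
    "c": ["0", "1"],
    "d": ["1"],
    "y": ["0", "1"],
}
rows = [
    {"a": "0", "b": "0", "c": None, "d": "1", "y": "0"},
    {"a": "0", "b": "2", "c": None, "d": "1", "y": "1"},
    {"a": "1", "b": "0", "c": None, "d": "1", "y": "1"},
]

new_values = {
    name: reduce_values(values, [row[name] for row in rows])
    for name, values in declared.items()
}

for name, values in declared.items():
    result = new_values[name]
    if result is None:
        print(name, "removed")
    elif result is values:
        print(name, "retained as is")
    else:
        print(name, "reduced from", values, "to", result)
```

Returning the very same list object for an unchanged attribute mirrors the idea that the original attribute, not a copy, is handed back when nothing needs to be removed.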
The values of the new attribute for b are automatically computed from the original. The script can thus proceed as follows.
part of unusedValues.py (uses unusedValues.tab)
List newattrs includes some original attributes (a, d and y), a new attribute (b) and a None (for c). The None is removed by filter, called at the beginning of the script. We use filteredattrs to construct a new domain and then convert the original data to newdata. As the output shows, the two tables are the same except for the removed attribute c.
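The filtering and conversion steps can also be sketched without the library. The snippet below is a minimal illustration under stated assumptions: values are stored as indices into each attribute's value list, the names old_values, new_values, build_lookup, convert and decode are hypothetical, and the lookup table only imitates the kind of mapping the text attributes to ClassifierByLookupTable1.

```python
# Value lists before and after the reduction (toy data, an assumption)
old_values = {"a": ["0", "1"], "b": ["0", "1", "2"], "c": ["0", "1"],
              "d": ["1"], "y": ["0", "1"]}
new_values = {"a": ["0", "1"], "b": ["0", "2"], "c": None,
              "d": ["1"], "y": ["0", "1"]}

# Rows store indices into old_values; None marks an unknown value
rows = [
    {"a": 0, "b": 0, "c": None, "d": 0, "y": 0},
    {"a": 0, "b": 2, "c": None, "d": 0, "y": 1},
    {"a": 1, "b": 0, "c": None, "d": 0, "y": 1},
]

# Drop the attributes whose reduction returned None
kept = [name for name in old_values if new_values[name] is not None]


def build_lookup(old, new):
    """Map each old value index to its index in the new value list."""
    return [new.index(v) if v in new else None for v in old]


lookups = {name: build_lookup(old_values[name], new_values[name])
           for name in kept}


def convert(row):
    """Translate one row into the new domain through the lookup tables."""
    return {name: None if row[name] is None else lookups[name][row[name]]
            for name in kept}


def decode(row, values):
    """Turn stored indices back into symbolic values for printing."""
    return {name: None if i is None else values[name][i]
            for name, i in row.items()}


new_rows = [convert(row) for row in rows]

for old, new in zip(rows, new_rows):
    print(decode(old, old_values), "->", decode(new, new_values))
```

In this sketch the index of value 2 in b changes from 2 to 1, yet the decoded symbolic value stays the same, which is why the converted table differs from the original only by the missing attribute c.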