.. _hierarchical:	

Column hierarchies and the `coloring` attribute
--------------------------------------------------------------------------------

Every instance of :class:`tabular.tabarray.tabarray` is initialized with a `coloring` attribute.   This attribute is used to represent hierarchical groupings of columns.    Specifically, for a tabarray `X`::

	X.coloring
	
is a Python dictionary whose keys are the names of "color-groups" -- that is, groupings of columns.   For any color group `g`::

	X.coloring[g] 
	
is a Python list of names of columns of `X` that belong to group `g`.      Tabarray's index method (and things that use it) recognize the names of the color-groups, so that you can use them just like individual column names. 

For instance, let's make up some data:

>>> import numpy
>>> import tabular as tb
>>> Sectors = ['Service','Manufacturing','Education','Healthcare','Entertainment','Taxes']
>>> Areas =  ['New York','California','Oregon','New Mexico','Virginia','Hunan','Guangdong','Heilongjiang','Bayern','Hesse','Rhineland','Sachsen']
>>> Recs = [(v,) + tuple(numpy.random.randint(0,10,size=(len(Areas),))) for v in Sectors]
>>> Data = tb.tabarray(records = Recs,names = ['Sector'] + Areas)
>>> Data
tabarray([('Service', 5, 2, 6, 0, 6, 9, 3, 9, 5, 9, 7, 0),
       ('Manufacturing', 8, 4, 8, 4, 7, 5, 9, 0, 8, 9, 0, 0),
       ('Education', 6, 9, 6, 5, 5, 9, 1, 4, 6, 0, 4, 9),
       ('Healthcare', 0, 8, 0, 6, 4, 2, 0, 9, 0, 9, 8, 1),
       ('Entertainment', 8, 2, 5, 6, 7, 0, 5, 5, 7, 8, 6, 0),
       ('Taxes', 0, 5, 5, 6, 9, 8, 2, 8, 4, 5, 9, 2)], 
      dtype=[('Sector', '|S13'), ('New York', '<i4'), ('California', '<i4'), ('Oregon', '<i4'), ('New Mexico', '<i4'), ('Virginia', '<i4'), ('Hunan', '<i4'), ('Guangdong', '<i4'), ('Heilongjiang', '<i4'), ('Bayern', '<i4'), ('Hesse', '<i4'), ('Rhineland', '<i4'), ('Sachsen', '<i4')])

Obviously, there's some extra structure on this information that isn't captured by `Data`:  the first few columns are American states, the next few are Chinese provinces, and the last few are German administrative units.   To encode this extra structure, we use the coloring attribute.   

>>> Data.coloring										#The coloring attribute is initialized as the empty dictionary. 
{}
>>> Data.coloring['US'] = ['New York','California','Oregon','New Mexico','Virginia'] 		# put in the US entry . . .
>>> Data.coloring['China'] = ['Hunan','Guangdong','Heilongjiang']				# . . . the China entry
>>> Data.coloring['Germany'] = ['Bayern','Hesse','Rhineland','Sachsen']				# . . . the German entry

Now, we can reference the subgroups of columns just like we would reference individual columns:

>>> Data['US']
tabarray([(5, 2, 6, 0, 6), (8, 4, 8, 4, 7), (6, 9, 6, 5, 5), (0, 8, 0, 6, 4),
       (8, 2, 5, 6, 7), (0, 5, 5, 6, 9)], 
      dtype=[('New York', '<i4'), ('California', '<i4'), ('Oregon', '<i4'), ('New Mexico', '<i4'), ('Virginia', '<i4')])
>>> Data[['Germany','Guangdong']]
tabarray([(3, 5, 9, 7, 0), (9, 8, 9, 0, 0), (1, 6, 0, 4, 9), (0, 0, 9, 8, 1),
       (5, 7, 8, 6, 0), (2, 4, 5, 9, 2)], 
      dtype=[('Guangdong', '<i4'), ('Bayern', '<i4'), ('Hesse', '<i4'), ('Rhineland', '<i4'), ('Sachsen', '<i4')])

You can set the coloring at the instantiation of a tabarray by passing the `coloring` keyword argument to the tabarray constructor:

>>> Data = tb.tabarray(records = Recs,names = Names, coloring = {'US': ['New York', 'California', 'Oregon', 'New Mexico', 'Virginia'],'China':['Hunan','Guangdong','Heilongjiang']})
>>> Data.coloring
{'China': ['Hunan', 'Guangdong', 'Heilongjiang'],
 'US': ['New York', 'California', 'Oregon', 'New Mexico', 'Virginia']}

Colorings can encode nested and overlapping hierarchies. For instance:

>>> Data.coloring['Western'] = ['California','Oregon','New Mexico']
>>> Data['Western']
tabarray([(2, 6, 0), (4, 8, 4), (9, 6, 5), (8, 0, 6), (2, 5, 6), (5, 5, 6)], 
      dtype=[('California', '<i4'), ('Oregon', '<i4'), ('New Mexico', '<i4')])
>>> Data.coloring['Southern'] = ['Virginia','Hunan','Guangdong','Bayern']

Coloring information is handled properly by tabarray methods.  For instance `__getitem__` handles coloring by intersection of columns:

>>> Data['US'].coloring
{'Western': ['California', 'Oregon', 'New Mexico']}
>>> Data['US']['Western']
>>> Data[['California','New York','Oregon']].coloring
{'Western': ['California', 'Oregon']}
tabarray([(2, 6, 0), (4, 8, 4), (9, 6, 5), (8, 0, 6), (2, 5, 6), (5, 5, 6)], 
      dtype=[('California', '<i4'), ('Oregon', '<i4'), ('New Mexico', '<i4')])
>>> Data['Germany'].coloring
{'Southern': ['Bayern']}
>>> Data['Southern']['China']
tabarray([(4, 5), (3, 6), (3, 0), (0, 7), (7, 2), (0, 4)], 
      dtype=[('Hunan', '<i4'), ('Guangdong', '<i4')])

As another example, the `colstack` handles coloring by union of columns:

>>> USNames = ['New York', 'California', 'Oregon', 'New Mexico', 'Virginia']
>>> USRecs = [tuple(numpy.random.randint(0,10,size = (len(USNames),))) for v in Sectors]
>>> USData = tb.tabarray(records = USRecs,names = USNames,coloring={'Western':['California','Oregon','New Mexico']})
>>> ChinaNames = ['Hunan', 'Guangdong', 'Heilongjiang']
>>> ChinaRecs = [tuple(numpy.random.randint(0,10,size = (len(ChinaNames),))) for v in Sectors]
>>> ChinaData = tb.tabarray(records = ChinaRecs,names = ChinaNames,coloring={'Southern':['Guangdong','Hunan']})
>>> Z = USData.colstack(ChinaData)				
>>> Z.coloring
{'Southern': ['Guangdong', 'Hunan'],
 'Western': ['California', 'Oregon', 'New Mexico']}
 
Some tabarray methods make more sophisticated use of coloring.   For instance, the `pivot` method uses coloring to encode the 2-D structure of a pivot table.  (See :ref:`pivot documentation <pivot>`.)

