Reading data
============

Create a workspace from the standard workspace factory using either a path to a
single data file or a path to a directory of similar files. A workspace is a
mapping (in the Python sense) of names to collections (or layers) of feature
data, and provides access to them through the usual mapping methods, such as
items().

    >>> from mill import workspace
    >>> w = workspace('docs/data')
    >>> w # doctest: +ELLIPSIS
    <mill.workspace.Workspace object at ...>
    >>> w.path
    'docs/data'
    >>> w.items() # doctest: +ELLIPSIS
    [('test_uk', <mill.collection.Collection object at ...>)]
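
The mapping behaviour described above can be illustrated without mill. The
following is only a sketch, written for Python 3: the class DirectoryLayers
and its *extension* parameter are hypothetical and are not part of mill. It
shows how a read-only mapping keyed by file base name gets items(), keys(),
len() and the "in" operator from three small methods.

```python
import os
from collections.abc import Mapping


class DirectoryLayers(Mapping):
    """Hypothetical read-only mapping of layer names to file paths."""

    def __init__(self, path, extension=".shp"):
        self.path = path
        self._files = {}
        for filename in sorted(os.listdir(path)):
            base, ext = os.path.splitext(filename)
            if ext.lower() == extension:
                # The layer name is the base of the file name.
                self._files[base] = os.path.join(path, filename)

    def __getitem__(self, name):
        return self._files[name]

    def __iter__(self):
        return iter(self._files)

    def __len__(self):
        return len(self._files)
```

Because Mapping supplies items(), keys(), values() and get() on top of these
three methods, an instance can be handed to any code that expects a dict-like
object.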

A collection is a workspace item and can be retrieved by name, as from any
mapping or dict. The name of a collection corresponds to the OGR layer name,
which is the base of the data file's name. A collection's *schema* property is
currently a list of (name, integer type) tuples, but this is likely to change
in the future. The number of features in a collection can be obtained with
len().

    >>> c = w['test_uk']
    >>> c # doctest: +ELLIPSIS
    <mill.collection.Collection object at ...>
    >>> c.name
    'test_uk'
    >>> c.schema
    [('CAT', 2), ('FIPS_CNTRY', 4), ('CNTRY_NAME', 4), ('AREA', 2), ('POP_CNTRY', 2)]
    >>> len(c)
    48
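
The integer types in the schema are easier to read with names attached. The
sketch below assumes the integers are OGR field type codes (OGRFieldType), in
which 0 is Integer, 2 is Real and 4 is String. The text above does not state
this; it is an inference from the OGR layer naming and from string fields such
as CNTRY_NAME carrying the code 4. The helper describe_schema is hypothetical
and is not part of mill.

```python
# Assumption: the integer codes are OGR field type codes (OGRFieldType).
OGR_FIELD_TYPE_NAMES = {
    0: "Integer",
    2: "Real",
    4: "String",
}


def describe_schema(schema):
    """Return (name, type name) pairs for a list of (name, integer type) tuples."""
    return [
        (name, OGR_FIELD_TYPE_NAMES.get(code, "Unknown(%d)" % code))
        for name, code in schema
    ]


# The schema list shown in the session above, copied as a literal.
schema = [('CAT', 2), ('FIPS_CNTRY', 4), ('CNTRY_NAME', 4), ('AREA', 2), ('POP_CNTRY', 2)]
for name, type_name in describe_schema(schema):
    print("%-12s %s" % (name, type_name))
```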

Features in a collection can be accessed by id, as if the collection were a
dict keyed by feature id.

    >>> f = c['1']
    >>> f.id
    '1'
    >>> f.properties['CNTRY_NAME']
    'United Kingdom'

Users can control the type of object returned for each feature by binding a
callable to the collection's *object_hook* attribute. The default object hook
(used above) is mill.feature.Feature, a class modeled loosely on GeoJSON. A
callable must take three positional parameters (the feature id, a mapping of
properties, and the geometry as a WKB byte string) and return a Python object,
like so:

    >>> from mill.feature import Feature
    >>> def testing_feature(id, properties, wkb):
    ...     d = {}
    ...     for key, val in properties.items():
    ...         if isinstance(val, str):
    ...             # Decode byte strings to unicode (Python 2).
    ...             val = val.decode('utf-8')
    ...         d[key] = val
    ...     return Feature(id, d, wkb.encode('hex'))

This hook converts all string properties to unicode and hex-encodes the WKB
byte string extracted from the data. Bind it to the collection like so:

    >>> c.object_hook = testing_feature
    >>> f = c['1']
    >>> f.id
    '1'
    >>> f.properties['CNTRY_NAME']
    u'United Kingdom'
    >>> f.geometry # doctest: +ELLIPSIS
    '0103000000010000000d0000003d1059a48...'
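
An object hook does not have to return a Feature. As a sketch of the same
three-argument signature, the function below returns a plain GeoJSON-like
dict. It is written for Python 3, unlike the Python 2 session above, and the
function name, the "geometry_wkb_hex" key and the sample values are all
hypothetical. Only the (id, properties, wkb) signature comes from the text.

```python
import binascii


def geojson_like_feature(id, properties, wkb):
    """Hypothetical object hook: build a plain dict from the three arguments."""
    return {
        "type": "Feature",
        "id": id,
        "properties": dict(properties),
        # Hex-encode the WKB bytes so the geometry is printable text.
        "geometry_wkb_hex": binascii.hexlify(wkb).decode("ascii"),
    }


# Stand-in values; with mill the collection would supply these arguments.
# This is the 21-byte little-endian WKB for POINT (0 0).
sample_wkb = bytes.fromhex("0101000000" + "00" * 16)
feature = geojson_like_feature("1", {"CNTRY_NAME": "United Kingdom"}, sample_wkb)
```

Such a function would be bound in the same way as testing_feature, by
assigning it to the collection's object_hook.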

More efficient access to features by id is provided by a collection's *all*
attribute, which opens a data access session:

    >>> features = c.all
    >>> f = features['1']
    >>> f.id
    '1'
    >>> f.properties['CNTRY_NAME']
    u'United Kingdom'

This attribute also supports the iterator protocol:

    >>> features = c.all
    >>> f = features.next()
    >>> f.id
    '0'
    >>> f.properties['CNTRY_NAME']
    u'United Kingdom'

Collections also provide a filtering iterator. The *bbox* parameter causes the
filter to pass only the features that intersect the specified
(minx, miny, maxx, maxy) bounding box tuple. The tuple values are interpreted
in the same coordinate system as the data.

    >>> query = c.filter(bbox=(-1.0, 50.0, -0.5, 50.5))
    >>> query # doctest: +ELLIPSIS
    <mill.collection.Iterator object at ...>
    >>> results = [f for f in query]
    >>> len(results)
    1
    >>> f = results[0]
    >>> f.id
    '45'
    >>> f.properties['CNTRY_NAME']
    u'United Kingdom'
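
To make the meaning of a (minx, miny, maxx, maxy) intersection concrete, the
sketch below tests two such boxes against each other in plain Python 3. It is
not mill's implementation: whether mill compares envelopes or full geometries,
and whether boxes that merely touch count as intersecting, is not stated
above. The function bbox_intersects and the sample envelopes are hypothetical.

```python
def bbox_intersects(a, b):
    """Return True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    a_minx, a_miny, a_maxx, a_maxy = a
    b_minx, b_miny, b_maxx, b_maxy = b
    # The boxes intersect when their extents overlap on both axes.
    return (
        a_minx <= b_maxx and b_minx <= a_maxx
        and a_miny <= b_maxy and b_miny <= a_maxy
    )


query_box = (-1.0, 50.0, -0.5, 50.5)
# Hypothetical feature envelopes, invented for illustration only.
envelopes = {
    "a": (-0.8, 50.2, -0.6, 50.4),  # inside the query box
    "b": (-3.0, 55.0, -2.0, 56.0),  # far to the north-west
    "c": (-0.5, 50.5, 0.0, 51.0),   # touches the query box at one corner
}
hits = [key for key, box in sorted(envelopes.items())
        if bbox_intersects(box, query_box)]
```

With these values, envelopes "a" and "c" pass the test and "b" does not.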

