IsIndex
=======

IsIndex attempts to guess if a html file is really an index that should
be the default page on a folder. It does this by looking at the links in
the content. If it contains many links all pointing to objects in a 
certain folder then it will make this as teh index. 
If multiple are indexes then only one will win.
If the file is not in the folder for which its an index, this will 
adjust the path to put it inside the folder.

The strategy used is as follows:

- get all the potential indexes and determine what they are most likely to be
  index of.

- rank them on the depth of that dir

- pick most deep dir. move all indexes that point to it into there.

- choose one of those to be the index

- loop (this move indexes that point to indexes)



>>> from collective.transmogrifier.tests import registerConfig
>>> from collective.transmogrifier.transmogrifier import Transmogrifier
>>> transmogrifier = Transmogrifier(plone)


>>> config = """
... [transmogrifier]
... pipeline =
...     source
...     isindex 
...     printer
...     
... [source]
... blueprint = transmogrify.webcrawler.test.htmlbacklinksource
... content=<a href="f1/blah1"></a><a href="f1/blah2"></a>
... f1/blah1=blah1
... f1/blah2=blah2
...
... [isindex]
... blueprint = transmogrify.webcrawler.isindex
...
... [printer]
... blueprint = collective.transmogrifier.sections.tests.pprinter
... """ 
>>> registerConfig(u'test1', config)
>>> transmogrifier(u'test1')
{'_mimetype': 'text/html',
 '_origin': 'content',
 '_path': 'f1/content',
 '_site_url': 'http://test.com/',
 'text': '<a href="f1/blah1"></a><a href="f1/blah2"></a>'}
{'_backlinks': [('http://test.com/content', '')],
 '_mimetype': 'text/html',
 '_path': 'f1/blah1',
 '_site_url': 'http://test.com/',
 'text': 'blah1'}
{'_backlinks': [('http://test.com/content', '')],
 '_mimetype': 'text/html',
 '_path': 'f1/blah2',
 '_site_url': 'http://test.com/',
 'text': 'blah2'}
     
>>> config = """
... [transmogrifier]
... pipeline =
...     source
...     isindex 
...     printer
... [source]
... blueprint = transmogrify.webcrawler.test.htmlbacklinksource
... f1/content=<a href="blah1"></a><a href="blah2"></a>
... f1/blah1=blah1
... f1/blah2=blah2
...
... [isindex]
... blueprint = transmogrify.webcrawler.isindex
...
... [printer]
... blueprint = collective.transmogrifier.sections.tests.pprinter
... """ 

>>> registerConfig(u'test2', config)
>>> transmogrifier(u'test2')
{'_mimetype': 'text/html',
 '_path': 'f1/content',
 '_site_url': 'http://test.com/',
 'text': '<a href="blah1"></a><a href="blah2"></a>'}
{'_backlinks': [('http://test.com/f1/content', '')],
 '_mimetype': 'text/html',
 '_path': 'f1/blah1',
 '_site_url': 'http://test.com/',
 'text': 'blah1'}
{'_backlinks': [('http://test.com/f1/content', '')],
 '_mimetype': 'text/html',
 '_path': 'f1/blah2',
 '_site_url': 'http://test.com/',
 'text': 'blah2'}     