<todo version="0.1.19">
    <title>
        Pyndexter (pronounced 'poindexter'), a full-text indexing abstraction layer
    </title>
    <note priority="medium" time="1145722536">
        Callbacks for index() and discard(); perhaps something similar for Source objects?
        <comment>
            Framework.update() accepts a filter callback. This could be sufficient.
        </comment>
    </note>
    <note priority="medium" time="1145802778" done="1170655322">
        Finish PyLucene adapter
        <comment>
            Functional enough for a first commit.
        </comment>
    </note>
    <note priority="medium" time="1145854608" done="1146296772">
        Finish MetaSource
    </note>
    <note priority="medium" time="1146321654">
        I think it might need a MIME filter system to translate known content types into plain text for indexing, e.g. just the text content of HTML pages. This could get out of hand.
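        A minimal sketch of such a filter system, assuming filters are plain functions registered per MIME type in a module-level dictionary. FILTERS, mime_filter and to_plain_text are hypothetical names, not existing Pyndexter API. Only text/html and text/plain are handled; any other type returns None so the caller can skip the document, which is one way to stop the system from getting out of hand.

```python
from html.parser import HTMLParser

# Hypothetical registry mapping a MIME type to a function returning plain text.
FILTERS = {}


def mime_filter(content_type):
    """Decorator registering a function as the filter for content_type."""
    def register(func):
        FILTERS[content_type] = func
        return func
    return register


class _TextExtractor(HTMLParser):
    """Collects character data, skipping the bodies of script and style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


@mime_filter("text/html")
def html_to_text(content):
    parser = _TextExtractor()
    parser.feed(content)
    parser.close()
    # Collapse runs of whitespace left behind by the removed markup.
    return " ".join(" ".join(parser.chunks).split())


@mime_filter("text/plain")
def plain_to_text(content):
    return content


def to_plain_text(content_type, content):
    """Return indexable text, or None when no filter handles content_type."""
    # Drop parameters such as "; charset=utf-8" before the lookup.
    func = FILTERS.get(content_type.split(";")[0].strip().lower())
    return func(content) if func else None
```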
    </note>
    <note priority="medium" time="1146328561" done="1146368244">
        state() is being called, and the naive implementation simply walks the entire source. Need some way around this. Should the state be accumulated while the source is being walked?
    </note>
    <note priority="medium" time="1146331225" done="1146368238">
        HTTPSource should be able to handle multiple iterations, but self._traversed renders this impossible.
    </note>
    <note priority="medium" time="1159011350">
        For storing state, perhaps there should be default store_state(store)/restore_state(store) methods. Also need a Store class, or just use a file object...
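        One possible shape for the default methods, assuming a Source keeps its state as a JSON-serialisable dictionary and that store is simply a text file object rather than a dedicated Store class. The state attribute and the copy_state helper are illustrative, not existing API.

```python
import io
import json


class Source:
    """Hypothetical base class with default state persistence."""

    def __init__(self):
        # Maps document URI to a change marker such as an mtime (assumed format).
        self.state = {}

    def store_state(self, store):
        """Write the state as JSON to a writable text file object."""
        json.dump(self.state, store, sort_keys=True)

    def restore_state(self, store):
        """Replace the state with JSON read from a readable text file object."""
        self.state = json.load(store)


def copy_state(src, dst):
    """Usage example: round-trip state through an in-memory file object."""
    buf = io.StringIO()
    src.store_state(buf)
    buf.seek(0)
    dst.restore_state(buf)
```

Anything with read() and write() works as the store, so a real file, an io.StringIO, or a later Store class could all be passed in unchanged.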
    </note>
    <note priority="high" time="1159197046" done="1169000053">
        Refactor Indexer into two classes: the Indexer itself, and a class that glues Source and Indexer together. This would remove the duplication I'm getting across all the stock methods (update, index, fetch, etc.).
        <comment>
            Done as the Framework class.
        </comment>
    </note>
    <note priority="medium" time="1168868728" done="1169000047">
        Add slicing to Result objects. This will allow fast pagination in result displays.
    </note>
    <note priority="low" time="1168875038" done="1170587379">
        Add some "stock" query translators (e.g. "a AND b OR c" style, "a b or c", "+a +b c", etc.).
        <comment>
            Added a general to_boolean() method to the Query object. Operators can be overridden for variants.
        </comment>
    </note>
    <note priority="medium" time="1169007320">
        Incremental updates for the indexer state. Waiting until indexing finishes before writing the state is bad: a single document error can render the entire index useless.
        <note priority="medium" time="1169007391">
            "Transactions" for state updates?
        </note>
        <note priority="medium" time="1169090428">
            I think an anydbm-style interface for storing state could be useful.
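            A sketch of committing state per document through a dbm-style interface. In Python 3 anydbm became the dbm module; this uses shelve, which layers pickled values over dbm. The (uri, mtime, text) tuples and the index_document callback are assumptions standing in for a Source and an Indexer.

```python
import shelve


def index_incrementally(documents, index_document, state_path):
    """Index (uri, mtime, text) tuples, committing state after each document.

    index_document is a hypothetical callback standing in for the indexer.
    If it raises part-way through, the entries for all earlier documents are
    already on disk, so the next run only redoes the unfinished work.
    """
    with shelve.open(state_path) as state:
        for uri, mtime, text in documents:
            if state.get(uri) == mtime:
                continue  # unchanged since the last successful run
            index_document(uri, text)
            state[uri] = mtime
            state.sync()  # flush this document's entry before moving on
```

Syncing after every document trades write speed for durability; syncing every N documents would be a middle ground.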
        </note>
    </note>
    <note priority="medium" time="1169048222" done="1170655393">
        Add a swish-e adapter. The Python module SwishE only appears to expose searching :(
        <comment>
            Done, but only for searching.
        </comment>
    </note>
    <note priority="medium" time="1169086953">
        Why is Xapian not returning all the hits?
    </note>
    <note priority="medium" time="1169116208">
        I'd like to add database Sources, but I can't see a way to handle updated rows without doing a full table scan.
    </note>
    <note priority="medium" time="1169444419">
        Use metakit for a pure-Python implementation? (Check out "divmod pyndex" for ideas.)
    </note>
    <note priority="medium" time="1170604364" done="1170931795">
        Deprecate Hit and just use Document; they're almost identical in functionality.
        <comment>
            Bad idea. Hit now has indexed and current members, which lazily fetch from the Indexer and Framework, respectively.
        </comment>
        <note priority="medium" time="1170812979" done="0">
            Perhaps Results should use the Framework to try to fetch a Document, then "underlay" the hit attributes?
        </note>
    </note>
    <note priority="high" time="1170651530">
        Add generalised "field" indexing.
    </note>
    <note priority="medium" time="1170653876">
        Search result ordering.
    </note>
    <note priority="high" time="1170654664">
        How do we detect when sources have been removed from the index? If file:///tmp changes to file:///usr, the Framework has no real way of detecting which URIs in the index are no longer valid.
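        If a full walk of the configured sources is acceptable, stale entries fall out as a set difference: anything in the index that no current source yields is gone. This sketch assumes each source can be iterated as a sequence of URI strings, a hypothetical simplification of the Source interface.

```python
def find_stale_uris(indexed_uris, sources):
    """Return URIs present in the index that no configured source yields.

    sources is assumed to be an iterable of iterables of URI strings,
    standing in for walking each configured Source.
    """
    current = set()
    for source in sources:
        current.update(source)
    return set(indexed_uris) - current
```

Recording which source produced each indexed URI would avoid the full walk: entries belonging to a source that is no longer configured (file:///tmp in the example above) could then be discarded directly.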
    </note>
    <note priority="medium" time="1170685227">
        Default indexer tasks
        <note priority="medium" time="1146296806">
            Optimise the on-disk format for DefaultIndexer. Use URI/word "ids" rather than full URIs and words.
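            A sketch of the id idea, assuming an in-memory table that interns each word and each URI as a small integer, so postings hold integers instead of repeated strings. IdTable and add_document are hypothetical names, and persisting the tables to disk is left out.

```python
class IdTable:
    """Interns strings (words or URIs) as small integer ids, both ways."""

    def __init__(self):
        self._ids = {}
        self._values = []

    def id_for(self, value):
        """Return the id for value, assigning the next free id if it is new."""
        try:
            return self._ids[value]
        except KeyError:
            new_id = len(self._values)
            self._ids[value] = new_id
            self._values.append(value)
            return new_id

    def value_for(self, id_):
        """Map an id back to the original string."""
        return self._values[id_]


def add_document(postings, words, uris, uri, text):
    """Record each word of text in a word-id to set-of-URI-ids postings map."""
    uri_id = uris.id_for(uri)
    for word in text.lower().split():
        postings.setdefault(words.id_for(word), set()).add(uri_id)
```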
        </note>
        <note priority="medium" time="1170685251">
            Abstract the storage mechanism so that sqlite, metakit, anydbm, etc. can be used as backends. This would let DefaultIndexer run wherever any one of them is available.
        </note>
        <note priority="medium" time="1170685266">
            Use bigrams, as the current 'default' search does? I think this is a good solution: it allows sub-word searches, start- and end-of-word searches, etc.
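            A sketch of how anchored character bigrams support those queries, assuming ^ and $ mark the start and end of each word. The markers and all function names are assumptions; the current 'default' search is not shown here. Intersecting bigram postings only narrows the candidates (a word such as "abxbc" contains both bigrams of "abc" without containing "abc"), so each candidate is verified against the pattern.

```python
def bigrams(word, anchor=True):
    """Return the set of character bigrams in word.

    With anchor=True the word is wrapped in ^ and $ so that its first and
    last characters produce start- and end-of-word bigrams.
    """
    if anchor:
        word = "^" + word + "$"
    return {word[i:i + 2] for i in range(len(word) - 1)}


def build_index(words):
    """Map each bigram to the set of indexed words containing it."""
    index = {}
    for word in words:
        for gram in bigrams(word):
            index.setdefault(gram, set()).add(word)
    return index


def search(index, pattern):
    """Return indexed words matching pattern.

    A plain pattern matches anywhere in a word; a leading ^ anchors it to
    the start of the word and a trailing $ to the end.
    """
    core = pattern.strip("^$")
    grams = bigrams(pattern, anchor=False)
    if grams:
        # Candidates must contain every bigram of the (possibly anchored) pattern.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        # Too short to form a bigram: consider every indexed word.
        candidates = set().union(*index.values())
    starts, ends = pattern.startswith("^"), pattern.endswith("$")

    def matches(word):
        # Bigram overlap can give false positives, so verify each candidate.
        if starts and ends:
            return word == core
        if starts:
            return word.startswith(core)
        if ends:
            return word.endswith(core)
        return core in word

    return {word for word in candidates if matches(word)}
```

For example, with indexer, index, pyndexter and xapian indexed, "ndex" matches the first three, "^index" matches indexer and index, and "er$" matches indexer and pyndexter.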
        </note>
        <note priority="medium" time="1170685271">
            Optionally use snowball stemmer.
        </note>
        <note priority="medium" time="1170685277">
            Have a built-in stemmer? Porter?
        </note>
        <note priority="medium" time="1170685318">
            Use "nltk" stemmer?
        </note>
    </note>
    <note priority="medium" time="1170686012">
        http://www.biais.org/blog/index.php/2007/01/31/25-spelling-correction-using-the-python-natural-language-toolkit-nltk &lt;- interesting
    </note>
    <note priority="medium" time="1170739349">
        Pyndex adapter.
    </note>
    <note priority="medium" time="1170813131">
        Add a utility function for converting attribute dictionary keys to plain strings (a common pattern).
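        The note doesn't say why the conversion is needed; a likely reason is passing attributes on as **kwargs, which requires str keys. A Python 3 sketch, assuming keys may arrive as bytes from a storage backend (str_keys is a hypothetical name):

```python
def str_keys(attributes):
    """Return a copy of attributes whose keys are plain str, for **kwargs use.

    Byte-string keys are decoded as UTF-8 (an assumption about the backend);
    any other key type is converted with str().
    """
    result = {}
    for key, value in attributes.items():
        if isinstance(key, bytes):
            key = key.decode("utf-8")
        result[str(key)] = value
    return result
```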
    </note>
    <note priority="medium" time="1170829158">
        Normalise URI usage everywhere.
    </note>
    <note priority="veryhigh" time="1170915596">
        Fix port parsing in util.URI.
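        util.URI's code isn't shown here, so this is not a patch for it. It sketches port handling with the standard library's urllib.parse.urlsplit, whose port attribute raises ValueError for an invalid port instead of silently producing a bogus one. The default-port table and parse_host_port are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical defaults used when a URI carries no explicit port.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def parse_host_port(uri):
    """Return (hostname, port) for uri, filling in the scheme's default port.

    Reading urlsplit(uri).port raises ValueError for a non-numeric or
    out-of-range port, so malformed URIs fail loudly.
    """
    parts = urlsplit(uri)
    port = parts.port
    if port is None:
        port = DEFAULT_PORTS.get(parts.scheme)
    return parts.hostname, port
```

hostname is lowercased and has IPv6 brackets stripped, so "http://[::1]:8000/" yields ("::1", 8000).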
    </note>
    <note priority="medium" time="1171055477">
        Write a decent test suite.
        <note priority="medium" time="1171271157">
            Test that searches return the right hits. Don't care about order.
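            A sketch of an order-insensitive hit test using unittest's assertCountEqual, which compares two sequences as multisets: ordering is ignored, but missing or duplicated hits still fail. FakeIndexer is a hypothetical stand-in; a real suite would run the same assertion against each adapter.

```python
import unittest


class FakeIndexer:
    """Hypothetical stand-in for an indexer adapter."""

    def __init__(self, documents):
        self.documents = documents  # maps URI to document text

    def search(self, word):
        # Deliberately returns hits in reverse URI order to show order is ignored.
        return [uri for uri, text in sorted(self.documents.items(), reverse=True)
                if word in text.split()]


class SearchHitsTest(unittest.TestCase):
    def setUp(self):
        self.indexer = FakeIndexer({
            "file:///a": "full text indexing",
            "file:///b": "text search",
            "file:///c": "abstraction layer",
        })

    def test_hits_ignore_order(self):
        hits = self.indexer.search("text")
        # Same elements with the same counts, in any order.
        self.assertCountEqual(hits, ["file:///a", "file:///b"])
```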
        </note>
        <note priority="medium" time="1171271356">
            Test that all interfaces pass and receive unicode correctly.
        </note>
        <note priority="medium" time="1171271371">
            Test that all indexers and sources pass URI objects correctly.
        </note>
    </note>
</todo>
