Metadata-Version: 1.1
Name: interlap
Version: 0.1
Summary: interlap: fast, simple interval overlap testing
Home-page: http://brentp.github.io/interlap
Author: Brent S Pedersen
Author-email: bpederse@gmail.com
License: MIT
Description: InterLap: simple, fast interval overlap testing
        -----------------------------------------------
        
        InterLap is >20 times faster than doing a naive search (see: brentp.github.io/interlap/bench.html)
        with **no memory overhead** because it operates on a sorted list. It is pure python and has no
        dependencies.
        
        It uses binary search and a knowledge of the longest observed interval to quickly query datasets
        with 100's of thousands of intervals.
        
        See the documentation at [https://brentp.github.io/interlap/](https://brentp.github.io/interlap)
        
        Usage
        -----
        
        Interlap takes tuples or lists where the first 2 elements are start, end and the remaining
        elements can be anything.
        
        ```Python
        >>> from interlap import InterLap
        >>> inter = InterLap()
        
        #Here create 10K random intervals:
        
        >>> import random
        >>> random.seed(42)
        >>> sites = random.sample(range(22, 100000000, 12), 50000)
        >>> ranges = [(i, i + random.randint(2000, 20000)) for i in sites]
        
        >>> inter.update(ranges)
        >>> inter.add((20, 22, {'info': 'hi'}))
        
        >>> [20, 21] in inter
        True
        
        >>> next(inter.find((21, 21)))
        (20, 22, {'info': 'hi'})
        
        >>> inter.add((2, 3, {'info': 'hello'}))
        
        *NOTE*: below shows how edge-cases are handled.
        >>> list(inter.find((2, 2)))
        [(2, 3, {'info': 'hello'})]
        >>> list(inter.find((3, 3)))
        [(2, 3, {'info': 'hello'})]
        
        Test every item in the InterLap. These 50K queries take < 0.5 seconds:
        
        >>> for s, e in ranges:
        ...     assert (s, e) in inter
        
        ```
        
        Installation
        ------------
        
        ```Shell
        
        pip install interlap
        
        ```
        
        Example
        -------
        
        In general, we will want one interlap per chromosome for genomic data.
        The snippet below shows a simple way to do that in the process of creating
        and then querying some intervals.
        
        ```Python
        
        from interlap import InterLap
        from collections import defaultdict
        import sys
        
        # use defaultdict to key by chromosome.
        inter = defaultdict(InterLap)
        
        for toks in (x.rstrip().split("\t") for x in open(sys.argv[1])):
            start, end = map(int, toks[1:3])
            inter[toks[0]].add((start, end, toks))
        
        # now find overlaps in another file:
        
        for toks in (x.rstrip().split("\t") for x in open(sys.argv[2])):
            start, end = map(int, toks[1:3])
        
            print list(inter[toks[0]].find((start, end)))
        
        ```
        
        Why
        ---
        
        I am aware of bx-python's interval tree (I wrote the cython version)
        but for some projects it is nice to have a simple dependency (or no
        dependency since this can be included as a single file or 30 lines
        of code) when we just need something a bit better than naive overlap
        testing.
        
        In my testing, the method implemented here, using a sorted list and keeping
        track of the longest observed interval is the fastest *pure python* method
        *as long as the longest observed interval is does not cover a substantial 
        fraction of intervals in the set*.
        
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: License :: OSI Approved :: MIT License
Classifier: Intended Audience :: Science/Research
Classifier: Natural Language :: English
Classifier: Operating System :: Unix
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: MacOS
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
