Metadata-Version: 1.1
Name: bitstream
Version: 2.0.0-alpha.2
Summary: A Binary Data Type with a Stream Interface
Home-page: https://github.com/boisgera/bitstream
Author: Sébastien Boisgérault
Author-email: Sebastien.Boisgerault@mines-paristech.fr
License: MIT License
Description: **About this document.** It is originally a plain text file using the
        `Markdown <http://daringfireball.net/projects/markdown/>`__ syntax, but
        you may be reading a HTML, PDF or ReST version instead. In any case, the
        contents are subject to a `Creative Commons Attribution
        3.0 <http://creativecommons.org/licenses/by/3.0/>`__ license.
        
        What is Bitstream ?
        ===================
        
        Bitstream provides a binary data type with a stream interface for
        `Python <http://www.python.org/>`__.
        
        -  **Binary Data:** the ``BitStream`` class is a linearly ordered
           container of bits. The standard library is only convenient to manage
           binary data at the byte level. Consider using BitStream instead,
           especially you need to address the bit level.
        
        -  **Stream Interface:** you can only read data at the start of a stream
           and write data at its end. This is a very simple way to interact with
           binary data, but it is also the pattern that comes naturally in many
           applications. To manage binary codes and formats, in my experience,
           random data access is not a requirement.
        
        -  **Python and Numpy Types.** BitStream has built-in readers and
           writers for the common data types with a standard binary layout:
           bools, ASCII strings, fixed-size integers and floating-point
           integers.
        
        -  **User-Defined Types.** The list of supported types and binary
           representation may be enlarged at will: new readers and writers can
           be implemented and associated to specific data types.
        
        -  **Performance.** Bitstream is a Python C-extension module that has
           been optimized for the common use cases. Hopefully, it will be fast
           enough for your needs ! Under the hood, the
           `Cython <http://www.cython.org>`__ language and compiler are used to
           generate this extension module.
        
        -  **Open-Source:** the Bitstream software is distributed under a `MIT
           license <http://opensource.org/licenses/MIT>`__, its documentation
           under a `Creative Commons Attribution
           3.0 <http://creativecommons.org/licenses/by/3.0/>`__ license. The
           development takes place on
           `GitHub <https://github.com/boisgera/bitstream>`__ and releases are
           also available on `PyPi <https://pypi.python.org/pypi/bitstream/>`__.
        
        Requirements & Installation
        ===========================
        
        Bitstream targets `Python
        2.7 <http://www.python.org/download/releases/2.7>`__, you will need to
        install it first.
        
        **TODO:** move NumPy dependency here (? Dunno ...), talk about
        Linux-only platform.
        
        Then, several installation options are available: **TODO:** state
        clearly what one should do depending on the aim.
        
        -  **Easy install:** if the `pip <https://pypi.python.org/pypi/pip>`__
           package manager is available, execute the following command as root:
        
           ::
        
               $ pip install bitstream
        
           The dependencies of Bitstream will be handled automatically. If you
           don't have root privileges, use
           `virtualenv <https://pypi.python.org/pypi/virtualenv>`__.
        
        -  **Install from source:** the releases of Bitstream are available on
           the `Python Package Index
           (PyPi) <https://pypi.python.org/pypi/bitstream/>`__. Once you have
           downloaded and unpacked the archive, to build the Bitstream module,
           you need `setuptools <https://pypi.python.org/pypi/setuptools>`__.
           You also need to install the `NumPy <http://www.numpy.org>`__
           package, version 1.6.1 or later.
        
           **TODO: test if numpy is automatically download if needed**.
        
           Then, as root, execute
        
           ::
        
               $ python setup.py install
        
        -  **Hack with git:** to experiment with the latest version of
           Bitstream, clone the GitHub repository:
        
           ::
        
               $ git clone git://github.com/boisgera/bitstream.git
        
           To actually build the module, you will need everything you need to
           build from source and will execute the same command. If in addition,
           you want to edit the source files, you will also need the
           `Cython <http://www.cython.org>`__ compiler, version 0.15.1 or later
           and will execute instead:
        
           ::
        
               $ python setup.py install --with-cython
        
        Getting Started
        ===============
        
        Most of the features of bitstream are available via the ``BitStream``
        class.
        
        ::
        
            >>> from bitstream import BitStream
        
        The module is tightly integrated with the
        `NumPy <http://www.numpy.org>`__ library. For convenience, we import all
        symbols from its top-level module.
        
        ::
        
            >>> from numpy import *
        
        Overview of Bitstream Features
        ==============================
        
        ::
        
            >>> stream = BitStream()
            >>> stream
            <BLANKLINE>
            >>> stream.write(True, bool)
            >>> stream
            1
            >>> stream.write(False, bool)
            >>> stream
            10
            >>> stream.write(-128, int8)
            >>> stream
            1010000000
            >>> stream.write("AB", str)
            >>> stream
            10100000000100000101000010
            >>> stream.read(bool, 2)
            [True, False]
            >>> stream
            100000000100000101000010
            >>> stream.read(int8, 1)
            array([-128], dtype=int8)
            >>> stream
            0100000101000010
            >>> stream.read(str, 2)
            'AB'
            >>> stream
            <BLANKLINE>
        
        Built-in Readers and Writers
        ============================
        
        Bools
        -----
        
        Write single bits to a bitstream with the arguments ``True`` and
        ``False``:
        
        ::
        
            >>> stream = BitStream()
            >>> stream.write(False, bool)
            >>> stream.write(True , bool)
            >>> stream
            01
        
        Lists of booleans may be used too write multiple bits at once:
        
        ::
        
            >>> stream = BitStream()
            >>> stream.write([], bool)
            >>> stream
            <BLANKLINE>
            >>> stream.write([False], bool)
            >>> stream.write([True] , bool)
            >>> stream
            01
            >>> stream.write([False, True], bool)
            >>> stream
            0101
        
        The second argument to the ``write`` method -- the type information --
        can also be specified with the keyword argument ``type``:
        
        ::
        
            >>> stream = BitStream()
            >>> stream.write(False, type=bool)
            >>> stream.write(True , type=bool)
            >>> stream
            01
        
        For single bools or lists of bools, the type information is optional:
        
        ::
        
            >>> stream = BitStream()
            >>> stream.write(False)
            >>> stream.write(True)
            >>> stream.write([])
            >>> stream.write([False])
            >>> stream.write([True])
            >>> stream.write([False, True])
            >>> stream
            010101
        
        Numpy ``bool_`` scalars or one-dimensional arrays can be used instead:
        
        ::
        
            >>> bool_
            <type 'numpy.bool_'>
            >>> stream = BitStream()
            >>> stream.write(bool_(False)  , bool)
            >>> stream.write(bool_(True)   , bool)
            >>> stream
            01
        
            >>> stream = BitStream()
            >>> empty = array([], dtype=bool)
            >>> stream.write(empty, bool)
            >>> stream
            <BLANKLINE>
            >>> stream.write(array([False]), bool)
            >>> stream.write(array([True]) , bool)
            >>> stream.write(array([False, True]), bool)
            >>> stream
            0101
        
        For such data, the type information is also optional:
        
        ::
        
            >>> stream = BitStream()
            >>> stream.write(bool_(False))
            >>> stream.write(bool_(True))
            >>> stream.write(array([], dtype=bool))
            >>> stream.write(array([False]))
            >>> stream.write(array([True]))
            >>> stream.write(array([False, True]))
            >>> stream
            010101
        
        Python and Numpy numeric types are also valid arguments: zero is
        considered false and nonzero numbers are considered true.
        
        **Q:** Use a predicate instead (non-zero) ? and check iff ?
        
        ::
        
            >>> small_integers = range(0, 64)
            >>> stream = BitStream()
            >>> for integer in small_integers:
            ...     stream.write(integer, bool)
            >>> stream
            0111111111111111111111111111111111111111111111111111111111111111
            >>> stream = BitStream()
            >>> for integer in small_integers:
            ...     stream.write(-integer, bool)
            >>> stream
            0111111111111111111111111111111111111111111111111111111111111111
        
            >>> large_integers = [2**i for i in range(6, 64)]
            >>> stream = BitStream()
            >>> for integer in large_integers:
            ...     stream.write(integer, bool)
            >>> stream
            1111111111111111111111111111111111111111111111111111111111
            >>> stream = BitStream()
            >>> for integer in large_integers:
            ...     stream.write(-integer, bool)
            >>> stream
            1111111111111111111111111111111111111111111111111111111111
        
        **TODO:** use iinfo(type).min/max
        
        **TODO:** write ``sample(type, r)`` iterator.
        
        ::
        
            >>> def irange(start, stop, r=1.0):
            ...     i = 0
            ...     while i < stop:
            ...         yield i
            ...         i = max(i+1, int(i*r))
        
            >>> unsigned = [uint8, uint16, uint32]
            >>> for integer_type in unsigned:
            ...     _min, _max = iinfo(integer_type).min, iinfo(integer_type).max
            ...     for i in irange(_min, _max + 1, r=1.001):
            ...         integer = integer_type(i)
            ...         if integer and BitStream(integer, bool) != BitStream(True):
            ...             type_name = integer_type.__name__
            ...             print "Failure for {0}({1})".format(type_name, integer)
        
        
        
        
        
            >>> stream = BitStream()
            >>> stream.write(0.0, bool)
            >>> stream.write(1.0, bool)
            >>> stream.write(pi , bool)
            >>> stream.write(float64(0.0), bool)
            >>> stream.write(float64(1.0), bool)
            >>> stream.write(float64(pi) , bool)
            >>> stream
            011011
        
        **TODO:** arrays of numeric type (non-bools), written as bools
        
        --------------
        
        **TODO:** Mark all following behaviors as undefined ? Probably safer ...
        
        Actually, any single data written as a bool, is conceptually cast into a
        bool first, with the semantics of the ``bool`` constructor. List and
        one-dimensional numpy array arguments are considered holders of multiple
        data, each of which is converted to bool. Any other sequence type
        (strings, tuples, etc.) is considered single data.
        
        ::
        
            >>> bool("")
            False
            >>> bool(" ")
            True
            >>> bool("A")
            True
            >>> bool("AAA")
            True
        
            >>> stream = BitStream()
            >>> stream.write("", bool)
            >>> stream.write(" ", bool)
            >>> stream.write("A", bool)
            >>> stream.write("AAA", bool)
            >>> stream
            0111
            >>> stream = BitStream()
            >>> stream.write(["", " " , "A", "AAA"], bool)
            >>> stream
            0111
            >>> stream = BitStream()
            >>> stream.write(array(["", " " , "A", "AAA"]), bool)
            >>> stream
            0111
        
            >>> stream = BitStream()
            >>> stream.write(    (), bool)
            >>> stream.write(  (0,), bool)
            >>> stream.write((0, 0), bool)
            >>> stream
            011
        
            >>> stream = BitStream()
            >>> stream.write([[], [0], [0, 0]], bool)
            >>> stream
            011
        
            >>> class BoolLike(object):
            ...     def __init__(self, value):
            ...         self.value = bool(value)
            ...     def __nonzero__(self):
            ...         return self.value
            >>> false = BoolLike(False)
            >>> true = BoolLike(True)
            >>> stream = BitStream()
            >>> stream.write(false, bool)
            >>> stream.write(true, bool)
            >>> stream.write([false, true], bool)
            >>> stream
            0101
        
        TODO:
        
        -  direct call to ``write_bool`` (import the symbol first)
        -  reader tests
        
        Integers
        --------
        
        **TODO**
        
        Floating-Point Numbers
        ----------------------
        
        ::
        
            >>> import struct
            >>> struct.pack(">d", pi)
            '@\t!\xfbTD-\x18'
        
            >>> stream = BitStream()
            >>> stream.write(0.0)
            >>> stream.write([1.0, 2.0, 3.0])
            >>> stream.write(arange(4.0, 10.0))
            >>> len(stream)
            640
            >>> output = stream.read(float, 10)
            >>> type(output)
            <type 'numpy.ndarray'>
            >>> all(output == arange(10.0))
            True
        
            >>> BitStream(1.0) == BitStream(1.0, float) == BitStream(1.0, float64)
            True
            >>> BitStream(1.0) == BitStream([1.0]) == BitStream(ones(1))
            True
        
        The byte order is big endian:
        
        ::
        
            >>> BitStream(struct.pack(">d", pi)) == BitStream(pi)
            True
        
        Extra Methods
        =============
        
        **TODO:**:
        
        -  length
        
        -  str, repr
        
        -  \_extend ? Make it public ? This is low-level ... but may be
           necesssary to implement new readers/writers. Don't specify it now, as
           we don't specify the offsets / stream state, let the user only rely
           on the high-level methods.
        
        -  copy
        
        -  hash, comparison.
        
        Custom Writers and Readers
        ==========================
        
        ::
        
            >>> import bitstream
        
        Definition and Registration of Writers and Readers
        --------------------------------------------------
        
        Let's define a writer for the binary representation of natural numbers:
        
        ::
        
            >>> def write_integer(stream, data):
            ...     if isinstance(data, list):
            ...         for integer in data:
            ...             write_integer(stream, integer)
            ...     else:
            ...         integer = int(data)
            ...         if integer < 0:
            ...             error = "negative integers cannot be encoded"
            ...             raise ValueError(error)
            ...         bools = []
            ...         while integer:
            ...             bools.append(integer & 1)
            ...             integer = integer >> 1
            ...         bools.reverse()
            ...         stream.write(bools, bool)
        
        We can check that this writer behaves as expected:
        
        ::
        
            >>> stream = BitStream()
            >>> write_integer(stream, 42)
            >>> stream
            101010
            >>> write_integer(stream, [1, 2, 3])
            >>> stream
            10101011011
        
        Then, we can associate it to the type ``int``:
        
        ::
        
            >>> bitstream.register(int, writer=write_integer)
        
        After this step, ``BitStream`` will redirect all data of type ``int`` to
        this writer:
        
        ::
        
            >>> BitStream(42)
            101010
            >>> BitStream([1, 2, 3])
            11011
        
        If the type information is explicit, other kind of data can use this
        writer too:
        
        ::
        
            >>> BitStream(uint8(42), int)
            101010
            >>> BitStream("42", int)
            101010
        
        A possible implementation of the corresponding reader is given by:
        
        ::
        
            >>> def read_integer(stream, n=None):
            ...     if n is not None:
            ...         error = "unsupported argument n"
            ...         raise NotImplementedError(error)
            ...     else:
            ...         integer = 0
            ...         for _ in range(len(stream)):
            ...             integer = integer << 1
            ...             if stream.read(bool):
            ...                 integer += 1
            ...     return integer
        
            >>> read_integer(BitStream(42))
            42
        
        Once this reader is registered with
        
        ::
        
            >>> bitstream.register(int, reader=read_integer)
        
        the calls to ``read_integer`` can be made through the ``read`` method of
        ``BitStream``.
        
        ::
        
            >>> BitStream(42).read(int)
            42
        
        In all readers, the second argument of readers, named ``n``, represents
        the number of values to read from the stream. Here, this argument is not
        supported, instead any call to this reader interprets the complete
        stream content as a single value.
        
        Writer and Reader Factories
        ---------------------------
        
        We actually had a legitimate reason not to support the number of values
        argument in the binary representation reader. Indeed, when the binary
        representation is used to code sequence of integers instead of a single
        integer, it becomes ambiguous: the same bitstream may represent several
        sequences of integers. For example, we have:
        
        ::
        
            >>> BitStream(255)
            11111111
            >>> BitStream([15, 15])
            11111111
            >>> BitStream([3, 7, 3, 1])
            11111111
            >>> BitStream([3, 3, 3, 3])
            11111111
        
        We say that this code is not *self-delimiting*, as there is no way to
        know where is the boundary between the bits coding for different
        integers.
        
        For natural numbers with known bounds, we may solve this problem by
        setting a number of bits to be used for each integer. However, to do
        that, we would have to define and register a new writer for every
        possible number of bits. Instead, we register a single but configurable
        writer, defined by a writer factory.
        
        Let's define a type tag ``uint`` whose instances hold a number of bits:
        
        ::
        
            >>> class uint(object):
            ...     def __init__(self, num_bits):
            ...         self.num_bits = num_bits
        
        Then, we define a factory that given a ``uint`` instance, returns a
        stream writer:
        
        ::
        
            >>> def write_uint_factory(instance):
            ...     num_bits = instance.num_bits
            ...     def write_uint(stream, data):
            ...         if isinstance(data, list):
            ...             for integer in data:
            ...                 write_uint(stream, integer)
            ...         else:
            ...             integer = int(data)
            ...             if integer < 0:
            ...                 error = "negative integers cannot be encoded"
            ...                 raise ValueError(error)
            ...             bools = []
            ...             for _ in range(num_bits):
            ...                 bools.append(integer & 1)
            ...                 integer = integer >> 1
            ...             bools.reverse()
            ...             stream.write(bools, bool)
            ...     return write_uint
        
        Finally, we register this writer factory with ``bitstream``:
        
        ::
        
            >>> bitstream.register(uint, writer=write_uint_factory)
        
        To select a writer, we use the proper instance of type tag:
        
        ::
        
            >>> BitStream(255, uint(8))
            11111111
            >>> BitStream(255, uint(16))
            0000000011111111
            >>> BitStream(42, uint(8))
            00101010
            >>> BitStream(0, uint(16))
            0000000000000000
        
        **TODO: reader, give details, comment.**
        
        ::
        
            >>> def read_uint_factory(instance): # use the name factory ?
            ...     num_bits = instance.num_bits
            ...     def read_uint(stream, n=None):
            ...         if n is None:
            ...             integer = 0
            ...             for _ in range(num_bits):
            ...                 integer = integer << 1
            ...                 if stream.read(bool):
            ...                     integer += 1
            ...             return integer
            ...         else:
            ...             integers = [read_uint(stream) for _ in range(n)]
            ...             return integers
            ...     return read_uint
        
            >>> bitstream.register(uint, reader=read_uint_factory)
        
            >>> stream = BitStream([0, 1, 2, 3, 4], uint(8))
            >>> stream.read(uint(8))
            0
            >>> stream.read(uint(8), 1)
            [1]
            >>> stream.read(uint(8), 3)
            [2, 3, 4]
        
        Snapshots (Bitstream state)
        ===========================
        
        (random thought for the moment, fondations for the design of snapshots:)
        
        **TODO:** sort doc material vs dev comments, dispatch, polish.
        
        terms: snapshot (state ?), save, restore.
        
        **Goals:**
        
        -  for the implementation and error API point of view: give a mechanism
           to roll back all operations that may end up with a "corrupted" stream
           (stream content has changed but cannot deliver what the read asked
           for), so that the mere error scheme that we have now on reader can be
           upgraded to a real exception handling mechansim: if a read fails, the
           stream state hasn't changed.
        
        -  offer the user with read-only and on-demand roll-back features.
        
        Add ``save`` (returns a (read\_offset, write\_offset) state) and
        ``restore`` (with state as an argument) or ``load`` ? We leverage the
        fact in our stream model, the data is not immutable, but no information
        is lost, only added at the end, so we may always roll back if we need
        too.
        
        These two methods shall enable a true exception management (not mere
        errors, when shit happens, we still have a usable state), AND at the
        same time, read-only streams. Maybe higher-level constructs (with
        context manager ?) could be useful here to exploit those two schemes.
        
        UPDATE: if we want the save / restor NOT TO CRASH, we have to ensure of
        two things
        
        -  first that the state stores the id of the stream ... you can't
           restore a state that was not created by you.
        
        -  secondly, as restore + write break the immutability of the stream,
           save/restore pairs should only be applied in reverse order, with
           possible drops in the restore. That should be check by the stream.
           What I mean is that save 1, save 2, restore 2, restore 1 os OK, S1,
           S2, R1 is ok, but S1, S2, R1, S2 is not.
        
        Design: ``State`` class with ref to the stream attribute,
        ``read_offset``, ``write_offset``, implements the comparison (?). Not
        that simple. The idea behind the comparison is that you should always be
        able to restore an OLDER snapshot but actually if you think of it,
        that's older in the story of emission of snapshots. So you also have to
        embed a snapshot number and base your comparison on that. As a
        consequence, bitstream instances have nothing to store but a snapshot
        number (the number of the snapshot that was emitted, or 0 if no snapshot
        was). No, this is more complex, requires some thinking. Need to track
        all restorable states in the stream ? Maybe ...
        
        **TODO.** basic doctest.
        
        ::
        
            >>> stream = BitStream()
            >>> s0 = stream.save()
            >>> stream.write("A")
            >>> s1 = stream.save()
            >>> stream.write("B")
            >>> s2 = stream.save()
            >>> stream.restore(s1)
            >>> stream == BitStream("A")
            True
            >>> stream.restore(s2) # doctest: +ELLIPSIS
            Traceback (most recent call last):
            ...
            ValueError: ...
            >>> stream.write("C")
            >>> stream == BitStream("AC")
            True
            >>> s3 = stream.save()
            >>> stream.restore(s1)
            >>> stream == BitStream("A")
            True
            >>> stream.restore(s0)
            >>> stream == BitStream("")
            True
        
        Most useful patterns:
        
        **Avoid copies.** Do read/write stuff on a stream and when you're done,
        restore the original stream intact. Here the snapshot approach avoids a
        copy of the bitstream. The pattern is a ``try/finally`` with a snapshot
        restore in the finally clause.
        
        ::
        
            >>> stream = BitStream("ABC")
            >>> snapshot = stream.save()
            >>> try:
            ...     # turn "ABC" into "BCD"
            ...     _ = stream.read(str, 1)
            ...     stream.write("D")
            ... finally:
            ...     stream.restore(snapshot)
            >>> stream == BitStream("ABC")
            True
        
        If an exception can be raised during the read/write, the stream is still
        restored in the original state.
        
        ::
        
            >>> from bitstream import ReadError
            >>> stream = BitStream("ABC")
            >>> snapshot = stream.save()
            >>> try:
            ...     # read too much data
            ...     _ = stream.read(str, 4)
            ... except ReadError:
            ...     pass
            ... finally:
            ...     stream.restore(snapshot)
            >>> stream == BitStream("ABC")
            True
        
        Remark: the pattern breaks if during the actions, an earlier snapshot is
        restored.
        
        **Support true exceptions in readers.** Som reading actions may fail,
        but you are not able to tell beforehand, you have to start a sequence of
        smaller reads before you know of the big read call is going to work. A
        reader with a proper exception support will restore the orginal state of
        the stream before raising the exception if something goes wrong.
        Typically, that means a reader code with the structure:
        
        ::
        
            >>> def reader(stream, n=None):
            ...     snapshot = stream.save()
            ...     try:
            ...         pass # do what you have to do.
            ...     except ReadError: 
            ...         stream.restore(snapshot)
            ...         raise
        
        Make convenience functions (with context managers) for those use cases ?
        For the "light-weight copy" that would be easy (under what name ?) but
        for the reader, that's not obvious, the reader developer may be willing
        to analyze the error and customize the error message before a re-raise
        ...
        
        Unit Tests
        ==========
        
        The text version of the document you are reading is also an executable
        specification. Check that the code examples produce the expected results
        with
        
        ::
        
            $ python -m doctest -v manual.txt
        
        Examples
        ========
        
        Unary coder / Rice coder ? Huffman tree/table coder ?
        
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Cython
