% Bitstream 1.0.0-alpha.1 Manual
% Sébastien Boisgérault
% Sat, 04 May 2013

**About this document:**

This document is available under a [Creative Commons Attribution 3.0][CC-BY-3.0]
license. 

Use [`pandoc`][pandoc] to convert this text file to other formats, 
such as PDF or HTML. 
This document is also an executable specification: you can check that its code 
examples produce the expected results with:

    $ python -m doctest -v spec.txt

Send bug reports and requests to <Sebastien.Boisgerault@mines-paristech.fr>.

**TODO: ref to the github repo. instead. Still find a way to give my contact.**

[CC-BY-3.0]: http://creativecommons.org/licenses/by/3.0/
[markdown]: http://daringfireball.net/projects/markdown/
[pandoc]: http://johnmacfarlane.net/pandoc/


About Bitstream
================================================================================

Bitstream provides a binary data structure with a stream API for [Python 2.7][Python].

It is tightly integrated with the [NumPy][NumPy] library with readers 
and writers for its integer and floating-point number types (*but not all* !
Refer to NumPy *most common types* ?). 
Vectorization with NumPy arrays (*explain*).

User-defined readers and writers ... (and family / factory).

Implemented with [Cython][Cython], it should be fast enough for your use case.

[Python]: http://www.python.org/
[Cython]: http://www.cython.org

**TODO: describe key features: binary data structure, stream-like API, 
integration with NumPy arrays, integers and float data, user-defined 
readers/writers (factories) scheme, etc.**

Installation
================================================================================

**TODO: install with pip, reference to PyPi.**

**TODO: get from github, install with setup.py (dependencies : cython. 
Version ?)**

Getting Started
================================================================================

...

Ante
================================================================================  
    
Most of the features of bitstream are available via the `BitStream` class.

    >>> from bitstream import BitStream

The module is tightly integrated with the [NumPy][NumPy] library. 
For convenience, we import all symbols from its top-level module.

    >>> from numpy import *

[NumPy]: http://www.numpy.org/


Quick overview of `bitstream` features
================================================================================ 

    >>> stream = BitStream()
    >>> stream
    <BLANKLINE>
    >>> stream.write(True, bool)
    >>> stream
    1
    >>> stream.write(False, bool)
    >>> stream
    10
    >>> stream.write(-128, int8)
    >>> stream
    1010000000
    >>> stream.write("AB", str)
    >>> stream
    10100000000100000101000010
    >>> stream.read(bool, 2)
    [True, False]
    >>> stream
    100000000100000101000010
    >>> stream.read(int8, 1)
    array([-128], dtype=int8)
    >>> stream
    0100000101000010
    >>> stream.read(str, 2)
    'AB'
    >>> stream
    <BLANKLINE>


Readers and Writers
================================================================================

Bools
--------------------------------------------------------------------------------

Write single bits to a bitstream with the arguments `True` and `False`:

    >>> stream = BitStream()
    >>> stream.write(False, bool)
    >>> stream.write(True , bool)
    >>> stream
    01

Lists of booleans may be used too write multiple bits at once:

    >>> stream = BitStream()
    >>> stream.write([], bool)
    >>> stream
    <BLANKLINE>
    >>> stream.write([False], bool)
    >>> stream.write([True] , bool)
    >>> stream
    01
    >>> stream.write([False, True], bool)
    >>> stream
    0101

The second argument to the `write` method -- the type information -- can 
also be specified with the keyword argument `type`:

    >>> stream = BitStream()
    >>> stream.write(False, type=bool)
    >>> stream.write(True , type=bool)
    >>> stream
    01

For single bools or lists of bools, the type information is optional:

    >>> stream = BitStream()
    >>> stream.write(False)
    >>> stream.write(True)
    >>> stream.write([])
    >>> stream.write([False])
    >>> stream.write([True])
    >>> stream.write([False, True])
    >>> stream
    010101

Numpy `bool_` scalars or one-dimensional arrays can be used instead:

    >>> bool_
    <type 'numpy.bool_'>
    >>> stream = BitStream()
    >>> stream.write(bool_(False)  , bool)
    >>> stream.write(bool_(True)   , bool)
    >>> stream
    01

    >>> stream = BitStream()
    >>> empty = array([], dtype=bool)
    >>> stream.write(empty, bool)
    >>> stream
    <BLANKLINE>
    >>> stream.write(array([False]), bool)
    >>> stream.write(array([True]) , bool)
    >>> stream.write(array([False, True]), bool)
    >>> stream
    0101

For such data, the type information is also optional:

    >>> stream = BitStream()
    >>> stream.write(bool_(False))
    >>> stream.write(bool_(True))
    >>> stream.write(array([], dtype=bool))
    >>> stream.write(array([False]))
    >>> stream.write(array([True]))
    >>> stream.write(array([False, True]))
    >>> stream
    010101

Python and Numpy numeric types are also valid arguments: 
zero is considered false and nonzero numbers are considered true.


**Q:** Use a predicate instead (non-zero) ? and check iff ?

    >>> small_integers = range(0, 64)
    >>> stream = BitStream()
    >>> for integer in small_integers:
    ...     stream.write(integer, bool)
    >>> stream
    0111111111111111111111111111111111111111111111111111111111111111
    >>> stream = BitStream()
    >>> for integer in small_integers:
    ...     stream.write(-integer, bool)
    >>> stream
    0111111111111111111111111111111111111111111111111111111111111111

    >>> large_integers = [2**i for i in range(6, 64)]
    >>> stream = BitStream()
    >>> for integer in large_integers:
    ...     stream.write(integer, bool)
    >>> stream
    1111111111111111111111111111111111111111111111111111111111
    >>> stream = BitStream()
    >>> for integer in large_integers:
    ...     stream.write(-integer, bool)
    >>> stream
    1111111111111111111111111111111111111111111111111111111111

**TODO:** use iinfo(type).min/max

**TODO:** write `sample(type, r)` iterator.

    >>> def irange(start, stop, r=1.0):
    ...     i = 0
    ...     while i < stop:
    ...         yield i
    ...         i = max(i+1, int(i*r))

    >>> unsigned = [uint8, uint16, uint32]
    >>> for integer_type in unsigned:
    ...     _min, _max = iinfo(integer_type).min, iinfo(integer_type).max
    ...     for i in irange(_min, _max + 1, r=1.001):
    ...         integer = integer_type(i)
    ...         if integer and BitStream(integer, bool) != BitStream(True):
    ...             type_name = integer_type.__name__
    ...             print "Failure for {0}({1})".format(type_name, integer)





    >>> stream = BitStream()
    >>> stream.write(0.0, bool)
    >>> stream.write(1.0, bool)
    >>> stream.write(pi , bool)
    >>> stream.write(float64(0.0), bool)
    >>> stream.write(float64(1.0), bool)
    >>> stream.write(float64(pi) , bool)
    >>> stream
    011011

**TODO:** arrays of numeric type (non-bools), written as bools

-----

**TODO:** Mark all following behaviors as undefined ? Probably safer ...

Actually, any single data written as a bool, is conceptually cast into a bool 
first, with the semantics of the `bool` constructor.
List and one-dimensional numpy array arguments are considered holders of 
multiple data, each of which is converted to bool.
Any other sequence type (strings, tuples, etc.) is considered single data.

    >>> bool("")
    False
    >>> bool(" ")
    True
    >>> bool("A")
    True
    >>> bool("AAA")
    True

    >>> stream = BitStream()
    >>> stream.write("", bool)
    >>> stream.write(" ", bool)
    >>> stream.write("A", bool)
    >>> stream.write("AAA", bool)
    >>> stream
    0111
    >>> stream = BitStream()
    >>> stream.write(["", " " , "A", "AAA"], bool)
    >>> stream
    0111
    >>> stream = BitStream()
    >>> stream.write(array(["", " " , "A", "AAA"]), bool)
    >>> stream
    0111

    >>> stream = BitStream()
    >>> stream.write(    (), bool)
    >>> stream.write(  (0,), bool)
    >>> stream.write((0, 0), bool)
    >>> stream
    011

    >>> stream = BitStream()
    >>> stream.write([[], [0], [0, 0]], bool)
    >>> stream
    011

    >>> class BoolLike(object):
    ...     def __init__(self, value):
    ...         self.value = bool(value)
    ...     def __nonzero__(self):
    ...         return self.value
    >>> false = BoolLike(False)
    >>> true = BoolLike(True)
    >>> stream = BitStream()
    >>> stream.write(false, bool)
    >>> stream.write(true, bool)
    >>> stream.write([false, true], bool)
    >>> stream
    0101


TODO: 

  - direct call to `write_bool` (import the symbol first)
  - reader tests


Floating-Point Numbers
--------------------------------------------------------------------------------

    >>> import struct
    >>> struct.pack(">d", pi)
    '@\t!\xfbTD-\x18'

    >>> stream = BitStream()
    >>> stream.write(0.0)
    >>> stream.write([1.0, 2.0, 3.0])
    >>> stream.write(arange(4.0, 10.0))
    >>> len(stream)
    640
    >>> output = stream.read(float, 10)
    >>> type(output)
    <type 'numpy.ndarray'>
    >>> all(output == arange(10.0))
    True

    >>> BitStream(1.0) == BitStream(1.0, float) == BitStream(1.0, float64)
    True
    >>> BitStream(1.0) == BitStream([1.0]) == BitStream(ones(1))
    True

The byte order is big endian:
    
    >>> BitStream(struct.pack(">d", pi)) == BitStream(pi)
    True

Extra `BitStream` Methods
================================================================================


**TODO:**:

  - length

  - str, repr

  - _extend ? Make it public ? This is low-level ... but may be necesssary to
    implement new readers/writers. Don't specify it now, as we don't specify
    the offsets / stream state, let the user only rely on the high-level 
    methods.

  - copy

  - hash, comparison.

User-Defined Readers and Writers
================================================================================

    >>> import bitstream

Definition and Registration of Writers and Readers
--------------------------------------------------------------------------------

Let's define a writer for the binary representation of natural numbers:

    >>> def write_integer(stream, data):
    ...     if isinstance(data, list):
    ...         for integer in data:
    ...             write_integer(stream, integer)
    ...     else:
    ...         integer = int(data)
    ...         if integer < 0:
    ...             error = "negative integers cannot be encoded"
    ...             raise ValueError(error)
    ...         bools = []
    ...         while integer:
    ...             bools.append(integer & 1)
    ...             integer = integer >> 1
    ...         bools.reverse()
    ...         stream.write(bools, bool)

We can check that this writer behaves as expected:

    >>> stream = BitStream()
    >>> write_integer(stream, 42)
    >>> stream
    101010
    >>> write_integer(stream, [1, 2, 3])
    >>> stream
    10101011011

Then, we can associate it to the type `int`:

    >>> bitstream.register(int, writer=write_integer)

After this step, `BitStream` will redirect all data of type `int` to this writer:

    >>> BitStream(42)
    101010
    >>> BitStream([1, 2, 3])
    11011

If the type information is explicit, other kind of data can use this writer too:

    >>> BitStream(uint8(42), int)
    101010
    >>> BitStream("42", int)
    101010

A possible implementation of the corresponding reader is given by:

    >>> def read_integer(stream, n=None):
    ...     if n is not None:
    ...         error = "unsupported argument n"
    ...         raise NotImplementedError(error)
    ...     else:
    ...         integer = 0
    ...         for _ in range(len(stream)):
    ...             integer = integer << 1
    ...             if stream.read(bool):
    ...                 integer += 1
    ...     return integer

    >>> read_integer(BitStream(42))
    42

Once this reader is registered with

    >>> bitstream.register(int, reader=read_integer)

the calls to `read_integer` can be made through the `read` method of `BitStream`.

    >>> BitStream(42).read(int)
    42

In all readers, the second argument of readers, named `n`, 
represents the number of values to read from the stream. 
Here, this argument is not supported, instead any call to this reader 
interprets the complete stream content as a single value.

Writer and Reader Factories
--------------------------------------------------------------------------------

We actually had a legitimate reason not to support the number of values argument 
in the binary representation reader. Indeed, when the binary representation 
is used to code sequence of integers instead of a single integer, it becomes 
ambiguous: the same bitstream may represent several sequences of integers. 
For example, we have:

    >>> BitStream(255)
    11111111
    >>> BitStream([15, 15])
    11111111
    >>> BitStream([3, 7, 3, 1])
    11111111
    >>> BitStream([3, 3, 3, 3])
    11111111

We sometimes say that this code is not *self-delimiting*, because there is 
no way to know the boundary between the bits coding different integers. 

For natural numbers with known bounds, we may solve this problem by setting
a number of bits to be used for each integer. However, to do that, we
would have to define and register a new writer for every possible number
of bits. Instead, we register a single but configurable writer, defined
by a writer factory.

Let's define a type tag `uint` whose instances hold a number of bits:

    >>> class uint(object):
    ...     def __init__(self, num_bits):
    ...         self.num_bits = num_bits

Then, we define a factory that given a `uint` instance, 
returns a stream writer:

    >>> def write_uint_factory(instance):
    ...     num_bits = instance.num_bits
    ...     def write_uint(stream, data):
    ...         if isinstance(data, list):
    ...             for integer in data:
    ...                 write_uint(stream, integer)
    ...         else:
    ...             integer = int(data)
    ...             if integer < 0:
    ...                 error = "negative integers cannot be encoded"
    ...                 raise ValueError(error)
    ...             bools = []
    ...             for _ in range(num_bits):
    ...                 bools.append(integer & 1)
    ...                 integer = integer >> 1
    ...             bools.reverse()
    ...             stream.write(bools, bool)
    ...     return write_uint

Finally, we register this writer factory with `bitstream`:

    >>> bitstream.register(uint, writer=write_uint_factory)

To select a writer, we use the proper instance of type tag:

    >>> BitStream(255, uint(8))
    11111111
    >>> BitStream(255, uint(16))
    0000000011111111
    >>> BitStream(42, uint(8))
    00101010
    >>> BitStream(0, uint(16))
    0000000000000000


**TODO: reader, give details, comment.**

    >>> def read_uint_factory(instance): # use the name factory ?
    ...     num_bits = instance.num_bits
    ...     def read_uint(stream, n=None):
    ...         if n is None:
    ...             integer = 0
    ...             for _ in range(num_bits):
    ...                 integer = integer << 1
    ...                 if stream.read(bool):
    ...                     integer += 1
    ...             return integer
    ...         else:
    ...             integers = [read_uint(stream) for _ in range(n)]
    ...             return integers
    ...     return read_uint

    >>> bitstream.register(uint, reader=read_uint_factory)

    >>> stream = BitStream([0, 1, 2, 3, 4], uint(8))
    >>> stream.read(uint(8))
    0
    >>> stream.read(uint(8), 1)
    [1]
    >>> stream.read(uint(8), 3)
    [2, 3, 4]

