% Bitstream 1.0.0-alpha.4 Manual
% Sébastien Boisgérault
% Sat, 04 May 2013

This document is available under a [Creative Commons Attribution 3.0][CC-BY-3.0]
license. 

In its original text form, this manual is also an executable specification. 
Check that the code examples produce the expected results with

    $ python -m doctest -v spec.txt

Use [`pandoc`][pandoc] to convert it to other formats, such as ReST, PDF or HTML.

Follow Bitstream developments on [GitHub][] and [PyPI][].


[CC-BY-3.0]: http://creativecommons.org/licenses/by/3.0/
[markdown]: http://daringfireball.net/projects/markdown/
[pandoc]: http://johnmacfarlane.net/pandoc/
[GitHub]: https://github.com/boisgera/bitstream
[PyPI]: https://pypi.python.org/pypi/bitstream

-----------

What is Bitstream ?
================================================================================

Bitstream provides a binary data structure with a stream API for [Python][Python].

It is tightly integrated with the [NumPy][NumPy] library with readers 
and writers for its integer and floating-point number types (*but not all* !
Refer to NumPy *most common types* ?). 
Vectorization with NumPy arrays (*explain*).

User-defined readers and writers ... (and family / factory).

Implemented with [Cython][Cython], it should be fast enough for your use case.

[Python]: http://www.python.org/
[Cython]: http://www.cython.org

**TODO: describe key features: binary data structure, stream-like API, 
integration with NumPy arrays, integers and float data, user-defined 
readers/writers (factories) scheme, etc.**

Installation
================================================================================

Bitstream targets [Python 2.7][], you will need to install it first.

Then, several installation options are available:

  - **Easy install:** if the [pip][] package manager is available, execute the 
    following command as root:

        $ pip install bitstream

    The dependencies of Bitstream will be handled automatically.
    If you don't have root privileges, use [virtualenv][].

  - **Install from source:** the releases of Bitstream are available
    on the [Python Package Index (PyPi)][PyPi]. Once you have 
    downloaded and unpacked the archive, to build the Bitstream module, 
    you need [setuptools][] (or its fork [distribute][]).
    You also need to install the [NumPy][] package, version 1.6.1 or later. 
    Then, as root, execute

        $ python setup.py install

  - **Hack with git:** to experiment with the latest version of Bitstream, 
    clone the GitHub repository:

        $ git clone git://github.com/boisgera/bitstream.git

    To actually build the module, you will need everything you need to build
    from source and will execute the same command. If in addition, you want
    to edit the source files, you will also need the [Cython][] compiler, 
    version 0.15.1 or later and will execute instead:

        $ python setup.py install --with-cython


[Python 2.7]: http://www.python.org/download/releases/2.7
[Cython]: http://www.cython.org
[pip]: https://pypi.python.org/pypi/pip
[virtualenv]: https://pypi.python.org/pypi/virtualenv
[PyPi]: https://pypi.python.org/pypi/bitstream/
[setuptools]: https://pypi.python.org/pypi/setuptools
[distribute]: http://pythonhosted.org/distribute/
[NumPy]: http://www.numpy.org

Getting Started
================================================================================

...

Ante
================================================================================  
    
Most of the features of bitstream are available via the `BitStream` class.

    >>> from bitstream import BitStream

The module is tightly integrated with the [NumPy][NumPy] library. 
For convenience, we import all symbols from its top-level module.

    >>> from numpy import *

[NumPy]: http://www.numpy.org/


Overview of Bitstream Features
================================================================================ 

    >>> stream = BitStream()
    >>> stream
    <BLANKLINE>
    >>> stream.write(True, bool)
    >>> stream
    1
    >>> stream.write(False, bool)
    >>> stream
    10
    >>> stream.write(-128, int8)
    >>> stream
    1010000000
    >>> stream.write("AB", str)
    >>> stream
    10100000000100000101000010
    >>> stream.read(bool, 2)
    [True, False]
    >>> stream
    100000000100000101000010
    >>> stream.read(int8, 1)
    array([-128], dtype=int8)
    >>> stream
    0100000101000010
    >>> stream.read(str, 2)
    'AB'
    >>> stream
    <BLANKLINE>


Built-in Readers and Writers
================================================================================

Bools
--------------------------------------------------------------------------------

Write single bits to a bitstream with the arguments `True` and `False`:

    >>> stream = BitStream()
    >>> stream.write(False, bool)
    >>> stream.write(True , bool)
    >>> stream
    01

Lists of booleans may be used too write multiple bits at once:

    >>> stream = BitStream()
    >>> stream.write([], bool)
    >>> stream
    <BLANKLINE>
    >>> stream.write([False], bool)
    >>> stream.write([True] , bool)
    >>> stream
    01
    >>> stream.write([False, True], bool)
    >>> stream
    0101

The second argument to the `write` method -- the type information -- can 
also be specified with the keyword argument `type`:

    >>> stream = BitStream()
    >>> stream.write(False, type=bool)
    >>> stream.write(True , type=bool)
    >>> stream
    01

For single bools or lists of bools, the type information is optional:

    >>> stream = BitStream()
    >>> stream.write(False)
    >>> stream.write(True)
    >>> stream.write([])
    >>> stream.write([False])
    >>> stream.write([True])
    >>> stream.write([False, True])
    >>> stream
    010101

Numpy `bool_` scalars or one-dimensional arrays can be used instead:

    >>> bool_
    <type 'numpy.bool_'>
    >>> stream = BitStream()
    >>> stream.write(bool_(False)  , bool)
    >>> stream.write(bool_(True)   , bool)
    >>> stream
    01

    >>> stream = BitStream()
    >>> empty = array([], dtype=bool)
    >>> stream.write(empty, bool)
    >>> stream
    <BLANKLINE>
    >>> stream.write(array([False]), bool)
    >>> stream.write(array([True]) , bool)
    >>> stream.write(array([False, True]), bool)
    >>> stream
    0101

For such data, the type information is also optional:

    >>> stream = BitStream()
    >>> stream.write(bool_(False))
    >>> stream.write(bool_(True))
    >>> stream.write(array([], dtype=bool))
    >>> stream.write(array([False]))
    >>> stream.write(array([True]))
    >>> stream.write(array([False, True]))
    >>> stream
    010101

Python and Numpy numeric types are also valid arguments: 
zero is considered false and nonzero numbers are considered true.


**Q:** Use a predicate instead (non-zero) ? and check iff ?

    >>> small_integers = range(0, 64)
    >>> stream = BitStream()
    >>> for integer in small_integers:
    ...     stream.write(integer, bool)
    >>> stream
    0111111111111111111111111111111111111111111111111111111111111111
    >>> stream = BitStream()
    >>> for integer in small_integers:
    ...     stream.write(-integer, bool)
    >>> stream
    0111111111111111111111111111111111111111111111111111111111111111

    >>> large_integers = [2**i for i in range(6, 64)]
    >>> stream = BitStream()
    >>> for integer in large_integers:
    ...     stream.write(integer, bool)
    >>> stream
    1111111111111111111111111111111111111111111111111111111111
    >>> stream = BitStream()
    >>> for integer in large_integers:
    ...     stream.write(-integer, bool)
    >>> stream
    1111111111111111111111111111111111111111111111111111111111

**TODO:** use iinfo(type).min/max

**TODO:** write `sample(type, r)` iterator.

    >>> def irange(start, stop, r=1.0):
    ...     i = 0
    ...     while i < stop:
    ...         yield i
    ...         i = max(i+1, int(i*r))

    >>> unsigned = [uint8, uint16, uint32]
    >>> for integer_type in unsigned:
    ...     _min, _max = iinfo(integer_type).min, iinfo(integer_type).max
    ...     for i in irange(_min, _max + 1, r=1.001):
    ...         integer = integer_type(i)
    ...         if integer and BitStream(integer, bool) != BitStream(True):
    ...             type_name = integer_type.__name__
    ...             print "Failure for {0}({1})".format(type_name, integer)





    >>> stream = BitStream()
    >>> stream.write(0.0, bool)
    >>> stream.write(1.0, bool)
    >>> stream.write(pi , bool)
    >>> stream.write(float64(0.0), bool)
    >>> stream.write(float64(1.0), bool)
    >>> stream.write(float64(pi) , bool)
    >>> stream
    011011

**TODO:** arrays of numeric type (non-bools), written as bools

-----

**TODO:** Mark all following behaviors as undefined ? Probably safer ...

Actually, any single data written as a bool, is conceptually cast into a bool 
first, with the semantics of the `bool` constructor.
List and one-dimensional numpy array arguments are considered holders of 
multiple data, each of which is converted to bool.
Any other sequence type (strings, tuples, etc.) is considered single data.

    >>> bool("")
    False
    >>> bool(" ")
    True
    >>> bool("A")
    True
    >>> bool("AAA")
    True

    >>> stream = BitStream()
    >>> stream.write("", bool)
    >>> stream.write(" ", bool)
    >>> stream.write("A", bool)
    >>> stream.write("AAA", bool)
    >>> stream
    0111
    >>> stream = BitStream()
    >>> stream.write(["", " " , "A", "AAA"], bool)
    >>> stream
    0111
    >>> stream = BitStream()
    >>> stream.write(array(["", " " , "A", "AAA"]), bool)
    >>> stream
    0111

    >>> stream = BitStream()
    >>> stream.write(    (), bool)
    >>> stream.write(  (0,), bool)
    >>> stream.write((0, 0), bool)
    >>> stream
    011

    >>> stream = BitStream()
    >>> stream.write([[], [0], [0, 0]], bool)
    >>> stream
    011

    >>> class BoolLike(object):
    ...     def __init__(self, value):
    ...         self.value = bool(value)
    ...     def __nonzero__(self):
    ...         return self.value
    >>> false = BoolLike(False)
    >>> true = BoolLike(True)
    >>> stream = BitStream()
    >>> stream.write(false, bool)
    >>> stream.write(true, bool)
    >>> stream.write([false, true], bool)
    >>> stream
    0101


TODO: 

  - direct call to `write_bool` (import the symbol first)
  - reader tests


Integers
--------------------------------------------------------------------------------

**TODO**


Floating-Point Numbers
--------------------------------------------------------------------------------

    >>> import struct
    >>> struct.pack(">d", pi)
    '@\t!\xfbTD-\x18'

    >>> stream = BitStream()
    >>> stream.write(0.0)
    >>> stream.write([1.0, 2.0, 3.0])
    >>> stream.write(arange(4.0, 10.0))
    >>> len(stream)
    640
    >>> output = stream.read(float, 10)
    >>> type(output)
    <type 'numpy.ndarray'>
    >>> all(output == arange(10.0))
    True

    >>> BitStream(1.0) == BitStream(1.0, float) == BitStream(1.0, float64)
    True
    >>> BitStream(1.0) == BitStream([1.0]) == BitStream(ones(1))
    True

The byte order is big endian:
    
    >>> BitStream(struct.pack(">d", pi)) == BitStream(pi)
    True

Extra Methods
================================================================================


**TODO:**:

  - length

  - str, repr

  - _extend ? Make it public ? This is low-level ... but may be necesssary to
    implement new readers/writers. Don't specify it now, as we don't specify
    the offsets / stream state, let the user only rely on the high-level 
    methods.

  - copy

  - hash, comparison.

Custom Writers and Readers
================================================================================

    >>> import bitstream

Definition and Registration of Writers and Readers
--------------------------------------------------------------------------------

Let's define a writer for the binary representation of natural numbers:

    >>> def write_integer(stream, data):
    ...     if isinstance(data, list):
    ...         for integer in data:
    ...             write_integer(stream, integer)
    ...     else:
    ...         integer = int(data)
    ...         if integer < 0:
    ...             error = "negative integers cannot be encoded"
    ...             raise ValueError(error)
    ...         bools = []
    ...         while integer:
    ...             bools.append(integer & 1)
    ...             integer = integer >> 1
    ...         bools.reverse()
    ...         stream.write(bools, bool)

We can check that this writer behaves as expected:

    >>> stream = BitStream()
    >>> write_integer(stream, 42)
    >>> stream
    101010
    >>> write_integer(stream, [1, 2, 3])
    >>> stream
    10101011011

Then, we can associate it to the type `int`:

    >>> bitstream.register(int, writer=write_integer)

After this step, `BitStream` will redirect all data of type `int` to this writer:

    >>> BitStream(42)
    101010
    >>> BitStream([1, 2, 3])
    11011

If the type information is explicit, other kind of data can use this writer too:

    >>> BitStream(uint8(42), int)
    101010
    >>> BitStream("42", int)
    101010

A possible implementation of the corresponding reader is given by:

    >>> def read_integer(stream, n=None):
    ...     if n is not None:
    ...         error = "unsupported argument n"
    ...         raise NotImplementedError(error)
    ...     else:
    ...         integer = 0
    ...         for _ in range(len(stream)):
    ...             integer = integer << 1
    ...             if stream.read(bool):
    ...                 integer += 1
    ...     return integer

    >>> read_integer(BitStream(42))
    42

Once this reader is registered with

    >>> bitstream.register(int, reader=read_integer)

the calls to `read_integer` can be made through the `read` method of `BitStream`.

    >>> BitStream(42).read(int)
    42

In all readers, the second argument of readers, named `n`, 
represents the number of values to read from the stream. 
Here, this argument is not supported, instead any call to this reader 
interprets the complete stream content as a single value.

Writer and Reader Factories
--------------------------------------------------------------------------------

We actually had a legitimate reason not to support the number of values argument 
in the binary representation reader. Indeed, when the binary representation 
is used to code sequence of integers instead of a single integer, it becomes 
ambiguous: the same bitstream may represent several sequences of integers. 
For example, we have:

    >>> BitStream(255)
    11111111
    >>> BitStream([15, 15])
    11111111
    >>> BitStream([3, 7, 3, 1])
    11111111
    >>> BitStream([3, 3, 3, 3])
    11111111

We sometimes say that this code is not *self-delimiting*, because there is 
no way to know the boundary between the bits coding different integers. 

For natural numbers with known bounds, we may solve this problem by setting
a number of bits to be used for each integer. However, to do that, we
would have to define and register a new writer for every possible number
of bits. Instead, we register a single but configurable writer, defined
by a writer factory.

Let's define a type tag `uint` whose instances hold a number of bits:

    >>> class uint(object):
    ...     def __init__(self, num_bits):
    ...         self.num_bits = num_bits

Then, we define a factory that given a `uint` instance, 
returns a stream writer:

    >>> def write_uint_factory(instance):
    ...     num_bits = instance.num_bits
    ...     def write_uint(stream, data):
    ...         if isinstance(data, list):
    ...             for integer in data:
    ...                 write_uint(stream, integer)
    ...         else:
    ...             integer = int(data)
    ...             if integer < 0:
    ...                 error = "negative integers cannot be encoded"
    ...                 raise ValueError(error)
    ...             bools = []
    ...             for _ in range(num_bits):
    ...                 bools.append(integer & 1)
    ...                 integer = integer >> 1
    ...             bools.reverse()
    ...             stream.write(bools, bool)
    ...     return write_uint

Finally, we register this writer factory with `bitstream`:

    >>> bitstream.register(uint, writer=write_uint_factory)

To select a writer, we use the proper instance of type tag:

    >>> BitStream(255, uint(8))
    11111111
    >>> BitStream(255, uint(16))
    0000000011111111
    >>> BitStream(42, uint(8))
    00101010
    >>> BitStream(0, uint(16))
    0000000000000000


**TODO: reader, give details, comment.**

    >>> def read_uint_factory(instance): # use the name factory ?
    ...     num_bits = instance.num_bits
    ...     def read_uint(stream, n=None):
    ...         if n is None:
    ...             integer = 0
    ...             for _ in range(num_bits):
    ...                 integer = integer << 1
    ...                 if stream.read(bool):
    ...                     integer += 1
    ...             return integer
    ...         else:
    ...             integers = [read_uint(stream) for _ in range(n)]
    ...             return integers
    ...     return read_uint

    >>> bitstream.register(uint, reader=read_uint_factory)

    >>> stream = BitStream([0, 1, 2, 3, 4], uint(8))
    >>> stream.read(uint(8))
    0
    >>> stream.read(uint(8), 1)
    [1]
    >>> stream.read(uint(8), 3)
    [2, 3, 4]

Examples
================================================================================

Unary coder / Rice coder ? Huffman tree/table coder ?



