Notes
=====

Most users of the database will want to get acquainted with the
information in this section, especially before deployment.

Configuration
-------------

The included database implementation writes transactions to a single
file. Multiple processes may connect to the same file and share the
same database. No further configuration is required; the database uses
native file-locking to ensure exclusive write-access.
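
This section does not show how Dobbin implements locking. As a
minimal sketch of the general technique, assuming a POSIX system, the
following appends records to a single file under an exclusive advisory
lock. The function name ``append_transaction`` and the file name
``demo.db`` are hypothetical and are not part of Dobbin's API.

.. code-block:: python

    import fcntl  # POSIX only; not available on Windows
    import os
    import tempfile


    def append_transaction(path, payload):
        """Append one record to ``path`` under an exclusive advisory lock."""
        with open(path, "ab") as f:
            # Blocks until no other cooperating process holds the lock.
            fcntl.flock(f, fcntl.LOCK_EX)
            try:
                f.write(payload)
                f.flush()
                os.fsync(f.fileno())  # push the record to disk before unlocking
            finally:
                fcntl.flock(f, fcntl.LOCK_UN)


    # Hypothetical usage: two writes to the same file, one after the other.
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    append_transaction(path, b"tx1\n")
    append_transaction(path, b"tx2\n")
    with open(path, "rb") as f:
        records = f.read()

Advisory locks only protect against other processes that take the same
lock, which matches the case described above where every writer goes
through the database.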

.. warning:: To avoid memory thrashing, limit the physical memory allowance of your Python processes and make sure there is enough virtual memory available (at least the size of your database) [#]_.

You may want to compile Python with the ``--without-pymalloc`` flag to
use native memory allocation. This may improve performance in
applications that connect to large databases due to better paging.

.. [#] On UNIX the ``ulimit`` command can be used to limit physical memory
   usage; this prevents thrashing when working with large databases.

Motivation
----------

There are other object databases available for Python, most
importantly the ZODB from Zope Corporation (available under the
BSD-like ZPL license).

Notable differences:

- Dobbin is pure Python
- 1/20 the size of the codebase
- Less overhead

The assumptions that Dobbin makes lead to a simple design; the ZODB
makes the opposite assumptions and is correspondingly more complex.
Which design is more reasonable depends on whether these assumptions
hold for your application.

Architecture
------------

Dobbin is designed to support multi-threaded applications without
significant memory overhead. It does this by keeping objects in a
shared (between all threads) state when possible, and only as little
thread-local state as is required by the concurrency model.
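
The idea can be sketched as follows. This is an illustration under
stated assumptions, not Dobbin's implementation: the class
``SharedObject`` and its ``get`` and ``set`` methods are hypothetical.
Values live in one dictionary shared by all threads, and a thread
allocates local state only once it changes something.

.. code-block:: python

    import threading


    class SharedObject:
        """Hypothetical object with shared state and a thread-local overlay."""

        def __init__(self, **state):
            self._shared = dict(state)        # one copy, visible to all threads
            self._local = threading.local()   # per-thread changes, created lazily

        def get(self, name):
            changes = getattr(self._local, "changes", None)
            if changes is not None and name in changes:
                return changes[name]
            return self._shared[name]

        def set(self, name, value):
            changes = getattr(self._local, "changes", None)
            if changes is None:
                # Only a thread that writes pays for local state.
                changes = self._local.changes = {}
            changes[name] = value


    obj = SharedObject(name="Bob")
    obj.set("name", "Alice")  # local to the main thread

    seen = []
    reader = threading.Thread(target=lambda: seen.append(obj.get("name")))
    reader.start()
    reader.join()

Here the reader thread keeps no local state at all and reads the
shared value, while the main thread sees its own uncommitted change.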

It does not, however, manage memory consumption in terms of data on
disk versus data in RAM. On startup, the entire database is loaded
into virtual memory, so some data may end up directly in the system
page file. The operating system's memory manager is expected to keep
frequently used objects readily available and less frequently used
objects available on request.

One assumption that led to this decision is that the virtual memory
manager is able to write CPython objects to disk directly, while a
Python-based memory manager would need to serialize objects first. It
is difficult to implement a memory manager in Python, and the added
complexity is certain to introduce overhead that hurts performance.

MVCC
----

Dobbin provides an MVCC (multi-version concurrency control) model.

A transaction (tx1) begins and an object is checked out. Meanwhile, a
second transaction (tx2) begins.

When tx1 is committed, the shared state of the object is updated.
Before that happens, tx2 is assigned local changes to the object that
reverse the changes made by tx1, so tx2 continues to see the object as
it was when tx2 began.

Such "reversing" changesets are applied to every thread that has an
active transaction predating the commit.

If a third transaction is begun (again in a separate thread), it
simply uses the shared state of the object (having no local state of
its own).
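
The scenario above can be traced with a small model. This is a sketch
under stated assumptions rather than Dobbin's code: the classes
``Database``, ``Obj`` and ``Transaction`` are hypothetical,
transactions are modelled as plain objects instead of threads, written
attributes are assumed to exist already, and conflict detection is
left out.

.. code-block:: python

    class Database:
        def __init__(self):
            self.active = set()  # transactions that have begun but not committed


    class Obj:
        def __init__(self, **state):
            self.shared = dict(state)  # state seen by transactions without local state


    class Transaction:
        def __init__(self, db):
            self.db = db
            self.writes = {}   # obj -> {attr: new value}, this transaction's changes
            self.reverse = {}  # obj -> {attr: old value}, "reversing" changesets
            db.active.add(self)

        def write(self, obj, name, value):
            # Writing implicitly checks the object out for this transaction.
            self.writes.setdefault(obj, {})[name] = value

        def read(self, obj, name):
            for layer in (self.writes, self.reverse):
                if name in layer.get(obj, {}):
                    return layer[obj][name]
            return obj.shared[name]

        def commit(self):
            self.db.active.discard(self)
            for obj, changes in self.writes.items():
                # First hand every older active transaction the old values ...
                for other in self.db.active:
                    undo = other.reverse.setdefault(obj, {})
                    for name in changes:
                        undo.setdefault(name, obj.shared[name])
                # ... then update the shared state.
                obj.shared.update(changes)


    db = Database()
    obj = Obj(name="Bob")

    tx1 = Transaction(db)
    tx1.write(obj, "name", "Alice")
    tx2 = Transaction(db)      # begins before tx1 commits
    tx1.commit()
    tx3 = Transaction(db)      # begins after the commit

    view_tx2 = tx2.read(obj, "name")
    view_tx3 = tx3.read(obj, "name")

In this model tx2 reads the value from its reversing changeset, while
tx3 has no local state and reads the updated shared state.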
