.. _parallel_db:

=======================
IPython's Task Database
=======================

The IPython Hub stores all task requests and results in a database. Currently supported backends
are MongoDB, SQLite (the default), and an in-memory DictDB. The most common use of this database
is a client requesting results for tasks it did not submit, via:

.. sourcecode:: ipython

    In [1]: rc.get_result(task_id)

Because this database exists, the :class:`Client` also provides a direct query method for users
who want deeper introspection into their task history. The :meth:`db_query` method of the Client
is modeled after MongoDB queries, so it should look familiar if you have used MongoDB. When the
MongoDB backend is in use, the query is relayed directly; with the other backends, the interface
is emulated and only a subset of queries is supported.

.. seealso::

    MongoDB query docs: http://www.mongodb.org/display/DOCS/Querying

:meth:`Client.db_query` takes a query dictionary whose keys come from the TaskRecord key list.
Each value is either an exact value to test for equality or a MongoDB query, which is a dict of
the form ``{'operator' : 'argument(s)'}``. An optional ``keys`` argument specifies which subset of
keys to retrieve; by default, all keys are retrieved except the request and result buffers
(``buffers`` and ``result_buffers``). :meth:`db_query` returns a list of TaskRecord dicts. As with
MongoDB, the ``msg_id`` key is always included, whether requested or not.
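The key-selection rules can be pictured with a small, self-contained sketch. It is not IPython's
implementation: the helper name ``project`` and the sample record are hypothetical, and only the
documented behaviour (buffers omitted by default, ``msg_id`` always returned) is modelled.

.. sourcecode:: python

    # Keys omitted when no ``keys`` argument is given (per the text above).
    BUFFER_KEYS = ('buffers', 'result_buffers')


    def project(record, keys=None):
        """Hypothetical helper: the subset of a TaskRecord dict to return."""
        if keys is None:
            # Default: every key except the request and result buffers.
            selected = {k: v for k, v in record.items() if k not in BUFFER_KEYS}
        else:
            # Only the requested keys that actually exist in the record.
            selected = {k: record[k] for k in keys if k in record}
        # Like MongoDB's ``_id``, ``msg_id`` is always included.
        selected['msg_id'] = record['msg_id']
        return selected


    # Made-up record for illustration.
    record = {
        'msg_id': b'abc',
        'started': None,
        'stdout': '',
        'buffers': [b'request'],
        'result_buffers': [b'result'],
    }

    print(project(record))                    # {'msg_id': b'abc', 'started': None, 'stdout': ''}
    print(project(record, keys=['started']))  # {'started': None, 'msg_id': b'abc'}

Excluding the buffers by default keeps query results small, since the serialized request and
result objects are usually the bulkiest part of a record.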

TaskRecord keys:

=============== =============== =============
Key             Type            Description
=============== =============== =============
msg_id          uuid(bytes)     The msg ID
header          dict            The request header
content         dict            The request content (likely empty)
buffers         list(bytes)     Buffers containing serialized request objects
submitted       datetime        Time of submission (set by the client)
client_uuid     uuid(bytes)     IDENT of the client's socket
engine_uuid     uuid(bytes)     IDENT of the engine's socket
started         datetime        Time the task began execution on the engine
completed       datetime        Time the task finished execution (success or failure) on the engine
resubmitted     datetime        Time of resubmission (if applicable)
result_header   dict            The result header
result_content  dict            The result content
result_buffers  list(bytes)     Buffers containing serialized result objects
queue           bytes           The name of the queue for the task ('mux' or 'task')
pyin            <unused>        Python input (unused)
pyout           <unused>        Python output (unused)
pyerr           <unused>        Python traceback (unused)
stdout          str             Stream of stdout data
stderr          str             Stream of stderr data
=============== =============== =============
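Because ``submitted``, ``started``, and ``completed`` are all datetimes, a retrieved record can be
polled for timing metadata with ordinary datetime arithmetic. The sketch below assumes a record
shaped like the table above; the ``elapsed`` helper and every value in the record are
hypothetical.

.. sourcecode:: python

    from datetime import datetime, timedelta

    # Hypothetical record: key names and types follow the TaskRecord table,
    # but every value here is invented for illustration.
    submitted = datetime(2011, 1, 1, 12, 0, 0)
    record = {
        'msg_id': b'abc',
        'queue': b'task',
        'submitted': submitted,
        'started': submitted + timedelta(seconds=2),
        'completed': submitted + timedelta(seconds=7),
        'stdout': 'hello\n',
    }


    def elapsed(rec, start_key, end_key):
        """Time between two timestamp fields, or None if either is unset."""
        start, end = rec.get(start_key), rec.get(end_key)
        if start is None or end is None:
            return None
        return end - start


    waited = elapsed(record, 'submitted', 'started')    # time spent queued
    ran = elapsed(record, 'started', 'completed')       # time spent executing
    print(waited.total_seconds(), ran.total_seconds())  # 2.0 5.0

    # A task that has not finished yet has completed == None.
    pending = dict(record, completed=None)
    print(elapsed(pending, 'started', 'completed'))     # None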

MongoDB operators we emulate on all backends:

==========  =================
Operator    Python equivalent
==========  =================
  '$in'       in
  '$nin'      not in
  '$eq'       ==
  '$ne'       !=
  '$gt'       >
  '$gte'      >=
  '$lt'       <
  '$lte'      <=
==========  =================
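For the non-MongoDB backends, one way to picture the emulation is a table of Python callables
plus a matcher that applies it to each record. This is a hedged sketch, not the DictDB or SQLite
backend code: ``OPERATORS``, ``matches``, and the sample records are hypothetical, and treating
incomparable values (such as ``None`` versus a datetime) as non-matching is an assumption made
here so that unfinished tasks simply drop out of range queries.

.. sourcecode:: python

    import operator
    from datetime import datetime

    # Python equivalents of the emulated operators, keyed as in the table above.
    # Each callable takes (record_value, query_argument).
    OPERATORS = {
        '$in': lambda value, arg: value in arg,
        '$nin': lambda value, arg: value not in arg,
        '$eq': operator.eq,
        '$ne': operator.ne,
        '$gt': operator.gt,
        '$gte': operator.ge,
        '$lt': operator.lt,
        '$lte': operator.le,
    }


    def matches(record, query):
        """Hypothetical helper: does a TaskRecord dict satisfy a query dict?"""
        for key, test in query.items():
            value = record.get(key)  # a missing key behaves like None
            if isinstance(test, dict):
                # Operator dict: every operator/argument pair must hold.
                for op, arg in test.items():
                    try:
                        ok = OPERATORS[op](value, arg)
                    except TypeError:
                        # e.g. None compared with a datetime: treat as no match.
                        ok = False
                    if not ok:
                        return False
            elif value != test:
                # Plain value: exact-match test.
                return False
        return True


    # Invented records for illustration.
    records = [
        {'msg_id': 'a', 'engine_uuid': 'e3', 'completed': None},
        {'msg_id': 'b', 'engine_uuid': 'e4', 'completed': datetime(2011, 1, 1, 12)},
        {'msg_id': 'c', 'engine_uuid': 'e7', 'completed': datetime(2011, 1, 1, 14)},
    ]
    noon = datetime(2011, 1, 1, 12)

    pending = [r['msg_id'] for r in records if matches(r, {'completed': None})]
    on_3_or_4 = [r['msg_id'] for r in records
                 if matches(r, {'engine_uuid': {'$in': ['e3', 'e4']}})]
    after_noon = [r['msg_id'] for r in records
                  if matches(r, {'completed': {'$gt': noon}})]
    print(pending, on_3_or_4, after_noon)  # ['a'] ['a', 'b'] ['c']

In this sketch, ``{'completed': None}`` matches both an explicit ``None`` and a missing key,
because ``record.get`` returns ``None`` for absent keys.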


The DB query is useful in two primary cases:

1. Deep polling of task status or metadata.
2. Selecting a subset of tasks on which to perform a later operation (for example, waiting on
   results, purging records, or resubmitting).

Example Queries
===============


To get all tasks that have not completed, retrieving only their ``msg_id`` and start time:

.. sourcecode:: ipython

    In [1]: incomplete = rc.db_query({'completed' : None}, keys=['msg_id', 'started'])

All jobs submitted by me that started in the last hour:

.. sourcecode:: ipython

    In [1]: from datetime import datetime, timedelta

    In [2]: hourago = datetime.now() - timedelta(hours=1)

    In [3]: recent = rc.db_query({'started' : {'$gte' : hourago},
       ...:                       'client_uuid' : rc.session.session})

All jobs started more than an hour ago, by clients *other than me*:

.. sourcecode:: ipython

    In [4]: older = rc.db_query({'started' : {'$lt' : hourago},
       ...:                      'client_uuid' : {'$ne' : rc.session.session}})

Result headers for all jobs on engine 3 or 4:

.. sourcecode:: ipython

    In [1]: uuids = list(map(rc._engines.get, (3, 4)))

    In [2]: hist34 = rc.db_query({'engine_uuid' : {'$in' : uuids}}, keys=['result_header'])
