DSE - Simplified "bulk" insert/update/delete for Django
=======================================================


Version : 2.0.0-BETA #9
Author : Thomas Weholt <thomas@weholt.org>
License : GPL v3.0
Status : BETA
Url : https://bitbucket.org/weholt/dse2
Docs at http://readthedocs.org/docs/dse/en/latest/index.html


Background
----------

NB! This is the first series of beta releases of DSE v.2.0.0 - a rewrite of DSE, now django only.
When the beta version 2.0.0 is tested some more it will replace the 1.x-branch.
It also contains some backwards incompatible changes, mainly renaming the add_items
method to just add. It also features caching of items to delete as well. All the
code not related to django has been removed as well, so it's now a django only project.

* DSE is available for one reason - to insert/update lots of data as fast as possible.

* DSE vs Django ORM: typical speed gain is around 10X. 

* DSE is aware of default values specified in your Django models and will use 
those if no value is given for a field in an insert statement.

* DSE caches SQL-statements, both inserts and updates, and executes them when 
a specified number of statements has been prepared or when it`s told manually to flush
cached statements to the database. The actual sql execution is done using 
DB API cursor.executemany and this is much faster than executing SQL-statements 
in sequence and way faster than using the Django ORM.

* DSE uses a dictionary to specify what fields to update. It differs between
update and insert statements by looking for a key similar to the primary key for
a given model.

* DSE will try to execute as many statements as possible, but will detect update
statements where only a few sql fields/columns are updated and execute those 
statements individually.

* My hope is to see something like this in the django core.


Installation
------------

pip install dse

or

hg clone https://bitbucket.org/weholt/dse2


Example usage
-------------

You got a model like::

    gender =  (('M', 'Male'), ('F', 'Female'))
    
    class Person(models.Model):
        name = models.CharField(max_length = 30)
        age = models.IntegerField(default = 30)
        sex = models.CharField(max_length = 1, choices = gender, default = 'M')

Using dse::

    import dse
    dse.patch_models() # Monkey patch all your models and expose dse for all models:

    with Person.dse as d:
        for name, age, sex in (('Thomas', 36, 'M'), ('Joe', 40, 'M'), ('Jane', 28, 'F')):
             d.add(dict(name = name, age = age, sex = sex))

Nothing will be inserted into the database before the loop is done ( or you 
insert 1000 items ). Then the items will be inserted using cursor.executemany, 
using plain SQL - no ORM in sight.

DSE features singletons (NB! very experimental, no locking/thread support yet!)::

    import dse.singleton
    
    p1 = dse.singleton.Models.Person()
    p2 = dse.singleton.Models.Person()
    print p1 is p2 # should print True
    p1.add(dict(name = 'Joe'))
    p2.flush()
    print Person.objects.all().count() # should print 1
    
Singletons makes it possible to cache entries across sections of code and cache 
even more data, hitting the db less.

DSE using default values defined in your model::
    
    with Person.dse as d:    
        # Adding an item, just defining a name and using the default values from the model:
        d.add({'name': 'John'})

        # Overriding the default values? Just specify a valid value
        d.add({'name': 'Thomas', 'age': 36, 'sex': 'M'})

        # Update record with id = 1 and set its name to John. This will trigger 
        # a SQL-statement for this update alone, since not all columns are specified:
        d.add({'id': 1, 'name': 'John'})
  
To use delayed execution of SQL statements::

    Person.dse.add({'id': 2, 'name': 'Al Capone'}) # will NOT trigger anything
    Person.dse.add({'id': 3, 'name': 'John Dillinger'}) # will NOT trigger anything
    Person.dse.add({'name': 'Scarface'}) # will NOT trigger anything
    Person.dse.flush() # will execute both update statements and insert a record for "Scarface"

Say you want to update all records with some calculated value, something you 
couldn`t find a way to do in SQL. Using dse this is easy and fast::

    with Person.dse as d:    
        # Use Djangos ORM to generate dictionaries to use in DSE; objects.all().values().
        for item in Person.objects.all().values():
            item['somevar'] = calculated_value
            d.add(item)

I`ve recieved some questions about transaction handling. Below is an simple example, 
but I`m looking into other ways of handling transactions as well::
 
    from django.db import transaction
    import dse

    @transaction.commit_manually
    def some_method():
        with SomeModel.dse as d:
            for item in somelist:
                SomeModel.dse.add(
                  {'some_column': item.some_value,
                   'another_column': item.another_value})    
        transaction.commit()

You can also cache items to delete::

    with foo.dse as d:
        for person in person.objects.all():
            if person.likes_perl:
                foo.dse.delete(person.id) # won't trigger anything
    # here all cached items for deletions are deleted using plain SQL, no orm.
        
DSE caches id's and deletes them when 1000 items are cached or flush/close are called.
It uses sql similar to "delete from tablename where id in (<list of ids>)".


Performance compared to the django orm using the django test framework
----------------------------------------------------------------------

Lots of updates:
DSE : 0.116728067398 seconds.
ORM : 15.0528171062 seconds.

Lots of inserts:
DSE: 0.0825960636139 seconds.
ORM: 0.830269098282 seconds.

Iterate of objects and updating them:
DSE: 0.817410230637 seconds.
ORM: 60.4543881416 seconds.

Delete lots of objects:
DSE: 0.495353937149 seconds.
ORM: 13.3724360466 seconds.

You can see the code for the test producing these numbers here:
https://bitbucket.org/weholt/dse2/src/dbd87b0e2cdb/tests/performance_tests.py

NB! These numbers do NOT represent real world performance, but merely show the overhead
of using the django orm compared to DSE. My latests tests shows DSE to be about 3 
times faster on inserts, it uses about 14% of the time compared to the orm when doing updates
and in some special cases DSE can speed up delete-operations also.


Release notes
-------------

2.0.0-BETA#9 : added FileExport-class to ease debugging what is processed during testing. Writes SQL-data to file. See source/testsuite for usage.

2.0.0-BETA#4 : started refactoring DSE to remove all non-django specific code, mostly to gain speed and simply code.

1.0.2 : reconnect if cursor is dead.

1.0.1 : fixed issue #9 "Factory can eat up memory" reported by vangheem. When finding fields related to a table only the top row is fetched.

1.0.0 : Version bump. Added unittest for issue #8.

1.0.0-RC1 : updated README.txt. 

0.9.4 : - PEP8 and pyflake.

0.9.3 : - Fixed issue #7: dse causes django-debug-toolbar to crash. Thanks to ringemup for pointing that out. Added some docstrings.

0.9.2 : - Corrected type in usage.rst and README.txt.

0.9.1 : - Refactored code even more, added usage.rst, singleton support in the singleton-package and some performance tests. Models not monkey patched be default anymore, must call dse.patch_models().

0.9.0 : - Refactored code and cleaned up tests folder. Focus on getting singleton support in before 1.0.0. And more tests.

0.8.2 : - added 'pysqlite2' to _DBMAP. Thanks to David Marble for 0.8.1 and 0.8.2.

0.8.1 : - attempt to fix quoting problems with fields on postgresql.

0.8.0 : - fixed crash when more than one database connection has been configured. No ModelFactory will be triggered.

0.7.0 : - don`t remember.

0.6.0 : - added support for the with-statement.
        - added an ModelDelayedExecutor-instance to each model, so you can do Model.dse.add_item
          instead of dse.ModelFactory.Model.add_item.
        - renamed dse.modelfactory to dse.ModelFactory to be more style-compliant.

0.5.1 : just some notes on transaction handling.

0.5.0 :
    - added modelfactory. Upon first import a modelfactory will be created in the DSE module. It`s basically just a
    helper-class containing ModelDelayedExecutor-instances for all models in all apps found in INSTALLED_APPS in
    settings.py.
    - to change the default item limit before automatic execution of cached SQL statements to 10000 instead of the default 1000::

    import dse
    dse.ITEM_LIMIT = 10000

0.4.0 :
    - fixed serious bug when using mass updates. Using cursor.executemany is only possible when values
    for all columns are specified. If only values for a subset of the columns is specified that will be
    executed as a seperate SQL-call. NOTE! Using dex.get_items() or Djangos Model.objects.values() will give you
    all the fields.
    - code clean-up.
    - added custom exceptions; UpdateManyException, UpdateOneException and InsertManyException.
