Metadata-Version: 1.1
Name: rdbms-subsetter
Version: 0.2.0
Summary: Generate a consistent subset of an RDBMS
Home-page: https://github.com/18F/rdbms-subsetter
Author: Catherine Devlin
Author-email: catherine.devlin@gsa.gov
License: CC0
Description: rdbms-subsetter
        ===============
        
        .. image:: https://travis-ci.org/18F/rdbms-subsetter.svg?branch=master
           :target: https://travis-ci.org/18F/rdbms-subsetter
        
        Generate a random sample of rows from a relational database that preserves
        referential integrity - so long as constraints are defined, all parent rows
        will exist for child rows.
        
        Good for creating test/development databases from production.  It's slow,
        but how often do you need to generate a test/development database?
        
        Usage::
        
            rdbms-subsetter <source SQLAlchemy connection string> <destination connection string> <fraction of rows to use>
        
        Example::
        
            rdbms-subsetter postgresql://:@/bigdb postgresql://:@/littledb 0.05
        
        Valid SQLAlchemy connection strings are described in the
        `SQLAlchemy documentation <https://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls>`_.
        
        ``rdbms-subsetter`` promises that each child row will have whatever parent rows are
        required by its foreign keys.  It will also *try* to include most child rows belonging
        to each parent row (up to the supplied ``--children`` parameter, default 3 each), but it
        can't make any promises.  (Demanding all children can lead to infinite propagation in
        thoroughly interlinked databases, as every child record demands new parent records,
        which demand new child records, which demand new parent records...
        so increase ``--children`` with caution.)
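        
        The difference between the guarantee (parents) and the best effort (children) can
        be illustrated with a toy, in-memory sketch.  This is not rdbms-subsetter's actual
        code: the ``ORDERS`` data, the ``subset`` function, and its selection order are
        hypothetical and only mirror the behaviour described above.
        
        ```python
        # Hypothetical toy data: order_id -> customer_id, standing in for a foreign key.
        ORDERS = {1: 10, 2: 10, 3: 10, 4: 10, 5: 11}
        
        
        def subset(sampled_orders, children=3):
            """Return (customers, orders) such that every kept order has its customer."""
            # Guarantee: each sampled child row pulls in the parent its foreign key needs.
            customers = {ORDERS[o] for o in sampled_orders}
            orders = set(sampled_orders)
            # Best effort: add more children of each kept parent, capped at `children`.
            for customer in sorted(customers):
                for order in sorted(ORDERS):
                    kept = sum(1 for k in orders if ORDERS[k] == customer)
                    if kept >= children:
                        break
                    if ORDERS[order] == customer:
                        orders.add(order)
            return customers, orders
        
        
        customers, orders = subset({4})
        print(sorted(customers), sorted(orders))
        ```
        
        In this toy, customer 10 has four orders but only three are kept, while customer 11
        is never pulled in because none of its orders were sampled.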
        
        When row counts in your tables vary wildly (tens to billions, for example),
        consider using the ``-l`` flag, which reduces row counts by a logarithmic formula.  If ``f`` is
        the fraction specified, and ``-l`` is set, and the original table has ``n`` rows,
        then each new table's row target will be::
        
            math.pow(10, math.log10(n)*f)
        
        A fraction of ``0.5`` seems to produce good results, converting 10 rows to 3,
        1,000,000 to 1,000, and 1,000,000,000 to 31,622.
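        
        The formula can be checked with a few lines of Python.  This is a minimal sketch:
        the name ``log_row_target``, the truncation to a whole number, and the handling
        of empty tables are assumptions made for illustration, not necessarily what the
        tool does internally.
        
        ```python
        import math
        
        
        def log_row_target(n, f):
            """Row target for a table of n rows, using the logarithmic formula."""
            if n < 1:
                # Assumption: log10 is undefined for an empty table, so target 0 rows.
                return 0
            # Assumption: the fractional result is truncated to a whole row count.
            return int(math.pow(10, math.log10(n) * f))
        
        
        for n in (10, 1000000, 1000000000):
            print(n, "->", log_row_target(n, 0.5))
        ```
        
        Running it for the three table sizes mentioned above shows how much more strongly
        large tables are reduced than small ones.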
        
        Rows are selected randomly, but for tables with a single primary key column, you
        can force rdbms-subsetter to include specific rows (and their dependencies) with
        ``--force=<tablename>:<primary key value>``.  The immediate children of these rows
        are also exempted from the ``--children`` limit.
        
        rdbms-subsetter only performs the INSERTs; it's your responsibility to set
        up the target database first, with its foreign key constraints.  The easiest
        way to do this is with your RDBMS's dump utility.  For example, for PostgreSQL,
        
        ::
        
            pg_dump --schema-only -f schemadump.sql bigdb
            createdb littledb
            psql -f schemadump.sql littledb
        
        Currently rdbms-subsetter takes no account of schema names and simply assumes all
        tables live in the same schema.  This will probably cause horrible errors if used
        against databases where foreign keys span schemas.
        
        Installing
        ----------
        
        ::
        
            pip install rdbms-subsetter
        
        Then install the DB-API 2 module for your RDBMS; for example, for PostgreSQL,
        
        ::
        
            pip install psycopg2
        
        Memory
        ------
        
        rdbms-subsetter will consume memory roughly equal to the size of the *extracted* database.
        (Not the size of the *source* database!)
        
        Development
        -----------
        
        https://github.com/18F/rdbms-subsetter
        
        See also
        --------
        
        * `Jailer <http://jailer.sourceforge.net/home.htm>`_
        
        
Keywords: database testing
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: License :: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Testing
