Database Engines {@name=dbengine}
============================

A database engine is a subclass of `sqlalchemy.sql.Engine`, and is the starting point for where SQLAlchemy provides a layer of abstraction on top of the various DBAPI2 database modules.  For all databases supported by SA, there is a specific "implementation" module, found in the `sqlalchemy.databases` package, that provides all the objects an `Engine` needs in order to perform its job.  A typical user of SQLAlchemy never needs to deal with these modules directly.  For many purposes, the only knowledge that's needed is how to create an Engine for a particular connection URL.  When dealing with direct execution of SQL statements, one would also be aware of Result, Connection, and Transaction objects.  The primary public facing objects are:

* **URL** - represents the identifier for a particular database.  URL objects are usually created automatically based on a given connect string passed to the `create_engine()` function.
* **Engine** - Combines a connection-providing resource with implementation-provided objects that know how to generate, execute, and gather information about SQL statements.  It also provides the primary interface by which Connections are obtained, as well as a context for constructed SQL objects and schema constructs to "implicitly execute" themselves, which is an optional feature of SA 0.2.  The Engine object that is normally dealt with is an instance of `sqlalchemy.engine.base.ComposedSQLEngine`.
* **Connection** - represents a connection to the database.  The underlying connection object returned by a DBAPI's connect() method is referenced internally by the Connection object.  Connection provides methods that handle the execution of SQLAlchemy's own SQL constructs, as well as literal string-based statements.  
* **Transaction** - represents a transaction on a single Connection.  Includes `begin()`, `commit()` and `rollback()` methods that support basic "nestable" behavior, meaning an outermost transaction is maintained against multiple nested calls to begin/commit.
* **ResultProxy** - Represents the results of an execution, and is most analgous to the cursor object in DBAPI.  It primarily allows iteration over result sets, but also provides an interface to information about inserts/updates/deletes, such as the count of rows affected, last inserted IDs, etc.
* **RowProxy** -  Represents a single row returned by the fetchone() method on ResultProxy.

Underneath the public-facing API of `ComposedSQLEngine`, several components are provided by database implementations to provide the full behavior, including:

* **Dialect** - this object is provided by database implementations to describe the behavior of a particular database.  It acts as a repository for metadata about a database's characteristics, and provides factory methods for other objects that deal with generating SQL strings and objects that handle some of the details of statement execution.  
* **ConnectionProvider** - this object knows how to return a DBAPI connection object.  It typically talks to a connection pool which maintains one or more connections in memory for quick re-use.
* **ExecutionContext** - this object is created for each execution of a single SQL statement, and tracks information about its execution such as primary keys inserted, the total count of rows affected, etc.  It also may implement any special logic that various DBAPI implementations may require before or after a statement execution.
* **Compiler** - receives SQL expression objects and assembles them into strings that are suitable for direct execution, as well as collecting bind parameters into a dictionary or list to be sent along with the statement.
* **SchemaGenerator** - receives collections of Schema objects and knows how to generate the appropriate SQL for `CREATE` and `DROP` statements.

### Supported Databases {@name=supported}

Engines exist for SQLite, Postgres, MySQL, and Oracle, using the Pysqlite, Psycopg2 (Psycopg1 will work to some degree but its typing model is not supported...install Psycopg2!), MySQLDB, and cx_Oracle modules.  There is also preliminary support for MS-SQL using adodbapi or pymssql, as well as Firebird.   For each engine, a distinct Python module exists in the `sqlalchemy.databases` package, which provides implementations of some of the objects mentioned in the previous section.

Downloads for each DBAPI at the time of this writing are as follows:

* Postgres:  [psycopg2](http://www.initd.org/tracker/psycopg)
* SQLite:  [pysqlite](http://initd.org/tracker/pysqlite)
* MySQL:   [MySQLDB](http://sourceforge.net/projects/mysql-python)
* Oracle:  [cx_Oracle](http://www.cxtools.net/default.aspx?nav=home)
* MS-SQL:  [adodbapi](http://adodbapi.sourceforge.net/)  [pymssql](http://pymssql.sourceforge.net/)
* Firebird:  [kinterbasdb](http://kinterbasdb.sourceforge.net/)

### Establishing a Database Engine {@name=establishing}

SQLAlchemy 0.2 indicates the source of an Engine strictly via [RFC-1738](http://rfc.net/rfc1738.html) style URLs, combined with optional keyword arguments to specify options for the Engine.  The form of the URL is:

    $ driver://username:password@host:port/database

Available drivernames are `sqlite`, `mysql`, `postgres`, `oracle`, `mssql`, and `firebird`.  For sqlite, the database name is the filename to connect to, or the special name ":memory:" which indicates an in-memory database.  The URL is typically sent as a string to the `create_engine()` function:

    {python}
    pg_db = create_engine('postgres://scott:tiger@localhost:5432/mydatabase')
    sqlite_db = create_engine('sqlite:///mydb.txt')
    mysql_db = create_engine('mysql://localhost/foo')

    # oracle via TNS name
    oracle_db = create_engine('oracle://scott:tiger@dsn')

    # oracle will feed host/port/SID into cx_oracle.makedsn
    oracle_db = create_engine('oracle://scott:tiger@127.0.0.1:1521/sidname')

The `Engine` will create its first connection to the database when a SQL statement is executed.  As concurrent statements are executed, the underlying connection pool will grow to a default size of five connections, and will allow a default "overflow" of ten.   Since the `Engine` is essentially "home base" for the connection pool, it follows that you should keep a single `Engine` per database established within an application, rather than creating a new one for each connection.

### Database Engine Options {@name=options}

Keyword options can also be specified to `create_engine()`, following the string URL as follows:

    {python}
    db = create_engine('postgres://...', encoding='latin1', echo=True, module=psycopg1)

Options that can be specified include the following:

* strategy='plain' : the Strategy describes the general configuration used to create this Engine.  The two available values are `plain`, which is the default, and `threadlocal`, which applies a "thread-local context" to implicit executions performed by the Engine.  This context is further described in [dbengine_connections_context](rel:dbengine_connections_context).
* pool=None : an instance of `sqlalchemy.pool.Pool` to be used as the underlying source for connections, overriding the engine's connect arguments (pooling is described in [pooling](rel:pooling)).  If None, a default `Pool` (usually `QueuePool`, or `SingletonThreadPool` in the case of SQLite) will be created using the engine's connect arguments.

Example:

    {python}
    from sqlalchemy import *
    import sqlalchemy.pool as pool
    import MySQLdb
    
    def getconn():
        return MySQLdb.connect(user='ed', dbname='mydb')
    
    engine = create_engine('mysql', pool=pool.QueuePool(getconn, pool_size=20, max_overflow=40))

* pool_size=5 : the number of connections to keep open inside the connection pool.  This is only used with `QueuePool`.
* max_overflow=10 : the number of connections to allow in "overflow", that is connections that can be opened above and beyond the initial five.  this is only used with `QueuePool`.
* pool_timeout=30 : number of seconds to wait before giving up on getting a connection from the pool.  This is only used with `QueuePool`.
* echo=False : if True, the Engine will log all statements as well as a repr() of their parameter lists to the engines logger, which defaults to sys.stdout.  The `echo` attribute of `ComposedSQLEngine` can be modified at any time to turn logging on and off.  If set to the string `"debug"`, result rows will be printed to the standard output as well.
* logger=None : a file-like object where logging output can be sent, if echo is set to True.  Newlines will not be sent with log messages.  This defaults to an internal logging object which references `sys.stdout`.
* module=None : used by database implementations which support multiple DBAPI modules, this is a reference to a DBAPI2 module to be used instead of the engine's default module.  For Postgres, the default is psycopg2, or psycopg1 if 2 cannot be found.  For Oracle, its cx_Oracle.
* use_ansi=True : used only by Oracle;  when False, the Oracle driver attempts to support a particular "quirk" of Oracle versions 8 and previous, that the LEFT OUTER JOIN SQL syntax is not supported, and the "Oracle join" syntax of using `&lt;column1&gt;(+)=&lt;column2&gt;` must be used in order to achieve a LEFT OUTER JOIN.  
* threaded=True : used by cx_Oracle; sets the `threaded` parameter of the connection indicating thread-safe usage.  cx_Oracle docs indicate setting this flag to `False` will speed performance by 10-15%.  While this defaults to `False` in cx_Oracle, SQLAlchemy defaults it to `True`, preferring stability over early optimization.
* use_oids=False : used only by Postgres, will enable the column name "oid" as the object ID column, which is also used for the default sort order of tables.  Postgres as of 8.1 has object IDs disabled by default.
* convert_unicode=False : if set to True, all String/character based types will convert Unicode values to raw byte values going into the database, and all raw byte values to Python Unicode coming out in result sets.  This is an engine-wide method to provide unicode across the board.  For unicode conversion on a column-by-column level, use the `Unicode` column type instead.
* encoding='utf-8' : the encoding to use for all Unicode translations, both by engine-wide unicode conversion as well as the `Unicode` type object.

### Using Connections {@name=connections}

In this section we describe the SQL execution interface available from an `Engine` instance.  Note that when using the Object Relational Mapper (ORM) as well as when dealing with with "bound" metadata objects (described later), SQLAlchemy deals with the Engine for you and you generally don't need to know much about it; in those cases, you can skip this section and go to [metadata](rel:metadata).

The Engine provides a `connect()` method which returns a `Connection` object.  This object provides methods by which literal SQL text as well as SQL clause constructs can be compiled and executed.  

    {python}
    engine = create_engine('sqlite:///:memory:')
    connection = engine.connect()
    result = connection.execute("select * from mytable where col1=:col1", col1=5)
    for row in result:
        print row['col1'], row['col2']
    connection.close()

The `close` method on `Connection` does not actually remove the underlying connection to the database, but rather indicates that the underlying resources can be returned to the connection pool.  When using the `connect()` method, the DBAPI connection referenced by the `Connection` object is not referenced anywhere else. 


In both execution styles above, the `Connection` object will also automatically return its resources to the connection pool when the object is garbage collected, i.e. its `__del__()` method is called.  When using the standard C implementation of Python, this method is usually called immediately as soon as the object is dereferenced.  With other Python implementations such as Jython, this is not so guaranteed.  
    
The execute method on `Engine` and `Connection` can also receive SQL clause constructs as well, which are described in [sql](rel:sql):

    {python}
    connection = engine.connect()
    result = connection.execute(select([table1], table1.c.col1==5))
    for row in result:
        print row['col1'], row['col2']
    connection.close()

Both `Connection` and `Engine` fulfill an interface known as `Connectable` which specifies common functionality between the two objects, such as getting a `Connection` and executing queries.  Therefore, most SQLAlchemy functions which take an `Engine` as a parameter with which to execute SQL will also accept a `Connection`:

    {python title="Specify Engine or Connection"}
    engine = create_engine('sqlite:///:memory:')
    
    # specify some Table metadata
    metadata = MetaData()
    table = Table('sometable', metadata, Column('col1', Integer))
    
    # create the table with the Engine
    table.create(engine=engine)
    
    # drop the table with a Connection off the Engine
    connection = engine.connect()
    table.drop(engine=connection)

#### Implicit Connection Contexts {@name=context}

An **implicit connection** refers to connections that are allocated by the `Engine` internally.  There are two general cases when this occurs:  when using the various `execute()` methods that are available off the `Engine` object itself, and when calling the `execute()` method on constructed SQL objects, which are described in [sqlconstruction](rel:sqlconstruction).

    {python title="Implicit Connection"}
    engine = create_engine('sqlite:///:memory:')
    result = engine.execute("select * from mytable where col1=:col1", col1=5)
    for row in result:
        print row['col1'], row['col2']
    result.close()

When using implicit connections, the returned `ResultProxy` has a `close()` method which will return the resources used by the underlying `Connection`.   

The `strategy` keyword argument to `create_engine()` affects the algorithm used to retreive the underlying DBAPI connection used by implicit executions.  When set to `plain`, each implicit execution requests a unique connection from the connection pool, which is returned to the pool when the resulting `ResultProxy` falls out of scope (i.e. `__del__()` is called) or its `close()` method is called.  If a second implicit execution occurs while the `ResultProxy` from the previous execution is still open, then a second connection is pulled from the pool.

When `strategy` is set to `threadlocal`, the `Engine` still checks out a connection which is closeable in the same manner via the `ResultProxy`, except the connection it checks out will be the **same** connection as one which is already checked out, assuming the operation is in the same thread.  When all `ResultProxy` objects are closed, the connection is returned to the pool normally.

It is crucial to note that the `plain` and `threadlocal` contexts **do not impact the connect() method on the Engine.**  `connect()` always returns a unique connection.  Implicit connections use a different method off of `Engine` for their operations called `contextual_connect()`.

The `plain` strategy is better suited to an application that insures the explicit releasing of the resources used by each execution.  This is because each execution uses its own distinct connection resource, and as those resources remain open, multiple connections can be checked out from the pool quickly.  Since the connection pool will block further requests when too many connections have been checked out, not keeping track of this can impact an application's stability.

    {python title="Plain Strategy"}
    db = create_engine('mysql://localhost/test', strategy='plain')
    
    # execute one statement and receive results.  r1 now references a DBAPI connection resource.
    r1 = db.execute("select * from table1")
    
    # execute a second statement and receive results.  r2 now references a *second* DBAPI connection resource.
    r2 = db.execute("select * from table2")
    for row in r1:
        ...
    for row in r2:
        ...
    # release connection 1
    r1.close()
    
    # release connection 2
    r2.close()

Advantages to `plain` include that connection resources are immediately returned to the connection pool, without any reliance upon the `__del__()` method; there is no chance of resources being left around by a Python implementation that doesn't necessarily call `__del__()` immediately. 

The `threadlocal` strategy is better suited to a programming style which relies upon the `__del__()` method of Connection objects in order to return them to the connection pool, rather than explicitly issuing a `close()` statement upon the `ResultProxy` object.   This is because all of the executions within a single thread will share the same connection, if one has already been checked out in the current thread.  Using this style, an application will use only one connection per thread at most within the scope of all implicit executions.

    {python title="Threadlocal Strategy"}
    db = create_engine('mysql://localhost/test', strategy='threadlocal')
    
    # execute one statement and receive results.  r1 now references a DBAPI connection resource.
    r1 = db.execute("select * from table1")
    
    # execute a second statement and receive results.  r2 now references the *same* resource as r1
    r2 = db.execute("select * from table2")
    
    for row in r1:
        ...
    for row in r2:
        ...
    # dereference r1.  the connection is still held by r2.
    r1 = None
    
    # dereference r2.  with no more references to the underlying connection resources, they
    # are returned to the pool.
    r2 = None

Advantages to `threadlocal` include that resources can be left to clean up after themselves, application code can be more minimal, its guaranteed that only one connection is used per thread, and there is no chance of a "connection pool block", which is when an execution hangs because the current thread has already checked out all remaining resources.

To get at the actual `Connection` object which is used by implicit executions, call the `contextual_connection()` method on `Engine`:

    {python title="Contextual Connection"}
    # threadlocal strategy
    db = create_engine('mysql://localhost/test', strategy='threadlocal')
    
    conn1 = db.contextual_connection()
    conn2 = db.contextual_connection()

    >>> assert conn1.connection is conn2.connection
    True

When the `plain` strategy is used, the `contextual_connection()` method is synonymous with the `connect()` method; both return a distinct connection from the pool.

### Transactions {@name=transactions}

The `Connection` object provides a `begin()` method which returns a `Transaction` object.  This object is usually used within a try/except clause so that it is guaranteed to `rollback()` or `commit()`:

    {python}
    trans = connection.begin()
    try:
        r1 = connection.execute(table1.select())
        connection.execute(table1.insert(), col1=7, col2='this is some data')
        trans.commit()
    except:
        trans.rollback()
        raise
    
The `Transaction` object also handles "nested" behavior by keeping track of the outermost begin/commit pair.  In this example, two functions both issue a transaction on a Connection, but only the outermost Transaction object actually takes effect when it is committed.

    {python}
    # method_a starts a transaction and calls method_b
    def method_a(connection):
        trans = connection.begin() # open a transaction
        try:
            method_b(connection)
            trans.commit()  # transaction is committed here
        except:
            trans.rollback() # this rolls back the transaction unconditionally
            raise

    # method_b also starts a transaction
    def method_b(connection):
        trans = connection.begin() # open a transaction - this runs in the context of method_a's transaction
        try:
            connection.execute("insert into mytable values ('bat', 'lala')")
            connection.execute(mytable.insert(), col1='bat', col2='lala')
            trans.commit()  # transaction is not committed yet
        except:
            trans.rollback() # this rolls back the transaction unconditionally
            raise
        
    # open a Connection and call method_a
    conn = engine.connect()                
    method_a(conn)
    conn.close()
            
Above, `method_a` is called first, which calls `connection.begin()`.  Then it calls `method_b`. When `method_b` calls `connection.begin()`, it just increments a counter that is decremented when it calls `commit()`.  If either `method_a` or `method_b` calls `rollback()`, the whole transaction is rolled back.  The transaction is not committed until `method_a` calls the `commit()` method.
       
Note that SQLAlchemy's Object Relational Mapper also provides a way to control transaction scope at a higher level; this is described in [unitofwork_transaction](rel:unitofwork_transaction).

