execsql.py
Run a SQL query script against an Access database

execsql.py is a Python module and program that allows you to apply a SQL query
script stored in a text file to an Access database. The output of the last SQL
query in the script can be automatically saved in a CSV file or routed to--and
then viewed in--an Excel worksheet.


Contents
======================

  Installation
  Capabilities
  Syntax and Options
  Notes
  Requirements
  Support for CREATE TEMPORARY QUERY DDL
  Using Script Files
  A Few Comments on SQL used with Jet and DAO
  Example
  Copyright and License


Installation
======================

Because the execsql.py program is specifically intended to work with Microsoft
Access files, installation on Windows is assumed.  You can install the program
either manually or using Python's installation tools.


Manual Installation
--------------------

Because the functionality of the execsql program is contained in a single
Python script, you can simply extract the execsql.py script from the zipped
package file, and put the script wherever you want.  This may be in the
standard location for Python scripts, or in any other location from which you
want to execute it.


Automated Installation
-----------------------

The automated installation process uses Python's standard setup tool to place
the program in the standard location for third-party scripts (e.g., 
C:\Python\Scripts or C:\Program Files\Python\Scripts).  Follow these steps to 
perform an automated installation:

1. Unzip the package file.  The file can be unzipped in a temporary location,
   such as C:\Temp.  The contents of the package will be placed in a
   subdirectory named for the program and release version.
2. Open a command window in the subdirectory containing the unzipped files and
   type the following command at the prompt:

		python setup.py install

If you want to install to a non-standard location, instructions for a
customized installation can be found at
http://docs.python.org/install/index.html#inst-alt-install.



Capabilities
======================

You can use the execsql program to:

  * Execute a complex query script that selects and summarizes data (see the
    example), and capture the output in a comma-separated-value (CSV) file.
    The CSV file can then be used as input to other software, including
    statistical, spreadsheet, and database applications.
  * Execute a query script that selects and summarizes data, and view the
    results directly in Excel.
  * Execute SQL Data Definition Language (DDL) and Data Manipulation Language
    (DML) commands, stored in a text file, to create database tables and alter
    data in the database. Using execsql and other tools such as batch files
    and the Windows task scheduler, these operations can be set up so that
    they occur automatically without the need for any interactive use of Access.
  * Execute "CREATE QUERY..." and "CREATE TEMPORARY QUERY..." DDL statements,
    which are not natively supported by Access. This feature allows you to
    maintain a standard set of queries, or task-specific queries, in a text
    file and easily add them to any Access database. Support for
    "CREATE TEMPORARY QUERY..." DDL is also important to allow non-trivial data
    summarization scripts; use of this feature is described in more detail below.


Syntax and Options
======================

execsql.py <sqlfile> <Access_db> [output_spec]

Arguments:
  <sqlfile>     The name of a text file of SQL commands to be executed.
                Required argument.
  <Access_db>   The name of the Access database against which to run the SQL.
                Required argument.
  [output_spec] The name of a CSV file to which the output of the last SQL
                command should be written, or the literal string "Excel" to
                send output to a Microsoft Excel workbook in memory.


Notes
======================

  * If the program is run without any arguments it will print a help message
    simiar to the usage description above.
  * Script files can contain comments, which are identified by two dashes
    ("--") at the start of a line.
  * With the exception of the "CREATE TEMPORARY QUERY..." statement, the
    execsql program does not parse or interpret SQL syntax in any way. The
    program recognizes a SQL statement as consisting of any sequence of
    non-comment lines that ends with a line ending with a semicolon. SQL
    syntax used in the script must conform to that recognized by the Jet
    database engine.
  * Only the results of the final query in the script will be saved as CSV or
    routed to Excel. If the final query includes the
    "CREATE TEMPORARY QUERY..." prefix, then no output will be produced
    regardless of any output specification that is provided on the command
    line. If the final query is a DDL command, or a DML command other than a
    SELECT, UNION, or TRANSFORM statement, the output may not be produced, or
    may be empty or meaningless.
  * The program *will not* automatically add the ".csv" file extension to the
    output file name given. Specifying an output file name with a different
    extension (e.g., ".xls") will not alter the format of the output from CSV.
  * If Excel output is desired, the output specification must be literally
   "Excel"--proper case matters. If, for example, the output specification is
    given as "excel", then a CSV file named "excel" will be created on disk.
    If query output is sent to Excel, the Excel workbook that is created will
    be activated (i.e., made visible) after the query script is complete. The
    Excel workbook will not be automatically saved as a file. If the Excel
    output format is used, bear in mind that Excel has row and column
    limitations, whereas CSV files do not.
  * The COM library for Excel is very slow. Unless only a very small amount of
    query output is expected, creating a CSV file and then opening it in Excel
    will generally be faster than routing the data directly to Excel in memory.
  * Scripts that use temporary queries will result in those queries being
    created in the Access database, and then removed, every time the scripts
    are run. This can lead to a gradual increase in the size of the Access
    database file. Access' "Compact and Repair" tool can be run to reclaim the
    space that is added to the .mdb file when temporary queries are created.


Requirements
======================

The execsql program used the Common Object Model (COM) to communicate with
Access and Excel. Python does not include a standard library that supports
COM, and so execsql uses Mark Hammond's pywin32 extension for Windows to
provide COM support. A version of pywin32 compatible with your version of
Python must be installed so that execsql can be used.


Support for CREATE TEMPORARY QUERY DDL
=======================================

The syntax of the "CREATE TEMPORARY QUERY" DDL supported by execsql is:

  CREATE [TEMP[ORARY]] QUERY|VIEW <query_name> AS <sql_command>

The "TEMPORARY" specification is optional: if it is included, the query will
be deleted after the entire script has been executed, and if it is not, the
query will remain defined in the database after the script completes. If a
query of the same name is already defined in the Access database when the
script runs, the existing query will be deleted before the new one is
created--no check is performed to determine whether the new and old queries
have the same definition, and no warning is issued by execsql that a query
definition has been replaced.

The keyword "VIEW" can be used in place of the keyword "QUERY". This
alternative provides compatibility with the "CREATE TEMPORARY VIEW" command
in PostgreSQL, and minimizes the need to edit any scripts that are intended
to be run against both Access and PostgreSQL databases.


Using Script Files
======================

Using script files to store and execute task-specific queries has a number of
advantages over using separate individual queries within the Access database
itself:

  * Maintenance and management of queries is easier and more reliable. Because
    all of the SQL needed for a specific data summarization task is kept
    together in one text file, there is little or no risk that one of the
    invididual queries needed to complete a specific task will be either
    deleted or altered. The clutter of queries that can accumulate in an
    Access database can be substantially reduced or eliminated.
  * Creation of a set of queries for a new task is much easier: the script
    file can be simply copied and edited. Search and replace operations can be
    used to easily and reliably make changes throughout the entire set of
    queries that is needed for a particular task.
  * Complete documentation can (and should!) be included in the script files,
    so that the purpose, assumptions, limitations, and history of changes can
    be easily reviewed by anybody who might consider using or modifying the
    query script.
  * The query script can be more easily preserved to document the way in which
    data were selected or summarized. For example, a query script that is used
    for a particular analysis may be kept in a directory with other files
    associated with that analysis. Query scripts can be easily archived or
    backed up independently of the database. Script files can be made
    read-only so that they cannot easily or accidentally be modified after the
    script for a particular task has been finalized.
  * The ability of execsql to automatically generate CSV or Excel output
    reduces the amount of time that might otherwise be required to
    interactively open the database, run the appropriate query (not to mention
    verifying that the query, or any queries that it depends on, have not been
    altered), and export the result. If the query output will be further
    processed or used in another scriptable application (e.g., to produce
    graphics or statistics using R), the execsql program can be combined with
    other programs in a batch file to further automate the data summarization
    and analysis process.
  * If a database must be maintained in two different formats (e.g., in 
    PostgreSQL for ordinary use, but downloaded to Access for use when a
    network connection is not available), one script file can potentially be
    used to carry out exactly the same data selection and summarization
    operations on both formats of the database.


A Few Comments on SQL used with Jet and DAO
============================================

The version of SQL that is used by the Jet engine and Microsoft's Data Access
Objects (DAO) methods, and thus that must be used in the script files executed
with execsql, is generally equivalent to that used within Access itself, but
is not identical, and is also not the same in all respects as standard SQL;
it also differs in some respects from the SQL that can be used with Access via
ODBC (which is more standards-compliant). A few points to bear in mind when
creating SQL scripts for use with Access:

  * The Jet engine can fail to correctly parse multi-table JOIN expressions.
    In these cases you will need to give it some help by parenthesizing parts
    of the JOIN expression. This means that you have some responsibility for
    constructing optimized (or at least acceptably good) SQL.
  * Visual Basic for Applications (VBA) functions that you might use in Access
    are not all available via DAO. Sometimes these can be worked around with
    slightlier lengthier code. For example, the 'Nz()' function is not
    available, but it can be replaced with an expression such as 'Iif([Column]
    is null, 0, [Column])'.
  * Expressions that should produce a floating-point result ('Double')
    sometimes do not, with the output being truncated or rounded to an integer.
    A workaround is to multiply and then divide the expression by the same
    floating-point number; for example:
    '1.00000001 * <expression> / 1.00000001'.
  * The wildcard character used with the LIKE expression must be "*" rather
    than "%"--this is a case in which the COM interface corresponds to Access
    rather than to the SQL standard or the ODBC interface.
  * The BEGIN TRANSACTION statement (and COMMIT, END, and ROLLBACK statements)
    are not supported as individual SQL statements.


Example
======================

This example illustrates a script that makes use of several temporary queries
to select and summarize data, and a final query that prepares the data for
export or further use. This example is intended to be run against an 
EQuIS database.

-- --------------------------------------------------------------------
-- Get result records that meet specific selection criteria.
-- -------------------------------------------------------------------- 
create temporary view v_seldata as
select
  smp.sys_sample_code,
  rs.test_surrogate_key,
  rs.cas_rn,
  tst.lab_anl_method_name,
  iif(rs.detect_flag='N',
  rs.method_detection_limit,
  rs.result_value) as conc,
  rs.detect_flag='Y' as detected,
  rs.lab_qualifiers like '*J*' as estimated,
  iif(rs.detect_flag='N',
  rs.detection_limit_unit,
  rs.result_unit) as unit
from
  ((((dt_result as rs
  inner join dt_test as tst on tst.test_surrogate_key=rs.test_surrogate_key)
  inner join dt_sample as smp on smp.sys_sample_code=tst.sys_sample_code)
  inner join dt_field_sample as fs on fs.sys_sample_code=smp.sys_sample_code)
  inner join dt_location as loc on loc.sys_loc_code=fs.sys_loc_code)
  inner join rt_analyte as anal on anal.cas_rn=rs.cas_rn
where
  (loc.loc_name like 'SG*' or loc.loc_name like 'SC*')
  and smp.sample_type_code='N'
  and smp.sample_matrix_code='SE'
  and anal.analyte_type in ('ABN', 'PEST', 'PCB', 'LPAH', 'HPAH')
  and rs.reportable_result='Yes'
  and not (rs.result_value is null
  and rs.method_detection_limit is null);
-- --------------------------------------------------------------------
-- Summarize by sample, taking nondetects at half the detection limit.
-- --------------------------------------------------------------------
create temporary view v_samp as
select
  sys_sample_code,
  cas_rn,
  lab_anl_method_name,
  Avg(iif(detected, conc, conc/2.0)) as concentration,
  Max(iif(detected is null, 0, detected)) as detect,
  Min(iif(estimated is null, 0, estimated)) as estimate,
  unit
from
  v_seldata
group by
  sys_sample_code,
  cas_rn,
  lab_anl_method_name,
  unit;
-- --------------------------------------------------------------------
-- Pull in sample location and date information, decode analyte,
-- and reconstruct qualifiers.
-- --------------------------------------------------------------------
select
  loc.loc_name,
  fs.sample_date,
  fs.start_depth,
  fs.end_depth,
  fs.depth_unit,
  smp.sample_name,
  anal.chemical_name,
  dat.lab_anl_method_name,
  iif(dat.detect, concentration, concentration/2.0) as conc,
  (iif(detect, "", "U") & iif(estimate, "J", "")) as qualifiers,
  unit
from
  (((v_samp as dat
  inner join dt_sample as smp on dat.sys_sample_code=smp.sys_sample_code)
  inner join dt_field_sample as fs on fs.sys_sample_code=smp.sys_sample_code)
  inner join dt_location as loc on loc.sys_loc_code=fs.sys_loc_code)
  inner join rt_analyte as anal on anal.cas_rn=dat.cas_rn;

During the execution of this script, the temporary queries will be created in the database. When the script concludes, the temporary queries will be removed. Nothing except the data itself need be kept in the database to use a script like this one.



Copyright and License
======================

Copyright (c) 2007-2009, R.Dreas Nielsen

This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version. This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
details. The GNU General Public License is available at
http://www.gnu.org/licenses/.
