replica.py - DirectoryStorage Replication Tool
==============================================

Usage::

  replica.py [options] [user@]MasterHost:MasterDirectory ReplicaDirectory

  user, MasterHost
      User and host name that ssh uses to connect to the host that
      runs the replication master.

  MasterDirectory
      Data directory on the replication master.

  ReplicaDirectory
      Data directory on the local replica.

Options are:

  -v
      Make it more verbose.

  -q
      Make it less verbose.

  -d directory
      The DirectoryStorage source installation directory on the remote
      machine. This is only needed if different to the installation
      directory on the local machine.
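
As an illustration of how the arguments and options above fit together, the
following Python sketch assembles a command line in the order shown under
Usage. The function name ``build_replica_command``, the host name and the
paths are hypothetical and not part of DirectoryStorage; only the ``-v``,
``-q`` and ``-d`` switches come from this document.

```python
def build_replica_command(master_host, master_dir, replica_dir,
                          user=None, verbose=False, quiet=False,
                          remote_install_dir=None, script="replica.py"):
    """Assemble the argument list for one replica.py invocation.

    Hypothetical helper: it only arranges the documented arguments,
    it does not run anything.
    """
    argv = [script]
    if verbose:
        argv.append("-v")            # more verbose
    if quiet:
        argv.append("-q")            # less verbose
    if remote_install_dir is not None:
        # Only needed when the remote installation directory differs
        # from the local one.
        argv.extend(["-d", remote_install_dir])
    # [user@]MasterHost:MasterDirectory
    target = f"{master_host}:{master_dir}"
    if user:
        target = f"{user}@{target}"
    argv.extend([target, replica_dir])
    return argv


# Example with made-up host and directory names.
command = build_replica_command("master.example.com", "/var/ds/data",
                                "/var/ds/replica", user="zope", quiet=True)
print(" ".join(command))
```

Building the command as a list rather than a single string avoids shell
quoting problems if it is later passed to ``subprocess``.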


Operation
---------

This tool can be used to safely, efficiently and robustly update a
local replica with the differences between a remote master storage and
the local replica. Replication is efficient because it uses the normal
storage history information to determine which files need to be copied.
Replication is robust because the replication event is atomic, and
aligned with transaction boundaries on the master storage. Replication
is safe because the tool performs several checks to eliminate the most
common replication errors.

There must not be any storage process (ZEO server, etc) using the
local replica storage during the replication event. This means that
replication is suitable for maintaining a backup or cold standby
storage. If you want to have a hot standby, you will need to ensure
that the replica storage is shut down before starting replication, and
restarted again afterwards.


Exit Code
---------

This tool will exit with a zero status code if and only if it has
operated correctly.
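
A calling script can rely on this exit status to decide whether the replica
is trustworthy. The sketch below is one possible wrapper, not part of the
tool itself: the name ``run_replication`` is hypothetical, and the command
line in the comment uses made-up host and directory names.

```python
import subprocess
import sys


def run_replication(argv):
    """Run one replication command; return True only on exit status 0."""
    try:
        completed = subprocess.run(argv)
    except OSError as exc:
        # The command could not be started at all (for example, not found).
        print(f"could not start {argv[0]}: {exc}", file=sys.stderr)
        return False
    if completed.returncode != 0:
        print(f"replication failed with status {completed.returncode}",
              file=sys.stderr)
        return False
    return True


# Possible use (hypothetical host and paths):
#
#   ok = run_replication(["replica.py", "-q",
#                         "zope@master.example.com:/var/ds/data",
#                         "/var/ds/replica"])
#   if not ok:
#       sys.exit(1)
```

Treating any non-zero status as a failure matches the "if and only if" wording
above; the wrapper makes no attempt to interpret individual status values.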


Installation
------------

1. First get your master DirectoryStorage working as you want it.

2. Set up a ``config/snapshot.conf`` file on the master.  The
   replication tool needs to force the master storage briefly into
   snapshot mode.

3. This tool uses ssh. Set up ssh so that the user who runs the
   storage on the replica machine can log in to the account of the
   user who runs the storage on the master, and so that the correct
   PYTHONPATH environment variable is set for that login. Setting up
   ssh is outside the scope of this document.

4. Create an initial copy of the storage from the master onto the
   replica. Note that it does not matter if this copy is not current.
   There is no standard way to do this; just restore a backup, or
   shut down the master storage and scp the whole directory.

5. As a first test, run the replica.py command on the replica machine,
   substituting your host and directory names. Within a few seconds it
   should say ``"Replica complete"``.  The -v switch may be helpful if
   there is a problem.

6. That command needs to be run regularly to ensure that the replica
   is kept up to date. cron is a good way to do this.  cron normally
   directs any output into an email. If you are replicating every night
   then this may be what you want, but it would be too much if you are
   replicating every minute. Adding the -q switch will ensure the
   replication process is silent unless there is a problem. Note that
   replication fails, and you will be notified of the failure, if it
   runs while the master storage is in snapshot mode, being packed or
   being backed up.

7. You may also need to adjust your strategy for packing the master
   storage. replica.py uses the normal storage history to determine
   which files need to be replicated, so your packing always needs to
   keep enough history to reach back to the previous replication
   event.

8. Note that replication requires that the only difference between the
   two storages is that the master contains some newer transactions
   that are not present on the replica. If you test your replica by
   starting a storage process, it is prudent to use read-only mode to
   ensure that no transactions are written on the replica during that
   test.
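
The packing constraint in step 7 amounts to a simple comparison between how
much history a pack retains and how far apart replication events are. The
sketch below spells that comparison out. The function name and the ``margin``
parameter are illustrative assumptions; neither corresponds to a
DirectoryStorage setting.

```python
from datetime import timedelta


def pack_keeps_enough_history(history_kept, replication_interval,
                              margin=timedelta(0)):
    """Return True if a pack retains history back to the previous replication.

    history_kept         -- how far back the packed master still has history
    replication_interval -- time between successive replication events
    margin               -- extra allowance (an assumption of this sketch),
                            e.g. to cover a replication run that failed
    """
    return history_kept >= replication_interval + margin


# Nightly replication, packing to keep three days of history.
print(pack_keeps_enough_history(timedelta(days=3), timedelta(days=1)))

# Nightly replication, but packing keeps only the last hour.
print(pack_keeps_enough_history(timedelta(hours=1), timedelta(days=1)))
```

A non-zero margin is worth considering because a replication attempt can
fail, in which case the previous successful replication event is further in
the past than the nominal interval.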
