=======================================================
  Nabu: Installing and Configuring a Publisher Server
=======================================================

.. contents::

   
Introduction
============

This document describes issues in setting up and configuring a Nabu server
handler.


Authentication
==============

The current server requires the clients to authenticate themselves.  That is,
when calling methods on the server handler, a username must be supplied.  This
can be most easily implemented using the HTTP Authentication mechanism well
supported by web servers.


XML-RPC Server
==============

The server handler code is generic, but the client expects to be talking to an
XML-RPC server.  There are various way to implement this, depending on your web
application framework.  The simplest way is probably to create a CGI script to
call the server handler.  There is a convenience method in the server code that
does this.  See code below for the CGI script.


Server Configuration
====================

When creating the server handler, you must decide which extractors you will want
to run after processing.  The server expects a list of pairs, each of which
consisting in the extractor class to instantiate, and an instance of a storage
object used by that extractor to store the information it collects from the
document.


Example
-------

Here is an example CGI script that

#. creates a connection to a data base;
#. finds the current user relying on the HTTP Auth mechanism;
#. creates a per-user source storage in a database;
#. creates and configures the server to run 3 extractors (using the convenience
   function in the server module).

::

  # stdlib imports
  import sys, os
  from os.path import dirname, join
  
  # add the nabu libraries to load path of our CGI script
  root = dirname(dirname(sys.argv[0]))
  sys.path.append(join(root, 'lib', 'python'))
  
  # dbapi import.
  import psycopg2

  # nabu imports
  from nabu import server, sources
  from nabu.extractors import document, link, contact
  
  def main():
      # connect to the PostgreSQL database
      params = {
          'database': 'nabu',
          'user': 'admin',
          'passwd': 'admpassw',
          'host': 'localhost',
      }
      module, conn = psycopg2, psycopg2.connect(**params)

      # make sure we're authenticated
      username = os.environ.get('REMOTE_USER', None)
      if username is None:
          print 'Content-type:', 'text/plain'
          print 'Status: 404 Document Not Found.'
          print
          return
      
      # create a storage space for the uploaded source data
      src_pp = sources.DBSourceStorage(module, conn)
      src = sources.PerUserSourceStorageProxy(src_pp)
  
      # configure this server with whatever transforms we want to apply
      transforms = (
          (document.Extractor, document.Storage(module, conn)),
          (link.Extractor, link.Storage(module, conn)),
          (contact.Extractor, contact.Storage(module, conn)),
          )
  
      handler = server.create_server(src, transforms, username, allow_reset=1)
      server.xmlrpc_handle_cgi(handler)
      
  if __name__ == '__main__':
      main()


Per-user Document Sets
======================

The simple DBSourceStorage uploads database shares the unique ids between all
users.  This means that users can see and erase each other's content ids in the
database.  Clearing the database clears all documents, including other users'.
This is a reasonable arrangment if the data is pushed from a common body of text
files, perhaps controlled with a source-code management system.

There is also a proxy sources storage class provided, which prepends the
usernames to the unique ids stored in the database.  This effectively creates a
separate document set for each user.  This is an interesting setup if users are
expected to contribute disjoint document sets to the pool of information in the
Nabu database.  However, this has important consequences if the users are
pushing documents from a shared body of source texts: those documents will be
stored twice in the database.  You need to choose the policy which matches your
intended use.


Viewing the Contents
====================

To view or debug the contents sent to the database, you can view the contents
via a CGI script that is provided with the source distribution.  This is only
meant as a debugging interface, you should create a custom presentation layer
any other way you like.


