===================================================
 The Problem is Binding URL Paths to Code and Data
===================================================

:Author: Martin Blais <blais@furius.ca>
:Date: 2005-04-24
:Abstract:

  Note about my understanding of RESTfulness and motivation.

Introduction
------------

Today I have realized something important about web applications while designing
a security mechanism for my framework.  I think I have understood the real value
of making a system RESTful.  Even after reading the design of Zope3, which is
making more and more sense now, I had not quite grasped the importance of this.
This document describes my understanding of the issue, with a specific example
that should motivate the desire for RESTfulness.


Stateful Design
---------------

The way my system is currently designed ties the notion of "current user" with
the meaning of a URL path on my server.  For example, when I access the
"/profile" URL, the results are different depending on which is the currently
logged in user.

An important problem with this approach lies in the definition of a flexible
system for permissions.  If, for example, I want to define a permission group
for an administrator to be able to access any user's /profile page, I would have
to create some special codes for the administrator page to access that
functionality.

It is obvious that a better system would be for each URL to represent a unique
resource, and for a permission system to simply restrict which resources are
available depending on the credentials of the currently logged in user.  That
is, to make the system RESTful.


RESTfulness is important
------------------------

REST stands for REpresentational State Tranfer.  The basic idea is that each URL
uniquely represents a resource in the system.  Basically, given a specific URL
(including its GET or POST arguments), you should obtain the same resource,
and/or trigger a specific state transfer that does not depend on
context/session.

You want to be able to separate authentication from session data.  Basically,
the less you can do with session-specific data, the better.


A Specific Example
------------------

In my case, the way I should serve user-specific pages should be::

   /users/<username>/profile

This way, even if a user other than <username> is logged in, and has permissions
to access this page, he can.  This is how administrator users should be able to
access user profiles.  They use the very same URLs/codes, and their credentials
allow them access to those resources.  Users != <username> simply don't have the
credentials to access those pages, but technically, if they could, the resource
URL would be the same.

This allows the creation of very flexible access policies.  These policies could
be created either by writing special code to accept or reject access to specific
resources, or by using a special regexp syntax (a special case of the former).

Another interesting consequence of defining access policies this way is that we
can decouple the implementation of the resources with the implementation of the
access mechanisms.  This makes the system much more flexible.


Requests are Function Calls
---------------------------

Essentially, each request maps onto a function call within our system.  Let's
examine the anatomy of a function call, as it is often represented in most
computer languages, in this case we will choose Python::

   def foo( bar1, bar2, bar3=def1, bar4=def2, *kwds )

A function has:

* a name, ``foo``
* some fixed number of formal parameters, ``bar1`` and ``bar2``
* some optional parameters, ``bar3`` and ``bar4``
* some extra unnamed parameters, ``*kwds``
* a return value (which can be packed as a tuple)

The difference between systems which implement the various RPC protocols and a
URL is simple: a URL is simply an encoding of the function call parameters.

How we map URLs to these functions/arguments is a centrally important feature of
a web framework.  

Any combination of the informations above, availalble in the URL itself, or the
POST values, are candidates for mapping into a function and into arguments.
Also, parts of the URL path can be used as arguments, and arguments could map
into a function name or influence it.


Objects
-------

If we think of "users" as objects in our system, we can think of including their
reference in the path itself, e.g.::

   /users/<username>/edit

This maps naturally on an hierarchical data structure.  This is the approach
that Zope takes: "folders" are containers for objects, which map directly onto
the ZoDB, which have methods associated to them (via the ugly ZCML), to perform
a specific method call on these objects.


Databases and Flexibility
-------------------------

Unfortunately, most efficient and well-tested database systems today exist in
the form of flat SQL databases, with indexes between their tables.  This is a
reality that must be dealt with, and requiring the user to use a specific kind
of storage is not a very flexible approach.  Often an interface needs to be
build around existing data storage systems.

We would like to design a more flexible system of mapping URLs to resources.


Notes
-----

In draco, currently::

  def foo( self, path, args ):

'path' is NEVER used in my web application. I don't see it becoming used in the
future.


Mapping URLs to Resources
-------------------------

One problem that occurs is when you want to include part of the parameters in
the request path itself.  For example, considering that we would want to make
our system RESTful, a natural approach would be to prefer::

   /users/<username>/profile

to ::

   /profile?user=username

So the question is, how do we map URLs to function calls?  We need to create a
scheme to be able to do that efficiently and flexibly.

Also note the possibility of mixed schemes::

   /users/<username>/profile?lang=<lang>


Global Arguments
~~~~~~~~~~~~~~~~

During the lookup, we should be able to recursively (?) process and remove some
input arguments.  For example, if we are to make the system purely RESTful, even
with regards to the current language being used, *all* the pages on a website
should be able to accept the 'lang' argument.  The effect of this argument is to
set a global variable for all string lookups, by changing the very definition of
the _() gettext function itself.  None of the functions called will want to
process the 'lang' input variable.


Kinds of Input
~~~~~~~~~~~~~~

Let's examine what we get as inputs to our system:

- a URL path, with /-separated components;
- query arguments (GET form)
- POST variables (POST form)

Our scheme must consider all of these when mapping to a specific function call.


Abstracting the Internal Organization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Another aspect of the mapping is that we may want to hide the internal
organization of our program codes.  Also, by providing a more flexible approach
we may open the door to future reorganization of the code, providing space for
growth as well as backwards compatibility.


Serving Files
~~~~~~~~~~~~~

How do we serve files?  We should be able to serve files normally.



Automated Argument Parsing
~~~~~~~~~~~~~~~~~~~~~~~~~~

Wouldn't it be great if the argument parsing was performed automatically?

For example, each resource could be represented as a class, with the following
interface::

   class Profile:

       form = ... <form definition>

       def validate( self, args ):
           ... returns an error, redirect, or success

       def handle( self, arg1, arg2, arg3 ):
           ... at this point we can expect the arguments to be legal


Possible Solutions
------------------

Recursive tree lookup
~~~~~~~~~~~~~~~~~~~~~

We could create a tree of resources, like is done in Twisted, where the process
of parsing the URL becomes a chain-of-responsibility by searching down the tree.

.. line-block::

  + we can alter per-component behaviour of the lookup
  - the hierarchy of URL components must match the hierarchy of objects/code

  ? we cannot combine query arguments in the code lookup, well, we could, if
    we provide them in the lookup function, e.g.

       def getChild( prepath, postpath, query_args )


Regular expressions
~~~~~~~~~~~~~~~~~~~

We could create a special-purpose set of rules to lookup code by using regular
expressions on the request URL combined with conditions on the query arguments.

.. line-block::

  + very flexible, we can combine query arguments in the code lookup
  + the hierarchy of components is entirely disconnected from the code
  - it might be a little slow due to the linear list of regular expressions to
    search

  
  



Privileges Mechanism
--------------------

Can we do like in Zope3? e.g.

  role.grant(resource)



Questions
---------

:Question: How do we identity a permission?  In other words, how do we bind a
	   permission to a resource?

:Question: How do we specify permissions for a particular user?

:Question: With the resource tree mapping URL/query to a function, how do we
	   simply serve files?



