================================
Parsing regular path expressions
================================

Imporing the QueryParser
========================

You can import the QueryParser as follow:

    >>> from gocept.objectquery.pathexpressions import RPEQueryParser

Get a QueryParer:

    >>> p = RPEQueryParser()


Query for elements
==================

The Parser returns a data structure, which allows to recursively call the
Join-Functions with two parameters. Here a simple example, where we join the
element *foo* with the element *bar*:

    >>> p.parse('foo/bar')
    ['EEJOIN', ('ELEM', 'foo'), ('ELEM', 'bar')]

It is also possible, to find only root elements:

    >>> p.parse('/foo')
    ['EEJOIN', None, ('ELEM', 'foo')]

But trailing path separators are **not** allowed:

    >>> p.parse('foo/')
    Traceback (most recent call last):
    ...
    SyntaxError: ...

More Pathjoins work like a tree, where the leaves are the elements:

    >>> p.parse('foo/bar/bou')
    ['EEJOIN', ['EEJOIN', ('ELEM', 'foo'), ('ELEM', 'bar')], ('ELEM', 'bou')]


Elements may contains alphanumeric characters, the underscore `_` or a dot `.`:

    >>> p.parse('/foo_bar')
    ['EEJOIN', None, ('ELEM', 'foo_bar')]

    >>> p.parse('/foo.bar')
    ['EEJOIN', None, ('ELEM', 'foo.bar')]

    >>> p.parse('/foo&bar')
    Traceback (most recent call last):
    ...
    SyntaxError: ...


Query for occurrences
=====================

Occurrence works the same way: it's an KCJOIN with the occurrence parameter
(+,?,*) and an element:

    >>> p.parse('/foo?')
    ['EEJOIN', None, ['KCJOIN', '?', ('ELEM', 'foo')]]

    >>> p.parse('foo*/bar')
    ['EEJOIN', ['KCJOIN', '*', ('ELEM', 'foo')], ('ELEM', 'bar')]

A special case is the combination of the all-operator with a wildcard. It
matches for any number of some kind of element (which means in this special
case *give me all bar elements from anywhere in the graph*:

    >>> p.parse('/_*/bar')
    ['EEJOIN', None, '/_*/', ('ELEM', 'bar')]


Query for attribute values
==========================

Attributes are parsed the same way: an ATTR-Function with ID and VALUE as
parameters:

    >>> p.parse('foo/bar*[@a="b"]')
    ['EEJOIN', ('ELEM', 'foo'), ['EAJOIN', ['ATTR', ('a', 'b', '=')], ['KCJOIN', '*', ('ELEM', 'bar')]]]

Attribute vales may also be empty:

    >>> p.parse('/foo[@A=""]')
    ['EEJOIN', None, ['EAJOIN', ['ATTR', ('A', '', '=')], ('ELEM', 'foo')]]

Possible compare operators are =, ==, <, >, <=, >= and !=:

    >>> p.parse('/foo[@a=="b"]')
    ['EEJOIN', None, ['EAJOIN', ['ATTR', ('a', 'b', '==')], ('ELEM', 'foo')]]

    >>> p.parse('/foo[@a>"b"]')
    ['EEJOIN', None, ['EAJOIN', ['ATTR', ('a', 'b', '>')], ('ELEM', 'foo')]]

    >>> p.parse('/foo[@a<"b"]')
    ['EEJOIN', None, ['EAJOIN', ['ATTR', ('a', 'b', '<')], ('ELEM', 'foo')]]

    >>> p.parse('/foo[@a>="b"]')
    ['EEJOIN', None, ['EAJOIN', ['ATTR', ('a', 'b', '>=')], ('ELEM', 'foo')]]

    >>> p.parse('/foo[@a<="b"]')
    ['EEJOIN', None, ['EAJOIN', ['ATTR', ('a', 'b', '<=')], ('ELEM', 'foo')]]

    >>> p.parse('/foo[@a!="b"]')
    ['EEJOIN', None, ['EAJOIN', ['ATTR', ('a', 'b', '!=')], ('ELEM', 'foo')]]

Possible chars inside a string argument are alphanumeric, dot, unterline,
minus, plus and comma:

    >>> p.parse('SingleSource/bla[@reverse_name="a1._-+,"]')
    ['EEJOIN', ('ELEM', 'SingleSource'), ['EAJOIN', ['ATTR', ('reverse_name', 'a1._-+,', '=')], ('ELEM', 'bla')]]

Querying for integer or float values is also possible:

    >>> p.parse('/foo[@a=12]')
    ['EEJOIN', None, ['EAJOIN', ['ATTR', ('a', 12, '=')], ('ELEM', 'foo')]]

    >>> p.parse('/foo[@a=12.0]')
    ['EEJOIN', None, ['EAJOIN', ['ATTR', ('a', 12.0, '=')], ('ELEM', 'foo')]]

But there is a rounding problem with floats (TODO):

    >>> p.parse('/foo[@a=12.7]')
    ['EEJOIN', None, ['EAJOIN', ['ATTR', ('a', 12.699999999999999, '=')], ('ELEM', 'foo')]]


Further queries
===============

To show the mightiness of the parser, here a more complex example:

    >>> p.parse('/foo[@a!="b"]/_*/bar+[@a2="b2 b3"]/fou')
    ['EEJOIN', ['EEJOIN', ['EEJOIN', None, ['EAJOIN', ['ATTR', ('a', 'b', '!=')], ('ELEM', 'foo')]], '/_*/', ['EAJOIN', ['ATTR', ('a2', 'b2 b3', '=')], ['KCJOIN', '+', ('ELEM', 'bar')]]], ('ELEM', 'fou')]

You may also provide precedence by using round brackets and UNIONs. Here a
simple example:

    >>> p.parse('(foo/bar)|boo')
    ['UNION', ['PREC', ['EEJOIN', ('ELEM', 'foo'), ('ELEM', 'bar')]], ('ELEM', 'boo')]

And here the same example with changed precedence:

    >>> p.parse('foo|(bar/boo)')
    ['UNION', ('ELEM', 'foo'), ['PREC', ['EEJOIN', ('ELEM', 'bar'), ('ELEM', 'boo')]]]

A Union of two EEJOINS

    >>> p.parse('(foo/bar)|(boo/far)')
    ['UNION', ['PREC', ['EEJOIN', ('ELEM', 'foo'), ('ELEM', 'bar')]], ['PREC', ['EEJOIN', ('ELEM', 'boo'), ('ELEM', 'far')]]]

A really big example with everything possible included:

    >>> p.parse('(E1/E2)*/E3/((E4[@A="v"])|(E5/_*/E6))')
    ['EEJOIN', ['EEJOIN', ['KCJOIN', '*', ['PREC', ['EEJOIN', ('ELEM', 'E1'), ('ELEM', 'E2')]]], ('ELEM', 'E3')], ['PREC', ['UNION', ['PREC', ['EAJOIN', ['ATTR', ('A', 'v', '=')], ('ELEM', 'E4')]], ['PREC', ['EEJOIN', ('ELEM', 'E5'), '/_*/', ('ELEM', 'E6')]]]]]


Some queries with a wrong syntax
================================

Empty querys are not allowed:

    >>> p.parse('')
    Traceback (most recent call last):
    ...
    SyntaxError: ...

Do not use only a path separator:

    >>> p.parse('/')
    Traceback (most recent call last):
    ...
    SyntaxError: ...

Double path separators are also not allowed:

    >>> p.parse('//foo')
    Traceback (most recent call last):
    ...
    SyntaxError: ...

Another example of a non-complete query:

    >>> p.parse('foo[a="v"]')
    Traceback (most recent call last):
    ...
    SyntaxError: ...

Opening brackets must be closed:

    >>> p.parse('((abc/def)|(ghi/jkl)')
    Traceback (most recent call last):
    ...
    SyntaxError: ...

Union needs two parameters at all:

    >>> p.parse('(foo)|')
    Traceback (most recent call last):
    ...
    SyntaxError: ...

Empty brackets are not allowed:

    >>> p.parse('()')
    Traceback (most recent call last):
    ...
    SyntaxError: ...

Quoted strings in attribute values must be quoted correctly:

    >>> p.parse('foo/bar[@a="b\"c de"]')
    Traceback (most recent call last):
    ...
    SyntaxError: ...

    >>> p.parse('foo/bar[@a="b"c de"]')
    Traceback (most recent call last):
    ...
    SyntaxError: ...
