                           pyleaf TUTORIAL
                           ===============


Table of Contents
=================
1 Tutorial ex1.py 
    1.1 Introduction 
    1.2 Loading the project 
    1.3 Producing resources 
    1.4 Ensuring consistency 
    1.5 Export and Publishing 


1 Tutorial ex1.py 
==================

1.1 Introduction 
-----------------
This is a short introductory tutorial meant to give an overview of the
main features of the Leaf system. The tutorial is based on the example
ex1.py included in the pyLeaf package.

1.2 Loading the project 
------------------------

Load the ex1.py source into a Python interactive shell. This can be
done from within the folder containing the file ex1.py with the
following command at a Python prompt:

>>> execfile('ex1.py')

NOTE: the "execfile" function was removed in Python 3. For this
example you can use "from ex1 import *" instead, or
exec(open('ex1.py').read()), which, like execfile, runs the file in
the current namespace.

ex1.py is a very minimal example, performing a sum over many random
numbers in two different ways: with a for loop and with the Python sum
function. The two methods are timed in order to see which one is
faster. Here is the LGL protocol:

         / testFor -> report
genData <
         \ testSum -> @report -> exportRes[F]
;

genData generates random data, which are passed to both testFor and
testSum. Both functions return the amount of time they spent
performing the sum. report compares the two results and sends the
comparison score to exportRes, which in turn saves it to the disk. The
"[F]" flag is optional, but it is useful to mark nodes that produce
files on the disk.

ex1.py also contains a function named "prj" that initializes the
pyleaf project associated with the above pipeline structure (defined
in the same file). The function returns a protocol object and a
project object. The protocol object is the main interface to the
pipeline management functions. The project object is a higher-level
interface that sets up the protocols (a project may include more than
one protocol, although this example has only one). To initialize the
project:

>>> p, pr = prj()
[L] Loading user module.
[L] Initializing protocol with root: leaf_ex1
[L] graph is new: building fingerprint.
[L] report is new: building fingerprint.
[L] genData is new: building fingerprint.
[L] testSum is new: building fingerprint.
[L] testFor is new: building fingerprint.
[L] exportRes is new: building fingerprint.

Lines starting with "[L]" contain Leaf messages. The second line shows
where the Leaf "protocol root" has been created: a sub-directory named
leaf_ex1 in the directory containing ex1.py, which will hold all of
pyLeaf's internal data about the project. The following 6 lines report
that 6 fingerprints have been built, one for each of the 5 modules
(Python functions) included in the project plus one for the pipeline
structure ("graph"). Leaf uses these fingerprints to keep track of
changes. Since the directory leaf_ex1 is pyLeaf's memory, deleting it
resets the status of the project.
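The tutorial does not say how pyLeaf computes its fingerprints. As an
assumption for illustration only, the sketch below hashes each node's
source text with SHA-1 and classifies a node as new, changed or
unchanged; fingerprint and FingerprintRegistry are hypothetical names,
not part of pyLeaf.

```python
import hashlib


def fingerprint(source: str) -> str:
    """Hash a node's source text (SHA-1 is an assumption made for this sketch)."""
    return hashlib.sha1(source.encode("utf-8")).hexdigest()


class FingerprintRegistry:
    """Remember the latest fingerprint of each node (hypothetical helper)."""

    def __init__(self):
        self.known = {}  # node name -> fingerprint

    def check(self, name: str, source: str) -> str:
        """Classify a node as 'new', 'changed' or 'unchanged', then record it."""
        fp = fingerprint(source)
        old = self.known.get(name)
        self.known[name] = fp
        if old is None:
            return "new"
        return "unchanged" if old == fp else "changed"
```

Registering the six objects of ex1 for the first time yields 'new' for
each of them, which corresponds to the "is new: building fingerprint"
messages above.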

1.3 Producing resources 
------------------------

The main Leaf command for building a resource is "provide". It checks
whether a resource has already been produced and, if so, whether it is
already in primary memory or stored on the disk (dumped). If the
resource is not available, provide produces it according to the
pipeline structure. To request production of the resource "testFor"
(testFor is both the name of the node and the name of the resource it
produces), issue the following command:

>>> x = p.provide(testFor)
[L] The following resources will need production: genData, testFor
[L] Running node: genData
[L] Dumping resource: genData
[L] Running node: testFor
[L] Dumping resource: testFor
[L] Done in: 00:00:11.28.

The resource testFor has not been provided before, so it will need
production. According to the protocol, this will also require the
production of genData. Leaf runs the appropriate nodes and dumps the
results. Now consider what happens if the resource testSum is
requested subsequently with the following code:

>>> y = o(testSum)
[L] The following resources will need production: testSum
[L] Running node: testSum
[L] Dumping resource: testSum
[L] Done in: 00:00:2.16.

The "o()" function ("output of") is an alias for "p.provide()". It is
defined within the ex1.py file for convenience. This time Leaf only
needed to produce testSum, since genData was already available.
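The behaviour of provide described above (reuse what is available,
otherwise produce the missing inputs first) amounts to a recursive,
memoized walk over the pipeline. The toy class below, MiniProtocol (a
hypothetical name, not pyLeaf's implementation), keeps resources in
memory only, never on disk, and records which nodes actually ran. Like
provide, it also accepts a list of requests.

```python
class MiniProtocol:
    """Toy, memory-only version of provide (hypothetical, not pyLeaf's code)."""

    def __init__(self, funcs, inputs):
        self.funcs = funcs    # node name -> function producing the resource
        self.inputs = inputs  # node name -> names of its input nodes
        self.memory = {}      # resources already produced
        self.log = []         # nodes actually run, in order

    def provide(self, request):
        # Like p.provide, accept either one node or a list of nodes.
        if isinstance(request, list):
            return [self.provide(name) for name in request]
        if request in self.memory:  # available: no computation needed
            return self.memory[request]
        # Produce missing inputs first, then run the node itself.
        args = [self.provide(dep) for dep in self.inputs.get(request, [])]
        self.log.append(request)
        self.memory[request] = self.funcs[request](*args)
        return self.memory[request]
```

With the ex1 dependencies, requesting "testFor" first runs genData and
testFor; a subsequent request for "testSum" runs only testSum, because
genData is already in memory.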

The node "report" simply computes the ratio x/y. Bypassing Leaf
completely or in part, it can be computed, for example, in the
following two ways:

>>> x/y
5.326954330195621
>>> o(testFor)/o(testSum)
5.326954330195621

Note that the second instruction, like the first one, did not require
any computation. The provide method can also accept a list of requests
and return a list of the corresponding resources, which allows yet
another way of making the same computation:

>>> o([testFor, testSum])[0] / o([testFor, testSum])[1]
5.326954330195621

Finally, Leaf internally stores a time stamp and the computational
time of each node. This information can be retrieved from pyleaf
instead of being measured directly:

>>> p.time(testFor)
('Thu Dec 27 16:45:31 2012', 12.815258979797363)
>>> p.time(testFor)[1] / p.time(testSum)[1]
5.854536618211694
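A time stamp and an elapsed time per node, like the tuple returned by
p.time, can be collected with a decorator. This is a sketch under the
assumption that one record per node name is enough; the timed
decorator and the timings dictionary are hypothetical names, not
pyLeaf internals.

```python
import functools
import time

timings = {}  # node name -> (time stamp, elapsed seconds)


def timed(func):
    """Record when a node ran and how long it took (hypothetical decorator)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        stamp = time.ctime()  # same style as 'Thu Dec 27 16:45:31 2012'
        start = time.perf_counter()
        result = func(*args, **kwargs)
        timings[func.__name__] = (stamp, time.perf_counter() - start)
        return result
    return wrapper
```

After decorating and running testFor and testSum once,
timings['testFor'][1] / timings['testSum'][1] gives the same kind of
ratio as the p.time example, without running anything again.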

If Leaf can't find a resource in memory, it checks the disk for
previous dumps. The following instructions delete testSum from
primary memory, list the status of all resources, and produce the
"report" resource:

>>> p.clear(testSum)
[L] Clearing resource: testSum
>>> p.list()
report   NOT available  NOT dumped
genData  available      dumped
testSum  NOT available  dumped
testFor  available      dumped
exportRes        NOT available  NOT dumped
>>> o(report)
[L] The following resources will be loaded from disk: testSum
[L] The following resources will need production: report
[L] Running node: report
[L] Dumping resource: report
[L] Done in: 00:00:0.06.
5.326954330195621

Notice that the whole request took close to 0 seconds, even though
producing testSum had previously taken about 2 seconds: testSum was
simply loaded from its dump, and only report had to be computed.

All resources can be deleted from memory (p.clearall) and from the
disk (p.undumpall) with the following instructions:

>>> p.clearall()
[L] Clearing resource: report
[L] Clearing resource: genData
[L] Clearing resource: testSum
[L] Clearing resource: testFor
>>> p.undumpall()
[L] Undumping resource: report
[L] Undumping resource: genData
[L] Undumping resource: testSum
[L] Undumping resource: testFor
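The examples above distinguish two places where a resource can live:
primary memory ("available") and the disk ("dumped"). The hypothetical
TwoTierStore class below, which pickles resources into a temporary
directory standing in for leaf_ex1, mimics that distinction: clear
forgets only the in-memory copy, undump deletes only the file, and get
falls back to the dump when memory is empty. It is an illustration,
not pyLeaf's storage code.

```python
import os
import pickle
import tempfile


class TwoTierStore:
    """Resources in primary memory plus pickled dumps on disk (a sketch)."""

    def __init__(self, root=None):
        self.root = root or tempfile.mkdtemp()  # stands in for leaf_ex1/
        self.memory = {}

    def _path(self, name):
        return os.path.join(self.root, name + ".res")

    def dump(self, name, value):
        """Keep the resource in memory and write it to disk."""
        self.memory[name] = value
        with open(self._path(name), "wb") as f:
            pickle.dump(value, f)

    def get(self, name):
        """Return from memory, else load the dump; None if neither exists."""
        if name not in self.memory and os.path.exists(self._path(name)):
            with open(self._path(name), "rb") as f:
                self.memory[name] = pickle.load(f)
        return self.memory.get(name)

    def clear(self, name):
        """Forget the in-memory copy only (compare p.clear)."""
        self.memory.pop(name, None)

    def undump(self, name):
        """Delete the file on disk only (p.undumpall does this for every resource)."""
        if os.path.exists(self._path(name)):
            os.remove(self._path(name))

    def status(self, name):
        """Return the two status columns printed by p.list for one resource."""
        avail = "available" if name in self.memory else "NOT available"
        dumped = "dumped" if os.path.exists(self._path(name)) else "NOT dumped"
        return avail, dumped
```

Dumping testSum, clearing it and then calling get reproduces the
"loaded from disk" case shown with p.list and o(report) above.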

It is also possible to request the production of all leaf nodes of the
pipeline as follows (in this case this is equivalent to
"o(exportRes)", since the protocol has only one leaf node):

>>> p.run()
[L] The following resources will need production: report, genData, testSum, testFor, exportRes
[L] Running node: genData
[L] Dumping resource: genData
[L] The following nodes can run in parallel: testSum, testFor
[L] Running node: testFor
[L] Running node: testSum
[L] Dumping resource: testSum
[L] Dumping resource: testFor
[L] Running node: report
[L] Dumping resource: report
[L] Running node: exportRes
[L] Dumping resource: exportRes
[L] Done in: 00:00:11.82.
'reportOut.txt'

The final output, 'reportOut.txt', is what the leaf node exportRes
returns. Notice the 4th message, signaling that testSum and testFor
will be executed in parallel, which improves performance on multicore
computers. Leaf automatically scans the pipeline to detect nodes that
can run in parallel. How much faster was the computation? The last
Leaf message indicates that the overall computational time was around
12 seconds, which is roughly the time required by testFor alone. Thus,
the time required by testSum (about 2 seconds) was saved thanks to
parallel processing.
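One simple way to find nodes that can run in parallel, assumed here
for illustration (the tutorial does not describe pyLeaf's scheduler),
is to group the nodes into levels: a node joins a level as soon as all
of its inputs belong to earlier levels. The input map EX1_INPUTS is
transcribed from the LGL diagram; parallel_levels is a hypothetical
helper.

```python
# Inputs of each ex1 node, transcribed from the LGL diagram.
EX1_INPUTS = {
    "testFor": ["genData"],
    "testSum": ["genData"],
    "report": ["testFor", "testSum"],
    "exportRes": ["report"],
}


def parallel_levels(inputs):
    """Group nodes into levels whose members do not depend on each other."""
    nodes = set(inputs) | {p for parents in inputs.values() for p in parents}
    done, levels = set(), []
    while nodes:
        # A node is ready once all of its inputs belong to earlier levels.
        ready = sorted(n for n in nodes
                       if all(d in done for d in inputs.get(n, [])))
        if not ready:
            raise ValueError("the pipeline contains a cycle")
        levels.append(ready)
        done.update(ready)
        nodes.difference_update(ready)
    return levels
```

For ex1 the levels are [['genData'], ['testFor', 'testSum'],
['report'], ['exportRes']], so the wall-clock time is roughly the sum,
over levels, of the slowest node in each level, which is why the total
stays close to the time of testFor. For CPU-bound Python nodes, a level
would have to be run with concurrent.futures.ProcessPoolExecutor rather
than with threads to get a real speed-up, because of the global
interpreter lock.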

1.4 Ensuring consistency 
-------------------------

Leaf is aware of the source code used to produce each resource. When
the code changes, pyleaf invalidates all dependent resources. In the
example ex1.py, try changing the code of the function testFor, for
example by replacing the for loop:

    for i in range(0, len(data)):
        x = x + data[i]

with a while loop:

    i = 0
    while i < len(data):
        x = x + data[i]
        i = i + 1

Save the file. The following instruction (this time pr, the project
object, is used) will ask Leaf to look for changes in the source code:

>>> pr.update()
[L] Reloading user module.
[L] testFor has changed: updating.
[L] Resetting resource: testFor
[L] Resetting resource: report
[L] Resetting resource: exportRes

Leaf noticed a change in testFor and "untrusted" the node, i.e. it
cleared and undumped testFor and all its descendants. The same can be
forced by issuing the following instruction:

>>> p.untrust(testFor)
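Untrusting a node means resetting it together with everything
downstream of it. Given the input map of the pipeline, the set of
descendants can be found with a depth-first traversal over the
reversed edges. The descendants function below is a hypothetical
helper, not part of pyLeaf, and EX1_INPUTS is transcribed from the LGL
diagram.

```python
# Inputs of each ex1 node, transcribed from the LGL diagram.
EX1_INPUTS = {
    "testFor": ["genData"],
    "testSum": ["genData"],
    "report": ["testFor", "testSum"],
    "exportRes": ["report"],
}


def descendants(inputs, node):
    """Return every node that depends, directly or indirectly, on node."""
    children = {}
    for child, parents in inputs.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    found, stack = [], [node]
    while stack:  # depth-first walk over the reversed edges
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.append(child)
                stack.append(child)
    return found
```

For testFor it returns ['report', 'exportRes'], the same resources
that are reset in the log above.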

Leaf can also detect changes across sessions. Run the project again:

>>> p.run()
[L] The following resources will need production: report, testFor
[L] Running node: testFor
[L] Dumping resource: testFor
[L] Running node: report
[L] Dumping resource: report
[L] Done in: 00:00:12.60.
4.890758776530981

Close the Python shell and put the for loop back in testFor. Then open
a new Python shell and enter:

>>> execfile('ex1.py')
>>> p, pr = prj()
[L] Loading user module.
[L] Initializing protocol with root: leaf_ex1
[L] graph is dumped in leaf_ex1/graph.grp: loading it.
[L] report is dumped in leaf_ex1/report.res: loading it.
[L] genData is dumped in leaf_ex1/genData.res: loading it.
[L] testSum is dumped in leaf_ex1/testSum.res: loading it.
[L] testFor is dumped in leaf_ex1/testFor.res: loading it.
[L] exportRes is dumped in leaf_ex1/exportRes.res: loading it.
[L] report is dumped in leaf_ex1/report.mod: loading it.
[L] genData is dumped in leaf_ex1/genData.mod: loading it.
[L] testSum is dumped in leaf_ex1/testSum.mod: loading it.
[L] testFor is dumped in leaf_ex1/testFor.mod: loading it.
[L] exportRes is dumped in leaf_ex1/exportRes.mod: loading it.
[L] testFor has changed: updating.
[L] Resetting resource: testFor
[L] Resetting resource: report
[L] Resetting resource: exportRes

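At start-up Leaf loads both the dumped resources (.res files) and the
previously saved source code of each node (nodes are also called
"modules", hence the .mod files). It finds that the source code of
testFor has changed and untrusts testFor and its descendants.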
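Detecting changes across sessions requires the fingerprints to
outlive the Python process. pyLeaf stores one .mod file per node; as a
simplification, this sketch keeps all fingerprints in a single JSON
file and compares them with the current sources at the next start-up.
The file layout, the SHA-1 hashing and the function names are
assumptions made for illustration.

```python
import hashlib
import json
import os


def _fp(source):
    """SHA-1 of a node's source text (an assumption of this sketch)."""
    return hashlib.sha1(source.encode("utf-8")).hexdigest()


def save_fingerprints(path, sources):
    """Persist one fingerprint per node so that a later session can compare."""
    with open(path, "w") as f:
        json.dump({name: _fp(src) for name, src in sources.items()}, f)


def changed_since_last_session(path, sources):
    """Names of known nodes whose current source differs from the saved one."""
    if not os.path.exists(path):
        return []  # first session: every node is new, nothing has changed
    with open(path) as f:
        saved = json.load(f)
    return sorted(name for name, src in sources.items()
                  if name in saved and saved[name] != _fp(src))
```

Saving the fingerprints with the while loop in place and checking
again after restoring the for loop reports testFor as changed, as in
the session above.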

The pipeline structure is monitored as well. To test this, change the
LGL code as follows:

         / report
genData <
         \ testSum -> @report -> exportRes[F]
;

After saving the file, the Leaf system is made aware of the changes by
issuing:

>>> pr.update()
[L] Reloading user module.
[L] Inputs to report have changed, untrusting it.
[L] graph has changed: updating.
[L] Resetting resource: report
[L] Resetting resource: exportRes

Leaf analyzes the new pipeline structure and untrusts (only) the
necessary nodes.
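A change of pipeline structure can be detected by comparing, for each
node still in the pipeline, its set of inputs before and after the
edit. The input maps below are transcribed from the two LGL diagrams;
the function name is hypothetical and the comparison is a sketch, not
pyLeaf's algorithm.

```python
# Inputs of each node before and after the LGL edit shown above.
OLD_INPUTS = {
    "testFor": ["genData"],
    "testSum": ["genData"],
    "report": ["testFor", "testSum"],
    "exportRes": ["report"],
}
NEW_INPUTS = {
    "testSum": ["genData"],
    "report": ["genData", "testSum"],
    "exportRes": ["report"],
}


def nodes_with_changed_inputs(old_inputs, new_inputs):
    """Nodes of the new pipeline whose set of inputs differs from before.

    A node absent from the old pipeline counts as changed, since no
    previous result exists that could be trusted.
    """
    return sorted(n for n in new_inputs
                  if set(old_inputs.get(n, [])) != set(new_inputs[n]))
```

It returns ['report'], matching the message "Inputs to report have
changed"; report's descendant exportRes is then reset as well.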

1.5 Export and Publishing 
--------------------------

There are three main ways of exporting a Leaf protocol: building a
simple pdf representing the pipeline structure; deriving a more
elaborate pdf that uses a different node shape for F-nodes (nodes
producing files on the disk) and includes node documentation; and
publishing a full hypertext protocol with additional information and
node source code. The export and publishing features require Graphviz
([http://www.graphviz.org]).

* Exporting the pipeline structure 
  
  Upon creating a Leaf project (pyleaf.prj object), a DOT file named
  leafprot.lf.dot is written to the project's root directory. DOT is a
  graph description language that Graphviz tools can render as images.
  A visualization of the project's pipeline can be obtained by running
  the following command from a system shell:
  
  $ dot -Tpdf leafprot.lf.dot -o leafprot.pdf
  
* Exporting the protocol's pipeline 
  
  By "protocol's pipeline" we mean the pipeline as included in the
  protocol document produced by the "publish" method (see below). In
  addition to the pipeline structure, it includes documentation
  extracted from the source code and uses a rectangular shape for nodes
  producing files. It can be produced with the following call:
  
  >>> p.export('py1')
  
  
* Publishing the Leaf protocol
  
  Finally, a complete hypertext document reporting the pipeline, some
  statistics about the project, the source code of all nodes and links
  to the produced files can be obtained with:
  
  >>> p.publish('py1')
  
  By default, a directory named "html" is created, containing a file
  "py1.html", which is the final protocol documentation.
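The DOT format used by leafprot.lf.dot is plain text, so a comparable
description of the ex1 pipeline can be generated with a few lines of
Python. This sketch does not reproduce pyLeaf's actual file: the
function name to_dot and the node styling are assumptions, with F-nodes
drawn as boxes to echo the protocol export. Saved to a file, its output
can be rendered with the same dot -Tpdf command used for
leafprot.lf.dot.

```python
# Inputs of each ex1 node, transcribed from the LGL diagram.
EX1_INPUTS = {
    "testFor": ["genData"],
    "testSum": ["genData"],
    "report": ["testFor", "testSum"],
    "exportRes": ["report"],
}


def to_dot(inputs, file_nodes=()):
    """Return a Graphviz DOT description of a pipeline.

    Nodes listed in file_nodes (F-nodes) are drawn as boxes, others as ellipses.
    """
    nodes = set(inputs) | {p for parents in inputs.values() for p in parents}
    lines = ["digraph pipeline {"]
    for n in sorted(nodes):
        shape = "box" if n in file_nodes else "ellipse"
        lines.append(f'  "{n}" [shape={shape}];')
    for child in sorted(inputs):
        for parent in inputs[child]:
            lines.append(f'  "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)
```

For example, writing to_dot(EX1_INPUTS, {"exportRes"}) to a file named
ex1.dot (a hypothetical name) and running dot -Tpdf ex1.dot -o ex1.pdf
draws the five ex1 nodes, with exportRes as a box.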
  
