.. Hey Emacs, this is -*- rst -*-

   This file follows reStructuredText markup syntax; see
   http://docutils.sf.net/rst.html for more information.

.. include:: ../global.inc


.. _gcrypto:

The `gcrypto`:command: script
=============================

GC3Apps provide a script drive execution of multiple ``gnfs-cmd`` jobs
each of them with a different parameter set. Allotogehter they form a
single crypto simulation of a large parameter space.
It uses the generic `gc3libs.cmdline.SessionBasedScript` framework.

The purpose of `gcrypto`:command: is to execute *several concurrent
runs* of ``gnfs-cmd`` on a parameter set. These runs are performed in
parallel using every available GC3Pie :term:`resource`; you can of
course control how many runs should be executed and select what output
files you want from each one.

Introduction
------------

Like in a `for`-loop, the `gcrypto`:command: driver script takes as input
three mandatory arguments:

1. RANGE_START: initial value of the range (e.g., 800000000)
2. RANGE_END: final value of the range (e.g., 1200000000)
3. SLICE: extent of the range that will be examined by a single job (e.g., 1000)

For example::

  # gcrypto 800000000 1200000000 1000
  
will produce 400000 jobs; the first job will perform calculations
on the range 800000000 to 800000000+1000, the 2nd one will do the
range 800001000 to 800002000, and so on.

Inputfile archive location (e.g. *lfc://lfc.smscg.ch/crypto/lacal/input.tgz*)
can be specified with the '-i' option. Otherwise a default filename
'input.tgz' will be searched in current directory.

Job progress is monitored and, when a job is done,
output is retrieved back to submitting host in folders named:
``RANGE_START + (SLICE * ACTUAL_STEP)``
Where ``ACTUAL_STEP`` correspond to the position of the job in the
overall execution.

The `gcrypto`:command: command keeps a record of jobs (submitted, executed and
pending) in a session file (set name with the '-s' option); at each
invocation of the command, the status of all recorded jobs is updated,
output from finished jobs is collected, and a summary table of all
known jobs is printed.  New jobs are added to the session if new input
files are added to the command line.

Options can specify a maximum number of jobs that should be in
'SUBMITTED' or 'RUNNING' state; `gcrypto`:command: will delay submission
of newly-created jobs so that this limit is never exceeded.

The `gcrypto`:command: execute *several runs* of ``gnfs-cmd`` on a
parameter set, and collect the generated output.  These runs are
performed in parallel, up to a limit that can be configured with the
``-J`` `command-line option`:term:.  You can of course control how
many runs should be executed and select what output files you want
from each one.

In more detail, `gcrypto`:command: does the following:
   
1. Reads the `session`:term: (specified on the command line with the
   ``--session`` option) and loads all stored jobs into memory.
   If the session directory does not exist, one will be created with
   empty contents.
   
2. Divide the initial parameter range, given in the command-line,
   into `chunks`:term: taking the ``-J`` value as a reference. So
   from a coomand line argument like the following:
   ::
      $ gcrypto  800000000 1200000000 1000 -J 200
   
   `gcrypto`:command: will generate an initial `chunks`:term: of 200 jobs
   starting from the initial range 800000000 incrementing of 1000.
   All jobs will run ``gnfs-cmd`` on a specific parameter set
   (e.g. 800000000, 800001000, 800002000, ...). `gcrypto`:command: will keep
   constant the number of simulatenous jobs running retrieving
   those terminated and submitting new ones untill the whole
   parameter range has been computed.

3. Updates the state of all existing jobs, collects output from
   finished jobs, and submits new jobs generated in step 2.

   Finally, a summary table of all known jobs is printed.  (To control
   the amount of printed information, see the ``-l`` command-line
   option in the `session-based script`:ref: section.)

4. If the ``-C`` command-line option was given (see below), waits
   the specified amount of seconds, and then goes back to step 3.

   The program `gcrypto`:command: exits when all jobs have run to
   completion, i.e., when the whole paramenter range has been
   computed.

Execution can be interrupted at any time by pressing :kbd:`Ctrl+C`.
If the execution has been interrupted, it can be resumed at a later
stage by calling `gcrypto`:command: with exactly the same
command-line options.

`gcrypto`:command: requires a number of default input files common to every
submited job. This list of input files is automatically fetched by
`gcrypto`:command: from a default storage repository.
Those files are: 

::

      gnfs-lasieve6
      M1019
      M1019.st
      M1037
      M1037.st
      M1051
      M1051.st
      M1067
      M1067.st
      M1069
      M1069.st
      M1093
      M1093.st
      M1109
      M1109.st
      M1117
      M1117.st
      M1123
      M1123.st
      M1147
      M1147.st
      M1171
      M1171.st
      M8e_1200
      M8e_1500
      M8e_200
      M8e_2000
      M8e_2500
      M8e_300
      M8e_3000
      M8e_400
      M8e_4200
      M8e_600
      M8e_800
      tdsievemt

When `gcrypto`:command: has to be executed with a different set of input
files, an additional command line argument ``--input-files`` could be
used to specify the locatin of a `tar.gz`:ref: archive containing the
input files that ``gnfs-cmd`` will expect. Similarly, when a different
version of gnfs-cmd command needs to be used, the command line
argument ``--gnfs-cmd`` could be used to specify the location of the
``gnfs-cmd`` to be used.
   

Command-line invocation of `gcrypto`:command:
----------------------------------------------

The `gcrypto`:command: script is based on GC3Pie's `session-based
script <session-based script>`:ref: model; please read also the
`session-based script`:ref: section for an introduction to sessions
and generic command-line options.

A `gcrypto`:command: command-line is constructed as follows:
Like a `for`-loop, the `gcrypto`:command: driver script takes as input
three mandatory arguments:

1. *RANGE_START*: initial value of the range (e.g., 800000000)
2. *RANGE_END*: final value of the range (e.g., 1200000000)
3. *SLICE*: extent of the range that will be examined by a single job (e.g., 1000)

**Example 1.** The following command-line invocation uses
`gcrypto`:command: to run ``gnfs-cmd`` on the parameter set ranging from
800000000 to 1200000000 with an increment of 1000.
::

   $ gcrypto 800000000 1200000000 1000

In this case `gcrypto`:command: will use the default values for
determine the `chunks`:term: size from the default value of the ``-J``
option (default value is 50 simulatenous jobs).

**Example 2.**
::
   
   $ gcrypto --session SAMPLE_SESSION -c 4 -w 4 -m 8 800000000 1200000000 1000

In this example, job information is stored into session
``SAMPLE_SESSION`` (see the documentation of the ``--session`` option
in `session-based script`:ref:).  The command above creates the jobs,
submits them, and finally prints the following status report::

  Status of jobs in the 'SAMPLE_SESSION' session: (at 10:53:46, 02/28/12)
  NEW   0/50    (0.0%)  
  RUNNING   0/50    (0.0%)  
  STOPPED   0/50    (0.0%)  
  SUBMITTED   50/50   (100.0%) 
  TERMINATED   0/50    (0.0%)  
  TERMINATING   0/50    (0.0%)  
  total   50/50   (100.0%) 

Note that the status report counts the number of *jobs in the
session*, not the total number of jobs that would correspond to the
whole parameter range. (Feel free to report this as a bug.)

Calling `gcrypto`:command: over and over again will result in the same jobs
being monitored; 

The ``-C`` option tells `gcrypto`:command: to continue running until
all jobs have finished running and the output files have been
correctly retrieved.  On successful completion, the command given in
*example 2.* above, would print::

  Status of jobs in the 'SAMPLE_SESSION' session: (at 11:05:50, 02/28/12)
  NEW   0/400k    (0.0%)  
  RUNNING   0/400k    (0.0%)  
  STOPPED   0/400k    (0.0%)  
  SUBMITTED   0/400k    (0.0%)  
  TERMINATED   50/400k   (100.0%) 
  TERMINATING   0/400k    (0.0%)  
  ok   400k/400k   (100.0%) 
  total   400k/400k   (100.0%) 

Each job will be named after the parameter range it has computed (e.g.
800001000, 800002000, ... ) (you could see this by passing
the ``-l`` option to `gcrypto`:command:); each of these jobs will
create an output directory named after the job.

For each job, the set of output files is automatically retrieved and
placed in the locations described below.  

Output files for `gcrypto`:command:
------------------------------------

Upon successful completion, the output directory of each
`gcrypto`:command: job contains:

* a number of `.tgz`:term: files each of them correspondin to a step
  within the execution of the ``gnfs-cmd`` command.
* A log file named `gcrypto.log`:term: containing both the
  `stdout`:term: and the `stderr`:term: of the ``gnfs-cmd``
  execution.

.. note::
   The number of `.tgz`:term: files may depend on whether the
   execution of the ``gnfs-cmd`` command has completed or not
   (e.g. jobs may be killed by the batch system when exausting
   requested resources)

Example usage
-------------

This section contains commented example sessions with
`gcrypto`:command:.

Manage a set of jobs from start to end
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

*In typical operation,* one calls `gcrypto`:command: with the ``-C``
option and lets it manage a set of jobs until completion.

So, to compute a whole parameter range from 800000000 to 1200000000
with an increment of 1000, submitting 200 jobs simultaneously each of
them requesting 4 computing cores, 8GB of memory and 4 hours of
`wall-clock time <walltime>`:term:, one can use the following
command-line invocation::
  
  $ gcrypto -s example -C 120 -J 200 -c 4 -w 4 -m 8 800000000 1200000000 1000

The ``-s example`` option tells `gcrypto`:command: to store
information about the computational jobs in the ``example.jobs``
directory. 

The ``-C 120`` option tells `gcrypto`:command: to update job state
every 120 seconds; output from finished jobs is retrieved and new jobs
are submitted at the same interval.

The above command will start by printing a status report like the
following:: 

Status of jobs in the 'example.csv' session:
SUBMITTED   1/1 (100.0%)

It will continue printing an updated status report every 120 seconds
until the requested parameter range has been computed.

In GC3Pie terminology when a job is finished and its output has been
successfully retrieved, the job is marked as ``TERMINATED``::

  Status of jobs in the 'example.csv' session:
  TERMINATED   1/1 (100.0%)


Using GC3Pie utilities
----------------------

GC3Pie comes with a set of generic utilities that could be used as a
complemet to the `gcrypto`:command: command to better manage a entire
session execution.

:command:`gkill`: cancel a running job
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To cancel a running job, you can use the command :command:`gkill`.  For
instance, to cancel *job.16*, you would type the following command
into the terminal::

  gkill job.16

or::

  gkill -s example job.16


gkill could also be used to cancel jobs in  a given state
::

  gkill -s example -l UNKNOWN

.. warning:: 

   *There's no way to undo a cancel operation!* Once you have issued a
   :command:`gkill` command, the job is deleted and it cannot be resumed.
   (You can still re-submit it with :command:`gresub`, though.)


:command:`ginfo`: accessing low-level details of a job
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is sometimes necessary, for debugging purposes, to print out all
the details about a job; the :command:`ginfo` command does just that: prints
all the details that GC3Utils know about a single job.

For instance, to print out detailed information about `job.13` in
session `example`, you would type
::

    ginfo -s example job.13

For a job in ``RUNNING`` or ``SUBMITTED`` state, only little
information is known: basically, where the job is running, and when it
was started::


If you omit the job number, information about *all* jobs in the
session will be printed.

Most of the output is only useful if you are familiar with GC3Utils
inner working. Nonetheless, :command:`ginfo` output is definitely something
you should include in any report about a misbehaving job!

For a finished job, the information is more complete and can include
error messages in case the job has failed
::

    $ ginfo -s example job.13
    job.13
        cores: 2
	execution_targets: hera.wsl.ch
	log: 
            SUBMITTED at Tue May 15 09:52:05 2012
	    Submitted to 'wsl' at Tue May 15 09:52:05 2012
	    RUNNING at Tue May 15 10:07:39 2012
	lrms_jobid: gsiftp://hera.wsl.ch:2811/jobs/116613370683251353308673
	lrms_jobname: LACAL_800001000
	original_exitcode: -1
	queue: smscg.q
	resource_name: wsl
	state_last_changed: 1337069259.18
	stderr_filename: gcrypto.log
	stdout_filename: gcrypto.log
	timestamp: 
	    RUNNING: 1337069259.18
	    SUBMITTED: 1337068325.26
	unknown_iteration: 0
	used_cputime: 1380
	used_memory: 3382706

With option ``-v``, :command:`ginfo` output is even more verbose and complete,
and includes information about the application itself, the input and
output files, plus some backend-specific information
::

  $ ginfo -c -s example job.13
  job.13
    arguments: 800000800, 100, 2, input.tgz
    changed: False
    environment: 
    executable: gnfs-cmd
    executables: gnfs-cmd
    execution: 
        _arc0_state_last_checked: 1337069259.18
        _exitcode: 0
        _signal: None
        _state: TERMINATED
        cores: 2
        download_dir: /data/crypto/results/example.out/8000001000
        execution_targets: hera.wsl.ch
        log:
            SUBMITTED at Tue May 15 09:52:04 2012
            Submitted to 'wsl' at Tue May 15 09:52:04 2012
            TERMINATING at Tue May 15 10:07:39 2012
            Final output downloaded to '/data/crypto/results/example.out/8000001000'
            TERMINATED at Tue May 15 10:07:43 2012
        lrms_jobid: gsiftp://hera.wsl.ch:2811/jobs/11441337068324584585032
        lrms_jobname: LACAL_800001000
        original_exitcode: 0
        queue: smscg.q
        resource_name: wsl
        state_last_changed: 1337069263.13
        stderr_filename: gcrypto.log
        stdout_filename: gcrypto.log
        timestamp:
            SUBMITTED: 1337068324.87
            TERMINATED: 1337069263.13
            TERMINATING: 1337069259.18
        unknown_iteration: 0
        used_cputime: 360
        used_memory: 3366977
        used_walltime: 300
    inputs: 
        srm://dpm.lhep.unibe.ch/dpm/lhep.unibe.ch/home/crypto/gnfs-cmd_20120406: gnfs-cmd
        srm://dpm.lhep.unibe.ch/dpm/lhep.unibe.ch/home/crypto/lacal_input_files.tgz: input.tgz
    jobname: LACAL_800000900
    join: True
    output_base_url: None
    output_dir: /data/crypto/results/example.out/8000001000
    outputs: 
        @output.list: file, , @output.list, None, None, None, None
        gcrypto.log: file, , gcrypto.log, None, None, None, None
    persistent_id: job.1698503
    requested_architecture: x86_64
    requested_cores: 2
    requested_memory: 4
    requested_walltime: 4
    stderr: None
    stdin: None
    stdout: gcrypto.log
    tags: APPS/CRYPTO/LACAL-1.0

.. References

.. _grid: http://www.smscg.ch
