
run-job-flow
************


DESCRIPTION
===========

run-job-flow creates and starts a new job flow. The job flow runs the
steps specified. Once the job flow completes, the cluster is stopped
and the HDFS partition is lost. To prevent loss of data, configure the
last step of the job flow to store results in Amazon S3. If the
JobFlowInstancesConfig "KeepJobFlowAliveWhenNoSteps" parameter is set
to "TRUE", the job flow transitions to the WAITING state rather than
shutting down once the steps have completed.

For additional protection, you can set the JobFlowInstancesConfig
"TerminationProtected" parameter to "TRUE" to lock the job flow and
prevent it from being terminated by API call, user intervention, or in
the event of a job flow error.

A maximum of 256 steps are allowed in each job flow.

If your job flow is long-running (such as a Hive data warehouse) or
complex, you may require more than 256 steps to process your data. You
can work around the 256-step limit in various ways, including using
SSH to connect to the master node and submitting queries directly to
the software running on the master node, such as Hive and Hadoop. For
more information, see Add More than 256 Steps to a Job Flow in the
*Amazon Elastic MapReduce Developer's Guide*.

For long-running job flows, we recommend that you periodically store
your results.


SYNOPSIS
========

   aws emr run-job-flow
     --name <value>
     --instances <value>
     [--additional-info <value>]
     [--bootstrap-actions <value>]
     [--job-flow-role <value>]
     [--steps <value>]
     [--visible-to-all-users]
     [--log-uri <value>]
     [--ami-version <value>]
     [--supported-products <value>]
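
As a rough illustration of the synopsis, the following sketch assembles a
call with only the two required parameters. The job flow name, instance
count, and "m1.small" instance types are assumptions chosen for the
example, and "dry_run" is a hypothetical helper that prints the arguments
rather than contacting AWS. Setting "keep_job_flow_alive_when_no_steps" to
true corresponds to the WAITING behavior described above.

```shell
#!/bin/sh
# Dry-run sketch: prints the arguments instead of contacting AWS.
# The job flow name, instance count, and instance types are assumptions.

# Hypothetical helper: print each argument on its own line so that
# argument boundaries stay visible.
dry_run() {
  printf '%s\n' "$@"
}

# JSON for --instances, using keys documented under REQUIRED PARAMETERS.
INSTANCES='{
  "instance_count": 3,
  "master_instance_type": "m1.small",
  "slave_instance_type": "m1.small",
  "keep_job_flow_alive_when_no_steps": true,
  "termination_protected": true
}'

# Remove the dry_run prefix to submit the job flow for real.
dry_run aws emr run-job-flow \
  --name "example-job-flow" \
  --instances "$INSTANCES"
```

Quoting the JSON in single quotes keeps it a single shell argument, so the
embedded double quotes reach the CLI unchanged.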


REQUIRED PARAMETERS
===================

"--name"  (string)
   The name of the job flow.

"--instances"  (structure)
   A specification of the number and type of Amazon EC2 instances on
   which to run the job flow.

   "instance_count"  (integer)
      The number of Amazon EC2 instances used to execute the job flow.

   "placement"  (structure)
      Specifies the Availability Zone the job flow will run in.

      "availability_zone"  (string)
         The Amazon EC2 Availability Zone for the job flow.

   "termination_protected"  (boolean)
      Specifies whether to lock the job flow to prevent the Amazon EC2
      instances from being terminated by API call, user intervention,
      or in the event of a job flow error.

   "instance_groups"  (list)
      Configuration for the job flow's instance groups.

         (structure)
            Configuration defining a new instance group.

            "instance_count"  (integer)
               Target number of instances for the instance group.

            "name"  (string)
               Friendly name given to the instance group.

            "instance_role"  (string)
               The role of the instance group in the cluster.

            "bid_price"  (string)
               Bid price for each Amazon EC2 instance in the instance
               group when launching nodes as Spot Instances, expressed
               in USD.

            "instance_type"  (string)
               The Amazon EC2 instance type for all instances in the
               instance group.

            "market"  (string)
               Market type of the Amazon EC2 instances used to create
               a cluster node.

   "master_instance_type"  (string)
      The EC2 instance type of the master node.

   "hadoop_version"  (string)
      Specifies the Hadoop version for the job flow. Valid inputs are
      "0.18", "0.20", or "0.20.205". If you do not set this value, the
      default of 0.18 is used, unless the --ami-version parameter is
      set in the run-job-flow call, in which case the default version
      of Hadoop for that AMI version is used.

   "ec_2_subnet_id"  (string)
      To launch the job flow in Amazon Virtual Private Cloud (Amazon
      VPC), set this parameter to the identifier of the Amazon VPC
      subnet where you want the job flow to launch. If you do not
      specify this value, the job flow is launched in the normal
      Amazon Web Services cloud, outside of an Amazon VPC.

      Amazon VPC currently does not support cluster compute quadruple
      extra large (cc1.4xlarge) instances. Thus, you cannot specify
      the cc1.4xlarge instance type for nodes of a job flow launched
      in an Amazon VPC.

   "keep_job_flow_alive_when_no_steps"  (boolean)
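      Specifies whether the job flow stays alive after completing all
      steps. If set to true, the job flow transitions to the WAITING
      state rather than shutting down once the steps have completed.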

   "slave_instance_type"  (string)
      The EC2 instance type of the slave nodes.

   "ec_2_key_name"  (string)
      Specifies the name of the Amazon EC2 key pair that can be used
      to connect over SSH to the master node as the user "hadoop".

   *Parameter Syntax*

      {
        "instance_count": integer,
        "placement": {
          "availability_zone": "string"
        },
        "termination_protected": true|false,
        "instance_groups":
          [
            {
              "instance_count": integer,
              "name": "string",
              "instance_role": "MASTER"|"CORE"|"TASK",
              "bid_price": "string",
              "instance_type": "string",
              "market": "ON_DEMAND"|"SPOT"
            }
            ...
          ],
        "master_instance_type": "string",
        "hadoop_version": "string",
        "ec_2_subnet_id": "string",
        "keep_job_flow_alive_when_no_steps": true|false,
        "slave_instance_type": "string",
        "ec_2_key_name": "string"
      }
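
To make the nested structure more concrete, here is a sketch of an
"--instances" value that describes the cluster through "instance_groups",
with one Spot task group. The group names, instance types, counts, bid
price, and Availability Zone are illustrative assumptions. The sketch also
assumes that the instance group list is given in place of the
"instance_count", "master_instance_type", and "slave_instance_type" keys.
"dry_run" is a hypothetical helper that only prints the arguments.

```shell
#!/bin/sh
# Dry-run sketch: prints the arguments instead of contacting AWS.
# Group names, instance types, counts, bid price, and zone are assumptions.

# Hypothetical helper: print each argument on its own line.
dry_run() {
  printf '%s\n' "$@"
}

# One on-demand master group, one on-demand core group, and one Spot
# task group. bid_price is a string in USD and only applies to SPOT.
INSTANCES='{
  "placement": {
    "availability_zone": "us-east-1a"
  },
  "keep_job_flow_alive_when_no_steps": false,
  "instance_groups": [
    {
      "name": "master",
      "instance_role": "MASTER",
      "market": "ON_DEMAND",
      "instance_type": "m1.large",
      "instance_count": 1
    },
    {
      "name": "core",
      "instance_role": "CORE",
      "market": "ON_DEMAND",
      "instance_type": "m1.large",
      "instance_count": 2
    },
    {
      "name": "task-spot",
      "instance_role": "TASK",
      "market": "SPOT",
      "bid_price": "0.10",
      "instance_type": "m1.large",
      "instance_count": 4
    }
  ]
}'

# Remove the dry_run prefix to submit the job flow for real.
dry_run aws emr run-job-flow \
  --name "example-instance-groups" \
  --instances "$INSTANCES"
```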


OPTIONAL PARAMETERS
===================

"--additional-info"  (string)
   A JSON string for selecting additional features.

"--bootstrap-actions"  (list)
   A list of bootstrap actions that will be run before Hadoop is
   started on the cluster nodes.

      (structure)
         Configuration of a bootstrap action.

         "script_bootstrap_action"  (structure)
            The script run by the bootstrap action.

            "path"  (string)
               Location of the script to run during a bootstrap
               action. Can be either a location in Amazon S3 or on a
               local file system.

            "args"  (list of string)
               A list of command line arguments to pass to the
               bootstrap action script.

         "name"  (string)
            The name of the bootstrap action.

   *Parameter Syntax*

      [
        {
          "script_bootstrap_action": {
            "path": "string",
            "args":
              ["string", ...]
          },
          "name": "string"
        }
        ...
      ]
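
The following sketch shows what a "--bootstrap-actions" value might look
like for a single action. The bucket name, script path, script argument,
and action name are hypothetical, and "dry_run" is an illustrative helper
that prints the arguments rather than running the command.

```shell
#!/bin/sh
# Dry-run sketch: prints the arguments instead of contacting AWS.
# The S3 path, script argument, and names below are hypothetical.

# Hypothetical helper: print each argument on its own line.
dry_run() {
  printf '%s\n' "$@"
}

# Minimal --instances value so that the command is complete.
INSTANCES='{
  "instance_count": 1,
  "master_instance_type": "m1.small"
}'

# A single bootstrap action that runs a script stored in Amazon S3.
BOOTSTRAP_ACTIONS='[
  {
    "name": "install-tools",
    "script_bootstrap_action": {
      "path": "s3://example-bucket/bootstrap/install-tools.sh",
      "args": ["--verbose"]
    }
  }
]'

# Remove the dry_run prefix to submit the job flow for real.
dry_run aws emr run-job-flow \
  --name "example-bootstrap" \
  --instances "$INSTANCES" \
  --bootstrap-actions "$BOOTSTRAP_ACTIONS"
```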

"--job-flow-role"  (string)
   An IAM role for the job flow. The EC2 instances of the job flow
   assume this role. The default role is "EMRJobflowDefault". To use
   the default role, you must have already created it using the CLI.

"--steps"  (list)
   A list of steps to be executed by the job flow.

      (structure)
         Specification of a job flow step.

         "hadoop_jar_step"  (structure)
            Specifies the JAR file used for the job flow step.

            "main_class"  (string)
               The name of the main class in the specified Java file.
               If not specified, the JAR file should specify a
               Main-Class in its manifest file.

            "args"  (list of string)
               A list of command line arguments passed to the JAR
               file's main function when executed.

            "jar"  (string)
               A path to a JAR file run during the step.

            "properties"  (list)
               A list of Java properties that are set when the step
               runs. You can use these properties to pass key value
               pairs to your main function.

                  (structure)
                     A key value pair.

                     "value"  (string)
                        The value part of the identified key.

                     "key"  (string)
                        The unique identifier of a key value pair.

         "name"  (string)
            The name of the job flow step.

         "action_on_failure"  (string)
            Specifies the action to take if the job flow step fails.

   *Parameter Syntax*

      [
        {
          "hadoop_jar_step": {
            "main_class": "string",
            "args":
              ["string", ...],
            "jar": "string",
            "properties":
              [
                {
                  "value": "string",
                  "key": "string"
                }
                ...
              ]
          },
          "name": "string",
          "action_on_failure": "TERMINATE_JOB_FLOW"|"CANCEL_AND_WAIT"|"CONTINUE"
        }
        ...
      ]
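
As an illustration of the step structure, this sketch defines one step
that runs a JAR and writes its output to Amazon S3, in line with the
advice in the DESCRIPTION to store results there. The bucket, JAR path,
class name, property key, and log location are all hypothetical, and
"dry_run" is an illustrative helper that prints the arguments rather than
running the command.

```shell
#!/bin/sh
# Dry-run sketch: prints the arguments instead of contacting AWS.
# Bucket, JAR path, class name, and property key are hypothetical.

# Hypothetical helper: print each argument on its own line.
dry_run() {
  printf '%s\n' "$@"
}

# Minimal --instances value so that the command is complete.
INSTANCES='{
  "instance_count": 3,
  "master_instance_type": "m1.small",
  "slave_instance_type": "m1.small"
}'

# One step: run a JAR, pass input and output locations as arguments,
# and wait (rather than terminate) if the step fails.
STEPS='[
  {
    "name": "word-count",
    "action_on_failure": "CANCEL_AND_WAIT",
    "hadoop_jar_step": {
      "jar": "s3://example-bucket/jars/wordcount.jar",
      "main_class": "com.example.WordCount",
      "args": [
        "s3://example-bucket/input/",
        "s3://example-bucket/output/"
      ],
      "properties": [
        {"key": "example.mode", "value": "fast"}
      ]
    }
  }
]'

# Remove the dry_run prefix to submit the job flow for real.
dry_run aws emr run-job-flow \
  --name "example-steps" \
  --instances "$INSTANCES" \
  --steps "$STEPS" \
  --log-uri "s3://example-bucket/logs/"
```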

"--visible-to-all-users"  (boolean)
   Whether the job flow is visible to all IAM users of the AWS account
   associated with the job flow. If this value is set to "true", all
   IAM users of that AWS account can view and (if they have the proper
   policy permissions set) manage the job flow. If it is set to
   "false", only the IAM user that created the job flow can view and
   manage it.

"--log-uri"  (string)
   Specifies the location in Amazon S3 to write the log files of the
   job flow. If a value is not provided, logs are not created.

"--ami-version"  (string)
   The version of the Amazon Machine Image (AMI) to use when launching
   Amazon EC2 instances in the job flow. The following values are
   valid:

   * "latest" (uses the latest AMI)

   * The version number of the AMI to use, for example, "2.0"

   If this value is not specified, the job flow uses the default (AMI
   1.0, Hadoop 0.18).

   If the AMI supports multiple versions of Hadoop (for example, AMI
   1.0 supports both Hadoop 0.18 and 0.20) you can use the
   JobFlowInstancesConfig "HadoopVersion" parameter to modify the
   version of Hadoop from the defaults shown above.

   For details about the AMI versions currently supported by Amazon
   Elastic MapReduce, see AMI Versions Supported in Elastic MapReduce
   in the *Amazon Elastic MapReduce Developer's Guide*.

"--supported-products"  (list of string)
   A list of strings that indicates third-party software to use with
   the job flow. For more information, see Use Third Party
   Applications with Amazon EMR. Currently supported values are:

   * "karmasphere-enterprise-utility" - tag the job flow for
     management by Karmasphere.

   * "mapr-m3" - launch the job flow using MapR M3 Edition.

   * "mapr-m5" - launch the job flow using MapR M5 Edition.

   *Parameter Syntax*

      ["string", ...]
