Recently I was exploring how to integrate my MapReduce (MR) application with the rest of our microservices and their infrastructure. The available command-line mechanism (e.g. running the hadoop or yarn command from a prompt) is not a recommended option here; it needs to be a REST API based solution so the jobs are consumable across applications. While exploring, I realized we can't use the YARN Resource Manager (RM) REST API(s) to execute (submit) MapReduce applications. As documented in the YARN RM REST API(s), the mechanism involves retrieving an application ID and then submitting the application. That works fine for Spark jobs, but not really for MapReduce jobs.
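
For reference, the two-call flow those RM REST API(s) describe looks roughly like this (a sketch only; it assumes the RM web service on the sandbox host at the default port 8088, and app-submit.json is a hypothetical application-submission document prepared as per the YARN RM REST API docs):

# 1. Ask the RM for a new application ID (the response contains application-id and maximum resource capabilities)
curl -s -X POST http://sandbox-hdp.hortonworks.com:8088/ws/v1/cluster/apps/new-application

# 2. Submit the application, embedding that application-id in the JSON submission context
curl -s -X POST -H "Content-Type: application/json" -d @app-submit.json http://sandbox-hdp.hortonworks.com:8088/ws/v1/cluster/apps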

If we execute a MapReduce application using these API(s), on completion of the sub-process the parent application fails with "Application application_xx_00xx failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_xx_00xx_000001 exited with exitCode: 0". As you can see in the image, it launches two tasks, a parent and a sub-task; even after successful completion of the sub-task, the parent task reports a failure.
How to Submit an MR Job to the RM (YARN) Using API(s)

The recommended mechanism for submitting or managing MapReduce (Hadoop) jobs is an Oozie workflow. Oozie is integrated with the rest of the Hadoop stack and supports Hadoop MapReduce jobs. We can use the Oozie REST endpoint URL to submit (execute) a MapReduce job. Of course, we can still use the YARN (RM) API(s) to manage the submitted application from any other channel and perform activities like checking its status, killing it, etc.
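
For example, once the application is running, checking its state or killing it can be done directly against the RM REST API (a sketch; application_xx_00xx is a placeholder for your actual application ID, and 8088 is assumed as the RM web port):

# Check the current state of the application
curl -s http://sandbox-hdp.hortonworks.com:8088/ws/v1/cluster/apps/application_xx_00xx/state

# Kill the application
curl -s -X PUT -H "Content-Type: application/json" -d '{"state":"KILLED"}' http://sandbox-hdp.hortonworks.com:8088/ws/v1/cluster/apps/application_xx_00xx/state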

Submitting MapReduce (MR) jobs with Oozie
To provide more detail, let's use the wordcount example from the Hadoop examples jar.

Step 1: Start by logging in to the Hadoop cluster and creating the required working directories (helper structure). You can also use the Ambari UI to perform these tasks.

  • hdfs dfs -mkdir -p /user/root/examples/apps/mapreduce/lib
  • hdfs dfs -mkdir -p /user/root/examples/input-data/mapreduce  (specified by mapred.input.dir)
  • hdfs dfs -mkdir -p /user/root/examples/output-data/mapreduce (specified by mapred.output.dir)

Step 2: Create a properties/configuration file containing the following properties, let's say oozieconfig.xml:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>user.name</name>
        <value>root</value>
    </property>
    <property>
        <name>jobTracker</name>
        <value>sandbox-hdp.hortonworks.com:8032</value>
    </property>
    <property>
        <name>oozie.wf.application.path</name>
        <value>/user/root/examples/apps/mapreduce</value>
    </property>
    <property>
        <name>queueName</name>
        <value>default</value>
    </property>
    <property>
        <name>nameNode</name>
        <value>hdfs://sandbox-hdp.hortonworks.com:8020</value>
    </property>
    <property>
        <name>applicationName</name>
        <value>testoozie</value>
    </property>
</configuration>

NOTE: The jobTracker element in Oozie referred to here is used to pass the Resource Manager information and does not really represent the old JobTracker. The spec still calls it jobTracker, though it can point to either a JobTracker or a Resource Manager depending on the Hadoop version you are using.

Here the value provided for jobTracker is the Resource Manager URL; I am using the hostname from the HDP sandbox. The other value, nameNode, is the NameNode URL, again taken from the HDP sandbox for illustration.
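
If you are unsure of the right values for your own cluster, you can read them from the client configuration (a sketch; it assumes the usual HDP client config location /etc/hadoop/conf):

# NameNode URL (value to use for nameNode)
hdfs getconf -confKey fs.defaultFS

# Resource Manager address (value to use for jobTracker)
grep -A 1 "yarn.resourcemanager.address" /etc/hadoop/conf/yarn-site.xml
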
Step 3: Create the workflow definition file, workflow.xml, describing the MapReduce action (based on the standard Oozie map-reduce example; adjust the workflow schema version to match your Oozie installation).

<workflow-app xmlns="uri:oozie:workflow:0.5" name="map-reduce-wf">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.name</name>
                    <value>map-reduce-wf</value>
                </property>
                <property>
                    <name>mapred.mapper.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.reducer.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>mapreduce.map.class</name>
                    <value>org.apache.hadoop.examples.WordCount$TokenizerMapper</value>
                </property>
                <property>
                    <name>mapreduce.reduce.class</name>
                    <value>org.apache.hadoop.examples.WordCount$IntSumReducer</value>
                </property>
                <property>
                    <name>mapreduce.combine.class</name>
                    <value>org.apache.hadoop.examples.WordCount$IntSumReducer</value>
                </property>
                <property>
                    <name>mapred.output.key.class</name>
                    <value>org.apache.hadoop.io.Text</value>
                </property>
                <property>
                    <name>mapred.output.value.class</name>
                    <value>org.apache.hadoop.io.IntWritable</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>/user/root/examples/input-data/mapreduce</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>/user/root/examples/output-data/mapreduce</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <fail name="fail">
        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </fail>
    <end name="end"/>
</workflow-app>

Apart from the input and output directories, you also need to provide the classes representing the mapper and reducer, and other relevant information for your workflow.
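
If you want to double-check the fully qualified class names before wiring them into workflow.xml, you can list the contents of the examples jar (a quick sketch; it assumes the JDK jar tool is on the PATH and the jar is in your current directory):

jar tf hadoop-mapreduce-examples.jar | grep 'WordCount'
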
Step 4: Create sampledata.txt

test
here
insight
failure
nomore
enough

Step 5: Copy the required files to the designated directory structure within HDFS using the following commands. Please adjust the structure as appropriate for your application.

Copy the application jar file, in this case the Hadoop MapReduce examples jar:
hdfs dfs -put hadoop-mapreduce-examples.jar /user/root/examples/apps/mapreduce/lib

Copy the workflow.xml file created from the sample above:
hdfs dfs -put workflow.xml /user/root/examples/apps/mapreduce/

Copy the sample data for input:
hdfs dfs -put sampledata.txt /user/root/examples/input-data/mapreduce
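
Before submitting, it is worth verifying that the resulting HDFS layout matches what oozie.wf.application.path and the workflow properties expect, for example:

hdfs dfs -ls -R /user/root/examples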

Step 6: Run
Here is how you can call it via curl. Of course, you can do the same from within your program, in whatever programming language you are using.

curl -i -s -X POST -H "Content-Type: application/xml" -T oozieconfig.xml "http://sandbox-hdp.hortonworks.com:11000/oozie/v1/jobs?action=start"
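
Note that action=start creates and starts the workflow in one call. If you omit it, the Oozie WS API only creates the job (in PREP state), and you start it later with the returned ID, roughly like this (the job ID below is just the placeholder value returned in the next step):

# Create the workflow job without starting it
curl -i -s -X POST -H "Content-Type: application/xml" -T oozieconfig.xml http://sandbox-hdp.hortonworks.com:11000/oozie/v1/jobs

# Start it explicitly using the returned job ID
curl -i -s -X PUT "http://sandbox-hdp.hortonworks.com:11000/oozie/v1/job/0000008-190119071046177-oozie-oozi-W?action=start"
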
Step 7: Response
The command returns a JSON response similar to the following:

HTTP/1.1 100 Continue
HTTP/1.1 201 Created
Server: Apache-Coyote/1.1
Content-Type: application/json;charset=UTF-8
Content-Length: 45
Date: Mon, 21 Jan 2019 07:19:03 GMT

{"id":"0000008-190119071046177-oozie-oozi-W"}

Be sure to record the job ID value, i.e. {"id":"0000008-190119071046177-oozie-oozi-W"}.
Step 8: Status
Here is an example of using curl to retrieve the status of the workflow:

curl -i -s -X GET "http://sandbox-hdp.hortonworks.com:11000/oozie/v1/job/0000008-190119071046177-oozie-oozi-W?show=info"
HTTP/1.1 100 Continue
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: application/json;charset=UTF-8
Content-Length: 8114
Date: Mon, 21 Jan 2019 17:09:56 GMT
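
Similarly, if you need to stop the workflow, the same job endpoint accepts a kill action, for example:

curl -i -s -X PUT "http://sandbox-hdp.hortonworks.com:11000/oozie/v1/job/0000008-190119071046177-oozie-oozi-W?action=kill"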

You can also check the job status from the Ambari console: select YARN and click Quick Links > Resource Manager UI, then select the job ID that matches the result of the previous step and view the job details.

For additional details, you can refer to the Oozie specification guidelines.

-Ritesh

Disclaimer: “The postings on this site are my own and don’t necessarily represent IBM’s positions, strategies or opinions.”