How do WordCount MapReduce jobs run on a Hadoop YARN cluster with Apache Tez?


As the GitHub page of Tez says, Tez is very simple and at its heart has just 2 components:

  1. the data-processing pipeline engine, and

  2. a master for the data-processing application, whereby one can put together arbitrary data-processing 'tasks' described above into a task-DAG.

Well, my first question is: how are existing MapReduce jobs, like the WordCount that exists in tez-examples.jar, converted to a task-DAG? Where does that happen? Or are they not converted at all?

And my second, more important question is about this part:

Every 'task' in Tez has the following:

  1. An input to consume key/value pairs from.
  2. A processor to process them.
  3. An output to collect the processed key/value pairs.

Who is in charge of splitting the input data between the Tez tasks? Is it the code that the user provides, YARN (the resource manager), or Tez itself?

The question is the same for the output phase. Thanks in advance.

To answer your first question on converting MapReduce jobs to Tez DAGs:

Any MapReduce job can be thought of as a single DAG with 2 vertices (stages). The first vertex is the map phase, and it is connected to a downstream vertex, the reduce phase, via a shuffle edge.
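To make the two-vertex picture concrete, here is a minimal plain-Java sketch of WordCount as a map vertex, a shuffle edge, and a reduce vertex. It does not use the Tez API at all; the class and method names (`TwoVertexWordCount`, `mapVertex`, `shuffleEdge`, `reduceVertex`) are hypothetical and only mirror the shape of the DAG described above.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: none of these names come from the Tez or MapReduce APIs.
final class TwoVertexWordCount {

    // "Map" vertex: turn one line of text into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> mapVertex(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // "Shuffle" edge: group map output by key so each word is reduced once.
    static Map<String, List<Integer>> shuffleEdge(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // "Reduce" vertex: sum the counts collected for each word.
    static Map<String, Integer> reduceVertex(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            counts.put(entry.getKey(), sum);
        }
        return counts;
    }

    // Wire the two vertices together through the shuffle edge.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(mapVertex(line));
        }
        return reduceVertex(shuffleEdge(mapped));
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("tez runs on yarn", "yarn runs tez dags")));
    }
}
```

In a real cluster the map and reduce vertices run as many parallel tasks and the shuffle moves data across machines; the sketch collapses all of that into a single process to show only the data flow.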

There are 2 ways in which MR jobs can be run on Tez:

  1. One approach is to write a native 2-stage DAG using the Tez APIs directly. This is what is present in tez-examples.
  2. The second is to use the MapReduce APIs themselves and run in yarn-tez mode (mapreduce.framework.name set to yarn-tez). In this scenario, there is a layer which intercepts the MR job submission and, instead of using MR, translates the MR job into a 2-stage Tez DAG and executes the DAG on the Tez runtime.

For the data handling related questions that you have:

The user provides the logic for understanding the data to be read and how to split it. Tez then takes each split of data and takes over the responsibility of assigning a split or a set of splits to a given task.
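The division of labor described here can be sketched in plain Java: one method stands in for the user-provided splitting logic, another for the framework handing splits to tasks. All names (`SplitAssignmentSketch`, `Split`, `computeSplits`, `assignToTasks`) are hypothetical, and the round-robin assignment is an assumption made for brevity, not a description of how Tez actually groups splits.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: not the Tez API, and round-robin is an assumed policy.
final class SplitAssignmentSketch {

    // A byte range of the input; a hypothetical stand-in for an input split.
    static final class Split {
        final long offset;
        final long length;

        Split(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // User-side logic: decide how the input is cut into splits.
    static List<Split> computeSplits(long totalBytes, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (long offset = 0; offset < totalBytes; offset += splitSize) {
            splits.add(new Split(offset, Math.min(splitSize, totalBytes - offset)));
        }
        return splits;
    }

    // Framework-side logic: give each task one split or a set of splits.
    static List<List<Split>> assignToTasks(List<Split> splits, int numTasks) {
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive");
        }
        List<List<Split>> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new ArrayList<>());
        }
        for (int i = 0; i < splits.size(); i++) {
            tasks.get(i % numTasks).add(splits.get(i));
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<List<Split>> tasks = assignToTasks(computeSplits(1000, 300), 2);
        for (int t = 0; t < tasks.size(); t++) {
            for (Split split : tasks.get(t)) {
                System.out.println("task " + t + " <- offset " + split.offset + ", length " + split.length);
            }
        }
    }
}
```

The point of the sketch is the boundary: `computeSplits` knows about the data layout, while `assignToTasks` only sees opaque splits and decides where they go.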

The Tez framework then controls the generation and movement of data, i.e. where to generate the data between intermediate steps and how to move data between 2 vertices/stages. However, it does not control the underlying data contents/structure, partitioning, or serialization logic, which are provided by user plugins.
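As a rough illustration of that plugin boundary, the plain-Java sketch below separates a user-supplied partitioner from framework-style routing code that never inspects the keys itself. The names (`PartitionerSketch`, `Partitioner`, `HashPartitioner`, `route`) are hypothetical and are not the Tez interfaces; the hash formula is the common hash-modulo scheme, used here as an assumed default.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: hypothetical types, not the Tez partitioner API.
final class PartitionerSketch {

    // User-supplied plugin: decides which downstream task receives a key.
    interface Partitioner<K> {
        int partitionFor(K key, int numPartitions);
    }

    // A common choice: non-negative hash of the key modulo the partition count.
    static final class HashPartitioner<K> implements Partitioner<K> {
        @Override
        public int partitionFor(K key, int numPartitions) {
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    // Framework-side routing: moves keys to partitions using only the plugin's answer.
    static <K> List<List<K>> route(List<K> keys, Partitioner<K> partitioner, int numPartitions) {
        List<List<K>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (K key : keys) {
            partitions.get(partitioner.partitionFor(key, numPartitions)).add(key);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("tez", "yarn", "tez", "dag");
        List<List<String>> partitions = route(keys, new HashPartitioner<>(), 3);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("partition " + i + ": " + partitions.get(i));
        }
    }
}
```

Swapping in a different `Partitioner` implementation changes where keys land without touching `route`, which is the separation the answer describes.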

The above is a high-level view; there are additional intricacies. You will get more detailed answers by posting specific questions to the development list (http://tez.apache.org/mail-lists.html).

