How are wordCount MapReduce jobs run on a Hadoop YARN cluster with Apache Tez?
As the GitHub page of Tez says, Tez is simple and, at its heart, has 2 components:
- the data-processing pipeline engine, and
- a master for data-processing applications, whereby one can put together arbitrary data-processing 'tasks' into a task-DAG.
Well, the first question is: how are existing MapReduce jobs, like the wordcount in tez-examples.jar, converted into a task-DAG? And where does that happen? Or are they not converted at all...?
And the second, more important question:
Every 'task' in Tez has the following:
- an Input to consume key/value pairs from,
- a Processor to process them, and
- an Output to collect the processed key/value pairs.
Who is in charge of splitting the input data between the Tez tasks? Is it the code the user provides, YARN (the resource manager), or Tez itself?
The same question applies to the output phase. Thanks in advance.
To answer your first question on converting MapReduce jobs to Tez DAGs:
Any MapReduce job can be thought of as a single DAG with 2 vertices (stages). The first vertex is the map phase, and it is connected to the downstream reduce vertex via a shuffle edge.
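That two-vertex shape can be illustrated with a toy word count in plain Python. This is a conceptual sketch only: the `map_vertex`, `shuffle_edge`, and `reduce_vertex` functions are illustrative stand-ins for the map vertex, shuffle edge, and reduce vertex, not Tez or MapReduce APIs.

```python
from collections import defaultdict

def map_vertex(lines):
    # Map phase: emit (word, 1) for every word, like WordCount's mapper.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle_edge(pairs):
    # Shuffle edge: group values by key so each reducer
    # sees all values for a given key together.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_vertex(grouped):
    # Reduce phase: sum the counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

counts = reduce_vertex(shuffle_edge(map_vertex(["hello tez", "hello yarn"])))
print(counts)  # {'hello': 2, 'tez': 1, 'yarn': 1}
```

In a real cluster each vertex runs as many parallel tasks and the shuffle moves data over the network; here the three stages are just chained function calls.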
There are 2 ways in which MR jobs can be run on Tez:
- One approach is to write a native 2-stage DAG using the Tez APIs directly. This is what is present in tez-examples.
- The second is to use the MapReduce APIs themselves in yarn-tez mode. In this scenario, there is a layer which intercepts the MR job submission and, instead of using MR, translates the MR job into a 2-stage Tez DAG and executes that DAG on the Tez runtime.
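The yarn-tez interception in the second approach is enabled through cluster configuration. The commonly documented setting is `mapreduce.framework.name=yarn-tez` in `mapred-site.xml` (shown below as a sketch; check the install guide for your Tez release for the full set of required properties):

```xml
<!-- mapred-site.xml: route MR job submissions through the Tez translation layer -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn-tez</value>
</property>
```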
For the data-handling related questions that you have:
The user provides the logic on how to read the data and how to split it. Tez then takes each split of data and takes over the responsibility of assigning a split, or a set of splits, to a given task.
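That division of labour can be sketched in plain Python: `user_split_logic` below plays the role of the user-provided InputFormat-style split generation, while `framework_assign` mimics the framework handing splits to tasks. Both function names and the round-robin policy are illustrative assumptions, not actual Tez behaviour or APIs.

```python
def user_split_logic(data, split_size):
    # User-provided logic: decide how the input is divided into splits.
    return [data[i:i + split_size] for i in range(0, len(data), split_size)]

def framework_assign(splits, num_tasks):
    # Framework-side logic: the engine (not the user) decides which task
    # runs which split(s); shown here as a simple round-robin assignment.
    tasks = {t: [] for t in range(num_tasks)}
    for i, split in enumerate(splits):
        tasks[i % num_tasks].append(split)
    return tasks

records = list(range(10))
splits = user_split_logic(records, 3)     # 4 splits: [0,1,2], [3,4,5], [6,7,8], [9]
assignment = framework_assign(splits, 2)  # 2 tasks, 2 splits each
```

The key point is the boundary: the user's code defines what a split is; the framework owns the split-to-task mapping.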
The Tez framework then controls the generation and movement of data, i.e. where to generate the data between intermediate steps and how to move it between 2 vertices/stages. However, it does not control the underlying data contents/structure, partitioning, or serialization logic, which are provided by user plugins.
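The same boundary applies to intermediate data, and can be sketched the same way: the framework physically routes key/value pairs between stages, but which downstream task a key lands on is decided by a user-pluggable partitioner. The hash partitioner below only mirrors the idea of a default hash partitioner conceptually; real MR/Tez plugins implement a Partitioner interface rather than a bare function.

```python
def hash_partition(key, num_partitions):
    # User plugin: map a key to one of the downstream tasks.
    return hash(key) % num_partitions

def route(pairs, num_partitions):
    # Framework side: move each pair to the partition the user plugin
    # chose; the contents and serialization stay user-defined.
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[hash_partition(key, num_partitions)].append((key, value))
    return partitions

# Integer keys 0 and 2 hash to partition 0; key 1 goes to partition 1.
parts = route([(0, "a"), (1, "b"), (2, "c")], 2)
```

Swapping in a different `hash_partition` changes where keys go without touching the routing machinery, which is the point of keeping partitioning in user-pluggable code.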
The above is a high-level view, and there are additional intricacies. You will get more detailed answers by posting specific questions to the development list ( http://tez.apache.org/mail-lists.html ).