R - Can I use readLines in a mapreduce job in RHadoop?


I'm trying to read a text or .gz file from HDFS and run a simple mapreduce job (actually a map-only job), but I get an error, and it seems the readLines part doesn't work. I'd like to know whether the readLines function can be used inside mapreduce. P.S. There is no problem if I use readLines to parse HDFS files outside of the mapreduce job. Thanks.

    counts <- function(path){
        ct.map <- function(., lines) {
            line <- readLines(lines)
            word <- unlist(strsplit(line, pattern = " "))
            keyval(word, 1)
        }
        mapreduce(
            input = path,
            input.format = "text",
            map = ct.map
        )
    }
    counts("/user/ychen/100.txt")

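No: the map function expects DFS-formatted data to come in. Its value argument already holds the lines of text, and readLines() expects a file name or connection rather than the text itself, so it fails there. Rewrite your function like this, doing the formatting in the input step: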

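    library(rmr2)

    counts <- function(path){
        # The map receives lines already stored in the DFS,
        # so it only has to split them into words
        ct.map <- function(., line) {
            word <- unlist(strsplit(line, split = " "))
            keyval(word, 1)    # emit (word, 1) for every word
        }
        mapreduce(
            # read the lines in the R session, then write them to the DFS
            input = to.dfs(readLines(path)),
            map = function(k, v){ct.map(k, v)},
            # count the 1s emitted for each word
            reduce = function(k, v){keyval(k, length(v))}
        )
    }
    output <- from.dfs(counts("/user/ychen/100.txt"))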

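I also added a reduce step that sums the values for each word. Every value emitted by the map is 1, so length(v) is the number of times the word occurs.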
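To see what the map and reduce steps compute without a Hadoop cluster, here is a minimal local sketch in base R. It does not use rmr2: sim_map() and sim_reduce() are hypothetical stand-ins that mimic ct.map and the reduce function, and split() plays the role of the shuffle that groups values by key.

```r
# Hypothetical local sketch: base R only, no Hadoop or rmr2 involved.

# Map step: split each line into words and pair every word with the value 1,
# mirroring keyval(word, 1) in ct.map
sim_map <- function(lines) {
  words <- unlist(strsplit(lines, split = " "))
  list(key = words, val = rep(1, length(words)))
}

# Shuffle + reduce step: group the values by word, then count them,
# mirroring reduce = function(k, v) keyval(k, length(v))
sim_reduce <- function(kv) {
  grouped <- split(kv$val, kv$key)      # one vector of 1s per distinct word
  vapply(grouped, length, integer(1))   # named integer vector of counts
}

lines <- c("the quick brown fox", "the lazy dog")
word_counts <- sim_reduce(sim_map(lines))
print(word_counts)
```

Here "the" maps to 2 and every other word to 1. Using sum(v) instead of length(v) would give the same counts, since every emitted value is 1.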

