Statistics: optimizing R code that uses the rep function
I'm working with data from a household income/expense survey. The sample database has 9,002 observations, which represent 3,155,937 homes through an expansion factor, like this:
| homeid | income | factor |
|--------|--------|--------|
| 001    | 23456  | 678    |
| 002    | 42578  | 1073   |
| ..     | ..     | ..     |
| 9002   | 62333  | 987    |
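Because each row stands for `factor` homes, `rep(income, factor)` produces one entry per represented home, and the total over that expanded vector equals the weighted sum `sum(income * factor)`. A small check on the three rows shown in the table (the `..` rows are left out, so `nal_sample` is only an illustrative subset, not the full data):

```r
# The three rows shown in the table (the ".." rows are omitted)
nal_sample <- data.frame(
  homeid = c("001", "002", "9002"),
  income = c(23456, 42578, 62333),
  factor = c(678, 1073, 987)
)

# rep() repeats each income `factor` times: one entry per represented home
expanded <- rep(nal_sample$income, nal_sample$factor)

length(expanded) == sum(nal_sample$factor)                   # TRUE (2738 homes)
sum(expanded) == sum(nal_sample$income * nal_sample$factor)  # TRUE
```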
I'm trying to get an exact summary of total income per decile. I expand each income value by its factor, which gives a vector of 3,155,937 observations, and I'm using a `for` loop to assign the decile each value belongs to:
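```r
library(dplyr)

# Keep income and expansion factor; sort by income, because the deciles
# below are assigned by row position
three <- nal %>% select(income, factor) %>% arrange(income)

# Expand: repeat each income `factor` times (one row per represented home)
five <- data.frame(income = rep(three$income, three$factor))

# Assign deciles by position (slow: one R-level iteration per expanded row)
five$decil <- NA_real_
for (i in seq_len(nrow(five))) {
  if (i <= 3155937) five$decil[i] <- 1
  else if (i <= 6311874) five$decil[i] <- 2
  else if (i <= 9467811) five$decil[i] <- 3
  else if (i <= 12623748) five$decil[i] <- 4
  else if (i <= 15779685) five$decil[i] <- 5
  else if (i <= 18935622) five$decil[i] <- 6
  else if (i <= 22091559) five$decil[i] <- 7
  else if (i <= 25247496) five$decil[i] <- 8
  else if (i <= 28403433) five$decil[i] <- 9
  else five$decil[i] <- 10
}

# Total income per decile
totdecil <- data.frame(decil = 1:10, inctot = NA_real_)
for (i in 1:10) {
  two <- filter(five, decil == i)
  totdecil$inctot[i] <- sum(two$income)
}

rm(five, three, two); gc()
```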
I'd like to know if anyone can help me optimize this code; it has been running for hours and still hasn't finished.
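Both loops can be vectorized. A minimal sketch, assuming `five` is already sorted by income (the position-based thresholds only make sense then): the `if`/`else` chain just computes which block of 3,155,937 consecutive rows a position falls in, capped at 10, which integer division does for all rows at once, and `tapply()` then replaces the per-decile `filter()` loop. `assign_decil` and the toy incomes are hypothetical, for illustration only.

```r
# Hypothetical helper: decile = which block of `width` consecutive rows a
# position falls in, capped at `groups` (mirrors the if/else thresholds)
assign_decil <- function(n, width, groups = 10) {
  pmin((seq_len(n) - 1) %/% width + 1, groups)
}

# Toy sorted incomes standing in for five$income (illustrative values only)
income_sorted <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120)
decil <- assign_decil(length(income_sorted), width = 1)

# One grouped sum replaces the filter()/sum() loop over deciles
inctot <- tapply(income_sorted, decil, sum)
```

On the real data this would be `five$decil <- assign_decil(nrow(five), width = 3155937)`, which reproduces the loop's thresholds, followed by `tapply(five$income, five$decil, sum)`.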
The `ntile` function from the dplyr package worked better:
```r
library(dplyr)

three <- nal %>% select(income, factor)
five <- data.frame(income = rep(three$income, three$factor))
# This line replaces the 'for' loop and runs in seconds;
# ntile() ranks by income itself, so no prior sorting is needed
five$decil <- ntile(five$income, 10)
```
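Even with `ntile`, the code still builds one row per represented home. A minimal sketch that skips `rep()` altogether, assuming decile boundaries fall at `floor(d * N / 10)` expanded positions (N = sum of factors), which may differ by a row from `ntile()`'s own group sizes: in income order, each household's copies occupy a run of consecutive positions, so its contribution to a decile is the overlap of that run with the decile's range, times its income. `decile_totals` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: per-decile income totals from (income, factor) pairs,
# without materialising rep(income, factor)
decile_totals <- function(income, factor, groups = 10) {
  ord <- order(income)                    # deciles are defined in income order
  income <- income[ord]
  factor <- as.numeric(factor[ord])
  end <- cumsum(factor)                   # last expanded position of each household
  start <- end - factor                   # position just before its first copy
  n <- end[length(end)]                   # total represented homes, sum(factor)
  bounds <- floor(n * (0:groups) / groups)  # decile d covers (bounds[d], bounds[d + 1]]
  vapply(seq_len(groups), function(d) {
    # number of each household's copies that fall inside decile d
    copies <- pmax(0, pmin(end, bounds[d + 1]) - pmax(start, bounds[d]))
    sum(copies * income)
  }, numeric(1))
}
```

Usage would be along the lines of `totdecil <- data.frame(decil = 1:10, inctot = decile_totals(nal$income, nal$factor))`, working on the 9,002 survey rows rather than the expanded vector.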