statistics - Optimizing R code that uses the rep function


I'm working with income/expense data per home from a household survey. The 9,002 observations in the sample database represent 31,559,379 homes through an expansion factor, like this:

    homeid    income    factor
    001       23456     678
    002       42578     1073
    ..        ..        ..
    9002      62333     987

I'm trying to get an exact summary of total income per decile by expanding each income value by its factor, which gives a vector of 31,559,379 observations. I'm using a 'for' loop to assign each value the decile it belongs to:

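    library(dplyr)

    # 'nal' is the survey data frame with columns income and factor.
    # Sort by income so that row position in 'five' follows income rank.
    three <- nal %>% select(income, factor) %>% arrange(income)
    # Expand: each income value is repeated 'factor' times (one row per home)
    five <- data.frame(income = rep(three$income, three$factor))

    # Assign each expanded row to a decile by its position, one element at a time
    for (i in 1:31559379) {
      if      (i <= 3155937)  five$decil[i] <- 1
      else if (i <= 6311874)  five$decil[i] <- 2
      else if (i <= 9467811)  five$decil[i] <- 3
      else if (i <= 12623748) five$decil[i] <- 4
      else if (i <= 15779685) five$decil[i] <- 5
      else if (i <= 18935622) five$decil[i] <- 6
      else if (i <= 22091559) five$decil[i] <- 7
      else if (i <= 25247496) five$decil[i] <- 8
      else if (i <= 28403433) five$decil[i] <- 9
      else                    five$decil[i] <- 10
    }

    # Total income per decile
    totdecil <- data.frame(decil = 1:10, inctot = NA_real_)
    for (i in 1:10) {
      two <- filter(five, decil == i)
      totdecil$inctot[i] <- sum(two$income)
    }
    rm(five, three, two); gc()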
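Much of the cost likely comes from assigning `five$decil[i]` one element at a time, about 31.5 million times, with each assignment going through data-frame replacement. A minimal sketch of a vectorized equivalent of the nested if/else chain: position `i` goes to decile `ceiling(i / width)`, capped at 10. The helper name `assign_decile` is hypothetical, and the toy sizes (32 positions, width 3) are stand-ins for 31559379 and 3155937.

```r
# Hypothetical helper reproducing the nested if/else rule in one vectorized call:
# position i goes to decile ceiling(i / width), capped at 10 so any remainder
# past the ninth threshold falls into decile 10.
assign_decile <- function(n, width) {
  pmin(ceiling(seq_len(n) / width), 10)
}

# Toy check: 32 positions with width 3 stand in for 31559379 with 3155937
d <- assign_decile(32, 3)
table(d)   # deciles 1-9 hold 3 positions each, decile 10 holds the last 5

# Totals per decile in one grouped sum instead of ten filter() calls
income <- seq_len(32) * 100   # made-up incomes, already in ascending order
tapply(income, d, sum)
```

On the real data this would presumably be `five$decil <- assign_decile(nrow(five), 3155937)` (with `five` built from income-sorted rows), and `tapply(five$income, five$decil, sum)` would take the place of the second loop.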

I'd like to know if someone can help me optimize this code; it has been running for hours and still hasn't finished.
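A different technique, not the one used in the question, avoids building the expanded vector at all: sort the homes by income, treat the cumulative sum of `factor` as each home's range of expanded positions, and count how many of those positions fall inside each decile. This is a sketch under the assumption that deciles are positional exactly as in the loop (width `floor(N / 10)`, remainder in decile 10); `decile_totals_weighted` is a hypothetical helper name.

```r
# Exact per-decile income totals from expansion factors, without rep().
# Hypothetical helper; deciles are positional, as in the question's loop.
decile_totals_weighted <- function(income, factor) {
  ord <- order(income)                # rank homes by income
  income <- income[ord]
  factor <- as.numeric(factor[ord])
  upper <- cumsum(factor)             # last expanded position of each home
  lower <- upper - factor             # position just before its first copy
  n <- upper[length(upper)]           # total expanded size, i.e. sum(factor)
  width <- floor(n / 10)              # decile width, as in the loop
  lo <- (0:9) * width                 # decile k covers positions (lo[k], hi[k]]
  hi <- c((1:9) * width, n)           # decile 10 takes the remainder
  vapply(1:10, function(k) {
    # number of each home's copies that fall inside decile k
    copies <- pmax(0, pmin(upper, hi[k]) - pmax(lower, lo[k]))
    sum(income * copies)
  }, numeric(1))
}

# Toy call with the three complete rows from the table in the question
decile_totals_weighted(c(23456, 42578, 62333), c(678, 1073, 987))
```

Because it works on the 9,002 survey rows rather than the expanded rows, something like `decile_totals_weighted(nal$income, nal$factor)` should need very little memory; it gives the same totals as expanding with `rep()` and splitting by position.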

The ntile function from the dplyr package worked better:

    library(dplyr)

    three <- nal %>% select(income, factor)
    five <- data.frame(income = rep(three$income, three$factor))
    five$decil <- ntile(five$income, 10)  # replaces the 'for' loop and runs in seconds
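With `ntile()` assigning the deciles, the second loop (one `filter()` per decile) can also be replaced by a single grouped summary. A minimal sketch using dplyr's `group_by()` and `summarise()`, run on a small made-up stand-in for `five`:

```r
library(dplyr)

# Toy stand-in for the expanded data frame 'five' (hypothetical values)
five <- data.frame(income = c(500, 100, 900, 300, 700, 200, 800, 400, 1000, 600))

totdecil <- five %>%
  mutate(decil = ntile(income, 10)) %>%   # rank-based, so no prior sort is needed
  group_by(decil) %>%
  summarise(inctot = sum(income))         # one row per decile with its total income

totdecil
```

Since `ntile()` ranks the incomes itself, the rows of `five` do not have to be sorted beforehand, unlike the position-based loop.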
