statistics - Optimizing R code that uses the rep function


I'm working with income/expense data per home from a survey. The sample database has 9,002 observations, which represent 3,155,937 homes through an expansion factor, like this:

```
homeid    income    factor
001       23456     678
002       42578     1073
..        ..        ..
9002      62333     987
```
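Each factor is the number of homes that a sampled row stands for, so the represented population and the expanded total income are plain sums over the sample rows. The sketch below uses only the three rows visible in the table as hypothetical toy data and checks that the weighted sums match what rep() produces after expansion.

```r
# Hypothetical toy data: only the three rows visible in the table above
homes <- data.frame(
  homeid = c("001", "002", "9002"),
  income = c(23456, 42578, 62333),
  factor = c(678, 1073, 987)
)

# Number of homes the sample represents: the sum of the expansion factors
represented <- sum(homes$factor)

# Expanded total income as a weighted sum, without building a long vector
total_income <- sum(homes$income * homes$factor)

# The same totals obtained by physically expanding each row with rep()
expanded <- rep(homes$income, homes$factor)
stopifnot(length(expanded) == represented,
          sum(expanded) == total_income)
```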

I'm trying to get an exact summary of total income per decile. I expand each income value by its factor, which gives a vector of 3,155,937 observations, and I'm using a 'for' loop to assign each value the decile it belongs to:

```r
library(dplyr)

three <- nal %>% select(income, factor)
five <- data.frame(income = rep(three$income, three$factor))

# Deciles are assigned by row position, so this assumes rows are sorted by income
for (i in 1:31559379) {
  if (i <= 3155937) {
    five$decil[i] = 1
  } else if (i <= 6311874) {
    five$decil[i] = 2
  } else if (i <= 9467811) {
    five$decil[i] = 3
  } else if (i <= 12623748) {
    five$decil[i] = 4
  } else if (i <= 15779685) {
    five$decil[i] = 5
  } else if (i <= 18935622) {
    five$decil[i] = 6
  } else if (i <= 22091559) {
    five$decil[i] = 7
  } else if (i <= 25247496) {
    five$decil[i] = 8
  } else if (i <= 28403433) {
    five$decil[i] = 9
  } else {
    five$decil[i] = 10
  }
}

# Total income per decile
totdecil <- data.frame(inctot = numeric(10))
for (i in 1:10) {
  two = filter(five, decil == i)
  totdecil$inctot[i] = sum(two$income)
}
rm(five); rm(three); rm(two); gc()
```
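A minimal vectorized sketch of the loop above, assuming the same position-based rule: base R's findInterval() assigns all deciles in one call and rowsum() computes the ten totals in one pass. The toy incomes and repeat counts are hypothetical; on the real data n would be 31,559,379, and floor(n / 10) then reproduces the boundaries 3,155,937, 6,311,874, and so on. Converting income with as.numeric() is a precaution, since summing a very long integer vector can overflow R's integer type and return NA.

```r
# Hypothetical toy stand-in for the expanded data frame `five` (25 rows)
five <- data.frame(income = as.numeric(rep(c(23456, 42578, 62333), c(9, 11, 5))))

# Deciles follow row order, so sort by income first
five <- five[order(five$income), , drop = FALSE]
n <- nrow(five)

# Same boundaries as the loop: multiples of floor(n / 10); the last decile
# absorbs any remainder rows
breaks <- floor(n / 10) * 1:9

# With left.open = TRUE, findInterval() counts the boundaries strictly below
# each position: positions 1..breaks[1] get 0 (+1 = decile 1), and positions
# past breaks[9] get 9 (+1 = decile 10), matching the nested if/else chain
five$decil <- findInterval(seq_len(n), breaks, left.open = TRUE) + 1L

# All ten totals in one grouped pass instead of ten filter() calls
totdecil <- data.frame(decil = 1:10,
                       inctot = as.vector(rowsum(five$income, five$decil)))
```

Most of the original loop's cost likely comes from running `five$decil[i] = ...` through data-frame assignment about 31 million times; the vectorized version does the same work in a handful of calls.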

I'd like to know if someone can help me optimize this code; it has been running for hours and still hasn't finished.
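A different approach, not the one used in the question, skips rep() entirely: sort the sample rows by income, take the cumulative sum of the factors, and count how many of each household's homes fall between consecutive decile cut points. This is a sketch under stated assumptions: the data frame is a hypothetical six-row sample, the function name decile_totals is illustrative, and when the total number of homes is not a multiple of 10 the cut points fall between homes, so a household is split fractionally and the totals can differ slightly from rep() plus ntile().

```r
# Hypothetical sample in the shape of the question's table
nal <- data.frame(
  income = c(62333, 23456, 42578, 15000, 31000, 50000),
  factor = c(987, 678, 1073, 500, 820, 642)
)

# Hypothetical helper: weighted decile totals without expanding the data
decile_totals <- function(income, factor, k = 10) {
  ord <- order(income)
  income <- as.numeric(income[ord])
  w <- as.numeric(factor[ord])
  hi <- cumsum(w)          # upper end of each household's slice of the population
  lo <- hi - w             # lower end of that slice
  W <- hi[length(hi)]      # total represented homes
  cuts <- W * (0:k) / k    # decile cut points on the cumulative-weight scale
  vapply(seq_len(k), function(d) {
    # number of each household's homes that fall inside decile d
    share <- pmax(0, pmin(hi, cuts[d + 1]) - pmax(lo, cuts[d]))
    sum(income * share)
  }, numeric(1))
}

totals <- decile_totals(nal$income, nal$factor)
```

Here the factors sum to 4,700, so every cut point is a whole number and the totals match the sorted, expanded vector split into ten blocks of 470 homes, while only ever touching six rows.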

The ntile function from the dplyr package worked much better:

```r
three <- nal %>% select(income, factor)
five <- data.frame(income = rep(three$income, three$factor))
five$decil <- ntile(five$income, 10)
# the line above replaces the 'for' loop and takes seconds to run
```
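The remaining filter() loop can also be replaced with a grouped summary in the same dplyr pipeline. ntile() ranks by income value, so unlike the position-based loop it does not need the rows sorted first. A minimal sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical stand-in for `nal`; factors sum to 20 expanded homes
nal <- data.frame(income = c(23456, 42578, 62333, 15000, 31000),
                  factor = c(6, 5, 4, 3, 2))

five <- data.frame(income = as.numeric(rep(nal$income, nal$factor)))
five$decil <- ntile(five$income, 10)

# Replaces the second for loop: one grouped pass instead of ten filter() calls
totdecil <- five %>%
  group_by(decil) %>%
  summarise(inctot = sum(income), homes = n())
```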
