r - Subsetting with multiple conditions in very large data set -

- April 15, 2015

I have a matrix that is approximately 430 x 20,000. Each row is a person, and each column is a project they have worked on. Each cell has a value of either 0 (not involved), 1 (project head, 1 per project), or 2 (project helper). I am trying to look at the projects that a single person is the head of, and I want to look at 1 person at a time. For that person, I need R to drop the columns where that person's value isn't 1. I want to retain the data for the other individuals in those columns.

Example:

    name       project 1   project 2   ...   project 2,000
    person a   1           0           ...   2
    person b   0           1           ...   1
    person c   2           2           ...   2

I am trying to get it so that, for person b, R drops the columns that person b didn't head:

    name       project 2   ...   project 2,000
    person a   0           ...   2
    person b   1           ...   1
    person c   2           ...   2

Sorry if this is obvious. The reason I have struggled to find examples is that my data is so large (a.k.a. I can't type in the column names because there are so many). Any help is appreciated.

So you are trying to select columns of a dataframe based on the values in one of its rows. Using a dataframe similar to your example:

    > df
    #      name project1 project2 project2000
    #1 person a        1        0           2
    #2 person b        0        1           1
    #3 person c        2        2           2
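To follow along, a dataframe like the one printed above could be built as in the sketch below. The column names are taken from the printed output; the construction itself is an assumption, since the original does not show how `df` was created.

```r
# Small stand-in for the real 430 x 20,000 data (values copied from the example).
df <- data.frame(
  name        = c("person a", "person b", "person c"),
  project1    = c(1, 0, 2),
  project2    = c(0, 1, 2),
  project2000 = c(2, 1, 2),
  stringsAsFactors = FALSE
)
```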

In order to select the columns for, say, "person b", you need a logical vector indicating which columns to keep, i.e. a vector that has the same length as the number of columns in the dataframe, and has the value TRUE for the columns to include in the result, and FALSE otherwise.

You can almost do this with:

    > leadb <- df[2,]==1
    > leadb
    #   name project1 project2 project2000
    #2 FALSE    FALSE     TRUE        TRUE

which picks out the correct projects, but would drop the name column. To include that column, use:

    > leadb <- c(TRUE, df[2,-1]==1)
    > leadb
    #[1]  TRUE FALSE  TRUE  TRUE

Then use that vector to select the columns from the dataframe:

    > df_b <- df[,leadb]
    > df_b
    #      name project2 project2000
    #1 person a        0           2
    #2 person b        1           1
    #3 person c        2           2

Of course, you can do this in a single line, and since there is nothing special about the "person b" row, you can use a function that returns the desired dataframe for the person in row n:

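    leader_df <- function(n){
        # Always keep the name column (TRUE), then keep every project column
        # where the person in row n has the value 1 (project head).
        # drop = FALSE keeps a dataframe even if only the name column is left.
        df[, c(TRUE, df[n,-1]==1), drop = FALSE]
    }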

Then evaluating leader_df(n) for values of n from 1 to the number of rows will give the dataframes for each project leader.
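As a rough sketch of that last step, the function can be applied to every row with `lapply`. The example data, the `all_leaders` name, and the use of `setNames` to label the results are assumptions added for illustration. Since the question describes a matrix rather than a dataframe, the same logical-indexing idea is also shown on a matrix with row names, which is likewise an assumption about how the real data is stored.

```r
# Stand-in data copied from the example (the real data is about 430 x 20,000).
df <- data.frame(
  name        = c("person a", "person b", "person c"),
  project1    = c(1, 0, 2),
  project2    = c(0, 1, 2),
  project2000 = c(2, 1, 2),
  stringsAsFactors = FALSE
)

# Keep the name column plus every project headed by the person in row n.
leader_df <- function(n) {
  df[, c(TRUE, df[n, -1] == 1), drop = FALSE]
}

# One dataframe per person, labelled by the person's name.
all_leaders <- setNames(lapply(seq_len(nrow(df)), leader_df), df$name)

names(all_leaders[["person b"]])
# [1] "name"        "project2"    "project2000"

# Matrix variant: people as row names, projects as columns.
m <- as.matrix(df[, -1])
rownames(m) <- df$name
m_b <- m[, m["person b", ] == 1, drop = FALSE]

colnames(m_b)
# [1] "project2"    "project2000"
```

No column names are typed by hand in either version, so the approach should carry over to 20,000 columns unchanged. A person who heads no projects ends up with only the name column (or a zero-column matrix), which `drop = FALSE` preserves as a dataframe or matrix instead of collapsing it to a vector.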
