R - Subsetting with multiple conditions in a very large data set
I have a matrix of approximately 430 x 20,000. Each row is a person, and each column is a project they may have worked on. Each cell has a value of 0 (not involved), 1 (project head, 1 per project), or 2 (project helper). I am trying to look at the projects that a single person is head of, one person at a time. For each person, I need R to drop the columns where that person's value isn't 1. I want to retain the data for the other individuals in those columns.
Ex:

    Name      Project 1  Project 2  ......  Project 2,000
    Person A      1          0                    2
    Person B      0          1                    1
    Person C      2          2                    2
If I am looking at Person B, R should drop the columns for the projects they didn't head:

    Name      Project 2  ......  Project 2,000
    Person A      0                    2
    Person B      1                    1
    Person C      2                    2

Sorry if this is obvious. The reason I have struggled to find examples is that my data is so large (i.e. I can't type in the column names because there are too many). Any help is appreciated.
So you are trying to select columns of a data frame based on the values in one of its rows. Using a data frame similar to your example:

    > df
    #      name project1 project2 project2000
    #1 Person A        1        0           2
    #2 Person B        0        1           1
    #3 Person C        2        2           2

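For anyone who wants to follow along, a data frame like the one printed above can be built as follows. This is only a minimal sketch: the column names and values are copied from the example, and `stringsAsFactors = FALSE` is there just so the names stay character strings on older versions of R.

```r
# Toy data frame mirroring the printed example (3 people, 3 projects).
df <- data.frame(
  name        = c("Person A", "Person B", "Person C"),
  project1    = c(1, 0, 2),
  project2    = c(0, 1, 2),
  project2000 = c(2, 1, 2),
  stringsAsFactors = FALSE
)

# Quick sanity checks on the shape and the column names.
dim(df)
names(df)
```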
In order to select the columns for, say, "Person B", you need a logical vector indicating which columns to keep, i.e. a vector whose length is the same as the number of columns in the data frame, and which has the value TRUE for columns to include in the result, and FALSE otherwise.
You can almost do this with:

    > leadb <- df[2,]==1
    #   name project1 project2 project2000
    #2 FALSE    FALSE     TRUE        TRUE

which picks out the correct projects, but would drop the name column. To include that column, use:

    > leadb <- c(TRUE, df[2,-1]==1)
    #[1]  TRUE FALSE  TRUE  TRUE

Then use this vector to select the columns of the data frame:

    > df_b <- df[,leadb]
    #      name project2 project2000
    #1 Person A        0           2
    #2 Person B        1           1
    #3 Person C        2           2

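The question describes a matrix rather than a data frame, and the same idea should carry over. The sketch below assumes the people's names are stored as row names of a numeric matrix; the object `m` and its dimnames are made up for illustration. `drop = FALSE` keeps the result a matrix even when only one column survives.

```r
# Hypothetical numeric matrix: rows are people, columns are projects.
m <- matrix(
  c(1, 0, 2,
    0, 1, 1,
    2, 2, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Person A", "Person B", "Person C"),
                  c("project1", "project2", "project2000"))
)

# Logical vector: TRUE where Person B is the project head.
lead_b <- m["Person B", ] == 1

# Keep only those columns, for every person.
m_b <- m[, lead_b, drop = FALSE]
colnames(m_b)
```

Because the names live in the row names here, there is no name column to protect, so the extra leading TRUE is not needed.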
Of course, you can do this in a single line, and since there is nothing special about the "Person B" row, you can use a function that returns the desired data frame for the person in row n:

    leader_df <- function(n){
      df[,c(TRUE, df[n,-1]==1)]
    }

Then evaluating leader_df(n) for values of n from 1 to the number of rows will give the data frames for each project leader.
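As a sketch of that last step, the loop over rows can be written with `lapply`. The version below is a variation, not the function as given above: it takes the data frame as an argument and adds `drop = FALSE`, so that a person who heads no projects still yields a one-column data frame rather than a bare vector. The names `leader_df2`, `toy` and `leaders` are hypothetical.

```r
# Toy data in the same layout as the example: a name column, then projects.
toy <- data.frame(
  name        = c("Person A", "Person B", "Person C"),
  project1    = c(1, 0, 2),
  project2    = c(0, 1, 2),
  project2000 = c(2, 1, 2),
  stringsAsFactors = FALSE
)

# Variant of leader_df that takes the data frame explicitly.
leader_df2 <- function(d, n) {
  keep <- c(TRUE, d[n, -1] == 1)   # always keep the name column
  d[, keep, drop = FALSE]          # stay a data frame even with one column
}

# One data frame per person, labelled by that person's name.
leaders <- lapply(seq_len(nrow(toy)), function(n) leader_df2(toy, n))
names(leaders) <- toy$name

names(leaders[["Person B"]])
names(leaders[["Person C"]])
```

With 430 rows this produces a list of 430 data frames, which can then be inspected one person at a time by name.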