R - Counting Frequencies Using (Logical?) Expressions


I have been teaching myself R from scratch, so please bear with me. I have found multiple ways to count observations; however, I am trying to figure out how to count frequencies using (logical?) expressions. I have a massive set of data, approximately 1 million observations. The df is set up like so:

    latitude    longitude   id           year  month  day  value
    66.16667    -10.16667   cpuele25399  1979  1      7    0
    66.16667    -10.16667   cpuele25399  1979  1      8    0
    66.16667    -10.16667   cpuele25399  1979  1      9    0
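For experimenting on a small scale, a reproducible copy of the three sample rows can be built with data.frame(). The column names and values are copied from the sample above; stringsAsFactors = FALSE is set so that id stays a character column. Running str() on the real data is a quick way to check whether value was read in as numeric or as a factor, which matters for the warning described further down.

```r
# Reproducible copy of the three sample rows from the question
total <- data.frame(
  latitude  = rep(66.16667, 3),
  longitude = rep(-10.16667, 3),
  id        = rep("cpuele25399", 3),
  year      = rep(1979L, 3),
  month     = rep(1L, 3),
  day       = 7:9,
  value     = c(0, 0, 0),
  stringsAsFactors = FALSE
)
str(total)  # value should be listed as num, not Factor
```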

There are 154 unique IDs and 154 unique lat/long pairs. I am focusing on the top 1% of values for each unique ID. For each unique ID I have calculated the 99th percentile using its associated values. I went further and calculated each ID's 99th percentile for individual years and months, i.e. for cpuele25399, 1979, month = 1, the 99th percentile value is 3 (3 being the floor of the top 1%).
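As a base-R sketch (a swapped-in technique, not the asker's code), aggregate() can compute both kinds of threshold described above: one 99th percentile per id, and one per (id, year, month). The toy data are hypothetical: one id with 31 daily values in each of two months. quantile() uses its default type 7 interpolation here, so thresholds can be fractional rather than floored.

```r
# Hypothetical toy data: one id, 31 daily values in each of two months
total <- data.frame(
  id    = "cpuele25399",
  year  = 1979L,
  month = rep(1:2, each = 31),
  value = c(0:30, 2 * (0:30)),
  stringsAsFactors = FALSE
)
# One 99th percentile per id, using all of that id's values
id_thr <- aggregate(value ~ id, data = total,
                    FUN = quantile, probs = 0.99, na.rm = TRUE)
# One 99th percentile per (id, year, month) group
month_thr <- aggregate(value ~ id + year + month, data = total,
                       FUN = quantile, probs = 0.99, na.rm = TRUE)
print(month_thr)
```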

Using these threshold values: for each ID, for each year, for each month, I need to count the number of times (per month, per year) that the value is >= that ID's 99th percentile.
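A minimal base-R sketch of that count, assuming the threshold is the 99th percentile of each (id, year, month) group: ave() puts each group's threshold on every row, and aggregate() then sums the logical comparison, since TRUE counts as 1. The data are hypothetical, and ave() and aggregate() are swapped in for illustration rather than taken from the question.

```r
# Hypothetical toy data: one id, 31 daily values in each of two months
total <- data.frame(
  id    = "cpuele25399",
  year  = 1979L,
  month = rep(1:2, each = 31),
  value = c(0:30, 2 * (0:30)),
  stringsAsFactors = FALSE
)
# Give every row the 99th percentile of its own (id, year, month) group
total$threshold <- ave(total$value, total$id, total$year, total$month,
                       FUN = function(v) quantile(v, probs = 0.99, na.rm = TRUE))
# Count, per (id, year, month), how many values reach that threshold
counts <- aggregate(list(freq = total$value >= total$threshold),
                    by = total[c("id", "year", "month")], FUN = sum)
print(counts)
```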

I have tried at least 100 different approaches, but I think I am fundamentally misunderstanding something, maybe in the syntax? This snippet of code has gotten me the farthest:

    ddply(total, c('latitude', 'longitude', 'id', 'year', 'month'),
          function(x) c(threshold = quantile(x$value, probs = .99, na.rm = TRUE),
                        frequency = nrow(x$value >= quantile(x$value, probs = .99, na.rm = TRUE))))

R throws a warning message saying that '>=' is not meaningful for factors. If anyone out there understands this convoluted message, I would be supremely grateful for the help.
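That warning is what R emits when a comparison such as >= is applied to a factor; the result is NA for every element. A small sketch, assuming (hypothetically) that value was imported as a factor: converting through as.character() recovers the numbers, whereas as.numeric() on the factor directly would return the internal level codes. It also shows that nrow() of a logical vector is NULL, so sum() is the function that counts TRUEs.

```r
# A numeric column that was read in as a factor (hypothetical example)
value <- factor(c("0", "3", "7"))
res <- suppressWarnings(value >= 3)   # '>=' is not defined for factors
print(res)                            # NA NA NA
# Convert via the labels; as.numeric(value) alone would give codes 1, 2, 3
num <- as.numeric(as.character(value))
print(num >= 3)                       # FALSE TRUE TRUE
# nrow() of a plain logical vector is NULL; sum() counts the TRUE values
print(nrow(num >= 3))                 # NULL
print(sum(num >= 3))                  # 2
```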

"Using these threshold values: for each ID, for each year, for each month, I need to count the number of times (per month, per year) that the value is >= that ID's 99th percentile."

Does this mean you want to:

  1. calculate the 99th percentile for each ID (i.e. disregarding month, year, etc.), and then
  2. work out the number of times you exceed this value, split by month and year as well as by ID?

(Note: your example code groups by lat/lon, but this is not mentioned in your question, so I am ignoring it. If you wish to add it in, just add it as a grouping variable in the appropriate places.)

In that case, you can use ddply to calculate the per-ID percentile first:

    library(plyr)

    # calculate the percentile for each id
    total <- ddply(total, .(id), transform, threshold = quantile(value, probs = .99, na.rm = TRUE))
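For comparison, the same per-id column can be added in base R with ave(), which returns a vector as long as its input and so behaves like the transform step. This is a swapped-in base-R technique rather than plyr, and the two-id data are hypothetical.

```r
# Hypothetical toy data: two ids with 100 values each
total <- data.frame(
  id    = rep(c("a", "b"), each = 100),
  value = as.numeric(1:200),
  stringsAsFactors = FALSE
)
# Per-id 99th percentile, repeated on every row of that id
total$threshold <- ave(total$value, total$id,
                       FUN = function(v) quantile(v, probs = 0.99, na.rm = TRUE))
head(total)
```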

And then you can group by (id, month, year) to see how many times you exceed it:

    # na.rm = TRUE stops a single missing value from turning a group's count into NA
    total <- ddply(total, .(id, month, year), summarize, freq = sum(value >= threshold, na.rm = TRUE))
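A base-R sketch of the counting step, assuming the data already carry a per-id threshold column (the rows below are hypothetical, including one NA value): aggregate() sums the logical comparison per (id, year, month), and na.rm = TRUE is passed through to sum() so missing values are skipped.

```r
# Hypothetical toy data that already carries a per-id threshold column
total <- data.frame(
  id        = c("a", "a", "a", "a", "b", "b"),
  year      = c(1979L, 1979L, 1979L, 1980L, 1979L, 1979L),
  month     = c(1L, 1L, 2L, 1L, 1L, 1L),
  value     = c(5, 9, 9, 1, 3, NA),
  threshold = c(8, 8, 8, 8, 2, 2),
  stringsAsFactors = FALSE
)
# Count exceedances per (id, year, month); na.rm = TRUE skips missing values
counts <- aggregate(list(freq = total$value >= total$threshold),
                    by = total[c("id", "year", "month")],
                    FUN = sum, na.rm = TRUE)
print(counts)
```

As with summarize, the result has one row per (id, year, month) combination and drops the other columns.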

Note that summarize will return a data frame with as many rows as there are unique combinations of .(id, month, year), i.e. it will drop the latitude/longitude columns. If you want to keep them, use transform instead of summarize, and freq will be repeated on every row of each (id, month, year) combination.
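The transform-style alternative, which keeps every row, can be sketched in base R with ave(): each row receives the count for its own (id, year, month) group. The data are hypothetical and already include a threshold column; ave() is a swapped-in base-R function, not part of plyr.

```r
# Hypothetical toy data with a per-id threshold already attached
total <- data.frame(
  id        = c("a", "a", "a", "b"),
  year      = 1979L,
  month     = c(1L, 1L, 2L, 1L),
  value     = c(9, 9, 5, 3),
  threshold = c(8, 8, 8, 2),
  stringsAsFactors = FALSE
)
# Keep every row and repeat the group's count on each one
total$freq <- ave(as.numeric(total$value >= total$threshold),
                  total$id, total$year, total$month, FUN = sum)
print(total)
```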


Notes on ddply:

  • you can write .(id, month, year) rather than c('id', 'month', 'year') as you have done
  • if you just want to add columns, using summarize, mutate or transform lets you do this slickly without needing total$ in front of the column names.
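The transform passed to ddply is base R's transform(), which evaluates column names inside the data frame; with() offers the same convenience for a one-off calculation. A small comparison on hypothetical data:

```r
# Hypothetical toy data
total <- data.frame(value = c(1, 5, 9), threshold = c(4, 4, 4))
# Explicit indexing: every column needs the total$ prefix
total$above <- total$value >= total$threshold
# base::transform() evaluates the expression inside the data frame
total2 <- transform(total, above2 = value >= threshold)
# with() does the same for a one-off calculation
n_above <- with(total, sum(value >= threshold))
print(n_above)  # 2
```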
