parsing - Python for creating a set operations Calculator -


I'm trying to create a calculator, not for numbers, but for set operations. To illustrate the concept, let's say we have a file with 2 columns.

    keyword, userid
    hello  , john
    hello  , alice
    world  , alice
    world  , john
    mars   , john
    pluto  , dave

The goal is to read in expressions like

[hello] 

and return the set of users that have that keyword. For example:

    [hello]           -> ['john','alice']
    [world] - [mars]  -> ['alice']               // - here is the difference operation
    [world] * [mars]  -> ['john']                // * here is the intersection operation
    [world] + [pluto] -> ['john','alice','dave'] // + here is the union operation
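The three operators map closely onto Python's built-in set type. Here is a minimal sketch, assuming the rows are already in memory as (keyword, userid) pairs; the names `rows` and `index` are illustrative and not part of the original post. Note that Python spells intersection as `&` and union as `|`, not `*` and `+`.

```python
from collections import defaultdict

# Sample rows from the file above, as (keyword, userid) pairs.
rows = [
    ("hello", "john"), ("hello", "alice"),
    ("world", "alice"), ("world", "john"),
    ("mars", "john"), ("pluto", "dave"),
]

# Map each keyword to the set of users that have it.
index = defaultdict(set)
for keyword, userid in rows:
    index[keyword].add(userid)

difference = index["world"] - index["mars"]     # users with world but not mars
intersection = index["world"] & index["mars"]   # users with both keywords
union = index["world"] | index["pluto"]         # users with either keyword

# Sets are unordered, so sort before printing for a stable display.
print(sorted(difference), sorted(intersection), sorted(union))
```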

I used the plyplus module in Python to generate the following grammar to parse my requirement. The grammar is shown below:

    grammar("""
     start: tprog ;
     @tprog: atom | expr u_symbol expr | expr i_symbol expr | expr d_symbol | expr | '\[' tprog '\]';
     expr:   atom | '\[' tprog '\]';
     @atom: '\[' queryterm '\]' ;
     u_symbol: '\+' ;
     i_symbol: '\*' ;
     d_symbol: '\-' ;
     queryterm: '[\w ]+' ;

     ws: '[ \t]+' (%ignore);
     """)

However, I'm not able to find any links on the web that take the parsed output to the next level, where I can evaluate the parsed output step by step. I understand that I need to walk a syntax tree of some sort and define functions to apply to each node and its children recursively. Any help is appreciated.
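The "build a tree, then evaluate each node and its children recursively" idea can be sketched without plyplus. The example below is not the plyplus API: it swaps in a small hand-written recursive-descent parser so that it stays self-contained. It assumes all three operators have equal precedence and associate left to right, that an extra pair of brackets groups a sub-expression, and that the data lives in a hypothetical in-memory `INDEX` dictionary.

```python
import re

# Hypothetical index: keyword -> set of user ids (sample data from the question).
INDEX = {
    "hello": {"john", "alice"},
    "world": {"alice", "john"},
    "mars": {"john"},
    "pluto": {"dave"},
}

SYMBOLS = {"[", "]", "+", "*", "-"}
TOKEN = re.compile(r"\s*(?:([\[\]+*-])|(\w[\w ]*))")


def tokenize(text):
    """Split the query into bracket, operator and keyword tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((match.group(1) or match.group(2)).strip())
        pos = match.end()
    return tokens


def parse(tokens):
    """Build a tree of tuples: ('kw', name) or (operator, left, right)."""
    pos = 0

    def take(expected=None):
        nonlocal pos
        if pos >= len(tokens) or (expected and tokens[pos] != expected):
            raise SyntaxError(f"unexpected token at position {pos}")
        pos += 1
        return tokens[pos - 1]

    def operand():
        take("[")
        if pos < len(tokens) and tokens[pos] == "[":
            node = expression()          # '[' expression ']' groups a sub-expression
        else:
            name = take()                # '[' keyword ']'
            if name in SYMBOLS:
                raise SyntaxError(f"expected a keyword, got {name!r}")
            node = ("kw", name)
        take("]")
        return node

    def expression():
        node = operand()
        while pos < len(tokens) and tokens[pos] in ("+", "*", "-"):
            operator = take()
            node = (operator, node, operand())
        return node

    tree = expression()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return tree


OPERATIONS = {"+": set.union, "*": set.intersection, "-": set.difference}


def evaluate(node):
    """Evaluate a tree recursively: leaves look up users, inner nodes combine sets."""
    if node[0] == "kw":
        return set(INDEX.get(node[1], ()))
    operator, left, right = node
    return OPERATIONS[operator](evaluate(left), evaluate(right))


def query(text):
    return evaluate(parse(tokenize(text)))


print(sorted(query("[world] - [mars]")))
```

With a plyplus parse tree the evaluation step would have the same shape: a function that checks what kind of node it was given, recurses into the children, and combines their results.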

I am a beginner who tried to solve your question, so excuse me if my answer has mistakes. I suggest using pandas, and I think it works best in this case.

First, save the data in a CSV file.

Then import what is needed from pandas:

    from pandas import read_csv

The next line reads the file and turns it into a DataFrame:

    x = read_csv('data.csv')
    print(x)

The result:

      keyword  userid
    0  hello      john
    1  hello     alice
    2  world     alice
    3  world      john
    4  mars       john
    5  pluto      dave

In the next line, we filter the DataFrame and assign the result to a new variable:

    y = x[x['keyword'].str.contains("hello")]

where keyword is the column of interest and hello is what we are searching for. The result:

      keyword  userid
    0  hello      john
    1  hello     alice

We are interested in the second column, so we use indexing to take it and save it in a new variable:

    z = y.iloc[:, 1]
    print(z)

The result:

    0      john
    1     alice

Now the last step is turning the selected column into a list using:

    my_list = z.tolist()
    print(my_list)

The result:

[' john', ' alice'] 
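The leading spaces in the names come from the spaces after the commas in the file. As far as I know, pandas' `read_csv` accepts `skipinitialspace=True` for this, and the values can also be stripped by hand. Below is a rough sketch of the same lookup using only the standard library `csv` module; the inline `raw` string is a stand-in for data.csv, not the real file.

```python
import csv
import io

# Stand-in for data.csv, with the same spacing as the sample in the question.
raw = """keyword, userid
hello  , john
hello  , alice
world  , alice
world  , john
mars   , john
pluto  , dave
"""

reader = csv.reader(io.StringIO(raw))
header = [name.strip() for name in next(reader)]

# Strip the padding around every field so the comparisons below are exact.
rows = [[field.strip() for field in row] for row in reader if row]

hello_users = [userid for keyword, userid in rows if keyword == "hello"]
print(header)
print(hello_users)
```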

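I think you can achieve the functionality you require by manipulating the resulting lists.

Update: I tried to solve the case where you have "or". The pattern passed to str.contains is a regular expression, so "|" means "either", and the code becomes this:

    from pandas import read_csv

    x = read_csv('data.csv')
    print(x)
    y = x[x['keyword'].str.contains("hello|pluto")]
    print(y)

    z = y.iloc[:, 1]
    print(z)
    my_list = z.tolist()
    print(my_list)

The result: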

[' john', ' alice', ' dave'] 

Update 2: I found a solution for the "-" and "*" cases. First, use the same code for both words:

    from pandas import read_csv

    x = read_csv('data.csv')
    print(x)
    y = x[x['keyword'].str.contains("world")]
    print(y)

    z = y.iloc[:, 1]
    print(z)
    my_list = z.tolist()
    print(my_list)

    s = x[x['keyword'].str.contains("mars")]
    print(s)

    g = s.iloc[:, 1]
    print(g)
    my_list2 = g.tolist()
    print(my_list2)

Then add a loop to subtract the two lists:

    # Iterate over a copy so that removing items does not skip elements.
    for i in my_list[:]:
        if i in my_list2:
            my_list.remove(i)
    print(my_list)

The result:

[' alice'] 

And for intersection, change the last bit:

    # Iterate over a copy so that removing items does not skip elements.
    for i in my_list[:]:
        if i not in my_list2:
            my_list.remove(i)
    print(my_list)

The result:

[' john'] 
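Removing items from a list while looping over that same list can skip elements, so an alternative is to build new lists instead. This is only a sketch: the two input lists are typed in by hand here to match the values printed above, and the names `lookup`, `difference` and `intersection` are mine.

```python
my_list = [' alice', ' john']   # users with "world", as printed earlier
my_list2 = [' john']            # users with "mars"

# A set makes the membership test cheap; the original order of my_list is kept.
lookup = set(my_list2)

difference = [user for user in my_list if user not in lookup]
intersection = [user for user in my_list if user in lookup]

print(difference)
print(intersection)
```

If the order of the result does not matter, converting both lists with `set(...)` and using `-` and `&` gives the same members.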
