parsing - Python for creating a set operations calculator
I'm trying to create a calculator, not for numbers but for set operations. To illustrate the concept, let's say we have a file with two columns.
keyword, userid
hello , john
hello , alice
world , alice
world , john
mars , john
pluto , dave
The goal is to read in expressions like
[hello]
and return the set of users who have that keyword. For example:
[hello] -> ['john','alice']
[world] - [mars] -> ['alice'] // - here is the difference operation
[world] * [mars] -> ['john'] // * here is the intersection operation
[world] + [pluto] -> ['john','alice','dave'] // + here is the union operation
I used the plyplus module in Python to generate the following grammar to parse this requirement. The grammar is shown below:
Grammar("""
start: tprog ;
@tprog: atom | expr u_symbol expr | expr i_symbol expr | expr d_symbol | expr | '\[' tprog '\]';
expr: atom | '\[' tprog '\]';
@atom: '\[' queryterm '\]' ;
u_symbol: '\+' ;
i_symbol: '\*' ;
d_symbol: '\-' ;
queryterm: '[\w ]+' ;
ws: '[ \t]+' (%ignore);
""")
However, I'm not able to find any links on the web that take this parsed output to the next level, where I can evaluate the parsed output step by step. I understand that I need to walk a syntax tree of some sort and define functions to apply to each node and its children recursively. Any help is appreciated.
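To make the "apply a function to each node and its children recursively" idea concrete, here is a minimal sketch. It does not use plyplus: it swaps in a small hand-written recursive-descent parser so the example stays self-contained. The names INDEX, tokenize, parse, evaluate and query are made up for illustration, the index is assumed to be already loaded from the file, and operators are assumed to associate left to right.

```python
import re

# Hypothetical index built from the example file: keyword -> set of user ids.
INDEX = {
    "hello": {"john", "alice"},
    "world": {"alice", "john"},
    "mars": {"john"},
    "pluto": {"dave"},
}

# A token is a bracket, an operator, or a keyword made of word chars and spaces.
TOKEN = re.compile(r"\s*(\[|\]|\+|\*|-|\w[\w ]*)")
OPS = {"+": set.union, "*": set.intersection, "-": set.difference}


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1).strip())
        pos = match.end()
    return tokens


def parse(tokens):
    """Build a nested-tuple syntax tree from the token list."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, got {token!r}")
        pos += 1
        return token

    def term():
        take("[")
        if peek() == "[":              # brackets used for grouping
            node = expr()
        else:                          # brackets around a keyword
            word = take()
            if word in OPS or word == "]":
                raise SyntaxError(f"expected a keyword, got {word!r}")
            node = ("keyword", word)
        take("]")
        return node

    def expr():
        node = term()
        while peek() in OPS:           # fold operators left to right
            op = take()
            node = (op, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


def evaluate(node, index):
    """Recursively evaluate a tree node to a set of user ids."""
    if node[0] == "keyword":
        return set(index.get(node[1], set()))
    op, left, right = node
    return OPS[op](evaluate(left, index), evaluate(right, index))


def query(text, index=INDEX):
    return evaluate(parse(tokenize(text)), index)


print(sorted(query("[world] - [mars]")))
print(sorted(query("[[world] - [mars]] + [pluto]")))
```

With a plyplus parse tree the evaluation step would have the same shape: one case for a keyword leaf, and one case per operator that first evaluates both children.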
I am a beginner and tried to solve this question, so excuse me if my answer has mistakes. I suggest using pandas; I think it works best in this case.
First, save the data in a CSV file.
Then:
from pandas import *
The next line reads the file and turns it into a DataFrame:
x=read_csv('data.csv')
print(x)
The result:
   keyword  userid
0    hello    john
1    hello   alice
2    world   alice
3    world    john
4     mars    john
5    pluto    dave
In the next line, we filter the DataFrame and assign the result to a new variable:
y= x[x['keyword'].str.contains("hello")]
where keyword is the column of interest and hello is what we are searching for. The result:
   keyword  userid
0    hello    john
1    hello   alice
We are only interested in the second column, so we use indexing to take it and save it in a new variable:
z=y.iloc[:,1]
print(z)
The result:
0     john
1    alice
Now the last step is turning the result into a list using:
my_list = z.tolist()
print(my_list)
The result:
[' john', ' alice']
I think you can achieve the functionality you require by manipulating the resulting lists.
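One way to manipulate the resulting lists is to turn them into Python sets, which already support the three operations from the question. This is only a sketch: the helper name to_user_set is made up, and the input lists are typed in by hand to mirror the tolist() output. The leading spaces come from the space after each comma in the file, so they are stripped first.

```python
def to_user_set(names):
    """Strip stray whitespace and drop duplicates by converting to a set."""
    return {name.strip() for name in names}


# Lists shaped like the tolist() output for each keyword (note the leading spaces).
world = to_user_set([" alice", " john"])
mars = to_user_set([" john"])
pluto = to_user_set([" dave"])

difference = world - mars      # users with "world" but not "mars"
intersection = world & mars    # users with both keywords
union = world | pluto          # users with either keyword

print(sorted(difference))      # ['alice']
print(sorted(intersection))    # ['john']
print(sorted(union))           # ['alice', 'dave', 'john']
```

Sets are unordered, so the results are sorted here only to get a stable printout.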
Update: I tried to solve the case where you have "or"; the code becomes this:
from pandas import *
x=read_csv('data.csv')
print(x)
y= x[x['keyword'].str.contains("hello|pluto")]
print(y)
z=y.iloc[:,1]
print(z)
my_list = z.tolist()
print(my_list)
The result:
[' john', ' alice', ' dave']
Update 2: I found a solution for the "-" and "*" cases. First, use the same code for both words:
from pandas import *
x=read_csv('data.csv')
print(x)
y= x[x['keyword'].str.contains("world")]
print(y)
z=y.iloc[:,1]
print(z)
my_list = z.tolist()
print(my_list)
s= x[x['keyword'].str.contains("mars")]
print(s)
g=s.iloc[:,1]
print(g)
my_list2 = g.tolist()
print(my_list2)
Then add a loop to subtract the two lists:
# iterate over a copy so that removing items does not skip elements
for i in my_list[:]:
    if i in my_list2:
        my_list.remove(i)
print(my_list)
The result:
[' alice']
And for intersection, change the last bit:
# iterate over a copy so that removing items does not skip elements
for i in my_list[:]:
    if i not in my_list2:
        my_list.remove(i)
print(my_list)
The result:
[' john']
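Removing items from a list while looping over that same list can skip elements, so a variant that builds a new list instead may be safer. A minimal sketch follows; the helper names subtract and intersect and the extra user ' bob' are made up for illustration and are not part of the data above.

```python
def subtract(left, right):
    """Items of left that are not in right, in their original order."""
    exclude = set(right)
    return [item for item in left if item not in exclude]


def intersect(left, right):
    """Items of left that are also in right, in their original order."""
    keep = set(right)
    return [item for item in left if item in keep]


my_list = [" alice", " john", " bob"]   # " bob" is a hypothetical extra user
my_list2 = [" alice", " john"]

# Removing from the list being iterated skips the element after each removal.
naive = list(my_list)
for i in naive:
    if i in my_list2:
        naive.remove(i)

print(naive)                          # [' john', ' bob'], ' john' was skipped
print(subtract(my_list, my_list2))    # [' bob']
print(intersect(my_list, my_list2))   # [' alice', ' john']
```

Both helpers leave their inputs unchanged, so the same lists can be reused for several operations.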