parsing - Python for creating a set operations Calculator -

- February 15, 2014

I'm trying to create a calculator, not for numbers but for set operations. To illustrate the concept, let's say we have a file with two columns.

keyword, userid
hello  , john
hello  , alice
world  , alice
world  , john
mars   , john
pluto  , dave

The goal is to read in expressions like

[hello]

and return the set of users that have that keyword. For example:

[hello]           -> ['john','alice']
[world] - [mars]  -> ['alice']               // - here is the difference operation
[world] * [mars]  -> ['john']                // * here is the intersection operation
[world] + [pluto] -> ['john','alice','dave'] // + here is the union operation

I used the plyplus module in Python to write the following grammar to parse this requirement. The grammar is shown below:

Grammar("""
start: tprog ;
@tprog: atom | expr u_symbol expr | expr i_symbol expr | expr d_symbol expr | '\[' tprog '\]';
expr:   atom | '\[' tprog '\]';
@atom: '\[' queryterm '\]' ;
u_symbol: '\+' ;
i_symbol: '\*' ;
d_symbol: '\-' ;
queryterm: '[\w ]+' ;
WS: '[ \t]+' (%ignore);
""")

However, I'm not able to find any links on the web that take this parsed output to the next level, where I can evaluate it step by step. I understand that I need to walk a syntax tree of some sort and define functions to apply to each node and its children recursively. Any help is appreciated.
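To make the "apply a function to each node and its children recursively" idea concrete, here is a minimal sketch. It does not use plyplus: it is a small hand-written recursive-descent evaluator in plain Python, so the parsing and the evaluation happen in the same recursive functions. The names `INDEX`, `tokenize` and `evaluate` are made up for illustration, the data is hardcoded from the sample file, and treating all three operators as equal precedence, evaluated left to right, is an assumption the question does not settle.

```python
import re

# Sample data from the question: keyword -> set of user ids (hardcoded here).
INDEX = {
    "hello": {"john", "alice"},
    "world": {"alice", "john"},
    "mars": {"john"},
    "pluto": {"dave"},
}

# Operator symbols from the question mapped to Python set operations.
OPERATORS = {
    "+": set.union,
    "*": set.intersection,
    "-": set.difference,
}


def tokenize(text):
    """Split an expression into brackets, operators and keywords.

    Characters that match none of these are silently skipped in this sketch.
    """
    raw = re.findall(r"[\[\]+*-]|[\w ]+", text)
    return [tok.strip() for tok in raw if tok.strip()]


def evaluate(text, index=INDEX):
    """Evaluate an expression such as '[world] - [mars]' to a set of users."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def operand():
        # Either an atom like [hello] or a bracketed sub-expression.
        take("[")
        if peek() == "[":
            result = expression()
        else:
            word = take()
            if word in OPERATORS or word in ("[", "]"):
                raise SyntaxError(f"expected a keyword, got {word!r}")
            result = set(index.get(word, set()))
        take("]")
        return result

    def expression():
        # Assumption: equal precedence, evaluated left to right.
        result = operand()
        while peek() in OPERATORS:
            operation = OPERATORS[take()]
            result = operation(result, operand())
        return result

    result = expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(sorted(evaluate("[world] - [mars]")))   # ['alice']
print(sorted(evaluate("[world] + [pluto]")))  # ['alice', 'dave', 'john']
```

The same shape carries over to a tree produced by a parser generator: each rule gets one function that evaluates its children first and then combines their sets.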

I am a beginner and tried to solve your question, so excuse me if my answer has mistakes. I suggest using pandas; I think it works best in this case.

First, save the data in a CSV file.

Then:

from pandas import read_csv

The next line reads the file and turns it into a DataFrame:

x = read_csv('data.csv')
print(x)

The result:

  keyword  userid
0  hello      john
1  hello     alice
2  world     alice
3  world      john
4  mars       john
5  pluto      dave

In the next line, filter the DataFrame and assign the result to a new variable:

y = x[x['keyword'].str.contains("hello")]

where keyword is the column of interest and hello is the value we are searching for. The result:

  keyword  userid
0  hello      john
1  hello     alice

We are interested in the second column, so use indexing to take it and save it in a new variable:

z = y.iloc[:, 1]
print(z)

The result:

0      john
1     alice

Now the last step is turning the selected column into a list using:

my_list = z.tolist()
print(my_list)

The result:

[' john', ' alice']

I think you can achieve the functionality you require by manipulating the resulting lists.
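The lists above carry a leading space on every name because the sample file pads its columns around the commas. As a rough alternative sketch, the same lookup can be built once as a keyword-to-users mapping with the padding stripped. This version uses the standard-library csv module rather than pandas, and `SAMPLE` and `build_index` are hypothetical names introduced only for this example.

```python
import csv
import io
from collections import defaultdict

# Same rows as the sample file in the question, including the padding.
SAMPLE = """keyword, userid
hello  , john
hello  , alice
world  , alice
world  , john
mars   , john
pluto  , dave
"""


def build_index(lines):
    """Map each keyword to the set of user ids that have it."""
    index = defaultdict(set)
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    for row in reader:
        if len(row) != 2:
            continue  # ignore blank or malformed rows
        keyword, userid = (cell.strip() for cell in row)
        index[keyword].add(userid)
    return dict(index)


index = build_index(io.StringIO(SAMPLE))
print(sorted(index["hello"]))  # ['alice', 'john']
```

Storing sets instead of lists also removes duplicates, which matters once the results are combined with union, intersection and difference.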

Update: I tried to solve the case where you have "or". The code becomes this:

from pandas import read_csv
x = read_csv('data.csv')
print(x)
# "hello|pluto" is a regular expression that matches either keyword
y = x[x['keyword'].str.contains("hello|pluto")]
print(y)
z = y.iloc[:, 1]
print(z)
my_list = z.tolist()
print(my_list)

The result:

[' john', ' alice', ' dave']

Update 2: I found a solution for the "-" and "*" cases. First, use the same code for both words:

from pandas import read_csv
x = read_csv('data.csv')
print(x)
# users that have the keyword "world"
y = x[x['keyword'].str.contains("world")]
print(y)
z = y.iloc[:, 1]
print(z)
my_list = z.tolist()
print(my_list)
# users that have the keyword "mars"
s = x[x['keyword'].str.contains("mars")]
print(s)
g = s.iloc[:, 1]
print(g)
my_list2 = g.tolist()
print(my_list2)

Then add a loop to subtract the two lists:

# iterate over a copy so that removing items does not skip elements
for i in my_list[:]:
    if i in my_list2:
        my_list.remove(i)
print(my_list)

The result:

[' alice']

And for intersection, change the last bit:

# iterate over a copy so that removing items does not skip elements
for i in my_list[:]:
    if i not in my_list2:
        my_list.remove(i)
print(my_list)

The result:

[' john']
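Removing items from a list while looping over that same list can skip elements, so a safer way to express these operations is with Python's built-in set type. The sketch below assumes the lists look like the ones printed above (the values are hardcoded here for illustration, and `as_set` is a made-up helper); it strips the leading spaces and then uses the `-`, `&` and `|` set operators.

```python
# Lists shaped like the ones produced above (hardcoded for illustration).
world_users = [" alice", " john"]
mars_users = [" john"]
pluto_users = [" dave"]


def as_set(users):
    """Strip padding from each name and drop duplicates."""
    return {user.strip() for user in users}


world = as_set(world_users)
mars = as_set(mars_users)
pluto = as_set(pluto_users)

difference = world - mars     # users with "world" but not "mars"
intersection = world & mars   # users with both "world" and "mars"
union = world | pluto         # users with "world" or "pluto"

print(sorted(difference))    # ['alice']
print(sorted(intersection))  # ['john']
print(sorted(union))         # ['alice', 'dave', 'john']
```

Unlike the loops, these operators return new sets and leave the inputs unchanged, so the same `world` set can be reused for several expressions.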
