Neo4j - Traversing through all nodes and comparing each one with every other one


I'm working on a small project and have a dataset of 60k nodes with 500k relationships between them. The nodes are of two types: the first type are recipes, the second type are ingredients. Recipes are composed of ingredients like this:

    (ingredient)-[:is_part_of]->(recipe) 

My objective is to find how many common ingredients two recipes share. I have managed to obtain this information with the following query, which compares one recipe with all the others (the first one against the rest):

    MATCH (recipe:recipe {id: 1000000}), (other)
    WHERE other.id >= 1000001 AND other.id <= 1057690
    OPTIONAL MATCH (recipe:recipe)<-[:is_part_of]-(ingredient:ingredient)-[:is_part_of]->(other)
    WITH ingredient, other
    RETURN other.id, count(DISTINCT ingredient.name)
    ORDER BY other.id DESC

My first question: how can I obtain the number of ingredients of two recipes in such a way that the mutual ones are counted only once (the union of r1 and r2, i.e. r1 ∪ r2)?

My second question: is it possible to write a loop that iterates through the recipes and checks for common ingredients? My objective is to compare each recipe with all the others. I think it should return (n-1)*(n/2) rows.

I have tried the above, and the problem remains. Even with LIMIT and SKIP I cannot run the code on the whole set. I have changed the query so that it allows me to partition the set accordingly:

    MATCH (recipe1)<-[:is_part_of]-(ingredient:ingredient)-[:is_part_of]->(recipe2)
    WHERE (recipe2.id >= 1000000 AND recipe2.id <= 1000009)
      AND (recipe1.id >= 1000000 AND recipe1.id <= 1000009)
      AND (recipe1.id < recipe2.id)
    RETURN recipe1.id, count(DISTINCT ingredient.name) AS mutualingredients, recipe2.id
    ORDER BY recipe1.id

Until I get my hands on a better machine, this will suffice.

I still haven't solved my first question: how can I obtain the number of ingredients of two recipes in such a way that the mutual ones are counted only once (the union of r1 and r2, i.e. r1 ∪ r2)?
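For the union question, one possible approach is to match the ingredients of either recipe and count them with DISTINCT, so an ingredient used by both recipes is counted only once. This is a minimal sketch rather than a tested solution: it assumes the `recipe` and `ingredient` labels, the `is_part_of` relationship, and the `id` and `name` properties from the queries above, and the two ids are example values only.

```cypher
// Count the distinct ingredients used by either of two recipes (r1 ∪ r2).
// 1000000 and 1000001 are example ids; substitute the pair you want to compare.
MATCH (i:ingredient)-[:is_part_of]->(r:recipe)
WHERE r.id IN [1000000, 1000001]
// An ingredient shared by both recipes matches twice; DISTINCT collapses it.
RETURN count(DISTINCT i.name) AS unionIngredients
```

The same number also follows from the counts that are already available: the size of the union equals the ingredient count of r1 plus the ingredient count of r2 minus the number of mutual ingredients.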

You'll need to play with this, but it's going to be similar to this:

    MATCH (recipe1:recipe)<-[:is_part_of]-(ingred:ingredient)-[:is_part_of]->(recipe2:recipe)
    WHERE id(recipe1) < id(recipe2)
    RETURN recipe1, collect(ingred.name), recipe2
    ORDER BY recipe1.id

The MATCH pattern gets all of the common ingredients between two recipes. The WHERE clause ensures that you're not comparing a recipe with itself (because it would share all of its ingredients with itself). The RETURN clause gives you the two recipes you're comparing, and what they have in common.

This is O(n^2) though, and will be very slow.
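To put the quadratic growth in numbers, here is the pair count written out. The first figure uses the ten-id window (1000000 to 1000009) from the partitioned query in the question. The second assumes, as an illustration only, that every id from 1000000 to 1057690 is a recipe, which gives n = 57,691.

```latex
\[
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad
\binom{10}{2} = 45, \qquad
\binom{57691}{2} = \frac{57691 \cdot 57690}{2} = 1\,664\,096\,895 .
\]
```

These are upper bounds on the number of rows returned, since the MATCH pattern only produces a row for a pair of recipes that shares at least one ingredient.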

Update: I took Nicole's suggestion, which is a good one. It should guarantee that each pair is considered only once.

