neo4j - Traversing through all nodes and comparing each one with every other one
I am working on a little project and have a dataset of 60k nodes and 500k relationships between the nodes. The nodes are of 2 types: the first type are recipes and the second type are ingredients. Recipes are composed of ingredients, like:
(ingredient)-[:is_part_of]->(recipe)
My objective is to find how many common ingredients 2 recipes share. I have managed to obtain this information with the following query, which compares 1 recipe with all the others (the first one with all the others):
match (recipe:recipe{ id: 1000000 }),(other)
where (other.id >= 1000001 and other.id <= 1057690)
optional match (recipe:recipe)<-[:is_part_of]-(ingredient:ingredient)-[:is_part_of]->(other)
with ingredient, other
return other.id, count(distinct ingredient.name)
order by other.id desc
My first question: how can I obtain the number of ingredients of 2 recipes in such a way that the mutual ones are counted only once (the union of R1 and R2 --> R1 U R2)?
My second question: is it possible to write a loop that iterates through all recipes and checks for common ingredients? The objective is to compare each recipe with all the others. I think this should return (n-1)*(n/2) rows.
I have tried the above, but the problem remains. Even with limit and skip, I cannot run the code on the whole set. I have changed the query so that it allows me to partition the set accordingly:
match (recipe1)<-[:is_part_of]-(ingredient:ingredient)-[:is_part_of]->(recipe2)
where (recipe2.id >= 1000000 and recipe2.id <= 1000009) and (recipe1.id >= 1000000 and recipe1.id <= 1000009) and (recipe1.id < recipe2.id)
return recipe1.id, count(distinct ingredient.name) as mutualingredients, recipe2.id
order by recipe1.id
Until I get my hands on a better machine, this will suffice.
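One thing to keep in mind with this kind of partitioning: if recipe1 and recipe2 are both restricted to the same id range, only pairs inside that range get compared. To cover every pair, each range also has to be run against every later range. Below is a minimal Python sketch of that bookkeeping, assuming the recipe ids are contiguous integers. The helper names id_blocks and block_pairs are made up for illustration and are not part of Neo4j or any driver.

```python
from itertools import combinations_with_replacement


def id_blocks(first_id, last_id, size):
    """Split the inclusive id range [first_id, last_id] into (low, high) blocks."""
    return [(low, min(low + size - 1, last_id))
            for low in range(first_id, last_id + 1, size)]


def block_pairs(blocks):
    """Pair every block with itself and with each later block."""
    return list(combinations_with_replacement(blocks, 2))


if __name__ == "__main__":
    # Assumption: three blocks of 10 ids, starting at the id used in the query.
    blocks = id_blocks(1000000, 1000029, 10)
    for (lo1, hi1), (lo2, hi2) in block_pairs(blocks):
        # Each line describes one run of the query: recipe1 limited to the
        # first range, recipe2 to the second, keeping recipe1.id < recipe2.id.
        print(f"recipe1 in [{lo1}, {hi1}], recipe2 in [{lo2}, {hi2}]")
```

With k blocks this gives k*(k+1)/2 query runs, each small enough to fit the machine, and together they touch every pair of recipes exactly once.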
I still haven't solved the first question: how can I obtain the number of ingredients of 2 recipes in such a way that the mutual ones are counted only once (the union of R1 and R2 --> R1 U R2)?
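For what it's worth, the union count can be derived from numbers that are already available: the size of R1 plus the size of R2 minus the number of mutual ingredients. A small Python sketch of that arithmetic follows. The ingredient sets are made up and stand in for the graph data, and union_count is a hypothetical helper name.

```python
def union_count(size_r1, size_r2, mutual):
    """Inclusion-exclusion: ingredients in R1 or R2, shared ones counted once."""
    return size_r1 + size_r2 - mutual


if __name__ == "__main__":
    # Hypothetical recipes, not taken from the dataset.
    r1 = {"flour", "egg", "milk", "sugar"}
    r2 = {"flour", "egg", "butter"}
    mutual = len(r1 & r2)
    # Gives the same value as len(r1 | r2).
    print(union_count(len(r1), len(r2), mutual))
```

So if the mutual-ingredient query also returned each recipe's own ingredient count, the union would be one subtraction away.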
You'll need to play with this, but it's going to be something similar to this:
match (recipe1:recipe)<-[:is_part_of]-(ingred:ingredient)-[:is_part_of]->(recipe2:recipe)
where id(recipe1) < id(recipe2)
return recipe1, collect(ingred.name), recipe2
order by recipe1.id
The match pattern gets all of the common ingredients between 2 recipes. The where clause ensures that you're not comparing a recipe to itself (because it would share all ingredients with itself). The return clause just gives you the 2 recipes you're comparing, and what they have in common.
This will be O(n^2) though, and will be very slow.
Update: I took Nicole's suggestion. It should guarantee that each pair is only considered once.
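To see why an ordering condition on the ids yields each pair only once, and how many rows to expect at most, here is a small Python sketch over a handful of hypothetical ids. It is a toy model of the comparison, not of how Neo4j actually executes the query, and the function names are made up.

```python
def ordered_pairs(ids):
    """Keep (a, b) only when a < b, mirroring the id comparison in the where clause."""
    return [(a, b) for a in ids for b in ids if a < b]


def expected_rows(n):
    """Number of unordered pairs among n recipes: n * (n - 1) / 2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Hypothetical ids for illustration.
    ids = [1000000, 1000001, 1000002, 1000003]
    pairs = ordered_pairs(ids)
    print(pairs)
    print(len(pairs), expected_rows(len(ids)))
```

Since the match pattern only produces a row when two recipes share at least one ingredient, n*(n-1)/2 is an upper bound on the result size rather than an exact count.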