mysql - How often should I upload documents to CloudSearch (Solr)?
Here is my use case:
I use MySQL as my primary data store and CloudSearch for searching. The database contains these tables: threads, comments, upvotes, users.
I created an expression that sorts search results by "trending", based on upvotes and the created_at date (the Hacker News "hot" algorithm). The expression is called "trend", and I use it in the CloudSearch query like this: /search?q=superman&sort=trend+desc
(upvotes-1)/pow(floor((_time-created_at)/3600000)+2, 1.8)
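It can help to mirror the "trend" expression locally to see what scores CloudSearch will produce. Below is a minimal Python sketch. It assumes, as in CloudSearch expressions, that `_time` and `created_at` are epoch milliseconds, and `trend_score` and its parameter names are hypothetical. The sketch also shows why only the upvote count needs syncing. The age term uses `_time`, which CloudSearch evaluates at query time, so older documents decay on their own without being re-uploaded.

```python
import math

HOUR_MS = 3_600_000  # milliseconds per hour, as in the expression's divisor


def trend_score(upvotes: int, created_at_ms: int, now_ms: int, gravity: float = 1.8) -> float:
    """Local mirror of the hypothetical "trend" expression:
    (upvotes-1)/pow(floor((_time-created_at)/3600000)+2, 1.8)
    All times are epoch milliseconds, like CloudSearch's _time."""
    age_hours = math.floor((now_ms - created_at_ms) / HOUR_MS)
    return (upvotes - 1) / math.pow(age_hours + 2, gravity)


now = 1_000 * HOUR_MS  # a made-up "current time" for the example

# Same number of upvotes, different ages: the older thread scores lower.
fresh = trend_score(upvotes=11, created_at_ms=now - 2 * HOUR_MS, now_ms=now)
day_old = trend_score(upvotes=11, created_at_ms=now - 24 * HOUR_MS, now_ms=now)
print(f"fresh={fresh:.3f} day_old={day_old:.3f}")
```

Because upvotes enter the numerator linearly, a stale upvote count in the index only shifts a document's score by a bounded amount. The freshness of `created_at` is never stale.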
Right now, when a user upvotes a thread or comment, the upvote is stored in the MySQL database. My question is: how should I keep the upvotes in sync with CloudSearch?
The two options I see:
- Immediately insert (replace) the upvote in MySQL, then update the score on CloudSearch. This involves sending a single-document upload on every upvote, but it ensures real-time accuracy.
- Immediately insert (replace) the upvote in MySQL, and also keep the upvote in a cache somewhere (Redis?). Once every hour, upload the accumulated upvotes to CloudSearch.
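The two options can be sketched as follows. This is a hypothetical Python sketch, not a CloudSearch client. The names `make_batch` and `UpvoteBuffer` and the `upvotes` field are assumptions, and nothing is sent over the network. With boto3, the resulting JSON could be passed to the `cloudsearchdomain` client's `upload_documents`. One caveat: a CloudSearch "add" replaces the whole document, so a real batch must carry every field of the thread or comment, not just `upvotes`.

```python
import json
import time
from typing import Dict, List, Optional


def make_batch(upvote_counts: Dict[str, int]) -> str:
    """Build a CloudSearch-style JSON document batch that re-adds each document.
    Only the (hypothetical) upvotes field is shown; a real "add" must include
    every field, because it replaces the whole document."""
    batch: List[dict] = [
        {"type": "add", "id": doc_id, "fields": {"upvotes": count}}
        for doc_id, count in sorted(upvote_counts.items())
    ]
    return json.dumps(batch)


class UpvoteBuffer:
    """Option 2: remember which documents changed and flush them periodically."""

    def __init__(self, flush_interval_s: float = 3600.0) -> None:
        self.flush_interval_s = flush_interval_s
        self.pending: Dict[str, int] = {}
        self.last_flush = time.monotonic()

    def record(self, doc_id: str, new_count: int) -> None:
        # Latest count wins, so many upvotes on one item collapse into one document.
        self.pending[doc_id] = new_count

    def maybe_flush(self, now: Optional[float] = None) -> Optional[str]:
        """Return a batch to upload if the interval has elapsed, else None."""
        now = time.monotonic() if now is None else now
        if not self.pending or now - self.last_flush < self.flush_interval_s:
            return None
        batch = make_batch(self.pending)
        self.pending.clear()
        self.last_flush = now
        return batch


# Option 1: one single-document batch per upvote.
single = make_batch({"thread-42": 7})
print(single)

# Option 2: five upvotes on one thread plus one on a comment, flushed once.
buf = UpvoteBuffer(flush_interval_s=3600)
for count in range(1, 6):
    buf.record("thread-42", count)
buf.record("comment-7", 2)
hourly = buf.maybe_flush(now=buf.last_flush + 3600)
print(hourly)
```

Option 2 saves work mainly because repeated upvotes on the same item collapse into a single document per flush. The trade-off is that rankings can lag by up to the flush interval. This sketch also keeps the buffer in process memory. The question's suggestion of Redis would let the buffer survive restarts and be shared across app servers.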
What is the best way to handle this situation?
It depends on a lot of things:
- Your Solr setup: how many servers, how much memory, CPU and storage, how many documents, the index size per shard/server, etc.
- How many upvotes are you expecting? If you are leaning toward option 1, it is easier to decide whether it is feasible once you can estimate this number.
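Here is a back-of-the-envelope comparison of the two options in Python. All traffic numbers below are made up for illustration, so substitute your own estimates. Option 1 costs one upload request per upvote. Option 2 costs one request per hour, carrying one document for each distinct item that was upvoted in that hour.

```python
from dataclasses import dataclass


@dataclass
class SyncLoad:
    requests_per_hour: float
    documents_per_hour: float


def option1_load(upvotes_per_hour: float) -> SyncLoad:
    # Every upvote triggers its own single-document upload.
    return SyncLoad(requests_per_hour=upvotes_per_hour, documents_per_hour=upvotes_per_hour)


def option2_load(distinct_items_upvoted_per_hour: float) -> SyncLoad:
    # One hourly batch; repeated upvotes on the same item become one document.
    return SyncLoad(requests_per_hour=1, documents_per_hour=distinct_items_upvoted_per_hour)


# Hypothetical traffic: 36,000 upvotes/hour spread over 2,000 distinct threads/comments.
o1 = option1_load(36_000)
o2 = option2_load(2_000)
print(o1.requests_per_hour / 3600)  # 10.0 uploads per second under option 1
print(o2.documents_per_hour)        # 2000 documents in one hourly batch under option 2
```

If the hourly batch gets very large, it may need to be split into several uploads to stay under CloudSearch's per-batch size limit.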
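Since you are using SolrCloud, you have its NRT (near-real-time) search feature, which makes new and updated documents available to searches almost immediately. But again, that depends on your current document corpus and on how many updates per second or minute you are expecting.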
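If the backend really is a self-managed Solr/SolrCloud, as this answer assumes, option 1 maps naturally onto an atomic update combined with Solr's `commitWithin` parameter. `commitWithin` asks Solr to make a change searchable within a time window instead of committing on every request. Below is a minimal Python sketch that builds, but does not send, such a request. The base URL, the `threads` collection, and the `upvotes` field are hypothetical. Atomic updates also only work if the schema supports them, for example with an update log and stored or docValues fields.

```python
import json
import urllib.parse
import urllib.request


def build_upvote_update(solr_base: str, collection: str, doc_id: str,
                        commit_within_ms: int = 1000) -> urllib.request.Request:
    """Build (but do not send) a Solr atomic update that increments one
    document's upvote count and asks Solr to make the change searchable
    within commit_within_ms milliseconds."""
    params = urllib.parse.urlencode({"commitWithin": commit_within_ms})
    url = f"{solr_base.rstrip('/')}/{collection}/update?{params}"
    # "inc" is Solr's atomic-update modifier for adding to a numeric field.
    body = json.dumps([{"id": doc_id, "upvotes": {"inc": 1}}]).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


req = build_upvote_update("http://localhost:8983/solr", "threads", "thread-42")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send it to a running Solr instance.
```

CloudSearch has no equivalent increment operation, so on CloudSearch each per-upvote update must instead re-add the full document.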
If you know the number of upvotes (i.e. updates to Solr) and you have enough servers, go with option 1. It saves you the overhead of maintaining an extra data store and the logic for pushing upvotes to Solr every hour.
You can set up a couple of test servers and run stress tests to find the number of updates at which Solr's performance starts to degrade.
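Here is a rough Python load-test harness for that kind of stress test. `send_update` is a hypothetical placeholder for whatever call pushes one upvote to your test cluster. The latency threshold is an assumption you would choose yourself. The harness ramps up concurrency and reports the first level whose median latency exceeds that threshold.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional, Sequence


def latency_under_load(send_update: Callable[[int], None],
                       concurrency_levels: Sequence[int],
                       requests_per_level: int = 200) -> Dict[int, float]:
    """For each concurrency level, fire requests_per_level updates through a
    thread pool and record the median per-request latency in milliseconds."""
    results: Dict[int, float] = {}
    for workers in concurrency_levels:
        def timed(i: int) -> float:
            start = time.perf_counter()
            send_update(i)
            return (time.perf_counter() - start) * 1000
        with ThreadPoolExecutor(max_workers=workers) as pool:
            latencies = list(pool.map(timed, range(requests_per_level)))
        results[workers] = statistics.median(latencies)
    return results


def first_degraded_level(results: Dict[int, float], max_median_ms: float) -> Optional[int]:
    """Return the lowest concurrency level whose median latency exceeds the threshold."""
    for workers in sorted(results):
        if results[workers] > max_median_ms:
            return workers
    return None


# Stand-in for a real upload: sleeps ~1 ms instead of calling the cluster.
def fake_update(i: int) -> None:
    time.sleep(0.001)


measured = latency_under_load(fake_update, concurrency_levels=[1, 4], requests_per_level=20)
print(measured)
print(first_degraded_level({1: 5.0, 4: 8.0, 16: 40.0}, max_median_ms=20.0))
```

For a fair test, point `send_update` at a test cluster whose corpus size and hardware match production. Latency and throughput depend heavily on both.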
I know this does not give an exact yes or no, but as I said, it depends on your particular use case.