[p2pu-dev] search comparison: elasticsearch and solr

Jessy Kate Schingler jessy at jessykate.com
Wed Jun 1 02:36:52 UTC 2011


been doing some reading on solr and elasticsearch (chose these two
because of our previous thread). here are some notes on the two (sorry i
didn't source everything, but i included a couple of interesting links). both
are built on the apache Lucene search library. this is by no means
exhaustive or complete.

main conclusion: they're built on the same fundamental technology. one is
more of a proven workhorse, the other is fundamentally built for distributed
scalability. and us? we probably have no idea yet what our real pain points
will be :) that said, now's a good time to ruminate, and it's worth thinking
about what those pain points could be.

i'll try to get going with installation of one or the other in the next
couple of days. if others want to jump on too, just let me know....

also, not sure where we should host this-- the dev server seems to be
bursting at the seams....

*elastic:*
real time indexing
schema-less?
emphasis on distributed (but can be used in single instance too)
scripting support (sounds interesting, but not obvious we couldn't do this
with a "real" scripting language too. probably speed/complexity tradeoffs).
can it talk to the db directly?
relatively new product with one central charismatic figure
has a bit of new and shiny syndrome, but definitely seems very shiny.
faster querying with real-time indexing
http://engineering.socialcast.com/2011/05/realtime-search-solr-vs-elasticsearch/
built-in restful API
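
to make the "schema-less" and "built-in restful API" points a bit more concrete, here's a rough sketch of what talking to elastic over http could look like. it only builds the requests and doesn't send anything. the host/port, the index name "p2pu", the type "course" and the document fields are all made up for illustration, and the index/type/id url layout is my understanding of the api as it stands now, so treat it as an assumption rather than gospel.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# assumption: a local elasticsearch node listening on its default http port
ES_BASE = "http://localhost:9200"


def build_index_request(index, doc_type, doc_id, doc):
    """Build (but don't send) a PUT that indexes one JSON document.

    No schema is declared up front: the document is just a JSON object.
    """
    url = "%s/%s/%s/%s" % (ES_BASE, quote(index), quote(doc_type), quote(str(doc_id)))
    body = json.dumps(doc).encode("utf-8")
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


def build_search_request(index, query):
    """Build (but don't send) a GET using the simple ?q= query-string search."""
    url = "%s/%s/_search?q=%s" % (ES_BASE, quote(index), quote(query))
    return Request(url, method="GET")


# hypothetical index, type and fields, just for illustration
index_req = build_index_request("p2pu", "course", 1,
                                {"title": "intro to python", "tags": ["programming"]})
search_req = build_search_request("p2pu", "title:python")
```

to actually run these against a live node you'd pass each request to urllib.request.urlopen(); i've left that out so the sketch works without a server.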

*solr:*
cron-job style indexing
schema-full
smaller index? (read one source that claimed this) (which means elastic must
be reeeally big :)).
can talk to the db directly
faster straight-up indexing
http://dmurphy747.wordpress.com/2011/04/02/solr-vs-elasticsearch-deathmatch/
(not clear this is a key requirement for us-- expect we'll have lots of
small additions over time rather than infrequent large additions).
more mature search options - facets and filters
tried and true community with more history, more participants
faster for querying alone (no concurrent indexing), faster for indexing
built-in restful API
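
for the "facets and filters" point, here's a rough sketch of what a faceted solr query looks like over http. again it only builds the url and doesn't send it. the host/port and the field names ("tags", "language") are assumptions for illustration, and the fields would need to exist in our solr schema first (that's the "schema-full" part). the parameters themselves (q, fq, facet, facet.field, wt) are standard solr query parameters.

```python
from urllib.parse import urlencode

# assumption: a local solr instance on the port the example server uses
SOLR_BASE = "http://localhost:8983/solr"


def build_facet_query_url(query, facet_field, filters=()):
    """Build the url for a select query with one facet field and optional filters.

    q           - the main search query
    fq          - filter queries that narrow results (may repeat)
    facet.field - field whose value counts are returned alongside the results
    """
    params = [
        ("q", query),
        ("wt", "json"),          # ask for a JSON response
        ("facet", "true"),       # turn faceting on
        ("facet.field", facet_field),
    ]
    params.extend(("fq", f) for f in filters)
    return "%s/select?%s" % (SOLR_BASE, urlencode(params))


# hypothetical field names, just for illustration
url = build_facet_query_url("python", "tags", filters=["language:en"])
```

fetching that url from a running solr would return the matching documents plus a count per value of the facet field, which is the kind of thing we'd use for "browse by tag" style navigation.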

questions:
how real-time do we want to be? even if the answer is "very" it seems at any
given moment we're not going to be adding "large" documents (compared to
some sites).
do we want to index uploaded documents?


Jessy
--
http://jessykate.com