Lucene and SOLR Get Commercial Support
ruphus13 writes "Two of the technical leads and core committers of the Lucene Project have launched Lucid Imagination, a venture backed company now offering commercial versions of Lucene and SOLR in the hopes of making it the de facto choice of search technologies used by companies within their products. 'The Lucene search library ranks amongst the top 5 Apache projects, installed at over 4,000 global companies. Although OStatic is primarily Drupal-based, our site's search is based on Lucene. According to Lucid Imagination officials, the Solr search server, which transforms the Lucene search library into a ready-to-use search platform for building applications, is the fastest growing Lucene sub-project...Lucid's business model is roughly comparable to Red Hat's very successful model, in that it centers on support and services for free, open source software.'"
Nice press release but.. what does it do? O_o Five million dollars and they couldn't even buy a one sentence description of their product. Standards are slipping.
Talk at the water cooler was that Sun was taking an interest in them to expand their open source catalog. All in all, they're probably a lot better off going it alone in the current market. With companies looking to save money by going open source, it's a great time for OS support.
Do you even lift?
These aren't the 'roids you're looking for.
Nice going for Lucene (LGPL?), although I've preferred Xapian (GPL) in the past (with Python bindings).
Good to have choice, I guess.
We're currently using the Zend PHP port of Lucene. It was nice, because we were able to use all our existing code for loading our PHP objects from the database for indexing. It worked fine, as long as our indexes stayed small.
Now we have several indexes weighing in at around 300+ megabytes, and Zend Lucene has proven to be absolute crap. It takes seconds of CPU time and hundreds of megs of RAM to process simple queries against these indexes. When tested in Luke, the same queries against the same indexes finish in milliseconds with minimal memory usage. Either the Zend port or PHP itself is clearly unsuitable for production use on large indexes.
Either way, we're going to switch it out for Solr ASAP, and we anticipate the development overhead should be minimal -- we'll keep using the same code to load our objects, and pass them to Solr via JSON.
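For anyone curious what "pass them to Solr via JSON" might look like, here is a minimal sketch in Python rather than PHP. It assumes a recent Solr version whose update handler accepts a JSON array of documents; the URL, the core name `mycore`, and the field names are all hypothetical, so check them against your own schema and Solr version. The sketch only builds the request and does not send it.

```python
import json
from urllib import request

# Hypothetical endpoint: host, port and core name are assumptions.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update?commit=true"


def to_solr_doc(obj):
    """Map an application object (here a plain dict) to a flat Solr document.

    The field names are illustrative and must match your own schema.
    """
    return {"id": str(obj["id"]), "title": obj["title"], "body": obj["body"]}


def build_update_request(objects, url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST request carrying a JSON array of docs."""
    payload = json.dumps([to_solr_doc(o) for o in objects]).encode("utf-8")
    return request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


objects = [
    {"id": 1, "title": "First post", "body": "Hello search"},
    {"id": 2, "title": "Second post", "body": "More text to index"},
]
req = build_update_request(objects)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually index, something like request.urlopen(req) would send it.
```

The existing object-loading code stays as it is; only the last step changes from writing into a local index to serializing and posting.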
You mentioned SQL SELECTs elsewhere. Full-text search isn't like a SELECT. It's more like what happens when you google something: many documents are searched in a split second, and complex queries can be done, like documents containing a phrase, but not this one, or documents that mention X with Y within a few sentences of that, or documents that mention X and Y, but not Z. Yes, SQL lets you do that, but not for text, except in very inefficient ways.
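To make the phrase and "X near Y" queries concrete, here is a toy positional inverted index in plain Python. It is not Lucene's implementation or API, just a sketch of the general idea: store, for each term, the positions where it occurs in each document, so phrase and proximity checks become position arithmetic instead of a scan of the text. The function names and sample documents are made up for illustration, and distance is measured in words rather than sentences.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(docs):
    """Return term -> doc_id -> list of word positions."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index


def phrase_match(index, phrase):
    """Doc ids where the terms of `phrase` occur consecutively, in order."""
    terms = tokenize(phrase)
    if not terms:
        return set()
    hits = set()
    for doc_id, starts in index.get(terms[0], {}).items():
        for start in starts:
            if all(start + i in index.get(t, {}).get(doc_id, [])
                   for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits


def near(index, a, b, max_distance):
    """Doc ids where terms a and b occur within max_distance words."""
    hits = set()
    for doc_id, pos_a in index.get(a.lower(), {}).items():
        pos_b = index.get(b.lower(), {}).get(doc_id, [])
        if any(abs(x - y) <= max_distance for x in pos_a for y in pos_b):
            hits.add(doc_id)
    return hits


docs = {
    1: "The open source search library is fast",
    2: "Search the source, it is open",
    3: "An open door and a closed source",
}
index = build_positional_index(docs)
print(sorted(phrase_match(index, "open source")))   # [1]
print(sorted(near(index, "open", "source", 3)))     # [1, 2]
# "This phrase, but not that one" is then just a set difference:
print(sorted(near(index, "open", "source", 3)
             - phrase_match(index, "open source")))  # [2]
```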
From what I've seen of it (which is very little), Lucene lets you, as a programmer, index data using your own field names. So, say you're indexing Word documents and HTML documents. You can extract most of the text and index it as "maincontent", but separately extract the author, title and subtitle, indexing those individually. This lets you query attributes, like: "space nasa and not genre:sci-fi". Full text search also does ranking based on the occurrences of different words you query by, etc. Presumably Lucene would let you specify which fields/attributes are included in a search, and which ones have the highest scores in search results, for instance.
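A rough sketch of the per-field idea, again in plain Python and not using Lucene's actual API: each (field, term) pair gets its own posting set, so a query like "space nasa and not genre:sci-fi" turns into set intersections and a set difference. The class name, field names and documents are invented for illustration, and ranking is left out entirely.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split into terms, keeping hyphenated words like sci-fi."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


class FieldedIndex:
    """Toy inverted index keyed by (field, term). Illustrative only."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, term) -> set of doc ids
        self.doc_ids = set()

    def add(self, doc_id, fields):
        """Index each field of a document under its own field name."""
        self.doc_ids.add(doc_id)
        for field, text in fields.items():
            for term in tokenize(text):
                self.postings[(field, term)].add(doc_id)

    def docs_with(self, field, term):
        return self.postings.get((field, term.lower()), set())

    def search(self, must, must_not=()):
        """Docs containing every (field, term) in must and none in must_not."""
        result = set(self.doc_ids)
        for field, term in must:
            result &= self.docs_with(field, term)
        for field, term in must_not:
            result -= self.docs_with(field, term)
        return result


idx = FieldedIndex()
idx.add(1, {"maincontent": "NASA launches a new space telescope",
            "genre": "science", "author": "Ann"})
idx.add(2, {"maincontent": "Space pirates raid a NASA outpost",
            "genre": "sci-fi", "author": "Bob"})
idx.add(3, {"maincontent": "Gardening in small spaces",
            "genre": "lifestyle", "author": "Cy"})

# Roughly: space nasa and not genre:sci-fi
hits = idx.search(
    must=[("maincontent", "space"), ("maincontent", "nasa")],
    must_not=[("genre", "sci-fi")],
)
print(sorted(hits))  # [1]
```

Restricting a search to certain fields then just means choosing which (field, term) pairs go into the query.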
Yeah, I don't get where $5m USD went on that either. I didn't think it was THAT big a problem. But maybe it is. Personally, I'm holding out for a decent Triple API, which will hopefully make all but the indexer of this obsolete.
Ummm, nope. There Can Be Only One.
I've heard great things about Lucene (guy at the company I used to work for swears by it, he used it for anything from searching B2B stores to biological indexing). Both Hibernate and Spring have support for this library.
I'm looking into adding search on my site so I should probably check it out. There's a new "In Action" book out for using the Hibernate Lucene add-on -- I might have to pick that up.
I agree, Xapian is nice, and we considered it for a while. However, in the end, the decision was made to use SOLR because of one overriding factor in its favor: it takes care of all the nasty details to enable concurrent access, which makes developing web applications just so much easier. With SOLR you just don't have to worry about who might currently be reading or writing to the index, and the index replication features are very powerful, too.
That, and facet searches are very nice, too (e.g., searching for a keyword and then automatically displaying the # of hits per category, and refining per category).
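In case "facet search" is unfamiliar, here is a tiny plain-Python sketch of the concept. It is not how SOLR implements faceting and does not use its API; the data, field names and function name are made up. The idea is to run the keyword search, count the hits per category, and then let the user narrow down to one category.

```python
from collections import Counter


def facet_counts(docs, keyword, facet_field):
    """Return (matching docs, hit count per value of facet_field)."""
    keyword = keyword.lower()
    hits = [d for d in docs if keyword in d["text"].lower().split()]
    return hits, Counter(d[facet_field] for d in hits)


docs = [
    {"id": 1, "text": "red running shoes", "category": "footwear"},
    {"id": 2, "text": "red wool scarf", "category": "accessories"},
    {"id": 3, "text": "blue running shoes", "category": "footwear"},
    {"id": 4, "text": "red leather boots", "category": "footwear"},
]

hits, counts = facet_counts(docs, "red", "category")
print(dict(counts))  # {'footwear': 2, 'accessories': 1}

# Refining per category: keep only the hits in the chosen facet value.
refined = [d["id"] for d in hits if d["category"] == "footwear"]
print(refined)  # [1, 4]
```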
SOLR has Python bindings, too, by the way. They currently are not in the official repository, but recently maintenance on them has picked up, and they work in a very Pythonic way.
What's the feature set vs Xapian?
http://www.xapian.org/
Google shows over 3 million hits for a search on Lucene, and the first 100 summaries all seem relevant.