Lucene and SOLR Get Commercial Support

Posted by ryuzaki0 on Friday January 30, 2009 @11:30AM from the foss-going-mainstream dept.

ruphus13 writes "Two of the technical leads and core committers of the Lucene Project have launched Lucid Imagination, a venture-backed company now offering commercial versions of Lucene and SOLR in the hopes of making them the de facto choice of search technologies used by companies within their products. 'The Lucene search library ranks amongst the top 5 Apache projects, installed at over 4,000 global companies. Although OStatic is primarily Drupal-based, our site's search is based on Lucene. According to Lucid Imagination officials, the Solr search server, which transforms the Lucene search library into a ready-to-use search platform for building applications, is the fastest growing Lucene sub-project...Lucid's business model is roughly comparable to Red Hat's very successful model, in that it centers on support and services for free, open source software.'"

47 comments

oookay. by girlintraining · 2009-01-30 11:32 · Score: 4, Insightful

Nice press release but.. what does it do? O_o Five million dollars and they couldn't even buy a one sentence description of their product. Standards are slipping.

--
#fuckbeta #iamslashdot #dicemustdie
1. Re:oookay. by Azar · 2009-01-30 11:40 · Score: 3, Insightful
  
  "...in the hopes of making it the defacto choice of search technologies used by companies within their products. 'The Lucene search library ranks amongst the top 5 Apache projects... According to Lucid Imagination officials, the Solr search server, which transforms the Lucene search library into a ready-to-use search platform for building applications...
  I agree, it could have been more explicit in giving a brief description, but was it really that difficult to glean what it does from the summary?
2. Re:oookay. by FooBarWidget · 2009-01-30 11:44 · Score: 2, Informative
  
  Lucene is a full-text indexer and search library. Solr is a full-text indexer and search server, based on Lucene.
3. Re:oookay. by Chabo · 2009-01-30 11:50 · Score: 1, Insightful
  
  I read the summary twice and it just made my head spin.
  There's a big presumption in the summary that we've heard of Lucene before. I don't even know what they do. Do they search... the web? ...your LAN? ...your desktop?
  
  --
  Convert FLACs to a portable format with FlacSquisher
4. Re:oookay. by Anonymous Coward · 2009-01-30 11:52 · Score: 0
  
  I read the summary twice and it just made my head spin.
  There's a big presumption in the summary that we've heard of Lucene before. I don't even know what they do. Do they search... the web? ...your LAN? ...your desktop?
  ...your mom!
5. Re:oookay. by The+End+Of+Days · 2009-01-30 12:02 · Score: 1
  
  So your reaction to something you don't understand is denigration? I hope you don't expect people to accept you as a girl in training, then. It would be hypocritical.
6. Re:oookay. by morgan_greywolf · 2009-01-30 12:18 · Score: 3, Informative
  
  No. It's a search engine for your website. It's not quite as simple as a SELECT query. Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. That does quite a bit more than a SELECT query could hope to do.
  
  --
  My blog
7. Re:oookay. by macraig · 2009-01-30 12:26 · Score: 1
  
  That's still illegal, though Bush was working on it behind closed doors.
8. Re:oookay. by Anonymous Coward · 2009-01-30 12:41 · Score: 0
  
  Link is currently DoS'd... but you can't blame the knee-jerk posters who treat this like a standard slashvertisement.
  TFS reads like ad copy, not news for nerds.
9. Re:oookay. by morgan_greywolf · 2009-01-30 13:14 · Score: 0, Redundant
  
  Google is your friend. Yes, I can blame the posters. If you don't know what something is, we have technology for that. It's called a freakin' search engine.
  
  --
  My blog
10. Re:oookay. by morgan_greywolf · 2009-01-30 13:27 · Score: 0, Flamebait
  
  use SOLR for indexing a few million documents....SOLR does it in less than 1 second no matter what, and actually scales
  
  Nice. Yeah, I'd definitely say that's worth 5 million bucks.
  
  --
  My blog
11. Re:oookay. by Anonymous Coward · 2009-01-30 13:55 · Score: 0
  
  That's because the average Slashdot user's technology knowledge ends with how to use BitTorrent. Lucene is an indexing engine. A really good, free one. It is a truly MAJOR piece of technology infrastructure, and one of the real gems of Open Source. Moron.
12. Re:oookay. by sonsonete · 2009-01-30 16:36 · Score: 1
  I don't even know what they do. Do they search... the web? ...your LAN? ...your desktop?
  In short: yes.
  
  Lucene can be set up to search just about anything—the web, a network, your desktop, a database, or anything else you can tell it to read.
  Solr provides a web interface to Lucene.
  Lucid Imagination contributes to the Lucene and Solr projects and provides commercial support for users of the software.
  --
  "Folks bent on reinventing the wheel should understand that if it's not round, it ain't a wheel." - Jonah Goldberg
13. Re:oookay. by FishWithAHammer · 2009-01-30 17:14 · Score: 0, Troll
  
  That, and at the least anyone peripherally involved with non-MS development should probably know what Lucene is. It's that awesome.
  
  --
  "You can either have software quality or you can have pointer arithmetic, but you cannot have both at the same time."
14. Re:oookay. by saibot834 · 2009-01-30 19:40 · Score: 1
  
  One of the most interesting fields where Lucene is useful (probably also for you) is Wikipedia. Remember how painful it was to search for something on Wikipedia some months ago?
  Well now, thanks to Lucene, Wikipedia (and its sister projects) don't have to use the built-in MediaWiki search engine (which really is crappy). Probably the best feature Lucene brings is "Did you mean ...". Google is still better, but Lucene was a big step for Wikipedia.
15. Re:oookay. by Simetrical · 2009-02-01 13:39 · Score: 1
  
  Wikipedia has been using Lucene for a few years by now. The recent changes were improvements to how it was used, but it was being used the whole time. Out of the box, MediaWiki uses whatever fulltext search is available from the DBMS being used -- in MySQL's case, that means using MyISAM, which is impossible for a site the size of Wikipedia (all selects, updates, deletes, etc. take out table-level locks).
  
  --
  MediaWiki developer, Total War Center sysadmin
interesting by larry+bagina · 2009-01-30 11:37 · Score: 1

Talk at the water cooler was that Sun was taking an interest in them to expand their open source catalog. All in all, they're probably a lot better off going it alone in the current market. With companies looking to save money by going open source, it's a great time for OS support.

--
Do you even lift?
These aren't the 'roids you're looking for.
1. Re:interesting by Darkness404 · 2009-01-30 13:27 · Score: 1
  
  That and possible large government projects. With Obama wanting to increase government projects and transparency while also saving money, OSS is a great way to do it, and I believe that Sun has already written to Obama about switching to all OSS. So Sun wanting to acquire more OSS vendors certainly makes sense.
  
  --
  Taxation is legalized theft, no more, no less.
possible alternative: xapian by bdqbit · 2009-01-30 11:37 · Score: 2, Interesting

Nice going for Lucene (LGPL?), although i've preferred Xapian (GPL) in the past (with python bindings).

Good to have choice, i guess.
1. Re:possible alternative: xapian by bdqbit · 2009-01-30 11:55 · Score: 2, Informative
  
  Xapian is C++ (with plenty of bindings for a lot of languages -- including python)
2. Re:possible alternative: xapian by Anonymous Coward · 2009-01-30 11:58 · Score: 0
  
  Ah well then:
  Java is balls slow compared to C++.
  And less portable.
3. Re:possible alternative: xapian by bdqbit · 2009-01-30 12:04 · Score: 1
  
  We can agree on that. Xapian (set up correctly) can be blazingly fast with loads of data. The python bindings (basically having access to the xapian API) don't add much weight in my experience.
4. Re:possible alternative: xapian by morgan_greywolf · 2009-01-30 13:21 · Score: 1
  
  They typically wouldn't. Python bindings for C or C++ libraries are usually nothing more than pointers to the correct shared library calls.
  
  --
  My blog
About to move to the Java port of Lucene... by merreborn · 2009-01-30 12:01 · Score: 4, Informative

We're currently using the Zend PHP port of Lucene. It was nice, because we were able to use all our existing code for loading our PHP objects from the database for indexing. It worked fine, as long as our indexes stayed small.
Now we have several indexes weighing in at around 300+ megabytes, and Zend Lucene has proven to be absolute crap. It takes seconds of CPU time, and hundreds of megs of ram to process simple queries against these indexes. When tested in Luke, the same queries against the same indexes finish in milliseconds with minimal memory usage. Either the Zend port, or PHP itself is clearly unsuitable for production use on large indexes.
Either way, we're going to switch it out for Solr ASAP, and we anticipate the development overhead should be minimal -- we'll keep using the same code to load our objects, and pass them to Solr via JSON.
1. Re:About to move to the Java port of Lucene... by Sentry21 · 2009-01-30 12:27 · Score: 1
  
  Either the Zend port, or PHP itself is clearly unsuitable for production use on large indexes.
  You phrase this in such a way as to imply an exclusion, when really both are often true. We've ported our PHP application to Rails (which provides a different, but workable, set of problems), and we've rid ourselves of the Zend engine in return for Ferret; I'm a proponent of replacing that with SOLR, but we've yet to go down that path.
2. Re:About to move to the Java port of Lucene... by WoLpH · 2009-01-30 12:32 · Score: 3, Insightful
  
  That's because the Zend Lucene library is written in pure PHP, ergo... _really_ slow. Either use a C module or get SOLR to get it fast. In my simple tests the Python lucene libraries were about 100-500 times faster than the Zend PHP version; it's really one of the worst Lucene libraries around (in terms of speed).
3. Re:About to move to the Java port of Lucene... by Anonymous Coward · 2009-01-30 13:06 · Score: 1, Informative
  
  I found the original Java libraries to be plenty fast as well. We index millions of records, and it's always been plenty fast returning even the most complex queries. Granted, it probably isn't as fast as the C library, but it is the most updated and feature rich. And, many of those later features that the C library lacks make it COMPLETELY worth it.
4. Re:About to move to the Java port of Lucene... by ionix5891 · 2009-01-30 13:20 · Score: 1
  
  Yes, the Lucene php version is very very slow (very)
  I recently switched to sphinx (http://www.sphinxsearch.com/). It's written in C and compiles nicely on my linux servers, indexes documents at crazy speeds, and there are piles of options
  I highly recommend it (we use it on a vertical search engine handling 200,000 queries a day for one of our sites)
5. Re:About to move to the Java port of Lucene... by tcopeland · 2009-01-30 13:41 · Score: 1
  
  > I recently switched to sphinx (http://www.sphinxsearch.com/) its written in C
  Minor nit - it's in C++. But yeah, it's totally awesome - fast when indexing, easy to scale horizontally, powerful query language, custom stop word lists, etc, etc. The APIs (I use the Ruby one, Riddle) make it easy to do nifty excerpt formatting (for example, note the highlighting around the word 'battle'), and there are a couple of different ways to integrate it into a Ruby on Rails app.
  Speaking of Sphinx and Rails, here's a code snippet for escaping extended mode Sphinx queries. This will probably make its way into Riddle at some point, but, until then, there it is.
  
  --
  The Army reading list
6. Re:About to move to the Java port of Lucene... by Ythan · 2009-01-30 14:51 · Score: 1
  
  As a satisfied user I just wanted to give another shoutout to Sphinx. It really is fantastic, better than Lucene if you want something lightweight and easy to configure, and the speed and relevance of search results are excellent. Commercial support is available and it's being used on Craigslist and The Pirate Bay among other notable sites. Anyone who's struggling with MySQL's anemic fulltext search would do well to give it a look.
7. Re:About to move to the Java port of Lucene... by Moebius+Loop · 2009-01-30 18:20 · Score: 1
  
  What I ended up doing for various webapps (PHP and Python, although Python's port of Lucene actually loads the Java runtime, and is fairly fast) is create a simple local server that a PHP script can communicate with over sockets and a trivial protocol.
  This is fairly straightforward for me since most of the time I just want Lucene to return a list of document IDs. I use those IDs to create a temp table that I can do additional queries against in SQL.
  Running it as a separate server allows me to use the original Java codebase (which makes updating the library easy), and also avoids the overhead of loading/instantiating Lucene from PHP on every request.
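  For anyone curious what such a trivial protocol could look like, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `SEARCH <term>` request line, the `OK <ids>` reply, the port number, and the `FAKE_INDEX` dictionary standing in for the real Lucene process. It is not the setup described above, just one possible shape for it.

```python
import socketserver

# Hypothetical stand-in for the real search backend: term -> matching document IDs.
FAKE_INDEX = {"lucene": [3, 7, 42], "solr": [7, 19]}


def handle_line(line, index=FAKE_INDEX):
    """Parse one request line ("SEARCH <term>") and build the one-line reply."""
    parts = line.strip().split(None, 1)
    if len(parts) != 2 or parts[0].upper() != "SEARCH":
        return "ERR bad request\n"
    ids = index.get(parts[1].lower(), [])
    return "OK " + " ".join(str(i) for i in ids) + "\n"


class SearchHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One request per line, one reply per line, until the client disconnects.
        for raw in self.rfile:
            reply = handle_line(raw.decode("utf-8"))
            self.wfile.write(reply.encode("utf-8"))


def serve(host="127.0.0.1", port=9999):
    """Run the server until interrupted; meant for a long-lived process."""
    with socketserver.TCPServer((host, port), SearchHandler) as server:
        server.serve_forever()
```

  A web script would open a socket to the server, send `SEARCH lucene` followed by a newline, and split the reply on spaces to get the IDs for its temp table. The index stays loaded in the long-lived process, which is the point of running it separately.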
  
  --
  have you been seen on slash?
full-text search by CarpetShark · 2009-01-30 12:18 · Score: 4, Informative

Nice press release but.. what does it do?
You mentioned SQL SELECTs elsewhere. Full-text search isn't like a SELECT. It's more like what happens when you google something: many documents are searched in a split second, and complex queries can be done, like documents containing a phrase, but not this one, or documents that mention X with Y within a few sentences of that, or documents that mention X and Y, but not Z. Yes, SQL lets you do that, but not for text, except in very inefficient ways.
From what I've seen of it (which is very little), Lucene lets you, as a programmer, index data using your own field names. So, say you're indexing Word documents and HTML documents. You can extract most of the text and index it as "maincontent", but separately extract the author, title and subtitle, indexing those individually. This lets you query attributes, like: "space nasa and not genre:sci-fi". Full text search also does ranking based on the occurrences of different words you query by, etc. Presumably Lucene would let you specify which fields/attributes are included in a search, and which ones have the highest scores in search results, for instance.
Yeah, I don't get where $5m USD went on that either. I didn't think it was THAT big a problem. But maybe it is. Personally, I'm holding out for a decent Triple API, which will hopefully make all but the indexer of this obsolete.
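To make the fielded-index idea concrete, here is a toy sketch in Python. It is not Lucene's API and does no ranking or phrase matching; the `TinyIndex` class, its method names, and the sample documents are all made up for illustration. It only shows how an inverted index keyed by (field, term) can answer a query like "space nasa and not genre:sci-fi" without scanning every document.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: (field, term) -> set of document IDs."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.doc_ids = set()

    def add(self, doc_id, **fields):
        # Index each field separately so queries can target one field.
        self.doc_ids.add(doc_id)
        for field, text in fields.items():
            for term in text.lower().split():
                self.postings[(field, term)].add(doc_id)

    def search(self, must=(), must_not=()):
        """must / must_not are (field, term) pairs; returns sorted doc IDs."""
        result = set(self.doc_ids)
        for key in must:
            result &= self.postings.get(key, set())
        for key in must_not:
            result -= self.postings.get(key, set())
        return sorted(result)


index = TinyIndex()
index.add(1, maincontent="NASA launches space probe", genre="news")
index.add(2, maincontent="Space opera about NASA pilots", genre="sci-fi")
index.add(3, maincontent="Cooking with gas", genre="news")

# Roughly "space nasa and not genre:sci-fi"
hits = index.search(
    must=[("maincontent", "space"), ("maincontent", "nasa")],
    must_not=[("genre", "sci-fi")],
)
print(hits)  # [1]
```

Each lookup is a set operation on precomputed posting lists, which is why this kind of query stays fast as the collection grows, where a LIKE-style scan would not.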
1. Re:full-text search by FishWithAHammer · 2009-01-30 17:20 · Score: 1
  
  Yeah, I don't get where $5m USD went on that either. I didn't think it was THAT big a problem.
  Getting it right, and doing it as well as Lucene does (which is spectacularly well), really is THAT big a problem.
  
  --
  "You can either have software quality or you can have pointer arithmetic, but you cannot have both at the same time."
2. Re:full-text search by Wokan · 2009-01-31 05:16 · Score: 2, Informative
  
  Nice press release but.. what does it do?
  From what I've seen of it (which is very little), Lucene lets you, as a programmer, index data using your own field names. So, say you're indexing Word documents and HTML documents. You can extract most of the text and index it as "maincontent", but separately extract the author, title and subtitle, indexing those individually. This lets you query attributes, like: "space nasa and not genre:sci-fi". Full text search also does ranking based on the occurrences of different words you query by, etc. Presumably Lucene would let you specify which fields/attributes are included in a search, and which ones have the highest scores in search results, for instance.
  You've certainly hit close to the mark. I work on a site that uses Solr and it does work just as incredibly as others have said. You can tell it what fields you want to search. You can tell it what order you want results sorted in (and you can sort on more than one column in cases of relevancy ties). You can tell it you want matches in one column weighted more than another. You can tell it you want the terms to be within X words of each other. And you can tell it what words should not be in the results.
  And then there's the other results it can offer. Faceted search is fantastic. If you have products split by department, you can facet by department and your search for widgets can then return not only the results, but a list of the departments the current results were found in with a result count for each. (Very common feature on ecommerce sites, especially those using Endeca.)
  They also have more-like-this results you can use as well as match highlighting. I haven't had the opportunity to try the spelling correction parts yet.
  And the indexes can be incredibly small. After indexing over 1 million pages of information, the index data folders were under 500MB. The Lucene indexer can literally hold our entire search set in RAM while it's running.
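  For readers who haven't seen faceting before, a rough sketch of the idea in Python follows. This is not how Solr implements it and does not use Solr's API; the `search_with_facets` function and the product data are invented for illustration. It just shows what comes back: the hits plus a per-department count of those hits.

```python
from collections import Counter

# Made-up catalogue for illustration.
products = [
    {"id": 1, "name": "blue widget", "department": "hardware"},
    {"id": 2, "name": "red widget", "department": "hardware"},
    {"id": 3, "name": "widget plush", "department": "toys"},
    {"id": 4, "name": "garden hose", "department": "garden"},
]


def search_with_facets(docs, term, facet_field):
    """Return matching docs plus a count of matches per facet value."""
    hits = [d for d in docs if term in d["name"].split()]
    facets = Counter(d[facet_field] for d in hits)
    return hits, dict(facets)


hits, facets = search_with_facets(products, "widget", "department")
print([d["id"] for d in hits])  # [1, 2, 3]
print(facets)                   # {'hardware': 2, 'toys': 1}
```

  The facet counts are what drive the "Hardware (2), Toys (1)" sidebar links on a shop's results page.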
Re:first!!! by macraig · 2009-01-30 12:31 · Score: 0, Offtopic

Ummm, nope. There Can Be Only One.
Both Spring and Hibernate have Lucene modules by kbrasee · 2009-01-30 13:01 · Score: 1

I've heard great things about Lucene (guy at the company I used to work for swears by it, he used it for anything from searching B2B stores to biological indexing). Both Hibernate and Spring have support for this library.

I'm looking into adding search on my site so I should probably check it out. There's a new "In Action" book out for using the Hibernate Lucene add-on -- I might have to pick that up.
SOLR has several advantages by Krischi · 2009-01-30 18:20 · Score: 3, Informative

I agree, Xapian is nice, and we considered it for a while. However, in the end, the decision was made to use SOLR because of one overriding factor in its favor: it takes care of all the nasty details to enable concurrent access, which makes developing web applications just so much easier. With SOLR you just don't have to worry about who might currently be reading or writing to the index, and the index replication features are very powerful, too.
That, and facet searches are very nice, too (e.g., searching for a keyword and then automatically displaying the # of hits per category, and refining per category).
SOLR has Python bindings, too, by the way. They currently are not in the official repository, but recently maintenance on them has picked up, and they work in a very Pythonic way.
Xapian vs Lucene by Anonymous Coward · 2009-01-31 00:32 · Score: 0

What's the feature set vs Xapian?
http://www.xapian.org/
Re:Based on open source? #5? by Anonymous Coward · 2009-01-31 01:53 · Score: 1, Informative

Google shows over 3 million hits for a search on Lucene, and the first 100 summaries all seem relevant.