Open Source Search Engine Benchmarks

first post by Anonymous Coward · 2009-07-06 01:19 · Score: -1

yeah. this is boring.

k by selven · 2009-07-06 01:20 · Score: -1

Nothing else to say, really

Re:k by eldavojohn · 2009-07-06 01:28 · Score: 5, Insightful

Nothing else to say, really
Really? Am I the only person that found it interesting that Lucene, the only non C/C++ implementation, gave some pretty impressive stats? I mean, it's written in Java and although it has a slower index time its search time, index size and relevancy are impressive.

I may have to poke around in the Lucene code after work tonight to figure out what kind of strange majick those Apache developers employ. Hopefully I'll walk away with some extra spells in my bag.

--
My work here is dung.
Re:k by Anonymous Coward · 2009-07-06 01:33 · Score: -1, Offtopic

Yeah, you're the only person. *yawn*
Re:k by Jarlsberg · 2009-07-06 01:34 · Score: 1

It was a foregone conclusion that lucene would trounce the others, if you ask me. And comparing sqlite vs lucene is slightly absurd, since most people with a clue already use lucene on top of sqlite (and mysql as well) to get good search results.
Re:k by julesh · 2009-07-06 01:44 · Score: 4, Insightful

Really? Am I the only person that found it interesting that Lucene, the only non C/C++ implementation, gave some pretty impressive stats?
Is it really that big a surprise? Given that some of the largest, most information-heavy sites on the Internet (e.g. Wikipedia) use it for their internal search?
Re:k by forkazoo · 2009-07-06 01:48 · Score: 2, Insightful

Really? Am I the only person that found it interesting that Lucene, the only non C/C++ implementation, gave some pretty impressive stats? I mean, it's written in Java and although it has a slower index time its search time, index size and relevancy are impressive.
Meh, look at any /. article about Java and you'll see somebody complain about the speed of Java, and a reply explaining that Java isn't particularly slow. It has some weaknesses that mean it isn't as optimal as really good C, but it also has some capacity for dynamic optimisation which can make it faster than poorly optimised C. Regardless, in a DB-type application, a lot of your time will be spent in vendor-supplied code, whether that is disk access supplied by the OS or functions available as part of the language standard library. A lot of what actually runs in this type of app isn't guaranteed to be written in the same language as the app.
Also, most of the Java code you run across in real life is crap. That's not a dig at the language itself. IMO, it's the volume of poor coders that give Java a reputation for slowness more than anything else. You probably won't find any secret double ninja techniques in Lucene as much as you will just find relatively few embarrassing fuckups.
Re:k by Lord+Grey · 2009-07-06 01:51 · Score: 5, Informative

Really? Am I the only person that found it interesting that Lucene, the only non C/C++ implementation, gave some pretty impressive stats? I mean, it's written in Java and although it has a slower index time its search time, index size and relevancy are impressive.
Lucene is a great search tool. As TFA pointed out, however, if you're looking for a "search solution" rather than "search engine" then you should check out Solr instead. Lucene is a toolkit that you build on top of, not something you really want to deploy by itself. Solr is that thing built on top of Lucene.

Be aware that while Lucene/Solr has made terrific progress, it is not quite in the "enterprise search" category. For superscale implementations you'll still likely need to look at a high-priced product like FAST.

--
// Beyond Here Lie Dragons
Re:k by kestasjk · 2009-07-06 01:58 · Score: 1

Really? Am I the only person that found it interesting that Lucene, the only non C/C++ implementation, gave some pretty impressive stats? I mean, it's written in Java and although it has a slower index time its search time, index size and relevancy are impressive.
Of course you are, fool! Everyone else on slashdot knows exactly how Lucene and sqlite's indexing systems work. I don't know why they bothered to take the benchmarks at all, anyone with half a clue has integrated a Java engine running Lucene into sqlite and hooked it into MyISAM already..

--
// MD_Update(&m,buf,j);
Re:k by Daengbo · 2009-07-06 02:32 · Score: 1

I mean, it's written in Java and although it has a slower index time its search time, index size and relevancy are impressive.
In the "benchmark," it wasn't just impressive in those areas: it had the lowest search time, the smallest index, and the highest relevance. That makes top honors, in my book.

--
Put identity in the browser.
Re:k by Anonymous Coward · 2009-07-06 02:46 · Score: 0

Yeah and look at the memory stats. It uses nearly twice the memory of the next one down and more than 6 times the memory of the best*. I don't imagine it gets better over a long period of time either. I see that time and again, long running Java processes are no good.
* With that said, SQLite needs a lot of tweaking and I can tell from the memory usage that they didn't tweak it much if at all. That pretty much invalidates SQLite's results in these tests.
Re:k by Scott+Kevill · 2009-07-06 02:53 · Score: 1

Far more likely to be because of the choice of algorithms and the resources behind the project. Would be interesting to see how CLucene performs.

--
GameRanger - multiplayer gaming service for PC and Mac games
Re:k by nyctopterus · 2009-07-06 03:02 · Score: 4, Insightful

But Wikipedia's internal search is the suckiest thing that ever sucked! Seriously, does anyone use it, instead of just sticking "wikipedia" into their Google search?
Re:k by tealwarrior · 2009-07-06 03:37 · Score: 4, Informative

Solr/Lucene power a number of sites that would be in the enterprise search category (Apple, Netflix, C-Net). Where I work, we index 5 million docs in Solr/Lucene and serve out millions of search requests a day. It's not Google scale, but most people don't need that. The markets where one needs a FAST are dwindling quickly.

--
In theory, there is no difference between theory and practice, in practice there is.
Re:k by Anonymous Coward · 2009-07-06 04:34 · Score: 1, Interesting

Solr/Lucene power a number of sites that would be in the enterprise search category (Apple, Netflix, C-Net). Where I work, we index 5 million docs in Solr/Lucene and serve out millions of search requests a day. It's not Google scale, but most people don't need that. The markets where one needs a FAST are dwindling quickly.
I work in a shop that uses FAST, despite pressure from some to move to Solr. As I understand it, Solr can't keep up with the volume of changes we need to make to our data. I'm talking millions of documents with 100+ fields each changed per day, with any given change visible to the customer within a short timeframe (10 minutes). Solr can index that much data easily, but it can't keep up with that rate of change. That's what I've been told anyway.
Re:k by johannesg · 2009-07-06 05:03 · Score: 1

Nothing else to say, really
Really? Am I the only person that found it interesting that Lucene, the only non C/C++ implementation, gave some pretty impressive stats? I mean, it's written in Java and although it has a slower index time its search time, index size and relevancy are impressive.

Yes, that's pretty much you yes. Different algorithms, therefore different performance. Reimplement Lucene in C++, then see what the differences are in terms of speed (and if you care, code size, complexity, etc.). Until then the comparison is totally meaningless.
And gee, what's with the defensive attitude...
Re:k by tealwarrior · 2009-07-06 07:02 · Score: 2, Informative

Solr/Lucene real-time search (or near real-time) is one of its weaker points. I think it could keep up with the updates but making them appear in the index immediately and having the caching still perform can be tricky.

We have one index that's updated every 20 minutes, but it only has about 50k documents, and a combination of Solr cache auto-warming and squid's stale-while-revalidate logic works there.

In another system where updates need to be faster, we had to do some custom work to make it perform: there is an in-memory index for recent changes, an on-disk index of previous changes, and a process for moving from one to the other. Hopefully these improvements will make their way back to Lucene in the future.
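
A minimal sketch of that kind of two-tier arrangement, not the poster's actual implementation. The class name TwoTierIndex and its methods are hypothetical, and a plain dictionary stands in for the on-disk index. New documents go into a small in-memory tier so they are searchable at once, queries consult both tiers, and a periodic flush folds recent changes into the main tier.

```python
from collections import defaultdict


class TwoTierIndex:
    """Toy near-real-time index: a small 'recent' tier plus a large 'main' tier."""

    def __init__(self):
        self.main = defaultdict(set)    # stands in for the on-disk index
        self.recent = defaultdict(set)  # in-memory index of recent changes

    def add(self, doc_id, text):
        # New documents are searchable immediately via the in-memory tier.
        for token in text.lower().split():
            self.recent[token].add(doc_id)

    def search(self, token):
        # A query consults both tiers and merges the results.
        token = token.lower()
        return self.main.get(token, set()) | self.recent.get(token, set())

    def flush(self):
        # Periodic step: fold recent changes into the main tier.
        for token, doc_ids in self.recent.items():
            self.main[token] |= doc_ids
        self.recent.clear()


index = TwoTierIndex()
index.add(1, "lucene search benchmark")
hits_before = index.search("lucene")   # found in the recent tier
index.flush()
index.add(2, "solr built on lucene")
hits_after = index.search("Lucene")    # merged from both tiers
```

The hard part in a real system is what this sketch leaves out: deletions, updates to existing documents, and keeping caches warm while the flush runs.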

--
In theory, there is no difference between theory and practice, in practice there is.
Re:k by BikeHelmet · 2009-07-06 08:05 · Score: 1

It's no surprise to me. Java has long since been the best technology for all things internet. Streaming servers, forum software, indexing/archiving, Web2.0 sites; it's only several dozen times faster than Ruby or PHP, with similar memory usage. And I'm not talking applets here - I mean the backend. Tomcat is even significantly faster than mod_php or fastCGI with their C backends.
Keep in mind that anything Java based has VM overhead. If they included that in the Lucene graphs, then it performed the best while using about as much memory as sqlite. If they didn't, then it's a bit RAM hungry(add another 30MB), but still performs the best.
I've always been a big advocate of using easy languages for complex software. When I was first learning programming, I opted to create Tetris in Javascript. It took me a few days - about 12 hours - but I did it from scratch, without help! Now I could probably do the same task in Java in 2 hours, but working in an "easy" language certainly does help when the code is almost above your head. It helps you keep a larger part of the project in focus, instead of having to focus on the actual code.
And then there's the gains from when you make a mistake. I'm sure some of you will claim to be perfect, but in C/C++, if you mess up and introduce memory leaks, you have to waste time tracking them down rather than spending that time optimizing, thinking up new algorithms, etc. Easier languages are so much better for the average programmer, who may think up an impressive algorithm from time to time but struggle with implementing it in a low-level language.
Re:k by JorDan+Clock · 2009-07-06 08:49 · Score: 2, Informative

Kind of like... CLucene?
Re:k by johannesg · 2009-07-06 09:31 · Score: 4, Informative

Ah, thank you. So indeed, an implementation of the same algorithm turns out to be _three times_ as fast in C++ as it is in Java (see here).
I wonder if eldavojohn wishes to comment on that?
Re:k by Hurricane78 · 2009-07-12 09:22 · Score: 1

Sticking "wiki" into it usually suffices. :)

--
Any sufficiently advanced intelligence is indistinguishable from stupidity.

Medical Journals? by Anonymous Coward · 2009-07-06 01:38 · Score: -1, Troll

Are those anything like Livejournals?

Re:Medical Journals? by fuzzyfuzzyfungus · 2009-07-06 02:21 · Score: 1

No, they are what keep Livejournals from becoming Deadjournals.

Hear the heads exploding - Java is fastest by MosesJones · 2009-07-06 01:43 · Score: 4, Informative

Okay so the fastest engine is using Lucene, a Java search engine, and this is neither tuned nor horizontally scaled (which it can do very well).

C++ and C both fail to deliver the same level of performance as the Java virtual machine.

Oh wait hang on... does this mean that for complex applications the most important performance piece is normally actually the efficiency of the code rather than the efficiency of the base platform and therefore having a language in which it is easier to write efficient code is better than just having the one that is fastest to execute a for loop?

But hell this is Slashdot and Java is Slooooooow...

--
An Eye for an Eye will make the whole world blind - Gandhi

Re:Hear the heads exploding - Java is fastest by zappepcs · 2009-07-06 01:50 · Score: 1

You beat me to the comment. I'm sort of surprised that the reaction so far has been the sound of crickets and loud yawning... meh

--
Support NYCountryLawyer RIAA vs People
Re:Hear the heads exploding - Java is fastest by TheSunborn · 2009-07-06 01:54 · Score: 1, Interesting

It may be a bit faster at searching, but it takes ~5 times as long to generate the index and uses twice as much memory when searching, so it may just be a different trade-off between index time and search time.
And it's a bad search test, because the total search time is less than 2 seconds, thus not including the cost of the GC for Java.
Hint to people doing benchmarks: when benchmarking a component which uses GC or similar memory handling methods, remember to have the test dataset be large enough that you cause enough GC cycles to make the performance of any single cycle noise.
And to be fair to the GC language, set minimum memory = maximum memory, so it will use as much memory as you allow and not waste time allocating more memory.
GC is more effective the more memory you allow it to use, because the runtime cost of GC mostly depends on the number of live objects, not the number of allocated objects.
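
A rough sketch of the "run long enough that one collection is noise" advice, written in Python for brevity rather than Java. The benchmark helper and the allocate_heavily workload are hypothetical names made up for illustration. The idea is to repeat the measured operation many times and report the median alongside the worst case, so a single slow run caused by a collector pause shows up as an outlier instead of dominating the result. On a HotSpot JVM, the "minimum = maximum" advice corresponds to passing the same value to -Xms and -Xmx.

```python
import statistics
import time


def benchmark(func, repeats=200):
    """Time func() many times and summarize, so one slow run does not dominate."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return {
        "runs": len(samples),
        "median": statistics.median(samples),
        "worst": max(samples),
        "total": sum(samples),
    }


def allocate_heavily():
    # Hypothetical workload: builds many short-lived objects, the kind of
    # allocation pattern that keeps a garbage collector busy.
    return [str(i) for i in range(10_000)]


result = benchmark(allocate_heavily)
print(result["runs"], "runs, median", result["median"], "worst", result["worst"])
```

If the worst case is far above the median, the run count or dataset is probably still too small to average out collector pauses.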
Re:Hear the heads exploding - Java is fastest by Anonymous Coward · 2009-07-06 01:57 · Score: 0

But...but...if I did it, it would be fast! The guys making the non-Java implementations must not know what they're doing!

Yeah...sure...
Re:Hear the heads exploding - Java is fastest by bluefoxlucid · 2009-07-06 01:58 · Score: 1

Finding it easier to code well in Java than C is like finding it easier to drive Automatic than Manual. I stopped driving automatic, it stopped almost getting me into accidents.

--
Support my political activism on Patreon.
Re:Hear the heads exploding - Java is fastest by Anonymous Coward · 2009-07-06 01:59 · Score: 0

C++ and C both fail to deliver the same level of performance as the Java virtual machine.
The first requirement to successfully mock the uninformed wisdom of crowds, is to be informed yourself.
HTH
Re:Hear the heads exploding - Java is fastest by Atzanteol · 2009-07-06 02:00 · Score: 1

Driving an automatic almost got you into accidents? You must *suck* at driving dude.

--
"Ignorance more frequently begets confidence than does knowledge"

- Charles Darwin
Re:Hear the heads exploding - Java is fastest by cpghost · 2009-07-06 02:01 · Score: 1

Granted, bubble sort is slower in C/C++ than Quicksort in Java. Then again, we do have qsort(3) in C and std::sort() in C++/STL, and slow C++ code is usually the result of developer newbies misunderstanding the copy semantics of parameter passing.

--
cpghost at Cordula's Web.
Re:Hear the heads exploding - Java is fastest by Roy+van+Rijn · 2009-07-06 02:27 · Score: 3, Informative

Hrm, this had absolutely nothing to do with the language. It has almost everything to do with the algorithms.
It's very hard to compare languages, maybe if you use the languages to implement the exact same algorithm and let it run for a long while... But that still doesn't really compare it well enough.
Like somebody already said: Bubble sort in C++ is (almost) always slower than a quicksort in Java.
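
To put a number on the algorithm-versus-language point, here is a small sketch that counts comparisons instead of measuring wall-clock time, so the language's raw speed drops out of the picture. Merge sort is used as the O(n log n) stand-in for quicksort because its comparison count has a guaranteed upper bound; the function names and the test data are assumptions made for illustration.

```python
import random


def bubble_sort(items):
    """O(n^2) bubble sort; returns (sorted list, number of comparisons)."""
    data = list(items)
    comparisons = 0
    for end in range(len(data) - 1, 0, -1):
        for i in range(end):
            comparisons += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data, comparisons


def merge_sort(items):
    """O(n log n) merge sort; returns (sorted list, number of comparisons)."""
    data = list(items)
    if len(data) <= 1:
        return data, 0
    mid = len(data) // 2
    left, left_count = merge_sort(data[:mid])
    right, right_count = merge_sort(data[mid:])
    merged, comparisons = [], left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


random.seed(0)
values = [random.randint(0, 10**6) for _ in range(1000)]
bubble_sorted, bubble_count = bubble_sort(values)
merge_sorted, merge_count = merge_sort(values)
print("bubble sort comparisons:", bubble_count)
print("merge sort comparisons:", merge_count)
```

For 1000 elements this bubble sort always makes 499500 comparisons, while merge sort stays below n log2 n, roughly 10000. A constant-factor speedup from the language has a hard time closing a gap like that as n grows.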

--
My blog: http://www.redcode.nl
Re:Hear the heads exploding - Java is fastest by Anonymous Coward · 2009-07-06 02:27 · Score: 1, Insightful

I stopped driving automatic, it stopped almost getting me into accidents.
You're a fucking idiot. Get off my road.
Re:Hear the heads exploding - Java is fastest by bluefoxlucid · 2009-07-06 02:49 · Score: 1

I can't seem to move into a lane of faster traffic in heavy traffic situations on the highway without being able to immediately accelerate. I can't seem to shift into a lower gear without using the knockdown mechanism, which requires me to depress the accelerator the whole way and wait a second for everything to engage. In a manual, I can downshift to fourth or third and control my speed, enter an opening, and accelerate quickly without fear that backing off the accelerator a little (you try keeping control under full throttle) will result in me being thrown instantly into high gear.
It's the same as what happens when Boehm-gc kicks in during a real time event and causes sound playback to skip. It usually doesn't, but sometimes it does. More importantly, it's like when Boehm-gc encounters an implementation bug or limitation (like a message passing mechanism that uses relative pointers to pass information between threads) and frees up memory in use.
Too many programmers use Java's garbage collector and exception handling as crutches. There are brand new programmers that only know Java, C#, and PHP/Perl/whatnot; when you explain to them how C++ and C allocate/free memory, they look confused, and start talking about how it's humanly impossible to keep the entire running state of your program in your head, or some other such garbage. Then they go and use try/catch with an empty catch block just so their broken Java program continues to hobble along through errors. I'm more comfortable with C, and some programmers are more comfortable in Java or C# but can code in C and/or C++ fine.

--
Support my political activism on Patreon.
Re:Hear the heads exploding - Java is fastest by ThePhilips · 2009-07-06 03:36 · Score: 1

C++ and C both fail to deliver the same level of performance as the Java virtual machine.
Oh wait hang on...
As was pointed out above, the search engines spend >90% of their time in DB/file I/O code.
In other words, the implementation language plays little role - it is the I/O optimization algorithms which play the bigger role.
From my experience with a number of C/C++ projects, the efficiency of the languages/compilers allows developers to remain ignorant. In Java that approach simply doesn't work. Thus I often see better algorithms in less efficient languages.
Like I recently found that in one program people used a bubble sort - as if copied verbatim from "C Programming for Dummies in 21 Days." And it worked without causing any problem for more than a decade - only on a rare occasion when the dimension went above 1000 did it take longer than 1 second to finish. I bet Java would have immediately choked on the code.

--
All hope abandon ye who enter here.
Re:Hear the heads exploding - Java is fastest by Wovel · 2009-07-06 04:39 · Score: 0, Flamebait

Java is slow. If you took the same algorithm and coded it in an efficient compiled language, it would be faster. Much faster.
Re:Hear the heads exploding - Java is fastest by BlueKitties · 2009-07-06 05:30 · Score: 1

Java is fine for plenty of applications, but there are certain situations where it simply doesn't cut it. Heavy GUI oriented applications tend to take a massive performance hit because all of the objects are dynamically generated at run time -- just load up Eclipse and see how long it takes to start. Scientific and Mathematical applications, as well, rely on high-speed languages like C/FORTRAN. That doesn't mean Java is so slow it's useless -- in many cases the aided clarity and simplicity is worth it.

There are times we use _ASM_, there are times we use C, there are times we use Java. And, like it or not, we often fall back on C/C++ for speed. Generally, the only people who bash one language or the other are fanboys. Languages are tools to be used to our advantage, each has its own strengths and weaknesses -- sometimes we use a hammer, sometimes not; as programmers, we must know the tools at our disposal and deploy them accordingly. Just because a pipe wrench can substitute for a hammer doesn't mean it should.

--
"Sorrow is better than laughter, for by sadness of face the heart is made glad." [Ecclesiastes 7:3]
Re:Hear the heads exploding - Java is fastest by Lisandro · 2009-07-06 05:51 · Score: 1

Oh wait hang on... does this mean that for complex applications the most important performance piece is normally actually the efficiency of the code rather than the efficiency of the base platform...
Yes. ...and therefore having a language in which it is easier to write efficient code is better than just having the one that is fastest to execute a for loop?
No.
Re:Hear the heads exploding - Java is fastest by BrokenHalo · 2009-07-06 07:04 · Score: 1

I'm sort of surprised that the reaction so far has been the sound of crickets and loud yawning... meh

Well, the OP certainly got a loud yawn from me for the remark about indexing twitter posts. They might just as well index cockroach farts.
Re:Hear the heads exploding - Java is fastest by johannesg · 2009-07-06 09:36 · Score: 2, Informative

Okay so the fastest engine is using Lucene, a Java search engine, and this is neither tuned nor horizontally scaled (which it can do very well).
C++ and C both fail to deliver the same level of performance as the Java virtual machine.
Oh wait hang on... does this mean that for complex applications the most important performance piece is normally actually the efficiency of the code rather than the efficiency of the base platform and therefore having a language in which it is easier to write efficient code is better than just having the one that is fastest to execute a for loop?
But hell this is Slashdot and Java is Slooooooow...
Actually if you check here, you will find that an implementation of the exact same Lucene done in C++ is about three times faster than Java.
Sorry for spoiling your moment there...
Re:Hear the heads exploding - Java is fastest by xouumalperxe · 2009-07-06 23:13 · Score: 1

Hint to people doing benchmarks: when benchmarking a component which uses GC or similar memory handling methods, remember to have the test dataset be large enough that you cause enough GC cycles to make the performance of any single cycle noise.
I have an even better idea. Why don't we just model the benchmark on the real world usage scenarios, and let those decide whether garbage collection and allocation even matter?

Not really surprising - Disk I/O is the slowdown by Anonymous Coward · 2009-07-06 01:53 · Score: 0

This isn't really surprising to me. Disk I/O is the slowdown for almost all programs, so efficient disk access is more important than the application code, no matter how it is written. OTOH, a well designed system that minimizes wasteful I/O will do very well - even if it is written in, cough, java.

Way to go Apache guys!

BTW, I use Lucene on our document management system. It works well enough, but definitely eats more RAM than I'd like. Did anyone look at the RAM trade-off?

I've been very happy with Sphinx.... by tcopeland · 2009-07-06 02:00 · Score: 1

...have used it on several projects and always gotten good results. Setting it up is easy and the Ruby API is solid, although I needed a tiny bit of additional code for special character escaping. Highly recommended!

--
The Army reading list

SQLLite is a search engine?!??! by brunes69 · 2009-07-06 02:01 · Score: 2, Insightful

Oh wait - seems TFA is saying a lot of sites just use an SQL DB and use like '%FOO%' as a "search engine"...

Ok, this is reasonable, however, I don't see why anyone would choose SQLite as a benchmark. If you are trying to compare search engines, and consider an RDBMS to be a 'search engine' category, then you at least need to include 4 or 5 of the most popular open source RDBMSs in the benchmark (SQLite, PostgreSQL, MySQL, Derby, Firebird), not just one.

THIS FP FoR- GNAA by Anonymous Coward · 2009-07-06 02:03 · Score: -1, Offtopic

driven out by the ?A super-organised am protesting

OS Search measured by OS Benchmark by Anonymous Coward · 2009-07-06 02:07 · Score: 0

The open source search engines are being measured by an open source benchmark. Must be a conspiracy. I want to see propriety benchmarks measuring these. I'm sure M$'s Bing would be the best.

CLucene by drac667 · 2009-07-06 02:12 · Score: 5, Insightful

All the other search engines except lucene are written in C/C++. Why didn't Vik Singh test also CLucene (http://sourceforge.net/projects/clucene/)?

Here is CLucene's description on SourceForge: "CLucene is a C++ port of Lucene: the high-performance, full-featured text search engine written in Java. CLucene is faster than lucene as it is written in C++."

Re:CLucene by samkass · 2009-07-06 02:53 · Score: 2, Insightful

CLucene is faster than lucene as it is written in C++.
XXX is better than YYY as it is written in [my favorite language].
Haven't we explored this one to death already? Java isn't slow, and there's nothing magic about C/C++. Badly written C/C++ gets trounced by Java any day, and algorithmic efficiency trounces both of those when it comes to complex functions like indexed searches.

--
E pluribus unum
Re:CLucene by caramelcarrot · 2009-07-06 03:21 · Score: 2, Insightful

But if it's a direct port of Lucene presumably it's using the same algorithms and has similar code quality - hence it provides a good direct comparison of the language speeds and such a comment is legit.
Re:CLucene by ThePhilips · 2009-07-06 03:51 · Score: 1

Haven't we explored this one to death already? Java isn't slow, and there's nothing magic about C/C++. Badly written C/C++ gets trounced by Java any day, and algorithmic efficiency trounces both of those when it comes to complex functions like indexed searches.

Actually on synthetic benchmarks C/C++ implementation might outperform the Java implementation. Some benchmarks are crafted to essentially test memory bandwidth, where C/C++ easily wins.
And still, well written C/C++ code scales magnitudes better than Java code. Resource management is a bitch. I have seen that to win a number of deals.

--
All hope abandon ye who enter here.
Re:CLucene by Scott+Kevill · 2009-07-06 14:29 · Score: 1

CLucene is faster, and uses less memory, from what is basically a direct port. The README includes some benchmarks:

There are 250 HTML files under $JAVA_HOME/docs/api/java/util for about
6108kb of HTML text.
org.apache.lucene.demo.IndexFiles with java and gcj:
on mac os x 10.3.1 (panther) powerbook g4 1ghz 1gb:
. running with java 1.4.1_01-99 : 20379 ms
. running with gcj 3.3.2 -O2 : 17842 ms
. running clucene 0.8.9's demo : 9930 ms
I recently did some more tests and came up with these rough tests:
663mb (797 files) of Guttenberg texts
on a Pentium 4 running Windows XP with 1 GB of RAM. Indexing max 100,000 fields
. Jlucene: 646453ms. peak mem usage ~72mb, avg ~14mb ram
. Clucene: 232141. peak mem usage ~60, avg ~4mb ram
Searching indexing using 10,000 single word queries
. Jlucene: ~60078ms and used ~13mb ram
. Clucene: ~48359ms and used ~4.2mb ram

--
GameRanger - multiplayer gaming service for PC and Mac games
Re:CLucene by jawahar · 2009-07-06 22:18 · Score: 1

Wish Vik Singh tested Open Source Implementation of PageRank

--
Slashdot = Sarcasm

why index something as useless as twitter? by Anonymous Coward · 2009-07-06 02:14 · Score: 0

why index something as useless as twitter?

How do these compare to Oracle? by introspekt.i · 2009-07-06 02:30 · Score: 1

Does anybody know? That'd be a great comparison.

PLEASE by Parker+Lewis · 2009-07-06 02:32 · Score: 1

Please, can we avoid the "Java vs C/C++" thread again?

i can speak from experience by nimbius · 2009-07-06 02:45 · Score: 1

the lucene based nutch has been a big help to our group. we currently index 60 sites across the company, dive through PDF files and even shockwave flash and powerpoint with ease. the searches are extremely fast and the results are so accurate they've blown our corporate engine completely out of the water.

--
Good people go to bed earlier.

Re:SQLLite is a search engine?!? by Anonymous Coward · 2009-07-06 03:33 · Score: 0

SQLite, Oracle, MySQL, PostgreSQL, etc. all have full-text indexing engines as part of the RDBMS, or as add-on packages. From TFA "I had some text issues with sqlite (also needs to be recompiled with FTS3 enabled) ...". According to this, it uses Full Text Search 3 (FTS3) as its text indexing engine. They all parse the CHAR(N) or CLOB(N) columns into tokens (words), and index those.

The standard SQL predicate "...WHERE columnN LIKE '%FOO%' " cannot use an ordinary index in any RDBMS. That is a non-indexable search inside a CHAR(N) or CLOB(N) string. Only left-anchored LIKE queries can use the index, "...WHERE columnN LIKE 'Foo%' ".
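
A small sketch of why the left-anchored form can use an ordered index while the infix form cannot. A sorted Python list stands in for a B-tree index here, which is an assumption for illustration and not how any particular RDBMS stores its data; the function names are hypothetical as well. With a known prefix, binary search jumps to the first candidate and the matches form one contiguous run. With a leading wildcard, the ordering tells you nothing, so every value has to be inspected.

```python
import bisect

# Sorted values play the role of an ordered (B-tree style) index on the column.
titles = sorted(["food truck", "foobar", "barfoo", "buffoon", "fortran", "zoo"])


def prefix_search(sorted_values, prefix):
    """Like LIKE 'foo%': binary-search to the first candidate, then read a contiguous run."""
    start = bisect.bisect_left(sorted_values, prefix)
    matches = []
    for value in sorted_values[start:]:
        if not value.startswith(prefix):
            break  # past the run of matching values, nothing further can match
        matches.append(value)
    return matches


def infix_search(values, needle):
    """Like LIKE '%foo%': ordering does not help, so every value is inspected."""
    return [value for value in values if needle in value]


prefix_hits = prefix_search(titles, "foo")
infix_hits = infix_search(titles, "foo")
print(prefix_hits)
print(infix_hits)
```

A full-text engine avoids the full scan a different way: it tokenizes the text up front and indexes the tokens, which is the approach the FTS3 paragraph above describes.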

Java is not slow (anymore) by allcoolnameswheretak · 2009-07-06 04:09 · Score: 1

Java can't seem to get past its reputation for being slow - which quite simply is no longer true. Java can match and even exceed the speed of C/C++ implementations. This often seems like an impossible, even outrageous claim to many C/C++ developers. What they fail to see is that Java's HotSpot compiler compiles critical code sections at runtime on the client computer. This has the advantage over C/C++ programs that the compiler has detailed info about the system it's running on and therefore can perform specific optimizations that a C/C++ program - which is compiled only on the developer's PC - can't.

Re:SQLLite is a search engine?!? by mindas · 2009-07-06 04:27 · Score: 1

Although they might have full text indexing and searching, databases and search engines/libraries work differently.

E.g. you come to an online DVD shop and search for "Tom Criuse" (hint: misspelled surname). Every decent search engine (including the Lucene library, not sure of the others evaluated here) would yield a result, despite the misspelling. I am not sure whether a database fulltext thing would spit out anything at all. It's simply built to do a different job, that's it.
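
One common way for a search library to tolerate a misspelling like that is to match on edit distance instead of exact equality. The sketch below is a plain Levenshtein implementation with a hypothetical fuzzy_lookup helper and made-up sample data. It is not Lucene's code, and a real engine would not compare the query against every name one by one.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character inserts, deletes, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def fuzzy_lookup(query, names, max_distance=2):
    """Return names within max_distance edits of the query, closest first."""
    scored = [(edit_distance(query.lower(), name.lower()), name) for name in names]
    return [name for distance, name in sorted(scored) if distance <= max_distance]


actors = ["Tom Cruise", "Tom Hanks", "Penelope Cruz"]
matches = fuzzy_lookup("Tom Criuse", actors)
print(matches)
```

The swapped letters in "Criuse" cost two edits under plain Levenshtein, which is why the threshold here is 2. An exact-match or LIKE query on the same data would return nothing.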

Swish++ not mentioned? by bobv-pillars-net · 2009-07-06 04:59 · Score: 2, Informative

Last time I had to implement an indexing and searching solution, swish++ was by far the performance winner.

--
The Web is like Usenet, but
the elephants are untrained.

Try DBSight -- Lucene based Database Search by chrislusf · 2009-07-06 06:13 · Score: 1

DBSight uses Lucene's inverted index, and beats any database based B-tree search. And it's dead simple to use. Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes

clucene by Anonymous Coward · 2009-07-06 23:59 · Score: 0

clucene beats jlucene (or simply Java Lucene) in everything.

http://clucene.wiki.sourceforge.net/Benchmarks

Slashdot Mirror

62 comments