Slashdot Mirror


New Clustering Search Engine to battle Google

Sophrosyne writes "The New York Times is reporting a new search engine [free if DNA on file with Homeland Security] named "Clusty" is going to try and take Google head-on. The new search engine was developed by three former CMU computer scientists who formed the company Vivisimo. The search engine uses Overture for its results but offers new features such as an encyclopedia search, clustered results, and a gossip search."

35 of 189 comments

  1. Klutsy? by mfh · · Score: 5, Insightful

    New Clustering Search Engine to battle Google
    More like New Clustering Search Engine goes Beta. Let's wait until it's production stable before talking about who it's going to take down in a fist fight reminiscent of the Spock/Kirk battle in Amok Time.

    Clusty by Vivisimo? Did I even spell that right? They need to consider naming things that people can:
    A) pronounce
    B) spell
    C) are actual words or at least close to words that qualify for both A & B.

    Clusty sounds like something you would call the fat cheerleader. It also will be often mispronounced as Klutsy, so it's a very bad name for a search engine (of all things).

    The search engine uses Overture for its results but offers new features such as an encyclopedia search, clustered results, and a gossip search.

    This is a Microsoft tactic: add features to get market share, and it's an evil tactic because nothing new comes out of it, except bloat and bad karma. The fact this is based on Overture leads me to believe that it won't be able to take Google head-on at all. Clusty uses the Google interface but shows sponsored results first (evil), and displays 404 pages in the results. (FYI dteam was the first 3d design guild that is no longer)

    I don't think they really have a hope of competing with Google. If it ain't broke don't fix it, so most people will just continue to use Google.

    --
    The dangers of knowledge trigger emotional distress in human beings.
    1. Re:Klutsy? by LiquidCoooled · · Score: 5, Informative

      The original engine was actually called Vivisimo,
      and the exact point you make was mentioned back then.

      Here's the article (January):

      http://slashdot.org/articles/04/01/05/1839233.shtml?tid=126&tid=185&tid=95

      I think clusty.com is better, but now makes me think of unclean prostitutes.

      --
      liqbase :: faster than paper
    2. Re:Klutsy? by mfearby · · Score: 2, Insightful

      It's probably called "clusty" because of all the domain-name hogging scum out there getting fat off registering everything they can think of to extort big bucks! Who would have thought of registering google, huh? Back when it first came out I remember thinking "google: what a stupid name!". Now, it has become both a noun and a verb in most people's everyday speech.

    3. Re:Klutsy? by Quixote · · Score: 4, Funny
      They need to consider naming things that people can:
      A) pronounce

      Well, Google has got everyone beat in this regard. "Google" is probably the first thing a baby says (and hence I'm sure it is hardwired into our brains). The only thing that could beat "Google" would be "dada" or "burp". Any takers?

    4. Re:Klutsy? by meza · · Score: 3, Insightful

      Clusty by Vivisimo? Did I even spell that right? They need to consider naming things that people can:
      A) pronounce
      B) spell
      C) are actual words or at least close to words that qualify for both A & B.

      The main reason why I used altavista for so long was actually because I didn't manage to spell google right. Honestly. I had to try all kinds of combinations every time I wanted to go there, like gogle, googel, gogel. I should also say that English is not my native language.

    5. Re:Klutsy? by br0ck · · Score: 2, Informative

      According to the CEO the awkward name and the Krusty similarity were both intentional.

      Valdes-Perez said his company dumped the name Vivisimo for the search engine because it was ``an obstacle.''

      ``It's a name that is difficult to pronounce and type and spell. Other than that, it's a great name,'' he quipped.

      But the new name may face similar challenges, Valdes-Perez acknowledged. Though it is easy to remember, for many people Clusty evokes the name Krusty the Clown, the not-so-kid-friendly character on ``The Simpsons'' television show.

      Valdes-Perez said he initially recoiled at the name Clusty, which was conceived by a business partner. But he found it more memorable than the vanilla-sounding names proposed by a professional branding company.

      ``A mildly negative association,'' Valdes-Perez said, ``will be swamped by a positive experience. And that's what we hope to offer.''

    6. Re:Klutsy? by It'sYerMam · · Score: 2, Insightful

      Google's pretty much a misspelling, anyway. The original word is 'googol' meaning 10^100. Incidentally, this was the £1,000,000 question on WWTBAM, when the cheater was on. He didn't know it, but everyone in my family did... :|

      --
      im in ur .sig, writin ur memes.
    7. Re:Klutsy? by RotJ · · Score: 2, Insightful

      More like New Clustering Search Engine goes Beta. Let's wait until it's production stable before talking about who it's going to take down in a fist fight reminiscent of the Spock/Kirk battle in Amok Time.

      Whether it's beta or not doesn't matter. Google picked up most of its steam by word of mouth while it was still in beta and was already on its way to becoming the dominant search engine by the time it took off the beta tag. Just look at Google's own Gmail beta. Hotmail and Yahoo! didn't have to "wait until it's production stable" before worrying their asses off about the market share it's gaining.

      This is a Microsoft tactic: add features to get market share, and it's an evil tactic because nothing new comes out of it, except bloat and bad karma.

      So is Google using evil Microsoft tactics by adding a news search, newsgroup search, image search, directory search, university search, special search, price search, local search, catalog search, definition search, Klingon search, calculator, translator, weblog, email, and photo organizer? Do you think this makes Google bloated or a better service?

      As far as clustering goes, I'm pretty sure NorthernLight.com was marketing it as its key feature back when it was still competing in the consumer search market. Seems their enterprise search engine still has it: "Automatic classification. Northern Light has patented, proprietary technology that classifies every document in the database by subject, type, language, and source. We provide a complete 17,000-node subject taxonomy developed by our expert gang of librarians that is extensible and customizable. Our classification powers advanced search forms, vertical search applications, and our patented Custom Search Folders(TM) for results navigation."

  2. Clue! by mfh · · Score: 4, Funny

    But anyway, this does look interested.

    I think there's your first clue for why your story was rejected.

    --
    The dangers of knowledge trigger emotional distress in human beings.
    1. Re:Clue! by jabber-admin · · Score: 4, Funny

      Perhaps s/he also forgot to include a witty comment about the NYT registration req.

  3. Gossip filter by IwannaCoke · · Score: 3, Insightful

    Instead of being able to search through just gossip, I would be more interested in being able to filter out all the gossip.

  4. Re:is going to try and take Google head-on. by fgb · · Score: 3, Funny

    ...but not one that reminds me of "Clippy".

  5. Since when is search a solved problem? by hanssprudel · · Score: 3, Insightful

    So everybody is waiting for the next great search engine to come along and out-google Google, but it seems to me that they are looking in the completely wrong places.

    All Clusty, A9 and the other more recent search engines seem to do is add more gimmicks to search results from Yahoo and Google respectively. To some extent, this seems to be exactly what Google is doing recently as well: the searches are hardly getting better, instead we can search news, search references (try define:), search printed text, do automatic conversions, etc etc.

    But the truth is that not only are the searches at Google not getting better: they are getting worse. It seems like PageRank is more or less unused nowadays, and Google just uses easily manipulated things like searchterm in URL, searchterm in Title, how recently updated, to rank pages. I think anybody who uses Google to search for specific things must have observed that it works only a fraction as well as it did when it was new.

    So what is going on here? Does everybody consider the basic searching a solved problem, and that we don't need to find pages better than google does? Or is a good search that cannot be manipulated really an intractable problem?

    If I owned Google stock, I would really be wondering how many of all those thousands of PhDs at the Googleplex are working on this, and how many are writing gimmicks and elegant webmail applications. Or maybe one of them already proved that the problem can't be solved, and Google is just hoping to make as much money as possible before the secret comes out...

    1. Re:Since when is search a solved problem? by barthrh2 · · Score: 2, Interesting

      You point out the exact benefit. In most searches where it could apply, your first five pages are meta-shopping engines. People are using tactics like creating stupid page names based on popular searches that they manage to push to the top of the rankings.

      This is a battle that will always go on. Change your page rank system and people will just start gaming it again.

      What Clusty/Vivisimo accomplishes is that by clustering data, it takes sequence out of play. Even if my preferred pages for "Debian's social contract" appear deep into a search on Debian, they come front & center on a clustered search.

      If this catches on, I'm certain that people will figure out how to game that too. One feature that I'm surprised was never implemented is an option to suppress meta-engines from search results. That would clean up results a lot.
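
      A minimal sketch of the "takes sequence out of play" point above. It is not how Clusty or Vivisimo actually cluster; the result list, the ranks, and the first-word labeling rule are all made up for illustration. The only thing it shows is that once hits are grouped under labels, a page ranked 47th is one click away instead of five pages deep.

```python
from collections import defaultdict

# Hypothetical ranked results for the query "debian": (rank, title).
RESULTS = [
    (1, "Debian download mirrors"),
    (2, "Debian install guide"),
    (3, "Debian download images"),
    (47, "Debian social contract"),
    (52, "Debian social contract FAQ"),
]


def cluster_by_first_word(results, query):
    """Group results under the first title word that is not the query term."""
    clusters = defaultdict(list)
    for rank, title in results:
        words = [w for w in title.lower().split() if w != query.lower()]
        label = words[0] if words else "(other)"
        clusters[label].append((rank, title))
    return dict(clusters)


clusters = cluster_by_first_word(RESULTS, "debian")
for label, hits in clusters.items():
    # Every label is shown at the top level, whatever the ranks inside it.
    print(label, [rank for rank, _ in hits])
```

      A real engine would pick labels from snippets and phrases rather than a single title word, which is also where gaming the labels would presumably start.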

  6. Not impressed; but more competition is good by Quixote · · Score: 4, Insightful
    I didn't RTFA (I'm a regular, I don't have to) but I tried out Clusty. In particular, the News section.

    Under the heading "House" are the news items:

    • Gunmen Attack Mauritania Security Chief's Home (Reuters)
    • U.S. Policies Stir More Fear Than Confidence (Los Angeles Times)
    • N.Y. Auction Houses Expect High Totals (AP)

    And under the heading "Record", are listed:
    • As Reservoirs Recede, Fears of a Water Shortage Rise (Los Angeles Times)
    • NASA Delays Plans to Fly Shuttle Soon (NY Times)
    • San Jose State, Rice Set Scoring Record (AP)
    This shows that just a clustering technique isn't enough; you need more context. Google (IMHO) does a better job of clustering their news results.

    Having said this, I wish Vivisimo all the luck. Google needs more competition; it is what will give us the Next Great Search Engine(tm).

    Ob: I, for one, would like to welcome our new clustering overlords.. ;-)

  7. ooh a complete suite of search engines by Savves · · Score: 2, Funny
    but i have to say, the gossip search needs to index more sites, and the image search is still no match for google's.

    not a very reliable porn search engine.

  8. Encyclopedia? Bah! by Zeddicus_Z · · Score: 3, Insightful

    The submitter had me all excited there for a minute or so, but unfortunately the "encyclopedia search" he mentions is simply searching the wikipedia.org site. Now don't get me wrong; there's absolutely nothing wrong with wikipedia, however it's already a web resource. You've been able to "encyclopedia search" Wikipedia for AGES by appending "site:wikipedia.org" to a Google query.

    Now if they'd done some sort of deal with Britannica to gain search access to its online library, THAT would be a resource worth posting to /. about. Bah.
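
    For anyone who wants the site: trick in script form, here is a small sketch. The helper name site_search_url is made up; the only things assumed about Google are the /search?q= URL form and the site: operator mentioned above.

```python
from urllib.parse import urlencode


def site_search_url(terms, site="wikipedia.org"):
    """Build a Google search URL restricted to one site via the site: operator."""
    query = f"{terms} site:{site}"
    # urlencode percent-encodes the colon and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_search_url("clustering"))
```

    Swapping the site argument gives the same restricted search for any other domain.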

    --
    Janie took my gun...
  9. Hrmm... by t7 · · Score: 3, Interesting

    How many people actually jump on the "bandwagon" and switch search engines just because someone says it's "new and fresh"?

    I gave a9 a try, I like the interface and some of the new features like the search history and the multiple search panes. But shortly after I found myself using google again. Even though a9 uses google, and the results are almost identical, I didn't find anything compelling enough to make me switch.

    Does anyone else feel they might be missing some results if they were to use another search engine?
    What must a new search engine provide to "steal" users from google?

    Free iPods? Sure!

  10. Re:The interface looks pretty "cruddy" by mfearby · · Score: 2, Informative

    After having just searched for "BinaryWrite ASP Stream" to see if it might produce the goods in trying to solve a little web page problem I have, clusty did actually turn up something I hadn't seen before. Maybe it is going to be OK?

    Back when Teoma came out I remember thinking the same thing, but soon forgot about it.

    And yes, I have seen Linux. I was until a year ago a perpetual new-distro-installing-slut to see if Linux was up to scratch. Sadly, I still have to tinker and fiddle with the thing to even get it to recognise a frigging USB mouse (SuSE 9.0). Said USB mouse (Logitech optical) worked beautifully during the installer, but after rebooting, refused to work in X.

    Hey, maybe I should install Syllable 0.5.4 (http://slashdot.org/comments.pl?sid=124146&cid=10416966), the world's latest addition to the steaming heap of soon-to-be-abandonware :-)

  11. A better mousetrap by mrshowtime · · Score: 3, Interesting

    I have always considered Google's best point to be its utter simplicity of design. Also, the name is easy to remember. Anyone who wants to one-up Google has to not only be MUCH better, but also have a good name and be as easy to use as Google. Before, in the old days, each search engine produced sometimes wildly different results. At the time, HotBot was the best search engine going, but it lost steam and was ultimately "replaced" by Google.

    --
    "Jeremy, you need to get to an internet cafe and cut and paste some appropriate sentiments about me from the world wide
  12. Dada by mfh · · Score: 4, Insightful

    Well, Google has got everyone beat in this regard. "Google" is probably the first thing a baby says (and hence I'm sure it is hardwired into our brains). The only thing that could beat "Google" would be "dada" or "burp". Any takers?

    You joke, but a search engine named Dada would likely be well received for the name, and if it was a good system it could find a nice user base. I mean it has taken Google *years* to perfect its systems and they started with a good premise: do no evil. That was when all the search engines were cashing in on ads. A lot of people were turned off of the internet because of that, until Google came along. So it was purposeful, not evil, and light/easy to use.

    My suggestion to anyone trying to take on Google is that they should do something else unless google becomes evil, and because power corrupts and absolute power corrupts absolutely -- it's just a matter of time before Google turns evil. Maybe not, though. :-)

    --
    The dangers of knowledge trigger emotional distress in human beings.
    1. Re:Dada by drinkypoo · · Score: 2, Funny

      The problem with a search engine named "dada" is that when you searched for articles on cutlery you'd end up with a picture of a haddock impaled on a pitchfork being held by a naked man wearing fake breasts. Probably not what you were looking for.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  13. It's more impressive than Slashdotters realize by Everyman · · Score: 5, Interesting

    First of all, "uses Overture results" strikes me as misleading. They have an agreement with Overture to share the proceeds from the sponsored links.

    The results include MSN and Gigablast and Lycos. Basically, that means Yahoo's crawling plus Gigablast. Yahoo has ramped up their crawling since March, and is on a par with Google. They've been slow about passing all of it to MSN in a timely fashion, but by now MSN has most of it. I think Lycos, which also uses Yahoo's Inktomi, is about the same as MSN.

    The clustering is the best of any search engine, meta or otherwise. You don't have to have JavaScript enabled, which is a big plus over the Vivisimo interface I remember from a year ago.

    Finally, I was delighted to see that Clusty.com does not set a cookie unless you customize. Even the cookie for customization looked like it lacked a unique ID. I emailed Clusty and they confirmed for me that they have no plans for a unique ID in their cookie.

    Google tracks you with a unique ID across all of their services, and saves everything it knows about you. Google's cookie expires in 2038.

    Now I ask you, why do Slashdotters feel the need to dump on Clusty?

  14. Mozilla search plugin from the actual company by palfrey · · Score: 3, Informative

    Now there's a first. Not even Google has ever directly supported Mozilla - the Google toolbar from Google is IE only. And this one now has a Mozilla search plugin link on the front page. Kudos.

    --
    Beware the psychokinetic mimes!
    1. Re:Mozilla search plugin from the actual company by NeoSkandranon · · Score: 2, Interesting

      The biggest "feature" of the toolbar is the popup blocking..which Moz users don't need

      At any rate Firefox has a box in the corner that's directly linked to google

      --
      If you can't see the value in jet powered ants you should turn in your nerd card. - Dunbal (464142)
  15. It beats Google by danharan · · Score: 2, Informative

    on a search for "MILF" by putting the Moro Islamic Liberation Front in one category, and separating the more "mature" content in others.

    It's not perfect, but it's a good start. I'm sure /.'ers can think of other ambiguous search words where clustering helps. The UI could use some simplification, but otherwise I'm impressed.

    One neat consequence for web marketers will be more targeted traffic. With Google, you have to hope searchers will be savvy enough to use 3-4 keywords to search for exactly what they want- if they can click on two more KWs that refine their search, we'll see the inventory of cheaper 3 KW terms go up significantly.

    --
    Information: "I want to be anthropomorphized"
  16. Search is a dialog, not a ranking by G4from128k · · Score: 3, Insightful

    The basic concept of any kind of PageRank is flawed because it assumes a monotonic ordering of sites on some single scale (e.g., popularity as defined by linkage). The problem with PageRank is not the use of links to assess popularity, but the presumption of a single scale.

    The search for "Apple" illustrates this well. This search, like many, is deeply ambiguous. It could refer to the computer company, to the fruit, to the record company, to New York City, to the singer (Fiona), or to Apple Valley (MN or CA). Even if the search engine knows that it refers to the computer company, it's still ambiguous. It could refer to the company (as an investment), the products (for purchase), or a question (as in technical support).

    The point is that each of these ambiguous alternatives creates an independent cluster of hits. One cannot even rank hits within a cluster due to a hierarchy of ambiguity. Within the Apple computer cluster are distinct subclusters for computer purchase, investment evaluation, and technical support. Although one can create a ranking within each subsubsubsubcluster, it is impossible to construct a meaningful rank for all hits across all clusters - the second hit for "purchasing an Apple computer laptop" is not comparable to the 2nd hit for "Apple Records".

    Instead of a pagerank scheme that sorts the universe of hits the instant the user enters the search, search engines should be more interactive. The first page of hits would emphasize breadth -- displaying hits most representative of a broad range of alternative clusters. The UI would enable a "more like this"/"fewer like this" selection process that tells the search engine what the searcher is actually looking for. As the searcher selects hits, the subsequent pages might show popularity-ranked hits within the clusters that seem to interest the searcher.

    Each hit and each page would serve a double-duty -- serving the searcher's need to get information from the internet, and answering the search engine's question about the needs of the searcher for that particular search. Until the search engine understands each searcher and each search, it cannot hope to rank the hits.
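
    A rough sketch of the dialog described above, under loud assumptions: the clusters, the hit titles, and the function names first_page and refine are invented, and each cluster is assumed to be already sorted by popularity. The first page takes hits round-robin across clusters for breadth; a "more like this" click then narrows to popularity-ranked hits inside one cluster.

```python
from itertools import zip_longest

# Hypothetical clusters for the query "apple", each already sorted by popularity.
CLUSTERS = {
    "computer": ["apple.com", "Apple laptop reviews", "Apple support forum"],
    "fruit": ["Apple varieties", "Apple pie recipe"],
    "records": ["Apple Records history"],
}


def first_page(clusters, size=5):
    """Take hits round-robin across clusters so the first page emphasizes breadth."""
    page = []
    for row in zip_longest(*clusters.values()):
        # zip_longest pads exhausted clusters with None; skip the padding.
        page.extend(hit for hit in row if hit is not None)
    return page[:size]


def refine(clusters, liked_cluster, size=5):
    """After a 'more like this' click, show ranked hits from that cluster only."""
    return clusters[liked_cluster][:size]


print(first_page(CLUSTERS, size=3))
print(refine(CLUSTERS, "fruit"))
```

    Note that no cross-cluster ranking is ever computed here: the only ordering used is the one inside each cluster, which is the whole point of the argument.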

    --
    Two wrongs don't make a right, but three lefts do.
  17. tabs by dancedance · · Score: 2, Insightful

    Like google, clusty can search for/through: images, news, ebay, blogs, and . . . SLASHDOT? I was quite surprised to see that it can be customized to have a slashdot tab at the top. The other interesting thing I noticed is that there is a link on the main page to "mozilla search plugin". I am not able to actually follow the link, but it would seem to suggest that they are interested in supporting OSS. Who do you think they are trying to target?

  18. Hooray! Hooray! Hooray! It puts Wikipedia first! by ortholattice · · Score: 3, Interesting

    Finally, a search engine that correctly bubbles wikipedia above the spam clones (and read the reply to this post too). Google doesn't even show wikipedia at all on the first page, even if expanded. Kudos, you've won your first (?) customer!

  19. Dada would be... by sam_handelman · · Score: 4, Funny

    A search engine that finds pages containing the words you typed which are *least* likely to relate to your actual underlying question. A google of the absurd, as it were.

    This could be very, very difficult. How would you implement such a thing, from a technical standpoint?
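
    One naive way to approach it, as a sketch only: keep the pages that contain every query word, score them with a crude relevance measure, and return them worst first. The scoring rule (share of a page's words that are query words), the function name dada_search, and the two sample pages are all assumptions made up for this example.

```python
def dada_search(query, pages):
    """Return titles of pages containing every query word, least relevant first."""
    words = query.lower().split()
    if not words:
        return []
    scored = []
    for title, text in pages.items():
        tokens = text.lower().split()
        if all(w in tokens for w in words):
            # Crude relevance: share of the page's tokens that are query words.
            score = sum(tokens.count(w) for w in words) / len(tokens)
            scored.append((score, title))
    # Ascending sort puts the lowest-scoring (most absurd) match first.
    return [title for score, title in sorted(scored)]


# Hypothetical two-page index.
PAGES = {
    "Knife guide": "cutlery cutlery knives forks cutlery",
    "Haddock": "a haddock on a pitchfork far from any cutlery drawer",
}
print(dada_search("cutlery", PAGES))
```

    The hard part the sketch dodges is the "underlying question": word counts only measure how little a page talks about the query, not how far it is from what the searcher meant.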

    --
    The good and new comes from no quarter where it is looked for, and is always something different from what is expected.
  20. Marketing Slip-Up... by One+Childish+N00b · · Score: 2, Funny

    "Ewww, Jimmy's got a clusty in his hair!"

    - I refuse to use anything that sounds like children's slang for a bogey or some other lump of offensiveness. Whoever thought that name up needs to be drummed out of marketing forever. The layout of the main page is reminiscent of Ask Jeeves (which is a bad thing, it automatically makes me think 'bad searches') and search pages look cluttered and the vivid background against the soft shades of the foreground looks awful. This 'Clustered Searching' is a good idea, badly executed. Next please.

    --
    Dealing with lawyers would be a lot less tedious if they all looked like Casey Novak.
  21. What's its name... by karniv0re · · Score: 2, Funny

    "Yeah, I use that new search engine. Crusty. Er, Colostomy. Er, Callusy, or whatever."

  22. Klitsy? by Donny+Smith · · Score: 2, Funny

    >I think clusty.com is better, but now makes me think of unclean prostitutes.

    And Google makes me think of clean prostitutes!

  23. With .sig: by FooAtWFU · · Score: 2, Funny
    What must a new search engine provide to "steal" users from google?

    Free iPods? Sure!

    Well, I guess that's one way to do it...

    --
    The World Wide Web is dying. Soon, we shall have only the Internet.
  24. i just couldn't care less in such cases by l3v1 · · Score: 3, Insightful

    Am I the only one who is fed up reading like "company A developed a new search engine which uses company B's search engine by adding revolutionary and world shaking features like thinking instead of you"...

    If some are so revolutionary, then why are they using someone else's engine by adding some stuff most people most probably never find out what to use for. Doesn't A9 ring a bell for anyone, or does it.

    I have an idea. Let's make a totally new and ground breaking search engine which will use Google's results, but hey, the main idea: let's have a different logo and paint the site pink !

    Geez, I sometimes just can't stop wondering about all the freaky things that money can be earned from these days.

    --
    I am putting myself to the fullest possible use, which is all I can think that any conscious entity can ever hope to do.