Slashdot Mirror


Free Software Activists Take On Google Search

alphadogg writes "Free software activists have released a peer-to-peer search engine to take on Google, Yahoo, Bing and others. The free, distributed search engine, YaCy, takes a new approach to search. Rather than using a central server, its search results come from a network of independent 'peers,' users who have downloaded the YaCy software. The aim is that no single entity gets to decide what gets listed, or in which order results appear. 'Most of what we do on the Internet involves search. It's the vital link between us and the information we're looking for. For such an essential function, we cannot rely on a few large companies and compromise our privacy in the process,' said Michael Christen, YaCy's project leader."

62 of 254 comments

  1. Well by Anonymous Coward · · Score: 4, Insightful

    Result: Search results will be controlled by botnets

    1. Re:Well by Intron · · Score: 4, Insightful

      Result: Search results will be controlled by botnets

      Yes. What's to stop me from downloading the code, modifying it to put my results on top and then joining my 1000 or so servers to the pool? You only need a small advantage to get big differences in results -- the difference between 10th and 11th place is page one vs obscurity.

      --
      Intron: the portion of DNA which expresses nothing useful.
    2. Re:Well by HFShadow · · Score: 5, Informative

      This was solved by distributed computing a long time ago: you simply get more than one worker to check the results, and if anything looks fishy, chuck away everything from that worker.

      Not that this makes this any better of an idea.
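
      A minimal Python sketch of the replicate-and-compare scheme described above, assuming every worker computes a deterministic, comparable answer for the same task. The function name and data shape are hypothetical. This illustrates the general distributed-computing technique, not anything YaCy is documented to do.

```python
from collections import Counter


def verify_by_replication(results_by_worker):
    """Accept the strict-majority answer for one task and flag dissenting workers.

    results_by_worker maps a worker id to the (hashable) result it returned
    for the SAME task. Returns (accepted_result, workers_to_discard). If no
    answer has a strict majority, returns (None, empty set) so the task can
    be re-issued to fresh workers.
    """
    counts = Counter(results_by_worker.values())
    majority, votes = counts.most_common(1)[0]
    if votes <= len(results_by_worker) // 2:
        return None, set()
    # Anyone who disagreed with the majority gets all of their work thrown away
    discard = {w for w, r in results_by_worker.items() if r != majority}
    return majority, discard
```

      For example, `verify_by_replication({"a": 1, "b": 1, "c": 2})` accepts `1` and flags worker `"c"`. This only helps as long as the dishonest workers cannot also form the majority.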

    3. Re:Well by Hazel+Bergeron · · Score: 5, Insightful

      The great thing about centralised search engines is that they're not gamed... oh wait...

      ...is that it isn't in the provider's interest to encourage spam domains full of adverts brokered by itself... oh wait...

      ...is that there's careful control over dissemination of information so privacy is not compromised... oh wait...

      A p2p search engine will have different problems. But in the limit perhaps it'll be like a load of Google or whatever servers sitting around the Internet instead of in one or two datacentres.

    4. Re:Well by Anonymous Coward · · Score: 3, Interesting

      At least it actually is in the interest of search providers like Google, Yahoo and Microsoft to produce useful results in order to achieve / maintain a large userbase.
      Not so much in the interest of somebody who simply sees a distributed search engine as his chance to drive views to his blog / ad collection / malware site.

    5. Re:Well by Rei · · Score: 5, Interesting

      The whole "portal only as an afterthought demo" seems to me a huge flaw as well. You think your average person is going to install this on their computer just so they can do web searches? Not-going-to-happen. People who want to run it, will. People who don't or don't know how, won't. They're the 99.99%. They need a portal. Clients should automatically be putting themselves in the portal-switching queue.

      As for the capabilities, I just tried it out. The results are *extremely* few and very poor. "Dog" gets five hits, for example. You'd almost think it was a joke. Hopefully this was a load problem or a problem due to a lack of scaling in the system thus far, and not a design flaw.

      At least their frontend doesn't seem designed with injection in mind. Start a search with ' (such as 'Test) and watch what happens to the peer listed at the bottom of the page. I doubt that particular issue is exploitable, but if this is a habit of one of their coders...

      --
      Hello from Sputnik 2. I am receiving you.
    6. Re:Well by Rei · · Score: 5, Funny

      Instead of insight, comment contained bobcat. Would not read again.
       

      --
      Hello from Sputnik 2. I am receiving you.
    7. Re:Well by alexgieg · · Score: 4, Insightful

      This system probably solves spam the same way Freenet managed to eliminate it from its boards: by adopting a(n anonymous) Web Of Trust model. In practice, you'll only see results coming from those you trust directly or indirectly. The fake results will be there, but buried.

      And even if they currently don't do that due to the smallness of the network, at some point they will. It's unavoidable.

      Although the problem then might become you only seeing what you like because your friends/trusted nodes all think more or less the same, hence basically shielding yourself from different views. But then, mainstream search engines already do something like this, so it won't be that different from what we already have.

      --
      Conservatism: (n.) love of the existing evils. Liberalism: (n.) desire to substitute new evils for the existing ones.
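
      A minimal sketch of the web-of-trust idea, assuming each peer publishes a list of peers it trusts and that every result is tagged with the peer that supplied it. Everything here (the `trust` graph shape, the depth limit, the function names) is hypothetical, and real systems such as Freenet's Web of Trust use numeric trust scores rather than plain reachability. Results from peers outside your trust horizon are not deleted, just sorted below trusted ones: "there, but buried".

```python
from collections import deque


def trusted_peers(trust, me, max_depth=2):
    """Peers reachable from `me` via at most `max_depth` trust edges (breadth-first)."""
    depth = {me: 0}
    queue = deque([me])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # don't extend trust further than max_depth hops
        for nxt in trust.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    del depth[me]
    return set(depth)


def rank_results(results, trust, me, max_depth=2):
    """Stable-sort (peer, url) results so trusted peers come first; the rest are buried, not removed."""
    ok = trusted_peers(trust, me, max_depth)
    return sorted(results, key=lambda r: r[0] not in ok)
```
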
    8. Re:Well by blackraven14250 · · Score: 4, Insightful

      If it were in Google's interest to bump spam domains to the top, it wouldn't be the useful search engine with leading market share that it is today, as it would have already bumped said results.

    9. Re:Well by M.+Baranczak · · Score: 4, Insightful

      Freenet solves the spam problem by ensuring that nobody actually uses Freenet. I think this project will apply the same solution.

      This scheme has pretty slim chances of success. Which doesn't necessarily mean it shouldn't be attempted.

    10. Re:Well by Carnildo · · Score: 2

      As for the capabilities, I just tried it out. The results are *extremely* few and very poor. "Dog" gets five hits, for example. You'd almost think it was a joke. Hopefully this was a load problem or a problem due to a lack of scaling in the system thus far, and not a design flaw.

      I tried my standard search engine test (how hard is it to find the web page for the Hilton hotel in Paris?), and it failed miserably: "Paris Hilton" didn't get a single result, and neither did any other variation I tried.

      --
      "They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
    11. Re:Well by Urza9814 · · Score: 2

      ...And if 10% of your workers are all part of the same botnet deliberately trying to skew the results, then there's about a 10% chance that the person re-checking the results will be giving you the same "error".
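
      The 10% figure is the single re-check case. A short sketch of how the odds change with more replicas, assuming replicas are drawn independently and uniformly at random from the pool, and that colluding workers always agree with each other. An attacker who can steer task assignment, or flood the pool with extra nodes, breaks this assumption. Under it, with 10% colluders the chance they control the vote is 10% for one replica, 2.8% for three and about 0.86% for five.

```python
from math import comb


def p_colluders_control(p, n):
    """Probability that colluders hold a strict majority of n randomly drawn replicas.

    p is the fraction of the worker pool controlled by the colluders.
    Assumes independent, uniform replica selection (a binomial model).
    """
    need = n // 2 + 1  # smallest strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


for n in (1, 3, 5):
    print(n, round(p_colluders_control(0.1, n), 5))
```
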

    12. Re:Well by tom229 · · Score: 2

      My first test was 'slashdot'. Several results.. mostly blogs referencing various articles. The actual site wasn't even on the first page.

      Second try: 'cisco ios cli reference' seemed to generate a pile of results and completely froze the service.

      Even if it worked well, it's much slower than Google to give results and has about 10,000 times the bandwidth overhead.

      --
      If it ain't broke, don't fix it.
    13. Re:Well by tom229 · · Score: 2

      Update: tried to uninstall from 'Programs and Features': it failed, saying it was still running.

      Doesn't run as a Windows service, and has no discernible process name in Task Manager. It has to actually be stopped from an administrator command prompt by running the 'stopyacy.bat' file in the program's install directory.

      This software is junk.

      --
      If it ain't broke, don't fix it.
    14. Re:Well by Anonymous Coward · · Score: 2, Informative

      My job is pretty much gaming the system. And I think people really don't grasp just how resistant modern search engines are to that kind of thing, or the massive amount of effort that goes into making even a small dent in it. Google and the like don't just leave things sitting around. They have hundreds of thousands of people poring over duplications of search results 24/7 to weed out people like me. People really don't grasp just how much goes into a modern search engine, or how hard those of us trying to sneak around them have to work. Something new has new problems, plus all of the old ones, except without any of the modern defenses. It's like plugging an XP machine with no updates or antivirus software into the net.

    15. Re:Well by Issarlk · · Score: 2

      How is it a problem for a spammer to use his (stolen) node to solve problems? He's not paying the electricity bill.

  2. Question by StripedCow · · Score: 3, Insightful

    Will one client be able to view the queries of its peers?

    If yes, how is that an improvement?
    If no, how does it work?

    --
    If Pandora's box is destined to be opened, *I* want to be the one to open it.
    1. Re:Question by CanHasDIY · · Score: 3, Interesting

      Will one client be able to view the queries of its peers?

      If yes, how is that an improvement? If no, how does it work?

      From TFA:

      It is fully decentralized, all users of the search engine network are equal, the network does not store user search requests and it is not possible for anyone to censor the content of the shared index.

      However, that seems to be all the information there is on the process... doesn't quite assuage the ol' paranoia circuits, does it?

      --
      An enigma, wrapped in a riddle, shrouded in bacon and cheese
    2. Re:Question by viperidaenz · · Score: 2

      From TFA: [yacy.net]

      It is fully decentralized, all users of the search engine network are equal, the network does not store user search requests and it is not possible for anyone to censor the content of the shared index.

      Provided no one modifies the open source code to log user search requests and censor queries.

    3. Re:Question by 19thNervousBreakdown · · Score: 2

      From TFA:

      It is fully decentralized, all users of the search engine network are equal, the network does not store user search requests and it is not possible for anyone to censor the content of the shared index.

      However, that seems to be all the information there is on the process... doesn't quite assuage the ol' paranoia circuits, does it?

      The network stores everything.

      --
      <xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
    4. Re:Question by adolf · · Score: 2

      Because, you know, I'm sure that YaCy is totally and absolutely 100% efficient about things. Every peer obviously has a list of URLs that it is responsible for, and every peer is capable of censoring anything on its list, and there will never be more than 1 copy of any shred of data.[/sarcasm]

      Except it doesn't really work that way: since nobody is in charge, nobody can dictate who will index what. You can censor the data on your own node and you'll certainly be successful (it's your computer, after all). YaCy even has some built-in blacklist functionality which you can set up yourself to make it easy, if that's what you want to do.

      But what you're missing is there's always going to be this other peer right over there that is merrily going about its business indexing all of that stuff that you don't like.

      And chances are that whatever it is folks might want to actively censor is contentious enough that other folks will actively work toward indexing it. (Streisand effect, etc.)

      *shrug*

  3. Great by Moheeheeko · · Score: 5, Funny
    Only used by neckbeards = all search results will be tentacle hentai and open source software websites.

    Awesome...

    1. Re:Great by datavirtue · · Score: 5, Funny

      Dude, that would be an awesome search engine name: neckbeard. Catchy and meaningful, easy to remember.

      --
      I object to power without constructive purpose. --Spock
  4. great stuff by alphatel · · Score: 2

    It's hard to argue with "free" and "freedom", so I give it the thumbs up. But in this day and age it feels like going from a Ducati Panigale to a 1950s Triumph Bonneville.

    --
    When the foot seeks the place of the head, the line is crossed. Know your place. Keep your place. Be a shoe.
  5. Ummm by Webs+101 · · Score: 4, Insightful

    Yahoo's search engine IS Bing.

    --

    "Even for Slashdot, that was a very obscure reference!" - Anonymous Coward

    1. Re:Ummm by enoz · · Score: 3, Funny

      Yahoo's search engine IS Bing.

      And Bing's search engine is Google.

  6. Come FLOSS Devs, We Need Better Names! by DMFNR · · Score: 4, Insightful

    Of course they decide to give it a name that doesn't even look like a word. I can't think of a single popular search engine that doesn't have a catchy name. How do these free software developers expect word to get around about their software when nobody can pronounce its name and most people probably won't even remember what it was called? Especially a peer-to-peer search engine, which I would imagine depends even more than a regular search engine on a decent number of people actually using it.

    1. Re:Come FLOSS Devs, We Need Better Names! by nurb432 · · Score: 2

      Because most names are taken and they don't have a legal team to do research.

      --
      ---- Booth was a patriot ----
    2. Re:Come FLOSS Devs, We Need Better Names! by markdavis · · Score: 4, Insightful

      +1 Mod parent up.

      Seems the geeky crowd still doesn't understand that marketing DOES play a critical role in the popularity of any type of project. "YaCy" really does suck- it is impossible to say, isn't a word, introduces strange capitalization, and it is not even easy to remember.

    3. Re:Come FLOSS Devs, We Need Better Names! by adolf · · Score: 4, Insightful

      Seems the geeky crowd still doesn't understand that marketing DOES play a critical role in the popularity of any type of project. "YaCy" really does suck- it is impossible to say, isn't a word, introduces strange capitalization, and it is not even easy to remember.

      So fork it, changing only the name, and release it yourself under a more marketable moniker. The technical aspects of doing this are easy.

      And if you think selecting a catchy, unencumbered name is also easy, then you really shouldn't have any problem pulling it off.

      It's all GPL, so you can pretty much do what you want with it. If you really want to be in charge of marketing and distribution for a GPL project, the only thing stopping you is you.

    4. Re:Come FLOSS Devs, We Need Better Names! by raftpeople · · Score: 4, Funny

      Other names they considered that were equally bad:
      1) FreEble
      2) !!_//[%%%
      3) Bing
      4) xkCQQT

  7. Cool, but what's in it for the peers? by 91degrees · · Score: 5, Interesting

    While these things can succeed on the backs of some philanthropic individuals, it's just human nature that to get a decent community, you need to benefit the supporters in some way.

    Doesn't need to be any formal system. Free software, for example, seems to be based more on the honour system than anything else, but people do develop free software because there's something in it for them - software tailored to their needs. What is the incentive for being a search peer?

    1. Re:Cool, but what's in it for the peers? by TheRaven64 · · Score: 4, Interesting

      I sketched out a few designs for a decentralised search engine (but didn't implement them, so kudos to these guys for actually bothering), and one of the ideas I had was to allow nodes to return sponsored links (e.g. Amazon referrals). The client would display these for the top few nodes and track the reputations of individual peers. The more users who liked the search results that you returned, the more of them would see your sponsored links. If you came up with a ranking algorithm that did a better job than existing ones, then you'd get a bigger slice of the advertising space. It's essentially the same business model as Google, just on a smaller scale.

      --
      I am TheRaven on Soylent News
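
      A minimal sketch of the reputation bookkeeping described above, assuming the client records a simple like/dislike per answering peer and gives the sponsored-link slots to the best-rated peers. The class name, the +1/-1 scoring and the `k` slot count are hypothetical illustration, not part of any existing design.

```python
from collections import defaultdict


class PeerReputation:
    """Client-side reputation ledger for peers that return search results."""

    def __init__(self):
        self.score = defaultdict(int)  # unseen peers start at 0

    def record_feedback(self, peer, liked):
        # One vote per rated result set: +1 if the user liked it, -1 otherwise
        self.score[peer] += 1 if liked else -1

    def sponsored_slots(self, answering_peers, k=3):
        """Pick the k answering peers whose sponsored links get shown (ties broken by name)."""
        return sorted(answering_peers, key=lambda p: (-self.score[p], p))[:k]
```

      Usage: after two likes for "a", one for "b" and a dislike for "c", `sponsored_slots(["a", "b", "c", "d"], k=2)` returns `["a", "b"]`, so a peer whose ranking users prefer wins more of the ad space.
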
  8. Java... by HBI · · Score: 4, Interesting

    I was going to load up a peer, but there's no way I'm running Java. I've almost completely excised it from all of my computers; no going back.

    --
    HBI's Law: Frequency of calling others Nazis is directly correlated with the likelihood of the accuser being Communist.
    1. Re:Java... by vadim_t · · Score: 4, Interesting

      Ugh, yeah. Another cool project is going to be held back by Java.

      Way back, this happened with Freenet. I thought it was a cool idea, but the darn thing wasn't happy with all the 256MB I could give it. Even now, Java is still a considerable load on laptops with 4GB RAM.

      I think that for best adoption they should have concentrated on making it small and light. If it can be run in say, 64MB RAM then you can install it anywhere. And it's quite likely that a good part of why Freenet was so horrible when I tried it, is because it made a lot of the machines it ran on swap like crazy.

    2. Re:Java... by HBI · · Score: 2

      Yeah, it is evil. Why should I have a slow (certainly slower than native), memory-hogging runtime package for every application, requiring myriad versions depending on the vendor's support level? I'd rather just not have the crap on my system, thanks. I feel the same way about .NET/Mono, if that makes you feel any better.

      --
      HBI's Law: Frequency of calling others Nazis is directly correlated with the likelihood of the accuser being Communist.
    3. Re:Java... by Anonymous Coward · · Score: 3, Insightful

      Not evil, no... but annoying as fuck, yes.

      I've yet to see anything written in Java that didn't seem bloated, slow, and annoying.

    4. Re:Java... by TheInternetGuy · · Score: 3, Funny

      That's OK, please join me in my efforts in porting this over to Flash.

      --
      If my comment didn't sound as good in your head as it did in mine, then I guess we all know who's to blame
    5. Re:Java... by HBI · · Score: 4, Interesting

      I'd be interested in porting it to C, actually.

      --
      HBI's Law: Frequency of calling others Nazis is directly correlated with the likelihood of the accuser being Communist.
    6. Re:Java... by Lazy+Jones · · Score: 3, Interesting

      cool project is going to be held back by Java.

      You know, I'll take "cool projects held back by Java" any time over equally cool projects written in C that need to be patched 5 times a year for the next 10 years because of sloppy programming leading to arbitrary remote code execution vulnerabilities. Please, just let software written in C die with dignity, the language had its decades of glory before everything was accessible over the 'net ...

      --
      "I love my job, but I hate talking to people like you" (Freddie Mercury)
    7. Re:Java... by Nimey · · Score: 4, Informative

      ...instead, you have to update the JRE about that often because of sloppy programming leading to arbitrary remote code execution vulnerabilities.

      The JRE is currently the #1 malware vector, even above Flash and Acrobat.

      --
      Hail Eris, full of mischief...

      E pluribus sanguinem
    8. Re:Java... by devent · · Score: 2

      That is really stupid of you.
      Let's look at the facts:
      Firefox with a few addons and 9 tabs: 180MB RAM. Eclipse with a lot of projects open: 200MB RAM.

      At least with a Java application I can just download it and run it on both my Linux and Windows computers. It would be really nice if more applications left the Windows monoculture, especially those from companies that owe their very existence to open source systems, like Google (Google SketchUp is still not available for Linux and probably never will be).

      --
      http://www.mueller-public.de - My site http://www.anr-institute.com/ - Advanced Natural Research Institute
  9. No control over disk usage by markdavis · · Score: 4, Interesting

    This whole concept seems quite fascinating/interesting. Ironically, two questions came to my mind immediately:

    1) How much bandwidth does this take?
    2) How much disk space does this take?

    Neither question is answered on their FAQ ( http://www.yacy-websuche.de/wiki/index.php/En:FAQ ), although they addressed the disk space issue thus: "Can I limit the size of the indexes on my hard-drive? For the moment no. Automatically limiting that size would mean having to delete stored indexes, which is not suitable. "

    Yikes! I am not sure how many people will want to run a local YaCy client when there is no control over how much disk space it uses (or, apparently, bandwidth). It still has a lot of promise, though.

    1. Re:No control over disk usage by nurb432 · · Score: 5, Insightful

      Run it in a VM. Limit its disk space and networking in one fell swoop.

      --
      ---- Booth was a patriot ----
    2. Re:No control over disk usage by markdavis · · Score: 2

      I wonder what happens when the thing runs out of space? If you can't set how much it uses, then how are we to know that it handles running out of space "gracefully"?

      Also, you (presumably) and I are Linux users- so quotas, separate file systems, loopbacks, space checking, or whatever, are not rocket science. But that could be a lot more challenging for the people doing this on MS-Windows. Some users might be thinking they are "helping the world" by installing that app, then months later not understand why their computers are crashing with "no space left" type problems.
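
      A minimal sketch of the "graceful" behaviour being asked about: check both a self-imposed index budget and the real free space before each write, and stop indexing instead of filling the disk. The function and its parameters are hypothetical (the FAQ quoted above says YaCy offers no size limit); only `shutil.disk_usage` is a real standard-library call.

```python
import shutil


def can_store(path, incoming_bytes, used_bytes, budget_bytes, min_free_bytes=1 << 30):
    """Return True if writing incoming_bytes more index data under `path` is acceptable.

    used_bytes: how much the index already occupies.
    budget_bytes: the user's cap on index size (hypothetical setting).
    min_free_bytes: free space always left for the rest of the system (default 1 GiB).
    """
    if used_bytes + incoming_bytes > budget_bytes:
        return False  # over the user's cap: stop indexing (or evict old entries)
    free = shutil.disk_usage(path).free
    return free - incoming_bytes >= min_free_bytes
```
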

    3. Re:No control over disk usage by TheRaven64 · · Score: 2

      3) What is to stop a malicious node in the network from getting my search history?

      All of their claims about privacy seem to be implementation details of their code (which, being open source, is trivial to modify). They don't tell me how they designed the protocol to prevent someone from modifying the code to record searches, or even to inject phishing sites into the top lists.

      --
      I am TheRaven on Soylent News
    4. Re:No control over disk usage by Meski · · Score: 4, Insightful

      I'd wonder about what readable or easily decodable data might be found on your local drive. Do you think telling the authorities who raid your computer that you aren't responsible for illicit content (think of it doing something like Google's cache on a pron site), or for URLs of sites the government disapproves of, is going to be believable?

  10. Got to get off my lazy butt... by xTantrum · · Score: 4, Interesting

    ...and start coding my ideas. First iTunes, then FB, and now P2P search. Just goes to show ideas are a dime a dozen; it's all about who implements them first. Can't wait to see how this turns out, though. P2P is really how the Internet should be structured, as much as possible.

    --
    $action = empty(PHP) ? backToC() : unset(PHP) ; "when the concrete cases are understood, the abstractions are readily
    1. Re:Got to get off my lazy butt... by dmbasso · · Score: 3, Insightful

      Just goes to show ideas are a dime a dozen

      Exactly, and that's the reason the patent system only works for lawyers these days.

      --
      `echo $[0x853204FA81]|tr 0-9 ionbsdeaml`@gmail.com
  11. Yahtzee by pavon · · Score: 4, Funny

    I assumed it was intended to be pronounced like Yahtzee, which is both memorable and quite descriptive of the quality of results you can expect.

  12. Re:Just Installed it. by Zephiris · · Score: 2

    Cool. "Therefore, more complex ranking algorithms such as those used by Google (which analyze rank using a variety of contextual factors developed during webspidering) are not available in YaCy, placing severe limits on most users' means to retrieve the results they seek. For instance, none of the top 10 results returned by YaCy's public search when queried "Google" actually refer to Google's homepage."

    --

    "A Goddess rarely smiles for she is forced by others to be an island unto herself." - Zephiris
  13. I'm not seeing why this should be tried. by cshark · · Score: 3, Interesting

    Haven't we learned from Gnutella and the others that this kind of thing just doesn't work? That it'll get overwhelmed by spam, hackers, you name it? I'll try it because I always try new P2P-type stuff. But I'm really hoping they have a good security team.

    --

    This signature has Super Cow Powers

    1. Re:I'm not seeing why this should be tried. by wvmarle · · Score: 2

      And it's likely going to be as slow, as so many servers on so many different (and often relatively slow) connections have to be queried. Sorry but I don't like waiting for search results for more than a second or so, when Google provides them almost instantly.

      Google sets the standard, that's what you have to beat. So yes the bar to get into the search engine market is really high, and not many players will be able to give it a go with much chance for success.

  14. Re:how to say YaCy? by Spy+Handler · · Score: 2

    it's pronounced like:

    Yahoo + Cyborg

  15. Re:Nerdy Nomenclature by TheRaven64 · · Score: 2

    Hmm, I'd have said it with a hard C, and then it sounds a lot like yucky, which isn't a great name.

    --
    I am TheRaven on Soylent News
  16. Needs more work by vadim_t · · Score: 2

    So, I tried the portal and searched for slashdot.

    1. geek.net
    2. slashdot tags
    3. ostg.com
    4. slashdot.org/favicon.ico ...
    main page nowhere to be seen.

    Second try, entirely different results:
    1. microsoft.slashdot.org
    2. slashdot.org ...

    Seems very erratic so far. Then again, maybe it needs some time to stabilize a bit.

  17. In 1996 this was done ... by hubertf · · Score: 4, Informative

    ... by the Harvest Project, which installed several local data collectors, and which then added a search engine over all those collectors. The cache system added in between is still known today: Squid.

    http://en.wikipedia.org/wiki/Harvest_project

      - Hubert

  18. Re:how to say YaCy? by lennier · · Score: 2

    Where does "ach" come into it? "Yah" sounds exactly like "yar", as in what pirates say, which rhymes with "jar" and "far" and "ahh" and "pa", while "yaw" sounds exactly like "yore", which rhymes with "paw" and "poor" and "door" and "more". "Ah" vs "or".

    At least that's how we pronounce those letters here in the Antipodes.

    --
    You are not a brain: http://books.google.com/books?id=2oV61CeDx-YC
  19. GIMP is another example. Great program by Anonymous Coward · · Score: 4, Insightful

    GIMP is another example. Great free graphics program, terrible name.

  20. Re:Also by Enderandrew · · Score: 4, Insightful

    Google actively fought censorship in China more than any company on the planet. They put servers in Hong Kong that weren't required to censor results, and any page that was censored, Google made sure to state explicitly on the page that the content was censored so that people knew it.

    In the end, China changed their laws and forced Google to comply. At that point they either had to pull out of China completely or comply with the laws. Some would contend that the high road is to pull out of China, but you can't make inroads and try to effect change if you're not in the country at all.

    --
    http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
  21. Good idea, but Yacy is basically useless trash by xiando · · Score: 2
    I tried Yacy. I've tried it a few times since I first tried it years ago to see whether it had improved. It has not. The main problems with it are:
    • Yacy demands a whole lot of resources. You need a powerful dedicated server just to run it.
    • It crawls sites at a very rapid rate; webmasters all over the world should be happy that it has not taken off. How about waiting a few seconds between page fetches from the same server, eh? Run it and you risk people all over banning your IP. I tried to crawl my own sites with it: not a good idea. At least I could shut the thing down when I saw what it was doing.
    • It crashes, and it crashes a whole lot. Do a few searches and it will crash.
    • Do a search and Yacy will hog CPU time for quite a while. Do another search while it's eating resources and it crashes.
    • The search results are horrible. They are basically useless.
    • Yacy has absolutely no support for different languages. The whole Internet is not the same language, yet Yacy pretends it is. Just want search results in your own language? Not an option.

    I could go on, but you get the idea. I would really like to see a usable peer to peer search engine. The Internet needs it. Yacy is not it. The idea is good, the implementation can best be described as EPIC FAIL.
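
    A minimal sketch of the per-host politeness delay the comment asks for, assuming a crawler that consults the scheduler before each fetch. The class, its method names and the 5-second default are hypothetical; a real crawler would also honour robots.txt.

```python
import time
from urllib.parse import urlsplit


class PoliteScheduler:
    """Enforce a minimum delay between consecutive fetches to the same host."""

    def __init__(self, min_delay=5.0, clock=time.monotonic):
        self.min_delay = min_delay
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_fetch = {}  # hostname -> time of the last fetch

    def wait_time(self, url):
        """Seconds the crawler should still wait before fetching `url` (0.0 if none)."""
        last = self.last_fetch.get(urlsplit(url).hostname)
        if last is None:
            return 0.0
        return max(0.0, self.min_delay - (self.clock() - last))

    def record_fetch(self, url):
        self.last_fetch[urlsplit(url).hostname] = self.clock()
```

    Usage: call `time.sleep(s.wait_time(url))`, fetch, then `s.record_fetch(url)`. Fetches to different hosts are unaffected, so the crawler can stay busy without hammering any single webmaster's server.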

  22. Re:Also by Enderandrew · · Score: 2

    But the blockade won't do anything. You'd just force 100% adoption of Baidu by 1.3 billion people, so that everything they see would be through the filtered eyes of the government.

    At the very least, now that Google is forced to comply with the laws, they are still the only ones who plainly state on the page that the search results were censored. They're informing the public that the government is keeping things from them.

    --
    http://blindscribblings.com - Tasty pop-culture in conceptual fashion.