Slashdot Mirror


Wikia Search Engine to be Launched on January 7th

cagnol writes "The Washington Post reports that Jimmy Wales, the founder of online encyclopedia Wikipedia, has announced the launch of a new open-source search engine, Wikia Search, on January 7th, 2008. The project will allow the community to help rank search results, in a model close to Wikipedia. However the company is a for-profit organization. This new search is supposed to challenge Google and Yahoo."

14 of 189 comments

  1. Easily Abused? by Shade of Pyrrhus · · Score: 5, Insightful

    So basically...they're asking for people to abuse the ranking system. To patrol something like this would require a company with resources like Google's, and that is most likely the reason Google doesn't have such functionality. Just my two cents.

    1. Re:Easily Abused? by Walzmyn · · Score: 5, Funny

      What this means is that no matter what you search for, the top hundred results will be to porn sites.

    2. Re:Easily Abused? by jrothwell97 · · Score: 5, Interesting

      Point well made - while spam attacks may be pretty obvious, they could be spread out over time to make them less obvious.

      Additionally, I can see this search engine being very much affected by public mood. For example, say there was a royal death and a certain right-wing 'upmarket' tabloid newspaper decided to claim that it was a conspiracy by the Government to kill the royal off. This is linked to from said newspaper's web site, and so people improve its ranking. Therefore it floats to the top of the results pile, thus giving it more exposure and setting off a vicious cycle.

      Just a hypothetical situation, but certainly possible. Such a model would also make it possible to carry out smear attacks and to ruin the rankings of competing companies, parties, organisations, whatever - a practice that IMHO should be left to search engine admins.

      --
      Those using pirated Tinysoft signatures(TM) are a real threat to society and should all be thrown in jail.
    3. Re:Easily Abused? by Shade of Pyrrhus · · Score: 5, Insightful

      Having an open algorithm is good, as non-disclosure isn't security, but the issue is allowing people to rank searches and such. Having that public is asking for people to abuse the system, and as noted before, a lot of malicious parties could seemingly legitimately rank their sites (porn sites, etc) higher, leading to ranking battles by bots. Of course, the issue of vandalism occurs with Wikipedia, however when people are looking to make money off of it they'll likely be more persistent.

    4. Re:Easily Abused? by jwales · · Score: 5, Informative
      The question of abuse is obviously one that we are taking very seriously in thinking about design issues. My belief is that the key to solving this thorny question is hinted at by the success of wikis and the wiki model: the key is to put tools in the hands of the community that allow for broad oversight and control by the community in a process of open dialogue and discussion. This is very different from approaches that allow only for atomistic participation by a "community" which is never allowed to really become a community due to excessive reliance on algorithmic voting systems and similar.

      One of the first lines of defense in the early days will be use of a community (wiki) generated whitelist of sites to crawl. We will want to work outward from there, but basically the first thing is for us to assess "look, what are the most important must-have sites on the net" and crawl them. One thing that the mainstream media never seems to report very well, mostly because I think they don't get why it is important, is that we are doing everything here under free licenses. The software is under the GPL, the data we generate is under free licenses, etc. The aim here is not just to create a good search engine, but to create it and *give it all away* in a way that I think has a chance to restructure the entire search industry. Well, maybe not, maybe so, but what the hell, it'll be fun to see. :-)

      --
      Wikia
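
The whitelist-first crawl described in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Wikia Search's actual design: the `WHITELIST` contents and the names `is_whitelisted` and `filter_frontier` are hypothetical, and a real crawler would need much more (robots.txt handling, deduplication, politeness delays).

```python
from urllib.parse import urlsplit

# Hypothetical community-maintained whitelist (example hosts only).
WHITELIST = {"wikipedia.org", "gnu.org", "eff.org"}

def is_whitelisted(url, whitelist=WHITELIST):
    """Return True if the URL's host is a whitelisted site or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    # Require an exact match or a dot-separated suffix, so that
    # "evilwikipedia.org" does not pass as "wikipedia.org".
    return any(host == site or host.endswith("." + site) for site in whitelist)

def filter_frontier(urls, whitelist=WHITELIST):
    """Keep only the candidate URLs the crawler is allowed to fetch."""
    return [url for url in urls if is_whitelisted(url, whitelist)]
```

Working "outward from there" would then amount to growing the whitelist over time, while the crawler itself only ever fetches what passes the filter.
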
    5. Re:Easily Abused? by ivan256 · · Score: 5, Insightful

      Is there an intersection between the people who decide what goes on the whitelist, and what is "notable" for inclusion in Wikipedia?

      I thought so. Your solution is already broken.

  2. I don't care how they arrive at a rank! by garcia · · Score: 5, Insightful

    The idea is to challenge the established players by offering a search service that is more transparent to end users, meaning they can see how search results are arrived at. Wales has described Yahoo and Google as opaque services that don't explain how results are arrived at.

    Personally, I don't care how search engines rank the websites they return as long as what is returned is proper, relevant and useful.

  3. Re:first things first by phantomlord · · Score: 5, Insightful

    Search for Kobar Towers and you get 0 relevant articles. Search for Khobar Towers and you get 62 articles. Yeah, the first is a misspelling, but it's 1 letter off, nothing difficult for a spell checker to check against a dictionary of existing articles. What use is a search engine if it is so strict that I have to enter the terms exactly to get an article when I could just do that in the URL?

    As long as I need to use google to search Wikipedia, I don't see Wikipedia creating a google killer.

    --
    Don't leave your mind so open that your brain falls out. Don't close it so much that you cut off the blood.
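
The "one letter off" lookup complained about above can be sketched with Python's standard `difflib` module. This is a minimal sketch, not how Wikipedia's search works: the title list is made-up sample data, and `suggest_titles` and its `cutoff` value are assumptions chosen for illustration.

```python
import difflib

# Hypothetical index of existing article titles (illustrative data only).
ARTICLE_TITLES = ["Khobar Towers", "Kobe", "Tower of London", "Wikia Search"]

def suggest_titles(query, titles=ARTICLE_TITLES, limit=3, cutoff=0.8):
    """Return existing titles that closely match a possibly misspelled query."""
    # Compare case-insensitively, but return titles in their original form.
    lowered = {title.lower(): title for title in titles}
    matches = difflib.get_close_matches(
        query.lower(), list(lowered), n=limit, cutoff=cutoff
    )
    return [lowered[match] for match in matches]
```

With this sample data, `suggest_titles("Kobar Towers")` returns `["Khobar Towers"]`, which is the kind of "did you mean" fallback the comment is asking for when an exact match finds nothing.
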
  4. Re:Challenging Google? by jwales · · Score: 5, Informative

    No, it is no response to Knol. I have been working on this for a year. The press has talked about it endlessly. :-)

    It'd be sort of cool if we could create a search engine in a week or two to respond to Knol, but actually it takes a bit longer. :)

    I see Larry and Sergey socially from time to time. I spoke about the search project at Google Zeitgeist a few months ago. Going to a google party next month. The media loves a "fight" but really, that's just a nice story arc the press makes up. (Notice: google is not in the search business, google is in the advertising matching business. This search engine doesn't hurt that business at all, indeed it probably makes it marginally less likely we will see the emergence of a proprietary competitor to topple them.)

    It is actually possible for people to just enjoy doing cool stuff without being bastards about it. People forget this sometimes, maybe due to the reputation of a certain dominant software provider. :)

    --
    Wikia
  5. Re:What a joke... by jwales · · Score: 5, Informative

    Again, it would be hard for this to be a response to Knol, since I announced it and have been working on it for a year. :-)

    And, if you read the linked article, you would know that *zero* donations from Wikipedia have anything at all to do with this: Wikia is a completely separate organization.

    Also don't make the classic mistake of thinking that "open source" automatically means "volunteer coders". It generally does not, and the classic FUD from the proprietary world fails to describe reality for precisely this reason.

    And finally, one of the most important concepts here is that of a broad, deep whitelist, which is something that I think can be done reliably and well with appropriate tools in the hands of the end users. The entire problem of bot-driven spam comes from a lack of reliable quantities of human oversight in the process. All you have to do to massively spam google is fool a computer. (Well, even then, google does a pretty damned good job of preventing massive spam though of course there are always some problems.) Pretty hard to get that nonsense by a properly organized community effort.

    (But of course, the design of a community which can move things forward quickly without a lot of useless work is nontrivial.)

    --
    Wikia
  6. Re:first things first by Odiumjunkie · · Score: 5, Interesting

    I completely agree. I am continually amazed at how good google's input-correction is - if I do a search for 'pale gire', it knows to correct it to 'pale fire', yet if I do a search for 'canadian gire', it's clever enough to work out that I mean 'canadian tire'. I'm also continually amazed that people running other search services haven't yet realised just how vital this feature is - it's probably one of my favourite things about Google. Less so for monosyllables, but it's useful for words like "monosyllables". I'm particularly surprised that prominent online dictionaries don't have similar functionality, seeing as I would imagine a large portion of their usage is to find the correct spelling of words.
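
The context-dependent behaviour described above ('gire' becoming 'fire' after 'pale' but 'tire' after 'canadian') can be imitated by scoring whole phrases rather than single words. The sketch below is only an assumption about one way to do it, not Google's method: the phrase counts are invented, and `one_edit_substitutions` and `correct_phrase` are hypothetical names that only handle single-letter substitutions.

```python
import string

# Hypothetical phrase counts standing in for a real query log (made-up numbers).
PHRASE_COUNTS = {
    "pale fire": 120,
    "canadian tire": 900,
    "canadian fire": 40,
    "pale tire": 1,
}

def one_edit_substitutions(word):
    """All strings formed by replacing exactly one letter of `word`."""
    variants = set()
    for i, original in enumerate(word):
        for letter in string.ascii_lowercase:
            if letter != original:
                variants.add(word[:i] + letter + word[i + 1:])
    return variants

def correct_phrase(query, phrase_counts=PHRASE_COUNTS):
    """Pick the most frequent known phrase one substitution away from the query."""
    words = query.lower().split()
    if " ".join(words) in phrase_counts:
        return " ".join(words)  # already a known phrase, nothing to fix
    best, best_count = query, 0
    for i, word in enumerate(words):
        for variant in one_edit_substitutions(word):
            candidate = " ".join(words[:i] + [variant] + words[i + 1:])
            count = phrase_counts.get(candidate, 0)
            if count > best_count:
                best, best_count = candidate, count
    return best
```

Because the whole phrase is looked up, the same misspelled word resolves differently depending on its neighbour, and a query with no known neighbour is returned unchanged.
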

  7. Re:Challenging Google's Revenue Model by sethawoolley · · Score: 5, Insightful

    From the RIAA threads we learn people don't want to pay as endusers for their content. Great post, except this part doesn't make any sense. I pay as an end user for content all the time, and not just for high-end data: Magazine subscriptions, membership in various societies (and their publications), newspapers, my ISP, government funding (I pay through taxes), direct donations to non-profits, contributions to wikipedia and other open content systems directly. While some of them are for high-end data, a lot of it is not.

    Is content going to ever be totally free? It will be if people understand the inherent rewards of an open society. Information's negligible cost of duplication is the revolutionary thing that is shattering the old models (c.f. http://homes.eff.org/~barlow/EconomyOfIdeas.html). Wikipedia is already doing that. As much as I'm a critic of Jimmy Wales, citizendium, etc. (with their NPOV lunacy), the system he's helped build is saving people's lives and improving quality of life in ways the old world just doesn't understand yet.

    Personally, I'm hopeful that as long as we still have the Right to Read (c.f. http://www.gnu.org/philosophy/right-to-read.html), we're on the path to freedom and salvation. A corporation that makes up a new "model" to take advantage of content producers isn't going to take hold anymore. There's just not a point anymore. The price of content is already quite low for common knowledge. Even if the arbiters of knowledge try to keep it from common knowledge, we can paraphrase it. The greatest risk to real productive use of our knowledge still remains Patents. Information may finally be free, but the freedom to tinker is not.
  8. Re:Challenging Google? by STrinity · · Score: 5, Funny

    No, it is no response to Knol. I have been working on this for a year.
    I'm sorry, but your post cites primary sources and thus does not meet Wikipedia's standards.
    --
    Les Miserables Volume 1 now up with my reading of
  9. Re:Your track record says otherwise by Anonymous Coward · · Score: 5, Insightful

    Come to your wikipedia page?

    you mean the one that you have been documented (and here) not only editing, but wiping clean the edit history on, trying to bury your tracks?

    The game you're playing is dirty and how dare you come here unwilling to meet us on equal ground.