Wikipedia Founder Working on User-Powered Search
An anonymous reader writes "Jimmy Wales, founder of the Wikia corporation, has revealed plans to offer a user-driven search engine. Ars Technica reports that the plan is to leverage user preferences to pick the 'best' site for any given search term, while at the same time utilizing advertising for commercial gain. The article admits this may not be the ideal solution: 'Users may be reluctant to contribute to the betterment of a commercial site that may end up being bought by a bigger company. Consider, for example, the tragic death of TV Tome, a comprehensive community-driven television content guide that was eventually bought by CNET and transformed into a garish, excessively commercialized Web 2.0 monstrosity of significantly less value to users.' Just the same, Wales seems very enthusiastic in the Times Online article highlighting this venture."
TV Tome really was a fantastic site. A lot of the users who contributed there went over to a great TV wiki that sprang up after TV Tome was sold, http://tviv.org/
"Moderate drinking can help prevent amputated limbs" -- Abigail Zuger, NYTimes, 12/31/02
He managed to find someone even cheaper than India to outsource to?
Back in 1999 a company called Direct Hit Technologies developed what they coined a "popularity engine" that ranked results based on tracking user behavior. They basically partnered with existing search engines and mined their web logs looking for patterns among users. If a lot of people who entered search term "A" went to website "X" but within a minute or two went to website "Y" then "Y" would be ranked higher than "X". Direct Hit was bought in the middle of the internet boom by Ask Jeeves for a cool $512 million. Some of that technology likely still exists within their full search engine, and I'm sure others like Google, Yahoo, etc. all use similar methods of tracking user behavior for helping with their rankings.
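The "went to X, then a minute or two later went to Y" idea can be sketched roughly as below. This is only an illustration of the heuristic as described in this comment, not Direct Hit's actual code: the log format, the function name `rank_by_behavior`, the 120-second `BOUNCE_SECONDS` threshold and the +1/-1 scoring are all assumptions made for the example.

```python
from collections import defaultdict

# Assumed threshold: leaving a result within this many seconds counts as a bounce.
BOUNCE_SECONDS = 120


def rank_by_behavior(click_log):
    """Rank result URLs per query from a click log.

    click_log: iterable of (user, query, url, timestamp_seconds) tuples
    (a hypothetical log format). Returns {query: [urls, best first]}.
    """
    # Group clicks into one session per (user, query).
    sessions = defaultdict(list)
    for user, query, url, ts in click_log:
        sessions[(user, query)].append((ts, url))

    scores = defaultdict(lambda: defaultdict(int))
    for (_user, query), clicks in sessions.items():
        clicks.sort()  # chronological order
        for _ts, url in clicks:
            scores[query][url] += 0  # make sure every clicked URL is listed
        # Compare each click with the one that followed it.
        for (ts, url), (next_ts, next_url) in zip(clicks, clicks[1:]):
            if next_url != url and next_ts - ts <= BOUNCE_SECONDS:
                scores[query][url] -= 1       # abandoned quickly
                scores[query][next_url] += 1  # where the user went instead

    # Highest score first; ties broken alphabetically for a stable order.
    return {
        query: sorted(by_url, key=lambda u: (-by_url[u], u))
        for query, by_url in scores.items()
    }
```

With a log where several users searching for "A" click X and then move on to Y within a minute, Y ends up ahead of X for that query. A real engine would obviously have to blend this with other signals and defend against people gaming the clicks.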
Disclaimer: I am a founder of a competing search engine concept based on volunteers running distributed software that crawls pages and conducts partial analysis and indexing before passing the results to a central server, which indexes the data into the main central index. The human aspect here is the people who take part in the project: participants can actually change ranking formulas, and will shortly be able to assist in human detection of spam etc.
Searching the Web is a very challenging problem (that's why few companies do it): the volume of data is huge, and one only appreciates the value of good algorithms when poor ones make a job run for weeks, fail near the end, and force you to restart the run and wait another week. You can either try to handle this very big problem, which is very hard even if you have the money (look at Amazon's A9, funded with millions, yet they licensed Google's code and database), or you can try to reduce the problem: focus only on a handful of "important" pages. Yahoo did that when it was a human-edited directory/search engine hybrid.
It seems to me that Mr Wales entertains the illusion that a very small number of manually checked pages in the Web space will be sufficient to satisfy the vast majority of search queries (and it has got to be 98%+, as I won't be hopping from one search engine to another). If this were the case, we would still be using Yahoo, which did pretty much just that, yet almost everyone (including Yahoo) moved to algorithmic search engines because that is the only way to handle billions of pages. And billions of pages you will have to handle: even if you just index the homepages of all registered domain names you will be dealing with 100 mln+ pages, a good 20 times more than there are articles in Wikipedia, and checking pages can be far duller than reading a nice article you have some personal interest in.
What I find ironic is that our own search engine concept was removed from Wikipedia because we were supposedly "not notable enough". That is a sign of how they handle the problem of "too much data" in Wikipedia: they just reduce the problem by greatly reducing datasets. Sometimes this is done wrongly, sometimes rightly, and it might well work for Wikipedia, but it sure as hell won't work for Web-scale searches. Oh, and by the way, who said Google and others don't use human reviewers? They sure do, just check TrustRank; this link is ranked as the #1 match on Google for the search TrustRank! Notice what Wikipedia tells us: "While human experts can easily identify spam, it is too expensive to evaluate manually a large number of pages."
Human input plays an important role in state-of-the-art search engines (although a fairly unknown one, as they prefer to keep it secret). However, the suggestion that humans can handle billions of pages, and/or that a handful of pages will be sufficient for a general-purpose search engine, is wrong, and a very backwards move that will result in exactly the kind of wrong attitude present in Wikipedia now.
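To make the split between volunteer nodes and the central server a bit more concrete, here is a toy sketch. It is not the project's real code: the function names `build_partial_index` and `merge_into_central`, the whitespace tokenizer and the `{term: {url: count}}` layout are all assumptions made for the illustration.

```python
from collections import defaultdict


def build_partial_index(pages):
    """Runs on a volunteer's machine (hypothetical helper).

    pages: {url: page_text} for the pages this node crawled.
    Returns a partial inverted index {term: {url: term_count}}.
    """
    partial = defaultdict(dict)
    for url, text in pages.items():
        for term in text.lower().split():  # naive tokenizer, assumed
            partial[term][url] = partial[term].get(url, 0) + 1
    return dict(partial)


def merge_into_central(central, partial):
    """Runs on the central server: fold one partial index into the main one."""
    for term, postings in partial.items():
        central.setdefault(term, {}).update(postings)
    return central
```

Each volunteer would send back only its partial index, and the server would call `merge_into_central` once per submission. The heavy lifting (fetching and tokenizing) stays on the volunteers' machines, which is the whole point of distributing it.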
alexc
Join Majestic-12 Distributed Search Engine
OK, perhaps the new generation of Web entrepreneurs had better learn something from their "elders". We're about to see a lot of concepts that didn't work in the 90's be resurrected and funded. This has been tried and failed, badly:
An orphaned ref to Magellan, the human powered search engine
Didn't work before, when there were a lot fewer sites out there; not likely to work this time, either.
jh
SearchSays.com is a user-based search engine that just recently launched, and has a growing contributor base. Looks like it might be a race, but I guess I'm not surprised it's already been started. There are so few original ideas on the Internet these days. As to the "it's already been done and failed" remarks - timing is everything isn't it? I could see something like this taking off with the current Web 2.0 craze.
//Whatever happened to, "Do what you do best. Forget the rest"?
Didn't you get the memo? It was scrapped along with Web 1.0.
The new one is "Do what people suggest, and remember the REST".
And how would he handle shills? Bots? Trolls? OK, Google is struggling badly enough with link farms and the like, but I can't really imagine end users being the answer to this, even with some sort of meta-recommendation system. Personally, I find that either a Wikipedia search (for example, recently I've been answering some Christmas quizzes, which it just excels at) or a very targeted Google search (3-4 search terms) finds 99.9% of what I want, and the rest just isn't there.
Live today, because you never know what tomorrow brings
Google already has this, only they've decided that pigeons are best for the job.
Part of me welcomes new methods and new technology where search is concerned. However, the involvement of Mr Wales into this arena isn't one I welcome at all.
The only good thing about this is that possibly Wikipedia might be ousted from the primary or secondary page rank for most subjects. That is an authority most highly undeserved, and proof of nothing more than how far we need to go in terms of achieving accurate search.
I think (hope) this is just a piece of self publicity. I doubt they have the technology - judging by the fact that at peak times Wikipedia search shuts down and defaults to Google and Yahoo.
Interesting too, that while Google employs seriously smart people and is founded by seriously smart people, that Jimbo and whomever he cobbles together from the smart-search-technologists-who-decided-not-to-work-for-google-exactly-why? are likely to succeed where they have failed. Sure, brains aren't necessarily everything, but they really do help. I think no small amount of Google's success is the size of the brains behind it. It's why they have a competitive advantage in most markets they enter.
We have seen very clearly that Wikipedia is extremely vulnerable to, and tainted with, group-think manipulation (Jimbo's icon, Ayn Rand, as one very tiny example of many). Why would anyone think this search will be in any way different? This looks just as vulnerable and easy to manipulate if you get a group together, which every SEO blackhat on the planet will do on the day of launch. This looks much easier to manipulate than meta tags or page rank.
I'm sure SEO blackhats and right wing organisations are foaming at the mouth with excitement at this wonderful Christmas announcement.
Hoi,
On a WMF mailing list Angela said that there was no substance to all this. I had also heard from other channels that there is not much to this.
So even though it is nice to speculate, there is not much to all this.
Thanks,
GerardM
I entered a bunch of my CDs into the music CD database CDDB (now Gracenote), thinking that the database's contents would remain public domain or at least freely usable. Then it was sold and the company that owned it forbade access by media applications without a fee. Thousands of people like me voluntarily built that thing, and now they wanted to sell it back to us. Any organization asking for my help on future projects had better have an ironclad guarantee that my work product will remain free to users.
That's easy. You need to leverage the new digital paradigm offered to us by Web 2.0 and the Semantic Web to effectively harness and integrate user-generated and user-driven content in a dynamic framework accessible over a simple user-oriented interface via a wireless broadband multiplexed link. Fool.
How many people can read hex if only you and dead people can read hex?