Wikipedia Founder Working on User-Powered Search
An anonymous reader writes "Jimmy Wales, founder of the Wikia corporation, has revealed plans to offer a user-driven search engine. Ars Technica reports that the plan is to leverage user preferences to pick the 'best' site for any given search term, while at the same time utilizing advertising for commercial gain. The article admits this may not be the ideal solution: 'Users may be reluctant to contribute to the betterment of a commercial site that may end up being bought by a bigger company. Consider, for example, the tragic death of TV Tome, a comprehensive community-driven television content guide that was eventually bought by CNET and transformed into a garish, excessively commercialized Web 2.0 monstrosity of significantly less value to users.' Just the same, Wales seems very enthusiastic in the Times Online article highlighting this venture."
Speaking of Wikipedia, it looks like the English edition will hit 100 million edits today or tomorrow. Quite an accomplishment.
I can just imagine the results involving controversial subjects.
Never mind something as sedate as GWB or Blair or global warming or religion. What about vi vs. emacs?
"It is a greater offense to steal men's labor, than their clothes"
SPAM.
Please explain how you're going to handle gaming the system by seo spammers.
Early rumors had him working with Amazon in the effort, but this should clear things up.
Google, Amazon, Opera, Mozilla: all are good ideas, but as they expand their reach, they turn to crap. Google is going to Hell, Amazon is there, Opera likes the road, and Moz? They seem to be eyeing it.
Whatever happened to, "Do what you do best. Forget the rest"?
Politics is the art of looking for trouble, finding it everywhere, diagnosing it incorrectly and applying the wrong fix.
TV Tome really was a fantastic site. A lot of the users who contributed there went over to a great TV wiki that sprang up after TV Tome was sold, http://tviv.org/
"Moderate drinking can help prevent amputated limbs" -- Abigail Zuger, NYTimes, 12/31/02
He managed to find someone even cheaper than India to outsource?
Back in 1999 a company called Direct Hit Technologies developed what they coined a "popularity engine" that ranked results based on tracking user behavior. They basically partnered with existing search engines and mined their web logs looking for patterns among users. If a lot of people who entered search term "A" went to website "X" but within a minute or two went to website "Y" then "Y" would be ranked higher than "X". Direct Hit was bought in the middle of the internet boom by Ask Jeeves for a cool $512 million. Some of that technology likely still exists within their full search engine, and I'm sure others like Google, Yahoo, etc. all use similar methods of tracking user behavior for helping with their rankings.
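The ranking rule described above ("X" loses ground when users leave it for "Y" within a minute or two) can be sketched in a few lines. This is a hypothetical illustration, not Direct Hit's actual algorithm: it assumes click logs already grouped into per-session click sequences, treats a click followed by another click within 120 seconds as a vote against that site, and counts every other click as a vote for it. The function names, the log format, and the 120-second cutoff are all assumptions.

```python
from collections import defaultdict

QUICK_EXIT_SECONDS = 120  # assumed reading of "within a minute or two"


def popularity_scores(sessions, quick_exit=QUICK_EXIT_SECONDS):
    """sessions: list of (query, [(site, click_time_in_seconds), ...]) in click order."""
    scores = defaultdict(lambda: defaultdict(int))
    for query, clicks in sessions:
        for i, (site, clicked_at) in enumerate(clicks):
            next_click = clicks[i + 1] if i + 1 < len(clicks) else None
            if next_click is not None and next_click[1] - clicked_at <= quick_exit:
                scores[query][site] -= 1  # user moved on quickly: count it against this site
            else:
                scores[query][site] += 1  # last click, or the user stayed: count it in favor
    return scores


def rank(scores, query):
    """Sites seen for a query, best-scoring first."""
    return sorted(scores[query], key=lambda site: scores[query][site], reverse=True)


log = [
    ("A", [("X", 0), ("Y", 60)]),
    ("A", [("X", 0), ("Y", 90)]),
    ("A", [("X", 0)]),
]
print(rank(popularity_scores(log), "A"))  # ['Y', 'X']
```

In this toy log, Y ends up ahead of X because two of the three users who tried X quickly left it for Y.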
Disclaimer: I am a founder of a competing search engine concept based on volunteers running distributed software that crawls pages and performs partial analysis and indexing before passing the results to a central server, which merges the data into the main central index. The human aspect here is the people who take part in the project: participants can actually change the ranking formulas, and shortly will be able to assist in human detection of spam, etc.
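The split between volunteer clients and a central server can be illustrated with a toy inverted index. This is a minimal sketch under assumed, hypothetical names, not Majestic-12's actual protocol: each client turns the pages it crawled into a partial word-to-URLs index, and the server folds those partial indexes into the main one.

```python
from collections import defaultdict


def build_partial_index(pages):
    """Client side (hypothetical): pages maps URL -> page text."""
    partial = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            partial[word].add(url)
    return partial


def merge_into_central(central, partial):
    """Server side (hypothetical): fold one client's partial index into the main index."""
    for word, urls in partial.items():
        central.setdefault(word, set()).update(urls)
    return central


# Two volunteers crawl different pages; the server merges their work.
central_index = {}
merge_into_central(central_index, build_partial_index({"a.example": "Distributed search"}))
merge_into_central(central_index, build_partial_index({"b.example": "search engine"}))
print(sorted(central_index["search"]))  # ['a.example', 'b.example']
```

The expensive crawling and tokenizing happen on the volunteers' machines; the server only performs cheap set unions.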
Searching the Web is a very challenging problem (that's why few companies do it): the volume of data is huge, and one only appreciates the value of good algorithms when faced with a situation where poor algorithms make a run last for weeks, fail near the end, and force you to restart and wait another week. You can either try to handle this very big problem, which is very hard even if you have the money (look at Amazon's A9, funded with millions, yet they licensed Google's code and database), or you can try to reduce the problem by focusing on only a handful of "important" pages, which is what Yahoo did when it was a human-edited directory/search engine hybrid.
It seems to me that Mr Wales entertains the illusion that a very small number of manually checked pages in the Web space will be sufficient to satisfy the vast majority of search queries (and it has got to be 98%+, as I won't be hopping from one search engine to another). If this were the case, we would still be using Yahoo, which did pretty much just that, yet almost everyone (including Yahoo) moved to algorithmic search engines, because that is the only way to handle billions of pages, and billions of pages you will have to handle: even if you just index the homepages of all registered domain names, you will be dealing with 100 mln+ pages. That's a good 20 times more than the number of articles in Wikipedia, and checking pages can be far duller than reading a nice article you have some personal interest in.
What I find ironic is that our own search engine concept was removed from Wikipedia because we were supposedly "not notable enough". That's a sign of how they handle the problem of "too much data" in Wikipedia: they just reduce the problem by greatly reducing the dataset. Sometimes this is done wrongly, sometimes rightly, and it might well work for Wikipedia, but it sure as hell won't work for Web-scale search. Oh, and by the way, who said Google and others don't use human reviewers? They sure do; just check TrustRank (this link is ranked as the #1 match on Google for the search "TrustRank"!). Notice what Wikipedia tells us: "While human experts can easily identify spam, it is too expensive to evaluate manually a large number of pages."
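TrustRank, as commonly described, is essentially PageRank with the random jump restricted to a small seed set of pages that human experts have reviewed as trustworthy, so trust flows outward along links from those seeds. The sketch below assumes a link graph given as a dict and a hand-picked, non-empty seed set. The damping and iteration values are illustrative, the page names are made up, and pages with no outgoing links simply leak trust, which keeps the sketch short.

```python
def trust_rank(links, seeds, damping=0.85, iterations=50):
    """links: page -> list of pages it links to; seeds: pages a human reviewer marked as good."""
    pages = set(links) | {target for targets in links.values() for target in targets}
    # The random jump only ever lands on human-reviewed seed pages.
    jump = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(jump)
    for _ in range(iterations):
        new_trust = {p: (1 - damping) * jump[p] for p in pages}
        for page, targets in links.items():
            for target in targets:
                # Each page splits its trust evenly among the pages it links to.
                new_trust[target] += damping * trust[page] / len(targets)
        trust = new_trust
    return trust


graph = {
    "reviewed.example": ["neighbor.example"],
    "neighbor.example": ["reviewed.example"],
    "spam1.example": ["spam2.example"],
    "spam2.example": ["spam1.example", "reviewed.example"],
}
scores = trust_rank(graph, seeds={"reviewed.example"})
```

The two spam pages end up with zero trust even though one of them links to the reviewed page: linking *to* a trusted page earns nothing, only being linked *from* one does. The human effort is limited to judging the seed set, which is what keeps it affordable.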
Human input plays an important (although largely unknown, as they prefer to keep it secret) role in state-of-the-art search engines. However, the suggestion that humans can handle billions of pages, and/or that a handful of pages will be sufficient for a general-purpose search engine, is wrong, and a very backwards move that will result in exactly the kind of wrong attitude present in Wikipedia now.
alexc
Join Majestic-12 Distributed Search Engine
OK, perhaps the new generation of Web entrepreneurs had better learn something from their "elders". We're about to see a lot of concepts that didn't work in the '90s be resurrected and funded. This has been tried and failed, badly:
Magellan, the human-powered search engine
Didn't work before, when there were far fewer sites out there; not likely to work this time, either.
jh
In the summary quote he pretty much announces his intention to sell out at some point. More attention now leads to more money later, either through having a higher-profile name, or through suckering more people into developing his search rankings.
So the guy founded Wikipedia. Good for him. It doesn't mean he walks on water, and the advent of yet another search engine doesn't deserve the front page of Slashdot, especially when you know it's going to get swamped by spammers (or their bots) and quickly become useless.
Google PigeonRank come true... http://www.google.com/technology/pigeonrank.html I /knew/ this had to be a viable system, as long as you can find enough stupid beings to carry out the task ;-)
If Mr. Wales uses some kind of Netflix-like system to guard against spam, we will have a problem. People who have entrenched beliefs will read sources that further those beliefs, further entrenching them. Every day we see how it's become harder to have meaningful debate due to polarization. So far, Europe has remained immune, but if the demagogues gain power it will happen there also. If This Goes On--
Inventions have long since reached their limit, and I see no hope for further development.-- Frontinus, 1st cent. AD
SearchSays.com is a user-based search engine that just recently launched, and has a growing contributor base. Looks like it might be a race, but I guess I'm not surprised it's already been started. There are so few original ideas on the Internet these days. As to the "it's already been done and failed" remarks: timing is everything, isn't it? I could see something like this taking off with the current Web 2.0 craze.
Back in the late '90s, I used to recommend browsing down through the hierarchical categorization of web content that Yahoo set up, to see if there was anything there, as a complement to the many, but often irrelevant, results people got back from AltaVista.
Fast forward to the days of Google and Wikipedia, and you have infinitely better "dumb" search, plus an equally easy-to-use, generally decently accurate, and well-contained treatment of a dizzying array of topics.
So, what's needed to fill the "search for information" gap? I doubt it's an attempt at hierarchical categorization: people don't try to absorb a whole lot of related content at once, they want an answer, and the navigation was a pain even back then. So then, just using people to try to make the dumb search results better? Well, yes, but Google is continuously working to fill that gap itself, and nobody really comes close that I've seen. Even back in the late '90s, those Yahoo directories got stale quickly, both in terms of dead links and missing good newer content. Now, you can automatically test for and prune dead links, but results WILL get dated quickly.
Humans are SLOW, especially when you consider that their number is limited to those you hire if you want to avoid opening yourself up to the kind of spamming Google has to contend with, which they're putting forward as a key differentiator in what they're trying. The web is several orders of magnitude bigger now than it was then, and it didn't work then, so what exactly makes Jimbo think it will work now?
TVTome is an excellent example of why free licensing matters. When a community has free licensing as the social framework to allow for forking, the infrastructure providers are forced to continue to provide good service, to prevent the community from forking and leaving.
Even Dmoz, for which I have great fondness and respect, has been crippled for years by a non-free license that allowed AOL to run it into the dirt. (See the recent 6-week server outage, for which there is simply no excuse.) (The Dmoz license is not the worst possible, mind you, but it is still problematic in a number of important ways.) And their software is totally non-free.
Wikia
And how would he handle shills? Bots? Trolls? OK, Google is struggling badly enough with link farms and the like, but I can't really imagine end users being the answer to this, even with some sort of meta-recommendation system. Personally, I find that either a Wikipedia search (for example, recently I've been answering some Christmas quizzes, which it just excels at) or a very targeted Google search (3-4 search terms) finds 99.9% of what I want, and the rest just isn't there.
Live today, because you never know what tomorrow brings
Google already has this, only they've decided that pigeons are best for the job.
How come advertising is an acceptable business model in polite company?! How come thumping one's chest with no backup data is an acceptable form of communication?! I'm getting REALLY tired of all the Web wannabes who think advertising is a valid business model. Just give me a clean pay-per-use service for once!
Part of me welcomes new methods and new technology where search is concerned. However, Mr Wales's involvement in this arena isn't something I welcome at all.
The only good thing about this is that possibly Wikipedia might be ousted from the primary or secondary page rank for most subjects. That is an authority most highly undeserved, and proof of nothing more than how far we need to go in terms of achieving accurate search.
I think (hope) this is just a piece of self publicity. I doubt they have the technology - judging by the fact that at peak times Wikipedia search shuts down and defaults to Google and Yahoo.
Interesting too that, while Google employs seriously smart people and was founded by seriously smart people, Jimbo and whomever he cobbles together from the smart-search-technologists-who-decided-not-to-work-for-Google-exactly-why? are likely to succeed where they have failed. Sure, brains aren't necessarily everything, but they really do help. I think no small amount of Google's success comes from the size of the brains behind it. It's why they have a competitive advantage in most markets they enter.
We have seen very clearly that Wikipedia is extremely vulnerable to, and tainted with, group-think manipulation (Jimbo's icon, Ayn Rand, being one very tiny example of many). Why would anyone think this search will be in any way different? This looks just as vulnerable, and just as easy to manipulate if you get a group together, which every SEO blackhat on the planet will do on the day of launch. This looks much easier to manipulate than meta tags or PageRank.
I'm sure SEO blackhats and right wing organisations are foaming at the mouth with excitement at this wonderful Christmas announcement.
Why start by creating an entirely separate search engine? I often find myself using Google just to search Wikipedia because I am not sure how to spell the item I am looking for, and the search box on Wikipedia needs an exact phrase. Why not improve that first and then consider expanding?
Saying Java is nice because it works on all OS's is like saying that anal sex is nice because it works on all genders.
Problems:
Also... much of the time Wikipedia's own unstable and limited search tool craps out and proceeds to a malfunction page telling people to do their Wikipedia search on Google or Yahoo instead. Not exactly an encouraging sign for WP's entry into the search engine space...
When all you have is a hammer, everything looks like a nail
My Starcraft 2 Blog
User Driven Search Engine?
Is that anything like YTMND's voting system? Because if it is, Wiki will have its own class of Downvoters.
Isn't http://www.stumbleupon.com/ already something like this?
They have a search functionality as well.
InfraSearch was another early attempt at collaborative search, based on Gnutella. Sun bought it for about $20 million in stock (estimated) then did nothing with it.
Hi,
On a WMF mailing list Angela said that there was no substance to all this. I had also heard from other channels that there is not much to this.
So even though it is nice to speculate, there is not much to all this.
Thanks,
GerardM
Seriously, though, this whole "User-Powered" deal looks like an attempt to play on the image of Wikipedia in front of an ignorant public.
The fact is that many (if not all) search engines use human input to rank search results. For example, Google's PageRank is about links put on pages by whom? Humans, of course.
OK, so you found a new way of extracting rating info from humans? Let's talk about that, but please stop bringing up this "People vs. Computers" nonsense.
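The point that PageRank is built from human decisions is easy to see in a minimal power-iteration sketch: every hyperlink a person adds is a vote that passes on part of the linking page's rank. This is the textbook formulation with an assumed damping factor of 0.85 and a made-up toy graph, not Google's production system.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links: page -> list of pages it links to (each link is a human 'vote')."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                # A page's vote is split evenly among everything it links to.
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]})
print(max(ranks, key=ranks.get))  # c
```

Page "c" wins because three of the four pages chose to link to it, while "d", which nobody links to, stays at the minimum.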
I entered a bunch of my CDs into the music CD database CDDB (now Gracenote), thinking that the database's contents would remain public domain or at least freely usable. Then it was sold and the company that owned it forbade access by media applications without a fee. Thousands of people like me voluntarily built that thing, and now they wanted to sell it back to us. Any organization asking for my help on future projects had better have an ironclad guarantee that my work product will remain free to users.
I actually started a company doing much the same thing back in 2000, probably a bit ahead of its time. We even took out a patent. Unfortunately, investors (actually almost everyone we talked to...) didn't have a clue what we were talking about, so we ran out of money.
Why do I live in such a small country, where nobody has a clue.... sigh...
Anybody got a job working with interesting people that can actually think ????
Before Mr. Wales expands his empire to cash in on search, maybe he ought to invest some development in improving Wikipedia's search feature, which is almost useless, one of the least useful search features I've ever seen. To search Wikipedia, I use Google.
If you want news from today, you have to come back tomorrow.
Moreover, doesn't Google already have the ability to make something like this possible, and, in fact, hasn't it already implemented part of it? On my Google toolbar, when I type in a common search term, it shows how many users searched for the same item. It also suggests popular search terms (listed next to the number of searches) as I type...
It'd have to solve many of Wikipedia's current problems, for one: problems such as accuracy, for which, apparently, the proposer doesn't believe Wikipedia should be trusted. Not to mention filtering out biased users who'd get all their friends to promote irrelevant attachments to search terms, using the engine as a source of free publicity. (And speaking of search and Wikipedia: the existing state of Wikipedia's search is just horrendous. As for Wikipedia's current state on filtering: perfectly good entries get marked for deletion without proper justification, while blatant propaganda goes by unnoticed.)
Moreover, if the system does get implemented -- that's if -- there'd still be at least one or two "incubation years," where users contribute enough to make the search engine useful, i.e., better than PageRank.
Crowdsourcing has its limitations. It takes time for all the people to get there to contribute. And, once they get there, since they're not paid for it, they'd only spend what little free time they have to contribute in-between their 80-hour weeks. Those who are paid for it... the majority of them will probably be paid for spawning biased results. (Imagine companies that spring up claiming they hire thousands to give "good rankings" on Wikisearchia.com.......)
It's a materialistic world, and people are... what they are.
Microsoft, Google, Yahoo, CNET, News Corp... to buy your services, just you wait...
You enter your search criteria, and the wikisearch engine tells you that your choice of topic is Not Notable, and for gods sake why not search for something important.
Maybe the state's highest function is to grind out insoluble problems. (Zelazny, Hall of Mirrors)
Prove me wrong, and you'd start the next .com boom...
Why not create a project that takes the Google search results, and then creates a collaborative layer on top of that? Sort of like Greasemonkey for Google? I know Google will never alter their automated search results - this is where a project like this could come in. It would have all the power of Google, plus all the power of collaborative human input. Right off the bat, it would be at least as powerful as Google - and assuming human input is more beneficial than detrimental, it would only get better over time...
That is what I can never figure out with these community-driven things. All these saps put their time and energy into it, when they know that behind the scenes someone owns it, and for the owner, regardless of anyone's personal investment or emotional attachment, the bottom line (money) is going to call the shots at the end of the day.
I have quite a few old customers who know me personally from my webhosting company, and they still can't understand how I could abandon them and sell the company. Not that I wanted to; it was a financial decision.
It's like the Wikipedia phenomenon: are these people being paid for all the time they put into editing articles and reviewing content? Then possibly a year from now they get bought up by Google and the whole structure of the thing changes, and then what?
People, get a clue... why support some other venture, even if it is well organized and groundbreaking, etc.? Get out there and start your own business or try to develop the next big thing! That's what I did back in 1999.
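One way to picture the proposed collaborative layer: keep the underlying engine's result order as a baseline score and add weighted community votes on top. Everything here is hypothetical (the input format, the `vote_weight` parameter, and the scoring rule); it is not a real Google or Greasemonkey API, just an illustration of the claim that, before anyone votes, the layer returns the engine's order unchanged.

```python
def rerank(engine_results, votes, vote_weight=1.0):
    """engine_results: URLs in the order the underlying engine returned them.
    votes: URL -> net community votes (up-votes minus down-votes)."""
    n = len(engine_results)

    def score(item):
        position, url = item
        baseline = n - position  # the engine's top result gets the highest baseline
        return baseline + vote_weight * votes.get(url, 0)

    # sorted() is stable, so equal scores keep the engine's original order.
    ranked = sorted(enumerate(engine_results), key=score, reverse=True)
    return [url for _, url in ranked]


results = ["spam.example", "good.example", "ok.example"]
print(rerank(results, {}))  # ['spam.example', 'good.example', 'ok.example']
print(rerank(results, {"spam.example": -5, "good.example": 3}))
# ['good.example', 'ok.example', 'spam.example']
```

A larger `vote_weight` lets the community override the engine more aggressively, which is also exactly the lever a coordinated group of spammers would try to pull.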
Nathaniel P. Wilkerson
www.haidacarver.com
Wikipedia was a reasonably inspired idea... this one, of a user-driven search engine, is NOT. As a not-exactly-run-of-the-mill human who is already nearly drowning in the mediocrity and stupidity that surrounds me, I'm frankly terrified at the prospect that AVERAGE people will be entirely responsible for determining what results are returned by my search engine! I want the exceptional results that *I* can comprehend and appreciate, not the moronic ones that appeal to people of average intelligence and abilities. It already takes far too much time to coax those gems out of existing search engines; Wales' idea would make it utterly impossible. The search engine would drown in the same sea of mediocrity by which I'm engulfed.
This idea would actually be a search-engine implementation of the "tyranny of the majority", to go nicely hand-in-hand with spam blacklists ad nauseam. The founders of the United States tried their best to put safeguards in place to prevent said tyranny, and now dear Jimmy Wales wants to thwart that? Apparently Mr. Wales is now more interested in any wild idea that stands a ghost of a chance of trumping his one decent idea than in doing something truly useful or helpful. Useful and helpful are not words that describe user-driven search engine results.
Doesn't DMOZ already do something like this? http://dmoz.org/
After it was merged with TV.com, many former TVTome users cannot reuse their accounts on tv.com, and many of their TVTome contributions are lost on the new site. Furthermore, the new tv.com site is flooded with ads, and a new user (including former TVTome users) has to do LOTS of pointless reviews just to earn enough points to write episode guides.
I just posted this the other day on the Wisdom of Crowds article; here is the link:
What's the next logical step?
Search engines. Google's PageRank algorithm may point to highly rated *websites*, but searches themselves can be rated. Since most queries are fewer than 3 words long, track where all queries of fewer than 3 words go, and rate *those* sites higher. Since humans are doing the searching, they will automatically tend to NOT go to splogs (based on their evaluations of the snippets that Google returns), thus dropping splog ratings while raising the ratings of legitimate sites: this is the very definition of "the wisdom of crowds". Google has the infrastructure to do this -- if they only would.
Added for the 'collaborative layer' thread: here is a link to Yahoo's list of the top 10 web searches for 2006; as you can see, the average length of those searches was only *2* words. That makes it even easier. Three-word queries would, of course, give a deeper (and more accurate) metric.
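The core of this proposal is counting, per site, the clicks that follow short queries. A minimal sketch, assuming a hypothetical log of (query, clicked URL) pairs with made-up example data; `max_words=2` encodes "fewer than 3 words", and raising it to 3 gives the deeper three-word metric.

```python
from collections import Counter


def short_query_clicks(click_log, max_words=2):
    """click_log: iterable of (query, clicked_url) pairs; only short queries are counted."""
    clicks = Counter()
    for query, url in click_log:
        if len(query.split()) <= max_words:
            clicks[url] += 1
    return clicks


log = [
    ("linux kernel", "kernel.example"),
    ("linux kernel", "splog.example"),
    ("linux kernel", "kernel.example"),
    ("how do i compile the linux kernel", "splog.example"),
]
print(short_query_clicks(log).most_common(1))  # [('kernel.example', 2)]
```

The long query is ignored, so the splog's click from it does not count; the resulting counts would then serve as one more ranking signal alongside link-based scores.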
Here is the link for Google's Zeitgeist, but, as you can see, it is not very useful for our purposes, since it just shows top 10 queries for a few subjects. However, it does support the 3-words-or-less theory.
We're thinking in the same direction. Do you know of any open-source search engines so that we could trace queries and their responses and test this theory? (All we need is one more person commenting in this thread and *we're* a crowd.) 8^D
DNA is a Turing machine. You, however, being dynamic and emergent, are not.