Slashdot Mirror


Could Open Source Lead to a Meritocratic Search Engine?

Slashdot contributor Bennett Haselton writes "When Jimmy Wales recently announced the Search Wikia project, an attempt to build an open-source search engine around the user-driven model that gave birth to Wikipedia, he said his goal was to create "the search engine that changes everything", as he underscored in a February 5 talk at New York University. I think it could, although not for the same main reasons that Wales has put forth -- I think that for a search engine to be truly meritocratic would be more of a revolution than for a search engine to be open-source, although both would be large steps forward. Indeed, if a search engine could be built that really returned results in order of average desirability to users, and resisted efforts by companies to "game" the system (even if everyone knew precisely how the ranking algorithm worked), it's hard to overstate how much that would change things both for businesses and consumers. The key question is whether such an algorithm could be created that wouldn't be vulnerable to non-merit-based manipulation. Regardless of what algorithms may be currently under consideration by thinkers within the Wikia company, I want to argue logically for some necessary properties that such an algorithm should have in order to be effective. Because if their search engine becomes popular, they will face such huge efforts from companies trying to manipulate the search results, that it will make Wikipedia vandalism look like a cakewalk." The rest of his essay follows.

This will be a trip into theory-land, so it may be frustrating to users who dislike talk about "vaporware" and want to see how something works in practice. I understand where you're coming from, but I submit it's valuable to raise these questions early. This is in any case not intended to supplant discussion about how things are currently progressing.

First, though, consider the benefits that such a search engine could bring, both to content consumers and content providers, if it really did return results sorted according to average community preferences. Suppose you wanted to find out if you had a knack for publishing recipes online and getting some AdSense revenue on the side. You take a recipe that you know, like apple pie, and check out the current results for "apple pie". There are some pretty straightforward recipes online, but you believe you can create a more complete and user-friendly one. So you write up your own recipe, complete with photographs of the process showing how ingredients should be chopped and what the crust mixture should look like, so that the steps are easier to follow. (Don't you hate it when a recipe says "cut into cubes" and you want to throttle the author and shout, "HOW BIG??" It drove me crazy until I found CookingForEngineers.com.) Anyway, you submit your recipe to the search engine to be included in the results for "apple pie", and if the sorting process is truly meritocratic, your recipe page rises to the top. Until, that is, someone decides to surpass you, and publishes an even more user-friendly recipe, perhaps with a link to a YouTube video of them showing how to make the pie, which they shot with a tripod video camera and a clip-on mike in their well-lit kitchen. In a world of perfect competition, content providers would be constantly leapfrogging each other with better and better content within each category (even a highly specific one like apple pie recipes), until further efforts would no longer pay for themselves with increased traffic revenue. (The more popular search terms, of course, would bring greater rewards for those listed at the top, and would be able to pay for greater efforts to improve the content within that category.) But this constant leapfrogging of better and better content requires efficient and speedy sorting of search results in order to work. 
It doesn't work if the search results can be gamed by someone willing to spend effort and money (not worth it for the author of a single apple pie recipe, but worth it for a big money-making recipe site), and it doesn't work if it's impossible for new entrants to get hits when the established players already dominate search results.

Efficient competition benefits consumers even more for results that are sorted by price (assuming that among comparable goods and services, the community promotes the cheapest-selling ones to the top of the search results, as "most desirable"). If you were a company selling dedicated Web hosting, for example, you would submit your site to the engine to be included in results for "dedicated hosting". If you could demonstrate to the community that your prices and services were superior to your competitors', and if the ranking algorithm really did rank sites according to the preferences of the average user, your site could quickly rise to the top, and you'd make a bundle on new sales -- until, of course, someone else had the same idea and knocked you out of the top spot by lowering their prices or improving their services. The more efficient the marketplace, the faster prices fall and service levels rise, until prices just cover the cost of providing the service and compensating the business owner for their time. It would be a pure buyer's market.

It's important to precisely answer the question: Why would this system be better than a system like Google's search algorithm, which can be "gamed" by enterprising businesses and which doesn't always return the results first that the user would like the most? You might be tempted to answer that in an inefficient marketplace created by an inefficient search result sorting algorithm, a user sometimes ends up paying $79/month for hosting, instead of the $29/month that they might pay if the marketplace were perfectly efficient. But this by itself is not necessarily wasteful. The extra $50 that the user pays is the user's loss, but it's also the hosting company's gain. If we consider costs and benefits across all parties, the two cancel out. The world as a whole is not poorer because someone overpaid for hosting.

The real losses caused by an inefficient search algorithm, are the efforts spent by companies to game the search results (e.g. paying search engine optimization firms to try and get them to the top Google spot), and the reluctance of new players to enter that market if they don't have the resources to play those games. If two companies each spend $5,000 trying to knock each other off of the top spot for a search like "weddings", that's $5,000 worth of effort that gets burned up with no offsetting amount of goods and services added to the world. This is what economists call a deadweight loss, with no corresponding benefit to any party. The two wedding planners might as well have smashed their pastel cars into each other. Even if a single company spends the effort and money to move from position #50 to position #1, that gain to them is offset by the loss to the other 49 companies that each moved down by one position, so the net benefit across all parties is zero, and the effort that the company spent to raise their position would still be a deadweight loss.
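
One way to make the distinction concrete is a simple ledger of gains and losses. The sketch below is only an illustration under assumptions that are not stated above: the SEO effort is valued at exactly the fee paid for it, the two campaigns cancel out so the rankings end up unchanged, and the party labels and the `net_welfare` helper are hypothetical.

```python
def net_welfare(ledger):
    """Add up gains (+) and losses (-) across every party in the ledger."""
    return sum(ledger.values())

# Overpaying for hosting: a pure transfer from customer to host.
overpaid_hosting = {
    "customer": -50,
    "hosting company": +50,
}

# Two wedding planners each pay $5,000 for SEO work that cancels out.
# Assumption: the effort burned is worth exactly the fees paid for it.
seo_fight = {
    "planner A (fee paid)": -5_000,
    "planner B (fee paid)": -5_000,
    "SEO workers (fees received)": +10_000,
    "SEO workers (effort spent, nothing produced)": -10_000,
}

print(net_welfare(overpaid_hosting))  # 0
print(net_welfare(seo_fight))         # -10000
```

Under those assumptions the fees themselves are a transfer, just like the hosting overpayment; what is lost is the effort, which changed nothing that consumers receive.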

On the other hand, if search engine results were sorted according to a true meritocracy, then companies that wanted to raise their rankings would have to spend effort improving their services instead. This is not a deadweight loss, since these efforts result in benefits or savings to the consumer.

I've been a member of several online entrepreneur communities, and I'd conservatively estimate that members spend less than 10% of the time talking about actually improving products and services, and more than 90% of the time talking about how to "game" the various systems that people use to find them, such as search engines and the media. I don't blame them, of course; they're just doing what's best for their company, in the inefficient marketplace that we live in. But I feel almost lethargic thinking of that 90% of effort that gets spent on activities that produce no new goods and services. What if the information marketplace really were efficient, and business owners spent nearly 100% of their efforts improving goods and services, so that every ounce of effort added new value to the world?

Think of how differently we'd approach the problem of creating a new Web site and driving traffic to it. A good programmer with a good idea could literally become an overnight success. If you had more modest goals, you could shoot a video of yourself preparing a recipe or teaching a magic trick, and just throw it out there and watch it bubble its way up the meritocracy to see if it was any good. You wouldn't have to spend any time networking or trying to rig the results, you just create good stuff and put it out there. No, despite whatever cheer-leading you may have heard, it doesn't quite work that way yet -- good online businessmen still talk about the importance of networking, advertising, and all the other components of gaming the system that don't relate to actually improving products and services. But there is no reason, in principle, why a perfectly meritocratic content-sorting engine couldn't be built. Would it revolutionize content on the Internet? And, could Search Wikia be the project to do it, or play a part in it?

Whatever search engine the Wikia company produced, it would probably have such a large following among the built-in open-source and Wikipedia fan base, that traffic wouldn't be a problem -- companies at the top of popular search results would definitely benefit. The question is whether the system can be designed so that it cannot be gamed. I agree with Jimmy Wales's stated intention to make the algorithm completely open, since this makes it easier for helpful third parties to find weaknesses and get them fixed, but of course it also makes it easier for attackers to find those weaknesses and exploit them. If you think Microsoft paying a blogger to edit Wikipedia is a problem, imagine what companies will do to try and manipulate the search results for a term like "mortgage". So what can be done?

The basic problem with any community that makes important decisions by "consensus" is that it can be manipulated by someone who creates multiple phantom accounts all under their control. Then if a decision is influenced by voting -- for example, the relative position of a given site in a list of search results -- then the attacker can have the phantom accounts all vote for one preferred site. You can look for large numbers of accounts created from the same IP address, but the attacker could use Tor and similar systems to appear to be coming from different IPs. You could attempt to verify the unique identity of each account holder, by phone for example, but this requires a lot of effort and would alienate privacy-conscious users. You could require a Turing test for each new account, but all this means is that an attacker couldn't use a script to create their 1,000 accounts -- an attacker could still create the accounts if they had enough time, or if they paid some kid in India to create the accounts. You could give users voting power in proportion to some kind of "karma" that they had built up over time by using the site, but this gives new users little influence and little incentive to participate; it also does nothing to stop influential users from "selling out" their votes (either because they became disillusioned, or because they signed up with that as their intent from the beginning!).

So, any algorithm designed to protect the integrity of the Search Wikia results would have to deal with this type of attack. In a recent article about Citizendium, a proposed Wikipedia alternative, I argued that you could deal with conventional wiki vandalism by having identity-verified experts sign off on the accuracy of an article at different stages. That's practical for a subject like biology, where you could have a group of experts whose collective knowledge covers the subject at the depth expected in an encyclopedia, but probably not for a topic like "dedicated hosting" where the task is to sift through tens of thousands of potential matches and find the best ones to list first. You need a new algorithm to harness the power of the community. I don't know how many possible solutions there are, but here is one way in which it could be done.

Suppose a user submits a requested change to the search results -- the addition of their new Site A, or the proposal that Site A should be ranked higher. This decision could be reviewed by a small subset of registered users, selected at random from the entire user population. If a majority of the users rate the new site highly enough as a relevant result for a particular term, then the site gets a high ranking. If not, then the site is given a low ranking, possibly with feedback being sent to the submitter as to why the site was not rated highly. The key is that the users who vote on the site have to be selected at random from among all users, instead of letting users self-select to vote on a particular decision.

The nice property of this system is that an attacker can't manipulate the voting simply by having a large number of accounts at their control -- they would have to control a significant proportion of accounts across the entire user population, in order to ensure that when the voters were selected randomly from the user population, the attacker controlled enough of those accounts to influence the outcome. (If an attacker ever really did spend the resources to reach that threshold point, and it became apparent that they were manipulating the votes, those votes could be challenged and overridden by a vote of users whose identities were known to the system. This would allow the verified-identity users to be used as an appeal of last resort to block abuse by a very dedicated adversary, while not requiring most users to verify their identity. This is basically what Jimmy Wales does when he steps in and arbitrates a Wikipedia dispute, acting as his own "user whose identity is known".)
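
As a rough sketch of the random-panel idea, assuming a panel is drawn uniformly without replacement and a simple majority decides, the following shows both the selection step and the chance that an attacker's phantom accounts end up as a majority of the panel. The function names, the panel size of 101, and the user counts are illustrative assumptions, not part of any Search Wikia design.

```python
import random
from math import comb

def pick_reviewers(user_ids, panel_size, rng=None):
    """Draw a review panel uniformly at random, without replacement."""
    rng = rng or random.Random()
    return rng.sample(list(user_ids), panel_size)

def capture_probability(total_users, attacker_accounts, panel_size):
    """Chance that attacker accounts form a strict majority of a random panel.

    This is the upper tail of a hypergeometric distribution.
    """
    needed = panel_size // 2 + 1
    total = comb(total_users, panel_size)
    hits = sum(
        comb(attacker_accounts, k)
        * comb(total_users - attacker_accounts, panel_size - k)
        for k in range(needed, min(panel_size, attacker_accounts) + 1)
    )
    return hits / total

# Hypothetical numbers: 100,000 users, 1,000 phantom accounts, panel of 101.
panel = pick_reviewers(range(100_000), 101, random.Random(0))
print(len(panel))
print(capture_probability(100_000, 1_000, 101))
print(capture_probability(100_000, 50_000, 101))
```

With these made-up numbers, 1,000 phantom accounts give the attacker a negligible chance of controlling the panel, while controlling half of all accounts gives them close to even odds, which is the threshold effect described above.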

This algorithm for an "automated meritocracy" (automeritocracy? still not very catchy at 7 syllables) could be extended to other types of user-built content sites as well. Musicians could submit songs to a peer review site, and the songs would be pushed out to a random subset of users interested in that genre, who would then vote on the songs. (If most users were too apathetic to vote, the site could tabulate the number of people who heard the song and then proceeded to buy or download it, and count those as "votes" in favor.) If the votes for the song are high enough, it gets pushed out to all users interested in that genre; if not, then the song doesn't make it past the first stage. If there are 100,000 users subscribed to a particular genre, but it only takes ratings from 100 users to determine whether or not a song is worth pushing out to everybody, that means that when "good" content is sent out to all 100,000 people but "bad" content only wastes the time of 100 users, the average user gets 1,000 pieces of "good" content for every 1 piece of "bad" content. New musicians wouldn't have to spend any time networking, promoting, recruiting friends to vote for them -- all of which have nothing to do with making the music better, and which fall into the category of deadweight losses described above.
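
The 1,000-to-1 figure can be checked with a few lines of arithmetic. This sketch assumes, beyond what is stated above, that good and bad songs are submitted in equal numbers by default and that the panel never misjudges a song; `good_to_bad_ratio` and its `good_share` parameter are hypothetical names used only for illustration.

```python
def good_to_bad_ratio(subscribers, panel_size, good_share=0.5):
    """Expected good songs per bad song heard by an average subscriber.

    good_share is the assumed fraction of submissions that are good.
    Good songs reach every subscriber; bad songs reach only the panel.
    """
    chance_of_hearing_bad = panel_size / subscribers
    expected_good = good_share * 1.0
    expected_bad = (1 - good_share) * chance_of_hearing_bad
    return expected_good / expected_bad

print(round(good_to_bad_ratio(100_000, 100), 6))  # 1000.0
print(round(good_to_bad_ratio(100_000, 100, good_share=0.1), 1))
```

The second call shows how the ratio shrinks if bad submissions outnumber good ones, so the 1,000-to-1 figure is best read as the equal-submission case.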

An automeritocracy-like system could even be used as a spam filter for a large e-mail site. Suppose you want to send your newsletter to 100,000 Hotmail users (who really have signed up to receive it). Hotmail could allow your IP to send mail to 100,000 users the first time, and then if they receive too many spam complaints, block your future mailings as junk mail. But if that's their practice, there's nothing to stop you from moving to a new, unblocked IP and repeating the process from there. So instead, suppose that Hotmail stores your 100,000 received messages temporarily into users' "Junk Mail" folders, but selectively releases a randomly selected subset of 100 messages into users' inboxes. Suppose for arguments' sake that when a message is spam, 20% of users click the "This is spam" button, but if not, then only 1% of users click it. Out of the 100 users who see the message, if the number who click "This is spam" looks close to 1%, then since those 100 users were selected as a representative sample of the whole population, Hotmail concludes that the rest of the 100,000 messages are not spam, and moves them retroactively to users' inboxes. If the percentage of those 100 users who click "This is spam" is closer to 20%, then the rest of the 100,000 messages stay in Junk Mail. A spammer could only rig this system if they controlled a significant proportion of the 100,000 addresses on their list -- not impossible, but difficult, since you have to pass a Turing test to create each new Hotmail account.
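
The probe-and-release rule described above might be sketched as follows. This is not how Hotmail works; it is an illustration that assumes the 20% and 1% complaint rates given above and, as one possible way to decide which rate the sample "looks close to", compares the likelihood of the observed complaint count under each rate. All function and parameter names are hypothetical.

```python
import random
from math import log

def split_mailing(recipients, probe_size=100, rng=None):
    """Pick a random probe group for the inbox; everyone else waits in Junk Mail."""
    rng = rng or random.Random()
    probe = set(rng.sample(recipients, probe_size))
    held = [r for r in recipients if r not in probe]
    return probe, held

def looks_like_spam(complaints, probe_size=100, spam_rate=0.20, ham_rate=0.01):
    """True if the complaint count is more likely under spam_rate than ham_rate.

    Compares binomial log-likelihoods; the binomial coefficient is the
    same on both sides, so it is left out.
    """
    def log_likelihood(rate):
        return complaints * log(rate) + (probe_size - complaints) * log(1 - rate)
    return log_likelihood(spam_rate) > log_likelihood(ham_rate)

recipients = list(range(100_000))
probe, held = split_mailing(recipients, rng=random.Random(0))
print(len(probe), len(held))   # 100 99900
print(looks_like_spam(1))      # False
print(looks_like_spam(20))     # True
```

With these assumed rates the decision flips between 6 and 7 complaints out of 100, so a sender would need to control a meaningful share of the probe group to change the outcome.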

The problem is, there's a huge difference between systems that implement this algorithm, and systems that implement something that looks superficially like this algorithm but actually isn't. Specifically, any site like HotOrNot, Digg, or Gather that lets users decide what to vote on, is vulnerable to the attack of using friends or phantom users to vote yourself up (or to vote someone else down). In a recent thread on Gather about a new contest that relied on peer ratings, many users lamented the fact that it was essentially rigged in favor of people with lots of friends who could give them a high score (or that ratings could be offset unfairly in the other direction by "revenge raters" giving you a 1 as payback for some low rating you gave them). I assume that the reason such sites were designed that way is that it just seemed natural that if your site is driven by user ratings, and if people can see a specific piece of content by visiting a URL, they should have the option on that page to vote on that content. But this unfortunately makes the system vulnerable to the phantom-users attack.

(Spam filters on sites like Hotmail also probably have the same problem. We don't know for sure what happens when the user clicks "This is spam" on a piece of mail, but it's likely that if a high enough percentage of users click "This is spam" for mail coming from a particular IP address, then future mails from that IP are blocked as spam. This means you could get your arch-rival Joe's newsletter blacklisted, by creating multiple accounts, signing them up for Joe's newsletter, and clicking "This is spam" when his newsletters come in. This is an example of the same basic flaw -- letting users choose what they want to vote on.)

So if the Wikia search site uses something like this "automeritocracy" algorithm to guard the integrity of its results, it's imperative not to use an algorithm vulnerable to the hordes-of-phantom-users attack. Some variation of selecting random voters from a large population of users would be one way to handle that.

Finally, there is a reason why it's important to pay attention to getting the algorithm right, rather than hoping that the best algorithm will just naturally "emerge" from the "marketplace of ideas" that results from different wiki-driven search sites competing with each other. The problem is that competition between such sites is itself highly inefficient -- a given user may take a long time to discover which site provides better search results on average, and in any case, it may be that Wiki-Search Site "B" has a better design but Wiki-Search Site "A" had first-mover advantage and got a larger number of registered users. When I wrote earlier about why I thought the Citizendium model was better than Wikipedia, several users pointed out that it may be a moot point, for two main reasons. First, most users will not switch to a better alternative if it never occurs to them. Second, for sites that are powered by a user community, it's very hard for a new competitor to gain ground, even with a superior design, if the success of your community depends on lots of people starting to use it all at once. You could write a better eBay or a better Match.com, but who would use it? Your target market will go to the others because that's where everybody else is. Citizendium is, I think, a special case, since they can fork articles that started life on Wikipedia, so Wikipedia doesn't have as huge of an advantage over them as they would if Citizendium had to start from scratch. But the general rule about imperfect competition still applies.

It's a chicken-and-egg problem: You can have Site A that works as a pure meritocracy, and Site B that works as an almost-meritocracy but can be gamed with some effort. But Site B may still win because the larger environment in which they compete with each other is not itself a meritocracy. So we just have to cross our fingers and hope that Search Wikia gets it right, because if they don't, there's no guarantee that a better alternative will rise to take its place. But if they get it right, I can hardly wait to see what changes it would bring about.

148 comments

  1. I don't think it will beat pigeon ranking... by filesiteguy · · Score: 1, Funny

    Seriously: What could an OSS-based user-submitted search algorithm do that Pigeon Rank - http://www.google.com/technology/pigeonrank.html - couldn't? If a team of highly trained pigeons can build an empire like Google, then I seriously doubt that user-based indexing would work.

    Am I wrong?

    1. Re:I don't think it will beat pigeon ranking... by Anonymous Coward · · Score: 0

      Well, if the users submitting to the OSS search algorithm were monkeys, I think they could take Google. Monkeys are definitely superior to pigeons. Let's not get started about bears.

    2. Re:I don't think it will beat pigeon ranking... by homey+of+my+owney · · Score: 1

      I think so. The problem with Google for me is the crap results I get on my searches, that wind up near the top of the results. This comes from the focus of ad revenue, which is never discussed. Certainly I am not alone when I find that I've wasted time visiting a page that has NOTHING to do with what I'm looking for... but it's got a lot of ads for it!

  2. I'll take the Off-topic hit for this by UbuntuDupe · · Score: 4, Interesting

    I like the essay except for this:

    "The real losses caused by an inefficient search algorithm, are the efforts spent by companies to game the search results (e.g. paying search engine optimization firms to try and get them to the top Google spot), and the reluctance of new players to enter that market if they don't have the resources to play those games. If two companies each spend $5,000 trying to knock each other off of the top spot for a search like "weddings", that's $5,000 worth of effort that gets burned up with no offsetting amount of goods and services added to the world. This is what economists call a deadweight loss, with no corresponding benefit to any party."


    This issue has long bugged me and it's hard to get answers about it. I don't understand how this is a deadweight loss (DWL) by his definition. Who got the $5000 worth of effort from each of them that they spent? That was the corresponding benefit to another party. How is this DWL different from the "non-DWL" example directly preceding, in which someone overpaid for hosting, but that was the hosting company's gain?

    Does anyone have a rigorous DWL definition that can be backed up by a valid example?

    1. Re:I'll take the Off-topic hit for this by nine-times · · Score: 1

      Who got the $5000 worth of effort from each of them that they spent? That was the corresponding benefit to another party.

      The SEO expert? I don't really know about deadweight loss, but it does seem that nothing was gained by the exercise that was described, except somebody got to leech money off of the companies paying for SEO.

    2. Re:I'll take the Off-topic hit for this by maxume · · Score: 2, Insightful

      The company that loses put money into the advertising system; that money can very likely be re-purposed for other keywords or whatever. The time their employees spent gaming the system (to no benefit for the company) could have been spent on activities that were beneficial to the company. The employee doesn't care, but the employer would have been better off sending him for donuts or whatever.

      The amounts seem unlikely (a month of employee time with no realized benefit? bah.), but the concept is sound.

      --
      Nerd rage is the funniest rage.

    3. Re:I'll take the Off-topic hit for this by pkulak · · Score: 3, Informative

      Because the first example is equivalent to someone just handing the hosting company 50 bucks a month as a free gift. Money is exchanged, but nothing happens. In the second example, money is exchanged AND people work very hard for a long time to earn it and yet produce nothing. It would be like me paying you to dig a hole and then fill it in. The time you spend doing that is time you can't spend curing cancer.

    4. Re:I'll take the Off-topic hit for this by Pentagram · · Score: 1

      Who got the $5000 worth of effort from each of them that they spent? That was the corresponding benefit to another party.

      Yes, but it's just a transfer of money from one party to another; it's a zero-sum game. No wealth has been produced in the sense of some useful work being done. With respect to the hosting company example, the hosting company received the market price for a useful service, a positive benefit to both parties. (As far as I can see, the company did not overpay for hosting in the example).

      That seems to be the theory, anyway.

    5. Re:I'll take the Off-topic hit for this by Anonymous Coward · · Score: 0

      I've taken a few college econ classes, and my understanding is that in strict economics terms dead weight loss is the reduction in overall utility caused by any transaction that is not at the efficient price level or the efficient quantity level. Going by that, the example of overpaying for hosting will indeed cause a deadweight loss. The paying $5000 (let's just say it is all a direct bribe to the search engine) may be the efficient price for that transaction since it is agreed upon by both parties and the amount is probably driven pretty close to some market value since every big site is trying to bribe the search engine. However, it still creates an inefficient market because it constitutes what economists call 'rent-seeking behavior', which you can probably look up on Wikipedia and get a clearer definition than I can give you, but it is basically spending money or effort to increase one's wealth without actually doing anything. The classic examples tend to be forms of government corruption, but I think certain subsets of advertising are also used as basic examples. I hope that helps a little.

    6. Re:I'll take the Off-topic hit for this by UbuntuDupe · · Score: 1

      Well, no, that's not the theory, hence the problem. The definition of the DWL given in the essay (and in treatments of the topic) is a loss "with no corresponding benefit for another party". Whoever got the $5000 benefited; hence it cannot be a DWL. The loss of the search-engine-gamers was the gain of whoever they paid. It doesn't matter if wealth/useful-work has been produced or hasn't. Even in a zero-sum transfer, someone benefits. For it to be a true DWL (by the definition), it must be that no one benefits.

    7. Re:I'll take the Off-topic hit for this by geoffspear · · Score: 2, Funny

      Anyone who's going to take your money to spend all day digging holes and filling them in is probably unlikely to come up with a cure for cancer regardless of how much you pay them to do research.

      --
      Don't blame me; I'm never given mod points.

    8. Re:I'll take the Off-topic hit for this by Anonymous Coward · · Score: 0

      The time you spend doing that is time you can't spend curing cancer.

      Why are you posting on Slashdot when you could be curing cancer?

    9. Re:I'll take the Off-topic hit for this by UbuntuDupe · · Score: 1

      To be honest, AC, it doesn't help.

      in strict economics terms dead weight loss is the reduction in overall utility caused by any transaction that is not at the efficient price level or the efficient quantity level.

      Okay, but what does that *mean*? The problem here is that the jargon is obscuring understanding of the concept of a DWL. What does it mean for one price or quantity level to be efficient? I think when you unravel the terms, you see it's basically circular. Try if you disagree.

      The paying $5000 (let's just say it is all a direct bribe to the search engine) may be the efficient price for that transaction since it is agreed upon by both parties and the amount is probably driven pretty close to some market value since every big site is trying to bribe the search engine. However, it still creates an inefficient market because it constitutes what economists call 'rent-seeking behavior'

      Okay, but now you're justifying its classification as a DWL on different grounds than the original author proposed.

      which you can probably look up on wikipedia and get a clearer definition than I can give you,

      No, I understand what rent-seeking is, and how it's wasteful. My point is just that it can't be attacked as being a DWL, because someone certainly does benefit -- the rent-seekers. Yes, there are other (very good) reasons that explain why it's bad, but when you start to appeal to the concept of a Paretian improvement/worsening, which is what the DWL does, you reach a contradiction and can't critique it on those grounds.

    10. Re:I'll take the Off-topic hit for this by jonbryce · · Score: 1

      The Search Engine Optimisation expert gets the money. One of the things he will likely do is make the site compliant with the W3C's accessibility guidelines, as this will likely improve search ranking. That does benefit society as a whole. But other techniques such as url cloaking and keyword stuffing do not benefit society as a whole, so having scarce resources devoted to these tasks is suboptimal as far as the economy is concerned.

    11. Re:I'll take the Off-topic hit for this by Anonymous Coward · · Score: 1, Funny

      A dead weight loss is when idiots waste time arguing on the internet about badly-written, poorly thought-out essays that no one is going to read instead of engaging in more productive activities like downloading pornographic materials or playing World of Warcraft.

      HTH

    12. Re:I'll take the Off-topic hit for this by Anonymous Coward · · Score: 1, Interesting

      How about the constant accountant/tax battle? People who try and avoid paying taxes are pitted against Tax Offices around the world.

    13. Re:I'll take the Off-topic hit for this by dovf · · Score: 1

      I don't know anything about economics, but it seems to me that the real DWL is the fact that these companies have invested the $5000 in something which produced nothing, rather than having invested it in whatever it is they do, and thus actually producing something. So you have one situation in which $10000 were spent, and you have something to show for it, and another situation in which the same $10000 were spent, but there's nothing to show for it. So the real DWL is the "something vs. nothing", not the $10000, which are spent in either situation.

    14. Re:I'll take the Off-topic hit for this by UbuntuDupe · · Score: 1

      So the real DWL is the "something vs. nothing", not the $10000, which are spent in either situation.

      I agree you can use a more rigorous conception of the DWL, but like with the other responders, that wasn't the definition the original author used. In that definition, what makes it a DWL was that there was a loss *not corresponding to any gain*. While the *net* gain (across all people) may be zero, or even negative, the people they paid to (futilely) improve the search engine ranking certainly did gain a benefit that corresponded to their loss. Hence, my confusion about the common use of the DWL term.

    15. Re:I'll take the Off-topic hit for this by inviolet · · Score: 1

      This issue has long bugged me and it's hard to get answers about it. I don't understand how this is a deadweight loss (DWL) by his definition. Who got the $5000 worth of effort from each of them that they spent? That was the corresponding benefit to another party. How is this DWL different from the "non-DWL" example directly preceding, in which someone overpaid for hosting, but that was the hosting company's gain?

      The search-engine received all the benefits of the efforts, but those benefits cancelled each other out.

      And even if they didn't cancel, the 'benefit' is not useful to the search-engine, because it only amounted to a minor change in search-results ranking. The market itself (i.e. the users who are searching) would benefit by that change in ranking... but only if the change caused an objectively better company to win a higher slot.

      In the case of (your example) wedding consultants, there are probably no significant differences in product quality among the major players. Therefore, there is no benefit to the world if Wedding Consultants #228 takes a larger share of the market than Wedding Consultants #854. Thus we arrive at the general and disturbing conclusion that marketing efforts consume a lot of wealth but often (usually?) create no net wealth in return. A marketing effort usually just steers consumers in a slightly different but meaninglessly equivalent direction.

      I'd hate myself if I worked in marketing. Of course it's a different story if the product you're pushing is one of the rare objectively superior products... but those don't seem to come up very often nowadays.

      --
      FATMOUSE + YOU = FATMOUSE
    16. Re:I'll take the Off-topic hit for this by UbuntuDupe · · Score: 1

      The search-engine received all the benefits of the efforts, but those benefits cancelled each other out.

      No, the SE didn't benefit (or at least not primarily). Rather, the workers they paid to game it, benefited. Thus it can't be a DWL by the definition -- the loss of some corresponded to the gain of those workers. It doesn't matter that there was a net loss after summing over all agents; that's not what DWL refers to. Hence my confusion with the concept.

    17. Re:I'll take the Off-topic hit for this by Elvis+Parsley · · Score: 4, Funny

      'cause somebody's paying him $5000 to.

    18. Re:I'll take the Off-topic hit for this by localman · · Score: 1

      Thanks... I think that's the best answer anyone came up with.

      Cheers.

    19. Re:I'll take the Off-topic hit for this by inviolet · · Score: 1

      No, the SE didn't benefit (or at least not primarily). Rather, the workers they paid to game it, benefited. Thus it can't be a DWL by the definition -- the loss of some corresponded to the gain of those workers. It doesn't matter that there was a net loss after summing over all agents; that's not what DWL refers to. Hence my confusion with the concept.

      Total social wealth is decreased when people employ others to perform useless tasks, such as battling over a search-engine slot. The SEO industry, like the realm of marketing in which it resides, is one millimeter above being a zero-sum game. And yet the people involved all consume a great deal of wealth in going through their contortions. That makes it a DWL... although that may or may not be how the term got used in the original post.

      Now I'm confused too.

      --
      FATMOUSE + YOU = FATMOUSE
    20. Re:I'll take the Off-topic hit for this by Anonymous Coward · · Score: 0

      How about the ultimate in dead weight loss.

      (1) Buy bullets
      (2) Shoot them at Iraqis

      You lose money on the bullets, and the Iraqis lose lives. The people you bought the bullets from could have been using that metal to build things that don't just get thrown away.

      Every gun that is made, every warship launched, every rocket fired signifies in the final sense, a theft from those who hunger and are not fed, those who are cold and are not clothed. This world in arms is not spending money alone. It is spending the sweat of its laborers, the genius of its scientists, the hopes of its children. [...] This is not a way of life at all in any true sense. Under the clouds of war, it is humanity hanging on a cross of iron.
      -- Dwight Eisenhower, April 16, 1953

    21. Re:I'll take the Off-topic hit for this by shmlco · · Score: 1

      The example is bad because it's defining the wrong outcome. If both spend five grand to have the "top spot", and the SEO can actually effect this type of outcome, then one of them will actually have the top spot and one the second. So using his "logic" the first one "won" and the second one "lost", so the second one's money was wasted, and a good portion of the first one's money was spent simply competing with number two.

      However, since there are other results in the search, the net result of "winning" and "losing" is that you're first and second in a list of a million or so entries. Hardly seems like a total waste to me.

      From my perspective, however, it would be better to skip the SEO types altogether, and avoid at all costs the SEO that "guarantees" you first place, as that's a guarantee that can't be made. Especially when most of them are making the same guarantee.

      --
      Any sect, cult, or religion will legislate its creed into law if it acquires the political power to do so.
    22. Re:I'll take the Off-topic hit for this by OldManAndTheC++ · · Score: 1

      The problem with "deadweight loss" is that it assumes that there is some sort of value to economic activity, and that some activities have more value (or "benefit") than others. So if I plant some seeds in the ground, and later sell the harvested grain on the open market, my planting activity is considered to be beneficial. Whereas if I spit those seeds at my bothersome neighbor, who has been ripping up my turnips in the middle of the night because I scoffed at his explanation of economic theory, I (and my neighbor) are looked down on as mere impediments.

      But who is to say what is beneficial, and what isn't? Why is using a semi to haul a million doses of badly needed vaccine to avert a flu epidemic considered virtuous, while feeding that same semi to Draco, the Dragonator in front of 30,000 screaming monster truck fans is thought of as wasteful?

      Ultimately, all human activity is pointless, so there is no reason to see gaming a search engine as any more or less important than, say, doing cancer research.

      --
      Soylent Green is peoplicious!
    23. Re:I'll take the Off-topic hit for this by dave1g · · Score: 1

      That's only a fair analogy if there was a limit on the number of semis available for transporting vaccine and one of them was used for a monster truck show instead.

      In that case I'm certain if you asked the attendees if they would be willing to take their money back so that the semi could go save lives you would get a near 100% approval.

    24. Re:I'll take the Off-topic hit for this by OldManAndTheC++ · · Score: 1

      I'm certain if you asked...(etc)

      Certainly, the first time you ask the attendees, they will agree wholeheartedly. And perhaps even the second or third times. But in a purely utilitarian world, there could never be any monster truck rallies until everyone had their vaccine, and all the little boys who had fallen down wells had been dug out, and all the lonely puppies and kittens in the animal shelters got to go to nice homes. After four or five of these episodes of re-routing semis, people would start to wonder whether they would ever get to see a monster truck rally again.

      I guess that was my point: no one truly wants to live in a world like that.

      --
      Soylent Green is peoplicious!
    25. Re:I'll take the Off-topic hit for this by Bastard+of+Subhumani · · Score: 1

      True, but think about the guy paying him. He's no longer busy digging and filling, so he's got the spare time to find a cure. It's the law of comparative advantage. Or something.

      --
      Only three things are certain; death, taxes, and apocryphal quotations - Ben Franklin.
    26. Re:I'll take the Off-topic hit for this by Pentagram · · Score: 1

      Whoever got the $5000 benefited; hence it cannot be a DWL. The loss of the search-engine-gamers was the gain of whoever they paid. It doesn't matter if wealth/useful-work has been produced or hasn't.

      I think the dead loss is the time and effort expended by the workers of the SEO company and the administration of the paying company. All that work for a *net* benefit of no wealth.

    27. Re:I'll take the Off-topic hit for this by morie · · Score: 1

      Who? and will he be paying me too?

      --
      Sig (appended to the end of comments I post, 54 chars)
    28. Re:I'll take the Off-topic hit for this by Bastard+of+Subhumani · · Score: 1

      No, because your job is digging holes and then filling them in again.

      --
      Only three things are certain; death, taxes, and apocryphal quotations - Ben Franklin.
    29. Re:I'll take the Off-topic hit for this by Half-pint+HAL · · Score: 1

      I'm kind of with you on this.

      That's not to say I really care about the finer points of definition of DWL, but I'm baffled by the author's purpose in discussing this.

      The article suggests that a dead-weight-loss is bad, because it's money for labour which is in theoretical terms valueless. Fine -- I understand this. However, a non-dead-weight-loss is not a bad thing in the author's eyes. But if that non-DWL is pure profit (as with the hosting in his example), how is it any better?

      At the risk of sounding socialist, the financial costs incurred by unnecessary labour are generally distributed to a team of workers, whereas profits are concentrated on a much smaller set of shareholders.

      HAL.

      --
      Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
    30. Re:I'll take the Off-topic hit for this by UbuntuDupe · · Score: 1

      I understand there's no *net* wealth generated, but that's not what DWL refers to. For it to be a DWL, it must be the case that *no one* benefited. For example, stealing $X from me and giving it to you would have no net benefit, but it would not be called a DWL since my loss was your gain. In the example given in the original essay, the loss of the search engine manipulators and search engine users was the gain of the workers, who benefited from the money they got. That's a non-DWL in the same sense that the preceding "overpaying" example was a non-DWL.

      In short, the flaw in trying to call this (or anything, IMHO) a DWL is that you have to ignore the person who was paid as a result of the futile competition.

    31. Re:I'll take the Off-topic hit for this by Pentagram · · Score: 1

      In short, the flaw in trying to call this (or anything, IMHO) a DWL is that you have to ignore the person who was paid as a result of the futile competition.

      As you define it, I agree. I can't think of anything that would count as a DWL (except maybe literally throwing money away -- but then I suppose you've increased the value of everyone else's cash haven't you?)

      Anyway, Wales's point makes sense, even if his definition of a DWL doesn't agree with the literature (I've no idea if it does or not) and there is certainly a difference between his two examples (which was the point I thought you were disagreeing with).

      Assuming a closed system (let's call it society):

      Scenario 1 is that company A pays company B $5,000 for web hosting. The net cash value change in society stays the same. Society gains the hosting of a website but loses the work done by the workers of company B to set up and maintain the hosting system.

      Scenario 2 is that company A pays company B $5,000 for SEO. The net cash value change in society stays the same. Society loses the work done by the workers of company B to research and game the search engine, but gains nothing. This is a DWL as I understand Wales's interpretation.

    32. Re:I'll take the Off-topic hit for this by UbuntuDupe · · Score: 1

      First, I don't think it was Jimmy Wales that made the DWL point, but the contributor who quoted him. Otherwise, we're in agreement. I understand how there can be an efficiency loss as a result of the effort expended with no net benefit. But it can't be explained through the mechanism of a deadweight loss, which I consider theoretically unsound. If there were something that truly benefited no one (*as they perceive it*) and had a high cost, it would not be done to begin with, rendering the point moot.

      Small aside: I've always seen DWL as economists' sly way of saying "Oh, we're not making value judgments, no sir, we're just saying that some people were made worse off, while no one was made better off. You wouldn't support that ... would you?" But I think to advocate any useful efficiency gain, you have to say, "yes, this person is worse off -- he'd like to get the money from being paid to game a search engine -- but his small gain simply isn't justified by the tremendous losses incurred."

  3. Google by Anonymous Coward · · Score: 1, Insightful

    This is what Google already does - using linking as a proxy for the average desirability others have to see the content at the link's end. As with all systems, it can be gamed. But it sure does a good job of returning results. It is so good, in fact, Google has not had to update its search syntax available to the general public in order to stay ahead of the competition. I wish Google would. Maybe someone else coming up with another way to have a meritocratic search engine will be the impetus for Google to improve this aspect. But do not pretend that Google does not already more or less do what is desired in providing results for a given search term. Google is meritocratic, and in probably the most neutral way possible, with its search results.

    1. Re:Google by GoCanes · · Score: 1

      Actually, Google is highly regressive in how it displays search results. Companies that can afford SEO tricks, renting links from other high PR sites, hiring staff to write useless content on blogs with links, etc will get the best results. The small company with a better mousetrap gets very little attention from Google. The chicken-and-egg problem will exist in the meritocracy too --- there's no way to rise unless people can find you.

    2. Re:Google by Anonymous Coward · · Score: 0

      Your argument seems to be since others can game the search result system, the system itself is regressive. This does not really make sense unless you are suggesting a system can be developed that cannot be gamed.

      A small company with a better product will get higher ranking from Google if other websites on the Internet have come to the same conclusion and linked to it. There is no better way to measure what people think is worth seeing on the internet than what those people have chosen to create links for on the internet, at least at this time. Come up with a better way and you too can be a billionaire.

  4. Google is more than just software. by Anonymous Coward · · Score: 0

    Even with an open source algorithm that was the 'best ever', the internet is big, and you're going to need huge amounts of computer power to compete with Google.

  5. Get Back To Me On This One by mpapet · · Score: 1

    Wikia search site uses something like this "automeritocracy" algorithm to guard the integrity of its results, it's imperative not to use an algorithm vulnerable to the hordes-of-phantom-users attack

    That right there is a billion-dollar idea that I'm sure more than a small horde of devs are working on for themselves or for vulture capitalists.

    Will Mr. Wales own the magic algorithm to use as he sees fit or what?

    --
    http://www.maxineudall.com/2010/02/should-economists-be-sued-for-malpractice.html
  6. Gaming Google by Bluesman · · Score: 1

    All you have to do to substantially reduce "gaming" the system is to not make it worthwhile.

    Since you can pay Google to have your site link placed right at the top of the search results, for less than what you'd pay someone to game the system to reach a similar position, it wouldn't make sense for large companies to try to "game" Google at all.

    If it weren't for the advertising, we'd probably see a lot more of this on Google.

    Maybe this project could implement something similar.

    --
    If moderation could change anything, it would be illegal.
    1. Re:Gaming Google by Anonymous Coward · · Score: 0

      Maybe this project could implement something similar.

      Err, wouldn't that defeat the entire purpose of what he's proposing? The point is to get "better" search results. If you allow others to somehow "buy" their place on the list, then what have you accomplished vs what is already there now?

    2. Re:Gaming Google by Bluesman · · Score: 1

      Google separates them by color, so it's easy to tell an ad vs. the real result.

      This project could do something similar.

      --
      If moderation could change anything, it would be illegal.
    3. Re:Gaming Google by Andrewkov · · Score: 1

      I don't think it's so simple. Google seems to be very commercialized. If you are trying to find reviews about a product, Google's top results always point at retailers selling the product. I find that other search engines will point you towards review sites and other third party info, rather than to marketing sites.

    4. Re:Gaming Google by skoaldipper · · Score: 1

      > Google seems to be very commercialized. [...] Google's top results always point at retailers selling the product.

      I agree. I get about 50/50 commercial to actual content in my results these days. I long for the good ol' days of non-commercial Google lore. As soon as I find a viable contender, I'm hopping off this Titanic at first sight of a lifeboat.

      And from the lengthy but insightful article, he talks about cost savings to the consumer and not having to compete with industries spending 5 thousand on optimizations. However, he uses an apple pie recipe as an example showing how consumers can increase their page rank (among other factors) by linking to YouTube or putting more resources (in general) into their page. He thinks this levels the playing field for us consumers. However, companies will just funnel that 5 thousand from optimization engines into those phantom users he talks about instead. And those phantom users become that new optimization engine. How many of us have that type of disposable income to do the same, albeit in time or money?

      No. I was quite fascinated and hopeful before I first visited that wikia link from an earlier /. article. But when I saw the google ads to the right, it was like actually looking at the expiration date on a rank package of meat when you get home. The promise of wikipedia (to me anyways) is the ad free user content. I sense no financial motivation behind that content; only personal, which can be easily checked by such a meritocracy in place now. Remove any financial tentacles from this grand search engine, then I'll jump to that lifeboat.

      --
      I hope, when they die, cartoon characters have to answer for their sins.
  7. Merit is in the eye of the beholder by SirGarlon · · Score: 5, Insightful

    I seriously doubt this will turn into anything useful because it relies on a collective definition of "merit." When you and I search for information on the same topic, your needs and my needs may be totally different (I may be looking for a little bit of general background and you may be looking to compare and contrast the opinions of two recognized experts in the field). Even if all the hurdles against manipulation can be overcome, I don't see how "merit" rankings will amount to anything more than a popularity contest.

    --
    [Sir Garlon] is the marvellest knight that is now living, for he destroyeth many good knights, for he goeth invisible.
    1. Re:Merit is in the eye of the beholder by nine-times · · Score: 3, Insightful

      In fairness, I don't think that "merit" is relative with respect to search-engine results. In a simplified example, if I search for "sony", I'm probably looking for one of three things:

      1. The Sony website
      2. A website that sells Sony products
      3. A website that gives reviews of Sony products

      Therefore, the top results should reflect that. Most likely, I'm not looking for porn. I remember the days where search engines would return porn for any and all searches. The fact that Google was able to avoid this is part of what brought about its rise to power.

      Of course, not every example is so simple, but clearly there are results that are or are not correct for a given search.

    2. Re:Merit is in the eye of the beholder by timeOday · · Score: 2, Interesting

      I seriously doubt this will turn into anything useful because it relies on a collective definition of "merit."


      Good point. But furthermore, I can guarantee you this won't work, simply because web page rankings and spam filtering are essentially the same thing, and the spam issue has not been solved. That is, even when we don't have the problem of multiple conflicting opinions and all we're trying to do is model the preferences of a single recipient, we still can't do it!
  8. Patents by jimwelch · · Score: 1

    Let's see, when does Google Patents run out?

    --
    Never trust a man wearing a coat and tie!
  9. Re:Hrmmm by fyngyrz · · Score: 2, Insightful

    Won't work. Here's why, in a nutshell: There are huge numbers of sites on the net. There are not huge numbers of sets of people who will be willing to compare sites for relative merit (and there probably aren't even large numbers of such sets who can do so, even if you paid them for the results, which would be a huge cost that would not repay for most types of sites.)

    Sorry. Only computers can handle a task like this. It is automation or failure.

    --
    I've fallen off your lawn, and I can't get up.
  10. If they get this right... by Panaqqa · · Score: 1

    ...then I think the benefits could be tremendous, but whenever I hear the term "meritocracy" or its derivatives, I start to get skeptical and/or nervous. One person's eyesore of a website could be someone else's lovingly tended but badly coded page that is popular with all their friends. Also, by definition, those who are willing to spend time in a "modified wiki" project such as this will likely be more technically oriented and likely have a bias against poor design and/or poor coding. Bear in mind that of the first million or so webpages out there, by far the majority were put together by "power users" who self-taught HTML - and coded without any form of compliance with any W3C standard as it existed at that time.

  11. Economic inefficiencies by atomic777 · · Score: 2, Informative
    I especially like his point on the economic inefficiencies that result from Google's vulnerability to results manipulation or 'tweaking'. In a certain unnamed, small internet company I worked for, fully 10% of our staff were SEM/SEO people, and a good chunk of our development time was spent on projects led by them trying to optimize our page rankings. I'm sure we're not the only ones.


    If a theoretical "merit-based" search engine existed, those non-trivial resources would be spent building a better mousetrap, making our site faster, etc. I hope such an engine exists some day...

    1. Re:Economic inefficiencies by knewter · · Score: 1

      I think, realistically, if a merit-based search engine existed then the same number of resources would be spent trying and failing. Did you have marked improvement for the effort spent on SEO?

      --
      -knewter
  12. Meritocracy = aristocracy with genetics for wealth by Anonymous Coward · · Score: 0

    Meritocracy is aristocracy with genetics substituted for wealth. In both cases, it's about who your daddy was - just a different part of him. Such concepts are valuable neither in building good societies nor creating effective search engines.

  13. StumbleUpon by EricBoyd · · Score: 3, Informative

    It's not a "search engine" per-say but a lot of your talk of "automated meritocracy" sounds exactly like what StumbleUpon does in order to recommend content to users. People vote on a page, those votes are passed through an automated collaborative filtering system, and then the page is shown to more users who are predicted to like it, rinse lather and repeat. Good content rises to the top of the recommendation queue, so that new users (or people who just joined a category) are shown the things which the vast majority of people liked, in order to build up a rating history to personalize that person's recommendations.

    --
    augment your senses: http://sensebridge.net/
    1. Re:StumbleUpon by truthsearch · · Score: 1

      With the data they have they can probably build a very personalized search engine. With everything at the top having very positive votes from many users I imagine it would be less susceptible to gaming.

    2. Re:StumbleUpon by MajinBlayze · · Score: 1

      You, and many others, are missing the point and failed even to RTFS. Long as it may be, it is worth reading if you believe that this is an extension of Google's PageRank, or of any other voting site out there. The primary concept is this: users cannot select what to vote for, which prevents a subset of users from overwhelming the voting. Instead, random users are chosen (like meta-moderating on Slashdot) to vote yes or no. For SEO to be possible, the SEO company would have to own a majority of the entire user base. The largest flaws, however, lie in two problems: there would be a constant reshuffling of results based on new votes, and it requires users to care about the result of their voting.
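
A minimal sketch of the random-panel idea described in the comment above, not the essay's actual algorithm. The names `pick_panel`, `panel_approves`, and `capture_probability` are hypothetical, and the strict-majority rule and panel sizes are assumptions made only for illustration. The last function estimates how often accounts controlled by one party would form a majority of a randomly drawn panel.

```python
import random
from math import comb


def pick_panel(user_ids, panel_size, rng=random):
    """Draw a random panel of voters; users cannot volunteer for a page."""
    return rng.sample(list(user_ids), panel_size)


def panel_approves(votes):
    """Return True when strictly more than half of the panel voted yes."""
    return sum(votes) * 2 > len(votes)


def capture_probability(population, attackers, panel_size):
    """Chance that attacker-controlled accounts form a strict majority of a
    randomly drawn panel (hypergeometric tail, sampling without replacement)."""
    needed = panel_size // 2 + 1
    total = comb(population, panel_size)
    hits = sum(
        comb(attackers, k) * comb(population - attackers, panel_size - k)
        for k in range(needed, panel_size + 1)
    )
    return hits / total
```

Under these assumptions, a party holding 10% of 10,000 accounts would almost never win a 25-person panel, which is the intuition behind the claim that an attacker needs something close to a majority of the user base.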

      --
      "Hate is baggage. Life's too short to be pissed off all the time." Danny Vinyard -American History X
    3. Re:StumbleUpon by Anonymous Coward · · Score: 0

      It's not a "search engine" per-say

      Nitpick: you mean per se. It's Latin, pronounced just like "per-say" so I suppose you've usually heard the phrase spoken rather than written. I don't mean to be an asshole about it, I just thought you'd like to know the proper spelling of the phrase you're using.
  14. Two approaches to the search problem by currivan · · Score: 2, Interesting

    There are two main directions where search can improve. One is better understanding of natural language, to disambiguate query terms and provide results where the wording on pages is different from the wording of the query.

    The other, which this approach can address, is to improve the term relevance scores and overall page quality metrics that mainstream search engines are based on. Google had its initial success because of two features of this type: one was Page Rank, a measure of overall topic-independent site popularity, and two, better use of anchor text, the words people write when linking to other pages.

    In both cases, they mined the link structure of the web, which was essentially aggregate community generated information about site quality that wasn't being spammed at the time. As they succeeded, regular people put less effort into writing their own link text, and spammers took over.

    The next source of this type of community generated content will probably be something incidental instead of deliberately created. If you build a central repository of reviews of web sites, you both make it easy for people to game your results, and you open yourself up to lawsuits from interested parties.

    However, untapped information already exists on what people find useful on the web in the form of their browsing histories, a special case of this being their bookmarks. Someone who could aggregate this information on what millions of people ended up looking at after they ran a particular search query would be in an excellent position to improve the traditional search engine scoring algorithm beyond link data.

    1. Re:Two approaches to the search problem by russellh · · Score: 2, Interesting

      There are two main directions where search can improve. One is better understanding of natural language, to disambiguate query terms and provide results where the wording on pages is different from the wording of the query.
      I'm highly skeptical about this path because NL works best in a specified (narrow) context. So if you can specify the context, then you must have already put web pages into context - driven by what? the semantic web? If you've done that, then NL is almost redundant. Like, maybe I want to search for "reaction" in the context of "chemistry" but not medicine or politics. The ability to say this particular bit of information on the web is 30% political and 70% chemistry related is the kind of thing you want to get to and where NL and AI are ultimately useful for search; contextualizing information when it is gathered (or ideally, created), not during the search process.

      The other, which this approach can address, is to improve the term relevance scores and overall page quality metrics that mainstream search engines are based on. Google had its initial success because of two features of this type: one was Page Rank, a measure of overall topic-independent site popularity, and two, better use of anchor text, the words people write when linking to other pages.
      I hope that for search, "pages" go away and what we have is more structured information. I would say that there is a lot of work being done here, but it seems to be gigantic committee style work... maybe an open source search engine can provide a straightforward way or standard for information providers to readily adopt that will really move it along instead of simply trying to adapt to all the crud that is already out there. The engine would have a model of the information universe, like a weighted tag graph, which is editable in a wiki style, to which information authors could link in their pages, like HTML meta tags. That's a fairly simple and easy scheme to start with.
      --
      must... stay... awake...
    2. Re:Two approaches to the search problem by SatanicPuppy · · Score: 1

      That will never work. Understanding natural language is hugely difficult for people, and mind-bogglingly difficult for computers. You have to account for the fact that meaning is contextual, meaning is not fixed, and that people make mistakes in their use of language.

      There is a whole branch of philosophy dedicated to theory of language, and I'd recommend books, but they're by and large so hopelessly abstruse that it would be little more than intellectual hazing if you don't already have pretty solid knowledge of the subject.

      Look at computer translation software...Even special purpose driven, it produces extremely clunky translations.

      What we really need is not a search engine that can figure out what we want, but instead a search engine that returns extremely accurate results for what we tell it we want. That puts the linguistic burden on us, and we're much better equipped to handle it.

      --
      ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
    3. Re:Two approaches to the search problem by MindStalker · · Score: 1

      You know, if what they are looking for is "what the average community prefers", why don't they just implement a decent search engine (even better if they can just use Google's) and then record clicks? If a result gets more clicks, put it closer to the top. The only problem with this is that it puts up a barrier to new guys getting in. So maybe add some randomization to put non-favorites towards the top, but make sure the top 8 or so get on the front page no matter what.
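
A rough sketch of the click-count ranking with randomization suggested in the comment above. The function name `front_page`, the page size of 10, and the way the random newcomers are slotted in are all assumptions for illustration; the comment only specifies that the top 8 or so should always make the front page.

```python
import random


def front_page(click_counts, page_size=10, guaranteed=8, rng=random):
    """Build a results page from recorded clicks.

    The `guaranteed` most-clicked results always appear, in click order.
    The remaining slots go to randomly chosen less-clicked results, placed
    at random positions so newcomers get a chance to collect clicks.
    """
    ranked = sorted(click_counts, key=click_counts.get, reverse=True)
    page = ranked[:guaranteed]
    rest = ranked[guaranteed:]
    extras = rng.sample(rest, min(page_size - guaranteed, len(rest)))
    for url in extras:
        page.insert(rng.randrange(len(page) + 1), url)
    return page
```

Calling `front_page({"a.example": 40, "b.example": 3, ...})` after each batch of recorded clicks would re-rank the page; how often to re-rank and how many random slots to reserve are open design choices.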

    4. Re:Two approaches to the search problem by currivan · · Score: 1

      We tried using clickthroughs, but the data is very noisy. Users often don't know if a page is useful until they go to it, and they often open many pages from the same list of results. The best application turned out to be "how often is this link the last one people click on", but that assumes they're using the back button rather than opening several links in tabs.

      You also don't know if the user finds what they really want linked off of a result page, or if they give up. The skewing of clicks toward the top results is very strong, but it seems to vary by the type of query. To even get this data, you need redirect links, and the fact Google doesn't use them tells me they didn't find it useful either.
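
      The "last one people click on" signal can be sketched as follows, assuming each session is recorded as the ordered list of result URLs a user clicked for one query. The session format and the name `last_click_rate` are hypothetical, and the sketch inherits the back-button assumption mentioned above.

```python
from collections import Counter

def last_click_rate(sessions):
    """For each URL, return the fraction of sessions in which it was
    clicked at all where it was also the final click."""
    clicked = Counter()  # sessions in which the URL was clicked
    last = Counter()     # sessions in which the URL was the last click
    for clicks in sessions:
        if not clicks:
            continue  # the user clicked nothing; no signal either way
        for url in set(clicks):
            clicked[url] += 1
        last[clicks[-1]] += 1
    return {url: last[url] / clicked[url] for url in clicked}
```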

    5. Re:Two approaches to the search problem by MindStalker · · Score: 1

      Hmm, good point. If only there was a way to force a user to say, "Yeah, this is what I wanted." But alas... If you implemented search as a sidebar or something that was more integrated into the browser this would probably be easy.

  15. Bootvis' Theorem by Bootvis · · Score: 2

    Bootvis' Theorem states:
    It is not possible to create an algorithm that takes as input any dataset and a search query and outputs the results 'best' matching the query.

    I have a truly marvellous proof but ...

    --
    Read, refresh, repeat.
  16. Efficient Labour markets by Dorceon · · Score: 1

    I like how the post talks about making search an efficient market, but completely discounts another important market that is already a lot closer to efficient: labour. If you're good enough to write an ungameable search engine, you're going to have substantial job offers from at least Google, Yahoo, and Microsoft.

    --
    What sound do people on rollercoasters make? Hint: it's not Xbox 360.
    1. Re:Efficient Labour markets by AmberBlackCat · · Score: 1

      I think any search algorithm is going to be as un-gameable as DRM is uncrackable. The goal would be to convince the big corporations that the algorithm is un-gameable long enough to collect the money.

  17. I had this idea a while ago by Paulrothrock · · Score: 1

    An open source search engine would be a good idea, except that the index would have to be hosted somewhere and indexed somehow.

    I'd gladly donate some spare processor cycles, hard drive space, and bandwidth to an open source search engine like a BOINC project.

    --
    I'm in the hole of the broadband donut.
    1. Re:I had this idea a while ago by rrohbeck · · Score: 1

      >I'd gladly donate some spare processor cycles, hard drive space, and bandwidth

      If it's along the lines of P2P apps, DHTs etc., this could really work.
      Kad already does a pretty good job of searching. Use something like it to point to Internet content, and use swarming for downloads... There's a Firefox extension waiting to happen.

  18. In open Source by Anonymous Coward · · Score: 0

    In Open Source, Engine searches you!

  19. Re:Hrmmm by eln · · Score: 3

    You're probably right. One question, though:

    What the hell does that have to do with the post you replied to? Stop piggybacking on nonsensical early posts to pump up your karma.

  20. Re:Meritocracy = aristocracy with genetics for wea by Dunbal · · Score: 0, Offtopic

    Such concepts are valuable neither in building good societies

    Considering every human society has a "privileged" class - call it what you will, aristocracy or otherwise - I would think that it's the only way to HAVE a society.

    --
    Seven puppies were harmed during the making of this post.
  21. Already-existing grassroots google by Bananatree3 · · Score: 3, Informative

    There already exists a distributed, open source engine which has been around a while, which is called Majestic 12. It uses a client-based search engine, which crawls the web for hundreds of millions of URLs, and then sends the data back to central servers. The servers then compile the data and use user-based searching algorithms to perform the search. While the algorithms are still very much in alpha, it is still a very noteworthy project. Also, its URL base is currently around 30-35 billion URLs.

    1. Re:Already-existing grassroots google by TodMinuit · · Score: 1

      But Majestic 12 sucks. Mozdex, on the other hand, is built on open source technology. It's not the best search engine out there, but it's about a billion times better than Majestic 12.

      --
      I wonder if I use bold in my signature, people will notice my posts.
  22. Systems by their nature are always "gamed" by Anonymous Coward · · Score: 1, Insightful

    In general I see the term "gamed" as subjective. When outcomes are matched to an individual's expectations, they see the system as working; when they disagree with the outcome, they call it gaming.

    As long as people are the engine behind this "pure meritocracy," the system will be gamed. I find the Google results to be good enough that I am not looking for an alternative. Google provides the basis for research. If you want the best deals, you still have to shop around and do the due diligence. If you want to do research, you still have to follow up on the citations yourself. Anyone who suspends their own reason to defer exclusively to the magic algorithm will get gamed.

    1. Re:Systems by their nature are always "gamed" by Kelson · · Score: 4, Interesting

      In general I see the term "gamed" as subjective. When outcomes are matched to an individual's expectations, they see the system as working; when they disagree with the outcome, they call it gaming.

      Very true. For an example, look no further than the subset of SEO that sees no difference between setting up hundreds of automatically-generated pages linking to a site for the sole purpose of increasing search rankings and hundreds of individual people independently writing about (and linking to) a site. I've actually seen people in the linkfarm business claim that they're not doing anything different from bloggers.

      This is basically equivalent to saying that there's no difference between one person writing 10 letters to a politician under assumed names, and 10 people writing their own letters.

  23. fair and un-gamable rankings <> meritocracy by Anonymous Coward · · Score: 2, Insightful

    The use of a ranking system (even a fair and un-gamable one) is biased against a true meritocracy. If I'm looking for apple pie recipes, I (and likely anyone else looking for apple pie recipes) will pluck one from the top-ranked choices.

    This "top-10-cherry-picking" makes it highly unlikely that the possibly-superior newcomers will be seen. You have to be seen in order to be ranked up.

    It's only through "outside" mention (blogs, word-of-mouth, etc.) that newcomers have much of a chance of being looked at.

  24. Which Community? by RAMMS+EIN · · Score: 2, Insightful

    ``First, though, consider the benefits that such a search engine could bring, both to content consumers and content providers, if it really did return results sorted according to average community preferences.''

    It's also interesting to ask "which community?" There is a small number of categories of things that define some high percentage of the things I search for. I am pretty sure there is a very small intersection of those categories with the categories of things the world's population as a whole searches for. There are also differences based on location and language. In short, my preferences are almost certainly very different from the average of all searchers.

    On the other hand, there are definitely groups of searchers whose preferences coincide with mine. For example, people who are involved in open source development, *nix users, computer scientists, environmentalists, English speakers, and people in the Netherlands probably have preferences that largely overlap with mine.

    This suggests to me that some sort of machine learning might be used, where the system guesses your search preferences based on what links you have followed in the past, and what links other people have followed in the past. In other words, the system (implicitly) tries to determine which communities you are part of, and gives you results that are preferred by members of these communities.

    --
    Please correct me if I got my facts wrong.
  25. Someone please read this by Anonymous Coward · · Score: 0

    For three years now I have been waiting for someone to develop a new kind of search engine that I was sure was so obvious that it would happen any moment. I have made attempts to do it myself because the program is trivial to write, but I just don't have the time (I've got a company to run). So in the hopes that someone will read this and do what I propose, so they can change the world and get rich and I will finally get the information I need, here is the idea.

    (1) Scrape bookmarking sites and store lists of bookmarks associated with a user identifier (delicious, digg, reddit, etc.).

    (2) Take your own bookmarks and find other users with similar bookmarks (the more bookmarks they have in common with you, the more similar they are to you).

    (3) Print out the bookmarks of similar people that you don't already have.

    (4) Profit!

    This is so obviously more effective than a search engine. The information will begin to flow naturally to where it is needed spawning the next big information revolution. And information accelerates all other forms of technology.

    The singularity is nigh!!!!!!!!!!
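
    Steps (2) and (3) above can be sketched as follows, assuming step (1) has already produced a mapping from user identifier to a set of bookmarked URLs. The name `recommend` and the choice to score an unseen bookmark by the summed overlap of the users who hold it are assumptions made for illustration; the comment only specifies "number of bookmarks in common" as the similarity measure.

```python
from collections import defaultdict

def recommend(mine, others, top_n=5):
    """Suggest bookmarks held by users whose bookmarks overlap with `mine`.

    mine   -- set of URLs the current user has bookmarked
    others -- {user_id: set of URLs} scraped from bookmarking sites
    """
    scores = defaultdict(int)
    for user, theirs in others.items():
        overlap = len(mine & theirs)  # similarity = bookmarks in common
        if overlap == 0:
            continue  # users with nothing in common contribute nothing
        for url in theirs - mine:     # only bookmarks we don't already have
            scores[url] += overlap
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda u: (-scores[u], u))
    return ranked[:top_n]
```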

  26. A popularity contest would be great by Per+Abrahamsen · · Score: 1

    The main reason Google was so much better than AltaVista was that it sorted the results according to a "popularity contest" based on how many other pages referred to it. This was way more useful than sorting according to how often your search term occur in it.
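
    A minimal sketch of the "popularity contest" described above, assuming the link graph is available as a mapping from each page to the pages it links to. This only counts distinct referring pages; it is deliberately not PageRank, which also weights each referrer by its own importance. The name `rank_by_inbound_links` is hypothetical.

```python
from collections import Counter

def rank_by_inbound_links(links, candidates):
    """Order `candidates` by how many other pages link to each of them.

    links      -- {source_url: iterable of target URLs}
    candidates -- URLs that matched the search term
    """
    inbound = Counter()
    for source, targets in links.items():
        for target in set(targets):   # count each referring page once
            if target != source:      # ignore self-links
                inbound[target] += 1
    return sorted(candidates, key=lambda p: (-inbound[p], p))
```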

    Don't dismiss popularity contests; the popular choice will, almost by definition, usually be the most interesting choice for most people. You may not feel you belong to "most people" (most people don't), but if you leave your feeling of elitism and/or alienation aside for a moment, chances are that the popular choice is often also the right one for you in most areas.

    Of course, a general search engine that would return the most popular hits "by people with similar tastes as me" would be even better. Such personalized results are already provided by some book or movie recommendation systems.

    This doesn't preclude the need for a good baseline though, something that would put roses higher than dog poo in a "things that smell great" list.

    1. Re:A popularity contest would be great by ivan256 · · Score: 1

      This doesn't preclude the need for a good baseline though, something that would put roses higher than dog poo in a "things that smell great" list.

      That's exactly what this kind of system *doesn't* need. (Well, it needs it because if we don't use the same definition of "merit" for all users, or at least limit the number of definitions of "merit" that are available, this will become a computationally infeasible project... But let's talk theoretically).

      This theoretical system should learn through user feedback exactly what that particular individual's idea of 'merit' is. Using a baseline may reduce the time it would take for the system to be useful to any given user, but it would insert bias into the system. Bias is what everybody wants removed from search results because bias is exactly the thing that people use when they game the algorithm to put their page at the top of the results list. Any search engine that uses a static baseline definition of "merit" can be gamed by the operators of sites that want to move up the list.

      I don't think it's possible for any engine, open source or not, to do significantly better than google, since without the nearly infinite processing power required for individualized definitions of merit all you will be doing is replacing one form of bias for another. Perhaps you could come up with bias that is more to the tastes of some subset of users, but empirically that's no better than what we have already.

  27. Sounds really like peer ranking .. by roguegramma · · Score: 1

    Sounds like the algorithm he really wants to talk about is the one Highlander names "peer ranking system" on his page at Everything2.com: http://www.everything2.com/index.pl?node_id=1521712

    I somehow believe that Google is quite aware of this algorithm and has already implemented it.

    --
    Hey don't blame me, IANAB
  28. Reminds me of Indiana Jones... by gd23ka · · Score: 1

    If you know which Indiana Jones movie this scene is from, tell me. I remember Jones facing off
    against some huge Samurai with swords in the middle of a market place. The Samurai twirls his swords
    and delivers one hell of an impressive martial arts show before challenging Jones to attack.
    Jones instead just shrugs, draws his colt and shoots the Samurai point blank.

    With this analogy in mind, it's easy for me to draw my colt and shoot this long missive down with
    one single argument: A Wikipedia-like process for a search engine means Administrators decide
    what is worthy of inclusion into the index and what is not. Administrators are voted in by peers
    so in order to become one he or she must consistently demonstrate the ruling orthodox attitude.
    So in the end we would get a politically correct search engine much worse than Google. Connect that
    to the Mother Gaia complex deep in your ecosocialist asshole, Wales.

    Check out "Administrators" like this individual here http://www.google.co.za/search?hl=en&q=SlimVirgin&btnG=Search&meta=

    1. Re:Reminds me of Indiana Jones... by micromuncher · · Score: 1

      I think you're confused with Kill Bill. In Raiders of the Lost Ark, Indy's first encounter was with an Arab Egyptian and a scimitar. In Temple of Doom, it was a Sikh with a khunda.

      --
      /\/\icro/\/\uncher
    2. Re:Reminds me of Indiana Jones... by Anonymous Coward · · Score: 0

      It's not that hard to work out which it was - there were only three movies and the second two were complete rubbish. So through logical analysis we can deduce that if there was a good scene it must have been in Raiders of the Lost Ark (the first one).

    3. Re:Reminds me of Indiana Jones... by Anonymous Coward · · Score: 0

      I guess the "long missive" was too long for you, and you didn't actually read it. In the proposed algorithm, users are chosen at random from the entire pool. No "Administrators" are involved.

    4. Re:Reminds me of Indiana Jones... by gd23ka · · Score: 1

      I guess my post was long enough for you to notice then, but too short for you to read. I was
      talking about applying de facto "Wikipedia process" to a search engine index. But even with
      editors chosen from a random pool there will undoubtedly be ways to force orthodox "consensus"
      if only by installing a fast-track process of dealing with "vandals". You can be certain
      that thought outside the box is the last thing Jimmy-jimbo-Wales projects are about.

  29. Let a million algorithms bloom by DysenteryInTheRanks · · Score: 2, Interesting

    He's thinking about this all wrong.

    A true open source search engine would let anyone roll their own algorithm. Each algorithm would be a sort of "plug in."

    The index would be the shared, open source part, collaboratively crawled (via PC software or browser plugin) by everyone who elects to participate.

    Algorithms would either work on the index after the fact, or, if they need access to the indexing process itself, would be part of a series of plugins run on the full HTML of each page.

    The index itself would have an open API, so people could build their own front end search websites.

    Trying to design the right algorithm up front is a premature optimization. I have no interest in helping Jimmy Wales become the next Sergey Brin. But I *would* participate in something that gives _me_ a shot, however distant, at founding the next Google, minus the massive spider farm.
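
    The "plug in" arrangement described above could look roughly like this, assuming the shared index is an inverted index mapping each term to the pages containing it and their term counts. Every name here (`RANKERS`, `register`, `search`, and both example rankers) is hypothetical and only illustrates the separation between a common index and interchangeable ranking algorithms.

```python
from typing import Callable, Dict, List

# Shared, openly readable index: term -> {url: occurrences of the term}.
Index = Dict[str, Dict[str, int]]
# A ranking plug-in takes the index and the query terms, returns ordered URLs.
Ranker = Callable[[Index, List[str]], List[str]]

RANKERS: Dict[str, Ranker] = {}

def register(name: str):
    """Decorator that adds a ranking function to the plug-in registry."""
    def wrap(fn: Ranker) -> Ranker:
        RANKERS[name] = fn
        return fn
    return wrap

@register("term_count")
def by_term_count(index: Index, terms: List[str]) -> List[str]:
    # Score each page by the total occurrences of all query terms.
    scores: Dict[str, int] = {}
    for term in terms:
        for url, count in index.get(term, {}).items():
            scores[url] = scores.get(url, 0) + count
    return sorted(scores, key=lambda u: (-scores[u], u))

@register("all_terms")
def require_all_terms(index: Index, terms: List[str]) -> List[str]:
    # Keep only pages containing every query term, in alphabetical order.
    sets = [set(index.get(term, {})) for term in terms]
    hits = set.intersection(*sets) if sets else set()
    return sorted(hits)

def search(index: Index, query: str, ranker: str = "term_count") -> List[str]:
    """Front ends pick a ranker by name; the index itself stays the same."""
    return RANKERS[ranker](index, query.lower().split())
```

    A front end built on such an API would only need to choose a ranker name, which is what lets many algorithms compete over one crawl.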

    1. Re:Let a million algorithms bloom by SanityInAnarchy · · Score: 1

      The index would be the shared, open source part, collaboratively crawled (via PC software or browser plugin) by everyone who elects to participate.

      The real trick is making this truly open in the Freenet kind of way -- no centralized servers at all (other than existing DNS and such).

      Think for a moment: Suppose Google allowed anyone to write a plugin of sorts to allow specialized kinds of searches, and extended their API to support any kind of frontend accessing these plugins. So, anyone could use Google's index in pretty much any way that the database is capable of (so long as they don't cause problems for anyone else).

      Let's ignore technological issues of where Google's going to get the brute force to be able to handle everyone's pet algorithms, not to mention preventing trolls and their forkbombs from damaging the system, while still allowing enough flexibility for everyone else. Setting those aside, there's still one very big issue: Google controls the index.

      And as long as it's on Google's servers, no one is really going to be satisfied that nothing fishy is going on. As much as I like Google as a company, that much is the truth, and is why I, personally, don't use my Gmail account for any email I really care about.

      The real trick, then, is to have a situation where setting up your own private Google is like setting up your own website. When I set up a website -- email, web, IM, anything -- I just have to buy a domain, which basically means I'm registered with a registrar and with ICANN. However, all I have to trust them to do is point people to my DNS server, and after that, absolutely everything can be on my own servers, in my basement if I like.

      I'm not sure I can even imagine how we could create a search -- a fast, usable search -- which is as distributed as that. Basically, it would have to be not much more vulnerable to manipulation and trolling than if I built my own Google-like index farm and stored absolutely everything locally. I sort of know how to do that kind of thing for files, for instance -- Freenet has theoretically solved that problem, but Freenet is slow as a dog, and I have no clue how to do anything like that for indexes. Even if you ignore the anonymity, how do you make it truly peer to peer, without the lag that implies?

      --
      Don't thank God, thank a doctor!
  30. The Quantum Bookkeepers by Jimekai · · Score: 2, Interesting

    Such an auto meritocracy could truly work if the self-pruning clustering algorithm created semantically-bound transactions in a feedback system that was designed at the outset to rival capitalism. I know that Google could be tweaked to do this, were it not for capitalist noses being unable to pick up on the scent of profit.

    --
    Argumentum ad Probabilitum
  31. MERITOCRATIC! by Jeremiah+Cornelius · · Score: 3, Funny

    Seems more likely to lead to Medeocratic.

    --
    "Flyin' in just a sweet place,
    Never been known to fail..."
  32. Jimmy knows it's gonna suck by El+Mariachi+94 · · Score: 1

    During the talk, Jimmy acknowledged that the Beta of the engine is gonna suck and the media is gonna shit all over it. When the beta is released, they're gonna type in bold letters "We know this sucks" to curb some of that negative karma from the press. At least he's realistic about the project. Check out the video of Jimmy's NYU talk here: http://video.google.com/videoplay?docid=-7416968092951113589 or download the MP3 here: http://homepages.nyu.edu/~gd586/Jimmy%20Wales%20-%20NYU%20-%201-31-07.mp3

  33. User-based ranking is patented by IBM by Animats · · Score: 3, Interesting

    Rating by asking random users has been tried. At IBM. See United States Patent 7,080,064, Sundaresan July 18, 2006, "System and method for integrating on-line user ratings of businesses with search engines". Sundaresan has several patents related to schemes for asking users for ratings and using that info to adjust search rankings.

    The basic trouble with this approach is that, if you ask random users to rate random sites, they don't have enough time, energy, or effort to do a good job of it. If you ask self-selected users of the sites, the system can be gamed.

    This sort of thing only works where the set of things to rate is small compared to the interested user population. So it's great for movies, marginal for restaurants, and poor for websites generally.

  34. Google by RAMMS+EIN · · Score: 1

    I sometimes think that we already know the way to do searching - and Google has a patent on it.

    --
    Please correct me if I got my facts wrong.
  35. Couldn't be more wrong by Bluesman · · Score: 3, Insightful

    >The extra $50 that the user pays is the user's loss, but it's also the hosting company's gain.
    >If we consider costs and benefits across all parties, the two cancel out.
    >The world as a whole is not poorer because someone overpaid for hosting.

    And thus the broken window fallacy continues...

    Wealth is created through increased efficiency. A decrease in efficiency is a decrease in wealth, regardless of who benefits.

    By the "world is not poorer" logic, we might as well all ride horses, since we'd be paying oat producers and horseshoe manufacturers instead of the auto industry, so the world as a whole wouldn't be poorer.

    By paying more for inefficient hosting, that takes money away from more efficient uses.

    --
    If moderation could change anything, it would be illegal.
    1. Re:Couldn't be more wrong by bennetthaselton · · Score: 1

      If the seller is providing the hosting inefficiently, that's a separate problem, and the world is indeed poorer if someone set up their hosting company in a way that makes an inefficient use of resources. However if the hosting company has set up their hosting efficiently to minimize the cost to them, and they've simply gamed the system so that they can charge $79 for it and the customer won't find cheaper offers elsewhere, then the world isn't poorer.

      (It is of course true that when you're paying for more expensive hosting, there's a greater chance they've set it up inefficiently.)

      In either case though I don't think this falls under the Broken Window Fallacy, which states that money changing hands is something good (the boy broke the shop window, thus generating business for the window fixer -- but ignoring the costs to the shopkeeper). My example didn't involve a window being broken -- the hosting company may be overcharging, but they didn't create the problem that their service is attempting to solve.

    2. Re:Couldn't be more wrong by Bluesman · · Score: 1

      Let's say there's one web hosting company, that manages to get a law passed that allows them the exclusive right to sell web hosting. (Gaming the system, in your example.)

      Wouldn't it be true that the extra money they'd make would be better spent elsewhere, as the market would not bear such a price without the artificial scarcity caused by the law? Isn't that the reason everyone complains about monopolies and lack of competition?

      In such a case, the world is poorer, because the money isn't being used as efficiently as it could have. Arguing that it is used more efficiently in the hands of the web hosting company is to argue that that web hosting company knows more about the economy than the collective knowledge of its customers.

      The broken window fallacy exists because the opportunity cost is ignored -- what could that shopkeeper have done with the money he spent fixing the window if it hadn't been broken?

      The same logical error is used in the above article. He's ignoring the potentially better use that extra money spent on web hosting could have been put.

      --
      If moderation could change anything, it would be illegal.
    3. Re:Couldn't be more wrong by lennier · · Score: 1

      "By the "world is not poorer" logic, we might as well all ride horses, since we'd be paying oat producers and horseshoe manufacturers instead of the auto industry, so the world as a whole wouldn't be poorer."

      Given that horses run on renewable carbon-neutral fuels, are nanoscale self-assemblers, emit fully biodegradable wastes, and can operate efficiently on a far wider scale of terrain than any four-wheeled vehicle: I'm not so sure that's a particularly good example of a technology change that made the world richer.

      Just sayin'.

      --
      You are not a brain: http://books.google.com/books?id=2oV61CeDx-YC
    4. Re:Couldn't be more wrong by bennetthaselton · · Score: 1

      Even if the hosting company gets a law passed giving them a monopoly, they might still provide the hosting in the most cost-effective way possible (that's still in their best interest, after all, since it saves them money). Assuming the hosting company provides hosting cost-efficiently, the money that the customer pays them is indeed a lost opportunity for the customer; however, it now represents spending opportunities for the hosting company.

      IF the hosting company is not providing the hosting cost-effectively, then indeed the world is poorer when you're rewarding the inefficient hosting. And then the broken window fallacy does definitely apply, because the money lost by the customer is not offset by a gain to the hosting company -- because they've burned up an inefficiently large portion of that money to finance the hosting.

      However, it's true that even if the monopolist market leader is providing services efficiently, there is still some deadweight loss occurring because other companies, who could provide the same services more cheaply to people who can't afford the monopolist's higher prices, can't get their foot in the door. Therefore, transactions that would benefit both sides never end up taking place. But even in that case, the losses are occurring because mutually beneficial transactions are blocked, not because the monopolist's customers are overpaying.

      Originally I mentioned that in the relevant paragraph in the article (yes, "he" is me :) ). It looks like I took it out and forgot to put it back in. I think if I mention the importance of "meritocracies" again I'd better make sure not to miss anything!

  36. Open Source hasn't even led by Anonymous Coward · · Score: 0

    to a meritocratic development model. The founding principle of Open Source is "If I make it, I make it my way, regardless of what mere users want." I'm not passing judgements on this principle, but look at the devs behind Gaim for instance -- they delete feature requests they don't like. They don't deny them. They *delete* them. Same with Mozilla devs. How is this going to change just because it's a website?

  37. PSSST: Merit for sale!!! by EmbeddedJanitor · · Score: 1

    With a few script changes, that whole spambot army out there could easily be rejigged as meritbots.

    Companies could very easily request/encourage/force employees to do a merit update every morning.

    Any system is open to abuse. At least the Google model is pretty easy to understand.

    --
    Engineering is the art of compromise.
  38. Meritocratic Search Doesn't Make Sense by logicnazi · · Score: 2, Insightful

    The author of this piece talks about meritocratic search as if it were some real fixed ordering of the search results that we just have to be smart enough to uncover. This is anything but the case. For instance, is the recipe for apple pie that makes better tasting pie but is too complicated for the inexperienced chef to make better or worse than the one which is extremely easy to follow but isn't as good? When talking about pie this sort of issue might not be a big deal, but what happens when we start talking about things like climate science? Is the best result some sort of environmental activist's site, a mass media story, a global warming skeptic's site or the actual scientific results that are too technical for most of the public to understand?

    Sure, wikipedia makes these compromises quite well, but the idea of content neutral encyclopedia entries provides a well defined goal. The second that we get to a search engine we can no longer cling to content neutrality because we must choose how to rank the advocacy sites on both sides of the spectrum. Unlike wikipedia, where one can neutrally remark that some people believe X and others Y, in a search engine the community has to decide if "unwanted pregnancy" is going to take someone to the planned parenthood site, an abortion clinic or an anti-abortion site.

    In short there is no notion of the meritocratic search order, there are just tradeoffs between different sorts of searchers. Google is already navigating this maze of tradeoffs, including looking at what users like, so I fail to see the argument that a community search will obviously make better tradeoffs than Google.

    In fact anyone who has spent much time on the Internet realizes that every community tends to develop its own prejudices and biases pushing away those who disagree and attracting those who agree. Slashdot attracts open source zealots and repels the technically inept. Whatever community develops this search engine will have its own biases which will discourage participation by those who don't agree. This is just human nature.

    Likely I might enjoy the results returned by such a search since I suspect the participants are likely to be technically sophisticated nerds and others who have similar views as I do. However, it seems doubtful that they will provide the results people who are very different than those who run the search engine will appreciate.

    Besides, this whole project just smells hokey to me. It sounds like Wales is drunk on his success with wikipedia and advocating it as THE solution to any problem. Problems are pragmatic things and they shouldn't be solved by ideologies.

    --

    If you liked this thought maybe you would find my blog nice too:

    1. Re:Meritocratic Search Doesn't Make Sense by foniksonik · · Score: 1

      This is where a tagging system would make a lot of sense. Websites should be able to be meta-filtered by content type...

      Professional Grade Article - does it contain an abnormal number of jargon/professional terms or have a number of equations beyond a threshold level, tag it as a professional article (Use standard algorithms to determine its popularity and popular authority... leave it up to those who know to determine its accuracy)

      Consumer Grade Article - does it contain few if any jargon/professional terms within the body... is it in the first or third person... what level of grammar is it at?

      Commercial Grade Article - how many ads on the page, how many offers... are there price listings or email addresses with the words sales, customer support, etc.

      This is just a rough example of what could be done to further meta-filter content by standards that are useful to everyone.

      On the results page of such a search engine, each type of content would be given a grading... and an icon or color-coding or whatever...

      This type of meta-filter would be difficult to game and to what end? Yet it would be ultimately helpful in providing accurate search results to people looking for information. Some people are doing generic research on a topic, others are looking for info on a product related to a topic, others are looking for professional resources and literature on a topic, others are looking for a service professional to hire.
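
      A rough sketch of the content-type grading above, assuming a caller-supplied jargon vocabulary. The regular expressions, the thresholds, and the name `grade` are illustrative assumptions; a real classifier would need far more signals (ad counts, equation detection, grammar level) than the two used here.

```python
import re

# Hypothetical signals for "commercial" pages: price listings and
# sales/support email addresses, as suggested in the comment.
PRICE = re.compile(r"\$\d+(?:\.\d{2})?")
SALES_EMAIL = re.compile(r"\b(?:sales|support)@[\w.-]+", re.IGNORECASE)

def grade(text, jargon, jargon_threshold=0.05, commercial_threshold=2):
    """Return 'commercial', 'professional', or 'consumer' for a page.

    text   -- plain text of the page
    jargon -- set of lowercase professional terms for the topic
    """
    commercial_signals = len(PRICE.findall(text)) + len(SALES_EMAIL.findall(text))
    if commercial_signals >= commercial_threshold:
        return "commercial"
    words = re.findall(r"[a-z']+", text.lower())
    if words:
        # Share of words on the page that are professional jargon.
        share = sum(word in jargon for word in words) / len(words)
        if share >= jargon_threshold:
            return "professional"
    return "consumer"
```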

      --
      A fool throws a stone into a well and a thousand sages can not remove it.
  39. Broken window fallacy by numberthre · · Score: 1

    The world as a whole is not poorer because someone overpaid for hosting.

    Erm, yes it is. That difference in price could have been used to produce value. If you believe that the world is in fact not poorer, then you believe that the point of an economy is just to shuffle money around.

    See the broken window fallacy: http://en.wikipedia.org/wiki/Broken_window_fallacy

  40. Our answer for search - SiteTruth by Animats · · Score: 3, Insightful

    We hadn't planned to announce this quite yet, but this is a good opportunity.

    We have a new answer to search - SiteTruth. It's working, but not yet open to the public.

    Other search engines rate businesses based on some measure of popularity - incoming links or user ratings. SiteTruth rates businesses for legitimacy.

    What determines legitimacy? The sources anti-fraud investigators tell you to check, but nobody ever does. Corporate registrations. Business licenses. Better Business Bureau reports. The contents of SSL certificates. Business addresses. Business credit ratings. Credit card processors. All that information is available. It's a data-mining problem, and we've solved it. The process is entirely automated.

    Most of the phony web sites, doorway pages, and other junk on the web have no identifiable business behind them. Try to find out who really owns them, and you can't. When we can't, we downgrade their ranking. With SiteTruth, you can create all the phony web sites you want, but they'll be nowhere near the beginning of any search result.

    Creating a phony company, or stealing the identity of another company, is possible, but it's difficult, expensive and involves committing felonies. Thus, SiteTruth cannot be "gamed" without committing a felony. This weeds out most of the phonies.

    SiteTruth only rates "commercial" sites. If you're not selling anything or advertising anything, SiteTruth gives you a neutral or blank rating. If you're engaged in commerce, you can't be anonymous. In many jurisdictions, it's a criminal offense to run a business without disclosing who's behind it. That's the key to SiteTruth.

    Our tag line: "SiteTruth - Know who you're dealing with."

    The site will open to the public in a few months. Meanwhile, we're starting outreach to the search engine optimization community to get them ready for SiteTruth. We want all legitimate sites to get the highest rating to which they're entitled. An expired corporate registration or seal of trust hurts your SiteTruth ranking, so we want to remind people to get their paperwork up to date.

    The patent is pending.

    1. Re:Our answer for search - SiteTruth by micromuncher · · Score: 1

      The crappy front page makes it look like a scam.

      --
      /\/\icro/\/\uncher
    2. Re:Our answer for search - SiteTruth by Animats · · Score: 1

      The crappy front page makes it look like a scam.

      To some extent, that page was made to discourage unwanted attention during the early phases. But it's all real.

  41. Re:Agreed, but for a different reason by rowama · · Score: 1

    I also think it won't work, but for a different reason.

    The solution offered in the essay can be gamed. Not only that, it would open a new industry of gaming facilitators, or brokers. Gaming the system would only require that a third party (i.e., the broker) provide incentive (e.g., money?) for the chosen raters to vote a certain way. Raters who want to make a few bucks would basically be selling their votes by having the highest paying broker tell them how to vote. As soon as the proportion of bought votes reaches some critical value, the entire system would begin to implode.

  42. Re:Hrmmm by Overly+Critical+Guy · · Score: 2, Funny

    Thank you for your interesting response to "first post." Apparently, first post won't work because there are a huge number of sites on the net. Only computers can handle a task like first post.

    --
    "Sufferin' succotash."
  43. State of trust networks by numberthre · · Score: 1
    What is the current state of trust networks? Many problems (spam, SEO, 'gaming') seem to hinge on the absence of a trust certificate for ratings. Further, the value of ratings is subjective. I shouldn't have to `trust' Wikia selected users with their ratings. I myself should choose users whose ratings I value for a particular topic.

    Is there some fundamental flaw in current trust network algorithms that prevents them from being implemented?

  44. Scale Issue - This is unlikely to work... by Phrogman · · Score: 1

    You have a major problem with the scale of providing search results. What you are proposing here is that individual users *rate* websites in some manner according to their merit. Leaving aside the fact that users are not inherently qualified to rate websites, the fact that a given website may have great merit for a given subject, but not others, and the fact that people will actively find a way to "game" this system just as they have all the others, how big exactly is the internet? Let's assume there are 10m useful websites on the web (anally extracting a number). How exactly do you expect the users of this new system to manually rate all of those websites? What is their individual reward for doing so? Even spending 5s to select a number between 1 and 10 for 100 websites on a regular basis is going to occupy considerable time for no apparent reward beyond "helping the system". Most users are probably not that philanthropic; they will simply want to gain benefits from the effort of others. In other words the system will depend on those individuals willing to participate and give up their free time. Now, granted you get all this participation to occur, why would the results be any different or better than those of Google or the other major search engines?
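    For a sense of scale, the figures in the paragraph above (10m sites, 5s per rating, 100 sites per sitting) can be plugged into a small calculation. The function name and the ratings_per_site parameter are assumptions added for illustration; a single rating per site works out to roughly 13,900 person-hours, and every extra independent rating per site multiplies that.

```python
def rating_effort(sites, seconds_per_rating, sites_per_session, ratings_per_site=1):
    """Return (person_hours, sessions) needed to rate every site."""
    total_ratings = sites * ratings_per_site
    person_hours = total_ratings * seconds_per_rating / 3600
    sessions = total_ratings / sites_per_session
    return person_hours, sessions


# Figures taken from the comment: 10m sites, 5 seconds each, 100 per sitting.
hours, sessions = rating_effort(10_000_000, 5, 100)
print(f"{hours:,.0f} person-hours across {sessions:,.0f} sessions")
```

    This only covers one pass over whole sites; it says nothing about individual pages or about re-rating content that changes.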

    In the original thesis paper that gave rise to the Google algorithm, they essentially worked off a couple of simple concepts - that are no doubt very difficult to implement. They called the key factors "Hubs" and "Authorities". A Hub is a page that contains many hyper-links to other pages. An Authority is a page that is linked to from many other pages. Google's algorithm not only rates each page according to how hub-like or authority-like it is, but it also increases the ranking of the page according to the ranking score of the pages it links to, and more importantly the pages that link to it. In other words if lots of other pages think your page is important and link to it, you gain a higher ranking, if you link to lots of high ranking pages that too is weighed to some degree. As well the actual contents of your page are rated according to their relevance to the search query a user enters at google. Now there are of course thousands of other rules that have been added to avoid all the various tricks that arsehat website developers have attempted to use to fool the system and garner a higher ranking, but the end result is a pretty efficient and accurate search system - probably the best around given google's popularity. The key thing being that of course all this indexing can be done automatically.
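    The "a link is a vote, weighted by the voter's own rank" idea can be sketched as a short power iteration. This is a simplified illustration, not Google's production algorithm: the damping value, iteration count and toy graph are assumptions. As a side note, the terms "hub" and "authority" are usually associated with Kleinberg's HITS algorithm, while PageRank as published scores a page from the pages linking to it.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly over its outlinks.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


# Toy web: three pages link to "c", and "c" links back to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

    In the toy graph "c" ends up on top because three pages link to it, and "a" outranks "b" and "d" because its single inbound link comes from the highly ranked "c".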

    How exactly is any manual system going to improve on that? How can it hope to keep up with the billions of webpages out there that are constantly being updated and improved, deleted, moved etc? The manual system can only rate individual websites, not pages - that simply wouldn't be practical. In an ideal world where one website dealt exclusively with one subject and one subject only, this might work, but most websites have dozens if not hundreds of different informational elements that might be of interest to a user, and there is no manual way to determine that. In other words, any such meritocratic system is still going to have to fall back on indexing the individual pages using a spider, and the only real result of the meritocratic system is to determine the user's perspective on how valuable the information on that site is - and that's entirely subjective. Google is more objective by far, and any approach to a search engine pretty much has to be as objective as possible.

    I don't honestly think any manually driven system has a hope of keeping up with the web. There is simply far too much data being generated, edited, deleted and moved daily for any such effort to offer any real economy of scale.

    --
    "The first time I got drunk, I got married. The second time I bought a chimpanzee, after that I stayed sober" Arian Seid
  45. Google Toolbar? by lanfor · · Score: 1

    Isn't Google doing something very similar with their Google Toolbar? When I search using Google, they record which results/websites I opened and they can improve ranking of those websites. Of course this is biased toward getting feedback for only the top 10 results. Nevertheless I don't see much difference in the proposed algorithms.

    One could also ask if bots would be a problem for gaming the Google Toolbar system. If you can game a WikiSearch engine, then you should also be able to game the Google Toolbar, right? Unless those projects do some sort of identity verification, then they will be always exploitable by bots (even with the proposed, random algorithm: imagine WikiSearch with 1,000,000 bots and 1,000 active real users - randomization wouldn't help here, right?)
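    The bot scenario above can be put into numbers. Assuming raters are drawn uniformly at random without replacement, and assuming a panel of 20 raters (the panel size is a made-up figure, not from the essay), the chance that a panel contains no real user at all is:

```python
from math import comb


def prob_no_real_raters(bots, real_users, panel_size):
    """Chance that a uniformly random panel consists only of bots."""
    total = bots + real_users
    return comb(bots, panel_size) / comb(total, panel_size)


# Figures from the comment: 1,000,000 bots and 1,000 real users.
p = prob_no_real_raters(1_000_000, 1_000, 20)
```

    With these numbers p comes out at roughly 0.98, so random selection alone does not help unless accounts are somehow verified or weighted.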

    Lukasz
    http://www.hikipedia.com/ - a free database of hiking trails built by the hiking community

    --
    Lukasz Anforowicz
    Hikipedia - a free database of hi
  46. Re:fair and un-gamable rankings meritocracy by Tony+Hoyle · · Score: 1

    That's why you have decay in rankings. If nobody keeps voting things over time the number of votes attributed to a page lowers until it goes back to zero - this allows new pages to be on the same footing as older ones.
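    One way such decay could look, as a sketch: each vote is weighted by an exponential half-life. The 30-day half-life is an arbitrary assumption, and with this particular formula old votes approach zero without ever quite reaching it.

```python
def decayed_score(vote_ages_days, half_life_days=30.0):
    """Sum of votes, each weighted by 0.5 ** (age / half_life)."""
    return sum(0.5 ** (age / half_life_days) for age in vote_ages_days)


# Ten votes cast 300 days ago versus a single vote cast today.
old_page = decayed_score([300] * 10)
new_page = decayed_score([0])
```

    Here the old page's ten stale votes count for less than the new page's single fresh one, which is the "same footing" effect described above.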

  47. Re:Patents - Google has more than I thought! by jimwelch · · Score: 1

    US Patents in general run for 20 years after the filing date.

    "Ranking search results by reranking the results based on local inter-connectivity"
    US Pat. 6526440 Filed Jan 30, 2001

    "Methods and apparatus for using a modified index to provide search results in response to an ambiguous search query"
    US Pat. 6865575 - Filed Jan 27, 2003

    "Systems and methods for highlighting search results"
    US Pat. 6839702 Filed Dec 13, 2000

    Adaptive computation of ranking
    US Pat. 7028029 - Filed Aug 23, 2004

    "Graphical user interface for a display screen of a communications terminal"
    US Pat. D529920 - Filed Sep 29, 2003
    US Pat. D528553 Filed Sep 29, 2003

    "System and method for encoding and decoding variable-length data"
    US Pat. 7068192 - Filed Aug 13, 2004

    "Cable management for rack mounted computing system"
    US Pat. 6870095 - Filed Sep 29, 2003

    "Voice interface for a search engine"
    US Pat. 7027987 - Filed Feb 7, 2001

    "Drive cooling baffle "
    US Pat. 6906920 - Filed Sep 29, 2003

    --
    Never trust a man wearing a coat and tie!
  48. Open Directory by bcrowell · · Score: 1

    Sounds like they're reinventing Open Directory, which has been doing just fine for many years. I believe Google actually uses Open Directory as one of its seeds for the pagerank algorithm. The Wikimedia foundation keeps on starting up projects, many of which never become very successful. Wikibooks, for instance, has never achieved its original, grandiose goals, and it's been struggling for years now without making much headway. Its only big area of success was gaming guides (not the college textbooks it was originally supposed to create), but then they deleted all the gaming guides. I can count the high-quality, complete wikibooks on my thumbs. How about getting rid of some of the failed projects before proposing more?

  49. Re:Hrmmm by nacturation · · Score: 1

    Won't work.

    Actually, it did. The poster you replied to who wondered "First... post?" did, in fact, get first post.

    Sorry. Only computers can handle a task like this. It is automation or failure.

    That could very well be the explanation. Perhaps the first poster automated the process. It would certainly explain the outcome.
    --
    Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
  50. Mathematically Impossible by Skewray · · Score: 1

    Any system which depends on user input to rank pages is fundamentally a voting system. It has been proven (and discussed on Slashdot) that any voting system with more than two candidates can be gamed. The entire project sounds like a waste of time.

    1. Re:Mathematically Impossible by Razzy · · Score: 1

      From a theoretical standpoint this is true, and actually even worse. Arrow's theorem shows that there might not be any transitive social ranking of alternatives out there. It isn't just that the order can be gamed, there might not be any way to aggregate people's preferences into a single ordering fairly (See http://en.wikipedia.org/wiki/Arrow's_theorem for a definition of "fairly"). On the other hand, voting cycles seem to be relatively uncommon in the real world when enough voters are involved (http://www.cup.cam.ac.uk/us/catalogue/catalogue.asp?isbn=0521536669), so maybe this isn't a hopeless cause after all, at least for the top n results returned (I'm pretty sure that things get uglier as the number of results->infinity even in real data).
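      A voting cycle of the kind mentioned above is easy to reproduce with three voters and three pages. The ballots below are the textbook Condorcet example, chosen for illustration rather than taken from any real data:

```python
from itertools import combinations


def pairwise_winner(ballots, a, b):
    """Return whichever of a, b most ballots rank higher (None on a tie)."""
    a_votes = sum(ballot.index(a) < ballot.index(b) for ballot in ballots)
    b_votes = len(ballots) - a_votes
    if a_votes == b_votes:
        return None
    return a if a_votes > b_votes else b


# Each ballot lists pages from most to least preferred.
ballots = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
results = {(a, b): pairwise_winner(ballots, a, b)
           for a, b in combinations("ABC", 2)}
```

      Majorities prefer A to B, B to C, and C to A, so no ordering of the three pages agrees with every pairwise majority.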

  51. summary + while we're wishing, I'd like a pony by sdedeo · · Score: 1

    I read this essay (long on words, short on content.) The summary:

    "I have a new idea for a search engine. You should be allowed to suggest a modification to the search results. Your modification will be anonymously reviewed, Slashdot-moderation style, by a small, random subset of search engine users. It's nice to learn that the algorithm solves a problem that does not exist with contemporary link-network algorithms, but does with a hypothetical bad idea (the sockpuppetry issue.)"

    Now can we talk about the idea? It's an interesting suggestion, actually, despite my snark. It does indeed bust out of the box a little, by leapfrogging google's focus on an arms race of increasingly arcane (one presumes) link-network analysis.

    My feeling, though, is that it won't work: the issue being that it eliminates one serious advantage of the google algorithm, which is that it takes work to "vote". Put aside the question of linkspam for a moment (I understand it's an issue, but if my search experience is representative, google is winning the arms race). In order to affect search results, a good faith user has to create links to the pages he considers useful for his readers. It's reasonable to expect that such a linker has knowledge greater than the average member of the community.

    Indeed, the whole point is that the average member of the community is going to a search engine in order to find out something he doesn't know already. The advantage google has is that it has figured out reasonably good algorithms that take advantage of this "hidden knowledge." Websites gain credibility, and accrue links; in proportion to this credibility, their own opinions count more. It's important to note that google is not a "one man one vote" system -- it is indeed a meritocracy to begin with.

    --
    Protect your liberties. Donate to the ACLU
  52. Cooking for Engineers? by Anonymous Coward · · Score: 0

    Don't you hate it when a recipe says "cut into cubes" and you want to throttle the author and shout, "HOW BIG??" It drove me crazy until I found CookingForEngineers.com.

    I'm having a hard time finding a recipe which is technically sound. For example, their Dark Chocolate Souffle:

    "Prepare two 6 ounce (180 mL) soufflé ramekins by applying a layer of cold butter to the interior of the ramekins."

    How thick of a layer?

    "Pour some granulated sugar into the ramekin and shake and roll the ramekin to coat the bottom and sides with sugar."

    Some sugar? How much? How fine a granulation... regular granulated or berry sugar? How hard should I shake and roll? How thick should it be coated on the sides?

    "Bring some water to a boil in a pot."

    How much water? How big of a pot? And so on... not very precise.
  53. Not in Belgium apparently... by Chas · · Score: 0, Offtopic

    In Soviet Belgiumistan, the engine search YOU!

    --


    Chas - The one, the only.
    THANK GOD!!!
  54. is Google REALLY highly regressive by arete · · Score: 2, Informative

    I don't think Google is highly regressive in the way you describe, but I suppose it certainly depends on your definition of regressive.

    Google is definitely regressive from the point of view that it tries to represent the average total mindshare about search terms - NOT the average CURRENT mindshare. So if you want to find the up and coming site that's ABOUT to be the new hotness but hasn't reached critical mass yet, you need something like the derivative of Google's PageRank.

    But this is definitely NOT what I want from my everyday search - that's more like the Digg of search. But everybody loving something today doesn't mean it has staying power or consistency, both of which ARE true for the major Google results. I don't WANT my search engine to forget about links that aren't hot anymore, and that no one is posting NEW links to.

    There's no doubt that paying for a bunch of links kicks up your PageRank - but stopping that completely is literally impossible. As long as the system is democratic, people are going to buy votes to at least some extent. The only thing you can do is take sites that sell their opinion, detect them as much as possible, and drop their PR as much as possible.

    But that's much harder and more expensive than taking your better mousetrap, sending it to some appropriate, prominent bloggers, and - if it really IS so much better - watching it tear through the blogosphere and therefore your rank go up. Plenty of people are writing links about anything they think is better.

    Notwithstanding "fight the tricks better", which is obviously an ongoing battle, there's only 3 major ways I can see to change this situation.
    1) Make it more derivative-of-mindshare. But I don't actually want that for MY search engine.
    2) Make at least some of the ranking more explicit.
    3) Make the ranking more personalized.

    #2 is something that would be cool. You shouldn't let people get more power than their PR implies, but I definitely want there to be a way for me to put a link in a page and say "hey Google, I'm saying this link is BAD, not good!" (Yes, I realize putting "worst site ever" in the link text helps.) And a way to say "hey Google, these 10 links I really care about, so if you could give them slightly more PR and all my other ones slightly less PR worth of linkification, I'd like that." But that second part would have to be a pretty tiny effect anyway.

    The other way to approach #2 is through some explicit ranking instead of links at all. And while this gives you a LOT more flexibility, it also gives you a LOT less democratization, because you're cutting out all the people who merely have websites and don't care about your search engine. In this way Google is very extroverted instead of being introverted, and that's a good thing.

    #3 is another option, and it doesn't have the same democratization problems. In this zone, we get to make a list of our personal sites and say "hey, I really trust these sites. Links from there should be much more powerful - for me." And if that's true, then sites that those sites link to... etc. In this way you could easily build trust networks that were more personalized. Linux fans would never get Microsoft answers to networking questions :) But you have to recalculate PR on a per-user basis, which seems pretty daunting.
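    One common way to express option #3 is a personalized variant of PageRank in which the "random jump" lands only on the user's trusted sites. The sketch below is an assumption about how that could look, with hypothetical page names; it is not something Google is known to do per user, and it makes the per-user cost visible because the whole iteration runs once per trusted set.

```python
def personalized_rank(links, trusted, damping=0.85, iterations=50):
    """Rank pages by link flow that starts only from the trusted set."""
    trusted = set(trusted)
    pages = set(links) | {t for ts in links.values() for t in ts} | trusted
    teleport = {p: (1.0 / len(trusted) if p in trusted else 0.0) for p in pages}
    rank = dict(teleport)
    for _ in range(iterations):
        new = {p: (1.0 - damping) * teleport[p] for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A dead end hands its score back to the trusted sites.
                for p in pages:
                    new[p] += damping * rank[page] * teleport[p]
        rank = new
    return rank


# Hypothetical mini-web with two disconnected camps.
web = {
    "linux_blog": ["samba_howto"],
    "ms_blog": ["windows_howto"],
    "samba_howto": [],
    "windows_howto": [],
}
linux_fan = personalized_rank(web, trusted={"linux_blog"})
```

    For the Linux fan, only pages reachable from linux_blog receive any score; windows_howto stays at zero.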

    The big problem I see is that adding in both of those things and doing it perfectly will absolutely not make up for the underlying search being less well executed. So there's a huge hurdle to overcome for any minor player to catch up using these kinds of techniques.

    --
    Looking for freelance Actionscript (Flash/Flex) or ColdFusion work and/or freelance developers. Email me, put Slashdot
    1. Re:is Google REALLY highly regressive by IdolizingStewie · · Score: 1

      You can kind of do #2 with the rel="nofollow" attribute. It's not explicitly bad, but at least it doesn't help the site.

  55. Arrow's Theorem by attonitus · · Score: 3, Informative
    Such a theorem does exist and is proven! Arrow's Theorem states that it's impossible to design a voting system that satisfies three really basic conditions:

    a) The removal of one candidate from the race would not affect the rank of the others;

    b) If everyone prefers candidate A to candidate B then the algorithm should rank A above B;

    c) There is no dictator (i.e. there's more than one person voting).

    The same criteria should also apply to a perfect search engine - the removal of one page from the web should not affect the relative ranking of the others, if everyone thinks page A is better than page B, page A should come first and, to be practical, the engine should take as input the priorities of more than just one person (it's not feasible to build a customized search engine that knows exactly the priorities of each individual user).

    Therefore, a perfect search algorithm does not exist
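    Condition (a) can be seen failing for one concrete rule. The sketch below uses the Borda count (points by position), picked purely as an illustration; it does not prove the theorem, which is normally stated with further conditions such as at least three alternatives. The ballots are invented.

```python
def borda_ranking(ballots):
    """Rank candidates by Borda score: last place earns 0 points, and so on up."""
    scores = {}
    for ballot in ballots:
        for position, candidate in enumerate(ballot):
            points = len(ballot) - 1 - position
            scores[candidate] = scores.get(candidate, 0) + points
    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda c: (-scores[c], c))


def without(ballots, removed):
    """Return the ballots with one candidate struck from every list."""
    return [[c for c in ballot if c != removed] for ballot in ballots]


# Three voters prefer A > B > C, two prefer B > C > A.
ballots = [["A", "B", "C"]] * 3 + [["B", "C", "A"]] * 2
full = borda_ranking(ballots)
reduced = borda_ranking(without(ballots, "C"))
```

    With all three pages in the race B finishes ahead of A (7 points to 6), but once C is removed A finishes ahead of B (3 points to 2), even though no voter changed their mind about A versus B.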

  56. relative ranking units by GunFodder · · Score: 1

    I don't think we need to bring bears into the discussion. The real question is whether we should use monkeys, ninjas, pirates or robots.

    1. Re:relative ranking units by bane2571 · · Score: 1

      In all honesty do we really need to separate them? Bring on the shark riding monkey ninja pirates I say.

  57. Wikipedia was built on a meritocracy? by Anonymous Coward · · Score: 0


        This post was very poorly phrased. To use "meritocracy" with regards to Wikipedia is ludicrous. In posing the question I think the author is trying to push his own political agenda rather than rationally acknowledge what made Wikipedia work... which was the ability for ANYONE to contribute.

          In fact this very point was an issue between Wales and fellow Randroid Sanger. Jimbo of course ended up being right, that easy access was ultimately superior to restricting access to only qualified individuals. I've always wondered how he's ever reconciled this obvious conflict with his elitist Objectivist views (which clearly contradict). Mind you he seems to have mellowed out from his old OOist porn addict days. Charity is something that causes real Randroids to vomit uncontrollably.... yet Jim has stretched his linguistic/philosophical envelope to suggest feeding starving children in Africa is NOT altruistic?

          Any how.... there are plenty of encyclopedias that are run by experts. Too bad virtually no one uses them as a primary resource online (hopefully no one uses Wikipedia exclusively though). Wikipedia works because freedom encourages participation.... not because everyone that doesn't complete their PhD thesis on a subject is an idiot that has nothing worthwhile to say.

        I've heard some people call Wikipedia "a democracy" (where everyone is equal) but this isn't true either. I would say it's closer to slightly managed anarchy (and breathes some new credibility into the idea IMO)

  58. Mod Parent Up by What+is+a+number · · Score: 1

    And I just had mod points last week. ah well.

    Anyhow, this is exactly the theorem I was going to mention, and I agree - it means the perfect search engine can't exist.

    --_
    I type this every time.

  59. American Website Idol? by GamblerZG · · Score: 1

    Sounds like ochlocratic search engine to me.

    rank sites according to the preferences of the average user
    Who the hell is "the average user"? Why should some "average user" decide what search results will _I_ get? Good search engine should return webpages that match my request, not webpages that are deemed worthy by some poll. And that is my biggest issue with Google right now - it relies on PageRank that effectively works as a popularity measurement system.

  60. Who defines "merit"? by vrmlguy · · Score: 1

    The big problem with this proposal is that it assumes that there is only one definition of "good". For instance, look at the example of searching for a web hosting firm. Am I interested in the same criteria as you? Yes, cheap is nice, but maybe I want to pay a bit more to survive a slash-dotting. Maybe I want "five nines" reliability. Maybe I want to run CGI scripts written in Haskell instead of PHP or Python. Or maybe I just want to run a generic Wordpress blog. Different firms provide different capabilities, that's how they stay in business.

    And the problem just gets worse if you look at music. I might rank ska bands higher than hip-hop, you might do the opposite. The only music that would rise to the top of such a competition would be "American Idol" type stuff, which while successful has yet to inspire the devotion seen in Dead/Parrot/etc-heads.

    --
    Nothing for 6-digit uids?
    1. Re:Who defines "merit"? by Anonymous Coward · · Score: 0

      Then you would include "wordpress" or "PHP" or whatever in your search terms when you were doing the searching. Isn't that what you would do with Google, too?

  61. CHACHA by Anonymous Coward · · Score: 0

    you should see CHACHA.com, they are already doing it.

  62. Better than any algorithm...humans by jokewallpaper · · Score: 1

    I think a site like Wikia will be great. I think ChaCha.com http://www.chacha.com/ has a good idea too. Humans searching, finding and ranking sites.

  63. Getting included by ProfessionalCookie · · Score: 1

    DMOZ has been slow and incomplete for ages. It's also an example of hierarchies gone bad. I never really know where I should submit my site (edified.org). Shouldn't /California/Camping/RV and /Camping/RV/California be the same? Errr

  64. Before they dive into this.. by Skythe · · Score: 1

    How about they fix up Wikipedia's search function? The other day I tried to go to an article directly, but I made the terrible mistake of forgetting to capitalize one of the words. I found my article about halfway down the page with 50% relevancy.

  65. I couldn't resist :-) by dave1g · · Score: 1

    Wikipedia's take on dead weight loss.

    http://en.wikipedia.org/wiki/Dead_weight_loss

  66. Offtopic, but by SanityInAnarchy · · Score: 1

    Was he actually a Samurai?

    Anyway... I do remember hearing that the scene was an accident. Basically, Harrison Ford had diarrhea at the time. It was actually supposed to be a nice long fight, swords vs Indy's whip, but when you gotta go...

    Ah, the useless facts you pick up watching movies... After Morpheus' fully-VR PowerPoint-like talk about the Real World, and Neo gets unplugged and staggers around saying "I don't believe it..." Then Cypher goes "He's gonna pop" and Neo pukes... That was real. Apparently they had really bad chicken pot pies that day...

    --
    Don't thank God, thank a doctor!
  67. Questions by jawahar · · Score: 1

    Is it possible? Is it desirable?

  68. Make everyone a moderator by Hugo+Graffiti · · Score: 1

    There are two parts to this - one is getting a list of which web pages contain the search terms, and the second is ranking them. The first is easy (ish) to achieve. For the second, rather than fancy algorithms like Google page rank which can be manipulated by SEO, just let the people decide. Everyone gets to vote just once on the ranking for a particular web page. To prevent abuse make use of something like OpenID to authenticate users. Sure you might see very popular pages at the top of the list that don't have much relevancy but better that than the current system and anyway suitable choice of keywords should filter out most of the fluff.

  69. Why are companies more important than people? by iamcf13 · · Score: 1

    A search for Ford should yield Ford Motor Co., as the correct first answer, Wales said.

    Why are companies more important than people?

    Why not a page about former U.S. President Gerald Ford as the 'correct first answer'?

    Or a page about actor Glenn Ford as the 'correct first answer'?

    Aren't people more important than companies which are nothing more than legal constructs created by people to facilitate commerce amongst themselves?

    Anyway, I believe a 'fair' search engine would not use linking to determine popularity and authority. The 'spamdexing' and 'Googlebombing' of Google proves that is possible.

    Why not have the results page return 10-30 links at random from all the available links for a given search term entered by a user?

    On top of that, use people (not software algorithms) to prune out all the spam pages submitted to it. This could be done with editors who visit submitted pages before adding them to the search engine (time consuming, can be 'gamed'), or accept ALL links and use a 'thumbs up/down' method to inform the user how useful the link is (also time consuming, can be 'gamed').

  70. You'd need users to filter out garbage... by blahplusplus · · Score: 1

    Ultimately, at first you'd need users to tell you what was spam and what was not to help develop an anti-system gaming algorithm. You could also offer financial rewards for outing people gaming the search engine.

    Ideally you'd need users to help you fight the constant battle with those trying to game the search results, but users would need some kind of incentive or payment to keep the search engine running smoothly. Ideally maybe you could select random samples of people and pay them to filter out garbage? I have no idea at this point. If anyone has better ones, shoot!

  71. One weakness by bytesex · · Score: 1

    One potential weakness is that attackers could perennially throw up searches on certain topics for re-examination. The problem lies not in the fact that I have to vote _once_ that bank X provides the best mortgage, but that I might have to vote twenty times ('yes' 'I said yes' 'You know I already said yes') to establish this, because some bozo wanted a vote on this every day for years on end. After all, if I, as a user, press 'I do not agree with this order, take X to the top' twenty times a second (which I would be able to do in an automated fashion), I could keep certain topics perennially up for doubt. A side effect is that it is advertising. If I created a vote every day that sent a message to 100 users, which read 'do you think that bank X provides the best mortgages?' - then that's advertising. And a different, but effective way to game the system.

    --
    Religion is what happens when nature strikes and groupthink goes wrong.
  72. Meritocracy? by quirkyalone · · Score: 1

    I've been a member of several online entrepreneur communities, and I'd conservatively estimate that members spend less than 10% of the time talking about actually improving products and services, and more than 90% of the time talking about how to "game" the various systems that people use to find them, such as search engines and the media.
    Good point, BUT if the new meritocracy search will be based on voting, wouldn't the company need to spend some of those 90% of effort to prove the "merit" to the community, trying to influence the voting? The abovementioned problem will still exist, imho.
  73. Re:Meritocracy = aristocracy with genetics for wea by Hognoxious · · Score: 1

    Hey, it's not my fault you're poor and stupid. Get over it already.

    P.S. you're ugly too.

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."