Domain: digg.com
Stories and comments across the archive that link to digg.com.
Stories · 20
-
How Many Days Americans Waste Commuting In The Course Of A Lifetime, Mapped By City (digg.com)
An anonymous reader writes: Have you ever stopped to think that over the course of your lifetime, you will likely spend hundreds of days commuting back and forth from home and work? If not, we've got a great map that's sure to make you question what you're doing with your life. The good folks over at Educated Driver used Census Bureau data on average daily roundtrip commute times in hundreds of cities nationwide to calculate how much time Americans spend traveling to and from work over the course of their lives. (They assumed a 45-year career working 250 days a year.) The results, mapped by city, are pretty horrifying. -
Why Don't We Care About The Rotten Tomatoes Scores Of TV Shows? (digg.com)
Why do we never utter sentences like "'Cobra Kai' has been certified 100% fresh on Rotten Tomatoes" or "'Stranger Things' was rated 8.9 out of 10 on IMDb"? It's not because the reviews of TV shows aren't aggregated by these websites -- they are. Contrary to what you might expect from a site called the Internet Movie Database, TV shows also occupy an essential, if smaller, place on IMDb alongside movies. The same goes for Rotten Tomatoes. An exploration: So if the lack of TV rating sites isn't the issue, why do we hardly ever use critical or audience scores to convey the quality of a TV show to our peers? Here are a few of my theories:
There Are Too Many Good Shows Out There
It's an odd dilemma to have, but it's true that when it comes to TV shows, there are so many high-quality programs for us to consume. People have been talking about Peak TV for a few years now, and a quick scroll through Rotten Tomatoes' website would seem to confirm that we've been offered an embarrassment of riches. [...]
The Price Of Admission Is Higher For Movies
Another reason viewers might care less about a TV show's critical scores than a film's is the high price of moviegoing. Tickets in metropolitan areas in the US can be extremely expensive, costing up to $25.49 for an IMAX screening in New York City, unless you subscribe to a service like MoviePass or AMC's new subscription program.
Networks And Platforms Market Emmys More Than Critical Scores
Compared to critical scores on review websites, networks and platforms seem to place more stock in the Emmys when it comes to marketing TV shows. Although the Emmy, arguably the best TV award, might not give shows as big a ratings boost as it did decades ago, the awards still play a crucial part in creating social buzz around television shows, especially shows with smaller audiences. -
Magic Leap Finally Demoed Its Headset And It Is 'Disappointing' (digg.com)
From a story on Digg, via DaringFireball: Magic Leap, the secretive augmented reality company that has raised $2.3 billion, finally demoed its long-rumored, much-vaunted headset on Wednesday (and announced that the headset will ship this summer). It was disappointing. Magic Leap has promised big things -- remember the tiny elephant in your hands? Remember that whale jumping out of the gym floor? But the animations demonstrated on Wednesday fall short of those promises. Waaaay short. An executive with Magic Leap, which has long remained tight-lipped about its roadmap and the commercial availability of its products, said on a Twitch livestream this week that the Magic Leap One, a developer-geared headset, will ship this season. (Summer ends September 22, so the company has 10 weeks to meet its self-imposed deadline.) -
Yale Researchers Prove That ACID Is Scalable
An anonymous reader writes "There has been a lot of buzz in the industry lately about NoSQL databases helping Twitter, Amazon, and Digg scale their transactional workloads. But there has been some recent pushback from database luminaries such as Michael Stonebraker. Now, a couple of researchers at Yale University claim that NoSQL is no longer necessary now that they have scaled traditional ACID-compliant database systems." -
Buried By The Brigade At Digg
Slashdot regular Bennett Haselton writes in with an essay on a subject we've dealt with internally at Slashdot for years: user abuses of social news... this time at Digg. He starts "Alternet uncovers evidence of a 'bury brigade' coordinating efforts to 'bury' left-leaning stories on Digg. Digg had previously announced that the 'bury' button will be removed from the next version of their site, to prevent these types of abuses, but that won't fix the real underlying issue — you can show mathematically that artificially promoting stories is just as harmful in the long run. Here's a simple fix that would address the real problem."
Even if you just arrived from Mars and have never heard of Digg, that description of the service should make it obvious how easy it is to game the system, by rounding up groups of friends to vote on stories that you want to promote, or to bury stories that you want to kill. The former type of abuse (and it is abuse, under Digg's Terms of Use; search for "organized effort") is far more common, since people usually have more incentive (commercial or otherwise) to promote their own work than to bury someone else's. And in fact, Digg has announced that the next version of the service will remove the "bury" button, replacing it with a "Report" button for reporting bona fide cases of abuse, not just to bury boring stories.
The thinking seems to be that abusive "digging" to promote a story is less harmful than abusive "burying", and this has the ring of plausibility — that a creative effort is better than a destructive one. After all, Alternet had previously highlighted several artificial right-wing "digg brigades" mentioned in their story (Diggs And Buries, theliberalheretic, etc.), but they didn't blow the lid off the situation until their report on the Digg Patriots bury brigade, as if to say, "Now we've found something really scandalous!" Annalee Newitz cheekily reported on how she bought votes to boost a story to the front page of Digg, but probably would have felt guilty if she'd hired a service to bury someone else's story. And when a Digg user organized an effort to bury Ron Paul stories that he thought were "spamming" the system, Ron Paul supporters protested that they were merely organizing to vote up stories they agreed with — the clear implication being that this was more honorable than organizing to vote stories down.
But this, I think, is a fallacy. If a story's ranking is artificially inflated, then the extra eyeballs for that story have to come from somewhere, and they come from users paying less attention to the other stories that the phony up-and-comer pushed out of the way. Artificially bumping a story up is just as harmful as artificially burying a story, but the harm is distributed among many innocent victims, not just one. (By the same reasoning, in fact, you could argue that burying a story does no net harm to other users of the Digg site, because the harm done to one story is cancelled out by the benefit to all the other stories that rise in prominence when the victimized story is pushed out of the way. So by strict economic logic, recruiting friends to boost your own story at the expense of everyone else's is actually more harmful than organizing a bury brigade!)
So I don't think that Digg's replacing the "bury" button with a "report" button will fix the problem. For one thing, obviously groups could abuse the "report" button in the same way — issuing calls to action to report a story for violating the TOU. Since a flurry of bona fide abuse reports is presumably what Digg uses to identify and remove truly abusive stories like MLM spam, how are they going to tell the difference between these cases and cases of abusive "reporting"? (My suggestion: See if there is a sudden change in the percentage of users who view a story and make an abuse report. For stories that are genuine TOU violations, the percentage of users who "report" it should remain steady; for stories that are victimized by a "report brigade," you'll see a sudden spike in viewers and in the percentage of those viewers who report the story for abuse. This might have worked for detecting and stopping the bury brigades as well, although we'll never know now.)
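The author's parenthetical test for telling a "report brigade" from genuine abuse reports can be sketched as a comparison of two time windows. This is a minimal illustration, not anything Digg is known to run: it assumes per-window counts of reports and views are available, and the equal-length windows, the `view_spike` factor and the z-score cutoff of 3.0 are hypothetical choices. It uses a standard pooled two-proportion z-test to ask whether the share of viewers who report the story jumped at the same time as its traffic.

```python
import math


def report_rate_z(reports_before: int, views_before: int,
                  reports_after: int, views_after: int) -> float:
    """Pooled two-proportion z-score for the change in report rate between windows."""
    if views_before == 0 or views_after == 0:
        return 0.0  # not enough data to compare
    rate_before = reports_before / views_before
    rate_after = reports_after / views_after
    pooled = (reports_before + reports_after) / (views_before + views_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_before + 1 / views_after))
    return (rate_after - rate_before) / se if se > 0 else 0.0


def looks_like_report_brigade(before: tuple[int, int], after: tuple[int, int],
                              view_spike: float = 3.0, z_cutoff: float = 3.0) -> bool:
    """Flag a story when its views AND the share of viewers reporting it both jump.

    `before` and `after` are (reports, views) for two equal-length windows
    (a hypothetical data layout). A genuine TOU violation should keep a
    roughly steady report rate even if its traffic grows.
    """
    reports_before, views_before = before
    reports_after, views_after = after
    if views_before == 0:
        return False
    traffic_jumped = views_after / views_before >= view_spike
    z = report_rate_z(reports_before, views_before, reports_after, views_after)
    return traffic_jumped and z >= z_cutoff


# Steady 3% report rate: consistent with real spam, not flagged.
print(looks_like_report_brigade((30, 1000), (33, 1100)))   # False
# Traffic x5 and report rate 0.5% -> 8%: the spike pattern the author describes.
print(looks_like_report_brigade((5, 1000), (400, 5000)))   # True
```

Requiring both conditions mirrors the essay's point that a steady report rate alone is not suspicious; a real system would also need to handle stories with very little history.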
But more fundamentally, even if this change does stop the "bury/report brigades" from killing stories at will, that only fixes the most obvious symptom of the underlying problem, which is that the system can be gamed by recruiting your friends to vote either way. It won't stop "brigades" from artificially promoting shallow stories that agree with their opinions, which does the same net harm overall.
Indeed, the most long-term harm that the DiggPatriots Yahoo Group might have done is that their cheating was so egregious that it makes other examples of cheating look benign by comparison, and might prevent people from realizing that "benign cheating" is just as harmful. As detailed in the Alternet report, the DiggPatriots group talked openly about cycling through different Digg accounts and circumventing bans on their IP addresses. The welcome message to the Yahoo Group told new users that the group was operating "under the radar." The group leader, a woman with the handle "bettverboten," talked about how to prevent Digg from monitoring their actions. And of course the vast majority of posts were calls to bury stories. But what if all of that had been inverted? If the group had operated in the open, while still focusing on recruiting conservative members? If each user limited themselves to only one Digg account, as they were supposed to? And if they focused not on burying stories, but on digging stories that promoted their viewpoints? Just as bad. It just doesn't sound as bad.
I still think the only way to make Digg a true meritocracy would be to use some version of an algorithm I outlined in an earlier article, inauspiciously titled "How to Stop Digg-cheating, Forever." The gist of it is that in addition to collecting votes from friends, each story should be shown to a random subset of users on the site (perhaps in a box that occasionally appears at the top of the screen when they're logged in), who are asked to vote it up or down. The votes of a random sampling of users would be more representative of how much value the story would have to the Digg community as a whole. Even if most users who are asked to vote on a "random story" simply ignore the request, all you need is to show the story to a large enough sample that you can measure the difference in responses to a truly good story vs. one that has been promoted by digg-cheaters. You don't necessarily have to run this procedure for every story, only for the ones that are about to gain some benefit from a large number of diggs (such as being pushed to the front page), where you need to decide whether the story really deserves that big boost. The only way to game that system would be to organize a group of dedicated Digg users so enormous that they constituted a significant percentage of all users on the system — something pretty hard to do without getting caught.
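How large is "a large enough sample"? The essay leaves this open. One standard way to size it, a textbook normal-approximation formula swapped in here rather than taken from the author, is to ask how many sampled voters make the 95% margin of error on a story's approval rate smaller than a chosen margin $m$: $n = \lceil z^2 p(1-p)/m^2 \rceil$ with $z = 1.96$ and the worst case $p = 0.5$. To separate a story with, say, 70% approval from a promoted dud at 40% (hypothetical numbers), a margin of about half the gap, 0.15, is enough.

```python
import math


def sample_size_for_margin(margin: float, p: float = 0.5, z: float = 1.96) -> int:
    """Sampled voters needed so the 95% margin of error on approval is <= margin.

    Normal approximation n = z^2 * p * (1 - p) / margin^2; p = 0.5 gives the
    largest n, which is the safe choice when the true approval is unknown.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


print(sample_size_for_margin(0.15))  # 43: enough to tell 70% approval from 40%
print(sample_size_for_margin(0.10))  # 97
print(sample_size_for_margin(0.05))  # 385
```

Under these assumptions, roughly 100 sampled votes give a margin of about ten percentage points, which is in line with the 100-vote figure used in "How to Stop Digg-cheating, Forever."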
Still, the only site I know of that uses a version of this "random sampling" algorithm is HotOrNot.com, which lets you recruit your friends to vote on the "hotness" of your picture on a scale of 1 to 10 (by sending them a link to that specific picture), but also shows a stream of random pictures to visitors, so that your picture can collect votes from strangers. If the votes from the users who visit your picture via the link are significantly different from the votes from users who see your picture via the random stream, then HotOrNot discounts the votes from users who view your page via the link. This prevents digg-style gaming by people who want all their friends to give them a 10. (Note that if you think about it, this is essentially the same as always throwing out the votes from people who visit your picture via the link. If you collect votes from groups A and B, but you only count the votes from group A if they agree with the votes from group B, then you're really only counting votes from group B! All the extra votes really give you is the ability to brag that X many people voted on your picture.)
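The conditional-discount rule described here can be sketched in a few lines. This is an illustration under assumptions, not HotOrNot's actual code: ratings are on the 1-to-10 scale, and "significantly different" is modeled as a hypothetical fixed `tolerance` on the difference between averages rather than a proper significance test. The second usage line also illustrates the author's parenthetical point: when link votes are kept, the combined average is a weighted mix of two averages that differ by at most `tolerance`, so the link votes can never move the score more than that.

```python
from statistics import mean


def photo_score(random_votes: list[int], link_votes: list[int],
                tolerance: float = 1.0) -> float:
    """Average rating that counts link votes only if they agree with the random stream.

    random_votes: ratings from strangers who saw the picture in the random stream.
    link_votes: ratings from visitors who followed a direct link (often friends).
    tolerance: hypothetical cutoff for "significantly different" averages.
    """
    random_avg = mean(random_votes)
    if link_votes and abs(mean(link_votes) - random_avg) <= tolerance:
        return mean(random_votes + link_votes)
    return random_avg  # link votes discarded


strangers = [6, 6, 7, 5]
print(photo_score(strangers, [10, 10, 10]))  # friends all vote 10: discarded, score 6
print(photo_score(strangers, [6, 7]))        # agreeing votes kept, about 6.17
```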
This seems like the simplest way to prevent Digg-cheating, although there may be others. Still unresolved is how to solve the general problem of "gaming" in traditional media and the blogosphere. For the foreseeable future, it's going to be the simple truth that if a major media outlet wants to run a story, it will be heard, and if no media outlet wants to run it, it won't be heard, regardless of how many viewers or readers would have voted in some hypothetical poll that, yes, they want to read that story, and yes, they liked it afterward. That's true for Internet articles as well, except to the extent that a deserving article might be rescued from obscurity by Digg, but the more that system can be gamed, the less it will reward articles that really deserve it. Digg is gameable because power users can recruit votes from their friends; the media and the blogosphere are so obviously "gameable" that we don't even call it "gameable," because "power users" — media outlets and A-list bloggers — can run whatever they want. Right now, the only way I can think of to change this situation that is even logically possible would be for a site like Digg to adopt some version of the random-sampling algorithm, and to continue growing in power until a significant percentage of the public (not just Internet users, but everybody) relied on it for information. Then, if you had something important to say, people would hear it, but you wouldn't be able to cheat your way to the top.
The ultimate irony is that Alternet's story might never have seen the light of day if it hadn't been the beneficiary of the same gameable, non-meritocratic inefficiencies that exist in the media-blogo-outrage-o-sphere, just as they exist on Digg. Yes, the Alternet story deserved to be heard, but you don't get the publicity you deserve, you get the publicity that you organize, and Alternet had the organizational publicity structure in place to get their voice heard. If a kid blogging from his bedroom had infiltrated the Digg Patriots group and made essentially the same discovery, would anybody ever have heard about it? (Well, maybe, because of the political hot-button factor — but even then, only after the story had been picked up by a major site like Alternet.) A truly meritocratic Digg algorithm could make it possible to get a good story out without a lot of organizational support behind it — and to ensure that an organized effort can't kill a good story either.
-
Multiple Fiber Cuts In San Francisco Area
georgewilliamherbert writes "Multiple news reports, mailing list posts, blogs, and tweets are pointing out two overnight acts of sabotage in the San Francisco Bay area, with long-distance fiber network cables being cut in two locations in the early morning hours. The first cut, around 1:30 AM, affecting landline and cell phone service and 911 calls in the communities of Morgan Hill, Gilroy, and parts of Santa Cruz counties, was on an AT&T fiber alongside Monterey Highway near Blossom Hill Road, in San Jose. A second cut, around 3:30 AM, in San Carlos, affected Sprint fiber and has significantly disrupted services at the 200 Paul datacenter in southern San Francisco. Rumor says that this may be related to an AT&T communications workers contract having just expired — but no evidence has been published yet in the media, and this could be an intentional act of sabotage by someone unrelated to the company's workers." -
Hashing Email Addresses For Web Considered Harmful
cce writes "The MicroID standard, despite getting thrashed soundly by Ben Laurie two years ago, has since been recommended by the DataPortability Project and published on the user profiles of millions of users at Digg and Last.fm. MicroID is basically a hash calculated using a user's profile page URL and registered email address, producing a token that makes the email address vulnerable to dictionary attacks. To see how easy it was to crack these tokens, I conducted a small study, choosing 56,775 random Digg users, and cracking the email addresses of 14,294 of them (25%) using just their MicroID, username, and a list of popular email domains. Digg has more than 2 million users, and that means half a million of them — mostly people who had never heard of MicroID, and had probably not logged in for a long time — had their email addresses exposed to this trivial attack. I also applied this attack to Last.fm (19%) and ClaimID (34%). Digg and Last.fm have since removed support for MicroID, but the lesson is clear: don't publish a hash of my email address online, guys!" -
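To see why a published hash of an email address offers so little protection, here is a minimal sketch of the dictionary attack described above. It assumes the MicroID token is SHA-1 over the concatenated hex SHA-1 digests of the `mailto:` URI and the profile URL, which is how the scheme is commonly described; treat that construction, the example profile URL and the short domain list as assumptions, not a record of what the submitter actually ran. The attacker already has the token and the profile URL (both public on the profile page) and simply guesses `username@domain` for popular domains.

```python
import hashlib
from typing import Optional


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def microid(email: str, profile_url: str) -> str:
    # Assumed construction: sha1(sha1("mailto:" + email) + sha1(profile_url)).
    return sha1_hex(sha1_hex("mailto:" + email) + sha1_hex(profile_url))


def crack_microid(token: str, profile_url: str, username: str,
                  domains: list[str]) -> Optional[str]:
    """Return the guessed email whose MicroID matches `token`, or None."""
    for domain in domains:
        candidate = f"{username}@{domain}"
        if microid(candidate, profile_url) == token:
            return candidate
    return None


# Hypothetical profile: the token and URL are public, the email is not.
url = "http://digg.com/users/alice"
token = microid("alice@example.com", url)
print(crack_microid(token, url, "alice", ["gmail.com", "yahoo.com", "example.com"]))
# alice@example.com
```

Each guess costs only a few SHA-1 computations, so trying a few hundred popular domains per user is trivial, which is why the hit rates in the study are plausible.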
Digg.com Attempts To Suppress HD-DVD Revolt
fieryprophet writes "An astonishing number of stories related to HD-DVD encryption keys have gone missing in action from digg.com, in many cases along with the accounts of the diggers who submitted them. Diggers are in open revolt against the moderators and are retaliating in clever and inventive ways. At one point, the entire front page comprised only stories that in one way or another were related to the hex number. Digg users quickly pointed to the HD DVD sponsorship of Diggnation, the Digg podcast show. Search digg for HD-DVD song lyrics, coffee mugs, shirts, and more for a small taste of the rebellion." Search Google for a broader picture; at this writing, about 283,000 pages contain the number with hyphens, and just under 10,000 without hyphens. There's a song. Several domain names including variations of the number have been reserved. Update: 05/02 05:44 GMT by J: New blog post from Kevin Rose of Digg to its users: "We hear you." -
How to Stop Digg-cheating, Forever
The following was written by frequent Slashdot editorial contributor Bennett Haselton. He writes "Recently author Annalee Newitz created a bit of a stir with the revelation that she had bought her way to the front page of the story-ranking site Digg. Since Digg allows any registered user to go to a story's URL and "digg it" in order to push it upward through the story-ranking system, it was inevitable that services like User/Submitter would come along, where a Digg user can pay for other users to cast votes to push their story up to the top. User/Submitter says they are currently backlogged and not taking new orders, but they say the service will return and will soon feature services for manipulating similar sites like Digg competitor reddit. Even if the new U/S features are vaporware, it probably won't be long before other companies offer similar services. But it seems like all of these story-ranking sites could prevent the manipulation by making one simple change to their voting algorithm."
Before getting to that, though, what's at stake? The revelation that Digg could be trivially manipulated did not cause the site to be overrun with bogus stories all at once -- most of the links on the front page still look interesting. Newitz said that her story, which was deliberately chosen to be as lame as possible, got buried by users soon after it hit the front page, which is how Digg cleans spam stories out of the system. However, she also said that in the time that the story was on the front page, the story got about 35,000 hits, whereupon her server crashed and the traffic was thereafter divided with two other mirror sites; presumably if the server had stayed up, she would have gotten about 100,000 hits, all for an initial expenditure of $100, which is orders of magnitude cheaper than buying advertising any other way.
(If she had done the same thing with a good story instead of a deliberately lame one, presumably the traffic gains resulting from word-of-mouth and repeat visitors would have been even higher.) As long as the benefits outweigh the cost, more and more unscrupulous users are likely to pay for such services, and since the service provided by User/Submitter is easy to copy, probably similar services will spring up to drive the price down even further. If nothing changes, then eventually sites like Digg and reddit will be flooded with nothing but paid stories. Most of the stories on the front page will probably still be interesting (why would you pay to promote a link, unless it was good enough to draw repeat visitors and get the most value for your money?), but everybody who didn't pay for votes would eventually get crowded out.
One Good Samaritan, Jim Messenger, managed to shut down one Digg manipulation service called Spike The Vote by buying it out (for a paltry $1,275 - they must have wanted to get out fast) and then turning it over to Digg. He warned people that the moral was: Don't sign up for Digg manipulation services, since Digg might get your information from them and then you'll be banned. Actually, I think the moral is simpler: if you're going to try anything like that, do it from a throwaway account that you don't care about losing if you get caught. (Or, only sign up with manipulation services which publish a privacy policy promising never to share your information, especially not with sites like Digg. If Digg then buys one of them out, the service has violated its privacy policy, and Digg as the new owner inherits the liability for that, so you can sue them, right?) But as the idea spreads, it will probably become impractical to play whack-a-mole by shutting down manipulation services as they keep springing up. Any time the cost of providing a service (clicking on a few buttons) is small compared to the benefits of receiving the service (100,000 hits in 24 hours), a market will exist for it one way or another, whether you're talking about drug-smuggling, prostitution, or selling Digg votes.
However, I think there's a way to fix it, and here it is. Have you ever seen people put a link in their profile to their HotOrNot picture, saying "Go here and vote me a 10!!"? That's similar to the people who send links to their friends and say, "I just posted this, please Digg this for me!" The difference is that on HotOrNot, it doesn't work. On HotOrNot, you can cast votes for a picture in one of two ways. The first way is to go directly to the URL for someone's picture; the second way is to load the front page, where a picture is selected from the database at random, and vote for whatever picture comes up. The catch is that the votes you cast by going directly to someone's picture are simply ignored in calculating the average score for that photo. The only votes that are counted are the votes cast for random pictures displayed on the front page. So if you want to manipulate the voting for your own photo, you'd have to load the front page hundreds of thousands of times waiting for your own picture to come up repeatedly, which is hard to do without being detected.
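The arithmetic behind "load the front page hundreds of thousands of times" is simple: if the front page picks uniformly from a pool of N pictures, each load shows yours with probability 1/N, so collecting k counted votes for yourself takes about k·N loads on average. The sketch below illustrates both the counting rule and that estimate; the pool size, the uniform selection and the `"random"`/`"direct"` labels are assumptions, since the essay gives none of them.

```python
import random
from typing import Optional


def counted_average(votes: list[tuple[int, str]]) -> Optional[float]:
    """Average only votes cast via the random front page ("random");
    votes cast by visiting a picture's URL directly ("direct") are ignored."""
    counted = [rating for rating, source in votes if source == "random"]
    return sum(counted) / len(counted) if counted else None


def expected_loads(pool_size: int, wanted_votes: int) -> int:
    """Expected front-page loads to see your own picture `wanted_votes` times,
    assuming each load picks uniformly from `pool_size` pictures."""
    return pool_size * wanted_votes


def simulate_loads(pool_size: int, wanted_votes: int, rng: random.Random) -> int:
    """Count loads until picture 0 (standing in for yours) has come up enough times."""
    loads = seen = 0
    while seen < wanted_votes:
        loads += 1
        if rng.randrange(pool_size) == 0:
            seen += 1
    return loads


votes = [(10, "direct"), (10, "direct"), (6, "random"), (8, "random")]
print(counted_average(votes))        # 7.0: the two direct 10s do not count
print(expected_loads(100_000, 10))   # 1000000 loads for just 10 self-votes
print(simulate_loads(50, 20, random.Random(1)))  # roughly 1,000 for a tiny pool
```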
To enable an algorithm like this on Digg and reddit, the sites could present users with a sidebar box that displays random stories from the pool of recent submissions. (reddit already has a serendipity feature that users can use to select a random story from the available pool, which could be leveraged for this purpose.) Once a story has collected, say, 100 votes -- or whatever number is considered sufficient to provide a representative random sample of how the story appeals to people -- then on that basis the story can either be buried or promoted to the top, where it would be seen by, say, 100,000 people. The elegance of this system is that bad content would only be seen by 100 people on average before it's buried, whereas good content would be seen by all the 100,000 people who view it on the front page, so the average user sees 1,000 pieces of good content for every 1 piece of crap. Even if 75% of users ignore the random story box completely, that just means you have to display it to 400 users instead of 100 before you have enough data points for a good random sample.
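The numbers in this paragraph can be reproduced with a short sketch. It is illustrative only: the 100-vote sample, the 100,000 front-page views and the 25% response rate (75% ignoring the box) come from the text, while the 50% approval cutoff and the function names are hypothetical. The exposure ratio also makes an implicit assumption visible: the "1,000 to 1" figure holds when good and bad submissions arrive in equal numbers (and it ignores the good story's own sample views); if bad submissions outnumber good ones, the ratio shrinks in proportion.

```python
import math


def displays_needed(sample_size: int, response_rate: float) -> int:
    """Times the random-story box must show a story to collect `sample_size` votes."""
    return math.ceil(sample_size / response_rate)


def decide(sample_votes: list[bool], cutoff: float = 0.5) -> str:
    """Promote if the share of up-votes in the random sample reaches `cutoff`."""
    approval = sum(sample_votes) / len(sample_votes)
    return "promote" if approval >= cutoff else "bury"


def good_to_bad_views(n_good: int, n_bad: int,
                      sample_size: int = 100, front_page_views: int = 100_000) -> float:
    """Views of good content per view of bad content: bad stories are seen only
    by their random sample, good ones by the whole front-page audience."""
    return (n_good * front_page_views) / (n_bad * sample_size)


print(displays_needed(100, 1.0))    # 100
print(displays_needed(100, 0.25))   # 400, the "75% ignore the box" case
print(decide([True] * 62 + [False] * 38))  # promote
print(good_to_bad_views(1, 1))      # 1000.0
print(good_to_bad_views(1, 9))      # about 111 if 9 in 10 submissions are bad
```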
I suggested essentially the same algorithm for how an open-source search engine could work without being vulnerable to gaming even by those who understood all of its inner workings. The main difference, of course, is that Digg and reddit actually exist now. Digg declined to comment on the possible merits of such an algorithm; reddit's Steve Huffman said that the idea sounded interesting, although even if the idea got full buy-in, naturally any proposed change would take a long time to bring to fruition.
But it seems that an algorithm similar to this one would be the only way to prevent cheating on sites like Digg that sort content based on user votes. So it's ironic that HotOrNot, the only site I know of that is using a variation of this algorithm and hence is probably the most secure against cheating, is also the one where cheating is least likely to be a problem. Getting a high placement on Digg might enable you to make some money, but getting a highly rated picture on HotOrNot isn't going to make you rich (unless it helps you meet a millionaire who is using the site to find his third wife). Also, making HotOrNot meritocratic doesn't give people an incentive to improve the "content" that they submit, because up to the limits of what can be done with hair and wardrobe, you can't make yourself that much more attractive. With Digg and reddit, on the other hand, I might work harder at submitting a good story, if I knew that it worked in a perfectly meritocratic fashion that pushed good stories right to the top.
If you do this, you don't need any of the other countermeasures listed in Annalee Newitz's follow-up piece "Herding the Mob", such as analyzing user account history for suspicious behavior. As long as most users in the system are legitimate, most of the users in your random sample will be legitimate as well, and their voting will be representative of what most of the community would think. A story could also get a high score within a specific sub-area of the site, like the sports page, but be kept off the main front page if the story got a high score from a random sampling of sports-oriented users but a low score from a sample of everyone else.
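The sub-area idea amounts to drawing two random samples, one from users interested in the topic and one from everyone, and placing the story according to which sample rates it highly. A minimal sketch, assuming each user carries a set of interest tags and using a hypothetical 60% approval cutoff; the essay does not say how interests would be recorded or what threshold to use.

```python
import random
from typing import Optional


def draw_sample(users: dict[str, set[str]], n: int, rng: random.Random,
                topic: Optional[str] = None) -> list[str]:
    """Pick up to n random user ids, optionally only those interested in `topic`."""
    pool = [uid for uid, interests in users.items()
            if topic is None or topic in interests]
    return rng.sample(pool, min(n, len(pool)))


def placement(topic_approval: float, general_approval: float,
              cutoff: float = 0.6) -> str:
    """Front page if a general sample likes it, section page if only topic fans do."""
    if general_approval >= cutoff:
        return "front page"
    if topic_approval >= cutoff:
        return "section page"
    return "buried"


users = {"u1": {"sports"}, "u2": {"sports", "tech"}, "u3": {"tech"}, "u4": set()}
print(sorted(draw_sample(users, 2, random.Random(0), topic="sports")))  # ['u1', 'u2']
print(placement(0.8, 0.3))  # section page
print(placement(0.8, 0.7))  # front page
```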
You could even sub-divide the topical areas further, down to a level of granularity like "Would Barack Obama make a good president?" A site called Helium is currently trying something like this -- users can submit essays on subjects like "Racial inequality or oppression: Do they truly exist in todays society?", and vote on how to rank other essays against each other. The voting works on the random selection principle that I'm advocating here -- users are presented with a pair of randomly chosen essays from a given category (not necessarily the same category for which you submitted an essay) and told to vote for the better one, so there's no way to tell all your friends to go to the link for your essay and give it a high rating. The main limitation, though, is that while the votes can push you to the top of a particular sub-category, that won't cause your article to "break out" and get to the front page of the site -- Helium says that those front-page articles are chosen at random by employees from among those articles that are highly rated within their narrow category, so just being good is not enough. And if you want to write something that doesn't fit into any existing categories, you have to create a new category for your essay like I did, which will then be a category containing one essay that nobody else ever sees. Perhaps both of these limitations could be overcome by adding the option to rate randomly selected essays on a scale of 1 to 10 -- thus providing a way to rate essays that exist alone in their own category, and also a way to find the best essays across the entire site, rated against each other.
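The pairwise scheme, as described here, can be sketched as repeated random pairings with a win-rate tally. Helium's actual ranking formula is not given in the text, so the plain win rate below is an assumption, and the hypothetical `judge` function stands in for a human voter; in the demo it simply prefers the essay with the higher hidden quality score. The key property is that the pair is chosen by the site, not by the essay's author or friends.

```python
import random
from collections import Counter
from typing import Callable


def pairwise_win_rates(essays: list[str], judge: Callable[[str, str], str],
                       rounds: int, rng: random.Random) -> dict[str, float]:
    """Show random pairs to voters; return each essay's share of comparisons won."""
    wins = Counter()
    shown = Counter()
    for _ in range(rounds):
        a, b = rng.sample(essays, 2)  # pair picked at random by the site
        wins[judge(a, b)] += 1
        shown[a] += 1
        shown[b] += 1
    return {e: wins[e] / shown[e] for e in essays if shown[e]}


quality = {"A": 9, "B": 5, "C": 2}  # hypothetical hidden quality


def judge(x: str, y: str) -> str:
    return x if quality[x] >= quality[y] else y


rates = pairwise_win_rates(list(quality), judge, rounds=300, rng=random.Random(42))
print(sorted(rates, key=rates.get, reverse=True))  # ['A', 'B', 'C']
```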
If Digg or reddit adopts a model that uses the random-voter-selection method, then there's the issue of how to handle the votes cast by users under the current system -- the ones who go to a story link and click "digg it", which is what makes the existing system vulnerable to gaming. Digg could do what HotOrNot does, and just ignore those votes outright, but users would probably view this as deceptive. Perhaps Digg could say that votes cast by self-selected users (the ones who go straight to the story link) are counted along with votes from randomly-selected users, unless the average of the self-selected votes is significantly different from the average from the randomly-selected votes, in which case the self-selected votes are ignored. Hopefully this would satisfy most users and preserve the "community" feel of the site, and only a spoilsport would point out that counting the self-selected votes only if they agree with the randomly-selected votes is exactly the same thing as ignoring the self-selected votes entirely.
I asked the owner of User/Submitter what he thought about this. He was willing to talk with surprising candor (except about things like his real name) and spoke as if he'd like nothing better than for Digg to make changes to their service that would block his system from working. To both Annalee Newitz and me, he said, "We find it interesting that Digg still allows anybody to view any user's diggs. By way of this 'feature,' User/Submitter is able to verify that our users actually digg the stories they're given. Without this feature, Digg users are given complete digging privacy, and User/Submitter cannot exist." Some have expressed skepticism that the Digg cheaters really want Digg to fix the problem. But as a security tester, I can understand that mentality. If you report a problem, and a company doesn't fix it, eventually you get tempted to publicize the problem to draw attention to it. And if they still don't fix it, and it's a fairly benign security hole that merely enables some pranksters to get some undeserved attention, why not build a service around exploiting the hole, if it will highlight the problem and encourage it to get fixed?
So I'm going to go out on a limb and say the U/S guy sincerely wants Digg to be more secure. However, I disagree with him about his proposed fix, that of hiding a user's digg history. First of all, it won't stop anyone who creates a multitude of accounts all under their control -- you can use Tor to make it appear that you're coming from many different IP addresses, and build up a history of "legitimate" votes before using your votes to push sites deliberately. (Be sure to use different browsers, or vary your User-Agent header if you know how to do that, so that a series of votes from identical browser types doesn't give you away.) If your service does work by paying other users to cast votes, then you could still audit whether they're casting their votes honestly -- for example, create a test story, use 5 sockpuppet accounts to digg it 5 times, then tell your confederate to digg it. If the number of diggs doesn't go up to 6, then you know they're not honoring their end of the deal, and kick them out of the system. As long as most confederates think there might be some chance of getting caught if they don't play along, most of them would probably cast the votes that they were paid for, since it costs them nothing to do so and they wouldn't want to jeopardize their stream of easy money.
I asked the owner of User/Submitter if his service could defeat the random-sampling algorithm I described. "It would slow down our service," he answered, "but certainly wouldn't eliminate it because eventually a U/S User will have an opportunity to vote on a U/S Submission by way of chance." But I don't see how this would beat the algorithm -- some U/S voters would still get to vote on the story, but as long as there are far more legitimate voters than U/S voters, a random sample will almost always contain far more legitimate voters. The U/S owner also said, "Randomized voting privileges would be unnecessarily confusing, frustrating, and fragmenting. Not to forget: unfair and undemocratic." Well, you could keep it from being "confusing" or "frustrating" by keeping the existing interface (with the possible addition of a randomly-selected-story box), so that the only changes would be in how the votes are handled under the hood. "Fragmenting"? If anything, it seems to me that the existing Digg/reddit algorithms would be more fragmenting, keeping users within their existing communities of friends who vote for each other's stories; a random-selection box would give stories with "crossover appeal" a greater chance of success, bringing them to the attention of users who might otherwise never have seen them. As for "unfair and undemocratic", presumably this is a reaction to the fact that the votes of 100 users decide what everyone else sees. But it's already the case with Digg that the votes of a small number of users decide what content becomes popular. At least with a random sample of users, the vast majority of the time the voting outcome would be the same as it would have been if the entire site had voted, thanks to the magic of representative sampling.
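To put a rough number on the claim that a random sample "will almost always contain far more legitimate voters," here is a back-of-the-envelope check. It assumes (a simplification of ours, not a figure from U/S or Digg) that each sampled voter is independently a U/S voter with some fixed probability, so the number of U/S voters in a 100-vote sample is binomially distributed. The helper name `prob_cheaters_majority` is hypothetical.

```python
from math import comb


def prob_cheaters_majority(sample_size: int, cheater_fraction: float) -> float:
    """P(U/S voters form a strict majority of a random sample), binomial model.

    Assumes each sampled voter is independently a U/S voter with probability
    `cheater_fraction`; sampling without replacement from a large user base
    behaves almost the same way.
    """
    majority = sample_size // 2 + 1  # smallest strict majority
    return sum(
        comb(sample_size, k)
        * cheater_fraction**k
        * (1 - cheater_fraction) ** (sample_size - k)
        for k in range(majority, sample_size + 1)
    )


# How often would paid voters outnumber everyone else in a 100-vote sample?
for fraction in (0.01, 0.05, 0.10, 0.25, 0.40):
    print(f"{fraction:.0%} U/S voters: {prob_cheaters_majority(100, fraction):.2e}")
```

Under this model the probability collapses quickly as the U/S share drops below one half. The model says nothing about one person controlling many accounts, though, which is a separate attack.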
So, I'm putting this suggestion out there for the same reason that Jim Messenger bought out Spike The Vote -- because I don't want sites like Digg and reddit to be manipulated by the abusers. In fact, if they used this algorithm, they would become more meritocratic than they are now, because the systems would strictly favor the highest-rated content, instead of content written by people who have informal networks of friends who can all go digg their stories for them. If I were to design the user rating system to make it cheat-proof, these are the exact details of what I would do:
- Wherever they decide to post the "random story sampling" box (on the front page, on a separate page linked from it, etc.), have it work so that as soon as new stories are submitted, they are rotated into that box and displayed to randomly chosen users, until each story has collected its 100 votes, or however many are required for a representative random sample.
- You can have "shutout voting" to kill off stories early that are obvious spam or otherwise really useless, without going through the full 100 votes. (For example, if 90% of the first 10 votes are negative, then stop collecting votes.) This decreases the number of users "inconvenienced" by really obvious spam and other garbage.
- To submit content that gets rotated into that voting process, require the submitter to pass a Turing test (read numbers off of a graphic and type them in), or something similar. This prevents spammers from submitting spam content over and over just to have it viewed by those initial 10 voters; if they have to type in a number each time, it's not worth it.
- When users vote on a story, give them the option to say why they voted the way they did. (This is especially valuable for negative votes, since the submitter then knows what to improve.) Personally I think the comments would be more valuable if users couldn't see other users' comments at the time they submit their own; this prevents the "me too" effect where everybody echoes the first two commenters. (When I ask for independent comments from people, and they almost all say the same thing without seeing each other's comments, that's when I know they have a point!)
- To prevent an attacker from having their own username hit the random-voting page over and over in hopes of voting up their own content, make sure that each user account is only allowed to vote on a given piece of content once (even if they found the content through the random-story page).
- Require a Turing test for new user signups. This would prevent an attacker from registering a huge number of accounts just to hit the random voting page with different users over and over, in hopes of getting to vote on their own submitted content eventually.
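The rules in the list above can be sketched as a small vote collector. This is a hypothetical Python sketch, not Digg's or reddit's code. The class and method names are invented. The 100-vote sample and the "9 of the first 10 votes negative" shutout come from the text. The "more than half positive" promotion rule is an assumption, because the text only says a story is buried or promoted based on the sample. Only votes on a story that the random box actually served to that account are counted, which makes sending friends a direct link useless. The Turing tests (CAPTCHAs) are assumed to happen at signup and before `submit` is called.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field

SAMPLE_SIZE = 100          # votes collected before deciding (the article's example figure)
SHUTOUT_WINDOW = 10        # "shutout voting": look at the first 10 votes...
SHUTOUT_MIN_NEGATIVE = 9   # ...and bury early if at least 9 of them (90%) are negative
PROMOTE_SHARE = 0.5        # hypothetical: promote if more than half the sampled votes are positive


@dataclass
class Story:
    story_id: str
    votes: dict[str, bool] = field(default_factory=dict)  # user_id -> True (up) / False (down)
    status: str = "pending"                               # "pending", "promoted" or "buried"


class RandomSampleVoting:
    """Hypothetical vote collector: only votes on randomly served stories count."""

    def __init__(self, rng: random.Random | None = None) -> None:
        self.rng = rng or random.Random()
        self.stories: dict[str, Story] = {}
        self.served: dict[str, str] = {}  # user_id -> story_id currently shown in the random box

    def submit(self, story_id: str) -> None:
        # A CAPTCHA check on the submitter is assumed to happen before this call.
        self.stories[story_id] = Story(story_id)

    def serve(self, user_id: str) -> str | None:
        """Pick a random pending story this account has not voted on yet."""
        candidates = [
            s.story_id
            for s in self.stories.values()
            if s.status == "pending" and user_id not in s.votes
        ]
        if not candidates:
            return None
        story_id = self.rng.choice(candidates)
        self.served[user_id] = story_id
        return story_id

    def vote(self, user_id: str, story_id: str, up: bool) -> bool:
        """Record a vote and return True, or return False if the vote is ignored."""
        story = self.stories.get(story_id)
        if story is None or story.status != "pending":
            return False
        if self.served.get(user_id) != story_id:  # self-selected vote: not counted
            return False
        if user_id in story.votes:                # one vote per account per story
            return False
        story.votes[user_id] = up
        del self.served[user_id]
        self._maybe_decide(story)
        return True

    def _maybe_decide(self, story: Story) -> None:
        n = len(story.votes)
        negatives = sum(1 for up in story.votes.values() if not up)
        if n == SHUTOUT_WINDOW and negatives >= SHUTOUT_MIN_NEGATIVE:
            story.status = "buried"               # obvious spam: stop collecting early
        elif n >= SAMPLE_SIZE:
            positives = n - negatives
            story.status = "promoted" if positives > PROMOTE_SHARE * n else "buried"


# Usage: a direct-link vote is ignored; ten down-votes from served accounts trigger the shutout.
voting = RandomSampleVoting(random.Random(0))
voting.submit("obvious-spam")
print(voting.vote("friend", "obvious-spam", True))  # False: never served, so ignored
for i in range(10):
    user = f"user{i}"
    if voting.serve(user) == "obvious-spam":
        voting.vote(user, "obvious-spam", False)
print(voting.stories["obvious-spam"].status)  # buried
```

Two rules together block the "hit the random page over and over" attack. `serve` never offers an account a story it has already voted on, and `vote` refuses a second vote from that account. The signup CAPTCHA is still what stops one person from multiplying accounts.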
Then after running this system for a while, look through some collected data to determine if the system could be more efficient. For example, do you really need a sample of 100 votes every time? Suppose you determine that in 99% of cases, you get the same result just from tabulating the first 50 votes, as you would have gotten from tabulating all 100 votes. Then you could modify the system to collect only the first 50 votes, and then make a decision.
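The efficiency check described above amounts to replaying logged vote sequences and asking how often a shorter prefix reaches the same verdict as the full sample. Below is a minimal Python sketch. It assumes votes are logged in arrival order as up/down booleans and uses the same hypothetical "more than half positive" rule. The function names are illustrative. The data generated at the bottom is simulated only to exercise the functions and says nothing about real Digg or reddit traffic.

```python
from __future__ import annotations

import random


def decide(votes: list[bool]) -> bool:
    """Hypothetical promotion rule: promote if more than half of the votes are positive."""
    return sum(votes) * 2 > len(votes)


def agreement_rate(logged: list[list[bool]], prefix: int = 50) -> float:
    """Share of stories whose first `prefix` votes give the same decision as the full sample."""
    if not logged:
        raise ValueError("need at least one logged story")
    same = sum(decide(votes[:prefix]) == decide(votes) for votes in logged)
    return same / len(logged)


def smallest_prefix(logged: list[list[bool]], target: float = 0.99, step: int = 10) -> int:
    """Smallest prefix length (a multiple of `step`) that agrees with the full sample often enough."""
    full = max(len(votes) for votes in logged)
    for prefix in range(step, full + 1, step):
        if agreement_rate(logged, prefix) >= target:
            return prefix
    return full


# Simulated log only: each story gets a hidden approval rate, and its
# 100 sampled votes are drawn independently from that rate.
rng = random.Random(42)
simulated = []
for _ in range(1000):
    approval = rng.random()
    simulated.append([rng.random() < approval for _ in range(100)])
print(agreement_rate(simulated, 50), smallest_prefix(simulated))
```

Because the voters are drawn at random, the first 50 votes are themselves a random subsample. That is why comparing a prefix against the full sample is a fair test of whether 50 votes are enough.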
Suggestions for improvement? Flaws (hopefully not fatal)? Everyone who cares about keeping community sites like Digg free from abuse, and who wants to create a path for the best content to rise to the top, let's put our heads together and see what we can think of. The above is intended merely as a jumping-off point, and although I've worked it over and I can't see any specific points to improve efficiency, that's probably just because I've been looking at it too long. And if you Digg this story for me I'll give you 1,000 times as much cash as I gave my Mom last Mother's Day.
-
How to Stop Digg-cheating, Forever
The following was written by frequent Slashdot editorial contributor Bennett Haselton. He writes "Recently author Annalee Newitz created a bit of a stir with the revelation that she had bought her way to the front page of the story-ranking site Digg. Since Digg allows any registered user to go to a story's URL and "digg it" in order to push it upward through the story-ranking system, it was inevitable that services like User/Submitter would come along, where a Digg user can pay for other users to cast votes to push their story up to the top. User/Submitter says they are currently backlogged and not taking new orders, but they say the service will return and will soon feature services for manipulating similar sites like Digg competitor reddit. Even if the new U/S features are vaporware, it probably won't be long before other companies offer similar services. But it seems like all of these story-ranking sites could prevent the manipulation by making one simple change to their voting algorithm."Before getting to that though, what's at stake? The revelation that Digg could be trivially manipulated did not cause the site to be overrun with bogus stories all at once -- most of the links on the front page still look interesting. Newitz said that her story, which was deliberately chosen to be as lame as possible, got buried by users soon after it hit the front page, which is how Digg cleans spam stories out of the system. However, she also said that in the time that the story was on the front page, the story got about 35,000 hits, whereupon her server crashed and the traffic was thereafter divided with two other mirror sites; presumably if the server had stayed up, she would have gotten about 100,000 hits, all for an initial expenditure of $100, which is orders of magnitude cheaper than buying advertising any other way. 
(If she had done the same thing with a good story instead of a deliberately lame one, presumably the traffic gains resulting from word-of-mouth and repeat visitors would have been even higher.) As long as the benefits outweigh the cost, more and more unscrupulous users are likely to pay for such services, and since the service provided by User/Submitter is easy to copy, probably similar services will spring up to drive the price down even further. If nothing changes, then eventually sites like Digg and reddit will be flooded with nothing but paid stories. Most of the stories on the front page will probably still be interesting (why would you pay to promote a link, unless it was good enough to draw repeat visitors and get the most value for your money?), but everybody who didn't pay for votes would eventually get crowded out.
One Good Samaritan, Jim Messenger, managed to shut down one Digg manipulation service called Spike The Vote, by buying it out (for a paltry $1,275 - they must have wanted to get out fast) and then turning over to Digg. He warned people that the moral was: Don't sign up for Digg manipulation services, since Digg might get your information from them and then you'll be banned. Actually, I think the moral is simpler: if you're going to try anything like that, do it from a throwaway account that you don't care about losing if you get caught. (Or, only sign up with manipulation services which publish a privacy policy promising never to share your information, especially not with sites like Digg. Then if Digg buys them out, then the site has violated their privacy policy and Digg as the new owner inherits the liability for that, so you can sue them, right?) But as the idea spreads, it will probably become impractical to play whack-a-mole by shutting down manipulation services as they keep springing up. Any time the cost of providing a service (clicking on a few buttons) is small compared to the benefits of receiving the service (100,000 hits in 24 hours), a market will exist for it one way or another, whether you're talking about drug-smuggling, prostitution, or selling Digg votes.
However, I think there's a way to fix it, and here it is. Have you ever seen people put a link in their profile to their HotOrNot picture, saying "Go here and vote me a 10!!"? Similar to the people who send links to their friends and say, "I just posted this, please Digg this for me!" The difference is that on HotOrNot, it doesn't work. On HotOrNot, you can cast votes for a picture in one of two ways. The first way is to go directly to the URL for someone's picture; the second way is to load the front page, where a random picture from the database is selected at random, and vote for whatever picture comes up. The catch is that the votes that you cast by going directly to someone's picture, are simply ignored in calculating the average score for that photo. The only votes that are counted are the votes cast for random pictures displayed on the front page. So if you want to manipulate the voting for your own photo, you'd have to load the front page hundreds of thousands of times waiting for your own picture to come up repeatedly, which is hard to do without being detected.
To enable an algorithm like this on Digg and reddit, the sites could present users with a sidebar box that displays random stories from the pool of recent submissions. (reddit already has a serendipity feature that users can use to select a random story from the available pool, which could be leveraged for this purpose.) Once a story has collected, say, 100 votes -- or whatever number is considered sufficient to provide a representative random sample of how the story appeals to people -- then on that basis the story can either be buried or promoted to the top, where it would be seen by, say, 100,000 people. The elegance of this system is that bad content would only be seen by 100 people on average before it's buried, whereas good content would be seen by all the 100,000 people who view it on the front page, so the average user sees 1,000 pieces of good content for every 1 piece of crap. Even if 75% of users ignore the random story box completely, that just means you have to display it to 400 users instead of 100 before you have enough data points for a good random sample.
I suggested essentially the same algorithm for how an open-source search engine could work without being vulnerable to gaming even by those who understood all of its inner workings. The main difference, of course, is that Digg and reddit actually exist now. Digg declined to comment on the possible merits of such an algorithm; reddit's Steve Huffman said that the idea sounded interesting, although even if the idea got full buy-in, naturally any proposed change would take a long time to bring to fruition.
But it seems that an algorithm similar to this one would be the only way to prevent cheating on sites like Digg that sort content based on user votes. So it's ironic that HotOrNot, the only site I know of that is using a variation of this algorithm and hence is probably the most secure against cheating, is also the one where cheating is least likely to be a problem. Getting a high placement on Digg might enable you to make some money, but getting a highly rated picture on HotOrNot isn't going to make you rich (unless it helps you meet a millionaire who is using the site to find his third wife). Also, making HotOrNot meritocratic doesn't give people an incentive to improve the "content" that they submit, because up to the limits of what can be done with hair and wardrobe, you can't make yourself that much more attractive. With Digg and reddit, on the other hand, I might work harder at submitting a good story, if I knew that it worked in a perfectly meritocratic fashion that pushed good stories right to the top.
If you do this, you don't need any of the other countermeasures listed in Annalee Newitz's follow-up piece "Herding the Mob", such as analyzing user account history for suspicious behavior. As long as most users in the system are legitimate, most of the users in your random sample will be legitimate as well, and their voting will be representative of what most of the community would think. A story could also get a high score within a specific sub-area of the site like the sports page, but be kept off of the main site front page, if the story got a high score from a random sampling of sports-oriented users but a low score from a sample of everyone else.
You could even sub-divide the topical areas further, down to a level of granularity like "Would Barack Obama make a good president?" A site called Helium is currently trying something like this -- users can submit essays on subjects like "Racial inequality or oppression: Do they truly exist in todays society?", and vote on how to rank other essays against each other. The voting works on the random selection principle that I'm advocating here -- users are presented with a pair of randomly chosen essays from a given category (not necessarily the same category for which you submitted an essay) and told to vote for the better one, so there's no way to tell all your friends to go to the link for your essay and give it a high rating. The main limitation, though, is that while the votes can push you to the top of a particular sub-category, that won't cause your article to "break out" and get to the front page of the site -- Helium says that those front-page articles are chosen at random by employees from among those articles that are highly rated within their narrow category, so just being good is not enough. And if you want to write something that doesn't fit into any existing categories, you have to create a new category for your essay like I did, which will then be a category containing one essay that nobody else ever sees. Perhaps both of these limitations could be overcome by adding the option to rate randomly selected essays on a scale of 1 to 10 -- thus providing a way to rate essays that exist alone in their own category, and also a way to find the best essays across the entire site, rated against each other.
If Digg or reddit adopts a model that uses the random-voter-selection method, then there's the issue of how to handle the votes cast by users under the current system -- the ones who go to a story link and click "digg it", which is what makes the existing system vulnerable to gaming. Digg could do what HotOrNot does, and just ignore those votes outright, but users would probably view this as deceptive. Perhaps Digg could say that votes cast by self-selected users (the ones who go straight to the story link) are counted along with votes from randomly-selected users, unless the average of the self-selected votes is significantly different from the average from the randomly-selected votes, in which case the self-selected votes are ignored. Hopefully this would satisfy most users and preserve the "community" feel of the site, and only a spoilsport would point out that counting the self-selected votes only if they agree with the randomly-selected votes is exactly the same thing as ignoring the self-selected votes entirely.
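The compromise rule in this paragraph can be written down directly, which also makes the spoilsport's objection concrete. In this minimal sketch, votes are 1 (up) or 0 (down), and both the function name and the max_gap tolerance standing in for "significantly different" are hypothetical.

```python
def _average(votes: list[int]) -> float:
    return sum(votes) / len(votes)


def combined_score(random_votes: list[int], self_selected: list[int],
                   max_gap: float = 0.1) -> float:
    """Pool self-selected votes with random-sample votes, unless their
    averages differ by more than max_gap, in which case drop them."""
    r = _average(random_votes)
    if not self_selected:
        return r
    if abs(_average(self_selected) - r) > max_gap:
        return r  # disagreement: self-selected votes are ignored
    # Agreement: pool everything; the result lies between the two averages.
    return _average(random_votes + self_selected)
```

Because a pooled average always lies between the two group averages, the result can never move more than max_gap away from the random-sample score, so the random sample alone effectively decides the outcome.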
I asked the owner of User/Submitter what he thought about this. He was willing to talk with surprising candor (except about things like his real name) and spoke as if he'd like nothing better than for Digg to make changes to their service that would block his system from working. To both Annalee Newitz and me, he said, "We find it interesting that Digg still allows anybody to view any user's diggs. By way of this 'feature,' User/Submitter is able to verify that our users actually digg the stories they're given. Without this feature, Digg users are given complete digging privacy, and User/Submitter cannot exist." Some have expressed skepticism that the Digg cheaters really want Digg to fix the problem. But as a security tester, I can understand that mentality. If you report a problem, and a company doesn't fix it, eventually you get tempted to publicize the problem to draw attention to it. And if they still don't fix it, and it's a fairly benign security hole that merely enables some pranksters to get some undeserved attention, why not build a service around exploiting the hole, if it will highlight the problem and encourage it to get fixed?
So I'm going to go out on a limb and say the U/S guy sincerely wants Digg to be more secure. However I disagree with him about his proposed fix, that of hiding a user's digg history. First of all, it won't stop anyone who creates a multitude of accounts all under their control -- you can use Tor to make it appear that you're coming from many different IP addresses, and build up a history of "legitimate" votes before using your votes to push sites deliberately. (Be sure to use different browsers, or vary your User-Agent header if you know how to do that, so that a series of votes from identical browser types doesn't give you away.) If your service does work by paying other users to cast votes, then you could still audit whether they're casting their votes honestly -- for example, create a test story, use 5 sockpuppet accounts to digg it 5 times, then tell your confederate to digg it. If the number of diggs doesn't go up to 6, then you know they're not honoring their end of the deal, and kick them out of the system. As long as most confederates think there might be some chance of getting caught if they don't play along, most of them would probably cast the votes that they were paid for, since it costs them nothing to do so and they wouldn't want to jeopardize their stream of easy money.
I asked the owner of User/Submitter if his service could defeat the random-sampling algorithm I described. "It would slow down our service," he answered, "but certainly wouldn't eliminate it because eventually a U/S User will have an opportunity to vote on a U/S Submission by way of chance." But I don't see how this would beat the algorithm -- some U/S voters would still get to vote on the story, but as long as there are far more legitimate voters than U/S voters, then a random sampling will almost always contain far more legitimate voters. The U/S owner also said, "Randomized voting privileges would be unnecessarily confusing, frustrating, and fragmenting. Not to forget: unfair and undemocratic." Well, you could keep it from being "confusing" or "frustrating" by keeping the existing interface (with the possible addition of a randomly-selected-story box), so that the only changes would be in how the votes are handled under the hood. "Fragmenting"? If anything, it seems to me that the existing Digg/reddit algorithms would be more fragmenting, keeping users within their existing communities of friends who vote for each other's stories; a random-selection box would give stories with "crossover appeal" a greater chance of success, bringing them to the attention of users who might otherwise never have seen them. As for "unfair and undemocratic", presumably this is a reaction to the fact that the votes of 100 users decide what everyone else sees. But it's already the case with Digg that the votes of a small number of users decide what content becomes popular. At least with a random sample of users, it would be the case that the vast majority of the time, the voting outcome would be the same as it would have been if the entire site had voted, due to the magic of representative sampling.
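The claim that a random sample "will almost always contain far more legitimate voters" can be quantified. The sketch below computes the probability that attacker-controlled accounts form a strict majority of a random sample, treating each draw as independent. That is a binomial approximation, close when the user population is much larger than the sample; an exact finite-population figure would use the hypergeometric distribution. The function name and parameters are illustrative.

```python
from math import comb


def prob_attacker_majority(sample_size: int, attacker_fraction: float) -> float:
    """P(attacker accounts are a strict majority of a random sample),
    with each sampled voter independently an attacker with probability
    attacker_fraction (binomial model)."""
    p = attacker_fraction
    need = sample_size // 2 + 1  # smallest strict majority
    return sum(
        comb(sample_size, k) * p**k * (1 - p) ** (sample_size - k)
        for k in range(need, sample_size + 1)
    )


for frac in (0.05, 0.2, 0.4, 0.5):
    print(f"{frac:.0%} of accounts -> {prob_attacker_majority(100, frac):.3g}")
```

With a 100-vote sample, an attacker holding even a fifth of all accounts almost never wins the vote; only as the attacker's share of the whole user population approaches half does the probability become substantial.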
So, I'm putting this suggestion out there for the same reason that Jim Messenger bought out Spike The Vote -- because I don't want sites like Digg and reddit to be manipulated by the abusers. In fact, if they used this algorithm, they would become more meritocratic than they are now, because the systems would strictly favor the highest-rated content, instead of content written by people who have informal networks of friends who can all go digg their stories for them. If I were to design the user rating system to make it cheat-proof, these are the exact details of what I would do:
- Wherever they decide to post the "random story sampling" box (on the front page, or on a link off to a separate page, etc.), have it work so that as soon as new stories are submitted, they can be rotated into that box and displayed to a random set of users, until it's reached its total of 100 votes or however many are required to get a random sample.
- You can have "shutout voting" to kill off stories early that are obvious spam or otherwise really useless, without going through the full 100 votes. (For example, if 90% of the first 10 votes are negative, then stop collecting votes.) This decreases the number of users "inconvenienced" by really obvious spam and other garbage.
- For someone to submit content that gets rotated into that voting process, have them pass a Turing test (read numbers off of a graphic and type them in), or something similar. This prevents spammers from submitting spam content over and over just to have it viewed by those initial 10 voters. If they have to type in a number each time, it's not worth it.
- When users give votes to a story, give them the option to say why they voted the way that they did. (This is especially valuable if they're giving negative votes, since then the submitter would know what to improve.) Personally I think the comments would be more valuable if each user can't see other users' comments, at the time they submit their own comments; this prevents the "me too" effect where everybody echoes the first two commenters. (When I ask for independent comments from people, and they almost all say the same thing without seeing each other's comments, that's when I know they have a point!)
- To prevent an attacker from having their own username hit the random-voting page over and over in hopes of voting up their own content, make sure that each user account is only allowed to vote on a given piece of content once (even if they found the content through the random-story page).
- Require a Turing test for new user signups. This would prevent an attacker from registering a huge number of accounts just to hit the random voting page with different users over and over, in hopes of getting to vote on their own submitted content eventually.
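The checklist above (random rotation, shutout voting, and one vote per account per story) can be sketched as a small state machine. This is a hedged illustration, not a description of how Digg or reddit work: the class and function names are hypothetical, the 100-vote sample and 9-of-10 shutout come from the example figures in the list, and the 50% promotion threshold is an assumption. Turing tests and the private comment box are outside the sketch.

```python
import random
from dataclasses import dataclass, field

SAMPLE_SIZE = 100         # votes collected before deciding (example figure)
SHUTOUT_WINDOW = 10       # early votes checked for obvious junk
SHUTOUT_MIN_NEGATIVE = 9  # 90% of the first 10 votes negative -> bury early
PROMOTE_SHARE = 0.5       # hypothetical share of up-votes needed to promote


@dataclass
class Story:
    story_id: str
    votes: list = field(default_factory=list)  # True = up, False = down
    voters: set = field(default_factory=set)   # accounts that already voted
    status: str = "pending"                    # "pending", "promoted", "buried"


def pick_story_for(user_id: str, stories: list, rng: random.Random):
    """Choose a random pending story this account has not voted on yet."""
    candidates = [s for s in stories
                  if s.status == "pending" and user_id not in s.voters]
    return rng.choice(candidates) if candidates else None


def record_vote(story: Story, user_id: str, up: bool) -> bool:
    """Record one vote per account per story; decide once enough votes arrive."""
    if story.status != "pending" or user_id in story.voters:
        return False
    story.voters.add(user_id)
    story.votes.append(up)
    n = len(story.votes)
    # Shutout: kill obvious spam without waiting for the full sample.
    if n == SHUTOUT_WINDOW and story.votes.count(False) >= SHUTOUT_MIN_NEGATIVE:
        story.status = "buried"
    elif n >= SAMPLE_SIZE:
        up_share = sum(story.votes) / n
        story.status = "promoted" if up_share >= PROMOTE_SHARE else "buried"
    return True
```

Because record_vote refuses a second vote from the same account, hitting the random page repeatedly with one username gains nothing; an attacker needs many accounts, which is what the signup Turing test is meant to make expensive.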
Then after running this system for a while, look through some collected data to determine if the system could be more efficient. For example, do you really need a sample of 100 votes every time? Suppose you determine that in 99% of cases, you get the same result just from tabulating the first 50 votes, as you would have gotten from tabulating all 100 votes. Then you could modify the system to collect only the first 50 votes, and then make a decision.
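The efficiency check described here amounts to replaying logged votes. A minimal sketch, assuming each story's votes are stored in arrival order as booleans and that a story is promoted when at least half its votes are positive (both assumptions); the synthetic data generated at the end is purely illustrative and says nothing about real Digg or reddit traffic.

```python
import random


def promoted(votes, threshold=0.5):
    """Hypothetical decision rule: promote if the up-vote share meets threshold."""
    return sum(votes) / len(votes) >= threshold


def agreement_rate(vote_logs, early_n=50):
    """Fraction of stories whose promote/bury decision from the first
    early_n votes matches the decision from all recorded votes."""
    same = sum(promoted(v[:early_n]) == promoted(v) for v in vote_logs)
    return same / len(vote_logs)


# Illustrative synthetic logs: each story has a hypothetical true approval
# rate, and its 100 votes are drawn independently at that rate.
rng = random.Random(42)
logs = []
for _ in range(1000):
    appeal = rng.random()
    logs.append([rng.random() < appeal for _ in range(100)])
print(f"{agreement_rate(logs):.1%} of decisions unchanged using 50 votes")
```

Run against real logged votes instead of synthetic ones, the same function would show whether early voters tend to differ from later ones, which independent synthetic draws cannot capture.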
Suggestions for improvement? Flaws (hopefully not fatal)? Everyone who cares about keeping community sites like Digg free from abuse, and who wants to create a path for the best content to rise to the top, let's put our heads together and see what we can think of. The above is intended merely as a jumping-off point, and although I've worked it over and I can't see any specific points to improve efficiency, that's probably just because I've been looking at it too long. And if you Digg this story for me I'll give you 1,000 times as much cash as I gave my Mom last Mother's Day.
-
Could Open Source Lead to a Meritocratic Search Engine?
Slashdot contributor Bennett Haselton writes "When Jimmy Wales recently announced the Search Wikia project, an attempt to build an open-source search engine around the user-driven model that gave birth to Wikipedia, he said his goal was to create "the search engine that changes everything", as he underscored in a February 5 talk at New York University. I think it could, although not for the same main reasons that Wales has put forth -- I think that for a search engine to be truly meritocratic would be more of a revolution than for a search engine to be open-source, although both would be large steps forward. Indeed, if a search engine could be built that really returned results in order of average desirability to users, and resisted efforts by companies to "game" the system (even if everyone knew precisely how the ranking algorithm worked), it's hard to overstate how much that would change things both for businesses and consumers. The key question is whether such an algorithm could be created that wouldn't be vulnerable to non-merit-based manipulation. Regardless of what algorithms may be currently under consideration by thinkers within the Wikia company, I want to argue logically for some necessary properties that such an algorithm should have in order to be effective. Because if their search engine becomes popular, they will face such huge efforts from companies trying to manipulate the search results, that it will make Wikipedia vandalism look like a cakewalk." The rest of his essay follows. This will be a trip into theory-land, so it may be frustrating to users who dislike talk about "vaporware" and want to see how something works in practice. I understand where you're coming from, but I submit it's valuable to raise these questions early. This is in any case not intended to supplant discussion about how things are currently progressing.
First, though, consider the benefits that such a search engine could bring, both to content consumers and content providers, if it really did return results sorted according to average community preferences. Suppose you wanted to find out if you had a knack for publishing recipes online and getting some AdSense revenue on the side. You take a recipe that you know, like apple pie, and check out the current results for "apple pie". There are some pretty straightforward recipes online, but you believe you can create a more complete and user-friendly one. So you write up your own recipe, complete with photographs of the process showing how ingredients should be chopped and what the crust mixture should look like, so that the steps are easier to follow. (Don't you hate it when a recipe says "cut into cubes" and you want to throttle the author and shout, "HOW BIG??" It drove me crazy until I found CookingForEngineers.com.) Anyway, you submit your recipe to the search engine to be included in the results for "apple pie", and if the sorting process is truly meritocratic, your recipe page rises to the top. Until, that is, someone decides to surpass you, and publishes an even more user-friendly recipe, perhaps with a link to a YouTube video of them showing how to make the pie, which they shot with a tripod video camera and a clip-on mike in their well-lit kitchen. In a world of perfect competition, content providers would be constantly leapfrogging each other with better and better content within each category (even a highly specific one like apple pie recipes), until further efforts would no longer pay for themselves with increased traffic revenue. (The more popular search terms, of course, would bring greater rewards for those listed at the top, and would be able to pay for greater efforts to improve the content within that category.) But this constant leapfrogging of better and better content requires efficient and speedy sorting of search results in order to work. 
It doesn't work if the search results can be gamed by someone willing to spend effort and money (not worth it for the author of a single apple pie recipe, but worth it for a big money-making recipe site), and it doesn't work if it's impossible for new entrants to get hits when the established players already dominate search results.
Efficient competition benefits consumers even more for results that are sorted by price (assuming that among comparable goods and services, the community promotes the cheapest-selling ones to the top of the search results, as "most desirable"). If you were a company selling dedicated Web hosting, for example, you would submit your site to the engine to be included in results for "dedicated hosting". If you could demonstrate to the community that your prices and services were superior to your competitors', and if the ranking algorithm really did rank sites according to the preferences of the average user, your site could quickly rise to the top, and you'd make a bundle on new sales -- until, of course, someone else had the same idea and knocked you out of the top spot by lowering their prices or improving their services. The more efficient the marketplace, the faster prices fall and service levels rise, until prices just cover the cost of providing the service and compensating the business owner for their time. It would be a pure buyer's market.
It's important to precisely answer the question: Why would this system be better than a system like Google's search algorithm, which can be "gamed" by enterprising businesses and which doesn't always return the results first that the user would like the most? You might be tempted to answer that in an inefficient marketplace created by an inefficient search result sorting algorithm, a user sometimes ends up paying $79/month for hosting, instead of the $29/month that they might pay if the marketplace were perfectly efficient. But this by itself is not necessarily wasteful. The extra $50 that the user pays is the user's loss, but it's also the hosting company's gain. If we consider costs and benefits across all parties, the two cancel out. The world as a whole is not poorer because someone overpaid for hosting.
The real losses caused by an inefficient search algorithm are the efforts spent by companies to game the search results (e.g. paying search engine optimization firms to try and get them to the top Google spot), and the reluctance of new players to enter that market if they don't have the resources to play those games. If two companies each spend $5,000 trying to knock each other off of the top spot for a search like "weddings", that's $10,000 worth of effort that gets burned up with no offsetting amount of goods and services added to the world. This is what economists call a deadweight loss, with no corresponding benefit to any party. The two wedding planners might as well have smashed their pastel cars into each other. Even if a single company spends the effort and money to move from position #50 to position #1, that gain to them is offset by the loss to the other 49 companies that each moved down by one position, so the net benefit across all parties is zero, and the effort that the company spent to raise their position would still be a deadweight loss.
On the other hand, if search engine results were sorted according to a true meritocracy, then companies that wanted to raise their rankings would have to spend effort improving their services instead. This is not a deadweight loss, since these efforts result in benefits or savings to the consumer.
I've been a member of several online entrepreneur communities, and I'd conservatively estimate that members spend less than 10% of the time talking about actually improving products and services, and more than 90% of the time talking about how to "game" the various systems that people use to find them, such as search engines and the media. I don't blame them, of course; they're just doing what's best for their company, in the inefficient marketplace that we live in. But I feel almost lethargic thinking of that 90% of effort that gets spent on activities that produce no new goods and services. What if the information marketplace really were efficient, and business owners spent nearly 100% of their efforts improving goods and services, so that every ounce of effort added new value to the world?
Think of how differently we'd approach the problem of creating a new Web site and driving traffic to it. A good programmer with a good idea could literally become an overnight success. If you had more modest goals, you could shoot a video of yourself preparing a recipe or teaching a magic trick, and just throw it out there and watch it bubble its way up the meritocracy to see if it was any good. You wouldn't have to spend any time networking or trying to rig the results, you just create good stuff and put it out there. No, despite whatever cheer-leading you may have heard, it doesn't quite work that way yet -- good online businessmen still talk about the importance of networking, advertising, and all the other components of gaming the system that don't relate to actually improving products and services. But there is no reason, in principle, why a perfectly meritocratic content-sorting engine couldn't be built. Would it revolutionize content on the Internet? And, could Search Wikia be the project to do it, or play a part in it?
Whatever search engine the Wikia company produced, it would probably have such a large following among the built-in open-source and Wikipedia fan base, that traffic wouldn't be a problem -- companies at the top of popular search results would definitely benefit. The question is whether the system can be designed so that it cannot be gamed. I agree with Jimmy Wales's stated intention to make the algorithm completely open, since this makes it easier for helpful third parties to find weaknesses and get them fixed, but of course it also makes it easier for attackers to find those weaknesses and exploit them. If you think Microsoft paying a blogger to edit Wikipedia is a problem, imagine what companies will do to try and manipulate the search results for a term like "mortgage". So what can be done?
The basic problem with any community that makes important decisions by "consensus" is that it can be manipulated by someone who creates multiple phantom accounts all under their control. Then if a decision is influenced by voting -- for example, the relative position of a given site in a list of search results -- then the attacker can have the phantom accounts all vote for one preferred site. You can look for large numbers of accounts created from the same IP address, but the attacker could use Tor and similar systems to appear to be coming from different IPs. You could attempt to verify the unique identity of each account holder, by phone for example, but this requires a lot of effort and would alienate privacy-conscious users. You could require a Turing test for each new account, but all this means is that an attacker couldn't use a script to create their 1,000 accounts -- an attacker could still create the accounts if they had enough time, or if they paid some kid in India to create the accounts. You could give users voting power in proportion to some kind of "karma" that they had built up over time by using the site, but this gives new users little influence and little incentive to participate; it also does nothing to stop influential users from "selling out" their votes (either because they became disillusioned, or because they signed up with that as their intent from the beginning!).
So, any algorithm designed to protect the integrity of the Search Wikia results would have to deal with this type of attack. In a recent article about Citizendium, a proposed Wikipedia alternative, I argued that you could deal with conventional wiki vandalism by having identity-verified experts sign off on the accuracy of an article at different stages. That's practical for a subject like biology, where you could have a group of experts whose collective knowledge covers the subject at the depth expected in an encyclopedia, but probably not for a topic like "dedicated hosting" where the task is to sift through tens of thousands of potential matches and find the best ones to list first. You need a new algorithm to harness the power of the community. I don't know how many possible solutions there are, but here is one way in which it could be done.
Suppose a user submits a requested change to the search results -- the addition of their new Site A, or the proposal that Site A should be ranked higher. This decision could be reviewed by a small subset of registered users, selected at random from the entire user population. If a majority of the users rate the new site highly enough as a relevant result for a particular term, then the site gets a high ranking. If not, then the site is given a low ranking, possibly with feedback being sent to the submitter as to why the site was not rated highly. The key is that the users who vote on the site have to be selected at random from among all users, instead of letting users self-select to vote on a particular decision.
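The review step can be sketched as follows. The key line is the uniform random draw from every registered user, so nobody can volunteer for a particular decision. This is a hypothetical illustration: the function name, the 15-member panel, the 1-to-10 rating scale, and the pass mark of 7 are all assumptions, and get_rating stands in for however reviewers would actually be asked.

```python
import random
from typing import Callable, Optional


def review_submission(user_ids: list[str],
                      get_rating: Callable[[str], int],
                      panel_size: int = 15,
                      pass_mark: int = 7,
                      rng: Optional[random.Random] = None):
    """Approve a proposed ranking change if a majority of a randomly drawn
    panel rate the site at or above pass_mark. Returns (approved, ratings)
    so that feedback can be passed back to the submitter."""
    rng = rng or random.Random()
    panel = rng.sample(user_ids, panel_size)  # drawn from ALL users, no volunteers
    ratings = {user: get_rating(user) for user in panel}
    approvals = sum(score >= pass_mark for score in ratings.values())
    return approvals > panel_size // 2, ratings
```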
The nice property of this system is that an attacker can't manipulate the voting simply by having a large number of accounts at their control -- they would have to control a significant proportion of accounts across the entire user population, in order to ensure that when the voters were selected randomly from the user population, the attacker controlled enough of those accounts to influence the outcome. (If an attacker ever really did spend the resources to reach that threshold point, and it became apparent that they were manipulating the votes, those votes could be challenged and overridden by a vote of users whose identities were known to the system. This would allow the verified-identity users to be used as an appeal of last resort to block abuse by a very dedicated adversary, while not requiring most users to verify their identity. This is basically what Jimmy Wales does when he steps in and arbitrates a Wikipedia dispute, acting as his own "user whose identity is known".)
This algorithm for an "automated meritocracy" (automeritocracy? still not very catchy at 7 syllables) could be extended to other types of user-built content sites as well. Musicians could submit songs to a peer review site, and the songs would be pushed out to a random subset of users interested in that genre, who would then vote on the songs. (If most users were too apathetic to vote, the site could tabulate the number of people who heard the song and then proceeded to buy or download it, and count those as "votes" in favor.) If the votes for the song are high enough, it gets pushed out to all users interested in that genre; if not, then the song doesn't make it past the first stage. If there are 100,000 users subscribed to a particular genre, but it only takes ratings from 100 users to determine whether or not a song is worth pushing out to everybody, that means that when "good" content is sent out to all 100,000 people but "bad" content only wastes the time of 100 users, the average user gets 1,000 pieces of "good" content for every 1 piece of "bad" content. New musicians wouldn't have to spend any time networking, promoting, recruiting friends to vote for them -- all of which have nothing to do with making the music better, and which fall into the category of deadweight losses described above.
An automeritocracy-like system could even be used as a spam filter for a large e-mail site. Suppose you want to send your newsletter to 100,000 Hotmail users (who really have signed up to receive it). Hotmail could allow your IP to send mail to 100,000 users the first time, and then if they receive too many spam complaints, block your future mailings as junk mail. But if that's their practice, there's nothing to stop you from moving to a new, unblocked IP and repeating the process from there. So instead, suppose that Hotmail stores your 100,000 received messages temporarily into users' "Junk Mail" folders, but selectively releases a randomly selected subset of 100 messages into users' inboxes. Suppose for argument's sake that when a message is spam, 20% of users click the "This is spam" button, but if not, then only 1% of users click it. Out of the 100 users who see the message, if the number who click "This is spam" looks close to 1%, then since those 100 users were selected as a representative sample of the whole population, Hotmail concludes that the rest of the 100,000 messages are not spam, and moves them retroactively to users' inboxes. If the percentage of those 100 users who click "This is spam" is closer to 20%, then the rest of the 100,000 messages stay in Junk Mail. A spammer could only rig this system if they controlled a significant proportion of the 100,000 addresses on their list -- not impossible, but difficult, since you have to pass a Turing test to create each new Hotmail account.
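Here is a minimal sketch of that hold-and-sample scheme. The 1% and 20% complaint rates are the text's own hypothetical figures, and the decision rule is the literal one described (whichever rate the sample's complaint rate is closer to); a production filter might instead use a likelihood-ratio test. The function names and the complains callback, which models whether a given recipient clicks "This is spam", are hypothetical, and this is not a description of what Hotmail actually does.

```python
import random
from typing import Callable, Optional

HAM_RATE = 0.01   # complaint rate assumed for legitimate mail
SPAM_RATE = 0.20  # complaint rate assumed for spam


def looks_like_spam(complaints: int, sample_size: int) -> bool:
    """True if the observed complaint rate is closer to SPAM_RATE than HAM_RATE."""
    rate = complaints / sample_size
    return abs(rate - SPAM_RATE) < abs(rate - HAM_RATE)


def deliver_bulk(recipients: list[str],
                 complains: Callable[[str], bool],
                 sample_size: int = 100,
                 rng: Optional[random.Random] = None):
    """Release a random sample to inboxes, hold the rest in Junk Mail, and
    move the held messages to inboxes only if the sample looks legitimate.
    Returns (inbox, junk) lists of recipients."""
    rng = rng or random.Random()
    sample = rng.sample(recipients, sample_size)
    complaints = sum(complains(r) for r in sample)
    if looks_like_spam(complaints, sample_size):
        released = set(sample)
        return sample, [r for r in recipients if r not in released]
    return list(recipients), []
```

With a 100-message sample, the cutoff under this rule falls between 10 and 11 complaints.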
The problem is, there's a huge difference between systems that implement this algorithm, and systems that implement something that looks superficially like this algorithm but actually isn't. Specifically, any site like HotOrNot, Digg, or Gather that lets users decide what to vote on, is vulnerable to the attack of using friends or phantom users to vote yourself up (or to vote someone else down). In a recent thread on Gather about a new contest that relied on peer ratings, many users lamented the fact that it was essentially rigged in favor of people with lots of friends who could give them a high score (or that ratings could be offset unfairly in the other direction by "revenge raters" giving you a 1 as payback for some low rating you gave them). I assume that the reason such sites were designed that way is that it just seemed natural that if your site is driven by user ratings, and if people can see a specific piece of content by visiting a URL, they should have the option on that page to vote on that content. But this unfortunately makes the system vulnerable to the phantom-users attack.
(Spam filters on sites like Hotmail also probably have the same problem. We don't know for sure what happens when the user clicks "This is spam" on a piece of mail, but it's likely that if a high enough percentage of users click "This is spam" for mail coming from a particular IP address, then future mails from that IP are blocked as spam. This means you could get your arch-rival Joe's newsletter blacklisted, by creating multiple accounts, signing them up for Joe's newsletter, and clicking "This is spam" when his newsletters come in. This is an example of the same basic flaw -- letting users choose what they want to vote on.)
So if the Wikia search site uses something like this "automeritocracy" algorithm to guard the integrity of its results, it's imperative not to use an algorithm vulnerable to the hordes-of-phantom-users attack. Some variation of selecting random voters from a large population of users would be one way to handle that.
Finally, there is a reason why it's important to pay attention to getting the algorithm right, rather than hoping that the best algorithm will just naturally "emerge" from the "marketplace of ideas" that results from different wiki-driven search sites competing with each other. The problem is that competition between such sites is itself highly inefficient -- a given user may take a long time to discover which site provides better search results on average, and in any case, it may be that Wiki-Search Site "B" has a better design but Wiki-Search Site "A" had first-mover advantage and got a larger number of registered users. When I wrote earlier about why I thought the Citizendium model was better than Wikipedia, several users pointed out that it may be a moot point, for two main reasons. First, most users will not switch to a better alternative if it never occurs to them. Second, for sites that are powered by a user community, it's very hard for a new competitor to gain ground, even with a superior design, if the success of your community depends on lots of people starting to use it all at once. You could write a better eBay or a better Match.com, but who would use it? Your target market will go to the others because that's where everybody else is. Citizendium is, I think, a special case, since they can fork articles that started life on Wikipedia, so Wikipedia doesn't have as huge an advantage over them as it would if Citizendium had to start from scratch. But the general rule about imperfect competition still applies.
It's a chicken-and-egg problem: You can have Site A that works as a pure meritocracy, and Site B that works as an almost-meritocracy but can be gamed with some effort. But Site B may still win because the larger environment in which they compete with each other, is not itself a meritocracy. So we just have to cross our fingers and hope that Search Wikia gets it right, because if they don't, there's no guarantee that a better alternative will rise to take its place. But if they get it right, I can hardly wait to see what changes it would bring about.
-
Viva Piñata Apparently 'For Girls'
An anonymous reader writes "Bill Gates has demonstrated his unique public speaking skillset again, this time by further alienating gamers who grew to love one of the best Xbox 360 titles of last year - Viva Piñata. Comments made by Mr. Gates during an interview on the Charlie Rose show include the choice comment 'We have a thing called Viva Piñata that's for young girls, where you're tending a garden and these animals come along...'. His comments are carried by Eurogamer, who also provide a link to the YouTube video of the interview. For gamers who really appreciated this under-marketed and lovably quirky title, this is just another low blow." -
Did Producer Timbaland Steal From the Demoscene?
gloom writes "In 2000 the Finnish demoscene musician Janne Suni (also known as 'Tempest') won the Oldskool Music Competition at the Assembly demoparty with his four-channel Amiga .MOD entitled 'Acid Jazzed Evening.' A Commodore 64 musician called 'grg' remade the song on the C64 (using the infamous SID soundchip); it is this remake that was stolen. The producer's name is Timbaland and he is one of the hottest names in American music these days. The track in question is called 'Do it' and it is featured on the Nelly Furtado album 'Loose' on the Geffen label. Getting nowhere with Geffen, the demoscene has now risen to the aid of Tempest, first by creating a stir at SomethingAwful (files downloadable from the forum), then at Digg.com, then on YouTube, with a video demonstrating the blatant ripoff. As a musician who posts online myself, I have to ask: what rights do I have if this should ever happen to me, and what can be done to raise awareness about such things?" -
Citizen Journalism Expert Jay Rosen Answers Your Questions
We posted Jay Rosen's Call for Questions on September 25. Here are his answers, into which he's obviously put plenty of time and thought. This is a "must read" for anyone interested in the growing "citizen journalism" movement either as a writer/editor or as an audience member -- and please note that Rosen and many others say, over and over, that one of the major shifts in the news media, especially online, is that there is no longer any need to be one or the other instead of both.
1) Where do you see newspapers' role in this?
by Stick_Fig
First off, my credentials: I'm the former employee of an experimental newspaper, Bluffton Today, located in Bluffton, South Carolina. It's an exciting place, let me tell you. The focus has been on reverse publishing but at the same time tempering blogs with traditional journalism. The staff still writes articles; they still edit heavily. They use the web only to the extent that it builds on its strengths without dipping into libel and slander. My question to you is, do you think Bluffton is on the right track? It felt like, in the 15 months I was there, they definitely were, but I'm a biased party. I left thinking, "If only newspapers did more of this..." I know what I'm betting the farm on in my career, and it isn't tired, boring, traditional journalism. It isn't the straight and narrow of blogs, either. Rather, I feel that it's important to look at both sides and find how they can work together, because God knows there's some 60-year-old editor somewhere who won't look at Bluffton as anything more than a gimmick. I'm gonna be that guy in the newsroom fighting the good fight to get more untraditional voices into the paper in more places than the editorial page.
Rosen:
Bluffton Today (Bluffton, SC is near Hilton Head Island) did several things that were important to try in 2005. They said the editorial engine would be the online edition; it would "produce" the printed paper. This is the opposite of how newspapers did things for the first ten years of their Web lives. They just re-purposed the content from the print edition, and called that an "online newspaper."
By reversing what's primary in production you change head sets in the newsroom because a professional newsroom engineers everything--including the talents of its employees--around the production ordeal. The "daily miracle" it was once called, because making the newspaper required such a fantastic act of just-in-time coordination. Many things had to be routinized for the miracle to occur. (Including ideas about journalism and the user's place in it.)
Steve Yelvington of Morris Digital Works, who worked on the Bluffton Today site, called it an "inversion" because content would flow from the Web to print rather than vice versa. The editorial engine should be the more interactive one, in which more of the community can participate. The goal was a virtuous circle. "Community conversation feeds professional journalism. Journalism feeds conversation. And around, and around." I think there is something to that idea.
How well it works is for people in Bluffton to address. I like that Bluffton Today tried to go Lessig on the news industry. It ditched the read-only platform and re-built on read/write. Yelvington said at the launch: "Everyone gets a blog. Not just staffers, but everyone in the community. Le Monde (France) and the Mail and Guardian (South Africa) are doing this, too." Giving everyone a blog may be an obvious idea. But it's a different track. "Everyone gets a photo gallery. Everyone can contribute events to a shared public community calendar...." The site was built on Drupal technology. It had free classifieds. It was different.
If the experience of doing Bluffton Today has tempered some of that initial boldness, that's as it should be. I'm not surprised that the staff still writes articles; they still edit heavily. A web-to-print, highly-interactive, low barrier to entry, read-write, everyone-contributes newspaper is still a daily production headache. Articles, photos, headlines, and ads have to come together. Unedited, the site would have almost no value, although it can have unedited parts with high value.
"It isn't tired, boring, traditional journalism. It isn't the straight and narrow of blogs, either. It's important to look at both sides..." I agree with that, Stick. My new adventure, NewAssignment.Net, is a hybrid site for that reason. (Pros and amateurs collaborate on reporting projects.) In January of 2005 I wrote Bloggers vs. Journalists is Over for the same reason.
Bluffton Today was a first-wave attempt at innovation. Today initiatives like that face some second-wave facts. Bringing capacity online does not itself create activity, so if you're counting on user activity, you'd better come with more than nifty new capacity. Create more writers and suddenly you may need more editors. "The conversation feeds journalism, journalism feeds the conversation" is a powerful idea, but we are several steps away from knowing how it works to create a live, intelligent filter in the newsroom.
There's just a long way to go. But yeah, you were on the right track working for those guys. Deeply so.
2) How to Get More Respect
by NewYorkCountryLawyer
I am convinced that online media have made a huge contribution to getting out the truth when the corporate media are seeking to suppress the truth. While there are a growing number of people aware of this phenomenon, reports in the 'blogosphere' just do not get the same respect and currency received by reports in the 'major' or 'corporate' media. What do we, as a community, need to do to enhance the respect internet journalists receive in the world at large?
Rosen:
Well, "suppressing" the truth is not how I see the failures of modern journalism, or of our current press. I think it's bigger than that.
Bob Woodward, who is in the news this week, is at the top of the reporting game, an industry unto himself. In two books, Bush at War and Plan of Attack, he failed to tell the truth about the Bush White House because his methods were not up to the obstacle they met: an administration that had broken through all the reality checks normally placed on a president and his closest aides. One by one these measures came under abnormal stress. The policy-making process used by presidents got subverted. The normal channels for sounding out opinion were just disowned. The intelligence community came under extreme stress when asked to supply facts for a decision already made.
A Congress controlled by the same party was expected to go along, which meant accepting the president's definition of reality. Oversight got evacuated. The normal tensions with the press were driven deeper: keep them back, keep them out, tell them nothing, tear them down. If someone does break a story from inside you immediately punish and isolate anyone who spoke to the reporter. You make them disown their words. You make them repent.
This is the story Woodward missed because he got inside it, so to speak. Ron Suskind, one of the few in Washington who did not miss that story, called it "the retreat from empiricism." To me, it's the big narrative yet to come out about the Bush White House. Attack Without a Plan was too crazy to be credible to Woodward. So he wrote Plan of Attack instead. I haven't read his new book yet, just the reviews and excerpts. But from early accounts, State of Denial is his attempt to get back the ground he lost, despite having the best access.
Woodward didn't "suppress" the story. Rather, he couldn't imagine it. Those are the kinds of failures that interest me. Sometimes things are suppressed. Often, the truth eludes professional journalism because no one thought to look for it. I welcome your question, What do we, as a community, need to do to enhance the respect internet journalists receive in the world at large? My first answer is: we have to look for it.
You know how, when you've really mastered something and there's a news account of it, the news story will invariably get several (basic) things wrong? Eliminate the several things and respect will rise. If you want to inform the world of something, grok it before you rock it is a good simple rule.
Correct ourselves early and often. Correct the reporting in the major media, early and often. Fact check your own ass first, then your neighbor's. We should major in transparency; the "major" media will take a minor in that. Diversity of outlook in the reporters ultimately improves the reporting. The blogosphere has advantages there, especially as it does more reporting.
I think we have to accept that Big Media, which isn't going anywhere, is society's default legitimacy-distribution machine. But that doesn't mean it works well. The machine itself can lose legitimacy without exactly falling apart. If you're an upstart publisher of news and you suck at it, Big Media will try to ignore you. If you're an upstart publisher of news and you're really good at it, Big Media will try to ignore you. Then when you assume the shape of a writes-itself story--first bloggers to go to the political conventions!--Big Media will over-cover you, spreading a small bit of understanding over lots and lots of stories. Six months later it's time to debunk the trend they missed, then over-hyped and finally misdescribed. It's not personal. It's protective. It's also cheaper than figuring out what's going on.
We can win a lot of points for Net journalism just by being the opposite of that.
3) What about mob-rule journalism?
by Chas
What sort of safeguards are in place to do fact-checking and prevent false/obviously slanted mob-rule style reports from being propagated as fact?
Rosen:
People hear phrases like "an experiment in open source reporting" and they see it immediately: What's open to the wisdom of the crowd is vulnerable to the actions of the mob. Wanting to be helpful, the volunteer may slant reports without realizing it. Through the portals marked "citizen," the paid operative can also go. How do you prevent all of that?
To me this is a puzzle with many pieces. It won't have one solution; it will take many overlapping systems working together. I can't tell you--yet--how we're going to build a fact-checking and verification system into NewAssignment.Net. But I can tell you that the site will fail without one, so we'll have to try to figure it out, with help from a lot of people. To simply pass along unchecked reports received from strangers over the Net would be fantastically dumb. To discount the possibility of people trying to game the system would be dumb, too; the more successful the site is, the more probable the gaming is. Not to mention spam, duplication, all kinds of junk.
What sort of safeguards are in place? Here are my answers so far. You tell me what is missing or cracked in this foundation:
One: The editors are full time on it. Assignments flow through editors several times before they are published by NewAssignment.Net. That's the pro-am way. It's an editor's job not to be gamed, not to publish bum facts. Everything that goes out has the editor's name on it. It's not an answer to everything--this reliance on "good editors"--but it's a proven system, a simple one, and a start.
Two: Users Self-Police. I'm not sure "community" is the right word for the eventual users of New Assignment. People use that term too loosely, in my opinion. But if NewAssignment.Net develops a base of active, loyal and intelligent users, it's not unreasonable that they can help police the site, especially if they understand that verifying information and preventing fraud are basic to everything we're trying to do. And so a second answer, after editors, is a culture among users: catch errors, catch mistakes, catch fraud and manipulation. A mob mentality has to be met by something stronger; if you attract the right kind of users, that can happen. It would be foolish to think it will just because you're counting on it.
Three: Given enough eyeballs, all facts can be checked. I think there is every chance of developing a special subgroup of users who are effective fact checkers of the larger base of contributors, including new and casual contributors. One thing we are definitely going to do is see whether retired journalists and ex-journalists will volunteer to work with other natural born sticklers and operate our fact-checking system, which not only has to work, but eventually be better than industry standard. I don't know yet what that system will look like, or how systematic it will be. One of my advisers is interested in this puzzle and working on some ideas, assisted by a professional fact checker who emailed me offering to help. That's how we are going to solve this. Social scientists call it "muddling through."
Four: The site itself has to make verification easy. I mean in the way it is built and meant to operate. For example, editors have to be able to sort the raw from the initially verified from the double checked. This is one of the challenges for the developers of the New Assignment site, which will be Chapter Three. It's a new partnership--here's an about page for them--formed by Zack Rosen, who is my nephew, one of the originators of DeanSpace and the co-founder of CivicSpace on the Drupal platform; and Josh Koenig, a co-founder of DeanSpace who started Music for America, a non-profit. They are both Drupal developers, active in that community. The third partner is Matt Cheney, who is trained as a librarian and worked as a researcher at the National Center for Supercomputing Applications.
They're going to build the site with open source tools. Josh Koenig has a post up about the New Assignment project. It promises an Open Practice model: "posting tutorials, video screen casts, interviews, and write ups as our own work progresses and as we research others." Verification and fact-checking have to become open practices themselves. The developers understand that.
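Point Four's requirement that editors be able to "sort the raw from the initially verified from the double checked" can be pictured as an ordered status field on each submission. This is a hypothetical Python sketch: the `Verification` stages and the `Report`, `promote` and `editor_piles` names are assumptions for illustration, not how Chapter Three actually built the site.

```python
from dataclasses import dataclass
from enum import IntEnum


class Verification(IntEnum):
    """Hypothetical verification stages, ordered from least to most checked."""
    RAW = 0
    INITIALLY_VERIFIED = 1
    DOUBLE_CHECKED = 2


@dataclass
class Report:
    title: str
    status: Verification = Verification.RAW


def promote(report):
    """Advance a report one stage after a check passes; DOUBLE_CHECKED is the ceiling."""
    if report.status < Verification.DOUBLE_CHECKED:
        report.status = Verification(report.status + 1)
    return report


def editor_piles(reports):
    """Split submissions into one pile per stage, so raw material never mixes with checked material."""
    piles = {stage: [] for stage in Verification}
    for report in reports:
        piles[report.status].append(report.title)
    return piles


if __name__ == "__main__":
    reports = [Report("tip from a reader"),
               Report("council vote tally", Verification.INITIALLY_VERIFIED)]
    promote(reports[1])  # a second checker confirms the tally
    for stage, titles in editor_piles(reports).items():
        print(stage.name, titles)
```

Using an ordered enum rather than free-text labels makes "is this at least initially verified?" a simple comparison, which is the kind of sorting the point above asks the site to make easy.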
Five: The one percent rule. Experience suggests a small slice of users will do most of the volunteer work. According to the one percent rule in social media, which is more of a tendency than a law, "if you get a group of 100 people online then one will create content, 10 will 'interact' with it (commenting or offering improvements) and the other 89 will just view it." This bears on the verification puzzle because we're not talking about "checking" vast hordes of people. If regular contributors provide most of the contributions, their reputations for reliability can accumulate at the site. In a well-designed system that will happen.
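The arithmetic behind point Five, and the idea that a reliability reputation builds up over many contributions, can be sketched in a few lines of Python. The 1/10/89 split is the rule of thumb quoted above; the `Contributor` record and its Laplace-smoothed score are hypothetical choices for illustration, not anything NewAssignment.Net has announced.

```python
from dataclasses import dataclass


def one_percent_split(users: int) -> dict:
    """Rough 1/10/89 split from the one percent rule (a tendency, not a law)."""
    creators = users // 100
    interactors = users * 10 // 100
    return {"create": creators, "interact": interactors,
            "view": users - creators - interactors}


@dataclass
class Contributor:
    """Hypothetical record of how one contributor's submissions fared in fact-checking."""
    name: str
    verified: int = 0  # submissions that survived checking
    rejected: int = 0  # submissions that failed checking

    def record(self, passed: bool) -> None:
        if passed:
            self.verified += 1
        else:
            self.rejected += 1

    def reliability(self) -> float:
        """Laplace-smoothed share of checked submissions that held up.
        A newcomer starts at 0.5; the score only moves far from that
        once a track record accumulates."""
        return (self.verified + 1) / (self.verified + self.rejected + 2)


if __name__ == "__main__":
    print(one_percent_split(10_000))  # {'create': 100, 'interact': 1000, 'view': 8900}
    regular = Contributor("regular")
    for passed in [True] * 18 + [False] * 2:
        regular.record(passed)
    print(round(regular.reliability(), 3))  # 0.864
```

The point of the smoothing is that a single lucky or unlucky submission barely moves a newcomer's score, so reputation reflects the sustained work of the small group that does most of the contributing.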
Six: How have others solved the problem? You tell me: has creating a reliable system of volunteer contributors ever been faced before on the Web? Did it prove unsolvable? I would expect NewAssignment.Net to look at prior cases first and find the key lessons.
4) Money
by truthsearch
Do you believe that as money flows into civic journalism it'll change the equation? Obviously there are some people whose primary goal is to become famous and/or make money through more open journalism. Will the large community of contributors flush out those with less altruistic intentions? I guess I'm really asking whether civic journalism will be self-correcting as it gets bigger. Or is there a way it may become just as corrupted as much of the current mainstream professional journalism?
Rosen:
I doubt there's any incorruptible system, just different kinds of pressures, with greater and lesser freedoms for the journalists involved. We can certainly hope for a self-correcting system, but it's not likely to happen on its own.
There's nothing wrong with seeking recognition for great work. People who want to become famous or make a salary through the more open forms in Net journalism aren't the enemy. Not at all. But they are going to have to work with users under conditions that build trust and permit collaboration. It's hard for me to see how the bad actors will succeed at that, but I am not discounting it, either.
Here's a site called Sportingo. It says it's a "new type of sports media company," which is "focused on telling the story from the fans' perspective." Users can write articles, which will be "professionally edited." They can rate and comment on articles written by peers.
Sportingo will own all the content published on the site. There are no plans to pay contributors. The company is for-profit. Tal Rozow, the marketing manager, told me that "Sportingo authors aiming for a professional writing career will be able to benefit from having by-lines appearing on our website." He said he's confident that a strong network of independent sports writers will emerge at the site, and maybe that will happen. But I'm not sure it's a system designed to build trust among all the players involved.
Everyone I have consulted about open source projects of any kind has stressed one thing over and over: the importance of understanding what would motivate people to contribute to the gift economy of the project. You have to get that right, they say. Ultimately I believe a non-profit foundation is a more secure footing for that kind of project. If there are profits and they are extracted by the owners, not distributed to co-creators, that's a problem. If there are profits and they go into doing more and better journalism, that's different.
5) What's wrong with other extant examples?
by crush
I'm assuming that you evaluated and rejected some of the other high-profile citizen journalism outfits that predate the founding of your own project. Off the top of my head I can think of:
* The Indymedia network is one of the longest standing examples of an attempt to have a large citizen journalist network.
* The Pacifica Network (especially the Democracy Now show)
* The New Standard
What was it that you found lacking in the above and why did you decide to start a new project instead of reforming and adapting one of the above? Do you think that your decision to accept corporate sponsorship (which is rejected by the Pacifica Network) will see your organization's focus inevitably drift toward the anodyne ineffectiveness of e.g. NPR?
(And of course, how could I forget WikiNews?)
Rosen:
There's nothing "wrong" with these prior examples. I admire them all. I was especially pleased to see that the New Standard met its do-or-die fundraising goal last week. That site is an experiment with reader-supported, totally independent, strenuously-factual reporting. High standards of verification are meant to prevail. I think the New Standard has a lot in common with professional journalism, except it rejects the political economy of commercial news media entirely. It's run as a collective among those who do the work. I am thrilled that it will remain around, because we need to try lots of solutions to how to fund serious reporting. Just as I'm thrilled that Independent Media Center and its collectives around the world keep humming. I agree with Chris Anderson that what blogging begat--citizen journalism--Indy Media begat, too.
I didn't "evaluate and reject" the New Standard, Indy Media, Pacifica and Wiki News. Nor is it my place to decide they need fixing. They don't. The people who founded those organizations deserve a lot of credit for creating something new and daring-- and genuinely alternative. They inspired me. So did lots of others. (New West, for example, or Witness.org.) NewAssignment.Net is really about a single proposition: that if journalists and networks of users can report stuff together that neither could easily do alone, the public sphere will benefit and the site will build trust. I think there's room for that.
My decision to accept $100,000 from Reuters means we'll have an editor who can test the possibilities in networked journalism, as Jeff Jarvis calls it. My job is to make sure that Reuters has no influence on that person. The company has said it will have no editorial control, and no claim on the content. I agree: it won't. I think we can persuade users that it works as advertised. But people are free to draw their own conclusions about what the gift means, and I'm sure they will.
6) Plagiarism and Ethics?
by goombah99
Lately there have been a few incidents of plagiarism in the news, not to mention some wholesale ethical breaches involving faked stories (e.g. Jayson Blair at the NY Times and "A Million Little Pieces"). But the thing is, the reason those are news is that they are both exceptional and something that is specifically drummed into every professional journalist not to do. Indeed, breaking this taboo is probably even more of a sin to fellow journalists than to the general public because of this entrenched ethic.
Yet we know that on college campuses, where we can measure the phenomenon, plagiarism is comparatively rampant. So evidently the common man cannot restrain himself.
It seems to me this is a serious issue for any new journalism form with a low barrier to entry and a high degree of anonymity for the author. How does this ethos get enforced in such a realm?
A related question is the ethical division of commentary and news. We know that's become a problem in the media for some outlets where management has a thumb on the content. But the traditional news organs, especially newspapers, still refrain for the most part. Indeed, the NY Times just went so far as to remove the typeset justification from any article that contained any sort of analysis or opinion, reserving the justified typesetting for only traditional factual journalism stories so the difference is apparent to the reader from the start. How do we reinforce that ethos in the untrained journalist?
Rosen:
When people plagiarize they do it for a particular self-interested reason: to meet a deadline, get an unwanted task out of the way, get their full time salary with limited work. These motivations will probably be rarer in the New Assignment model. Why volunteer for a project only to cheat at it?
"The common man cannot restrain himself." Sorry, I don't trust that kind of language. Beyond that, making stuff up is not a way to develop a base of users on the Web; people aren't that dumb! You speak of a "low barrier to entry and a high degree of anonymity for the author." But for most users the higher the anonymity factor for the author, the higher the barrier of trust.
What some people can't seem to get over is that other people can say any damn thing they want on the Internet! How can you trust any of it? is their natural reaction to all open systems. Closed systems--and professional journalism is one--develop trust in one way. Open systems have to do it a much different way. Expecting one to look like the other is unreasonable.
We aren't going to learn much about this puzzle by asking how the "common man" can be trained to imitate his betters in the news media. I refer you to sociologist Raymond Williams, who once said, "There are no masses, there are only ways of seeing people as masses." It is these ways of seeing that are retrograde. But they show up in the most surprising places.
7) Scale
by FuturePastNow
First, I'll admit that I haven't read much about citizen journalism other than Jeff Jarvis' [buzzmachine.com], but as a non-blogger thinking of getting into it, I was wondering:
Much of the discussion seems to be about getting out from under the control of "gatekeepers" like publishers and media owners. Yet, while the internet is less concerned with money, it has its own form of currency: popularity, in the form of the link.
Doesn't this just turn the highest-traffic sites into new gatekeepers? Especially as the number of blogs increases, the gap between "rich" and "poor" expands?
I suppose what I'm really asking is, it's hard enough to get noticed today; how will someone just starting out get noticed ten years from now?
Rosen:
Ten years from now? Jeez, I have no idea what the world of media access will be like then. But anyone who is just starting out in self-publishing should consult Clay Shirky's Power Laws, Weblogs, and Inequality, so as not to become prematurely disillusioned by discovering its truths later on.
Certainly there are new gatekeepers. (Slashdot itself is one. But does it work the same way the old system did?) Traffic-wise, there's still rich and poor. (But is this list as static as that one?) Hierarchies have not gone away. (And who said they would?) Inequality has not disappeared. (But did you really think it could?)
You still have to fight to be noticed; good work can still go unnoticed. Life online is not entirely fair, or completely different. There's a new attention economy to replace the old. The sooner we reconcile ourselves to these common sense conclusions, the easier it will be to see what is actually different today.
Here are some things that stand out for me: Amateurs have joined professionals and they own a part of "the press." An audience that was once connected "up" to Big Media but not across to each other is now connected both ways. The cost for like-minded people to locate each other and collaborate has fallen dramatically. The tools of media production have been widely distributed, and broad distribution of content is no longer impossible for small, upstart producers. Professionals are not required to affiliate with Big Media in order to operate as journalists, though most will. They can be stand-alones and independents. The people formerly known as the audience (as I call them) are now a productive force to be reckoned with, and Big Media has just started that reckoning. The Net has new ways of distributing attention, which have taken their place alongside the old.
Still, there's a long way to go before we can say that our media system has been made more democratic, responsive and responsible.
8) What impact would this have on national elections?
by StressGuy
The electoral process seems to be more of a "marketing contest" and marketing takes bags and bags of money. There's commercial time, signs, billboards, radio, etc. Let's face it, a commercial is, at most, 90 seconds to tell me why I should vote for you - hardly enough time. So, all we see are glittering generalities or, all too often, "don't vote for the other guy" spots.
If "Citizen Journalism" takes off, do you see this as a way that candidates without the massive financial resources normally required to sustain a traditional campaign could actually compete? Could this make the "third party candidates" a credible threat? Could this actually serve to "level the playing field"?
Rosen:
We should be cautious here. I think the most we can say is that a system that was almost entirely closed and self-sustaining--in which a handful of people raised the money, took the polls, handled the candidates, made the ads, narrated the campaign and talked about the candidates on TV--has been disrupted. The people who ran it are not as confident as they once were in their ability to manage things and get the outcomes they want. Their party has been crashed, but it's not "over." Nor is it "ours."
It's possible that insurgent candidacies--not backed by current players in the system--will have an easier time of it in the years ahead, just as insurgent news providers have more of an opening now. That's as far as I would go on the leveled field.
9) Dilution of Protection
by ObsessiveMathsFreak
How long before corporations and wealthy individuals start employing goons, lawyers and wiretaps, a la HP, to threaten and intimidate citizen journalists with no real legal recourse? If faced with this, should a citizen journalist just back off and let the guilty win? How can the protections now enjoyed by the fourth estate be extended to citizen journalism without diluting them?
Rosen:
As a matter of law and public policy, I think "fourth estate" protections should focus on significant acts of journalism, not people in pre-fab categories or the kind of organization that surrounds the giver of news. All those who are engaged in the act of informing a broader public of what's going on deserve to be under the First Amendment umbrella that protects the press. The press itself is composed of amateur and professional wings.
But that's no answer to goons with lawyers who threaten to sue. Citizen journalists are definitely vulnerable there, which makes you realize why we have big media organizations in the first place. We have to be more creative. Robert Cox, head of the Media Bloggers Association (I am a founding member of the group) has shown that "an orchestrated campaign by bloggers to defend a fellow blogger in what appears to be a frivolous lawsuit" can work. That's encouraging but not a complete answer, either. Legal intimidation will happen, and I'm sure there will be times when the bad guys will win.
10) Blogging
by From A Far Away Land
When asking a primary source for information, I find that telling them I'm doing so to create a report on my blog tends to make them clam up, or remain unwilling to provide information that ought to be publicly available. What technique or phrases should I use to convince the interviewee that I both have a legitimate use for their information, and the right to obtain it?
Rosen:
Sometimes you have a right to obtain information from a primary source. Sometimes it's not a matter of your rights but their decision to recognize you and cooperate. If search costs are high for making an informed decision about whether to trust a blogger who shows up with questions, sources will seek to reduce costs by using reputation and even stereotype (bloggers: ugh) as proxies.
I don't think there's a proper technique or a magic phrase that will solve this problem. There's only one solution I can see. Send the guy the URL for the "about" section of your site. That page ought to persuade potential sources that legitimate use will be made of their information. It should tell them what you are up to, and why. The site itself, the reporting and commentary there, is the best reason any source has to cooperate. Ah, but how do you convince them to take the time and look?
There's at least one way. Break a story so that the source's world is talking about it and next time around the source will speak to you-- and go to your About page. I asked Dean Wright of Reuters what the biggest obstacle for NewAssignment.Net will be when it launches. "The same one that the more minor players in the mainstream media have: getting your calls returned," he said. "Then when you complete a project and publish, you may find that other media outlets are reluctant to pick up your stories." The only answer to that is "do some compelling projects that cannot be ignored."
NewAssignment.Net will try to take that advice. It will take stories developed by users and turn them into assignments that are given to journalists. It could also do stories developed by journalists and divided into parts for users to assign themselves. (Mechanical Turk meets the Center for Public Integrity.) I hope it will do stories where teams of users and journalists figure out the division of labor together.
Sometimes the network will be the knowledge producer, the journalist the enabler. Other times the journalist will be the producer, and the network the enabler. Pro-am journalism is not inherently better than am-pro. Amateur users could in some cases do it all themselves, with editors watching and giving the green light in stages. Different combinations beg to be tried. It's unwise to say in advance that we know how it will work, or that it can't. -
Growing Censorship Concerns at Digg
I find site rivalries boring, but growing concerns over Digg "censorship" have been submitted steadily for the last few months. Today two more such stories were submitted; they have become so numerous that I had little choice but to post. The first claims that Digg is the editors' playground: it explains how a few users control Digg, and argues that it's not really the 'Democracy' it claims to be. Personally, I think it is entirely within the rights of their editors to choose content however they like. But it's less pleasant when combined with accounts getting banned for posting content critical of Digg, and other content getting removed for being critical of sponsors (also, here is Kevin Rose's reply). -
The Rise of Digg.com
An anonymous reader writes "Wired has a story about Digg, a community bookmarking site that creates its own version of the Slashdot effect. It's a provocatively titled piece - 'Digg Just Might Bury Slashdot' - but goes on to consider the obvious similarities between the two and the differences. Digg is more chaotic, immediate and user driven, whereas Slashdot features more in-depth and technical discussions." Well, I hate navel-gazing news, but I think the aggregation of blogs is a critical step in the future of online content, and Digg is doing good work here. The interesting thing will happen when their population grows a bit more. Scalability is hard... but I imagine the millions of dollars of VC funding will really help. -
Technology Behind Plasma Displays
digg writes "CoolTechZone.com has an in-depth article that gives an overview of how plasma displays work. From the article: 'So, what exactly is plasma? Plasma by definition is one of the four states of matter (along with solid, liquid and gas) and consists of positively and negatively charged particles, which are present in roughly equal quantities.' This makes the gas more or less electrically neutral overall but ensures that the charged particles are free to conduct electricity. Plasma can be produced if a gas is energized enough to strip electrons from its atoms, leaving positively charged ions and free negatively charged electrons. Most plasma displays use a mixture of noble gases such as neon and xenon."