Developing a Vandalism Detector For Wikipedia
marpot writes "In an effort to assist Wikipedia's editors in their struggle to keep articles clean, we are conducting a public lab on vandalism detection. The goal is the development of a practical vandalism detector that is capable of telling apart ill-intentioned edits from well-intentioned edits. Such a tool, which will work somewhat like a spam detector, will release the crowd's workforce currently occupied with manual and semi-automatic edit filtering. The performance of submitted detectors will be evaluated based on a large collection of human-annotated edits, which has been crowdsourced using Amazon's Mechanical Turk. Everyone is welcome to participate."
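As a rough illustration of what "evaluated based on a large collection of human-annotated edits" could mean in practice, here is a minimal sketch that scores a detector's verdicts against human labels. The summary does not say which measure the lab uses; precision, recall, and F1 are assumed here only because they are common for spam-like detection tasks, and the function name `evaluate` is hypothetical.

```python
def evaluate(predictions, annotations):
    """Compare detector verdicts with human labels (True = vandalism).

    Returns (precision, recall, f1). The choice of these measures is an
    assumption for illustration, not the lab's stated scoring method.
    """
    pairs = list(zip(predictions, annotations))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)

    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1


# Four made-up edits: the detector flags the first two, humans flagged the first and third.
precision, recall, f1 = evaluate([True, True, False, False],
                                 [True, False, True, False])
```

Low precision would mean good-faith edits get reverted, and low recall would mean vandalism slips through, so a usable detector has to do reasonably on both.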
It's called Clue Bot. It's been known to revert vandalism in under 30 seconds :)
Summation 2
Amazingly my small sample is to the contrary.
I fix small errors of syntax/grammar/fact when I run across them, have never created an account, and almost all of my edits seem to stick.
Rgds
Damon
http://m.earth.org.uk/
Right now, you can think of wikipedia as having two columns per article - first is the working article column, with the second being the discussion column.
What we really need is a third column, one for the currently published version of the article.
While this may not be popular, it would go a long way to getting rid of the spam, and might even solve some of the other issues facing wikipedia.
With such a system, you could even assign articles to a subject matter expert as the editor, who could approve changes, or just incorporate the best changes in.
Not every article would need to have this, but as articles mature, they could move to this over time.
This is overestimated by far. Depending on how elaborate your edit model is, you can analyse edits live on a laptop.
Since the problem is tantalizingly easy to frame as a standard data-mining or machine-learning problem, albeit with some quirks, there's quite a lot of work from a lot of research groups that seems to be looking at it. Some examples: one, two, three, four, five, six, seven.
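To make the "standard machine-learning problem" framing concrete, here is a minimal sketch of turning one edit into a feature vector and applying a crude rule to it. Everything in it is an assumption for illustration: the feature names, the tiny `VULGAR_WORDS` list, and the thresholds are hypothetical, and this is not the method of any of the research groups mentioned above. A real detector would learn its weights from annotated edits instead of hard-coding them.

```python
import re

# Hypothetical word list, for illustration only.
VULGAR_WORDS = {"stupid", "sucks", "idiot"}


def edit_features(old_text, new_text):
    """Extract a few simple features from an edit (old revision -> new revision)."""
    old_words = {w.lower() for w in re.findall(r"[A-Za-z']+", old_text)}
    new_words = re.findall(r"[A-Za-z']+", new_text)
    # Words in the new revision that did not occur anywhere in the old one.
    added = [w for w in new_words if w.lower() not in old_words]
    letters = [c for c in "".join(added) if c.isalpha()]
    return {
        "added_words": len(added),
        "upper_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
        "vulgar_count": sum(w.lower() in VULGAR_WORDS for w in added),
        "size_change": (len(new_text) - len(old_text)) / max(len(old_text), 1),
    }


def looks_like_vandalism(features):
    """Crude hand-set rule standing in for a trained classifier (thresholds are guesses)."""
    return (features["vulgar_count"] > 0
            or features["upper_ratio"] > 0.8
            or features["size_change"] < -0.9)


old = "Colbert is a self-described Democrat."
blatant = "Colbert is a self-described Democrat. HE SUCKS"
subtle = "Colbert has even been described as a Democrat."

blatant_flagged = looks_like_vandalism(edit_features(old, blatant))
subtle_flagged = looks_like_vandalism(edit_features(old, subtle))
```

Features this cheap are why live analysis on a laptop is plausible. The same sketch also shows one of the quirks: a shouting, vulgar edit trips the rule, while a quiet rewording that changes the meaning passes untouched.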
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
There is an art to Wikipedia abuse. If someone cites a Wikipedia article in some argument they're making, you can always just go to Wikipedia and edit the page so that they're wrong. But that's what a novice Wikipedia vandal does.
A pro knows to edit the article in a very subtle way, so that it looks like the person has poor reading comprehension. Let's say the person cites a Wikipedia article with a sentence like this, in order to support the argument that Colbert is a Democrat.
Although by his own account he was not particularly political before joining the cast of The Daily Show, Colbert is a self-described Democrat.[12][13]
This bears the mark of authority, because of the footnote superscripts that are already on it. (We can skip the step where we maliciously relocate them here.)
A novice might change it to this (correctly preserving the authoritative footnote superscripts):
Although by his own account he was not particularly political before joining the cast of The Daily Show, Colbert is a self-described Republican.[12][13]
It makes the person appear to be wrong, and the vandalism is obvious, like swapping Eurasia for Eastasia. There's no way he could have misread that.
But change it to this:
Although by his own account he was not particularly political before joining the cast of The Daily Show, Colbert has even been described as a Democrat.[12][13]
and the person looks not only wrong, but plausibly wrong because it looks like he can't read. That's what makes successful Wikipedia vandalism an art.
I believe that vandalism on Wikipedia can be limited. But would it really be possible to detect all kinds of vandalism?
FTA:
"Yahoo! Research will award a cash prize of 500 Euros to the winner of the plagiarism detection task. "
500 Euros doesn't sound like much for detecting plagiarism on a site like Wikipedia...
I've seen signs of that too. Not always ... but often enough to have acquired a rather negative understanding of the role of some folk with admin privileges at WP. It's clear when they haven't even bothered to read (much less understand!) the edits they revert. Or that they just revert anything that offends an ideology they want WP to present on any particular topic. They think NPV shouldn't apply to their gloriously elevated selves. (And refuse to acknowledge when their ideology is showing.)
That's on top of editors just flagging articles as sub-par but without saying specifically why, or responding to queries about WTF they meant. Not every article should consist of 50% citations and 50% content ... if you're going to say there aren't enough citations, just be specific about which statements you think need citations; that's easy to do. And maybe ... read the citations which are already there. Or even use the Talk: page appropriately, to discuss such issues, if you can't yet be specific enough to be actionable.
The message some admins give is that if you're not part of their particular club, Please Go Away. Some are even quite public that they object to edits from folk without accounts ... regardless of the content of those edits. Way too many obnoxious A**hats have admin privs there.
How about letting us flag such editors/admins as comment spammers? It's not like their volume of vague and un-actionable criticisms, or inappropriate reversions, really helps improve WP. And unlike those of real spammers, their negative effects are actually hard to correct.
I edit wikipedia occasionally, and one thing I remove is unmotivated links to companies, or unnecessary mentioning of specific products. So yes, I consider it a case of vandalism. Since my edits are usually (always?) kept, I think most people agree. There is probably some policy about it, but I act on common sense there.
c++;
Case in point --- There is an article in Wikipedia about a certain country.
In that article, they blame their previous British colonial master for everything.
I tried to make some corrections to that article to make it more "neutral", and they changed it back within 10 minutes.
I tried again, and again they changed it back.
For the third time, I was warned by someone from Wikipedia (dunno if it's a volunteer or something) that I have no right to make any correction to that particular article anymore.
The "THEY" in question is the government of that country. They have a "cyber-patrol" group in charge of "online propaganda" and that Wikipedia article is one of their many lies, aka propaganda, they have put online.
Now, how do you define vandalism in this case?
Muchas Gracias, Señor Edward Snowden !
I'm the OP.
Anything else you're too lazy to find yourself?
I recognize that voice anywhere; you must be a Wikipedia Admin. I've been editing Wikipedia for years, but didn't know about the second two lists (the first isn't really a list of reversions, but perhaps there's a way to make it work). If I don't, then I suspect many others don't.
Which brings us back to my point: Those lists need to be part of a system -- an easily accessible, understandable system -- "for non-admins to challenge actions (without spending countless hours in an appeal process worthy of a federal court)." I don't have time to find and study every function, rule, and procedure on Wikipedia that might apply. The overhead of editing is so high -- primarily because of admin abuse -- that I've stopped doing it. The frustration of dealing with people who behave poorly doesn't help.