Developing a Vandalism Detector For Wikipedia
marpot writes "In an effort to assist Wikipedia's editors in their struggle to keep articles clean, we are conducting a public lab on vandalism detection. The goal is the development of a practical vandalism detector that is capable of telling apart ill-intentioned edits from well-intentioned edits. Such a tool, which will work somewhat like a spam detector, will free up the crowd's workforce currently occupied with manual and semi-automatic edit filtering. The performance of submitted detectors will be evaluated based on a large collection of human-annotated edits, which has been crowdsourced using Amazon's Mechanical Turk. Everyone is welcome to participate."
It's called ClueBot. It's been known to revert vandalism in under 30 seconds :)
Summation 2
Amazingly, my small sample suggests the contrary.
I fix small errors of syntax/grammar/fact when I run across them, have never created an account, and almost all of my edits seem to stick.
Rgds
Damon
http://m.earth.org.uk/
Right now, you can think of Wikipedia as having two columns per article: the first is the working article column, and the second is the discussion column.
What we really need is a third column, one for the currently published version of the article.
While this may not be popular, it would go a long way to getting rid of the spam, and might even solve some of the other issues facing Wikipedia.
With such a system, you could even assign articles to a subject matter expert as the editor, who could approve changes, or just incorporate the best ones.
Not every article would need to have this, but as articles mature, they could move to this over time.
This is overestimated by far. Depending on how elaborate your edit model is, you can analyse edits live on a laptop.
Since the problem is tantalizingly easy to frame as a standard data-mining or machine-learning problem, albeit with some quirks, there's quite a lot of work from a lot of research groups that seems to be looking at it. Some examples: one, two, three, four, five, six, seven.
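To make that framing concrete, here's a minimal sketch of treating an edit as a classification example: turn the (old revision, new revision) pair into a few numeric features and score them. Everything in it is an assumption made for illustration: the feature set, the tiny `VULGAR_WORDS` list, the hand-set weights, and the 0.5 threshold stand in for a model that a real detector would train on annotated edits. It is not how any of the existing tools or research systems actually work.

```python
import re

# Hypothetical word list; a real detector would use a much larger lexicon.
VULGAR_WORDS = {"stupid", "sucks", "idiot"}


def edit_features(old_text: str, new_text: str) -> dict:
    """Turn one edit (old revision, new revision) into numeric features."""
    old_words = re.findall(r"[A-Za-z']+", old_text)
    new_words = re.findall(r"[A-Za-z']+", new_text)
    old_set = {w.lower() for w in old_words}
    # Words that appear in the new revision but not in the old one.
    inserted = [w for w in new_words if w.lower() not in old_set]
    letters = [c for c in "".join(inserted) if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    return {
        "size_change": len(new_text) - len(old_text),
        "upper_ratio": upper_ratio,
        "vulgar_count": sum(w.lower() in VULGAR_WORDS for w in inserted),
    }


def vandalism_score(features: dict) -> float:
    """Hand-set weights standing in for a trained model (assumption)."""
    score = 0.0
    if features["vulgar_count"] > 0:
        score += 0.6
    if features["upper_ratio"] > 0.5:
        score += 0.3
    if features["size_change"] < -200:  # large deletion, e.g. page blanking
        score += 0.3
    return min(score, 1.0)


def is_vandalism(old_text: str, new_text: str, threshold: float = 0.5) -> bool:
    """Flag the edit when its score reaches the (arbitrary) threshold."""
    return vandalism_score(edit_features(old_text, new_text)) >= threshold
```

With these made-up weights, appending shouted insults to a sentence gets flagged, while a quiet rewording of the same sentence scores zero. Surface features like these only catch the crude cases.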
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
There is an art to Wikipedia abuse. If someone cites a Wikipedia article in some argument they're making, you can always just go to Wikipedia and edit the page so that they're wrong. But that's what a novice Wikipedia vandal does.
A pro knows to edit the article in a very subtle way, so that it looks like the person has poor reading comprehension. Let's say the person cites a Wikipedia article with a sentence like this, in order to support the argument that Colbert is a Democrat.
Although by his own account he was not particularly political before joining the cast of The Daily Show, Colbert is a self-described Democrat.[12][13]
This bears the mark of authority, because of the footnote superscripts that are already on it. (We can skip the step where we maliciously relocate them here.)
A novice might change it to this (correctly preserving the authoritative footnote superscripts):
Although by his own account he was not particularly political before joining the cast of The Daily Show, Colbert is a self-described Republican.[12][13]
It makes the person appear to be wrong, but the vandalism is obvious, like swapping Eurasia for Eastasia. There's no way he could have misread that.
But change it to this
Although by his own account he was not particularly political before joining the cast of The Daily Show, Colbert has even been described as a Democrat.[12][13]
and the person looks not only wrong, but plausibly wrong because it looks like he can't read. That's what makes successful Wikipedia vandalism an art.
I believe that vandalism on Wikipedia can be limited. But would it really be possible to detect all kinds of vandalism?
FTA:
"Yahoo! Research will award a cash prize of 500 Euros to the winner of the plagiarism detection task."
500 Euros doesn't sound like much for detecting plagiarism on a site like Wikipedia...
I edit Wikipedia occasionally, and one thing I remove is unmotivated links to companies, or unnecessary mentions of specific products. So yes, I consider it a case of vandalism. Since my edits are usually (always?) kept, I think most people agree. There is probably some policy about it, but I act on common sense there.
c++;
Case in point --- There is an article in Wikipedia about a certain country.
In that article, they blame their previous British colonial master for everything.
I tried to make some corrections to that article to make it more "neutral", and they changed it back within 10 minutes.
I tried again, and again they changed it back.
For the third time, I was warned by someone from Wikipedia (dunno if it's a volunteer or something) that I have no right to make any correction to that particular article anymore.
The "THEY" in question is the government of that country. They have a "cyber-patrol" group in charge of "online propaganda" and that Wikipedia article is one of their many lies, aka propaganda, they have put online.
Now, how do you define vandalism in this case?
Muchas Gracias, Señor Edward Snowden !