Developing a Vandalism Detector For Wikipedia

Posted by kdawson on Sunday February 28, 2010 @08:45AM from the false-positives-would-hurt dept.

marpot writes "In an effort to assist Wikipedia's editors in their struggle to keep articles clean, we are conducting a public lab on vandalism detection. The goal is the development of a practical vandalism detector that is capable of telling apart ill-intentioned edits from well-intentioned edits. Such a tool, which will work somewhat like a spam detector, will release the crowd's workforce currently occupied with manual and semi-automatic edit filtering. The performance of submitted detectors will be evaluated based on a large collection of human-annotated edits, which has been crowdsourced using Amazon's Mechanical Turk. Everyone is welcome to participate."

{{uw-vandalism1}} by Anonymous Coward · 2010-02-28 08:57 · Score: 5, Funny

Welcome to Slashdot. Although everyone is welcome to contribute to Slashdot, at least one of your recent posts did not appear to be constructive and has been modded down. Please use TrollTalk for any test edits you would like to make, and read the welcome page to learn more about contributing constructively to this web site. Thank you.
Re:Existing by marpot · 2010-02-28 09:38 · Score: 5, Informative

We have studied the accuracy of ClueBot and found that (on a small corpus) it has very good precision (few false positives) but very low recall (low true positive rate). (See: http://www.uni-weimar.de/medien/webis/publications/downloads/papers/stein_2008c.pdf) But the picture might look quite different on a large scale.
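
For readers unfamiliar with the two measures, here is a minimal sketch of how precision and recall could be computed for a detector's flags against human annotations. The function name and the ten labels are made up for illustration; they are not ClueBot's results or data from the paper.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for boolean vandalism flags.

    predicted[i] is True if the detector flagged edit i,
    actual[i] is True if a human annotated edit i as vandalism.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # vandalism, flagged
    fp = sum(1 for p, a in pairs if p and not a)    # good edit, flagged
    fn = sum(1 for p, a in pairs if not p and a)    # vandalism, missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels for ten edits (True = vandalism).
actual = [True, True, True, True, True, False, False, False, False, False]
predicted = [True, True, False, False, False, False, False, False, False, False]

precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case every flagged edit really is vandalism (high precision), yet most vandalism goes unflagged (low recall), which is the kind of trade-off described above.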
The Art and Science of Wikipedia Vandalism by MillionthMonkey · 2010-02-28 09:51 · Score: 5, Interesting

There is an art to Wikipedia abuse. If someone cites a Wikipedia article in some argument they're making, you can always just go to Wikipedia and edit the page so that they're wrong. But that's what a novice Wikipedia vandal does.

A pro knows to edit the article in a very subtle way, so that it looks like the person has poor reading comprehension. Let's say the person cites a Wikipedia article with a sentence like this, in order to support the argument that Colbert is a Democrat.

Although by his own account he was not particularly political before joining the cast of The Daily Show, Colbert is a self-described Democrat.[12][13]

This bears the mark of authority, because of the footnote superscripts that are already on it. (We can skip the step where we maliciously relocate them here.)

A novice might change it to this (correctly preserving the authoritative footnote superscripts):

Although by his own account he was not particularly political before joining the cast of The Daily Show, Colbert is a self-described Republican.[12][13]

It makes the person appear to be wrong, but the vandalism is obvious, like swapping Eurasia for Eastasia. There's no way he could have misread that.

But change it to this:

Although by his own account he was not particularly political before joining the cast of The Daily Show, Colbert has even been described as a Democrat.[12][13]

and the person looks not only wrong, but plausibly wrong because it looks like he can't read. That's what makes successful Wikipedia vandalism an art.
Re:Existing by beakerMeep · 2010-02-28 10:41 · Score: 5, Insightful

The problem is not so simple though. You can't quantify something as subjective as vandalism. You can't reduce it to your mathematical formula, no matter how statistically fancy your 6-page PDF is.

I had a particularly nasty run-in with ClueBot where I removed large portions of spam from an article, only to have ClueBot revert my edit and put the spam back in. When I removed the spam again, some other editor strolled by and put it back once more, because he trusted the bot more than humans and he didn't read the talk page, where many had requested the removal of this spam. Finally, after a rather rude conversation, the human realized he had no business reverting it. This person was a long-time editor and contributor too, but it just serves as an example that any criteria used to determine spam are based upon assumptions: assumptions that they will hold true in other cases, and assumptions that others will agree with the classification.

The whole point of Wikipedia is that it is a community-edited encyclopedia. I have no interest in a computer-edited encyclopedia. If people want to program bots to review an editor's work, perhaps we should program bots to write the work? Perhaps you can call it Botopedia. Furthermore, many of the bots ask you to report false positives to their personal pages off of Wikipedia's website, on some other .com or .edu domain. They ask you to be accountable to them, but who are they accountable to? What's to stop spammers from programming bots to annoy editors as a phishing exercise?

Now don't get me wrong though, if someone wants to use a bot to aid in finding vandalism, that would help. But if the system is so frail that Wikipedia can't exist without computer program editors, it may be time to revisit the system. As others have stated, pushing edits into a queue would be much more sane than direct-to-live edits.

Editing bots are wrong for Wikipedia, and if Wikipedia allows them, it is letting go of its vision of community participation in favor of the visions (or delusions) of grand technological solutions.

--
meep