Developing a Vandalism Detector For Wikipedia
marpot writes "In an effort to assist Wikipedia's editors in their struggle to keep articles clean, we are conducting a public lab on vandalism detection. The goal is the development of a practical vandalism detector that is capable of telling ill-intentioned edits apart from well-intentioned ones. Such a tool, which will work somewhat like a spam detector, will free up the crowd's workforce currently occupied with manual and semi-automatic edit filtering. The performance of submitted detectors will be evaluated based on a large collection of human-annotated edits, which has been crowdsourced using Amazon's Mechanical Turk. Everyone is welcome to participate."
Which part is over-estimated? All I can speak to from experience is AntiVandalBot. I ran that on an Athlon XP 2500+ (which wasn't particularly amazing at the time). It wasn't the computation that was hard; it was the network usage of downloading the diff of every edit by a non-trusted user from the RC feed. I would not have been able to run it on any home Internet connection. Thankfully I was able to place my server on an unthrottled 100 Mbps dorm connection at the University of Maryland.
I will grant you that high-speed Internet access has become a lot more widespread since 2006 (I personally have 25/15 FiOS), but at the time, there wasn't anything available residentially that could handle it.
Cyde Weys Musings - Scrutinizing the inscrutable