Slashdot Mirror


Vandalism Detection Contest Sponsored For Wikidata (wsdm-cup-2017.org)

Remember when Bing Maps lost a city because they used bad Wikipedia data? An anonymous Slashdot reader writes: Since knowledge bases like Wikidata are poised to be integrated into all kinds of information systems, wrong facts are not just displayed on Wikidata's pages but may propagate directly to all systems using the knowledge base. Hence, detecting and reverting vandalism and other kinds of damaging edits is an even more important task than on Wikipedia. Recently, German scientists published the first machine learning-based approach to vandalism detection in Wikidata, and now Adobe is sponsoring a competition on vandalism detection, the WSDM Cup Challenge, awarding $2,500 for the best-performing solutions, which will also be published as open source.
"Given a Wikidata revision, compute a vandalism score denoting the likelihood of this revision being vandalism (or similarly damaging)," read the official rules, pushing for a near real-time solution to be submitted before December 22. The winners will also be invited to the headquarters of Wikimedia Germany to discuss implementing their solutions.

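As a rough illustration of the task quoted from the rules, the sketch below maps a revision to a score between 0 and 1. Everything in it is an assumption made for illustration: the `Revision` fields, the features, and the hand-set weights are hypothetical and are not taken from the contest rules or from the published approach. A real entry would presumably learn its weights from labeled revisions.

```python
import math
from dataclasses import dataclass


@dataclass
class Revision:
    """Hypothetical, simplified view of a Wikidata revision."""
    user_is_anonymous: bool
    comment: str
    changed_value: str


# Illustrative weights only; a real system would learn these from labeled data.
WEIGHTS = {
    "bias": -3.0,
    "anonymous": 1.5,
    "empty_comment": 0.5,
    "uppercase_ratio": 2.0,
    "repeated_chars": 1.5,
}


def extract_features(rev: Revision) -> dict:
    """Turn a revision into a small dictionary of numeric features."""
    text = rev.changed_value
    letters = [c for c in text if c.isalpha()]
    uppercase_ratio = (
        sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    )
    # True if any character occurs four or more times in a row (e.g. "aaaa").
    repeated = any(text[i] * 4 == text[i:i + 4] for i in range(len(text) - 3))
    return {
        "bias": 1.0,
        "anonymous": 1.0 if rev.user_is_anonymous else 0.0,
        "empty_comment": 1.0 if not rev.comment.strip() else 0.0,
        "uppercase_ratio": uppercase_ratio,
        "repeated_chars": 1.0 if repeated else 0.0,
    }


def vandalism_score(rev: Revision) -> float:
    """Logistic score in (0, 1); higher means more likely to be vandalism."""
    feats = extract_features(rev)
    z = sum(WEIGHTS[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    suspicious = Revision(True, "", "LOOOOOL")
    benign = Revision(False, "fix label", "Berlin")
    print(vandalism_score(suspicious), vandalism_score(benign))
```

With these made-up weights, an anonymous, uncommented, all-caps edit scores well above a commented edit by a registered user. Because the score is a plain weighted sum, it is cheap to compute per revision, which fits the near real-time requirement.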
3 of 38 comments

  1. Vandalism really? by Mashiki · · Score: 2

    Wikipedia has a bigger NPOV problem with its articles these days than vandalism, especially because of people camping on articles and the variety of meat puppets that banned editors use to push agendas.

    --
    Om, nomnomnom...
  2. Re:Authoritarianism does not valid data by Sique · · Score: 2
    Any database will always be susceptible to containing bad data, even one that follows the scientific methodology. Any data is only preliminary, and will be thrown out once better data comes in. What you totally ignore is how to determine which of two conflicting data points is closer to reality. Wikipedia doesn't do research. That's one very important concept of Wikipedia: no original research. If the people doing the original research lose interest in Wikipedia, or are run over by a bus, Wikipedia loses any reference point for the data they entered. So what you get is stale data and no way to find out if it is both valid and relevant. If you ban original research from Wikipedia, you at least can vouch for the relevance of the data by checking if research is still going on outside of Wikipedia, and if you can find someone to keep the Wikipedia data up-to-date.

    So your blurb about the scientific method is irrelevant for Wikipedia, as Wikipedia is just a mirror of what happens outside of it. You need other criteria to determine which data in Wikipedia to keep and which data to throw out. Checking for possible vandalism is just one of the methods to throw out irrelevant data and to keep relevant data that got overwritten by vandalism.

    --
    .sig: Sique *sigh*
  3. Re:Authoritarianism does not valid data by Mashiki · · Score: 2

    Not necessarily. You could have a second article about a heliocentric system, and maybe a third one discussing the merits of a geocentric and a heliocentric system. Just keep the original article about the heliocentric system intact!

    That's a fair point; however, under today's rules at Wikipedia, along with the cock-gobbling editors, your topic on heliocentric systems would likely be flagged for deletion because it's non-notable (akin to denialism), or doesn't conform to the ruling form of orthodoxy. The sources, regardless of whether or not they're factual, would suddenly be marked as unreliable, even if they had provable baseline statistical models with the peer-reviewed data to back them up.

    Wikipedia simply needs to be purged of all editors and the foundation at this point, with a full start over. It doesn't help that ol' Jimbo prefers to wash his hands of everything while saying "I need monies..."

    --
    Om, nomnomnom...