Slashdot Mirror


Wikipedia and Plagiarism

Spo22a writes "Daniel Brandt found the examples of suspected plagiarism at Wikipedia using a program he created to run a few sentences from about 12,000 articles against Google Inc.'s search engine. He removed matches in which another site appeared to be copying from Wikipedia, rather than the other way around, and examples in which material is in the public domain and was properly attributed. Brandt ended with a list of 142 articles, which he brought to Wikipedia's attention.... 'They present it as an encyclopedia,' Brandt said Friday. 'They go around claiming it's almost as good as Britannica. They are trying to be mainstream respectable.'"
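
Brandt's program itself has not been published, so the following is only a minimal sketch of the idea under stated assumptions. Instead of querying Google, it checks a few sentences from an article against a local dictionary of candidate source texts. The function names (`split_sentences`, `normalize`, `find_matches`), the naive sentence splitter, and the `sample` and `min_words` parameters are all hypothetical.

```python
import re
from typing import Dict, List


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so formatting differences don't hide a match.
    return " ".join(text.lower().split())


def find_matches(article: str, sources: Dict[str, str],
                 sample: int = 3, min_words: int = 8) -> Dict[str, List[str]]:
    """Return {source_name: [sentences]} for sampled sentences found verbatim in a source.

    Hypothetical stand-in for a search-engine lookup: only the first `sample`
    sentences with at least `min_words` words are checked, because very short
    sentences match unrelated pages too easily.
    """
    candidates = [s for s in split_sentences(article)
                  if len(s.split()) >= min_words][:sample]
    normalized_sources = {name: normalize(body) for name, body in sources.items()}
    hits: Dict[str, List[str]] = {}
    for sentence in candidates:
        needle = normalize(sentence)
        for name, body in normalized_sources.items():
            if needle in body:
                hits.setdefault(name, []).append(sentence)
    return hits
```

A match alone cannot tell who copied whom; according to the summary, Brandt removed cases where the other site appeared to copy Wikipedia, as well as properly attributed public-domain material, before compiling his list.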

15 of 267 comments

  1. That doesn't seem like a lot by NinjaFarmer · Score: 2, Insightful

    Doesn't Wikipedia have over a million articles (not in English alone, I know)? That would mean less than 0.1% of the articles are plagiarized. It seems reasonable to me that that many would go unnoticed. Then all it takes is for the original author to deal with it.

    1. Re:That doesn't seem like a lot by sprins · Score: 2, Insightful

      Apparently Wikipedia has over 1.5 million English articles alone, so your calculation of the percentage of 'problematic' articles is even more favourable. Of those 142 allegedly 'problematic' articles, only a few really seem to be a problem, as the others originated from the public domain to begin with.

      Sounds like much ado about nothing once more. *yawn*

    2. Re:That doesn't seem like a lot by aquaepulse · Score: 4, Insightful

      Well, that 142 was found from his search of 12,000 articles, so if his methodology was sound you could expect the number plagiarized within the 1.5 million to be about 17,750, or about 1.18%.
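
The extrapolation above is simple proportional scaling. A minimal sketch, assuming (as the comment's own "if his methodology was sound" caveat does) that the 12,000 checked articles are representative of the roughly 1.5 million; the helper name `extrapolate` is hypothetical.

```python
def extrapolate(flagged: int, sampled: int, population: int) -> tuple[float, float]:
    """Scale a sample rate up to a population (assumes a representative sample)."""
    rate = flagged / sampled
    return rate, rate * population


rate, expected = extrapolate(142, 12_000, 1_500_000)
print(f"{rate:.2%} -> about {expected:,.0f} articles")  # prints: 1.18% -> about 17,750 articles
```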

    3. Re:That doesn't seem like a lot by tomhudson · Score: 2, Informative

      ... and after an investigation of some of those by Wikipedia, it was found that some were in the public domain, some were culled from government sites, and some were copied from the wiki, and not the other way around. Of those 12,000, we can now say that the wiki is at least as clean as Ivory soap (99.44%).

    4. Re:That doesn't seem like a lot by tomhudson · Score: 2, Insightful

      Considering that an audit of dead-tree encyclopedias hasn't been done, we can't say. What we CAN say is that it's foolish to make a comparison with Britannica, when an audit of Britannica found 10% of 600 articles to be non-factual. The sources cited in those 10% disavowed the articles' contents.

      This isn't all that surprising either, when you think about it. People cite people who cite people, and someone somewhere will misinterpret what someone else wrote, or come to different conclusions while still citing the original author.

    5. Re:That doesn't seem like a lot by kkwst2 · Score: 2, Interesting

      Alarmingly high? You find it alarming that 1 of every 100 articles on a free web-based encyclopedia has plagiarized material? You are clearly much less cynical than I am. I would have guessed at least 5%, probably more.

    6. Re:That doesn't seem like a lot by user24 · Score: 5, Funny

      "It's a wiki. If you find a problem with it, you fix it."
      No, it's a wiki. If you find a problem with it, you add a template telling everyone that someone else should fix it.

  2. Impressive by Solder+Fumes · Score: 3, Interesting

    Wow. Only 142 articles in which average Joe Wiki forgot the proper way to attribute a source. I'm actually amazed there were so few occurrences. This article has the effect of heightening my opinion of Wikipedia's quality.

  3. Not shocking, but not a big deal by Chairboy · Score: 2, Interesting

    What's missing from the summary is that almost immediately upon getting the list, the articles in question were dealt with and the offenders were blocked or warned.

    Wikipedia is written by a large community, and people make mistakes. I have read about other reference tomes that have been caught plagiarizing (for example, some encyclopedias or atlases will put in a fake piece of data or a fake street so that they can easily determine if they're being copied from), and the turnaround time for fixing it can be years, depending on the publishing cycle.

    This isn't a condemnation of Wikipedia, despite Mr. Brandt's best efforts; it's a confirmation of why WP works.

  4. Daniel Brandt, valuable Wikipedia contributor by alienmole · Score: 4, Insightful

    Brandt is doing a great service to Wikipedia: checking for and reporting plagiarism. That takes dedication and hard work. It's ironic that he feels the need to present it as criticism of Wikipedia's model, when in fact he's demonstrating the power of contributions from many people with different motivations. Even if the motivation is anti-Wikipedia, Wikipedia just absorbs the input and grows stronger.

    "If you strike me down, I shall become more powerful than you could possibly imagine..." -- Obi Wiki-nobi

  5. Biographical articles. by Anonymous Coward · Score: 4, Funny

    It's very lazy of the Wikipedia authors to enter the same biographical information as other sites.
    They should write new and interesting histories for all these people rather than using the same old worn-out ideas that are in so many places on the net.
    All it takes is a little imagination.
    A new birthplace, better achievements (why could Hitler not have discovered the cure for cancer and been the first man on the moon? It's better than the depressing story on Wiki at the moment.) and some creative editing would solve this problem once and for all.

    Some Wiki articles are already better and contain things about people that have never happened, but sadly these often get put back to the same old boring stories almost as soon as the changes are made.

  6. Comment removed by account_deleted · Score: 2, Insightful

    Comment removed based on user account deletion

  7. Re:US Gov copyright? by DragonWriter · Score: 3, Insightful

    "Err... I thought works of the US Government were generally free from copyright...?"

    (1) The Wyoming state government is not the US government: state government works are not generally free from copyright.

    (2) Plagiarism is separate from copyright violation, anyway. Using material from one unique, identifiable source without crediting that source is plagiarism, even if the material is not subject to copyright or is in the public domain; so is using copyrighted material without attribution in a way that does not violate copyright (say, fair use). Plagiarism isn't a violation of the law, but a violation of commonly accepted standards of integrity about not claiming others' work as your own.

  8. Re:ok methodology, bad analysis by Skippy_kangaroo · Score: 2, Informative

    A sample of 12,000 is easily large enough to be statistically meaningful. Election polling gets acceptable results with samples of about 1,000.

    Assuming a binomial distribution, p = 142/12000 = 0.0118, q = 0.9882 and n = 12000, which means the standard error is sqrt(npq) = 11.8 (approximately). Thus a 95% confidence interval (about 1.96 standard errors either side of 142) puts the true number of plagiarised articles in the sample between 119 and 165.

    And this is only plagiarism from online sites that are indexed by Google. Plagiarism from dead-tree sources could well account for significantly more.

    This has got nothing to do with faith-based science and low analytical quality. I am once again amazed at how little people seem to know or care about proper statistics and just say "I don't believe it" if something doesn't accord with their preconceived notions.
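
The interval in this comment can be reproduced with the textbook normal (Wald) approximation to the binomial, using z = 1.96 for 95% confidence. This is a minimal sketch of that approximation, not a reanalysis of Brandt's data; the helper name `binomial_count_ci` is hypothetical. With these inputs it gives a standard error of about 11.8 and an interval of roughly 119 to 165.

```python
import math


def binomial_count_ci(k: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Wald interval for a binomial count: returns (standard error, low, high)."""
    p = k / n                  # observed proportion flagged
    q = 1 - p
    se = math.sqrt(n * p * q)  # standard error of the count, sqrt(npq)
    return se, k - z * se, k + z * se


se, low, high = binomial_count_ci(142, 12_000)
print(f"SE = {se:.1f}, 95% CI = {low:.0f} to {high:.0f}")  # prints: SE = 11.8, 95% CI = 119 to 165
```

The interval reflects sampling error only; it does not account for false positives such as the public-domain and copied-from-Wikipedia cases mentioned earlier in the thread.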

  9. What Brandt _should_ do, rather than crowing by Howzer · Score: 2, Insightful

    Is release the script or code that he used to generate his 142 plagiarised articles out of 12,000.

    Such a script, if tuned and more widely applied, could be extraordinarily useful in weeding out future instances of plagiarism.

    142 articles flagged, 142 articles fixed within hours. That's Wikipedia working as no dead-tree encyclopedia can.

    Of course, Brandt would never do anything as useful as that, but will probably content himself with continuing to "shoot from the hip" and claim this as a blow against the Wikipedia community, rather than a bravura demonstration of exactly how well it works.