Slashdot Mirror


The Curious Case of Increasing Misspelling Rates On Wikipedia

An anonymous reader writes "The crowd-sourced nature of Wikipedia might imply that its content should be more 'correct' than other sources. As the saying goes, the more eyes the better. One student who was curious about this conducted rudimentary text mining on a sample of the Wikipedia corpus to see how misspelling rates on Wikipedia change over time. The results appear to show the misspelling rate increasing over time. The author proposes that this consistent increase is the result of Wikipedia contributors using more complex language, which the test is unable to cope with. How do the results of this test compare to your own observations on the accuracy of detail in massively crowd-sourced applications?"
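
The summary does not say how the student counted misspellings. One common rudimentary approach, assumed here and not necessarily the student's method, is to count word tokens missing from a reference dictionary. The Python sketch below uses a hypothetical toy `DICTIONARY` and a hypothetical `misspelling_rate` function. It shows why the author's explanation is plausible: under this metric, a correctly spelled specialist word scores exactly like a typo.

```python
import re

# Hypothetical toy dictionary; a real test would load a full word list.
DICTIONARY = {"the", "cell", "has", "a", "nucleus", "and", "many"}


def misspelling_rate(text: str, dictionary: set[str]) -> float:
    """Fraction of word tokens not found in `dictionary` (assumed metric)."""
    # Lowercase, then extract alphabetic words (allowing one apostrophe part).
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return 0.0
    unknown = sum(1 for token in tokens if token not in dictionary)
    return unknown / len(tokens)


print(misspelling_rate("The cell has a nucleus.", DICTIONARY))          # 0.0
print(misspelling_rate("The cell has many mitochondria.", DICTIONARY))  # 0.2 (correct but unknown word)
print(misspelling_rate("The cell has a nucleous.", DICTIONARY))         # 0.2 (genuine typo)
```

The last two lines score identically, so articles that adopt more technical vocabulary over time would show a rising "misspelling" rate even if nobody's spelling got worse.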

16 of 285 comments

  1. Spellink chekers. Duh! by icebike · · Score: 5, Insightful

    Every web browser as auto spell-check capabilities these days. Most of them correct as you type.
    So why should there be any misspellings on something that is managed strictly from a web interface?

    Is it part of the arrogance of those electing themselves to write and editing articles on wiki that they refuse to use a spell checker, or
    is it that the words are simply unknown to the normal spell-check dictionaries?

    I find occasional misspellings in mainstream news articles as well (and I am by no means a natural born speller).

    But most maddening to me is the "they're their there" errors, and similar wrong word usage.
    Spell checkers offer little help in catching these, but a 6th grade education usually suffices.

    Maybe the same people who wont waist there time checking they're spelling also cant be bothered to use the write word. ;-)

    --
    Sig Battery depleted. Reverting to safe mode.
    1. Re:Spellink chekers. Duh! by hairyfish · · Score: 4, Funny

      your definately fighting a loosing battle their.

    2. Re:Spellink chekers. Duh! by hedwards · · Score: 4, Insightful

      No, it's our language when it comes to international communication. We don't own the varieties spoken in Australia, Guyana, India and whatever other regions use English, but if you want to be understood you really ought to be sticking fairly close to either British English or American English.

    3. Re:Spellink chekers. Duh! by icebike · · Score: 4, Informative

      But written Australian English is different from North American English.

      In N.A. things are similar TO each other or they are different FROM each other.

      We would no more say Different TO than we would say Similar FROM. Just seems wrong to our ears.

      --
      Sig Battery depleted. Reverting to safe mode.
    4. Re:Spellink chekers. Duh! by Stormwatch · · Score: 4, Funny

      I wish the Canadians would make their mind up. Either American or British English

      ...or French.

    5. Re:Spellink chekers. Duh! by Culture20 · · Score: 4, Funny

      "Moot" and "moat" aren't even homophones, damnit!

      "Not that there's anything wrong with that."

    6. Re:Spellink chekers. Duh! by Teancum · · Score: 4, Interesting

      In my experience, articles on Wikipedia that stick around for any reasonable length of time (six months to a year is typical) usually attract grammar nazis (or people who are simply annoyed by bad grammar) who copy edit the article to make it read better. Longer articles tend to attract more of these people than stubs, particularly if they are well linked to other articles. The subject matter doesn't seem to make a difference, and there are a few bots on Wikipedia that scan articles for spelling errors and other minor issues.

      The question of British vs. American spelling was resolved long ago, and for the most part consistency is the rule. I've sometimes seen protracted edit wars over grammar between several editors, but even those tend to be fairly harmless.

      My point is that proofreading does happen; it just happens on a slower time scale and usually only shows up in more mature articles, meaning better-developed articles that are clearly trying to say something. Articles in constant flux are less likely to see this kind of activity or, more accurately, tend to see such efforts wasted as the content changes. Still, if you can get an article to "B quality" status or better, its grammar, spelling, and other aspects will be reviewed by at least somebody over time.

  2. Many of the smart people have been driven away? by Anonymous Coward · · Score: 5, Insightful

    Whether it's open source software or online collaborative projects, the smart people always get driven away over the long term. Smarter people are usually more interested in creating high-quality content, whereas stupider people end up putting out crap purely for political reasons. Eventually these stupider people start trying to modify the work of the smarter people, but do a poor job at it. When they're called out on their shitty work by the smart people, the fools make a huge stink. This soon devolves into a political mess where the smarter contributor is severely inhibited from contributing by the constant moaning and bitching of the idiots. Not wanting to waste time with such shenanigans, the smarter person leaves for some other endeavor. After a while, many of the smarter people are driven away, and the end result is that the stupider people make up the bulk of the project's contributions.

    We've seen this happen with many open source software projects, and I don't think that other kinds of online collaborative projects are any different.

    1. Re:Many of the smart people have been driven away? by jd · · Score: 4, Interesting

      I can't say I've seen that on all articles on Wikipedia, but I have certainly seen it on some. I've seen articles dumbed down to suit the majority of readers, rather than split and refined so that the majority gets a summary and those wanting more information can still reach it. This certainly discourages subject matter experts: what's the point in being an expert in something if all that's wanted is pub-quiz grade?

      However, I emphasize that this is NOT what I've seen for the majority of articles. Some articles have been abandoned (occasionally in mid-edit, from the looks of it), some are constantly being updated with updates in conflict with each other, yet others are updated and are of extraordinarily high quality. It runs the full gamut.

      I would far prefer a layered approach, so that you could get access to whatever level of detail you wanted, but the contributors just aren't there to get that. It's a pity, and the net result is uneven quality, but Wikipedia is a case where it's better to have an imperfect something than a perfect nothing.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  3. The bad drives out the good by timholman · · Score: 4, Insightful

    I can offer my own opinion of this phenomenon: the bad is driving out the good. Fewer competent writers are bothering to edit Wikipedia articles nowadays. Not only do contributions get reverted / deleted by editors who think they "own" the article, but good writers simply get tired of fixing the semi-literate ramblings of people who cannot write a coherent sentence.

    It's the old axiom that incompetent people cannot recognize their own incompetence, and so do not realize that their "contributions" are not improving the article, but instead are making it worse. Eventually the good contributors get tired of sweeping back the ocean with a broom, and just walk away from Wikipedia.

    1. Re:The bad drives out the good by FridayBob · · Score: 4, Insightful

      Totally agree! I spent the best part of *three years* working on a relatively obscure corner of WP's biology department, involving some 500 articles and over 20,000 edits, before finally throwing in the towel. I learned a lot during my time there, but eventually putting more effort into it just stopped making sense. One of their main problems is that the only thing preventing good articles from deteriorating is constant policing by knowledgeable editors -- preferably the people who made the important contributions in the first place. I like to think that my contributions to WP have not been a complete waste, but if enough time goes by before anyone fills my shoes, I fear they will be. After all, what good is an article that's now only 99% accurate? 98%, 97%, 96%...

  4. Worth Posting. by DarwinSurvivor · · Score: 4, Insightful

    So slashdot has just posted an article about a test where even the test's AUTHOR believes the results are due to shortcomings in the test itself. This has to be the most pointless article I've read in a while...

  5. Um... by 93 Escort Wagon · · Score: 5, Funny

    The crowd-sourced nature of Wikipedia might imply that its content should be more 'correct' than other sources.

    [citation needed]

    --
    #DeleteChrome
  6. Muphry's Law by AnotherScratchMonkey · · Score: 5, Informative

    icebike is a victim of Muphry's Law.

  7. Re:What's really curious by somersault · · Score: 4, Insightful

    It might also be that Wikipedia uses specialist words that aren't in the dictionary, unless this test is explicitly looking for common misspellings.

    --
    which is totally what she said
  8. Lol by lightknight · · Score: 4, Funny

    It's sad. Through all this web content, I am slowly unlearning how to spell or use proper grammar.

    English teachers / professors (with a few exceptions) used to be my arch-enemies (as a math / science person); I wished them all a pleasant, if sudden, death for their batshit-insane insistence on making mountains out of molehills (i before e, except after c; can't end a sentence with a preposition; this {subject}) with regard to the language, and yet lately I find myself wishing there were more of them.

    It's not fair: I've nursed some of those grudges for years!

    --
    I am John Hurt.