Slashdot Mirror


Physicists Discover Evolutionary Laws of Language

Hugh Pickens writes "Christopher Shea writes in the WSJ that physicists studying Google's massive collection of scanned books claim to have identified universal laws governing the birth, life course and death of words, marking an advance in a new field dubbed 'Culturomics': the application of data-crunching to subjects typically considered part of the humanities. Published in Science, their paper gives the best-yet estimate of the true number of words in English: a million, far more than any dictionary has recorded (the 2002 Webster's Third New International Dictionary has 348,000), with more than half of the language considered 'dark matter' that has evaded standard dictionaries (PDF). The paper tracked word usage through time (each year, for instance, 1% of the world's English-speaking population switches from 'sneaked' to 'snuck') and found that English continues to grow at a rate of 8,500 new words a year. However, the growth rate is slowing, partly because the language is already so rich that the 'marginal utility' of new words is declining. Another discovery is that the death rate for words is rising, largely as a matter of homogenization as regional words disappear and spell-checking programs and vigilant copy editors choke off the chaotic variety of words much more quickly, in effect speeding up the natural selection of words. The authors also identified a universal 'tipping point' in the life cycle of new words: Roughly 30 to 50 years after their birth, words either enter the long-term lexicon or tumble off a cliff into disuse and go '23 skidoo' as children either accept or reject their parents' coinages."

7 of 287 comments

  1. Re:Scrabble by Anonymous Coward · · Score: 5, Funny

    It's a show on BBC2.

  2. Re:"Universal laws"? by allcar · · Score: 5, Informative

    Bringing mathematical rigour to fields of research where it has previously been ignored can clearly provide some interesting insights.

  3. Organizing Language Vs. The General Public by Cazekiel · · Score: 5, Informative

    My husband works for Merriam-Webster as an assistant editor/lexicographer. You wouldn't believe some of the stuff that goes on there. People will call and demand fame for a word. For example, some guy called in and said he'd been the one to come up with the word 'ginormous', and wanted credit for it. They don't seem to understand the process. MW's archives in the basement are a CIA-esque compilation of language; they'll use every Collegiate they have for reference, going all the way back to the first one. Husband says it won't be long before internet-meme creations are included.

    --
    You want to know how to help your kids? LEAVE THEM THE F*&K ALONE. --George Carlin
  4. Re:Some Advice by ClioCJS · · Score: 5, Insightful

    Votive? Like candles? That's your example of an uncommon word? I was expecting a list of words I'd never heard of. Votive?!

    --
    -Clio
    Karma: Bad (mostly from not giving a fuck)
    Blog: http://clintjcl.wordpress.com
  5. Re:Physicists? by TheRaven64 · · Score: 5, Funny

    Why would physicists be studying this kind of thing?

    When you graduate with a PhD in physics, you get three things:

    • A piece of paper.
    • A true understanding of how little you understand about the universe.
    • An unshakable belief that any subject that is not physics is trivia and that you know more about it than people who have spent their lives studying it.

    The third means that you are obliged, at least once, to submit a paper about some other field to arxiv.org. Ideally, this paper should not cite any relevant research in the field - only other papers by physicists - and, for bonus points, should base its entire thesis on a weak statistical correlation.

    --
    I am TheRaven on Soylent News
  6. Re:To the Bane of Grammar Nazi. by silentcoder · · Score: 5, Insightful

    s/threw/through/g

    "through" is an adverb indicating a passage between locations or a change of state.
    "threw" is the past tense of throw.

    Grammar Nazis often get a bit extreme but when your basic spelling is up-to-shit the actual meaning of your writing gets lost. Yes, language evolves - this means we coin new words, we gradually change laws of grammar - but it is not a license to write whatever you want and claim it means what you intended it to mean.

    I'm fairly certain from context that you intended to write "through" for example - but if I hadn't recognized it I would have been wondering if you were so badly bullied that teachers actually threw you around in school.

    >I have only learned to dislike people who feel the need to correct every detail, and discredit my arguments

    It's not a discrediting of arguments to correct grammar mistakes. However, repeating them when you have been corrected just makes you look stupid. Worse, it makes you an asshole. Yeah, YOU are the asshole. Why? Because using the proper conventions of language (grammar, spelling etc.) is a form of politeness. It makes your writing easy to read.
    Furthermore, it is to your own advantage as well. When you ignore good language rules, what you write more often than not doesn't mean what you intended it to mean. Some of your readers will simply misunderstand you. Others will be annoyed. Very few will actually have a clue what you were trying to say, because what you were trying to write and what you actually wrote no longer bear any but the most limited of resemblances.

    The only thing that saves the grammar-ignorant from being completely illiterate is the human ability to infer meaning from context - but context is incredibly culture-, time- and location-specific. So the meaning of your words now becomes discernible exclusively to people who share your background. Everybody else (that could literally be people who live two neighborhoods away) is just sitting there shaking their heads and wondering what the fuck you're trying to say.

    Oh, and for a little encouragement... I am writing in my THIRD language and very nearly all of the fucking time I get it right... you first-language speakers have absolutely no excuse.

    --
    Unicode killed the ASCII-art *
  7. Tempest in a teapot by pjpII · · Score: 5, Informative

    Speaking as a linguist (working on my Ph.D.) this is something of a tempest in a tea-pot. The most relevant use would be for glottochronology - a field that's largely been abandoned by anyone seriously working on historical linguistics because of the various problems involved with that approach, including what the authors of the paper find, that the rate of word loss is not constant over time. They have a better idea of the rate of word loss, which could help improve glottochronology, but the method has a lot of flaws regardless.

    Also, the question they're asking - how do words change over time, in terms of coining, becoming current, and becoming obsolete - really isn't a question historical linguists are that concerned about. Historical linguists are much more interested in how the forms of words change over time (phonological change), or how their function changes over time (grammaticalization), whereas the coinage and loss of words isn't often so important, especially at the large-scale statistical level. Furthermore, this type of model probably handles languages with phenomena like avoidance speech poorly, since that would change how and why words are kept or lost.

    Their language sample is at heart a convenience sample - they happened to have access to lots of data in those three languages, and it is largely written data. Spanish and English are both related languages with very similar cultural contexts, while Hebrew is a strange choice in that it has an ancient history, but only quite recent revitalised usage. Whether most spoken interaction (which is what linguists tend to be more interested in) has even a tiny subset of the total number of words they are talking about is an open question and would be better tested against corpora with a large quantity of spoken data such as the British National Corpus or the International Corpus of English.

    It's an interesting study, but if it hadn't been written by physicists I'm not sure if it would have ended up in Diachronica or the Journal of Historical Linguistics, much less Science. Their "statistical rules" are interesting, but really not of any great use to wider linguistic inquiry. I think its import is really just exaggerated by the fact that science editors read Science and NOT most linguistics journals, and therefore they think it's really impressive.