Physicists Discover Evolutionary Laws of Language
Hugh Pickens writes "Christopher Shea writes in the WSJ that physicists studying Google's massive collection of scanned books claim to have identified universal laws governing the birth, life course and death of words, marking an advance in a new field dubbed 'Culturomics': the application of data-crunching to subjects typically considered part of the humanities. Published in Science, their paper gives the best-yet estimate of the true number of words in English: a million, far more than any dictionary has recorded (the 2002 Webster's Third New International Dictionary has 348,000), with more than half of the language considered 'dark matter' that has evaded standard dictionaries (PDF). The paper tracked word usage through time (each year, for instance, 1% of the world's English-speaking population switches from 'sneaked' to 'snuck') and found that English continues to grow at a rate of 8,500 new words a year. However, the growth rate is slowing, partly because the language is already so rich that the 'marginal utility' of new words is declining. Another discovery is that the death rate for words is rising, largely as a matter of homogenization as regional words disappear and spell-checking programs and vigilant copy editors choke off the chaotic variety of words much more quickly, in effect speeding up the natural selection of words. The authors also identified a universal 'tipping point' in the life cycle of new words: Roughly 30 to 50 years after their birth, words either enter the long-term lexicon or tumble off a cliff into disuse and go '23 skidoo' as children either accept or reject their parents' coinages."
Anyone who has played Scrabble (especially against a computer) knows that there are tons of words out there that no one has ever heard of, most of which you can't even find a definition for. What the hell is a Qi? I don't know, but I can get 66 points for it.
Qi is a simple one: it's a two-letter word, and there are roughly a hundred two-letter words accepted by TWL which are hackable. Qi is also something I've seen reading Chinese philosophy, so that doesn't really upset me. The ones that really get me when I play against computers or people who cheat are actually the longer ones. Recently I have seen outgnawn, aliquot, mahoes and votive, and the list goes on when your friends are using websites to look up permutations.
You can study this stuff and memorize things like I-dumps: ziti, ilia, ixia, inion, etc. But in the end, what really got my scores higher was studying the short two- and three-letter words and building thick crossword-like packs of words, especially over TL tiles.
My work here is dung.
Bringing mathematical rigour...
Physicists are widely known for their lack of mathematical rigor. David Hilbert, perhaps the most influential mathematician of the 20th century (who incidentally discovered Einstein's field equations before Einstein, though he was also nice enough not to get into a priority dispute, since most of the work leading up to the discovery was Einstein's), is often quoted as saying some variation on, "Physics is too difficult for physicists!" His meaning was apparently that the mathematics required to rigorously justify assertions in advanced physics is often beyond the reach (or inclination) of physicists. This isn't necessarily a bad thing, by the way, but it indicates the traditional lack of rigor in physicists' math.
The paper itself says,
We use concepts from economics to gain quantitative
insights into the role of exogenous factors on the evolution
of language, combined with methods from statistical
physics to quantify the competition arising from correlations
between words and the memory-driven autocorrelations
in u_i(t) across time.
Perhaps "Bringing quantitative statistical analysis..." is a better phrase.
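For anyone wondering what "autocorrelations in u_i(t) across time" amounts to in practice, here is a minimal sketch. It assumes u_i(t) is simply a yearly usage count for word i (my reading, not something the quoted passage spells out), and it uses the textbook sample autocorrelation, which may not be the exact estimator the authors used. The function name and the numbers are made up for illustration.

```python
def autocorrelation(series, lag):
    """Textbook sample autocorrelation of a sequence at a given lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    # Sum of squared deviations (the lag-0 term used for normalization).
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Sum of products of deviations that are `lag` steps apart.
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


# Hypothetical yearly use counts for one word; NOT data from the paper.
u = [10, 12, 15, 19, 24, 30, 37, 45]
for lag in (1, 2, 3):
    print(lag, round(autocorrelation(u, lag), 3))
```

A value near 1 at small lags would mean a word's use this year looks a lot like its use in recent years, which is roughly what "memory-driven" suggests.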