Physicists Discover Evolutionary Laws of Language
Hugh Pickens writes "Christopher Shea writes in the WSJ that physicists studying Google's massive collection of scanned books claim to have identified universal laws governing the birth, life course and death of words, marking an advance in a new field dubbed 'Culturomics': the application of data-crunching to subjects typically considered part of the humanities. Published in Science, their paper gives the best-yet estimate of the true number of words in English: a million, far more than any dictionary has recorded (the 2002 Webster's Third New International Dictionary has 348,000), with more than half of the language considered 'dark matter' that has evaded standard dictionaries (PDF). The paper tracked word usage through time (each year, for instance, 1% of the world's English-speaking population switches from 'sneaked' to 'snuck') and found that English continues to grow at a rate of 8,500 new words a year. However, the growth rate is slowing, partly because the language is already so rich that the 'marginal utility' of new words is declining. Another discovery is that the death rate for words is rising, largely as a matter of homogenization as regional words disappear and spell-checking programs and vigilant copy editors choke off the chaotic variety of words much more quickly, in effect speeding up the natural selection of words. The authors also identified a universal 'tipping point' in the life cycle of new words: Roughly 30 to 50 years after their birth, words either enter the long-term lexicon or tumble off a cliff into disuse and go '23 skidoo' as children either accept or reject their parents' coinages."
Anyone who has played Scrabble (especially against a computer) knows that there are tons of words out there that no one has ever heard of, most of which you can't even find a definition for. What the hell is a Qi? I don't know, but I can get 66 points for it.
That stupid word always drived me crazy.
This looks like really interesting and important research - perhaps even a tenth as important as these physicists think it is!
Note to self: Make a funny sig.
How many words are "created" by young people to replace their parents' generation's word for the same thing? I suspect that many of the "new" words are already covered, but teenagers want to sound cooler than their parents, or hide their true intentions from them.
Great warrior...hrmph! Wars not make one great.
'Culturomics'? You'd think that people studying words would be able to come up with a better word than that.
When our name is on the back of your car, we're behind you all the way!
Why would physicists be studying this kind of thing?
Linguists? Etymologists, maybe? Sociologists for sure. But physicists?
This must be some new definition of "physics" that I wasn't previously aware of.
Please. No more portmanteaus with -onomics on the end. I automatically think of Regan.
I write professional videogame reviews! http://www.digitallydownloaded.net/
The OED has about 600 thousand words, though still this is a lot less than a million. It would be interesting to see the most commonly used word that isn't in the dictionary.
-- Ed Avis ed@membled.com
Anyone who has played Scrabble (especially against a computer) knows that there are tons of words out there that no one has ever heard of, most of which you can't even find a definition for. What the hell is a Qi? I don't know, but I can get 66 points for it.
Qi is a simple one, it's a two letter word and there are roughly a hundred two letter words accepted by TWL which are hackable. Qi is also something I've seen reading Chinese philosophy so that doesn't really upset me. The ones that really get me when I play against computers or people who cheat are actually the longer ones. Recently I have seen outgnawn, aliquot, mahoes, votive, the list goes on when your friends are using websites to look up permutations.
You can study this stuff and memorize things like I-dumps: ziti, ilia, ixia, inion, etc. But in the end what really got my scores higher was studying the short 2 and 3 letter words and building thick crossword-like packs of words especially over TL tiles.
My work here is dung.
Why use something that already exists when you can re-invent the wheel.
...Grand Unification Theory of Cosmology Proven.
My husband works for Merriam-Webster as an assistant editor/lexicographer. You wouldn't believe some of the stuff that goes on there. People will call and demand fame for a word. For example, some guy called in and said he'd been the one to come up with the word 'ginormous', and wanted credit for it. They don't seem to understand the process. MW's archives in the basement are a CIA-esque compilation of language; they'll use every collegiate they have for reference, going all the way back to the first one. Husband says it won't be long before internet-meme creations are included.
You want to know how to help your kids? LEAVE THEM THE F*&K ALONE. --George Carlin
The fact that language is evolving gives me no end of joy, to think of all the Grammar Nazi, getting corrected all the time because the language has changed on them.
I may just be bitter because going threw school I had one right after an other of bad English teachers, who felt that they should tell me every time I am wrong and never really explaining how I should do it right. I have never really learned proper grammar, I have only learned to dislike people who feel the need to correct every detail, and discredit my arguments. Not due to lack of logical reasoning but to technical failures in grammar and spelling.
Between K-Grad School I have had 5 English Teachers/Professors who actually were willing to help me improve my skills, who were willing to start the education with the following mindset, "you makes good points but lets make this read better" vs. what I normally get "your spelling and grammar is bad... So your points are invalid"
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
As a trained linguist, I have to treat "breakthroughs" in the field by outsiders with a big grain of salt. I don't hire a plumber to do heart surgery. I don't hire an MD to fix my car. (BTW I don't hire a linguist (like Noam Chomsky) to fix my political system either!)
I cannot find any mention of them studying anything other than English, and if they indeed only studied English then do the same findings apply to other languages? I actually highly doubt it, especially when it comes to smaller, less-used languages. Though obviously claiming to have found some universal laws regarding all languages makes for better headlines.
It's not in the dictionary. Look it up.
I see this all the time (I have a PhD in the humanities and I am a software engineer) where someone from outside the field does something and claims it is a universal law but really, they just worked on English and cannot (or will not) prove that it works for other languages. Usually, these papers also lack any kind of literature review and ignore many of the problems that this would uncover. I saw one paper by a physicist that tried to use bit fields to model language change; it was just massively reductionist and couldn't explain anything at all for all the mathematical rigour.
I go to my University's language lunch which has lots of this and scare the pants off grad students by saying "this is all very well but does this work for Japanese or Old Irish or any other language?" This usually makes their faces go white because naturally English is the ONLY language that matters and is therefore "universal".
So physicists have reinvented battleship curves. Congratulations! We couldn't have done it a century ago without you!
It is the alternate spelling of "Chi", a concept in Daoist philosophy that represents the primal energy of the universe.
As in "tai chi". As in "qi gong". It is also sometimes spelled "ki".
The ancient Chinese must have played a lot of Scrabble
The Scrabble word that bothers me is "aa". I mean seriously. Who even wants to play with you any more? It's not fun when you start bringing out the scrabble dictionary. I thought we said no 2-letter words, anyway. And no, I'm not being a baby.
You are welcome on my lawn.
There have been mathematical studies for a long time on how long irregular verbs might survive in the English language. I remember seeing the first such article a while back.
Basically, the more a verb is used, the longer it will take us to be liberated from its influence. Some, like the verb "to be", are so ensconced in our language that they may take many, many generations to eliminate.
Of course, this ignores any political movement to eliminate them. As countries become closer, if English remains the language of democracy, there may be a push to make English more standard. A new English without all the rule contradictions it currently has would be double-plus good.
"That's the way to do it" - Punch
I'm sure Americans will have created 8000 of those new words each year. Not content with the ones we British gave them, they wanted their own.
Jonathanjk.com
Is that it's pinning my bullshitometer against the max stop.
Did you know 80 to 90% of the moderators on slashdot wouldn't recognize a troll even if one dragged them under a bridge.
I'm old enough to remember all the British television and seen American language replace British/Australian language. In 2011, we saw mum and bikkie replaced by mom and cookie. Before that nappy, dummy, backside/arse were removed from the national vernacular. More generally biker and trucker have replaced bikie and truckie and so forth with similar '-ie' words. This year bum-pack was replaced by fanny-pack. I haven't heard anyone use the British version of fanny (last used in 'Billy Elliot') except to laugh at loud Americans who say 'fanny' because they don't know the word has a sexual meaning in this country.
And of course, we have help-desks in India saying in a thick accent 'no worries' to sound Australian, which is also cultural imperialism.
Had you clicked the link to the PDF provided in the summary, you'd have stumbled onto their paper -- as in "the thing we're discussing here" -- where they mention Spanish and Hebrew were also studied.
Every end has half a stick.
It's not "Physicists Discover Evolutionary Laws of Language"
It's "Physicists Propose a Theory of Language Evolution"
There's no discovery going on here.
They didn't find it hiding under a rock.
Physicists claimed the evolution of language was based on some characterization of words of vocalization pattern and energy usage, the idea being that languages which afford more efficient energy requirements to the speaker tend to survive by natural selection process, just as animals in any environment evolve physical characteristics that are specifically adapted to efficient energy usage in that environment.
My wife is a linguist and much of the summary sounds like stuff she learned in her classes. The only major thing that sounds new is that he has put a large portion of Google's scans through a computational linguistics algorithm to put hard numbers to what they already believe. I know a lot of computational linguists come from other fields outside of traditional linguistics, but if this guy has become a computational linguist I would think it would be more appropriate to label him as one instead of what he has his PhD in.
The authors of the study have defined a "word" as being similar strings of characters. This means each of those 27 spelling variants of Sioux provided by William Clark was considered a separate word. So, essentially, they're dealing with the birth and death of typos. This makes the 1,000,000 words claim extremely dubious. If each spelling variant is a word, then there have to be way more than 1,000,000 words.
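To make the spelling-variant point concrete, here is a minimal Python sketch of how the count of distinct "words" shifts depending on whether variants are collapsed to one form. The function `count_word_types`, the `canonical` mapping, and the variant spellings are all made up for illustration; they are not Clark's actual spellings, and this is not the paper's method.

```python
def count_word_types(tokens, canonical=None):
    """Count distinct word types, optionally collapsing known spelling variants.

    `canonical` is a hypothetical hand-built map from a variant spelling
    to the form it should be counted under.
    """
    canonical = canonical or {}
    types = set()
    for token in tokens:
        key = token.lower()
        # Fall back to the token itself when no canonical form is known.
        types.add(canonical.get(key, key))
    return len(types)


# Illustrative variant spellings only; not taken from any real corpus.
tokens = ["Sioux", "Soux", "Sceoux", "Seioux", "sioux"]
variant_map = {"soux": "sioux", "sceoux": "sioux", "seioux": "sioux"}

raw_count = count_word_types(tokens)                    # each string is its own "word"
merged_count = count_word_types(tokens, variant_map)    # variants collapsed
```

On this toy input the raw count is 4 and the merged count is 1, so the headline vocabulary size depends heavily on which of the two definitions of "word" is in use.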
Google Books is notoriously inaccurate, especially with dates. I don't know if it's enough to throw their data off, but I wonder if the researchers realize this.
Proverbs 21:19
http://books.google.com/ngrams/
Don't spend the whole day on it.
Published in Science, their paper gives the best-yet estimate of the true number of words in English—a million, far more than any dictionary has recorded (the 2002 Webster's Third New International Dictionary has 348,000) with more than half of the language considered 'dark matter' that has evaded standard dictionaries (PDF).
Umm, no. The phrase "true number of words in English" is sufficiently ill-defined to make the question meaningless. There are two ways people think about whether something is a "true word" in English, but more or less, you need to either rely on an authoritative reference to make that determination (which is not what's happening here), or you note its existence by some level of usage in practice, and set a somewhat arbitrary bar for how often the word has been used (which is what's happening here.)
As per Zipf's law, etc, tweak that "bar" a little bit, and you'll get quite different results.
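A rough Python sketch of that sensitivity, assuming an idealized Zipf distribution with exponent 1, where the word at rank r occurs top_count / r times. The function name and all the numbers are illustrative assumptions, not figures from the paper.

```python
def words_above_threshold(top_count: int, vocab_size: int, min_count: int) -> int:
    """Count ranks whose idealized Zipf count (top_count / rank) reaches min_count."""
    # Integer comparison avoids floating-point edge cases right at the cutoff.
    return sum(
        1 for rank in range(1, vocab_size + 1)
        if top_count >= min_count * rank
    )


if __name__ == "__main__":
    # Hypothetical corpus: the most frequent word appears 100,000 times,
    # and there are 200,000 distinct strings in total.
    for min_count in (1, 2, 5, 10):
        print(min_count, words_above_threshold(100_000, 200_000, min_count))
```

In this toy model, moving the bar from "seen at least once" to "seen at least twice" halves the number of "words", which is the kind of swing the comment is pointing at.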
I'm a nature photographer.
Where over 90% of vertebrates have probably been discovered and cataloged, only a few percent of insects, worms etc. may have. A combination of statistics and data mining estimates about 7 million total species.
Although a good part of the richness of a language comes from regions that were once isolated, with no fluid communication with others speaking the same language, the internet is now pushing a common culture and bringing a lot of words and concepts from other languages into every language. If that process could be controlled or directed (e.g. by mass media, main internet sites, etc.), it could be used to push concepts and word meanings useful to improve a culture, like with this example. Not sure if we could do intelligent design over an existing language, but at least we could direct its evolution with a goal.
When taking "History of the English Language" last year as part of my graduate work, the professor I studied under was part of the Middle English Dictionary Project. It was interesting to speak with him on the life and death of words after the printing press, and I remember him giving a 30 to 50 year estimation for a word to cement itself or become rare. It doesn't really seem like this is anything new.
From TFA, the researchers were analyzing Google's corpus of primarily English texts. Anything they have to say about the development of language can thus only be said to hold true for English.
Different languages work differently, and are subject to different pressures of usage and culture and global politics. Somehow I doubt that Māori or Arabic or German are changing in quite the same ways or at quite the same rates as English.
TL;DR: "Universal", my shiny white honky ass.
"What in the name of Fats Waller is that?"
"A four-foot prune."
Something doesn't sound right about this assumption.
I've been crameniating over the origin of this word/phrase cited by the OP. It's amazing how many claims there are for this phrase. And while the phrase itself may have faded, the Bombardier "Ski-Doo" (from 1960) I would think derived its name from it and will live on for quite a while.
Also note the 1932 pingame Skidoo "23".
I agree with your main point, and agree that the modern Hebrew vocabulary is subject to diverse influences, including European languages.
That said, Hebrew (modern or otherwise) is not that hard to classify -- it is firmly in the Semitic language grouping, itself part of the Afroasiatic language family. Hebrew is a cousin to Arabic, and a cousin to ancient Egyptian, Touareg, Somali, and Amharic (Ethiopian).
Cheers,
Speaking as a linguist (working on my Ph.D.) this is something of a tempest in a tea-pot. The most relevant use would be for glottochronology - a field that's largely been abandoned by anyone seriously working on historical linguistics because of the various problems involved with that approach, including what the authors of the paper find, that the rate of word loss is not constant over time. They have a better idea of the rate of word loss, which could help improve glottochronology, but the method has a lot of flaws regardless.
Also, the question they're asking - how do words change over time, in terms of coining, becoming current, and becoming obsolete - really isn't a question historical linguists are that concerned about. Historical linguists are much more interested in how the forms of words change over time (phonological change), or how their function changes over time (grammaticalization), whereas the coinage and loss of words isn't often so important, especially on the large scale statistical level. Furthermore, this type of model probably handles languages with phenomena like avoidance speech poorly, since that would change how and why words are kept or lost.
Their language sample is at heart a convenience sample - they happened to have access to lots of data in those three languages, and it is largely written data. Spanish and English are both related languages with very similar cultural contexts, while Hebrew is a strange choice in that it has an ancient history, but only quite recent revitalised usage. Whether most spoken interaction (which is what linguists tend to be more interested in) has even a tiny subset of the total number of words they are talking about is an open question and would be better tested against corpora with a large quantity of spoken data such as the British National Corpus or the International Corpus of English.
It's an interesting study, but if it hadn't been written by physicists I'm not sure if it would have ended up in Diachronica or the Journal of Historical Linguistics, much less Science. Their "statistical rules" are interesting, but really not of any great use to wider linguistic inquiry. I think its import is really just exaggerated by the fact that science editors read Science and NOT most linguistics journals, and therefore they think it's really impressive.
Wow. A WSJ article on dictionaries, data mining and the birth and death of words... but somehow at this moment, this story is tagged:
SCIENCE
DARKMATTER
EVOLUTION
Worst. Tags. Ever!!!!
Since the 3 terms appear in the story, this smells like a deeply-ironic case of a data-mining algorithm story having epic amounts of keyword-detection fail. Maybe someone can scrounge up a physicist (or if ONLY slashdot knew where to get in touch with a few computer jocks) to fix their code.
Hah, well then... I won't give a Fuck-Fuck-McFuckity-Fuck from hereon in (or is that overdoing it?)!
The problem with Qi is it's about as "English language" as Shinjitai.
English has the great ability of incorporating words from other languages into its lexicon. I doubt there are many English words that are not borrowed from other languages. English itself is one such word.
Falcon
Should there be a Law?
"to think of all the Grammar Nazi, getting corrected all the time because the language has changed on them"
Well, even if words change, that doesn't mean grammar rules do. Chat/Catspeak is to point out both the ridiculousness and sometimes, blatant ignorance of those on the internet and the funneh way cats WOULD speak given a voice. Having bad grammar and disobeying what is seen as obvious (I ran into a Tea-Partier in a political discussion a few weeks ago that thought 'do'nt' was the way to go--he did it twice in the same comment, so he can't claim 'typo') doesn't help when it comes to wanting to be taken seriously. We're all allowed to make mistakes. I have, and I pride myself as being known as a good writer to circles of friends on writing-sites and what not. It can happen to anyone.
One flaw in your argument is that you can't expect other people, *especially* on the internet to take you 100% seriously when it comes to your mistakes in spelling/grammar. If an aspiring scientist depends on everyone else to correct them for not spelling things right or screwing up the table of elements at a constant, they aren't going to go far unless they recognize it as *their problem*, something they need to get on top of. An argument you want to make needs to be presented with care and attention to details when it comes to language. That's just the way it is. But I understand how it feels to not get the proper education or understanding when you're learning for the first time. I'd always thought I was crap at math and science, until I entered self-learning programs in college. Where I couldn't do Algebra I in high school, I passed Algebra II with flying colors. Maybe take up grammar-studies as a hobby? I dunno. :P
What are these "vigilant copy editors" of which you speak?
Or wait is that a whole book of Natalie Portman in hot grits!?! Oh I am so confused!
It goes something like
English is a language that mugs other languages in a dark alley and then checks their pockets for loose grammar.
I doubt that anything more than a small fraction of "English" words have an Anglo-Saxon* origin.
(not sure if that would be correct Language Experts Please correct as needed)
Any person using FTFY or editing my postings agrees to a US$50.00 charge
Blame the WSJ for the tropes here about physicists and "culturomics". The lead author of the linked paper is an economist. The WSJ article also mingles information from other publications. On the other hand, Steven Pinker has (rather persuasively) argued for a physical model underlying the structure of language (and not just in English): http://stevenpinker.com/publications/stuff-thought
ah tink as loong as teh message ain't so convoluted as to keep it from being understood its fine.
The rules of grammer and speeling should only b used two keep things within a margion of understandably. everythting else is as foolish as disrespecting anyone not wearing a tie. The only exception should be when it's a technical description of something or a law, where precise definitions are needed.
If video games influenced behavior the Pac Man generation would be eating pills and running away from their problems.
Poorly worded title, I don't see any laws, theories, or other predictive content.. just some analysis.
I was crazy back when being crazy really meant something. (Charles Manson)
I have been crameniating over the "23 skidoo" term and the myriad of possible origins. Although dead in language today, it lives on in the Bombardier "Ski-Doo" snowmobile, that moniker being used since 1960 and not disappearing any time soon.
Also note the 1932 pin table (they weren't called pinball machines yet), Skidoo "23".
A year or so ago a contributor to the London Review of Books identified a golden age of swearing, until it was pointed out that the "apparent prevalence of the word fuck in the period before 1820, and its complete disappearance for more than a century thereafter, can be explained by the end of the use in printing of the ‘long s’, which modern optical character recognition sees as an ‘f’. All the apparent ‘fucking’ before then is actually just ‘sucking’"
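The long-s confusion described above can be sketched as a lookup: for an OCR token, try reading each 'f' as either 'f' or 's' and keep the readings that are known words. This is only an illustrative Python sketch; `long_s_candidates` and the tiny `lexicon` are hypothetical, and real OCR post-correction is considerably more involved.

```python
from itertools import product


def long_s_candidates(token: str, lexicon: set[str]) -> list[str]:
    """Return known words obtainable by reading some 'f' characters as 's'."""
    positions = [i for i, ch in enumerate(token) if ch == "f"]
    found = set()
    # Try every combination of keeping 'f' or substituting 's' at each position.
    for choice in product("fs", repeat=len(positions)):
        chars = list(token)
        for pos, ch in zip(positions, choice):
            chars[pos] = ch
        candidate = "".join(chars)
        if candidate in lexicon:
            found.add(candidate)
    return sorted(found)


# Hypothetical mini-lexicon, for illustration only.
lexicon = {"such", "first", "most", "fish"}
such_readings = long_s_candidates("fuch", lexicon)
first_readings = long_s_candidates("firft", lexicon)
```

A frequency count over pre-1820 scans that skips a step like this will file long-s spellings under the wrong word, which is the trap the LRB contributor fell into.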
This claim: "...the better your language skills, the less the chance your arguments will be misunderstood."
The error is an assumption that your audience will always have a level of education similar to your own. Sadly this isn't always the case and so, it can happen, and often does happen, that someone will be difficult to understand because they're speaking above their audience and have no idea that they're doing that.
Aside from that, I agree with you on what you had to say.
Also, I like puppies.
That stupid word always drived me crazy.
That one doesn't bother me. In fact, I think I like it - irregular verbs with fewer syllables are usually ok by me.
Here's one that I cannot stand: drug as past tense of drag. Drug is already a word, both noun and verb. Dragged is worth the extra syllable.
I hate hearing how "I drug that heavy thing (thang?) across the room."
English, Spanish, Hebrew -> universal?
Try doing that in CJK languages first, which are heavily coupled to Kanji characters.
Since this is 'a body of knowledge about words', how about logology? That would make a worker in the field a 'logologue'. This is a Greek/Latin hybrid: "logos" from the Greek meaning (approximately) "word".
DNA is a Turing machine. You, however, being dynamic and emergent, are not.
May I suggest "foo"? (although it might be considered a back-formation from the jargonic acronym 'FUBAR'.) *We* all use it a lot, don't we?
In the words of Rutherford: All science is either physics or stamp collecting. This is stamp collecting. It's another case of applying formulas to numeric observations without a hint of the underlying social or cognitive processes. That does not advance linguistics.
And this is published in Science? You mean, the journal whose impact factor dwarfs those of the more dedicated linguistic journals? Ugly.
Really? Of course you mean Reaganomics. The only example of Reganomics that I know is this short disquisition on luxury purchases:
O reason not the need! Our basest beggars
Are in the poorest thing superfluous.
Allow not nature more than nature needs,
Man's life is as cheap as beast's. Thou art a lady:
If only to go warm were gorgeous,
Why, nature needs not what thou gorgeous wear'st,
Which scarcely keeps thee warm. But, for true need--
Spoken, of course, by her father, King Lear, in response to her claim that he doesn't need his bodyguards.
Words, phrases and sentences are only symbols to represent ideas and relations. Music, contemplation and experience create new words as required. The strength of English is its connotations, which if used right will amuse people without them knowing why.