Slashdot Mirror


Google Books Makes a Word Cloud of Human History

An anonymous reader writes "From Ed Yong at the Not Exactly Rocket Science blog: 'Just as petrified fossils tell us about the evolution of life on earth, the words written in books narrate the history of humanity. The words tell a story, not just through the sentences they form, but in how often they occur. Uncovering those tales isn't easy — you'd need to convert books into a digital format so that their text can be analyzed and compared. And you'd need to do that for millions of books. Fortunately, that's exactly what Google have been doing since 2004.' Yong goes on to explain that the astounding record of human culture found in Google Books offers new research paths to social scientists, linguists, and humanities scholars. Some of the early findings (abstract), based on an analysis of 5 million books containing 500 billion words: English is still adding words at a breathtaking pace; grammar is evolving and often becoming more regular; we're forgetting our history more quickly; and celebrities are younger than they used to be. You can also play with the Google Books search tool yourself. For example, here's a neat comparison of how often the words Britannica and Wikipedia have appeared."

127 comments

  1. OCR errors by SputnikPanic · · Score: 5, Interesting

    AFAIK, Google Books doesn't do the sort of methodical OCR clean-up that Project Gutenberg does, so a lot of Google's digitized books have a fair number of errors. It'd be funny to see what kind of blips this might create in our extracted cultural history!

    1. Re:OCR errors by yincrash · · Score: 1

      Doesn't Google Books use reCAPTCHA? It's not perfect, but I'm sure it fixes a very significant portion of OCR errors.

    2. Re:OCR errors by SputnikPanic · · Score: 2

      From Google's "about" page for their Books Ngram Viewer lab: "Why does the word 'Internet' occur before 1950?"

    3. Re:OCR errors by migla · · Score: 2

      A Simpsons quote where Lenny as a kid talks about the netting in his shorts, the internet, and later says "I think I just logged onto the internet" comes to mind...

      --
      Some of my favourite people are from the US; Vonnegut, Chomsky, Bill Hicks.
    4. Re:OCR errors by mjperson · · Score: 1

      One of the sample plots in the article is a plot comparing the frequencies of George Washington, Thomas Jefferson, and Abraham Lincoln. If you look at the plot, you'll notice that Lincoln has a nice uptick in name usage about 10 years before he was born.

    5. Re:OCR errors by meloneg · · Score: 2

      If you follow links on that ngram (and play with the date ranges a bit), you find this query that seems to be showing a lot of those references to Abe were in the meta-data.
      A little more digging finds this little gem. Which appears to just be mis-dated. I suspect it was written in 1890 from looking very carefully at the copyright page.
      It's also very possible that some of those references are to other people with the same name. Like this one and this one.

    6. Re:OCR errors by meloneg · · Score: 1

      Take a look at the books which get OCR'ed with "email". As near as I can tell, all of them before a certain point are supposed to be "small". Methinks Google should think about adding a bit of data-sensitivity to their OCR.

    7. Re:OCR errors by Anonymous Coward · · Score: 1

      I wonder if it had to do with President Lincoln's grandfather? He had the same name and was a captain in the revolution.

    8. Re:OCR errors by raddan · · Score: 3, Funny

      Maybe so, maybe so. All I know is that 1720 was a really bad year.

    9. Re:OCR errors by Motard · · Score: 2

      Yes, here's an amazingly prescient book from 1920: 101 Successful Businesses You Can Start on the Internet

    10. Re:OCR errors by Anonymous Coward · · Score: 0

      Yeah, it does. And with 4chan using it now, I foresee many OCR errors being replaced with "faggot faggot nigger faggot".

  2. Case sensitive? by IWannaBeAnAC · · Score: 4, Informative

    Interesting that it is case sensitive. Searching for "britannica,wikipedia" in lowercase produces, for today, close to zero for britannica and 0.00005% for wikipedia, which is not far off the result for Wikipedia (with a capital).

    Putting these together, the case-insensitive comparison already has wikipedia well ahead of britannica: around 0.00010% for britannica vs. 0.00013% for wikipedia.
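
    A case-insensitive comparison like the one above can be sketched by summing the per-year frequencies of every spelling variant of a term. This is only a minimal Python sketch, not how the Ngram Viewer works internally; the function name `combine_case_variants` and the sample numbers are made up for illustration.

```python
from collections import defaultdict

def combine_case_variants(series_by_term):
    """Sum per-year frequencies of terms that differ only by case.

    series_by_term maps a term (e.g. "Wikipedia") to {year: frequency}.
    Returns {lowercased_term: {year: summed_frequency}}.
    """
    combined = defaultdict(lambda: defaultdict(float))
    for term, series in series_by_term.items():
        key = term.lower()
        for year, freq in series.items():
            combined[key][year] += freq
    return {key: dict(years) for key, years in combined.items()}

# Illustrative, made-up frequencies; NOT real Ngram Viewer data.
sample = {
    "Wikipedia": {2007: 0.5, 2008: 1.0},
    "wikipedia": {2007: 0.25, 2008: 0.5},
    "Britannica": {2007: 1.0, 2008: 1.0},
}
totals = combine_case_variants(sample)
print(totals["wikipedia"][2008])  # 1.5
```

    Keying on the lowercased term keeps the per-case series intact in the input, so both views can still be compared.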

    1. Re:Case sensitive? by biryokumaru · · Score: 1

      I don't know about the two encyclopediae, but so far my favorite is republic vs democracy.

      --
      When you're afraid to download music illegally in your own home, then the terrorists have won!
    2. Re:Case sensitive? by geegel · · Score: 1

      You should try republic vs tyranny. Some odd correlation there.

      --
      right...
    3. Re:Case sensitive? by Anonymous Coward · · Score: 1

      I read that as republic vs. tranny. The funny thing is, I probably wouldn't have clicked if I read it correctly.

    4. Re:Case sensitive? by nschubach · · Score: 1

      War is pretty straightforward...

      --
      Every time I start to have faith in humanity, I ruin it by driving to work between 7 and 8 am.
    5. Re:Case sensitive? by nschubach · · Score: 1

      Also, everything is a crisis nowadays!

      --
      Every time I start to have faith in humanity, I ruin it by driving to work between 7 and 8 am.
    6. Re:Case sensitive? by Daniel+Dvorkin · · Score: 2
      --
      The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
    7. Re:Case sensitive? by jc42 · · Score: 2

      We might also note that big peak in the incidence of "Britannica" in the early 1800s. But back then, it was still expected that educated people (at least in Europe) would study Latin, and "Britannica" is merely a Latin adjectival form of "Britannia", or "Britain", and the British Empire was rather active around the world at that time. So most of the uses of "Britannica" around then probably had nothing to do with the encyclopedia.

      I'd guess that you'd also find a fair number of occurrences of "Britannica" before 1768, the year that the encyclopedia was first published. But most of those would probably be lower case.

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    8. Re:Case sensitive? by Motard · · Score: 1

      Ok, I must not know something about the phrase war pigs

    9. Re:Case sensitive? by IWannaBeAnAC · · Score: 1

      This again exposes a problem of case sensitivity. Try Republic vs tyranny (capital R).

    10. Re:Case sensitive? by nschubach · · Score: 1

      If you click on the links at the bottom, some of them show multi-word combos like:
      "Before the war, pig-iron"
      "On board vessels of war pig-iron"

      --
      Every time I start to have faith in humanity, I ruin it by driving to work between 7 and 8 am.
  3. Slashdot circa 1885 by Anonymous Coward · · Score: 5, Funny
    http://ngrams.googlelabs.com/graph?content=slashdot&year_start=1800&year_end=2008&corpus=0&smoothing=3

    Sometime around 1885, the very first Anonymouse Cowarde briefly tried writing about Slashdot, but apparently died off before his comments could be modded up.

    1. Re:Slashdot circa 1885 by jfengel · · Score: 1

      That is very, very odd. It appears to be in 1899:

      http://ngrams.googlelabs.com/graph?content=slashdot&year_start=1800&year_end=1960&corpus=0&smoothing=0

      but a further search turns up zero results. If it were an OCR-o, it should at least show up.

      There is another hit, labeled 1963:

      http://books.google.com/books?id=x-O2AAAAIAAJ&q=%22slashdot%22&dq=%22slashdot%22&hl=en&ei=9bwLTaTADoet8Abf1qT7DQ&sa=X&oi=book_result&ct=result&resnum=1&ved=0CCQQ6AEwAA

      but it's a badly mis-dated issue of The Economist. Not sure why it's the only one.

    2. Re:Slashdot circa 1885 by meloneg · · Score: 1

      I was thinking that was the date of first publication for The Economist.

    3. Re:Slashdot circa 1885 by jfengel · · Score: 1

      The Economist dates to 1843.

      There may have been some format change that makes 1963 special, or it may be that their records start there. But I doubt that's the only mention of Slashdot on the Economist, so I suspect it's just that one issue that's misdated.

      (A search at The Economist turns up two hits, both from 1999, but from different issues. I'm surprised that there isn't something more recent than that, and I suspect their search is flaky. Neither one is the article that the Google search turned up, which must be after 1999 since it mentions Google placing ads on other sites.)

    4. Re:Slashdot circa 1885 by jfmiller · · Score: 1

      There are some number of modern works that are for some reason cataloged at the turn of the last century. Try Internet for similar results.

      --
      Strive to make your client happy, not necessarily give them what they ask for
    5. Re:Slashdot circa 1885 by bazorg · · Score: 1

      I followed your link, replaced "slashdot" with "LOL" and realised that we are living in the happiest of times. The same search for "pwned" shows how writers must have been tremendously aggressive in the late 1800s.

    6. Re:Slashdot circa 1885 by Anonymous Coward · · Score: 0

      Also iphone seems to have been in style around 1940s previously: http://ngrams.googlelabs.com/graph?content=iphone&year_start=1880&year_end=2008&corpus=0&smoothing=3

  4. Probably only one answer by $RANDOMLUSER · · Score: 1

    So a word cloud of human history probably has WAR in the center at 900 point font.

    --
    No folly is more costly than the folly of intolerant idealism. - Winston Churchill
    1. Re:Probably only one answer by biryokumaru · · Score: 1

      I was actually expecting "the," but it turns out they got "one." I guess they filtered out the articles.

      --
      When you're afraid to download music illegally in your own home, then the terrorists have won!
  5. Smoothing creates bias by AAWood · · Score: 2

    Note that in the linked Britannica / Wikipedia chart, Britannica appears higher because of the smoothing setting. Set it to a lower value, which gives a less pretty but more accurate chart, and Wikipedia is much higher by the present day.
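
    To see why smoothing can hide a recent surge, here is a rough sketch of a centered moving average. It assumes that a smoothing value of n averages each year with up to n years on either side; that is my reading of the setting, not something stated in this thread. The function name and the numbers are made up for illustration.

```python
def smooth(values, n):
    """Centered moving average: each point is averaged with up to n
    neighbors on each side (fewer at the edges of the series)."""
    result = []
    for i in range(len(values)):
        window = values[max(0, i - n): i + n + 1]
        result.append(sum(window) / len(window))
    return result

# Made-up series: flat until a sharp rise in the last two years.
raw = [0.0, 0.0, 0.0, 0.0, 4.0, 8.0]
print(smooth(raw, 0))      # unchanged
print(smooth(raw, 3)[-1])  # 3.0: the latest value is pulled down
```

    A term that only took off in the last few years of the range gets averaged with the flat years before it, so it looks lower than a long-established term.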

  6. Fuck's Great Comeback by Japong · · Score: 1
    http://ngrams.googlelabs.com/graph?content=fuck&year_start=1800&year_end=2008&corpus=0&smoothing=3

    Up until the 1820s, Fuck was apparently very much in vogue. Not until 1960s was this great word brought back into the lexicon of the common man.

    1. Re:Fuck's Great Comeback by migla · · Score: 1

      http://ngrams.googlelabs.com/graph?content=fuck&year_start=1800&year_end=2008&corpus=0&smoothing=3 [googlelabs.com]

      Up until the 1820s, Fuck was apparently very much in vogue. Not until 1960s was this great word brought back into the lexicon of the common man.

      Click on the time period from 1800 in the lower left and you'll see search results with some of the context. Oftentimes it seems to unfortunately be an OCR error (lambs and calves fucking milk). Maybe there was a font in use at the time with an f that resemble(d/s) an s...

      --
      Some of my favourite people are from the US; Vonnegut, Chomsky, Bill Hicks.
    2. Re:Fuck's Great Comeback by KublaiKhan · · Score: 1

      http://ngrams.googlelabs.com/graph?content=fuck&year_start=1500&year_end=2008&corpus=0&smoothing=0

      Here, take a broader look.

      People may complain about filthy language these days, but daaaaaamn! Our founding fathers must have had -filthy- mouths, and I'd -really- like to know what that spike in the late 1500s was about.

      --
      In Xanadu did Kubla Khan
      A stately pleasure dome decree
    3. Re:Fuck's Great Comeback by dillpick6 · · Score: 1

      http://ngrams.googlelabs.com/graph?content=yahoo&year_start=1800&year_end=2008&corpus=0&smoothing=2
      Yahoo was also very popular back in the early 1800s and fell out of favor just the same.

    4. Re:Fuck's Great Comeback by Anonymous Coward · · Score: 0

      Also it looks as if we're getting smarter. Love is ahead of money recently. Also, they tend to be inversely proportional to each other, if you look at it.

    5. Re:Fuck's Great Comeback by jestill · · Score: 1

      "But if your cock has received any hurt in his eye, then take a leaf or two of right ground-ivy, that is, fuch as is found in little tufts at the bottom of hedges; chew this in your mouth very well, and fuck out the juice, and fquirt it into his eye two or three times"
      The Sportsman's Dictionary by Henry James Pye

      It is the use of f for s ... which leads to some pretty funny text:
      http://books.google.com/books?id=xpQXAAAAYAAJ&pg=PA82&dq=%22fuck%22&hl=en&ei=i70LTfDPAoeglAfGtLjUDA&sa=X&oi=book_result&ct=result&resnum=5&ved=0CEAQ6AEwBA#v=onepage&q=%22fuck%22&f=false

      --
      "Asleep at the switch? I wasn't asleep, I was drunk!" -- Homer
    6. Re:Fuck's Great Comeback by jfengel · · Score: 3, Informative

      Most of the actual hits there appear to be OCR-os for the word "suck" and "such", often due to the use of medial "s" that resembles an "f". The word "such" appeared on a page which was badly speckled.

      Given that the word "suck" was often used in the expression "to give suck", many of those pages are quite hilarious ("she would not suffer the strange lamb to fuck"). I didn't see any actual "fucks" in the first few pages of hits.

      I know that the word was known. Shakespeare made a sly reference to it in Merry Wives of Windsor. But I suspect it wasn't often set down on paper, at least not in the kinds of books that got preserved.

    7. Re:Fuck's Great Comeback by Anonymous Coward · · Score: 0

      Specifically, it's because it's a misread of "such" or "suck".

    8. Re:Fuck's Great Comeback by AJWM · · Score: 1

      Maybe there was a font in use at the time with an f that resemble(d/s) an s...

      Exactly. Well, almost. Not so much a font, but a convention where an initial 's' (or all but the final 's') used a character that looked something like an 'f' and a little like an integral sign (or 'fign'). A lot of old documents use that. I have a 200-year old chemistry text (handed down from a great^n grandfather) which proclaims itself "A Complete Courfe in Chymiftry", except that the 'f' isn't quite.

      --
      -- Alastair
    9. Re:Fuck's Great Comeback by AndrewNeo · · Score: 1

      That's rather easy, click on the link at the bottom for 1500-1665. A lot of OCR errors, it looks like.

    10. Re:Fuck's Great Comeback by blueg3 · · Score: 1

      It's actually a medial s character, rather than an f. At some point the medial s was gotten rid of in favor of the final s.

    11. Re:Fuck's Great Comeback by PatPending · · Score: 1

      For fuck's sake, the letter "f" in this case is actually the letter "s". (I am not a linguist but it may be related to German.)

      --
      What one fool can do, another can. (Ancient Simian Proverb)
    12. Re:Fuck's Great Comeback by meloneg · · Score: 1
    13. Re:Fuck's Great Comeback by balbus000 · · Score: 1

      Yep, check out this.

    14. Re:Fuck's Great Comeback by jfengel · · Score: 2

      Which means, incidentally, that the trailing off of "fuck" at the beginning of the 19th century IS very interesting, for a different reason. It's watching the tail end of the use of the medial "s".

      That's the kind of data that would have been really hard to gather any other way, unless the OCR were to distinguish between medial "s" and regular "s" in its results. There IS a Unicode for medial S, but most OCR doesn't go there.

      So, we have a proxy for it: "suck" scanned as "fuck", which wouldn't otherwise appear very often. I should write a paper on it. Wouldn't "Use of 'Fuck' as a proxy for medial S" look great on my CV?
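
      The "f read for medial s" problem discussed above can be sketched in a few lines of Python. This is a hedged illustration, not what Google's OCR does: `normalize_long_s`, `long_s_candidates`, `correct`, and the toy word list are all hypothetical names and data. It relies only on two points from this thread: Unicode has a long s character (U+017F), and the long s was not used at the end of a word.

```python
from itertools import product

LONG_S = "\u017f"  # Unicode LATIN SMALL LETTER LONG S

def normalize_long_s(text):
    """Replace the long s character with a regular 's'."""
    return text.replace(LONG_S, "s")

def long_s_candidates(token):
    """Yield spellings in which some non-final 'f' is read as 's'.

    A final 'f' is left alone, since long s did not end words.
    """
    positions = [i for i, ch in enumerate(token[:-1]) if ch == "f"]
    for choice in product([False, True], repeat=len(positions)):
        if not any(choice):
            continue  # skip the unchanged spelling
        chars = list(token)
        for pos, swap in zip(positions, choice):
            if swap:
                chars[pos] = "s"
        yield "".join(chars)

def correct(token, vocabulary):
    """Return candidate respellings that appear in a known word list."""
    return [c for c in long_s_candidates(token) if c in vocabulary]

vocabulary = {"suck", "such", "course", "chymistry"}  # toy word list
print(correct("fuck", vocabulary))    # ['suck']
print(correct("courfe", vocabulary))  # ['course']
```

      A real clean-up pass would also need the publication date, since the same respelling is plausible for a 1790 text and wrong for a 1990 one.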

  7. Re:Academic conceit by KublaiKhan · · Score: 3, Insightful

    Tell me, how are you proposing to measure the words and thoughts of those who did not take the time to put them down in a form that later generations could refer to?

    Because if you have a time machine, I've got some business plans that could make us both filthy rich...

    --
    In Xanadu did Kubla Khan
    A stately pleasure dome decree
  8. wasting time by snookerhog · · Score: 1
    as if I was not already wasting enough time on /.

    now I spent almost an hour fooling around with this today

  9. Naughty Words by DIplomatic · · Score: 1

    I can't believe I'm almost 30 years old and the first thing I did was graph sex and f*ck. I guess some things never change...

  10. Lies, Damned Lies and Statistics by plankrwf · · Score: 1

    Hmmm... So Britannica still on top?

    But this link (is with smoothing=0) gives a different result:

    http://ngrams.googlelabs.com/graph?content=Britannica%2CWikipedia&year_start=1800&year_end=2008&corpus=0&smoothing=0

    Not that I know whether smoothing=0 is better or worse than smoothing=3

    Kind regards,

    Roel

  11. A bit sparse of an article by alcourt · · Score: 4, Interesting

    I wish they had gone in the article into more depth about grammar changes, rather than just word forms. For example, sentence ordering, comma usage, and some various other grammar items would be more intriguing. I found the burnt/burned the most interesting comparison because it showed an example of two competing versions of a word.

    Interesting idea, but as was stated in the article, there are definite limits to what this technique can study, and many are unconvinced of its value for more than highly limited problems.

    --
    "I may disagree with what you say, but I will defend unto the death your right to say it." -- Voltaire
  12. This is better than discovering new oil reserves by countertrolling · · Score: 1

    The richest data mine in the whole world... and probably bottomless..

    --
    For justice, we must go to Don Corleone
  13. Re:Academic conceit by Ephemeriis · · Score: 2

    Oh yeah, the only thing that ever matters is when a self-selected sample of writers puts words on paper. Nothing else matters.

    I don't know that anyone besides yourself actually made that claim...

    What is the percentage of humans who have lived? And what percentage of those humans got book deals

    If we're talking about human history here, not many published authors actually had to get book deals. Those are a fairly recent occurrence.

    and successfully negotiated the minefield to get not only published, but indexed by a 15-year-old company?

    Google is indexing everything they can get their hands on. It isn't like you have to pay an entrance fee or anything.

    Surely this is the sum of all human knowledge! How could it be otherwise? Oh, no, my anti-intellectualism is showing! How dare I question my betters?

    The fact of the matter is that the important stuff is usually what gets written down.

    Genealogies, religious texts, laws, business records, etc.

    And even if it's fiction, it's generally a good indicator of what people care to read about. Lots of sex and scandal and whatnot.

    Regardless of your opinion on the value of what gets written down... It isn't like we have a whole lot else to go by. We can't very well go back 1,000 years and just ask somebody what they think. We have to work with the records we have - be it written text, or the remains of a city, or statues, or whatever.

    --
    "Work is the curse of the drinking classes." -Oscar Wilde
  14. I call BS by daboochmeister · · Score: 1

    I mean, how good can they be if they don't even get THIS right?!

    On the other hand, they seem to have pegged this one!

    --
    "Ahh! I see you're in that indeterminate Schrodinger state where - oh, uh ... never mind." Dave Bucci
    1. Re:I call BS by fuo · · Score: 1

      I agree. Something is obviously wrong with it because the results for "ninja,pirate" and "vi,emacs" cannot possibly be correct.

    2. Re:I call BS by nickersonm · · Score: 1

      Perhaps if you used the correct bigrams instead of uncommon contractions of them.

  15. Email's Great Comeback too! by XxtraLarGe · · Score: 1

    Not until 1960s was this great word brought back into the lexicon of the common man.

    Oddly enough, email was a pretty popular word up until the 1960s, peaking in popularity in the 1860s, but has made a comeback since the mid-1990s!

    --
    Taking guns away from the 99% gives the 1% 100% of the power.
    1. Re:Email's Great Comeback too! by santax · · Score: 1

      That I can explain: emaille (from French), which in Dutch, and I believe in German too, is written as email.

    2. Re:Email's Great Comeback too! by meloneg · · Score: 1

      Look at the books in question though. Mostly just a mis-OCR of "small".

  16. From TFA by Dunbal · · Score: 1

    Rather than expose the full texts to the public (and themselves to copyright infringement)

    But wait, I thought you were breaking the law just by scanning the books and creating unauthorized copies. Or is there a different law for corporations like Google?

    --
    Seven puppies were harmed during the making of this post.
    1. Re:From TFA by Anonymous Coward · · Score: 0

      There's something you probably never heard about called negotiating a copyright agreement.

    2. Re:From TFA by circletimessquare · · Score: 1

      it doesn't matter, it's retarded either way

      we can't actually READ these texts... drum roll please... that in most cases no one can get their hands on anyways, they are so obscure. because someone might lose money, theoretically, THAT THEY ALREADY AREN'T MAKING. however, if these texts were made freely available, there would be renewed interest in some of these obscure works and someone would definitely make ancillary revenues off of them

      google is providing free exposure for rights holders and grandchildren of authors (ON WHAT MORAL BASIS DO GRANDCHILDREN DESERVE ANYTHING IN THIS RETARDED COPYRIGHT SYSTEM) of obscure works, which will certainly result in new revenues. but no! we have to keep these musty volumes locked up because it is better to earn no money than have money "stolen" from you that doesn't exist, stolen as in FREE ADVERTISING

      it's greed so incredibly stupid, it hurts its own bottom line

      intellectual property law has to die. i know it is hard to get done, but intellectual property law is really a sick fucking joke

      --
      intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
  17. Re:Academic conceit by AJWM · · Score: 2

    History isn't what really happened, it's what got written down. Everything else is evanescent (well, except for what archaeologists can dig up and reconstruct, which isn't much and not necessarily accurate -- and it only counts if they write it down). Mind, I'd be more impressed if Google were also tracking the content of every hieroglyph and cuneiform tablet ever found.

    It will ever be thus, unless someone invents a time machine (or at least a time viewer).

    --
    -- Alastair
  18. Inverse correlations by entotre · · Score: 1

    This should be a great way to test euphemism treadmills. For instance, try 'lunatic asylum' and 'psychiatric hospital'. Lunatic asylum makes a comeback in the early 2000s, I'm guessing because of history books.

    1. Re:Inverse correlations by BumbaCLot · · Score: 1

      My first search was between demon and epilepsy.

      http://ngrams.googlelabs.com/graph?content=demon%2Cepilepsy&year_start=1800&year_end=2008&corpus=0&smoothing=3

      The dip in the 30s was quite interesting to me.

  19. John Lennon by Known+Nutter · · Score: 1
    --
    Beware of the Leopard.
  20. Google VS Yahoo by BigDogCH · · Score: 1

    Yahoo had an early lead and blew it, but has made a comeback!
    Google Vs Yahoo

    "the" vs "of" is also exciting......I will be following this contest for the rest of my life.
    The vs Of

    Is another word more common than "the"?

  21. tl;dr by PatPending · · Score: 2

    tl;dr

    --
    What one fool can do, another can. (Ancient Simian Proverb)
    1. Re:tl;dr by Anonymous Coward · · Score: 0

      Fuck off.

    2. Re:tl;dr by PatPending · · Score: 1

      I was jokingly referring to books in general. I mean, who reads 'em nowadays? Anyway, too bad you don't have a sense of humor, A. Coward.

      --
      What one fool can do, another can. (Ancient Simian Proverb)
    3. Re:tl;dr by NorwayCurrency · · Score: 1
  22. The Cola Wars by Strange+Quark+Star · · Score: 1
    --
    There is no sig.
    1. Re:The Cola Wars by Anonymous Coward · · Score: 0

      Nope:
      http://ngrams.googlelabs.com/graph?content=Pepsi%2CCoke&year_start=1900&year_end=2008&corpus=0&smoothing=0
      Coke ftw

    2. Re:The Cola Wars by Anonymous Coward · · Score: 0

      Nope, it's not.
      http://ngrams.googlelabs.com/graph?content=Pepsi Cola,Coca Cola&year_start=1900&year_end=2008&corpus=0&smoothing=0
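
      The link above has raw spaces in it, which many clients will not follow. A small Python sketch for building these links with proper escaping is shown here; the base URL and parameter names are the ones that appear in links in this thread, while the function name `ngram_url` and its defaults are my own assumptions.

```python
from urllib.parse import urlencode

BASE_URL = "http://ngrams.googlelabs.com/graph"  # as used in this thread

def ngram_url(terms, year_start=1800, year_end=2008, corpus=0, smoothing=3):
    """Build a viewer link; urlencode escapes spaces and commas."""
    params = {
        "content": ",".join(terms),
        "year_start": year_start,
        "year_end": year_end,
        "corpus": corpus,
        "smoothing": smoothing,
    }
    return BASE_URL + "?" + urlencode(params)

url = ngram_url(["Pepsi Cola", "Coca Cola"], year_start=1900, smoothing=0)
print(url)
```

      Spaces come out as "+" and the comma as "%2C", matching the escaped form used in other links posted here.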

  23. Where's Buffy! by martin-boundary · · Score: 2

    Oh oh, according to this graph, we're being overrun by vampires, and the slayers are dropping like flies :(

    1. Re:Where's Buffy! by PatPending · · Score: 1
      --
      What one fool can do, another can. (Ancient Simian Proverb)
    2. Re:Where's Buffy! by Anonymous Coward · · Score: 0

      Your graph ends in 2000. But even after 2003, Google's data shows only a 30% increase in slayers, which is much less than one would expect.

  24. Leadership by charlesj68 · · Score: 1

    References to forms of national leadership are interesting. A nice peak for the reign of the Virgin Queen, the appearance and growth of President in line with the upstart of those bloody colonies in North America, President finally tops King just about the time of the Great War, but King reasserts until the Second World War finally pushes President on top. Interestingly enough, King comes back and surpasses President just about the turn of the Millennium. http://ngrams.googlelabs.com/graph?content=King,President,Queen&year_start=1500&year_end=2008&corpus=0&smoothing=3

  25. This truly looks phenomenal by spads · · Score: 1

    My impression is that a search for "man" would not match "woman" (i.e., word boundaries are assumed). True?
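
    The difference between whole-word counting and substring matching can be shown with a short Python sketch. It assumes the tool counts whole tokens, which is my guess from how it is described and not confirmed here; the regex is a simplification of real tokenization.

```python
import re
from collections import Counter

def count_words(text):
    """Count whole-word tokens, case-sensitively."""
    return Counter(re.findall(r"[A-Za-z']+", text))

text = "A man and a woman met another man."
counts = count_words(text)
print(counts["man"])      # 2: 'woman' is a separate token
print(text.count("man"))  # 3: substring search also hits 'woman'
```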

    --
    Bukowski said it. I believe it. That settles it.
  26. Easter Egg by AaxelB · · Score: 1

    Apparently someone at google labs is a fan of the whole "pirates prevent global warming" joke: http://ngrams.googlelabs.com/graph?content=pirates%2Cninjas&year_start=1800&year_end=2008&corpus=0&smoothing=3

    1. Re:Easter Egg by nschubach · · Score: 1
      --
      Every time I start to have faith in humanity, I ruin it by driving to work between 7 and 8 am.
  27. Ye Olde Cuss Words by Anonymous Coward · · Score: 0

    http://ngrams.googlelabs.com/graph?content=fuck,ass&year_start=1600&year_end=2008&corpus=0&smoothing=10

    What happened in the 1700s!

  28. New York Word Exchange by Comboman · · Score: 1

    Seeing the graphs of word popularity over time reminds me of that old Saturday Night Live skit with Phil Hartman giving word investing tips.

    --
    Support Right To Repair Legislation.
  29. Rickrolled easter egg by daboochmeister · · Score: 4, Funny
    --
    "Ahh! I see you're in that indeterminate Schrodinger state where - oh, uh ... never mind." Dave Bucci
    1. Re:Rickrolled easter egg by Anonymous Coward · · Score: 0

      Like this joke.. they are coming!!!!

      http://ngrams.googlelabs.com/graph?content=end+times&year_start=1800&year_end=2008&corpus=0&smoothing=0

  30. Wow, the reality distortion field... by Anonymous Coward · · Score: 0

    ...really seems to permeate time:
    http://ngrams.googlelabs.com/graph?content=iphone&year_start=1800&year_end=2008&corpus=0&smoothing=3

  31. Re:Academic conceit by meloneg · · Score: 1

    Mind, I'd be more impressed if Google were also tracking the content of every hieroglyph and cuneiform tablet ever found.

    It will ever be thus, unless someone invents a time machine (or at least a time viewer).

    I suspect they plan to...

  32. Re:Academic conceit by Anonymous Coward · · Score: 0

    History isn't what really happened, it's what got written down.

    "The writing of history is largely a process of diversion. Most historical accounts distract attention from their secret influences behind great events. The few histories that escape this restrictive process vanish into obscurity through obvious processes. Destruction of as many copies as possible, burying the too revealing accounts in ridicule, ignoring them in the centers of education, insuring that they are not quoted elsewhere" - Frank Herbert

  33. Made me change my search engine by Posting=!Working · · Score: 0
    --
    This sentence no verb.
  34. lol by Anonymous Coward · · Score: 0

    i guess people have been lulzing for quite some time

  35. Party Graphology by thethibs · · Score: 1

    It's said that liberals have issues and conservatives have principles. Plug "issue,principle" into it and see a really good picture of Western political change.

    --
    I'm a Programmer. That's one level above Software Engineer and one level below Engineer.
  36. man vs. God by aDSF762 · · Score: 0

    And the winner... ...man... also throw woman in there and look to simplified Chinese.

    --
    sense of security, like pockets jingling...
  37. Search for wikileaks... by Anonymous Coward · · Score: 0

    suddenly you'll see that no one has ever written about wikileaks! Check it out here

  38. Global Temperatures by kirkols · · Score: 1

    It appears that we have dodged a bullet: http://ngrams.googlelabs.com/graph?content=global+temperature&year_start=1800&year_end=2008&corpus=0&smoothing=3 We're on the other side of the hockey stick.

  39. Google Books vs. real corpora by CorpusProf · · Score: 4, Informative

    http://corpus.byu.edu/coha
    Corpus of Historical American English.

    -- 400 million words, 1810s-2000s.
    -- Allows for many types of searches that Google Books can't:
    * accurate frequency of words and phrases by decade and year
    * changes in word forms (via wildcard searches)
    * grammatical changes (because corpus is "tagged" for part of speech)
    * changes in meaning (via collocates; "nearby words")
    * show all words that are more common in one set of decades than others
    * integrate synonyms and customized word lists into queries
    * etc etc etc
    -- Funded by the National Endowment for the Humanities (NEH), 2009-2011.

    Take a look at the "Compare to Google/Archives" link off the first page.

    1. Re:Google Books vs. real corpora by SnowZero · · Score: 1

      Your corpus is clean and balanced -- Google's is 1200 times bigger.
      Your front-end is powerful but complicated -- Google's is simple and usable by regular people.
      Your front-end can handle the load of a few academics -- Google's can handle getting slashdotted, in the mainstream press, etc.

      I kind of see them as complementary. If you'd like, I could get you in contact with the folks who made the Google system; they'd probably be open to someone working to bring more structure to it, or just hosting fancier-but-smaller corpora such as COHA.

      Don't look at it as a threat, consider it an opportunity.

  40. Like a thermometer in the bathtub scenario... by Anonymous Coward · · Score: 0

    I can see the word BFD ticking up right about now.

  41. Instances of "God" by Anonymous Coward · · Score: 0

    Data goes back to 1500 CE. Really reminds me how important the 'open source' concept is (literacy and printing methods in this case...)

    http://ngrams.googlelabs.com/graph?content=God

  42. "Britannica" in the 1500s: more than just a book by fantomas · · Score: 1

    The word "Britannica" doesn't just refer to Encyclopaedia Britannica. It means 'of Britain' (Latin scholars can help me with the exact meaning, but this is the general sense). So you'll get hits from before the encyclopaedia existed, back to at least 1500 according to the search tool. And some hits from after the encyclopaedia started won't refer to it. It's a poor choice of comparison for a search.

  43. So it's just like Google search then? by donutello · · Score: 1

    Search for Slashdotte: 415 results. Go to page 9 of the results: Now there's only 89 results.

    --
    Mmmm.. Donuts
  44. Yikes! by BigSes · · Score: 1

    I think the cloud is halfway to capacity with this "summary" alone.

  45. Re:Academic conceit by TheoMurpse · · Score: 1

    Oh, no, my anti-intellectualism is showing! How dare I question my betters?

    I'm going to start appending that to the end of my posts in a futile, silly attempt to defend ridiculous, unfounded assertions I make. Oh, no, my anti-intellectualism is showing! How dare I question my betters?

  46. finally proof that porn is anti catholic by Anonymous Coward · · Score: 0

    http://ngrams.googlelabs.com/graph?content=catholics%2Cpornography&year_start=1800&year_end=2008&corpus=0&smoothing=3

  47. Important things by Anonymous Coward · · Score: 0

    The important stuff is very recent.

    air conditioning,electric power,telephone,vacuum tube,transistor,airplane

  48. Cardinal Directions by IorDMUX · · Score: 1

    Now here's an interesting one:

    http://ngrams.googlelabs.com/graph?content=North,South,East,West&year_start=1700&year_end=2008&corpus=0&smoothing=3

    The directions "North" and "South" were more than an order of magnitude more popular than "East" and "West" until ~1800, when the latter two quickly caught up over the course of a decade or so. Perhaps this is due to the American Revolution, but I noticed that the lower-case versions of all four words didn't become popular until about the same time, either.

    Interesting...

    --
    >> Standing on head makes smile of frown, but rest of face also upside down.
  49. Neat...but to put human history in perspective go by notaprguy · · Score: 1

    This site (http://share.seadragon.com/demos/ChronoZoom/firstgeneration.html) provides a graphical view and timeline of the history of the universe. To get a sense of the place human history has in the greater scheme of things, click on the 'Human History' link near the top of the page to zoom in. Before you flame me, this requires Silverlight but it's worth it. Besides, Silverlight is a quick install on Macs and PCs and you can always uninstall afterwards.

  50. Another easter egg by uwslothman · · Score: 1

    Put in "pirates" and "ninjas" and see what Google automagically adds.

  51. Low user ID by Anonymous Coward · · Score: 0

    http://ngrams.googlelabs.com/graph?content=Slashdot&year_start=1885&year_end=1915&corpus=0&smoothing=0

    What user id does this guy have?

  52. Ahh irony and wikileaks by Anonymous Coward · · Score: 0

    No surprise to anyone, but run a comparison on "spin, secrets": you'll be surprised what the outcome is, and when it started... hmm, look at that...

  53. American vs British by mkosmul · · Score: 1

    This comparison may (or may not) give interesting insights into the rise of influence of the US (or at least American vs British English).

  54. Comparing corpora is much more interesting by jago25_98 · · Score: 1

    You might compare English to German for example and have a look at what it looks like around the world wars.

    Careful with translations, though: few words have direct translations that mean exactly the same thing.

    Most arguments actually come from people assigning different meanings to words in their heads without realising it.

    At the moment I have to do this by making one graph's PNG transparent and overlaying it on the other. I'd love to know how to do it automatically. It's fascinating to see different languages reacting differently to world events, like the '60s and the wars, for example.
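
    One way to do the overlay without image editing, as a minimal sketch: assuming you already have each graph's yearly frequencies as a plain year-to-value mapping (how to get the numbers out of the Ngram Viewer is not covered in this thread, so the `overlay` helper and the sample numbers below are hypothetical), align the two series on their shared years and scale each to its own peak so both fit on one axis.

```python
def overlay(series_a, series_b):
    """Align two {year: frequency} series on their shared years and
    scale each one to its own peak, so both fit on a common 0..1 axis."""
    years = sorted(set(series_a) & set(series_b))
    if not years:
        return []
    # Fall back to 1.0 if a series is all zeros, to avoid dividing by zero.
    peak_a = max(series_a[y] for y in years) or 1.0
    peak_b = max(series_b[y] for y in years) or 1.0
    return [(y, series_a[y] / peak_a, series_b[y] / peak_b) for y in years]


# Made-up numbers, purely for illustration (not real Ngram data).
english = {1910: 2.0, 1915: 8.0, 1920: 4.0}
german = {1915: 3.0, 1920: 6.0, 1925: 1.0}

for year, a, b in overlay(english, german):
    print(f"{year}  en={a:.2f}  de={b:.2f}")
```

    Each row is then (year, first series, second series) on a shared scale, ready to feed to whatever plotting tool you prefer. Scaling to the peak hides absolute differences between the corpora, so it only shows when each language reacts, not by how much.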

  55. Ninjas vs Pirates by valley · · Score: 1
  56. Bah, uncultured barbarians. by vegiVamp · · Score: 1

    "Britannica" can refer to things other than said encyclopaedia. This gives a different picture.

    --
    What a depressingly stupid machine.
  57. blue,red,green,yellow by Anonymous Coward · · Score: 0

    I don't know how to make sense of this:

    blue,red,green,yellow

    They seem to share a distinct pattern. What does that pattern reflect?
    In this case, and many others, I think it would be useful to have one search term as a baseline.
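
    A rough sketch of what a baseline could look like, assuming the yearly frequencies are available as year-to-value mappings (the `relative_to_baseline` helper and all numbers here are hypothetical, not anything the Ngram Viewer provides): divide each term's frequency by the baseline term's frequency for the same year, so a pattern shared by every term cancels out.

```python
def relative_to_baseline(series, baseline):
    """Divide each year's frequency by the baseline term's frequency
    for that year. Years where the baseline is missing or zero are skipped."""
    return {
        year: freq / baseline[year]
        for year, freq in series.items()
        if baseline.get(year)
    }


# Made-up numbers, purely for illustration (not real Ngram data).
blue = {1900: 2.0, 1950: 3.0, 2000: 6.0}
baseline_term = {1900: 4.0, 1950: 4.0}  # no baseline value for 2000

print(relative_to_baseline(blue, baseline_term))
```

    If the four colours still move together after dividing by a common baseline, the pattern belongs to the colour words; if it flattens out, it was probably a property of the corpus.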

  58. Re:Academic conceit by tkprit · · Score: 1

    Is this really TROLL?! Cause I was thinking the same thing. *ouch*

    The idea of a word cloud of published materials is neat, but trying to make it represent 'human history', instead of a subset of human history (like, 'published by...'), does seem a tad arrogant.