Slashdot Mirror


User: CorpusProf

CorpusProf's activity in the archive.

Stories: 0
Comments: 1

Comments · 1

  1. Google Books vs. real corpora, on "Google Books Makes a Word Cloud of Human History" · Score: 4, Informative

    http://corpus.byu.edu/coha
    Corpus of Historical American English.

    -- 400 million words, 1810s-2000s.
    -- Allows for many types of searches that Google Books can't:
    * accurate frequency of words and phrases by decade and year
    * changes in word forms (via wildcard searches)
    * grammatical changes (because corpus is "tagged" for part of speech)
    * changes in meaning (via collocates; "nearby words")
    * show all words that are more common in one set of decades than in others
    * integrate synonyms and customized word lists into queries
    * etc.
    -- Funded by the National Endowment for the Humanities (NEH), 2009-2011.

    Take a look at the "Compare to Google/Archives" link on the first page.