Slashdot Mirror


How Do You Visualize 100 GB of Google Text Data?

An anonymous reader writes "There is an amazing series of charts that visualizes bigrams and trigrams (two- and three-word sequences) extracted from Google's web data set. The graphs highlight word associations and how frequently they appear on web pages. Chris Harrison from Carnegie Mellon University found, for example, that the word 'he' is often tied to 'argues,' while 'she' is often found with 'loves.' There are also word-relation charts that highlight words used in combination with their opposites, such as good and bad, peace and war, and PC and Mac." There are a lot of these things, and they're really interesting to browse through.
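For anyone unsure what counting bigrams and trigrams involves, here is a minimal sketch. It is not Harrison's method or Google's pipeline. The sample sentence and the helper names `ngrams` and `followers` are made up for illustration, and a real run over the web data set would need streaming and far more careful tokenization.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return list(zip(*(tokens[i:] for i in range(n))))

# Toy text invented for this example; it is not taken from Google's data set.
text = "he argues that she loves data and he argues that he loves charts"
tokens = text.lower().split()

bigram_counts = Counter(ngrams(tokens, 2))
trigram_counts = Counter(ngrams(tokens, 3))

def followers(word):
    """Count the words that directly follow `word`, using the bigram table."""
    return Counter({
        second: count
        for (first, second), count in bigram_counts.items()
        if first == word
    })

print(followers("he").most_common())
print(followers("she").most_common())
```

Ranking the followers of 'he' and 'she' by count is the kind of comparison the charts draw, only here it is done on thirteen words instead of a web-scale corpus.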

2 of 117 comments

  1. Kudos to Chris Harrison, though by Kupfernigk · Score: 3, Insightful
    He does these really interesting data visualisations and publishes them for free - and what do people do?

    "Was this "anonymous reader" the guy who owns the blog?"

    "his files are hosted in *.pdf files. tried looking at them in a windows 7 and an ubuntu machine, both have the text with unreadable lines through them. why would you host graphics as pdf?" - mine don't.

    I am slowly recovering from flu. What's the justification for all you miserable bastards out there? This is genuinely interesting stuff presented in an accessible way, and is the sort of thing /. should be about (checks karma and mod points - yup, probably allowed to say that.)

    --
    From scarped cliff or quarried stone she cries "A thousand types are gone, I care for nothing, no not one."
  2. Re:/.ed by Anonymous Coward · Score: 2, Insightful

    You're not missing anything - the images are unreadable even at 200% or more.

    Anyway, I don't get what they're illustrating. Word relations? So what.

    This is a "Digg" sort of submission ... back over to Fark for me.