Slashdot Mirror


How Do You Visualize 100 GB of Google Text Data?

An anonymous reader writes "There is an amazing series of charts that visualizes trigrams and bigrams, portions of sentences that have been extracted from Google's web data set. The graphs highlight word associations and the frequency with which we use them on web pages. Chris Harrison from Carnegie Mellon University found, for example, that the word 'he' is often tied to 'argues,' while 'she' is found often with 'loves.' There are also word-relation charts that highlight words used in combination with their opposites, such as good and bad, peace and war, and PC and Mac." There are a lot of these things, and they're really interesting to browse through.
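For readers unfamiliar with the terms: a bigram is a run of two consecutive words and a trigram a run of three. The following is a minimal sketch of how word associations like the 'he'/'argues' pairing could be counted. It is not Chris Harrison's actual method, and the function names (`ngrams`, `followers`) and the toy corpus are invented for illustration.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return an iterator over successive n-word tuples from a token list."""
    return zip(*(tokens[i:] for i in range(n)))


def followers(text, n=2):
    """Map each leading word to a Counter of the words that complete its n-gram."""
    tokens = text.lower().split()
    table = defaultdict(Counter)
    for gram in ngrams(tokens, n):
        table[gram[0]][" ".join(gram[1:])] += 1
    return table


# Toy corpus, made up for this example (not Google's data set).
corpus = "he argues that she loves it and he argues again while she loves music"
table = followers(corpus)

print(table["he"].most_common(1))   # most frequent word following "he"
print(table["she"].most_common(1))  # most frequent word following "she"
```

Passing `n=3` counts trigrams instead, keyed by the first word. A real data set of this size would need streaming and proper tokenization rather than an in-memory `split()`.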

3 of 117 comments

  1. This can be used to preload a "human-like" ai by presidenteloco · · Score: 4, Interesting

    With a semantic network which reflects how humans relate various concepts together, and what topics and relationships humans care about.

    Yes it will be biased and partial and rough, but it's a good start.

    More formal reasoning and association techniques, such as bayesian stuff, logic, etc will also be needed for general AI, but for the
    knowledge base to be grounded in human concerns and human perceptions; that's a key to an ai we can relate to and which can
    relate to us.

    I imagine this kind of semantic network will be usable for google 2.0 "pre-emptive search" or "my virtual social planner and concierge".

    --

    Where are we going and why are we in a handbasket?
  2. Kudos to Chris Harrison, though by Kupfernigk · · Score: 3, Insightful
    He does these really interesting data visualisations and publishes them for free - and what do people do?

    "Was this "anonymous reader" the guy who owns the blog?"

    "his files are hosted in *.pdf files. tried looking at them in a windows 7 and an ubuntu machine, both have the text with unreadable lines through them. why would you host graphics as pdf?" - mine don't.

    I am slowly recovering from flu. What's the justification for all you miserable bastards out there? This is genuinely interesting stuff presented in an accessible way, and is the sort of thing /. should be about (checks karma and mod points - yup, probably allowed to say that.)

    --
    From scarped cliff or quarried stone she cries "A thousand types are gone, I care for nothing, no not one."
    1. Re:Kudos to Chris Harrison, though by FrankDrebin · · Score: 3, Funny

      'he' is often tied to 'argues,'

      I don't agree.

      --
      Anybody want a peanut?