Slashdot Mirror


How Do You Visualize 100 GB of Google Text Data?

An anonymous reader writes "There is an amazing series of charts that visualizes trigrams and bigrams, portions of sentences that have been extracted from Google's web data set. The graphs highlight word associations and the frequency with which we use them on web pages. Chris Harrison from Carnegie Mellon University found, for example, that the word 'he' is often tied to 'argues,' while 'she' is found often with 'loves.' There are also word-relation charts that highlight words used in combination with their opposites, such as good and bad, peace and war, and PC and Mac." There are a lot of these things, and they're really interesting to browse through.
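For readers unfamiliar with the terms: a bigram is a pair of adjacent words and a trigram is a run of three. Below is a minimal sketch of how such word associations could be counted. It is not Harrison's method or Google's pipeline; the sample sentence is made up for illustration, and the helper names (`tokenize`, `ngrams`, `followers`) are hypothetical.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes; a deliberately crude tokenizer.
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    # Slide a window of size n over the token list: n=2 gives bigrams, n=3 trigrams.
    return list(zip(*(tokens[i:] for i in range(n))))


def followers(tokens):
    # Map each word to a Counter of the words that immediately follow it.
    table = defaultdict(Counter)
    for first, second in ngrams(tokens, 2):
        table[first][second] += 1
    return table


# Made-up sample text, NOT taken from the Google data set.
sample = "he argues that she loves it and he argues again while she loves him"
tokens = tokenize(sample)
table = followers(tokens)

print(table["he"].most_common(1))   # most frequent word after "he"
print(table["she"].most_common(1))  # most frequent word after "she"
print(len(ngrams(tokens, 3)), "trigrams in the sample")
```

On a real corpus the same counting would have to run over far more text than fits in memory at once, so this only illustrates the idea of ranking a word's most frequent neighbors.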

1 of 117 comments

  1. This can be used to preload a "human-like" AI by presidenteloco · Score: 4, Interesting

    With a semantic network which reflects how humans relate various concepts together, and what topics and relationships humans care about.

    Yes, it will be biased and partial and rough, but it's a good start.

    More formal reasoning and association techniques, such as Bayesian stuff, logic, etc., will also be needed for general AI, but having the
    knowledge base grounded in human concerns and human perceptions is a key to an AI we can relate to and which can
    relate to us.

    I imagine this kind of semantic network will be usable for Google 2.0 "pre-emptive search" or "my virtual social planner and concierge".

    --

    Where are we going and why are we in a handbasket?