Deriving Semantic Meaning From Google Results

Posted by ryuzaki0 on Saturday January 29, 2005 @09:35AM from the can-also-use-tea-leaves-if-google-not-available dept.

prostoalex writes "New Scientist talks about Paul Vitanyi and Rudi Cilibrasi of the National Institute for Mathematics and Computer Science in Amsterdam and their work to extract the meaning of words from Google's index. The pair demonstrates an unsupervised clustering algorithm, which can 'distinguish between colours, numbers, different religions and Dutch painters based on the number of hits they return', according to New Scientist."
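
The summary does not say how hit counts become a similarity measure. As a rough sketch only: the measure Cilibrasi and Vitanyi published is the Normalized Google Distance, computed from the hit count for each term alone, the hit count for both terms together, and the total number of indexed pages. The function name, parameter names and every number below are illustrative assumptions, not real Google figures.

```python
import math

def ngd(hits_x: int, hits_y: int, hits_xy: int, total_pages: int) -> float:
    """Normalized Google Distance from raw hit counts.

    hits_x, hits_y : pages containing each term on its own
    hits_xy        : pages containing both terms
    total_pages    : assumed size of the index (must exceed the hit counts)
    """
    if min(hits_x, hits_y, hits_xy) <= 0:
        # Terms that never occur (or never co-occur) are treated as infinitely far apart.
        return math.inf
    lx, ly, lxy = math.log(hits_x), math.log(hits_y), math.log(hits_xy)
    return (max(lx, ly) - lxy) / (math.log(total_pages) - min(lx, ly))

# Made-up counts: two terms that always appear together versus two that rarely do.
always_together = ngd(1000, 1000, 1000, 10**9)
rarely_together = ngd(1000, 1000, 10, 10**9)
print(always_together, rarely_together)
```

Smaller values mean the terms tend to show up on the same pages. A clustering step would then group words by these pairwise distances.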

Compression is a stricter test for AI than Turing by Baldrson · 2005-01-29 09:47 · Score: 3, Informative

From the linked academic abstract:
Viewing this mapping as a data compressor, we connect to earlier work on Normalized Compression Distance.

This is basically what I was referring to in my response to "Using The Web For Linguistic Research" when I said:
There needs to be an annual prize for the highest compression ratio using random pages from the web as the corpus. This would probably do more for real advancement of artificial intelligence than the Turing competitions.
followed by the explanation:
Intelligence can be seen as the ability to take a sample of some space and generalize it to predict things about the space from which the sample was drawn. The smaller the sample and the more accurate the prediction, the greater the intelligence. This is also a short description of what a compression algorithm does.
and
Text Compression as a Test for Artificial Intelligence, 1999 AAAI Proceedings. Matt Mahoney shows that text prediction or compression is a stricter test for AI than the Turing test. (1 page poster, compressed Postscript).
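
For anyone who hasn't met the Normalized Compression Distance mentioned in the abstract, here is a minimal sketch. It assumes zlib as a stand-in for an ideal compressor (a real experiment would want a stronger one), and the helper names and sample strings are made up for illustration.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input, at the highest compression level."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Illustrative inputs: repetitive English text and an unrelated byte pattern.
text = b"the quick brown fox jumps over the lazy dog " * 20
other = bytes(range(256)) * 4

print(ncd(text, text))   # near 0: the second copy adds almost nothing new
print(ncd(text, other))  # much larger: the two inputs share little structure
```

The idea is that if knowing x makes y cheap to describe, compressing them together costs little more than compressing one of them alone, so the distance is small.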

--
Seastead this.

Re:wARTIME? by MoonFog · 2005-01-29 09:59 · Score: 3, Informative

Well, when I was in the army, there was a strict rule that whatever was said over a network DIDN'T have an ambiguous meaning. That's why army language sounds kinda weird at times: you are not supposed to be able to misunderstand anything.