Slashdot Mirror


Deriving Semantic Meaning From Google Results

prostoalex writes "New Scientist talks about Paul Vitanyi and Rudi Cilibrasi of the National Institute for Mathematics and Computer Science in Amsterdam and their work to extract the meaning of words from Google's index. The pair demonstrates an unsupervised clustering algorithm that can 'distinguish between colours, numbers, different religions and Dutch painters based on the number of hits they return', according to New Scientist."
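
For readers wondering how hit counts alone can separate colours from numbers: Cilibrasi and Vitanyi's paper defines a Normalized Google Distance (NGD) between two search terms, and the clustering works on those distances. The summary above does not spell out the formula, so the following is only a minimal sketch of it. The function name and the hit counts in the usage lines are made up for illustration, and n (the total number of indexed pages) is an assumed value that must be larger than any single hit count.

```python
import math

def normalized_google_distance(fx, fy, fxy, n):
    """Sketch of NGD computed from raw hit counts.

    fx, fy -- hits for each term on its own
    fxy    -- hits for pages containing both terms
    n      -- assumed total number of indexed pages (must exceed fx and fy)
    """
    # Terms that never occur, or never occur together, are treated as infinitely far apart.
    if min(fx, fy, fxy) <= 0:
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Hypothetical counts, not real search results.
close = normalized_google_distance(1000, 2000, 800, 10**10)
far = normalized_google_distance(1000, 2000, 10, 10**10)
print(close < far)
```

Terms that mostly appear on the same pages get a distance near 0, and terms that rarely co-occur get a larger one. A clustering step (not shown here) would then group terms by these pairwise distances.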

4 of 120 comments

  1. Compression is a stricter test for AI than Turing by Baldrson · Score: 3, Informative
    From the linked academic abstract:
    Viewing this mapping as a data compressor, we connect to earlier work on Normalized Compression Distance.

    This is basically what I was referring to in my response to "Using The Web For Linguistic Research" when I said:

    There needs to be an annual prize for the highest compression ratio using random pages from the web as the corpus. This would probably do more for real advancement of artificial intelligence than the Turing competitions.
    followed by the explanation:
    Intelligence can be seen as the ability to take a sample of some space and generalize it to predict things about the space from which the sample was drawn. The smaller the sample and the more accurate the prediction, the greater the intelligence. This is also a short description of what a compression algorithm does.
    and
    Text Compression as a Test for Artificial Intelligence, 1999 AAAI Proceedings. Matt Mahoney shows that text prediction or compression is a stricter test for AI than the Turing test. (1 page poster, compressed Postscript).
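
The Normalized Compression Distance mentioned in the abstract quoted above can be approximated with any off-the-shelf compressor. This is a rough sketch that uses Python's zlib as a stand-in; the compressor choice, the function names, and the sample inputs are assumptions for illustration, not what the paper's authors or the commenter used.

```python
import zlib

def compressed_size(data: bytes) -> int:
    # Length of the zlib-compressed data at the highest compression level.
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Illustrative inputs: repetitive English-like text versus a byte ramp.
text = b"the quick brown fox jumps over the lazy dog " * 20
ramp = bytes(range(256)) * 4
print(ncd(text, text) < ncd(text, ramp))
```

When one input helps predict the other, compressing them together costs little more than compressing one alone, so the distance is small. That is the same link between prediction and compression that the comment describes.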
  2. Re:wARTIME? by MoonFog · Score: 3, Informative

    Well, when I was in the army, there was a strict rule that whatever was said over a network must NOT have an ambiguous meaning. That's why army language sounds kinda weird at times: nobody is supposed to be able to misunderstand anything.

  3. On the bright side... by Anonymous Coward · Score: 2, Informative

    They are developing an open source tool (http://complearn.sourceforge.net/) that will hopefully integrate the algorithm they describe. Right now it only supports one of their previous algorithms. More about this at the above link and in section 5 of the paper.

  4. No. by Dylan Thomas · Score: 2, Informative

    A slug is not conscious. Nothing without language is. Recommended reading: Dr. Daniel C. Dennett, Consciousness Explained and Darwin's Dangerous Idea. Richard Dawkins, The Extended Phenotype. Julian Jaynes, The Origin of Consciousness in the Breakdown of the Bicameral Mind.

    Those are all more commercial works, well within the grasp of even people who've done no work in the field. For more scholarly and technical references, check their bibliographies, especially Dennett's.

    --
    What he wants is more important than what I want. What he wants is also more important than what you want.