CMU Web-Scraping Learns English, One Word At a Time
blee37 writes "Researchers at Carnegie Mellon have developed a web-scraping AI program that never dies. It runs continuously, extracting information from the web and using that information to learn more about the English language. The idea is for a never-ending learner like this to one day become conversant in the English language." It's not that the program couldn't stop running; the idea is that there's no fixed end-point. Rather, its progress in categorizing complex word relationships is the object of the research. See also CMU's "Read the Web" research project site.
Robots are destined to rule the world; destroying all humans is a good thing.
One that hath name thou can not otter
You're advocating the "emergent intelligence" model of AI, where intelligence "somehow" is created by the confluence of lots of data. This has been a dream since the concept of AI started and is the basis for numerous movies with an AI topic. In practice the degrees of freedom which unstructured data provides far exceed the capability of current (and likely future) computers. It is not how natural intelligence works either: The structure of neural networks is very specifically adapted to their "purpose". They only learn within these structural parameters. Depending on your choice of religion, the structure is the result of divine intervention or millions of years of chance and evolution. When building AI systems, the problem has always been to find the appropriate structure or features. What has increased is the complexity of the features that we can feed into AI systems, which also increases the degrees of freedom for a particular AI system, but those are still not "free" learning machines.
There is simply no existing database to tell computers that "cups" are kinds of "dishware" and that "calculators" are types of "electronics." NELL could create a massive database like this, which would be extremely valuable to other AI researchers.
This is what they are trying to do, based on information they glean from the internet. It's already been done, with Cyc. The major difference seems to be that Cyc was built by hand, and cost a lot more. It will be interesting to see if this experiment results in a higher or lower quality database.
Also, I question their assertion that it would be extremely valuable to other AI researchers. Cyc has been around for a while now, and nothing really exciting has come of it. I'm not sure why this would be any different.
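For anyone wondering what a "cups are dishware" database might look like in practice, here is a minimal sketch in Python. It is not how NELL or Cyc actually store anything; the class name `IsADatabase`, its methods, and the extra "household item" category are all made up for illustration. It only shows the basic idea of storing "X is a kind of Y" facts and following them transitively.

```python
from collections import defaultdict


class IsADatabase:
    """Toy store of "X is a kind of Y" facts (hypothetical, illustration only)."""

    def __init__(self):
        # Maps each term to the set of categories it directly belongs to.
        self._parents = defaultdict(set)

    def add(self, term, category):
        """Record that `term` is a kind of `category`."""
        self._parents[term].add(category)

    def is_a(self, term, category):
        """Return True if `category` is reachable from `term` via is-a links."""
        seen = set()
        stack = [term]
        while stack:
            current = stack.pop()
            # .get() avoids creating empty entries for unknown terms.
            for parent in self._parents.get(current, ()):
                if parent == category:
                    return True
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return False


db = IsADatabase()
db.add("cup", "dishware")                # example from the quoted article
db.add("calculator", "electronics")      # example from the quoted article
db.add("dishware", "household item")     # assumed extra fact, not from the article
```

With those three facts loaded, `db.is_a("cup", "household item")` follows the chain cup, dishware, household item, while `db.is_a("cup", "electronics")` finds no path. The hard part, of course, is filling such a table reliably, whether by hand or by scraping.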
Qxe4
Bucket of #xkcd is on GitHub: http://github.com/zigdon/xkcd-Bucket
Well, Bucket's based on the (rather widespread) 'infobot' Perl program. The original infobot is hosted at http://sourceforge.net/projects/infobot/, but the XKCD variant of Bucket has a very detailed page showing the various interactions one can have with it, as well as a link to the GitHub page. See http://wiki.xkcd.com/irc/Bucket.