CMU Web-Scraping Learns English, One Word At a Time
blee37 writes "Researchers at Carnegie Mellon have developed a web-scraping AI program that never dies. It runs continuously, extracting information from the web and using that information to learn more about the English language. The idea is for a never ending learner like this to one day be able to become conversant in the English language." It's not that the program couldn't stop running; the idea is that there's no fixed end-point. Rather, its progress in categorizing complex word relationships is the object of the research. See also CMU's "Read the Web" research project site.
What happens when it discovers lolcats?
I am the the Carnie Mellon reader, I have discovered with this article that I am robot.
Damia
I've always been amazed that until recently, most work on AI has been focused as a preconstructed system that fits data into pathways while having some variation in thought abilities to let it expand it's model slightly.
They'd write the rules for the system and try to include most of the work on it, and then let see how good it does, with limited learning capabilities and still based on the original model.
I'm glad a lot of research is finally gearing more towards the path of having a small initial program, then feeding it data and letting it grow into it's own intelligence.
If you give it the ability to learn, then it'll learn itself the rest, rather than giving it functions that let it pretend to learn while fitting into a model.
And i know there have been research into this in the past, but it didn't really take off till the last decade or so, and i'm glad it has.
True, or at least somewhat competent AI, here we come.
You never realize how much manually made unmanaged "linked" lists suck, till you have src.link.link.link.link...
Only as good as current machine learning algorithms.
So not very.
Why do I get the feeling that the bot's first words are going to be OMGWTFBBQ?
Don't thank God, thank a doctor!
There is simply no existing database to tell computers that "cups" are kinds of "dishware" and that "calculators" are types of "electronics." NELL could create a massive database like this, which would be extremely valuable to other AI researchers.
This is what they are trying to do, based on information they glean from the internet. It's already been done, with Cyc. The major difference seems to be that Cyc was built by hand, and cost a lot more. It will be interesting to see if this experiment results in a higher or lower quality database.
Also, I question their assertion that it would be extremely valuable to other AI researchers. Cyc has been around for a while now, and nothing really exciting has come of it. I'm not sure why this would be any different.
Qxe4
I think I see the problem with their code.
All they've done is reproduce the typical office worker. It just sits around and surfs the net all day, without coming back with an answer.
Serious? Seriousness is well above my pay grade.