CMU Web-Scraping Learns English, One Word At a Time

Posted by timothy on Saturday January 16, 2010 @07:18AM from the hao-ubowt-hahmnimz dept.

blee37 writes "Researchers at Carnegie Mellon have developed a web-scraping AI program that never dies. It runs continuously, extracting information from the web and using that information to learn more about the English language. The idea is for a never-ending learner like this to one day become conversant in the English language." It's not that the program couldn't stop running; the idea is that there's no fixed end-point. Rather, its progress in categorizing complex word relationships is the object of the research. See also CMU's "Read the Web" research project site.

8 of 148 comments

Finally, people are getting AI right. by Umuri · 2010-01-16 07:26 · Score: 4, Interesting

I've always been amazed that until recently, most work on AI has focused on preconstructed systems that fit data into pathways, with just enough variation in thought abilities to let them expand their model slightly.
They'd write the rules for the system and try to include most of the work in it, and then see how well it does, with limited learning capabilities and still based on the original model.
I'm glad a lot of research is finally gearing more towards the path of having a small initial program, then feeding it data and letting it grow into its own intelligence.
If you give it the ability to learn, then it'll teach itself the rest, rather than giving it functions that let it pretend to learn while fitting into a model.
And I know there has been research into this in the past, but it didn't really take off till the last decade or so, and I'm glad it has.
True, or at least somewhat competent AI, here we come.

--
You never realize how much manually made unmanaged "linked" lists suck, till you have src.link.link.link.link...
1. Re:Finally, people are getting AI right. by Korbeau · 2010-01-16 08:11 · Score: 2, Interesting
  
  I'm glad a lot of research is finally gearing more towards the path of having a small initial program, then feeding it data and letting it grow into its own intelligence.
  This idea has been the holy grail of AI since its early days. The project described is one amongst thousands done, and you'll likely see news about such projects pop up every couple of months here on Slashdot.
  The problem is that such a project has yet to produce interesting results. The reason the most successful AI projects you hear about are human-organized databases and expert systems, or human-trained neural networks for instance, is that they are the only ones that produce useful results.
  Also, consider that we are not talking about "pixel-ants" that only have very few possible inputs and outputs, but about a system that understands and does something meaningful with natural language, something a normal human being doesn't completely grasp until he is at least a teenager, with the constant help of parents, friends, teachers, television etc. all along these years.
2. Re:Finally, people are getting AI right. by phantomfive · 2010-01-16 09:04 · Score: 3, Interesting
  
  AI history has gone back and forth between pre-constructed systems and models that expand. One of the earliest successful AI experiments was a checkers program that taught itself to play by playing against itself, and quickly got very strong.
  
  Building a giant database of knowledge hasn't been possible for very long, because computers didn't have very much memory. When systems first had the capacity to hold one, the database had to be constructed by hand because there was no online repository of information to extract data from: the internet just wasn't very big. That particular project was known as Cyc, and it cost a lot of money.
  
  Since that time, the internet has grown and there are massive amounts of information available. It will be interesting to see the resultant quality of this database, to see if the information on the internet is good enough to make it usable.
  
  --
  Qxe4
Non-English text by Bert64 · 2010-01-16 07:29 · Score: 2, Interesting

What happens when this program stumbles across text written in a language other than English? Or how about random nonsensical text? How does it know that the text it learns from is genuine English text?

--
http://spamdecoy.net - free throwaway anonymous email - avoid spam!
I think AI needs a 3d imagination to know English by CrazyJim1 · 2010-01-16 07:44 · Score: 2, Interesting

Once a computer understands 3D objects with English names, it can then have an imagination to know how these objects interact with each other. Of course, writing an imagination space that simulates real life is exceedingly difficult, and I don't see anyone doing it for several years, if not a decade, just to start.

--
God spoke to me.
Pruning by NonSequor · 2010-01-16 07:46 · Score: 2, Interesting

In general I find that the quality of a data set tends to be determined by the number (and quality) of man hours that go into maintaining it. Every database accumulates spurious entries, and if they aren't removed the data loses its integrity.
I'm very skeptical of the idea that this thing is going to keep taking input forever and accumulate a usable data set unless an army of student labor is press-ganged to prune it.

--
My only political goal is to see to it that no political party achieves its goals.
V*yger 2.0 ? by LifesABeach · 2010-01-16 07:54 · Score: 2, Interesting

The concept is intriguing: "Create a program that learns all there is to know, off the net." What amazes me is that others don't try the same thing. It doesn't take a team of A.I. types from Stanford to kick-start this program. The cost is a Netbook; even Nigerian Princes could afford this. I'm trying to figure out how economic competitors could take advantage of this. I can see how the U.S.P.T.O. could use this to help evaluate prior art and common usage. I'm thinking that an interface to a "Real World Simulator" would be the next step toward usefulness.
Re:Uh oh... by javaman235 · 2010-01-16 21:19 · Score: 4, Interesting

The quality of the teachers is important when learning.
That's seriously kind of interesting, actually: It makes me wonder if decades from now software developers will be few and far between, designing the AI algorithms for modern programs while the rest of us find work as software tutors, training those programs to do their business function.

--
-The art of programming is the pursuit of absolute simplicity.