CMU Web-Scraping Learns English, One Word At a Time

Posted by timothy on Saturday January 16, 2010 @07:18AM from the hao-ubowt-hahmnimz dept.

blee37 writes "Researchers at Carnegie Mellon have developed a web-scraping AI program that never dies. It runs continuously, extracting information from the web and using that information to learn more about the English language. The idea is for a never-ending learner like this to one day become conversant in the English language." It's not that the program couldn't stop running; the idea is that there's no fixed end-point. Rather, its progress in categorizing complex word relationships is the object of the research. See also CMU's "Read the Web" research project site.

7 of 148 comments

Machine learning algorithms by sakdoctor · 2010-01-16 07:26 · Score: 3, Insightful

Only as good as current machine learning algorithms.
So not very.
1. Re:Machine learning algorithms by poopdeville · 2010-01-16 10:14 · Score: 3, Insightful
  
  It's not as if human use of "machine learning" algorithms is any faster. It takes about 12 months for our neural networks to figure out that the noises we make elicit a response from our parents. And according to people like Chomsky, our neural networks are designed for language acquisition.
  AI "ought" to be an easy problem. But there's one big difference between the psychology of humans and that of computers. Humans have drives, like hunger, the sex drive, and so on. In particular, an infant's drive to eat is a major component in its will to learn language. But this drive to eat has other psychological manifestations.
  It is difficult to imagine a programmatic "generalized goal system" that mirrors the role of human drives in learning. The "goals", usually, are to maximize fitness in a particular domain. A real human has to maintain sufficient fitness in multiple domains, in order to survive.
  This should not be so surprising. Human evolution has about 300,000 generations of improvements on the brain since we first stood up. Our drives are clearly genetically programmed, and are just as hardwired as a machine learning algorithm's "drive" to maximize. The human drive is just much more nuanced, and informed about the real world. There is a model of the world in our genes. It is unfair to expect that a computer will ever be "smart" without one.
  
  --
  After all, I am strangely colored.
Re:Finally, people are getting AI right. by sakdoctor · 2010-01-16 07:31 · Score: 3, Insightful

letting it grow into its own intelligence
This is still weak AI. It isn't going to grow into anything, let alone strong AI.
Re:Uh oh... by Bragador · 2010-01-16 07:36 · Score: 5, Insightful

Actually, it reminds me of a chatbot named Bucket. When people at 4chan heard of it, they started to use it and teach it. It became a complete mess filled with memes, bad jokes, racist comments, and everything you can think of.
http://www.encyclopediadramatica.com/Bucket
One response from the bot:

Bucket: I don't know what the fuck you just said, little kid, but you're special man. You reached out and touched my heart. I'm gonna give you up, never gonna make you cry, never gonna run around and desert you, never gonna let you down, never gonna let you down, never gonna make you cry, never gonna let me down?
The quality of the teachers is important when learning.
Re:Finally, people are getting AI right. by buswolley · 2010-01-16 08:20 · Score: 3, Insightful

Of course. That is why it is important during human development that the infant has huge cognitive constraints (e.g. low working memory) in language learning; it limits the number of possible pairings of label and meaning. Of course, constraints can also be an impediment.

--
A Good Troll is better than a Bad Human.
Re:Finally, people are getting AI right. by DMUTPeregrine · 2010-01-16 10:09 · Score: 3, Insightful

The obligatory classic AI Koan:

In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6. "What are you doing?", asked Minsky. "I am training a randomly wired neural net to play Tic-Tac-Toe." "Why is the net wired randomly?", asked Minsky. "I do not want it to have any preconceptions of how to play." Minsky shut his eyes. "Why do you close your eyes?", Sussman asked his teacher. "So the room will be empty." At that moment, Sussman was enlightened.

--
Not a sentence!
Re:Uh oh... by Rocketship+Underpant · 2010-01-16 18:42 · Score: 2, Insightful

Yes, database pollution sounds like a problem to me. Not only do you have to deal with AOL-speak and horrific spelling disasters of every kind, there's the issue of broken English and nonsensical English produced through machine translation, which shows up on corporate websites a lot more than it should.

--
He who lights his taper at mine, receives light without darkening me.