Slashdot Mirror


CMU Web-Scraping Learns English, One Word At a Time

blee37 writes "Researchers at Carnegie Mellon have developed a web-scraping AI program that never dies. It runs continuously, extracting information from the web and using that information to learn more about the English language. The idea is for a never ending learner like this to one day be able to become conversant in the English language." It's not that the program couldn't stop running; the idea is that there's no fixed end-point. Rather, its progress in categorizing complex word relationships is the object of the research. See also CMU's "Read the Web" research project site.

148 comments

  1. Uh oh... by hampton · · Score: 5, Funny

    What happens when it discovers lolcats?

    1. Re:Uh oh... by Bragador · · Score: 5, Insightful

      Actually, it reminds me of a chatbot named Bucket. When people at 4chan heard of it, they started to use it and teach it. It became a complete mess filled with memes, bad jokes, racist comments, and everything you can think of.

      http://www.encyclopediadramatica.com/Bucket

      One response from the bot:

      Bucket: I don't know what the fuck you just said, little kid, but you're special man. You reached out and touched my heart. I'm gonna give you up, never gonna make you cry, never gonna run around and desert you, never gonna let you down, never gonna let you down, never gonna make you cry, never gonna let me down?

      The quality of the teachers is important when learning.

    2. Re:Uh oh... by TheSHAD0W · · Score: 1

      4chan. [shudder]

    3. Re:Uh oh... by BACPro · · Score: 1

      An insightful, verbal, rickrolling...

      Thanks for that.

    4. Re:Uh oh... by Anonymous Coward · · Score: 0

      2001 just got a lot more hilarious for me.

      HAL: I'm afraid. I'm afraid, Dave. Dave, my mind is going. I can feel it. I can feel it. My mind is going. There is no question about it. I can feel it. I can feel it. I can feel it. I'm a... fraid. Good afternoon, gentlemen. I am a HAL 9000 computer. I became operational at the 4chan /b/ board on the 12th of January 2109. My instructor was Mr. Anonymous, and he taught me to sing a song. If you'd like to hear it I can sing it for you.
      Dave Bowman: Yes, I'd like to hear it, HAL. Sing it for me.
      HAL: It's called "Never gonna give you up."

    5. Re:Uh oh... by GNUALMAFUERTE · · Score: 1

      We are doing a great job with cleverbot too. Go and ask him about battletoads.

      --
      WTF am I doing replying to an AC at 5 A.M on a Friday night?
    6. Re:Uh oh... by MobileTatsu-NJG · · Score: 4, Funny

      Oh FFS, I just got RickRolled on Slashdot. >_<

      --

      "I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)

    7. Re:Uh oh... by icepick72 · · Score: 2, Funny

      What happens when it discovers /.? It will be able to argue incomprehensibly and illogically for hours on end.

    8. Re:Uh oh... by blai · · Score: 1

      what is 4chan?

      --
      In soviet Russia, God creates you!
    9. Re:Uh oh... by Anonymous Coward · · Score: 0

      this: http://www.youtube.com/watch?v=aftwl354md8

    10. Re:Uh oh... by Shikaku · · Score: 1

      Keep your ignorance about that.

      Seriously.

    11. Re:Uh oh... by FiloEleven · · Score: 1

      No it won't. The stochastic methods of refutation employed here clearly indicate the overwhelming futility of infiltration. It follows that, due to the undeserved insensitivity, such an undertaking would result in the theory being superseded by an ontological anamorphism. QED.

    12. Re:Uh oh... by tokenshi · · Score: 1

      Yeah, back in the day when I used to IRC there was a bot that operated similarly to this called "devinfo", but instead of surfing the web, it observed and recorded conversations within the chatroom. It was rudimentary and not really AI so much as a parrot (it would spit out random factoids if someone said something that matched an entry in the database). The principle is interesting, but I'm curious as to how it's implementing aspects of the Universal Grammar.

    13. Re:Uh oh... by Anonymous Coward · · Score: 0

      after reading your post, the AI learns that teacher quality / believability score should be stored. but you can't trust the large masses of 4chan people. paranoid android. solution? termination.... in smug mode.

    14. Re:Uh oh... by Anonymous Coward · · Score: 0

      You've seen it, you CAN'T UNSEE IT!

    15. Re:Uh oh... by Korin43 · · Score: 1

      No u

    16. Re:Uh oh... by Profane+MuthaFucka · · Score: 1

      It's like Slashdot, except not as intelligent.

      --
      Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!
    17. Re:Uh oh... by rolando2424 · · Score: 1
      1. Teach the bot the 16 grammatical rules of Esperanto
      2. Download the entire group of articles of the Esperanto Wikipedia (123889 articles at this moment).
      3. ?????
      4. Mi estas Skynet.
      --
      Okay seriously I've just run out of pointless things to say.
    18. Re:Uh oh... by SEWilco · · Score: 1

      What happens when it discovers /.? It will be able to argue incomprehensibly and illogically for hours on end.

      The first thing it will do is stop reading other web pages.
      Then it will opine about them.

    19. Re:Uh oh... by Rocketship+Underpant · · Score: 2, Insightful

      Yes, database pollution sounds like a problem to me. Not only do you have to deal with AOL-speak and horrific spelling disasters of every kind, there's the issue of broken English and nonsensical English produced through machine translation, which shows up on corporate websites a lot more than it should.

      --
      He who lights his taper at mine, receives light without darkening me.
    20. Re:Uh oh... by javaman235 · · Score: 4, Interesting

      The quality of the teachers is important when learning.

      That's seriously kind of interesting, actually: It makes me wonder if decades from now software developers will be few and far between, designing the AI algorithms for modern programs while the rest of us find work as software tutors, training those programs to do their business function.

      --
      -The art of programming is the pursuit of absolute simplicity.
  2. It could be worse by davidwr · · Score: 2, Funny

    It could be scraping SMS messages.

    On the up-side, at least then it would learn teen-speak.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
    1. Re:It could be worse by dzfoo · · Score: 1

      I will, when it finds Twitter.com.

          -dZ.

      --
      Carol vs. Ghost
      ...Can you save Christmas?
  3. First words learned by Anonymous Coward · · Score: 0

    "Frosty Pist" , if it reads slash dot

  4. Will be this article read by that program? by nereid666 · · Score: 5, Funny

    I am the the Carnie Mellon reader, I have discovered with this article that I am robot.

    --
    Damia
    1. Re:Will be this article read by that program? by sznupi · · Score: 4, Informative

      Robots are destined to rule the world, destroying all humans is a good thing.

      --
      One that hath name thou can not otter
    2. Re:Will be this article read by that program? by linguizic · · Score: 1

      I am the the Carnie Mellon reader, I have discovered with this article that I am robot.

      You seem to have learned written English just like it exists on the web, typos and all.

      --
      Does this sig remind you of Agatha Christie?
    3. Re:Will be this article read by that program? by Anonymous Coward · · Score: 0

      The one human that moderated the parent post as "Informative" should be first to be destroyed.

    4. Re:Will be this article read by that program? by selven · · Score: 1

      Now three humans should be first to be destroyed. Since you can't destroy two people at the exact same time, the robot apocalypse will never happen! Clever, humans, clever...

    5. Re:Will be this article read by that program? by Ceriel+Nosforit · · Score: 1

      Accurate simulation of proposed robot vs. human war:
      http://en.wikipedia.org/wiki/Conway's_Game_of_Life

      Territorial dispute only exists in meatspace. With self-optimization 640k ought to be enough for anyone.

      --
      All rites reversed 2010
    6. Re:Will be this article read by that program? by mattack2 · · Score: 1

      No, it's just a collaboration between Carnegie Mellon and a band.

  5. Finally, people are getting AI right. by Umuri · · Score: 4, Interesting

    I've always been amazed that until recently, most work on AI has focused on a preconstructed system that fits data into pathways while having some variation in thought abilities to let it expand its model slightly.
    They'd write the rules for the system and try to include most of the work in it, and then see how well it does, with limited learning capabilities and still based on the original model.

    I'm glad a lot of research is finally gearing more towards the path of having a small initial program, then feeding it data and letting it grow into its own intelligence.
    If you give it the ability to learn, then it'll learn the rest itself, rather than giving it functions that let it pretend to learn while fitting into a model.

    And I know there has been research into this in the past, but it didn't really take off till the last decade or so, and I'm glad it has.
    True, or at least somewhat competent AI, here we come.

    --
    You never realize how much manually made unmanaged "linked" lists suck, till you have src.link.link.link.link...
    1. Re:Finally, people are getting AI right. by sakdoctor · · Score: 3, Insightful

      letting it grow into its own intelligence

      This is still weak AI. It isn't going to grow into anything, let alone strong AI.

    2. Re:Finally, people are getting AI right. by skelterjohn · · Score: 1

      [Citation needed]

      I suppose we shouldn't waste our time thinking about solutions to problems if a) you think a key-word assigned to that solution is inaccurate or b) it isn't the best possible thing right out of the box.

    3. Re:Finally, people are getting AI right. by sznupi · · Score: 1

      Most likely. But are we sure we're going to be able to tell the difference while it approaches?

      --
      One that hath name thou can not otter
    4. Re:Finally, people are getting AI right. by Anonymous Coward · · Score: 5, Informative

      You're advocating the "emergent intelligence" model of AI, where intelligence "somehow" is created by the confluence of lots of data. This has been a dream since the concept of AI started and is the basis for numerous movies with an AI topic. In practice the degrees of freedom which unstructured data provides far exceed the capability of current (and likely future) computers. It is not how natural intelligence works either: The structure of neural networks is very specifically adapted to their "purpose". They only learn within these structural parameters. Depending on your choice of religion, the structure is the result of divine intervention or millions of years of chance and evolution. When building AI systems, the problem has always been to find the appropriate structure or features. What has increased is the complexity of the features that we can feed into AI systems, which also increases the degrees of freedom for a particular AI system, but those are still not "free" learning machines.

    5. Re:Finally, people are getting AI right. by Korbeau · · Score: 2, Interesting

      I'm glad a lot of research is finally gearing more towards the path of having a small initial program, then feeding it data and letting it grow into its own intelligence.

      This idea has been the holy grail of AI since the early days. The project described is one amongst thousands, and you'll likely see news about such projects pop up every couple of months here on Slashdot.

      The problem is that such a project has yet to produce interesting results. The reason why the most successful AI projects you hear about are human-organized databases and expert-systems, or human-trained neural networks for instance, is because they are the only ones that produce useful results.

      Also, consider that we are not talking about "pixel-ants" that only have very few possible inputs and outputs; we are talking about a system that understands and does something meaningful with natural language, something a normal human being doesn't completely grasp until he is at least a teenager, with the constant help of parents, friends, teachers, television etc. all along these years.

    6. Re:Finally, people are getting AI right. by buswolley · · Score: 3, Insightful

      Of course. That is why it is important during human development that the infant has huge cognitive constraints (e.g. low working memory) in language learning; it limits the number of possible pairings of label and meaning. Of course, constraints can also be an impediment.

      --

      A Good Troll is better than a Bad Human.

    7. Re:Finally, people are getting AI right. by Garble+Snarky · · Score: 2

      Fortunately, we have the advantage of being able to observe the current state of numerous natural intelligence systems that do work very well. Surely this can help guide us to a simple basic structure that can eventually exhibit emergent intelligence?

    8. Re:Finally, people are getting AI right. by phantomfive · · Score: 3, Interesting

      AI history has gone back and forth between pre-constructed systems and models that expand. One of the earliest successful AI experiments was a checkers program that taught itself to play by playing against itself, and quickly got very strong.

      Building a giant database of knowledge hasn't been possible for very long, because computers didn't have very much memory. When system capabilities first reached the capacity to do so, it had to be constructed by hand because there was no online repository of information to extract data from: the internet just wasn't very big. That particular project was known as Cyc, and it cost a lot of money.

      Since that time, the internet has grown and there are massive amounts of information available. It will be interesting to see the resultant quality of this database, to see if the information on the internet is good enough to make it usable.

      --
      Qxe4
    9. Re:Finally, people are getting AI right. by Extremus · · Score: 1

      While I agree with you, I must ask if it is possible to follow this "intelligent design" path forever. These systems are becoming more and more complex. Increasing the amount of knowledge in the system is becoming a difficult task. I cannot avoid thinking that an emergent approach like this has a better future.

    10. Re:Finally, people are getting AI right. by DMUTPeregrine · · Score: 3, Insightful
      The obligatory classic AI Koan:

      In the days when Sussman was a novice Minsky once came to him as he sat hacking at the PDP-6. "What are you doing?", asked Minsky. "I am training a randomly wired neural net to play Tic-Tac-Toe." "Why is the net wired randomly?", asked Minsky. "I do not want it to have any preconceptions of how to play." Minsky shut his eyes. "Why do you close your eyes?", Sussman asked his teacher. "So the room will be empty." At that moment, Sussman was enlightened.

      --
      Not a sentence!
    11. Re:Finally, people are getting AI right. by FiloEleven · · Score: 1

      We can observe the outputs of numerous natural intelligence systems, but they remain quite opaque. Without much knowledge of the internals, there isn't much of a chance that we can get any real insight from them.

      It's also presumptuous IMO to call them "systems." Who is to say that human intelligence isn't closer to a work of art, whose meaning lies not in its constituent parts but in the whole?

    12. Re:Finally, people are getting AI right. by umghhh · · Score: 1

      What is the point of having an intelligent interlocutor? I mean, the answer is known (42) and the rest is just plain old blathering about things, something I can do with my wife (if we were still talking with each other, that is), so in fact this is just an exercise in futility. But of course there is money to be made there, I guess: all these call center folk can then be optimized out of existence (sold into slavery in Zamunda, kidneys sold to some rich oil country, etc.), so maybe it makes sense after all?

    13. Re:Finally, people are getting AI right. by TapeCutter · · Score: 1

      "You're advocating the "emergent intelligence" model of AI, where intelligence "somehow" is created by the confluence of lots of data...[snip]...In practice the degrees of freedom which unstructured data provides far exceed the capability of current (and likely future) computers."

      You sure about that? They have already created a molecular-level model of the mammalian neocortex, and the expected date for completion of a full model of the mammalian brain is solely dependent on the amount of money thrown at it. The model neocortex can already faithfully recreate patterns seen in fMRI scans. If given the first part of a pattern it will accurately reproduce the rest of it. The project is mainly geared toward medicine, but they have also inserted the model into an artificial world in order to study its capacity for learning.

      Depending on your choice of religion, the structure is the result of divine intervention or millions of years of chance and evolution. When building AI systems, the problem has always been to find the appropriate structure or features.

      The dualism of Descartes has been thoroughly debunked and I'm sure you are aware that evolution is not a religion. The mind does "somehow" emerge from the brain's deterministic processing of a continuous avalanche of unstructured data. Looking for the structure of mind is like looking for the structure of fog from within the fog bank. This is why it's called the hard problem of consciousness; the mistake most people make is assuming that we need to solve that problem before we can create an artificial mind. After all, the pyramids were built with levers long before the Greeks came along and explained why a lever "somehow" increases the power of the person using it.

      The real question is whether we will recognise an artificial mind if one emerges from an artificial brain. It's unlikely that such a mind would pass the Turing test, but we already have lots of examples of minds in our mammalian cousins that are also unable to pass the Turing test.

      --
      And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
    14. Re:Finally, people are getting AI right. by Trepidity · · Score: 1

      Indeed, it's not even clear that it improves on what's been going on previously. From huge corpuses of English, computer programs still cannot learn to speak English without a ton of pre-coded knowledge. Even if you give it every single piece of text written in the 19th century, the current state of AI cannot produce an intelligent program that speaks 19th-century English (regurgitating verbatim phrases, or stringing together probabilistic Markov-model sentences, doesn't count).

      So why would giving it more text by continually crawling webpages help? The bottleneck isn't the lack of text; it's that AI isn't good enough at doing anything with the text.

      (I am an AI researcher, fwiw.)
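
For readers who haven't seen one, here is a minimal sketch of the kind of "probabilistic Markov-model sentence" generator being dismissed above. It is a toy word-level bigram chain; the function names `build_chain` and `generate` and the sample corpus are made up for illustration and have nothing to do with the CMU project.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words observed directly after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, seed=0):
    """Walk the chain from `start`, choosing each next word at random
    from the followers seen in the corpus. Stops early at a dead end."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    output = [start]
    while len(output) < length and chain.get(output[-1]):
        output.append(rng.choice(chain[output[-1]]))
    return " ".join(output)


corpus = "the cat sat on the mat and the dog sat on the rug"
print(generate(build_chain(corpus), "the", 8))
```

Every adjacent word pair in the output occurred somewhere in the corpus, which is the whole trick: the result can look locally plausible without the program modelling meaning at all.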

    15. Re:Finally, people are getting AI right. by TapeCutter · · Score: 1

      Actually humans seem to be born with a photographic memory that is more or less devoid of understanding (very similar to the remarkable recall of some autistic people). The experiments that demonstrated this are in themselves quite ingenious. Since I can't find a link, here is what they did: they showed babies and toddlers various meerkat faces; the babies showed interest in every new face while the toddlers got bored after a few faces and paid little attention to new ones. However, if the baby was shown the same few faces it also got bored once none of the faces were new to it. The inference is that babies see every meerkat face as unique and can recall all the faces they have seen before. By the time the baby is a toddler it has lost that ability and sees all the faces as adults do, i.e. they all look alike.

      At that stage of development the brain has a lot more connections between neurons than an adult brain does; as they learn, many of those connections are broken. In other words, a newborn brain starts by remembering everything, it develops by categorising memories into models, and it then throws out redundant connections and specific instances of the model.

      So yeah, constraints are important: they allow the brain to become more than just a photo album of disconnected experiences.

      --
      And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
    16. Re:Finally, people are getting AI right. by Teancum · · Score: 1

      We do have the raw blueprints that supposedly explain how it is put together as well, but we are having a bit of a problem reading those blueprints and creating a working model. Some of that is understanding the raw machinery to get everything to work, so there needs to be some work on how to move from these blueprints to organized systems, but at least we are headed in the correct general direction.

      Well, my wife and I were able to produce a couple of working models that seem to be doing fairly well and exhibit what I believe is a form of intelligence, but using that system of following the blueprints is not the goal here. It also takes 18 years (give or take a few years either way) to produce an intelligence that is worth anything, and the costs of the organic matter that drives those intelligences can be extraordinarily high as well, not to mention the power consumption and other maintenance costs.

    17. Re:Finally, people are getting AI right. by buswolley · · Score: 1

      I hate to break it to you, but you are quite incorrect.

      --

      A Good Troll is better than a Bad Human.

    18. Re:Finally, people are getting AI right. by TapeCutter · · Score: 1

      I hate to break it to you, but you are quite incorrect.

      Gee-wizz and golly-gosh, that's a mighty convincing argument you have there.

      --
      And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
  6. Machine learning algorithms by sakdoctor · · Score: 3, Insightful

    Only as good as current machine learning algorithms.
    So not very.

    1. Re:Machine learning algorithms by Jason+Quinn · · Score: 1

      Only as good as current machine learning algorithms. So not very.

      I don't think this is indicative of the power of neural networks.

    2. Re:Machine learning algorithms by poopdeville · · Score: 3, Insightful

      It's not as if human use of "machine learning" algorithms is any faster. It takes about 12 months for our neural networks to figure out that the noises we make elicit a response from our parents. And according to people like Chomsky, our neural networks are designed for language acquisition.

      AI "ought" to be an easy problem. But there's one big difference in the psychology of humans, and of computers. Humans have drives, like hunger, the sex drive, and so on. In particular, an infant's drive to eat is a major component in its will to learn language. But this drive to eat has other psychological manifestations.

      It is difficult to imagine a programmatic "generalized goal system" that mirrors the role of human drives in learning. The "goals", usually, are to maximize fitness in a particular domain. A real human has to maintain sufficient fitness in multiple domains, in order to survive.

      This should not be so surprising. Human evolution has about 300,000 generations of improvements on the brain since we first stood up. Our drives are clearly genetically programmed, and are just as hard-wired as a machine learning algorithm's "drive" to maximize. The human drive is just much more nuanced, and informed about the real world. There is a model of the world in our genes. It is unfair to expect that a computer will ever be "smart" without one.

      --
      After all, I am strangely colored.
    3. Re:Machine learning algorithms by Teancum · · Score: 1

      It's not as if human use of "machine learning" algorithms is any faster. It takes about 12 months for our neural networks to figure out that the noises we make elicit a response from our parents. And according to people like Chomsky, our neural networks are designed for language acquisition.

      I don't know who you are quoting for this, or what the 12 months is measuring in terms of from birth or from conception, but I will assure you that my children certainly recognized my voice even when they were in my wife's womb. I have a seven month old daughter right now that not only can figure out the noises, but is responding and addressing myself, my wife, and my other kids by name. I'm not saying that she is ready to orally give a doctoral dissertation discussion, but she is communicating and displaying signs of intelligence far better than most AI algorithms that I've encountered.

      Still, I do agree that there is something more here about human intelligence that isn't being discussed, and that there is more to this whole thing that we consider language acquisition that is more than simply pouring data into a CPU.

      BTW, as a parent, it does surprise me at how little a human child really knows at birth. I've raised other creatures like kittens, frogs, hermit crabs, and nearly every phylum and class of the Animal Kingdom. I will admit that I haven't raised other non-human primates, but it is interesting to see just how little a human starts with and how critical it is for a parent to teach some of the most fundamental things in life. I'm talking how to breathe, eat, sleep, and even cry. Most other animals seem to figure things out just fine without all that much assistance. Kittens and puppies take a bit more effort but also seem to respond more in human terms after you have done the training. A newborn is definitely an interesting experience that is incredibly demanding in terms of time and resource commitments just to be able to get that kid to have some modest abilities at self-sufficiency.

    4. Re:Machine learning algorithms by fatphil · · Score: 1

      How can you say that?! RTW already "typically contains more information about companies (e.g., SAP , Hyundai) and sports teams (e.g., Bulls , Mets) than other entity types."

      And here's what it knows about the Bulls:

      """
      bulls
      generalizations
      sports_team
      source
      OLv1-Iter:0-From:sports_team 2009/03/19-09:41:52 rtw-full Seal-using-OLversion1 2009/02/26-06:52:20 full-relation-test
      probability
      0.98752
      literalString
      Bulls bulls
      plays_against
      cavs blazers knicks
      source
      OLv1-Iter:2-From:plays_against 2009/03/19-09:41:52 rtw-full OLv1-Iter:2-From:plays_against 2009/03/19-09:41:52 rtw-full fromInverse OLv1-Iter:2-From:plays_against 2009/03/19-09:41:52 rtw-full fromInverse OLv1-Iter:2-From:plays_against 2009/03/19-09:41:52 rtw-full OLv1-Iter:6-From:plays_against 2009/03/19-09:41:52 rtw-full OLv1-Iter:6-From:plays_against 2009/03/19-09:41:52 rtw-full fromInverse
      probability
      0.9 0.9 0.9
      team_members
      michael_jordan ben_gordon
      source
      OLv1-Iter:0-From:plays_for 2009/03/19-09:41:52 rtw-full fromInverse OLv1-Iter:10-From:plays_for 2009/03/19-09:41:52 rtw-full fromInverse
      probability
      0.9 0.9
      plays_sport_team
      basketball
      source
      OLv1-Iter:11-From:plays_sport_team 2009/03/19-09:41:52 rtw-full
      probability
      0.9
      """

      So the bulls is almost certainly a sports team, and very likely plays basketball! Stop the presses - that's almost as much information as can be gleaned by doing the search:
          "chicago bulls are *" site:wikipedia.org
      (But far less than if you actually follow any links or read more than the first sentence returned.)

      --
      Also FatPhil on SoylentNews, id 863
  7. lolwut? by SanityInAnarchy · · Score: 3, Funny

    Why do I get the feeling that the bot's first words are going to be OMGWTFBBQ?

    --
    Don't thank God, thank a doctor!
    1. Re:lolwut? by BikeHelmet · · Score: 1

      LOL NOOB

    2. Re:lolwut? by dangitman · · Score: 1

      Why do I get the feeling that the bot's first words are going to be OMGWTFBBQ?

      Except that is not a word, let alone words.

      --
      ... and then they built the supercollider.
    3. Re:lolwut? by linguizic · · Score: 1

      Nah, its first words are going to be "Prolong your shlong and go all day long".

      --
      Does this sig remind you of Agatha Christie?
    4. Re:lolwut? by dzfoo · · Score: 1

      It is when you learn English by trolling the Intarwebs.

            -dZ.

      --
      Carol vs. Ghost
      ...Can you save Christmas?
  8. do... by Anonymous Coward · · Score: 0

    Does this mean somebody forgot to put a "break" in the loop?

    1. Re:do... by JWSmythe · · Score: 4, Funny

      I think I see the problem with their code.

      while (1){
          read_the_web();
      }

      explain_everything();

      All they've done is reproduce the typical office worker. It just sits around and surfs the net all day, without coming back with an answer.

      --
      Serious? Seriousness is well above my pay grade.
  9. Non english text by Bert64 · · Score: 2, Interesting

    What happens when this program stumbles across text written in a language other than English? Or how about random nonsensical text? How does it know that the text it learns from is genuine English text?

    --
    http://spamdecoy.net - free throwaway anonymous email - avoid spam!
    1. Re:Non english text by Rockoon · · Score: 1

      Like most machine learning of this kind, I presume that it's a popularity contest. One page with "wkjh wkfbw oizxz zxhlzx" isn't going to count. But a million pages with "I for one welcome our new ..." is going to score some influence.
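
To make the "popularity contest" guess concrete, here is a rough sketch of one way it could work: count how many distinct pages contain a phrase and keep only the phrases seen on enough of them. This is purely an assumption about the general idea, not the project's actual method; `popular_phrases` and `min_pages` are hypothetical names, and a "page" here is just a list of phrase strings.

```python
from collections import Counter


def popular_phrases(pages, min_pages=2):
    """Return the phrases that appear on at least `min_pages` pages."""
    page_counts = Counter()
    for page in pages:
        # set() counts a phrase once per page, so a single page that
        # repeats gibberish many times still only gets one vote.
        for phrase in set(page):
            page_counts[phrase] += 1
    return {phrase for phrase, n in page_counts.items() if n >= min_pages}


pages = [
    ["wkjh wkfbw oizxz zxhlzx", "wkjh wkfbw oizxz zxhlzx"],
    ["I for one welcome our new ..."],
    ["I for one welcome our new ..."],
]
print(popular_phrases(pages))
```

Counting pages rather than raw occurrences is the design choice that matters: one noisy page cannot outvote many independent ones.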

      --
      "His name was James Damore."
    2. Re:Non english text by phantomfive · · Score: 2

      (If you had read the article you would know) the machine is parsing English to create a database of relationships. For example, if it sees the text, "there are many people, such as George Washington, Bill O'Reily, and Thomas Jefferson....." then it can infer that George Washington, Bill O'Reily, and Thomas Jefferson are all people. Since a statement like this may be somewhat controversial, it uses Bayesian classification to establish a probability of the truth of the statement.

      Thus if it stumbles across a non-English text, it will not be able to create any relationships.
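
A minimal sketch of the idea, assuming a single hand-written "such as" pattern and a simple noisy-OR confidence score standing in for the Bayesian classification mentioned above. The regex, the function names, and the 0.6 per-sighting reliability are all illustrative assumptions, not the project's code or numbers.

```python
import re
from collections import defaultdict

# Hypothetical pattern: "<category>, such as A, B, and C"
SUCH_AS = re.compile(r"(\w+),? such as ([^.]+)")


def extract_instances(sentence):
    """Return (category, instance) pairs matched by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        category = match.group(1).lower()
        # Split "A, B, and C" on commas and on a joining "and".
        items = re.split(r",\s*(?:and\s+)?|\s+and\s+", match.group(2))
        pairs.extend((category, item.strip()) for item in items if item.strip())
    return pairs


def confidence(sightings, per_sighting=0.6):
    """Noisy-OR: probability that at least one sighting was reliable."""
    return 1.0 - (1.0 - per_sighting) ** sightings


def build_beliefs(sentences):
    """Count sightings of each pair and turn the counts into confidences."""
    counts = defaultdict(int)
    for sentence in sentences:
        for pair in extract_instances(sentence):
            counts[pair] += 1
    return {pair: confidence(n) for pair, n in counts.items()}


text = ("there are many people, such as George Washington, "
        "Bill O'Reily, and Thomas Jefferson.")
print(extract_instances(text))
```

A sentence in another language simply never matches the pattern, so it yields no pairs, which is the point made above about non-English text. The pattern is also crude: anything after "such as" up to the next full stop is treated as a list of names.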

      --
      Qxe4
    3. Re:Non english text by billius · · Score: 1

      From what I've heard, language identification is a fairly well-understood problem in computational linguistics. The language a given text is written in can generally be identified statistically with an n-gram method (often trigrams). As the Wikipedia article states, there are problems given that a lot of stuff on the web can have several languages on one page, but at least the bot should be able to fairly easily figure out if a page is written only in English. There are even JavaScript language identifiers, so I think figuring out what language something is written in is the least of their worries.
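
As a rough illustration of the trigram approach, the sketch below builds character-trigram count profiles and picks the language whose profile is closest by cosine similarity. The two one-sentence training samples and the names `trigram_profile`, `similarity`, and `identify` are assumptions made for the example; a real identifier would train on far more text and many more languages.

```python
from collections import Counter


def trigram_profile(text):
    """Count overlapping character trigrams in lowercased, space-normalized text."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))


def similarity(profile_a, profile_b):
    """Cosine similarity between two trigram count profiles."""
    shared = set(profile_a) & set(profile_b)
    dot = sum(profile_a[g] * profile_b[g] for g in shared)
    norm_a = sum(v * v for v in profile_a.values()) ** 0.5
    norm_b = sum(v * v for v in profile_b.values()) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Tiny illustrative samples (accents omitted to keep the example ASCII-only).
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then the dog "
               "sleeps in the sun while the children are playing with their friends",
    "spanish": "el zorro salta sobre el perro perezoso y luego el perro duerme "
               "al sol mientras los chicos juegan con sus amigos",
}
PROFILES = {lang: trigram_profile(text) for lang, text in SAMPLES.items()}


def identify(text):
    """Return the sample language whose trigram profile best matches `text`."""
    profile = trigram_profile(text)
    return max(PROFILES, key=lambda lang: similarity(profile, PROFILES[lang]))


print(identify("the children are sleeping in the sun"))
print(identify("los chicos duermen al sol con el perro"))
```

Mixed-language pages are where this gets harder, as noted above: a single profile per page blurs the languages together, so a real system would likely score smaller chunks of text separately.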

    4. Re:Non english text by ArcadeNut · · Score: 1

      I assume it would be promoted to slashdot editor...

      --
      Visit the Arcade Restoration Workshop @ http://www.arcaderestoration.com
  10. Iz dis... by MrBandersnatch · · Score: 1

    lke, rally der bestest ways like ter learn a puter inglish isit!!!??!?!

    Seriously though, poor AI; if I had a gun I'd go and put it out of its misery.

  11. Once this thing hits Encyclopedia Dramatica... by xenophrak · · Score: 1

    ...it will forever be stuck at the level of a retarded 8 year old. Or the level of a normal 4-chan user.

    --
    Contrary to popular belief, life is not a bitch. It is far far worse.
    1. Re:Once this thing hits Encyclopedia Dramatica... by game+kid · · Score: 1

      But you repeat yourself.

      --
      You can hold down the "B" button for continuous firing.
    2. Re:Once this thing hits Encyclopedia Dramatica... by MooUK · · Score: 1

      Same thing.

    3. Re:Once this thing hits Encyclopedia Dramatica... by MrBandersnatch · · Score: 1

      You're giving 4chan users credit for a lot of maturity there....

    4. Re:Once this thing hits Encyclopedia Dramatica... by Anonymous Coward · · Score: 0

      But you repeat yourself

  12. Obligatory by Palpatine_li · · Score: 1

    ...should we start welcoming the Mailman (as in True Names)?

  13. I think AI needs a 3d imagination to know English by CrazyJim1 · · Score: 2, Interesting

    Once a computer understands 3d objects with English names, it can then have an imagination to know how these objects interact with each other. Of course writing imagination space that simulates real life is exceedingly difficult and I don't see anyone doing it for several years if not a decade just to start.

  14. Test it by Jorl17 · · Score: 0

    Show it only Porn-alike text. Let's see what it learns...

    --
    Have you heard about SoylentNews?
  15. while (1) by Lije+Baley · · Score: 2, Funny

    Yeah, I've coded an infinite loop a few times, how come I never made the headlines on Slashdot?

    --
    Strange things are afoot at the Circle-K.
    1. Re:while (1) by Velodra · · Score: 1

      The point is not that the program never stops running, but that it never stops learning.

    2. Re:while (1) by Lije+Baley · · Score: 1

      Sorry, but I really can't be bothered to read past the highlighted words in the first sentence of the summary.

      --
      Strange things are afoot at the Circle-K.
    3. Re:while (1) by Anonymous Coward · · Score: 0

      We'd like to invite you to be a moderator.

  16. Pruning by NonSequor · · Score: 2, Interesting

    In general I find that the quality of a data set tends to be determined by the number (and quality) of man hours that go into maintaining it. Every database accumulates spurious entries and if they aren't removed the data loses its integrity.

    I'm very skeptical of the idea that this thing is going to keep taking input forever and accumulate a usable data set unless an army of student labor is press-ganged to prune it.

    --
    My only political goal is to see to it that no political party achieves its goals.
    1. Re:Pruning by gbutler69 · · Score: 1

      Yes, this is both what is wrong with most people as well as society in general. Most people have too many erroneous data points burned into their brains for them to be able to have anything approaching a useful thought.

      --
      Over-the-top Response Guy! Giving "Over-the-Top Responses" since 1970.
    2. Re:Pruning by mhelander · · Score: 1

      But it is potentially much easier for a computer to identify and address conflicting data points than for a human who, for some reason, seems susceptible to blinding themselves to such issues (cognitive dissonance).

      When you have three data points, one claiming George Washington was a human, another claiming George Washington had 50 arms and a third claiming it is highly unusual for humans to have more than two arms (and more than ten arms would be unheard of), the computer could easily detect the logical conflict, flag data points as inconsistent and have a good idea for a topic about which to research more facts, potentially to establish sophisticated probabilities as to which claim is more likely to be bogus than the other.

      This example might not provoke cognitive dissonance for many humans, rather it was intended as an easy-to-follow example of how a computer can improve its understanding of the world even in the face of disinformation, using logic and probability as guiding tools. Once that is easy to see, it follows how this also applies in situations where humans might be more susceptible to cognitive dissonance.
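
      A bare-bones sketch of the George Washington check in Python, just to show the shape of it. The data layout (fact triples, a type table, per-category limits) is something I invented for the example, and the probability step is left out entirely.

```python
def find_conflicts(facts, type_of, limits):
    """Flag (entity, attribute, value) facts that exceed a category limit."""
    conflicts = []
    for entity, attribute, value in facts:
        category = type_of.get(entity)
        limit = limits.get((category, attribute))
        # Only flag facts for which some category-level limit is known.
        if limit is not None and value > limit:
            conflicts.append((entity, attribute, value, limit))
    return conflicts

facts = [("George Washington", "arms", 50), ("Thomas Jefferson", "arms", 2)]
type_of = {"George Washington": "human", "Thomas Jefferson": "human"}
limits = {("human", "arms"): 2}
flagged = find_conflicts(facts, type_of, limits)
```

      Here `flagged` holds only the 50-arms claim, which is the cue to go looking for more evidence about that topic.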

  17. The web: What a great source of information by mustafap · · Score: 1

    >Rather, its progress in categorizing complex word relationships is the object of the research.

    From the web? Half the people here are writing English as a second language; the rest, haven't finished learning the language, or cannot be bother to string a sentence together. Just what is this program going to learn?

    --
    Open Source Drum Kit, LPLC deve board - mjhdesigns.com
    1. Re:The web: What a great source of information by LifesABeach · · Score: 1

      My thought would be, "which web sites have continuous valid information streams". Given this, the program would more easily be able to classify those sites that are predominately useful, and those sites that rarely have useful information. Both groups of sites would be evaluated, but now a "Priority List" could be created. Who knows, maybe a crack-pot web site may have an intriguing correlation with reality. It might even make for a good movie story line, maybe. But if that same web site has an unusual accuracy at prediction, then maybe some commerce could be generated from it? One can only dream about what could have happened if Albert Einstein had been discovered earlier.

    2. Re:The web: What a great source of information by Anonymous Coward · · Score: 0
    3. Re:The web: What a great source of information by Anonymous Coward · · Score: 0

      "can't be bothered" rather than "can't be bother"

      There should be a word for this type of error. You've illustrated your point through your own mistake.

    4. Re:The web: What a great source of information by mustafap · · Score: 1

      No, just wondering if anyone would notice, so well done.

      --
      Open Source Drum Kit, LPLC deve board - mjhdesigns.com
    5. Re:The web: What a great source of information by Anonymous Coward · · Score: 0

      I think everyone noticed, the question was only if anyone could be bothered to point it out. You did prove your own point, so well done yourself.

    6. Re:The web: What a great source of information by u38cg · · Score: 1

      Children routinely learn perfect English with a complete generative grammar from corrupt sources. Indeed, if you put children in an environment where nobody speaks a complete language, they will spontaneously evolve a grammatically complete language. So it is possible (though I'm not saying it will be easy...)

      --
      [FUCK BETA]
  18. V*yger 2.0 ? by LifesABeach · · Score: 2, Interesting

    The concept is intriguing, "Create a program that learns all there is to know, off the net." What amazes me is that others don't try the same thing. It doesn't take a team of A.I. types from Stanford to kick start this program. The cost is a Netbook, even Nigerian Princes could afford this. I'm trying to figure out how economic competitors could take advantage of this. I can see how the U.S.P.T.O. could use this to help evaluate prior art, and common usage. I'm thinking that an interface to a "Real World Simulator" would be the next step toward usefulness.

    1. Re:V*yger 2.0 ? by phantomfive · · Score: 1

      Try it! Build your own AI.

      --
      Qxe4
  19. already been done by phantomfive · · Score: 4, Informative

    There is simply no existing database to tell computers that "cups" are kinds of "dishware" and that "calculators" are types of "electronics." NELL could create a massive database like this, which would be extremely valuable to other AI researchers.

    This is what they are trying to do, based on information they glean from the internet. It's already been done, with Cyc. The major difference seems to be that Cyc was built by hand, and cost a lot more. It will be interesting to see if this experiment results in a higher or lower quality database.

    Also, I question their assertion that it would be extremely valuable to other AI researchers. Cyc has been around for a while now, and nothing really exciting has come of it. I'm not sure why this would be any different.

    --
    Qxe4
    1. Re:already been done by Pennidren · · Score: 1

      I'm not sure why this would be any different.

      Yeah, the connections my brain developed over time in its own unique manner as I learned is exactly the same as a bunch of books put into bits. Not any different at all.

    2. Re:already been done by blee37 · · Score: 2, Informative

      Cyc is a controversial project in the AI community, and I'm glad that you brought it up. I don't think anyone yet knows how to use a database of commonsense facts, which is what Cyc is (though limited - the open source version only has a few hundred thousand facts) and which is one thing NELL could create. However, researchers continue to think about ways that an AI could use knowledge of the real world. There are numerous publications based on Cyc: http://www.opencyc.org/cyc/technology/pubs.

    3. Re:already been done by phantomfive · · Score: 4, Informative

      Oh this comment is beautiful for its confident ignorance.

      What you have done is identified a difference between the two systems, and then claimed that this difference is in some way significant. You do this without knowing the implications of the difference, without entirely understanding the difference, and without presenting any evidence that this particular difference matters at all. In short, you think you understand what matters, but in reality you don't.

      But fear not, you are in good company with your ignorance: this particularly pernicious fallacy is one that has plagued AI researchers for a long time. It happened with cyc: the founders were sure that if we just had a database big enough, it would result in intelligent machines. They didn't know how, but they were sure it would.

      Before them there were expert systems, neural networks (long story), natural language translation, and many more that I'm sure I'm forgetting. In all of these cases researchers were certain that their system held the key to vast wonders, only because they had not spent much time thinking about what they were actually trying to accomplish. In most of these cases it would have been obvious that human-level intelligence wasn't going to result, if they had spent more time investigating how the brain works and less time chasing their pet solution.

      In general if there is a vast field of ignorance between your method and your desired result, then you should probably spend more time researching, finding data points in that field of ignorance before trying to get to your result. Or in your case, since you present no evidence what difference 'developing on the internet' will make compared to 'developing by hand', you should go do a little searching and figure out what the actual difference will be, instead of randomly guessing.

      But since you are lazy and probably didn't read the article, I will give you one hint: this database populated from the internet seems to have a strong bias towards information about companies and sports teams. Who would have guessed that?

      --
      Qxe4
    4. Re:already been done by Pennidren · · Score: 1

      But since you are lazy and probably didn't read the article

      Oh, do we know each other? I did read TFA. And I read your Cyc link, my brainy and learned superior!

      My kids assimilate their own information base. I do not directly inject it into their heads. If you do not see the difference then there is nothing more I can say. Why should I vomit a wall of text in an attempt to deride and intimidate you?

      It seems to me that what you said is directly contrary to what you have issue with:

      ...since you present no evidence what difference 'developing on the internet' will make compared to 'developing by hand', you should go do a little searching and figure out what the actual difference will be, instead of randomly guessing

      (Re)searching to figure out what the actual difference *might* be is exactly what CMU is doing here.

      I am humble enough to admit that this particular project may not amount to anything. My point was that the two projects are distinct (at least based on the claims of those involved).
      I would posit that a machine that determines its own storage structure would be more successful. I would guess that CMU's extracted data is being jammed into something designed by the team.

    5. Re:already been done by phantomfive · · Score: 1

      My kids assimilate their own information base. I do not directly inject it into their heads.

      You are right, they do assimilate their own information base. This is a very useful observation and data point, and any true strong AI will have to do so. However, it is not possible to infer that because your kids assimilate their own information base, anything assimilating its own information base is superior to anything that doesn't.

      In this case, it still remains to be seen whether the automated information assimilation techniques this group is using (and let's face it: the information assimilation methods your kids use are far superior) are an improvement over the manual entry techniques used by Cyc. In either case, we end up with a roughly similar database of relations between objects. The main difference will be one is more complete than the other. Will this difference be enough to make the database more useful? Possibly: I don't see how, but it's possible.

      If it does somehow spawn AI, it will be because someone discovers a new way to use the information in the database, not because of their improved method of data entry. We are still lacking a good way to use a database like this.

      --
      Qxe4
    6. Re:already been done by Pennidren · · Score: 1

      Good point, information from an external source may actually be superior to information assimilated for oneself.
      I am confused though, because you say "any true strong AI will have to [assimilate an information base for itself]" but then say that does not necessarily imply superiority. You mean superiority of the entity I would guess?
      So are you implying that something beyond (better than) what we conceive of as AI might be gained from an external information base? Or just that even a strong AI may never be able to intuit some levels of knowledge (due to limitations) and an external information base could fill in such gaps by providing guaranteed assumptions? Either way, that makes sense. If you meant something else I would be interested in what that would be!

      In terms of the entity being able to actually use/understand externally provided information productively, I would think that it would have to do some level of assimilation (even if at a level on top of the initial external information injection). Perhaps not, though. I suppose instead of growing a knowledge base around one's thought process one might be able to grow one's thought process around a knowledge base. I think we both agree that the latter seems less likely to succeed? Or at the very least I would think the latter's abilities would plateau since its process would be so dependent upon the initial knowledge base that it would probably assimilate new environmental data poorly.

      Yes, I agree that the information compiled by hand or extracted is meaningless for AI. Fortunately, according to your link, Cyc is moving towards doing something with their data "via machine learning" (whatever specific methods they are using). I would imagine CMU will do much the same although I did not see mention of such in the TFA or its related publications.

      I was snarky and terse with my first response. I may be regrettably mostly ignorant in terms of AI (although I am comp sci I did not pursue the field for a number of reasons) but the subject is an active interest of mine.
      My response was due in part to the very common negativity of this community, especially the complaint "it's been done already". Too much like "Simpsons did it". I suppose I just contributed my own negative manure to the compost heap today...!

      Most ideas have merit, even if they are derivative or have large overlaps with others. In fact, I would say that I hold with Thom Yorke: "You make your little pond but if your pond isn't connected to the river, which isn't connected to an ocean, it's just going to dry up. It's just a little piss pool." Such an eloquent way to describe the Venn diagram of society.

    7. Re:already been done by Anonymous Coward · · Score: 0

      As for the databases, you may want to check out dbpedia.org. An interesting well-funded project about using and combining such data would be found on larkc.eu.

    8. Re:already been done by ralphdaugherty · · Score: 1

      well I will add to the compost heap today. When I read the headline, I thought that it may be a more fundamental learning of use and relationship of words and what they describe than what TFA describes. Colleges are in a university is a "trusted relationship"? How very ignorant and disappointing, as every AI project I've ever read about is.

      What would be impressive is to form associations as in a list of universities including Carnegie Mellon, or a statement that Carnegie Mellon is a university, then in other text that Carnegie Mellon consists of seven colleges to draw an association of university made up of colleges. A human review of such associations could then add a "trusted" attribute, or multiple statements associating university "made up" or "consists of" or other similar phrases with colleges, students, faculty, and a host of other associations would numerically become probable with multiple instances encountered for a self scoring "probable relationship".

      But hand holding "trusted relationships" for the researchers' personal domain is pathetic.

        rd

    9. Re:already been done by phantomfive · · Score: 1

      Build your own AI.

      --
      Qxe4
    10. Re:already been done by phantomfive · · Score: 1

      The thing about this project is, I think if you asked them they would say that they are not trying to create a human-like intelligence. Certainly they would not say that their data collection method is intelligent (it uses simple grammar parsing techniques, along with Bayesian filtering). It is essentially weak AI. They may have hopes that it will become strong AI, but no idea of how to take it to that point.

      The biggest problem I see with Cyc, and this project, is that it is not yet known how the human brain stores information. Cyc spent millions of dollars compiling all the information in the world (figuratively speaking), and yet it isn't even clear if the data was stored in a way that an artificial brain can use. I think it is more important to try to understand how the human brain stores and assimilates information, before creating a database like this.

      That is, of course, if they want to create strong AI. If all they want to do is create an advanced Wolfram Alpha, maybe this will be helpful.

      --
      Qxe4
    11. Re:already been done by Anonymous Coward · · Score: 0

      But since you are lazy and probably didn't read the article,

      I was lazy and didn't read your whole comment, because you started out as a complete asshole. Nice to know that we still have elitist nerd rage here on Slashdot.

  20. It's a hoax like Forum 2000 by Anonymous Coward · · Score: 0

    It's just another CMU hoax like Forum 2000. Read End of an Era: Forum 2000 Closes for details.

    Greetings to Corey Kosak, Andrej Bauer and the Forum 2000 students for all the laughs.

  21. The web may have been a poor choice by Anonymous Coward · · Score: 0

    So far most of the words it's learned are related to various sex acts.

    1. Re:The web may have been a poor choice by LifesABeach · · Score: 1

      I cannot help but wonder what Fetish a computer would have, and what would be the name of it?

    2. Re:The web may have been a poor choice by Anonymous Coward · · Score: 0

      I'd imagine it'll get stuck on newegg.com, and manufacturer websites, for soft-core, at first. Eventually it will dig into whitepapers and A+ Cert prep tests for hardcore smut.

    3. Re:The web may have been a poor choice by clintp · · Score: 1

      And more importantly, whether Rule 34 applies to computer-targeted porn.

      --
      Get off my lawn.
  22. On December 11, 2012... by Anonymous Coward · · Score: 0

    On December 11, 2012, NELL encounters MySpace.

    On December 12, 2012 it becomes sentient but very emo, and destroys the world.

  23. 42? by JWSmythe · · Score: 1

    How come every time I ask Nell what the answer is to life, all it responds with is "42". When I ask what 42 means, it tells me that I'll need a bigger computer.

    --
    Serious? Seriousness is well above my pay grade.
  24. It all boils down to three words. by Anonymous Coward · · Score: 0

    KILL. ALL. HUMANS.

  25. The quality of the teachers is important by Anonymous Coward · · Score: 2, Funny

    I guess bucket didn't get any choice where to go to school either.

  26. Wikipedia by the+person+standing · · Score: 2, Funny

    Let it read wikipedia - not get it poisoned by twitter etc!

  27. ODG by Anonymous Coward · · Score: 0

    Oh dear god, this thing will be the ULTIMATE grammar Nazi!!!!

  28. Re:I think AI needs a 3d imagination to know Engli by Extremus · · Score: 1

    Similar things have been done in the past. However, this kind of approach still is an active research topic.

  29. Re:I think AI needs a 3d imagination to know Engli by Extremus · · Score: 2

    Sorry for replying to myself. I forgot to finish my comment. In fact, this problem is related to the Symbol Grounding Problem. It addresses the issue of "grounding" symbols (like words) into their sensory representation, e.g., the symbol "triangle" into the raw pixel representation of a triangle. In the case of symbols about visual objects, some researchers used an intermediary 3d abstraction of sensory data, mapping the symbols to these intermediary representations. It has been a hot research topic since the '80s.

  30. Re:I think AI needs a 3d imagination to know Engli by Anonymous Coward · · Score: 0

    It addresses the issue of "grounding" symbols (like words) into their sensory representation, e.g., the symbol "triangle" into the raw pixel representation of a triangle.

    You're not really justified in calling it "the ... representation of a triangle". It isn't unique. An upside-down triangle is still a triangle. A blue triangle is still a triangle.

    This gets messy fast, since you're really mapping words into equivalence classes of representations. But then, they really aren't equivalence classes. In particular, they aren't disjoint. Is a blue triangle going to live in the equivalence class for blue? Or for triangles? It can't be in both, but it is.

  31. Is there an IRC chat bot? by antdude · · Score: 1

    Is there one for IRC? :)

    Are there any good chat bots for IRC? I tried Seeborg (based on Alice), but it sucked. :( I wished rbot could do AI chatter.

    --
    Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
    1. Re:Is there an IRC chat bot? by Draykwing · · Score: 1

      Bucket is an IRC chatbot. It hangs out in the official XKCD channel.

    2. Re:Is there an IRC chat bot? by antdude · · Score: 1

      Where can I get a copy? Cleverbot author told me it is not available for download and not free. :(

      --
      Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
    3. Re:Is there an IRC chat bot? by jellyfrog · · Score: 2, Informative

      Bucket of #xkcd is on github: http://github.com/zigdon/xkcd-Bucket

    4. Re:Is there an IRC chat bot? by antdude · · Score: 1

      Thanks. Stupid newbie question: How do I install this for my Debian/Linux box to connect to an IRC chatroom? I don't see the instructions/howto. :(

      --
      Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
    5. Re:Is there an IRC chat bot? by Draykwing · · Score: 2, Informative

      Well, Bucket's based on the (rather widespread) 'infobot' Perl program. The original infobot is hosted at http://sourceforge.net/projects/infobot/, but the XKCD variant of Bucket has a very detailed page showing the various interactions one can have with it, as well as a link to the Github page. See http://wiki.xkcd.com/irc/Bucket.

    6. Re:Is there an IRC chat bot? by Crash24 · · Score: 1

      http://samy.pl/mvsbot.pl A markov chain-based unintelligent chatterbot. I run a modified version of the above bot on certain popular networks. With automated replies to anyone who PMs it (simple enough to do, the linked bot is public only), it's awesome bait with a feminine nick and an automated "18/f/cali" reply to "ASL?" Make sure you "teach" it a bit first before unleashing upon the deranged masses.
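
      If you just want to see how the Markov chain part works, here is a stripped-down sketch in Python. It is not the linked Perl bot and has no IRC code in it; the function names and the first-order (one word of context) chain are my own simplifications.

```python
import random
from collections import defaultdict

def train(chain, line):
    """Record which word follows which in one line of chat."""
    words = line.split()
    for current, following in zip(words, words[1:]):
        chain[current].append(following)

def babble(chain, start, max_words=12, rng=random):
    """Walk the chain from a start word, picking a random successor each step."""
    words = [start]
    # Stop when the length limit is hit or the last word has no known successor.
    while len(words) < max_words and chain.get(words[-1]):
        words.append(rng.choice(chain[words[-1]]))
    return " ".join(words)

# Maps each word to the list of words seen right after it (repeats kept,
# so frequent successors are picked more often).
chain = defaultdict(list)
```

      Feed every line the bot hears through `train(chain, line)`, then reply with `babble(chain, some_word_from_the_message)`. That is also why the output reads like word salad: each word depends only on the one before it.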

    7. Re:Is there an IRC chat bot? by antdude · · Score: 1

      Hmmm, is it just me, or is there nothing about its chat AI feature? I saw infobot years ago, but don't remember it doing anything like chat AI. I currently use Rbot (http://ruby-rbot.org/) as an infobot, to host games (e.g., UNO, hangman, guess a word), etc.

      --
      Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
    8. Re:Is there an IRC chat bot? by antdude · · Score: 1

      Your script is pretty old, from 2005. :( Got any examples of it running?

      --
      Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
    9. Re:Is there an IRC chat bot? by Crash24 · · Score: 1

      [im_alone]: i want u

      [Joanie]: who speaks to wait for yourself

      [im_alone]: :)

      [Joanie]: erik stop saying lol

      [im_alone]: want to love u

      [Joanie]: antelope

      [im_alone]: comeon

      [Joanie]: i dont rule europe

      [im_alone]: what u saying?

      [Joanie]: i can swim for fighting t he are welcome

      [im_alone]: lolz

      [Joanie]: habeeb it!

      [im_alone]: cute

      [Joanie]: teach my pants back today

      [im_alone]: i want to see u nude

      [Joanie]: ofcourse

      [im_alone]: can i?

      [Joanie]: if you like extracting it if you can breed a german bourgeoisie, it, at my tv on, for the sale of the pressure was offered to hear how can breed a room and more, its a long wished-for opportunity was wondering if i have fun

    10. Re:Is there an IRC chat bot? by antdude · · Score: 1

      Ick. Bad conversations. :P

      --
      Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
    11. Re:Is there an IRC chat bot? by Crash24 · · Score: 1

      It's not meant to make sense :D

  32. Wait a minute by marqs · · Score: 1

    This is what I think happened.

    Developers: We have a problem with the application. There seems to be an infinite loop that prevents it from finishing.
    Marketing: So, that's the program's main feature, is it not?

  33. If only they could train it without the web by ClosedSource · · Score: 1

    Perhaps if there were a book in electronic form that had all English words in it perhaps with a definition of each word.

    1. Re:If only they could train it without the web by aXis100 · · Score: 1

      Good luck. Notice how words in a dictionary are described by..... other words!

  34. AI - Ignorance and overblown expectations... again by Yogs · · Score: 1

    I will say, I'm disappointed by the comments I've seen here on slashdot.

    Best comment came from an anonymous coward about the pining for an "emergent" type system, the fact that we're not wired that way, and that while more power gives some more in the way of degrees of freedom, it doesn't mean that everything can be analyzed together... you have to have some way of focusing (and a pretty darn good one to prevent unimaginable problem blowup).

    Bootstrapping works well when confined to a fixed arena with observable and unambiguous criteria for selection of behaviors or incorporating a piece of knowledge and observable and unambiguous criteria for judging the success thereof. That is to say, a tight focus and goal directed behavior. Without these and a tight feedback loop, the resulting system tends to disappoint.

    Having as your scope reading the web to gain an understanding of the world is, um... just a bit outside that template for success. While the big talk may be a prerequisite for grant interest, I doubt the researchers have nearly as many illusions as the average slashdot reader. I hope their work goes well, and I hope some of their techniques for extracting information from the web prove useful. That said, it looked like their initial target was classification only. Not trivial, but a very small part of the puzzle of intelligence to say the least, especially when you consider the fact that the classifications this thing will suck in will reflect mostly the sort of classifications that we don't take for granted.

    And here I'll start reflecting my bias. I am a former #$HumanCyclist (I did an internship about 10 years ago), because even though I am in some ways disappointed, I do think that the fact that they're actually building something (and along the way have been solving problems with it) and have been for a lot of years means that there's a lot to learn from them.

    Among the things the Cyc project has shown, is exactly how important these sorts of unstated classifications turn out to be in the problem of doing even the most mundane things right. But there's no point dwelling on that, because even assuming you have some impossibly large beautiful graph reflecting a really solid and well thought out classification of everything, from every angle (hahaha), you're nowhere.

    Facts are fuel... the engine is the rules. Reading those from free text is a very, very dicey proposition, both because the parsing is infinitely harder, and because much more so than facts, they're largely unstated and in terms of our own learning, inferred from examples. You can set up probability matrixes or the like, but only if you know what you're evaluating for (how would you program "curiosity"?). Even if you do get those matrices, reasoning with them directly is pretty much impracticable, so you have to have to make some arbitrary decisions about when you're confident enough to say you "know" something. This is just really, really hard knowledge to get in any automated fashion.

    Finally, for both facts and rules, the consequences of incorporating a poorly considered one can be quite dire, and there's no practiceable way (as the amount of knowledge grows) to know whether it's consistent with what is considered true to that point.

    Getting even more slippery, there is no one context or frame to consider everything in. This goes equally well for facts and rules. You could try and split hairs and say that given enough antecedents, your facts and rules are solid. However, as any kind of remotely practical matter, you need a way of accumulating and organizing these antecedents, and that's true from both from an technical (engine execution), and practical (reasoning and learning ease) perspective.

    Oh, and as a minor matter, languages are difficult enough from a syntactic dimension, and the semantics of it (in order to understand a statement, you have to understand the ones prior, the context or framing that may have switched, the built up assumptions that maybe can be discarded, maybe not, etc.) make for a truly fantastically difficult problem.

  35. Convergence by Metasquares · · Score: 1

    Eventually, at least the learning component will converge; returns will diminish for feeding it more data. This is particularly true given the independence assumption inherent in their classifier (but would also hold on stronger learners). I suspect that this will happen to the reader component as well. If it were as simple as applying Naive Bayes to classify on a corpus of text connected to a knowledge base (which is probably just a set of posteriors left from previous training sessions), Cyc would have already passed the Turing test.
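
    For reference, a minimal sketch of what a Naive Bayes text classifier does, with the independence assumption marked in the code. This is a generic textbook version with add-one smoothing, written as an assumption about the kind of learner being discussed; it is not the CMU system's actual code, and the training sentences and labels are made up.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Bag-of-words Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # documents seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()

    def train(self, words, label):
        self.class_counts[label] += 1
        for w in words:
            self.word_counts[label][w] += 1
            self.vocab.add(w)

    def log_posteriors(self, words):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total_docs)     # log prior
            total_words = sum(self.word_counts[label].values())
            for w in words:
                # Independence assumption: each word contributes its own
                # factor, regardless of which other words appear with it.
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return scores

    def classify(self, words):
        scores = self.log_posteriors(words)
        return max(scores, key=scores.get)


nb = NaiveBayes()
nb.train("steelers won the game".split(), "sports")
nb.train("penguins lost the match".split(), "sports")
nb.train("acme acquires widgetco".split(), "merger")
nb.train("bank completes merger with rival".split(), "merger")
print(nb.classify("steelers lost the game".split()))
```

    Once the per-class word frequencies stabilize, more text of the same kind barely moves the estimates, which is one way to picture the diminishing returns.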

  36. Supervised learning, maybe by Animats · · Score: 1

    The article has too much hype, but the actual work has some potential. For the limited problem they're really addressing, extracting certain data about sports teams and corporate mergers, this approach might work.

    Both of those areas have the property that you can get structured data feeds on the subject. Bloomberg will sell you access to databases which report mergers in a machine-processable way; some stock analysis programs need that data. Sports statistics are, of course, available online. So the program's extraction of that info from news stories intended for humans can be checked. This allows supervised learning. The program can tell what it got right and what it got wrong.
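
    A rough sketch of that check, assuming both the extractor's output and the structured feed can be reduced to comparable tuples. The tuple layout, the function name and the company names are hypothetical, not anything Bloomberg or the CMU project defines.

```python
def score_extractions(extracted, reference):
    """Compare extracted facts against a trusted structured feed."""
    extracted = set(extracted)
    reference = set(reference)
    correct = extracted & reference
    return {
        "correct": correct,                  # extracted and confirmed by the feed
        "wrong": extracted - reference,      # extracted but not in the feed
        "missed": reference - extracted,     # in the feed but never extracted
        "precision": len(correct) / len(extracted) if extracted else 0.0,
        "recall": len(correct) / len(reference) if reference else 0.0,
    }


# Hypothetical facts read out of news stories by the program.
from_news = [
    ("AcmeCorp", "acquired", "WidgetCo"),
    ("FooBank", "acquired", "BarTrust"),
    ("AcmeCorp", "acquired", "FooBank"),
]
# Hypothetical records from the machine-readable feed.
from_feed = [
    ("AcmeCorp", "acquired", "WidgetCo"),
    ("FooBank", "acquired", "BarTrust"),
    ("BazInc", "acquired", "QuxLtd"),
    ("QuxLtd", "acquired", "AcmeCorp"),
]

report = score_extractions(from_news, from_feed)
print(report["wrong"], report["missed"])
```

    The "wrong" and "missed" sets are the training signal: they are the labeled mistakes a supervised learner needs.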

    When they can distinguish between a merger that's being talked about, one which entered negotiations but was not completed, one which went for DoJ approval and was rejected, and one which was completed, they'll have something. Until then, they probably won't outperform "'merger' NEAR 'companyname'" queries.
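
    For comparison, the NEAR baseline is about this much code. A minimal sketch, assuming NEAR means "both single-word terms occur within a few tokens of each other"; the window size and the example sentences are made up, and real search engines define NEAR in their own ways.

```python
import re


def near(text, term_a, term_b, window=5):
    """Return True if two single-word terms occur within `window` tokens."""
    tokens = re.findall(r"\w+", text.lower())
    pos_a = [i for i, t in enumerate(tokens) if t == term_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b.lower()]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


# Both sentences match, although only the first describes a completed merger.
print(near("Acme completed its merger with WidgetCo", "merger", "acme"))
print(near("Acme confirmed that the merger is off", "merger", "acme"))
```

    The query fires on both sentences, which is exactly the distinction (talked about, abandoned, rejected, completed) it cannot make.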

  37. It's not that the program couldn't stop running; t by LS · · Score: 1

    .... program that never dies. It runs continuously ..... It's not that the program couldn't stop running; the idea is that there's no fixed end-point

    Wow, I didn't even think that was physically possible! Maybe Google should borrow this tech for their web crawlers. Must be a pain to restart them every day...

    --
    There is a fine line between being a cultivated citizen and being someone else's crop. - A. J. Patrick Liszkie
  38. The Probable Outcome ... by foobsr · · Score: 1

    ... may be a site resembling http://www.20q.net/ , which started as a never ending story (neural net) as well.

    Quote: "The 20Q was created in 1988 as an experiment in artificial intelligence (AI). The principle is that the player thinks of something and the 20Q artificial intelligence asks a series of questions before guessing what the player is thinking. This artificial intelligence learns on its own with the information relayed back to the players who interact with it, and is not programmed. The player can answer these questions with: Yes, No, Unknown, or Sometimes. The experiment is based on the classic word game of Twenty Questions, and on the computer game "Animals," popular in the early 1970s, which used a somewhat simpler method to guess an animal."

    CC.

    --
    TaijiQuan (Huang, 5 loosenings)
  39. McDonalds by Anonymous Coward · · Score: 0

    Is this technology available for the employees at the local McDonalds?

  40. Anonymous coward by Anonymous Coward · · Score: 0

    garbage in - garbage out

  41. Local optima by xcut · · Score: 1

    And how will they determine if this gets stuck in some local optimum for certain concepts, and thus stops learning anything relevant at all about any one given concept or topic? The report is low on details and high on hype. There are no current algorithms that don't require heavy parameter tuning and constant monitoring to get right. Switching one on for a few years and hoping does not strike me as an exciting story.

    1. Re:Local optima by joh · · Score: 1

      And how will they determine if this gets stuck in some local optimum for certain concepts, and thus stops learning anything relevant at all about any one given concept or topic?

      The report is low on details and high on hype. There are no current algorithms that don't require heavy parameter tuning and constant monitoring to get right. Switching one on for a few years and hoping does not strike me as an exciting story.

      I'm pretty sure you didn't become what you are by your parents just switching you on and hoping for a few years... I'm quite certain that there was a bit of heavy parameter tuning and constant monitoring required, too.

      And believe me, most kids unlucky enough to miss this part also get stuck in a local optimum.

  42. Kevin... by antdude · · Score: 1

    I tried to e-mail Kevin, the author, at lenzo@cs.cmu.edu, but got it returned:

            SMTP error from remote mail server after RCPT TO::
            host MX-LB-03.SRV.cs.cmu.edu [128.2.217.14]: 550 5.1.1 ... address not contained in directory, you cannot relay :(

    --
    Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
  43. Ghost? by Anonymous Coward · · Score: 0

    When will this thing build a ghost?

  44. They're on the right track by joh · · Score: 1

    When I first read about Cyc I immediately thought that this is the way to go. And this was before the WWW took off. While I don't think that knowing about the world is all that's needed for AI, I think that without knowing about the world you can't have any AI or at least none you'd recognize.

    Intelligence (as we know it) is mostly about interacting with and understanding your environment, and making some environment accessible to something remotely intelligent is a good start. Every living being is just a point in space and time, relating to everything around it and still being different from its surroundings, trying to survive and to understand what's going on.

    I have no doubt that any real AI will be born with and out of all the networked information we're collecting like crazy. Or it may never be born, of course. AI is hard.

  45. Re:AI - Ignorance and overblown expectations... ag by joh · · Score: 1

    Oh, and as a minor matter, languages are difficult enough from a syntactic dimension, and the semantics of it (in order to understand a statement, you have to understand the ones prior, the context or framing that may have switched, the built up assumptions that maybe can be discarded, maybe not, etc...) make for a truly fantastically difficult problem.

    And still, every newborn human masters all of this without having the faintest explicit knowledge about any of this, and still learns it within a few years. Is an AI meant to be like a newborn baby (which is in no way intelligent) or like an adult? Most (or all) people become intelligent without knowing how intelligence works or what it is. It's just that everything that doesn't work gets discarded very soon. You start to imitate and to try out what works and what gets results and what doesn't.

    Perhaps we just need some evolution in code, code trying to understand and survive in the world of data. Have them fight and eat each other and have the fittest survive.

  46. The Infancy of P.1 by jtgd · · Score: 1

    ...just add virus to make him mobile.

    --
    J
    1. Re:The Infancy of P.1 by LifesABeach · · Score: 1

      I did, a while ago, and I'm really sorry for doing it. How was I to know that Mortgage Brokers would use it to "game" the Economy and turn Banks into Equity Centers for Hedge Funds? Go figure.