Slashdot Mirror


Language Parsing and AI: Where Are We Now?

C-Town asks: "While browsing through Slashdot articles to research artificial intelligence, I came across the Artificial Intelligence IRC Bot contest, and it got me thinking. How far have we gotten in AI technology in terms of language/text parsing? How long will it be before Ask Jeeves works so well that it could replace all pattern-matching and neural-network-based searches? Instead of giving me categories when I submit a query, could it return abstracts of the documents on the web, with their respective links? (I mean abstracts that are not hand-entered but generated by the AI engine when the match is made, describing each article in the context of the search query.) I know Autonomy currently has a product out to improve data mining, and they route customer service emails with what they call "high performance pattern matching algorithms" (I think it's neural networking?), but they're still not able to analyze whole documents in a lexical manner. What companies/research institutes are currently working on this? Imagine being able to search through all /. comments with a question like "Who makes the best Linux laptops?" and get great results. I can imagine this stopping spam eventually too!"

5 of 13 comments

  1. Re:Not really that hard by scheme · · Score: 2
    Language processing really isn't that hard. Look at some of the people who do it. That may sound flippant, but it obviously doesn't consume vast portions of normal people's brains to do it. Sure, I slow down my talking when dealing with complex driving problems, but compare that to relatively basic mathematics, which I can't do while driving (I mean anything more complex than long division).

    I disagree on this point. I'm sure that if you were to go back to Hyde Park and ask people in Cummings or BSLC, they would disagree that parsing takes little mental ability. I think the reason parsing seems easy and math problems seem difficult is that our brains have evolved to handle parsing efficiently.

    For a given person, not being able to divide a 3-digit number by a 2-digit number quickly isn't much of a handicap, but not being able to speak and understand speech quickly and easily is a big problem. In our evolutionary past, if you were hunting something, being able to communicate effortlessly would have been more advantageous than being able to do complex math problems.

    Given an appropriate setting and some help, I could, by following several psycholinguistic methods that I worked on at the University of Chicago and with access to an online thesaurus-based database, get working mechanical translation going within six months. The problem is that I don't do generative linguistics, and all of Noam Chomsky's followers would rather not have a working system than see his theories disproven.

    I'm sure that others have tried using other methods to do translation/parsing, but so far they haven't been very successful. To tell the truth, using a generative grammar is probably the best method available in computer science. It has a solid theoretical/mathematical framework, and its problems/benefits are relatively well understood. In any case, I don't believe that language semantics are well enough understood or adequately modeled in linguistics for a mechanical translation system to be possible right now (e.g. idioms and cultural references, which cause problems even for professional translators).
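
    To make the "solid theoretical/mathematical framework" point concrete, here is a minimal sketch of a CYK recognizer for a context-free grammar in Chomsky normal form. The toy grammar, lexicon and function name are hypothetical illustrations, not anything from a real translation system. The appeal is that the running time is known in advance: O(n^3) in sentence length, times the size of the grammar.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (binary rules only).
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
    ("VP", "PP"): {"VP"},
    ("P", "NP"): {"PP"},
}

# Hypothetical lexicon mapping words to preterminal categories.
LEXICON = {
    "the": {"Det"},
    "dog": {"N"},
    "park": {"N"},
    "ball": {"N"},
    "chased": {"V"},
    "in": {"P"},
}


def cyk_recognize(words, start="S"):
    """Return True if `words` is derivable from `start` under the toy grammar."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the categories that can span words[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i] = set(LEXICON.get(word, set()))
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            j = i + length - 1              # span end
            for k in range(i, j):           # split point
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n - 1]
```

    For example, cyk_recognize("the dog chased the ball in the park".split()) accepts the sentence, while a scrambled word order such as "dog the chased" is rejected.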

    --
    "When you sit with a nice girl for two hours, it seems like two minutes. When you sit on a hot stove for two minutes, it
  2. Re:Not really that hard by scheme · · Score: 2
    Now here, I absolutely disagree with you. Our brains simply have not had enough time to develop efficient language parsing. At best, language parsing is an 'arch'. Some features of phonetics (VOT, i.e. voice onset time, for example) have had the time, but not language parsing in general.

    I think that you are underestimating the amount of time we have had to evolve parsing. I see parsing as a specialized case of communication. Other animals are able to communicate and seem to have the ability to use language in a limited capacity (e.g. Koko the gorilla and some other apes). In more primitive forms, many mammals are able to communicate, and these communications are structured. So I would consider our language abilities a specialization of these communication abilities. If you buy this, then we have had quite a while to evolve structures that deal with parsing in some form.

    Precisely what I'm attempting to get at. A generative grammar tells you only all the possibilities (at best), but never any of the probabilities. A prototype grammar such as the "sausage machine", which appeared in the early 80s, would give you the main possibilities and all the probabilities. This would be much more useful in attempting mechanical translation.

    Using a stochastic context-free grammar lets you assign probabilities to the production rules in the grammar. More generally, you can assign probabilities to the production rules of a context-sensitive grammar to get a sense of how often a given production is used.
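
    A minimal sketch of what those probabilities buy you, assuming an invented toy grammar: in a stochastic CFG each production carries a probability (the probabilities for the same left-hand side sum to 1), and a parse tree's probability is the product of the probabilities of the rules it uses. That lets you rank the two readings of an attachment ambiguity instead of merely listing them. The rule probabilities, tree encoding and names below are hypothetical.

```python
# Hypothetical stochastic CFG: for each left-hand side, the probabilities
# of its alternative right-hand sides sum to 1.
RULE_PROBS = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.7,
    ("NP", ("NP", "PP")): 0.3,
    ("VP", ("V", "NP")): 0.6,
    ("VP", ("VP", "PP")): 0.4,
    ("PP", ("P", "NP")): 1.0,
    ("Det", ("the",)): 1.0,
    ("N", ("dog",)): 0.4,
    ("N", ("ball",)): 0.3,
    ("N", ("park",)): 0.3,
    ("V", ("chased",)): 1.0,
    ("P", ("in",)): 1.0,
}


def tree_probability(tree):
    """Probability of a parse = product of the probabilities of its rules.

    A tree is (label, child, child, ...); a leaf child is a plain string.
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = RULE_PROBS[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            prob *= tree_probability(child)
    return prob


# Two parses of "the dog chased the ball in the park".
VP_ATTACH = (  # the chasing happened in the park
    "S",
    ("NP", ("Det", "the"), ("N", "dog")),
    ("VP",
     ("VP", ("V", "chased"), ("NP", ("Det", "the"), ("N", "ball"))),
     ("PP", ("P", "in"), ("NP", ("Det", "the"), ("N", "park")))),
)
NP_ATTACH = (  # the ball that was in the park
    "S",
    ("NP", ("Det", "the"), ("N", "dog")),
    ("VP",
     ("V", "chased"),
     ("NP",
      ("NP", ("Det", "the"), ("N", "ball")),
      ("PP", ("P", "in"), ("NP", ("Det", "the"), ("N", "park"))))),
)
```

    With these made-up numbers, VP attachment comes out at about 0.00296 versus about 0.00222 for NP attachment, so the "chased in the park" reading wins. Real systems estimate such numbers from data rather than picking them by hand.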

    And you've shown yourself to have bought Chomsky's line hook, line and sinker by arguing this point. Language is not a mathematical construct; it is a psychological one.

    Whether language is a mathematical or a psychological construct does not really matter much when modeling it mathematically. The model may not work well, but you can still do it. The advantages of using a generative grammar are that it is mathematically well understood and has solid connections with complexity theory and algorithmics. You can get a good idea of how an algorithm based on this model will run and where it will have problems. Other models may not have these advantages.

    In addition, using a computer to parse language means that ultimately you need to express the parsing as an algorithmic procedure. Regardless of whether language is a psychological construct or not, you will express it in terms of if/then or case statements. My assertion is that you can convert these statements into a generative grammar with associated probabilities.
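
    One hedged reading of that assertion: treat each branch of a hand-written parser as a production, log which branch fires on some corpus, and normalise the counts per left-hand side. That is ordinary relative-frequency (maximum-likelihood) estimation of rule probabilities; the function name and the branch log are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_rule_probs(observed_rules):
    """Relative-frequency (maximum-likelihood) estimate of rule probabilities.

    `observed_rules` is an iterable of (lhs, rhs) pairs, one per time a rule
    (equivalently, an if/then branch in a hand-written parser) was used.
    Returns {(lhs, rhs): count(lhs -> rhs) / count(lhs)}.
    """
    rule_counts = Counter(observed_rules)
    lhs_totals = defaultdict(int)
    for (lhs, _rhs), count in rule_counts.items():
        lhs_totals[lhs] += count
    return {rule: count / lhs_totals[rule[0]] for rule, count in rule_counts.items()}
```

    For instance, a log in which VP -> V NP fired three times and VP -> VP PP once yields probabilities of 0.75 and 0.25 for those two rules.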

    In any case, I disagree with your position that language is entirely psychological. I think that some aspects are purely mathematical. For example, most people are able to identify the syntactic elements in a nonsense sentence such as "The sdfjklds aaadjed fdfjdfj to fdjlfkdj." I would think a psychological model of language would have problems with this, and with the ability of sentences/phrases to be perfectly grammatical while having no meaning or a contradictory meaning.
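
    The nonsense-sentence point can be illustrated mechanically, under loudly toy assumptions: if the closed-class words "the" and "to" are known, and every other token may be either a noun or a verb, a small grammar leaves only one tagging that parses. The grammar, lexicon and function names are hypothetical, and the sketch deliberately ignores morphological cues such as the "-ed" on "aaadjed" that a human reader would also use.

```python
from itertools import product

# Hypothetical closed-class lexicon: function words are known...
CLOSED_CLASS = {"the": "Det", "to": "P"}
# ...while any other token may belong to one of these open classes.
OPEN_CLASSES = ("N", "V")

BINARY = {
    ("NP", "VP"): "S",
    ("Det", "N"): "NP",
    ("V", "NP"): "VP",
    ("VP", "PP"): "VP",
    ("P", "NP"): "PP",
}
UNARY = {"N": "NP"}  # a bare noun can form a noun phrase


def parses(tags):
    """CYK recognizer over a tag sequence, applying NP -> N to single words."""
    n = len(tags)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tag in enumerate(tags):
        table[i][i] = {tag}
        if tag in UNARY:
            table[i][i].add(UNARY[tag])
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for left, right in product(table[i][k], table[k + 1][j]):
                    if (left, right) in BINARY:
                        table[i][j].add(BINARY[(left, right)])
    return n > 0 and "S" in table[0][n - 1]


def infer_categories(words):
    """Return every open-class tagging of the unknown words that yields a parse."""
    unknown = [i for i, w in enumerate(words) if w.lower() not in CLOSED_CLASS]
    results = []
    for guess in product(OPEN_CLASSES, repeat=len(unknown)):
        tags = [CLOSED_CLASS.get(w.lower()) for w in words]
        for i, tag in zip(unknown, guess):
            tags[i] = tag
        if parses(tags):
            results.append(dict(zip((words[i] for i in unknown), guess)))
    return results
```

    Run on "The sdfjklds aaadjed fdfjdfj to fdjlfkdj" (period dropped), it returns a single tagging: sdfjklds/N, aaadjed/V, fdfjdfj/N, fdjlfkdj/N, which is roughly the Det N V N P N reading most people arrive at.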

    By the way, who moderated you up?????

    I get an automatic +1 bonus due to positive moderation in the past. If you get more than 20-something points of positive moderation, your posts receive a +1 bonus unless you explicitly turn it off.

  3. Breakthrough is missing by lovebyte · · Score: 2
    During the 80s, natural language understanding and automatic translation were very much in vogue. But since no one was able to make anything terribly useful, much of the research stopped and is now at a standstill. What is really missing is a theoretical breakthrough. This is obviously unpredictable, but it might come out of the current progress in genetics: the more we know about the human genome, the more we will understand the human brain. And from there, the world is your oyster. So be patient!

    --

    I'll do it for cheesy poofs.

  4. Re:Depends on your language... by medicthree · · Score: 2

    Non-natural languages such as Lojban are notoriously problematic. There are reasons why natural languages don't have only 850 words, why they aren't composed of musical notes (I'm not kidding: one constructed language was actually meant only to be sung, and its writing system used musical notes), and why adjectives have negatives other than just the original adjective with "un" in front. Apart from their obvious shortcomings stemming from small word sets, and for reasons dealing with our innate language capabilities that I'm sure you don't want me to go into here, no constructed language has ever been, and probably none will ever be, even close to natural languages in terms of our brains' ability to process them. Also, constructed languages are notoriously limited in terms of language change. Languages are, by their very nature, constantly changing, and trying to lock them into a single state is pure stupidity.
    I don't mean this as a flame or anything of the sort; I'm just getting a bit sick of seeing people suggest that man-made languages replace natural languages (or even become lingua francas) so that everything would be "easier." It's just not even a remote possibility, and aside from hobbyists who enjoy learning Esperanto in their spare time (as quite a few of my colleagues do), there's just no useful application for them. Unfortunately, the idea of a worldwide universal language that is man-made is pure fantasy. Worldwide lingua francas based on natural languages are much more realistic, although I hope that day never comes; it would seem quite a boring world to me if everyone spoke the same language. But that's probably because my job would be a lot less interesting then.

  5. Re:Depends on your language... by medicthree · · Score: 2

    Just a quick note: you mention that language is not logical. It's true that some languages don't follow formal logic (e.g., it's perfectly okay to use double negatives in French), but formal logic is different from "logical" in another sense. The sense I'm talking about is being in perfect harmony with the equipment that generates language as well as with the equipment that receives it. Some of the things that popular critics lambast (e.g., the redundancy and strangeness of the English spelling system) are perfectly logical and make perfect sense when you analyze them for what they are, and for their use by the systems that produce and receive them. I won't go into details here, but there is plenty of reason for redundancy and "strangeness" in spelling, etc. Steven Pinker's The Language Instinct is a great introduction to a lot of this stuff, and I highly recommend it.