Language Parsing and AI: Where are we now?
C-Town asks: "While browsing through the Slashdot articles to research artificial intelligence, I came across the Artificial Intelligence IRC Bot contest, and it got me thinking. How far have we gotten in AI technology in terms of language/text parsing? How long will it be before Ask Jeeves works so well that it could replace all pattern-matching and neural-network-based searches? Instead of giving me categories when I submit a query, could it return abstracts of the documents on the web, with their respective links? I don't mean hand-entered abstracts, but ones generated by the AI engine when the match is made, describing each article in the context of the search query. I know Autonomy currently has a product out to improve data mining, and they route customer service emails with what they call 'high performance pattern matching algorithms' (neural networks, I think?), but they're still not able to analyze whole documents in a lexical manner. What companies/research institutes are currently working on this? Imagine being able to search through all /. comments with a question like 'Who makes the best Linux laptops?' and get great results. I can imagine this eventually stopping spam, too!"
Nothing against Lojban, but I side with the school of thought that says computers should be changed to interact better with humans, not vice versa. Human language was established thousands of years ago, and electronic computing is roughly 50 years old; improving computer language recognition will cause significantly less social upheaval than training the world to speak Lojban.
Plus, I kinda *like* the fact that language is not logical. A reflection of the beings who speak it, I guess...
I disagree on this point. I'm sure that if you were to go back to Hyde Park and ask people in Cummings or BSLC, they would disagree that parsing takes little mental ability. I think parsing seems easy and math problems seem difficult because our brains have evolved to handle parsing efficiently.
For any given person, not being able to divide a three-digit number by a two-digit number quickly isn't much of a handicap, but not being able to speak and understand speech quickly and easily is a big problem. In our evolutionary past, if you were hunting something, being able to communicate effortlessly would have been more advantageous than being able to solve complex math problems.
"Given an appropriate setting and some help, I could, by following several psycholinguistic methods that I worked on at the University of Chicago and with access to an online thesaurus-based database, get working mechanical translation going within six months. The problem is that I don't do generative linguistics, and all of Noam Chomsky's followers would rather not have a working system than see his theories disproven."
I'm sure that others have tried other methods for translation/parsing, but so far they haven't been very successful. To tell the truth, a generative grammar is probably the best method available in computer science: it has a solid theoretical/mathematical framework, and its problems and benefits are relatively well understood. In any case, I don't believe language semantics are understood well enough, or modeled adequately enough in linguistics, for a mechanical translation system to be possible right now (e.g., idioms and cultural references that cause problems even for professional translators).
"When you sit with a nice girl for two hours, it seems like two minutes. When you sit on a hot stove for two minutes, it
I think you are underestimating how long we have had to evolve parsing. I see parsing as a specialized case of communication. Other animals are able to communicate, and some seem able to use language in a limited capacity (e.g., Koko the gorilla and some other apes). Many mammals communicate in more primitive forms, and these communications are structured. So I would consider our language abilities a specialization of these communication abilities. If you buy this, then we have had quite a while to evolve structures for parsing in some form.
"Precisely what I'm attempting to get at. A generative grammar tells you only all the possibilities (at best) but never any of the probabilities. A prototype grammar, one appearing in the early 80s called the sausage machine, would provide you with the main possibilities and all the probabilities. This would be much more useful in attempting mechanical translation."
Using a stochastic context-free grammar lets you assign probabilities to the production rules in the grammar. More generally, you can assign probabilities to the production rules of a context-sensitive grammar to get a sense of how often a given production is used.
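To make the stochastic-grammar point concrete, here is a minimal Python sketch of a probabilistic context-free grammar. The grammar, its rule probabilities, and the names `PCFG`, `check_normalised` and `tree_probability` are hypothetical, invented for illustration; the one standard assumption is that a parse tree's probability is the product of the probabilities of the rules used to build it.

```python
from math import prod

# Hypothetical toy PCFG: (lhs, rhs) -> probability.
# For each left-hand side, the probabilities of its rules sum to 1.
PCFG = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.7,
    ("NP", ("N",)): 0.3,
    ("VP", ("V", "NP")): 0.6,
    ("VP", ("V",)): 0.4,
    ("Det", ("the",)): 1.0,
    ("N", ("dog",)): 0.5,
    ("N", ("cat",)): 0.5,
    ("V", ("chased",)): 1.0,
}

def check_normalised(grammar):
    """True if every nonterminal's rule probabilities sum to 1."""
    totals = {}
    for (lhs, _), p in grammar.items():
        totals[lhs] = totals.get(lhs, 0.0) + p
    return all(abs(t - 1.0) < 1e-9 for t in totals.values())

def tree_probability(tree, grammar):
    """A tree is (label, child, child, ...); leaves are plain strings.
    Its probability is the product of the probabilities of every rule used."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = grammar[(label, rhs)]
    return p * prod(tree_probability(c, grammar)
                    for c in children if not isinstance(c, str))

example = ("S",
           ("NP", ("Det", "the"), ("N", "dog")),
           ("VP", ("V", "chased"), ("NP", ("Det", "the"), ("N", "cat"))))
print(tree_probability(example, PCFG))
```

Because every parse gets a score, competing analyses of the same words can be ranked, which is the "probabilities, not just possibilities" information the thread is arguing about.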
"And you've shown yourself to have bought Chomsky's line hook, line and sinker by arguing this point. Language is not a mathematical construct; it is a psychological one."
Whether language is a mathematical or a psychological construct does not really matter much when it comes to modeling it mathematically. The model may not work well, but you can still build it. The advantages of using a generative grammar are that it is mathematically well understood and has solid connections with complexity theory and algorithmics. You can get a good idea of how an algorithm based on this model will run and where it will have problems. Other models may not have these advantages.
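As an illustration of the complexity-theory point, the CYK algorithm is a standard recognizer for context-free grammars in Chomsky normal form whose cost is easy to read off: three nested loops over span length, span start, and split point give O(n^3) work in the sentence length n, times a grammar-dependent factor. The sketch below is a minimal Python version with a hypothetical toy grammar (`BINARY`, `LEXICAL`); it only answers yes/no and does not build parse trees.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form.
BINARY = {  # (B, C) -> set of A such that A -> B C
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICAL = {  # word -> set of A such that A -> word
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "chased": {"V"},
}

def cyk_recognise(words, start="S"):
    """Return True if `words` can be derived from `start`."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive words[i..j] inclusive
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i] = set(LEXICAL.get(w, ()))
    for length in range(2, n + 1):           # span length
        for i in range(n - length + 1):       # span start
            j = i + length - 1                # span end
            for k in range(i, j):             # split point
                for b, c in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY.get((b, c), set())
    return start in table[0][n - 1]

print(cyk_recognise("the dog chased the cat".split()))
```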
In addition, using a computer to parse language means that ultimately you need to express the parsing as an algorithmic procedure. Regardless of whether language is a psychological construct or not, you will express it in terms of if/then or case statements. My assertion is that you can convert these statements into a generative grammar with associated probabilities.
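One way to read the claim about if/then statements: log which branch of a hand-written parser fires on some text, write each branch as the production it corresponds to, and estimate rule probabilities by relative frequency, P(A → β) = count(A → β) / count(A). This is the standard maximum-likelihood estimate for a probabilistic grammar; the trace below and the function name are hypothetical.

```python
from collections import Counter

def estimate_rule_probabilities(observed_rules):
    """Relative-frequency (maximum-likelihood) estimate:
    P(A -> beta) = count(A -> beta) / count(A -> anything)."""
    observed_rules = list(observed_rules)    # iterated twice below
    rule_counts = Counter(observed_rules)
    lhs_counts = Counter(lhs for lhs, _ in observed_rules)
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}

# Hypothetical trace: each entry records which branch of a hand-written
# if/then parser fired, written as the equivalent production (lhs, rhs).
trace = [
    ("NP", ("Det", "N")),   # "if the next word is a determiner, read Det then N"
    ("NP", ("Det", "N")),
    ("NP", ("N",)),         # "else read a bare noun"
    ("VP", ("V", "NP")),
    ("VP", ("V",)),
    ("NP", ("Det", "N")),
]
probs = estimate_rule_probabilities(trace)
print(probs)
```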
In any case, I disagree with your position that language is entirely psychological. I think some aspects are purely mathematical. For example, most people are able to identify the syntactic elements in a nonsense sentence such as "The sdfjklds aaadjed fdfjdfj to fdjlfkdj." I would think a psychological model of language would have problems with this, and with the ability of sentences/phrases to be perfectly grammatical while having no meaning or a contradictory meaning.
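A toy Python sketch of the nonsense-sentence point: using only a handful of function words, the "-ed" suffix, and the previous word's tag, it assigns plausible categories to "The sdfjklds aaadjed fdfjdfj to fdjlfkdj." without knowing any of the content words. The lexicon, rules, and tag names are made-up assumptions, not a published tagger.

```python
import re

# Hypothetical closed-class lexicon: function words carry most of the syntax.
CLOSED_CLASS = {"the": "DET", "a": "DET", "to": "PREP", "of": "PREP", "and": "CONJ"}

def guess_tags(sentence):
    """Guess a part of speech for each word using only function words,
    the '-ed' suffix and the previous word's tag (no content vocabulary)."""
    tags = []
    for word in re.findall(r"[A-Za-z]+", sentence):
        w = word.lower()
        if w in CLOSED_CLASS:
            tag = CLOSED_CLASS[w]
        elif w.endswith("ed"):
            tag = "VERB"                       # past-tense morphology
        elif tags and tags[-1][1] in ("DET", "PREP", "VERB"):
            tag = "NOUN"                       # slot after Det/Prep/Verb
        elif tags and tags[-1][1] == "NOUN":
            tag = "VERB"                       # unknown word after a noun
        else:
            tag = "NOUN"                       # fallback guess
        tags.append((word, tag))
    return tags

print(guess_tags("The sdfjklds aaadjed fdfjdfj to fdjlfkdj."))
```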
"By the way, who moderated you up?????"
I get an automatic +1 bonus due to positive moderation in the past. If you receive more than 20-something points of positive moderation, your posts get a +1 bonus unless you explicitly turn it off.
Language processing really isn't that hard. Look at some of the people who do it. That may sound flippant, but it obviously doesn't consume vast portions of normal people's brains. Sure, I slow down my talking when dealing with complex driving problems, but compare that to relatively basic mathematics, which I can't do while driving (I mean anything more complex than long division). Given an appropriate setting and some help, I could, by following several psycholinguistic methods that I worked on at the University of Chicago and with access to an online thesaurus-based database, get working mechanical translation going within six months. The problem is that I don't do generative linguistics, and all of Noam Chomsky's followers would rather not have a working system than see his theories disproven.
So long and thanks for all the fish . . . !!!
One of the main problems with doing natural language processing is that linguistics is about as old as computer science. And while CS has had a pretty firm theoretical foundation since Day One (thank you, Mr. Turing), debate rages on to this day as to how language really works. So not only is the domain quite complex (i.e., you need damn good programmers with a good knowledge of linguistics), no one even completely understands the domain.
That's not gonna stop me from goin' into the field, though...
Jon
I'll do it for cheesy poofs.
1. Language is AI-complete.
There's a common naive assumption that you can analyze a text in isolation and figure out what it means based on the definitions of words and the structures of language. But that forgets about the tremendous amount of knowledge that you bring to the text before you read it.
Forget Littleton, the real problem is 1984.
Think about what you have to know to understand that sentence. It's not just a simple matter of encoding more definitions, either.
The refrigerator slipped and he jerked his foot too late.
You know what happened, but only if you know something about gravity and where feet go, neither of which is mentioned anywhere in the sentence. In fact, it's estimated that three-year-olds know some 50,000 facts about the way the physical world works. Naturally, language is designed for such an environment. Without a complete understanding of intelligence as practiced by people, it's unlikely that you'll have much success writing code to understand human language.
2. There are different kinds of knowledge.
When you talk about knowledge, you often mean "facts." But there are other kinds of knowledge not so easily described. Playing music, for example, involves kinds of knowledge that skilled artists may not be able to describe, although it's obvious that they have it.
3. There are different kinds of understanding.
When you ask about the state of language understanding, realize that there are different kinds of understanding. Answering questions about characters in a story may be a very different task from deciding the writer's grade level, for example. You could probably sort advertisements in any language, even one you can't read, so that kind of understanding must be based on other cues.
4. Languages depend on domain.
It's also important to realize that progress will probably be made in limited domains. Typically, email uses only a few thousand words (~22,000 in my research) and covers a small range of topics. Success in an email-sorting task that recognizes and discards spam can be said to represent some kind of understanding, though it's considerably more limited than a human's reading and understanding of the text.
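Spam filtering is a good example of the limited-domain "understanding" described above. The sketch below uses multinomial naive Bayes with add-one (Laplace) smoothing, a common technique swapped in here for illustration; the poster does not say what method they used, and the class name and the four training messages are invented.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesFilter:
    """Hypothetical spam filter: multinomial naive Bayes, add-one smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def score(self, text, label):
        """Log prior plus the sum of smoothed log word likelihoods."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        log_p = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            count = self.word_counts[label][word]   # 0 for unseen words
            log_p += math.log((count + 1) / (total_words + len(vocab)))
        return log_p

    def classify(self, text):
        return max(("spam", "ham"), key=lambda label: self.score(text, label))

spam_filter = NaiveBayesFilter()
spam_filter.train("buy cheap pills now", "spam")
spam_filter.train("limited offer buy now", "spam")
spam_filter.train("meeting notes for the parser project", "ham")
spam_filter.train("can we move the meeting to friday", "ham")
print(spam_filter.classify("cheap pills offer"))
```

Nothing here models meaning: the filter only counts which words co-occur with which label, which is why it can work in a narrow domain like email while saying nothing about general text understanding.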