The State of Natural Language Programming
gManZboy writes "Brad Myers (and co) of the Human-Computer Interaction Institute at Carnegie Mellon have written an interesting paper about the state of natural language programming. They point out that well-understood HCI principles aren't finding their way into relatively new languages like Java and C#."
Inevitably you end up with an artificially rigid language structure that sounds like something that nobody would EVER say. It's perfectly easy to read; after all, who wouldn't understand what "ADD VAR1 TO VAR2 GIVING VARX" means? But who the hell would use the word "giving" in such a way? It's a nightmare to learn or write, at least for English-speaking people, who would have to constantly fight years of learning to speak real English to make up for the fake English in the language.
If I have been able to see further than others, it is because I bought a pair of binoculars.
Is Macromedia's ColdFusion syntax. As it continues to become less tied to HTML it will be interesting to see where this goes.
But natural language requires more typing than say C syntax.
A EQUALS B
A = B
But does the thought process get sped up? If so, one needs to know how the gains and losses affect overall development.
I disagree with the article's assumption that interesting programming errors are due to people being unable to express themselves "naturally" in code. Rather, I find that almost all errors worthy of debugging come from not understanding the problem domain correctly.
jeff
On that site, there's http://www.alice.org/whatIsAlice.htm which says
So, this is just like Visual Basic. I know that can't be true, or else Microsoft would be marketing VB as NLP. So what am I missing?
Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!
Cobol, anyone?
...
Multiply x by y to get something or the other
An interesting read.
One of the big problems this approach will have to overcome (in my opinion) is that people generally tend to order their thoughts in a manner specific to their native language. A development environment that seems intuitive and easy to use to a native English speaker might be backwards or obtuse to a person who natively speaks another language. To clarify; I'm not speaking strictly of grammatical structure of language, but of a seemingly inherent difference in the way people learn things based on what language is used in the teaching. For this reason it has always seemed better to me for programmers to learn a new, common language (that of the higher-level compiler they are interested in) so that when they work with others, everyone is on the same page (similar to scientists and doctors using Latin nomenclature).
I'd imagine that a "natural language" system could be developed with different approaches based on the native tongue of the programmer, but I would think this would damage the benefits of commonality that other languages now enjoy.
That's about as far as I got. I guess he didn't really express his ideas in the same way that I wanted to think about them.
Which nicely illustrates the point that there's always a "semantic gap" associated with natural languages, which builds up because people have different ways of thinking. The semantic gap is even wider when one of the entities being communicated to happens to be a machine. There's a reason why traditional programming languages are precise and exact...it's so that the gap is reduced - the machine will do exactly what you tell it to do...even then we have a disconnect between what the programmer's thinking, and the code that he's writing.
An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
Natural language isn't precise enough for serious programming. I personally wouldn't enjoy typing so much for no added benefit. It seems like this sort of thing only has value amongst people who are learning to program. Why would a mainstream language like Java or C# cater to this bunch?
One thing that programming languages force upon you (the programmer) is the ability to get what you want using the least possible resources.
Natural language, while easier for beginners, would make for horribly inefficient code and would be undesirable for any sizeable application.
"Ask not what your country can do for you." --John F. Kennedy
IMO it's nothing more than a better way to introduce *newbies* into programming.
Why would any programmer want to code in English? To me this:
myvar++
makes more sense than:
increase the variable myvar by one please
Do we really want people who can't understand something as simple as "myvar++" to be programming in the first place? Seems to me we NEED a barrier to entry. There're enough lousy programmers out there already.
It isn't that there aren't any languages that follow these principles coming out; lots of them are. It's just that the only languages that have become popular ignore these principles.
The fact is that people don't care what's academically sound, or what people have "proven" is the best way to do things. In fact, the things people do care about are directly contradictory with what's academically "best". It isn't some kind of head-slapping coincidence that the new popular languages ignore "natural programming". It's the market speaking, and it's saying "we don't want natural programming languages".
Well, I'm not sure if it's that nobody read the article, or if nobody actually understood it, but.
:-) (And no, I'm not using Englishy COBOL syntax.)
We've had a lot of posts about "OH NO! COBOL!" Yes, yes, I agree with you -- pretending to be English usually results in awkward and unnatural syntaxes. One of the advantages of a formal syntax like most programming languages is that it clicks the brain into a different mode. (How many of you can read sigs like 2b||~2b? I thought so.)
But that's not really the paper's main aim. It makes a couple of notes that all of us, particularly those of us in language design, could benefit from.
1. People tend to deal with collections in the aggregate far more often than they step through them an item at a time. The example given was "set the nectar of all the flowers to 0." Look past the syntax for a moment and look at how simple that is.
2. Debugging the traditional way sucks. Did anyone actually read that bit at the end about the 'Why?' questions, and look at the screenshots? Holy crap. That's really impressive.
Of course, I may be biased, because the points made in the article are basically the same that underlie a language I'm currently designing.
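The aggregate point in item 1 can be sketched in Python. This is only an illustration under assumptions: the `Flower` class, the `set_all` helper, and the sample data are hypothetical names invented here, and only the sentence "set the nectar of all the flowers to 0" comes from the paper.

```python
from dataclasses import dataclass


@dataclass
class Flower:
    # Hypothetical record; only the "nectar" idea comes from the example sentence.
    name: str
    nectar: int


def reset_one_at_a_time(flowers):
    """Item-at-a-time style: the caller manages the index and the stepping."""
    i = 0
    while i < len(flowers):
        flowers[i].nectar = 0
        i += 1


def set_all(items, attribute, value):
    """Aggregate style: set one attribute on every item in a single call."""
    for item in items:
        setattr(item, attribute, value)


def make_flowers():
    # Made-up sample data for the sketch.
    return [Flower("rose", 5), Flower("tulip", 3), Flower("daisy", 8)]


looped = make_flowers()
reset_one_at_a_time(looped)

aggregated = make_flowers()
# Reads close to "set the nectar of all the flowers to 0".
set_all(aggregated, "nectar", 0)
```

Both versions end in the same state; the difference is how much bookkeeping the person writing the request has to spell out.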
The real problem is a lack of strong domain models for most real world situations. That is, if you're starting a project to emulate something happening outside of a computer, then there's a very large likelihood that you're going to have to build your own object model to describe the situation to the desired level of accuracy. Once you have that model, it's easy enough to say "do this until that happens", but there's a world of difference between that point and staring at a blank screen at the beginning of a project.
There's been some progress (depending on who you ask) to make this easier for those who aren't full-time programmers, such as UML and related design tools, but even these are mainly limited to building a high-level template of the final result so that a human can manually implement all of the details.
This may or may not be avoidable. Vernor Vinge (author and CompSci professor) refers to the "Age of Failed Dreams" where humans eventually concede that some things just aren't possible. Expecting a current deterministic Turing device to be programmable at the level where people interact with each other may very likely be one of those areas.
Dewey, what part of this looks like authorities should be involved?
Right now that happens - only the program gets generated by programmers (sometimes outsourced to India!)
Unfortunately, what the user says they want, and what they really want are usually very different things. Natural Language Programming really doesn't solve that problem.
The critical piece is the Designer, who sits between the end user and the programmer, and asks the tough questions: "Do you really want that? Let me explain the implications of what you just asked for." "How critical is that piece of functionality that you just added on a whim, but it just added 3 years to the project plan?" "You're asking for the data to be selected this way, but really there's no use for that - have you considered selecting the data this other way?" etc.
> The authors state that syntax in programming languages is too complex. I would argue that the syntax of a programming language needs to be more complex than the syntax of a natural language.
I think you really mean the opposite of what you said. The syntax of natural language is bogglingly complex. You can express the syntax of even Perl with a few kilobytes of EBNF. Noam Chomsky tried to come up with formal syntax rules for spoken languages and utterly failed (though his work is what led to BNF and company).
It is a matter of habit and training. I am used to thinking in terms of objects, so any object-oriented language is "natural language" for me. When I solve a problem I think of objects, methods, properties and how they work together. I don't have to translate from some abstract "natural" concepts to OO concepts. I am sure someone who is using Lisp will see lists and functions in the same problem where I see objects and methods.
I understand that the goal is to have the user just tell the computer what to do in English. The problem is that English is not precise and is too ambiguous. I don't know if I would want to fly on an airplane if I knew the computer on board was programmed in English.
Why does all research like this seem to revolve around "toy" problems? They study non-programmers or, when they include real programmers, focus only on small tasks that can be completed in an hour or so.
Great, I accept that a new language can make toy problems easier.
However, I think the situation is very different when you have a real programmer working on a real program. Writing a real application, like a word processor or a web browser, is difficult no matter what language you do it in -- and I would argue that the difficulty doesn't vary much between languages. In fact, I would further argue that many of these research languages, while making toy problems easier, would actually make "real" programming substantially harder, because the semantics of the language are not as formalized and thus more difficult to remember and deal with.
I'm certainly not opposed to advances in language theory and design -- our modern-day large applications would be essentially impossible to write if all we had to work with was machine language. But to be a major advance, a new language should focus on making real problems easier for real programmers, not making toy problems easier for non-programmers.
ZFS: because love is never having to say fsck
There are two main features of AppleScript: 1) the English-like syntax and 2) the ability to control other applications. Of the two, the second is by far the more important, but together they create a new programming experience. Because most of the complexity is sequestered in the applications you are controlling, AppleScript code tends to be very short. On average I would say my AppleScripts are about 30 lines, with only 5 - 10 lines of working code; the rest is error catching and handling. Because of the syntax it is very easy to read the code months or years later. Having short code also helps. BUT the most important thing is that because the code is short YOU CAN START AGAIN. How much code is kludged because no one wants to rip out and recode 1000 or more lines of code? There is a real benefit in short code that you can read.
a language that leaves all the verbs for the end of the sentence? A language that likes the modifiers to follow rather than precede their nouns?...My point is, you have one translation problem in going from high level language to machine language and another going from "natural" language to high level language. But a third problem is finding a culture-neutral natural language OR solving the natural language translation problem...and you have seen how atrocious Babelfish results can be...we just aren't there yet, folks! The ambiguity that must be dodged in going from normal human speech to a computer program hides in different places depending on the language, especially on which words have multiple meanings. And inflection? What are you going to do? Program with emoticons?
I know that natural language is creeping into UIs in specialized search engines. If you know where to look, you will find natural language search features on Fidelity.com and perhaps other financial websites. These are much more carefully bounded problems than the broad challenge of allowing a user to express a solution or algorithm for an arbitrary problem a computer could be programmed to do in, say, C, but using ordinary speech. The article cited is interesting, and it might make life better for us programmers, but I am not getting my hopes up that more than incremental change to computer languages is around the corner.
SLASHDOT: news for people who can't concentrate on work or have no life at all and got tired of yelling back at the TV.
Programming is also something that is easier to express in a specialised language. Sure, we can make some things more human readable, but does that make it easier to understand? The hard part of programming isn't reading/writing the code so much as knowing what structures and concepts to use. Making programming more natural-language-like will not really make programming easier; you still need skill and practice. Using the music analogy again: I don't play music and can't read a music score (the language of music). If Beethoven's fifth (if he ever had a fifth) was rewritten in a natural language it would not make it easier for me to play; I'd still need a whole lot of practice with a piano or whatever to play effectively. Relative to acquiring the piano skills, I expect learning to read sheet music would be relatively simple.
Where natural languages might help is in system design and requirement capture. Still, however, I think that most often things go wrong because when people are expressing their thoughts in a natural language they use very woolly thinking and use vague terms.
Engineering is the art of compromise.
Where's my compilable flowchart? They're more universally understandable across human languages/cultures, including geek/wonk/artist/customer/PHB, than text. They can be intermediate-compiled to text procedures for lexical parsing techniques. And they're much easier to design, program, debug, maintain and document, especially for parallel/distributed/networked applications. They're natural language without speech. Where's my gcc flowchart preprocessor?
--
make install -not war
... make a good programming language. Mathematics has "been there and done that" with natural language versus a formal language. Why reinvent the mistakes of the past?
It is very difficult to write a context-free programming language, let alone a natural one! When we speak, everything is meant relative to the current context. There is no way that a mathematical abstraction can be made out of that, unless really powerful computers can try every possible production at the same time (thinking about quantum computers).
We humans don't even talk logically at times (logically in the mathematical sense). We say one thing, we mean another one. One of the most difficult things for new students is to get used to the strictly mathematical nature of computer languages. Computer thinking requires every bit to have its special meaning in the universe. Most people choke on that. The most capable programmers are those that can hold a mental model of the application, its various parts and as a whole. These types of people can translate requirements to code very efficiently, because they can reason about a program's state better since they remember the whole program and they can immediately recognize the consequences of any programming decision.
And when one becomes familiar enough with the way the computer works, then the verbosity really gets in the way.
What we need is a development environment that can reason about the state of the program. That's the root of all problems. Embedding state information in a program is something I haven't seen in any language. Most languages, if not all, work on the assumption that anything can happen at any time, and they don't have state constraints, thus allowing the programmer to make mistakes that could be caught at compile time.
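One way to approximate the "state constraints" idea is to give each state its own type, so that an operation which is invalid in a given state simply does not exist on that type. The Python sketch below is an assumption-laden illustration, not a feature of any language discussed here: `ClosedFile` and `OpenFile` are hypothetical classes that touch no real files, and Python itself only rejects the bad call at run time, although a static checker such as mypy can flag it before the program runs.

```python
class ClosedFile:
    """Hypothetical handle in the 'closed' state: it can only be opened."""

    def __init__(self, path: str) -> None:
        self.path = path

    def open(self) -> "OpenFile":
        # The state change is expressed as a change of type.
        return OpenFile(self.path)


class OpenFile:
    """Hypothetical handle in the 'open' state: it can be read or closed."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.lines_read = 0

    def read_line(self) -> str:
        # Simulated read; no real I/O happens in this sketch.
        self.lines_read += 1
        return f"line {self.lines_read} of {self.path}"

    def close(self) -> ClosedFile:
        return ClosedFile(self.path)


handle = ClosedFile("notes.txt")
opened = handle.open()
first = opened.read_line()
closed_again = opened.close()

# closed_again.read_line() would be an error: ClosedFile has no such method,
# so the mistake "read from a closed file" cannot be written against this type.
```

The design choice is that the set of legal operations is carried by the type rather than by a runtime flag, which is what lets a tool reason about state without running the program.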
The world will always need people who understand that asking for the last digit of Pi isn't a worthwhile request.
"Computer, sort this list of names, then beat me at chess without moving your queen, then formulate a method of reversing entropy." "Computer, tell me a joke."
If natural language aims at letting users tell the computer what to do in the terms they think about their tasks, the computer needs to be aware/intelligent to understand the requests. Otherwise there's always going to be a manual describing what you can and can't ask and how/how not to ask it. And people won't read manuals, they'll write programs that don't work.
And then, you and I will *finally* get programming jobs. :)
"A witty saying proves nothing." ~Voltaire
"d'Oh!" ~Homer
This is as natural as it gets.
Of course, what this is really doing is:
So unless there's a you method in package love, this will cause a runtime error. The following would be a little more consistent with the other examples, but less like English:
...and if you want this to run with the strict pragma in effect, you'll have to quote the string "perl", or use a scalar variable $perl.
In a way, the languages of mathematics and music are natural languages. No one sat down one day and enumerated all of the rules for mathematical expressions; the notation evolved to suit the needs of mathematicians and has retained the flexibility that results from such evolution, much like "social" languages.
It's hard for programming languages to "evolve" in the same sense since they aren't "for humans, by humans", but we do try new language designs and find that some work better than others.
Some of the more "dynamic" languages go some way to enabling this kind of evolution. If I try to use an unusual construct in a mathematical expression, I'd probably follow it with a statement in English or mathematics explaining the meaning. If it was a useful construct, others will adopt it and slowly the explanation will become unnecessary. Likewise, in some languages we can define new constructs (within certain boundaries, of course) and tell the compiler what is meant by them in simpler terms, usually by writing some kind of function. Over time, popular constructs will be adopted as core features in newer languages. One example that springs to mind is the foreach construct, which does vary from language to language but arose because it was very common to want to visit each element in a list in turn and perform some operation on it. Modern languages have become a lot more expressive so this kind of evolution will probably become more common.
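As a rough sketch of that evolution, here is a foreach-style construct first defined by hand as an ordinary function and then written with the built-in form that Python already provides. The names `for_each`, `names`, and the greeting strings are made up for this illustration.

```python
def for_each(items, action):
    """User-defined 'construct': visit each element in turn and apply action."""
    index = 0
    while index < len(items):
        action(items[index])
        index += 1


names = ["Ada", "Grace", "Alan"]

# Using the hand-written construct, explained to the language as a function.
greetings = []
for_each(names, lambda name: greetings.append(f"Hello, {name}"))

# Using the built-in loop that expresses the same idea as a core feature.
built_in = []
for name in names:
    built_in.append(f"Hello, {name}")
```

Both lists end up with the same contents; the built-in form is the version where the explanation has become unnecessary.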