A Useful Grammar Checker?
burtdub asks: "With the amount of raw text data available, there seems to be no shortage of ambitious language projects on the horizon, from Universal Language Translators to Junk Email Filtering. However, the mess that is the English language still seems to elude commercial attempts while being relatively ignored by the open source community. What would it take to make a useful, functional grammar checker?"
What would it take to make a useful, functional grammar checker?
How about a competently taught high school English class?
Seriously, people...learn to use the language...you'll be better off.
____
~ |rip/\/\aster /\/\onkey
Grammar can often only be determined by context, especially in English, where the rules of grammar change so much. Until a computer can for itself understand context, no grammar checker can be successful (or even marginally useful). Thus, my answer to your question is two words: "Artificial Intelligence." Artificial stupidity can also be used to simulate bad English.
My Systems
How about a dictionary and classes in English, like those given in schools? That should be all that is needed.
One of the concepts that most people should realize is that the main success (and downfall) of the English language is that it can mutate quite easily.
Remember... English is the bastard child of Celtic, Latin, and various Germanic languages. Language also affects the way we think and is the key limiting factor in grasping concepts.
If your language cannot express a certain concept then you need a way to bend the rules (which English has a bad habit of doing) so that you can share that idea with others.
To enforce a view or a proper method of speaking will often stagnate a society's ability to assimilate new ideas or methods. George Orwell pointed this out when he came up with the idea of Newspeak, in which society is restrained from unwanted ideas by removing its ability to even discuss them.
We obviously do not speak Elizabethan English or the Old English of the early Middle Ages. Should our descendants be forced to speak an archaic language 200 years from now because we demanded that our software set in stone the proper way to express ideas and communicate?
Man, this sounds a bit hippy-esque, but hopefully you understand what I mean.
Still, there should be some ground rules for what proper English is and should be, so we can understand each other without going "Huh?", but it shouldn't be a hard-line stance that is unchangeable for the next 50 years.
"I am the king of the Romans, and am superior to rules of grammar!"
-Sigismund, Holy Roman Emperor (1368-1437)
...saying "Just learn the grammar correctly in the first place", here's a question: can you really see no use in a computerised tool to help you learn correct grammatical usage?
It's like someone coming on asking about natural media painting apps being told "Just go to art school and learn how to use REAL paint, you lazy bastard!" - you're missing the point entirely. A grammar checker would be useful even for people with a decent grasp of grammar, as a double-check. Like spell checking, do you get it yet?
Game dev and music blog
All those different forms and the nearly syntax-free sentence structure are precisely why it is easier to parse Latin than English.
Grammar checking is a thousand-fold more complicated than most people realize. English's hoary syntax, which pretty much boils down to "8 million exceptions in search of a rule", doesn't parse easily into computer code.
But I, too, would be interested in seeing this field develop - because it has the side effect of making bot AI better! Now, a voice-activated console that understood commands in plain, sloppy English would be worth striving for. Grammar-checking in a word-processor usually just provokes me: "How *dare* you red-line this sentence; I'm quoting *Shakespeare*, you illiterate rock!"
But we'll have perfect machine-generated grammar before we've reached the level of innovation required to put a spell-checker on the comment box on Slashdot!
Precisely. GPP said 140 different forms as if that would be a large number for a computer.
I prayed about it, and God said, "Don't do it!" But I thought, "I know better."
What makes English such a pain in the backside is that the language has been so utterly simplified over the millennia that we have lots of words with identical spellings, but different parts of speech. This makes the word order critical.
Technically, word order isn't critical in English. I can say "Campus green and tow'ring trees" and you understand I'm talking about a green campus. This was actually common usage in the not-so-distant past.
The problem, though, is that words have become overloaded and/or multiple words combined to a single term. For example, the green lantern is probably something you carry around to provide light when the power goes out. The Lantern Green is probably a place where they play cricket.
We're seeing this happening with things like "it's vs. its" and "their vs. they're vs. there" in some people's usage as well. Every time the spelling distinction between words breaks down, it becomes significantly more difficult for anything short of a person to get meaning out of a sentence. That's why there are so many spelling/grammar nazis on Slashdot. If we don't, in a matter of just a few years, we'll get to the point where nobody can understand anything.
There is another theory which states that this has already happened.
Check out my sci-fi/humor trilogy at PatriotsBooks.
Grammar is, to a large extent, nothing more than a post hoc description of the expressive customs which have arisen amongst a particular community of speakers. Consider this: all living languages are in constant flux. Given that, when a particular alteration occurs, do you really think the collective Spirit of Grammar first makes a check for internal consistency? Language is imitative. People say things because they hear other people say them. Other than by attempting to influence the reader toward or away from a particular construction, grammar manuals can never be anything but historical documents.
What makes English such a pain in the backside is that the language has been so utterly simplified over the millennia
No, it hasn't been simplified. At least, you won't find any linguist or student of Old or Middle English who'll claim that it has simplified as opposed to changed. Presumably you'll back up this outlandish statement with, say, a detailed analysis of the history of the case system in English from the Norman conquest onwards?
that we have lots of words with identical spellings, but different parts of speech.
Yeah, just like every other language. Do you have any data suggesting that English is unusual in this respect?
This makes the word order critical.
Word order isn't critical because of homographs; it's critical because the rules of English grammar are strict about word order. From a more practical point of view, it's critical because English is too poorly inflected for a parser to work out the structure of a sentence without reference to the order of the words. In any case, there's nothing particularly difficult about parsing languages with strict word order rules, or parsing languages with homographs and homophones, or parsing languages with both.
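To make the word-order point concrete, here is a toy sketch (Python, not anything a real parser does verbatim). It assumes the simplest possible rule for an English noun phrase: drop a leading determiner, treat the last word as the head noun, and treat everything before it as a modifier. The function name `parse_noun_phrase` and the tiny determiner list are made up for illustration.

```python
# Hypothetical toy rule: in "DET modifier* head", position alone assigns roles.
DETERMINERS = {"the", "a", "an"}

def parse_noun_phrase(phrase):
    """Split a simple noun phrase into head and modifiers by position only."""
    words = [w.lower() for w in phrase.split()]
    # Skip a leading determiner if there is one.
    if words and words[0] in DETERMINERS:
        words = words[1:]
    if not words:
        raise ValueError("empty noun phrase")
    # Last word is the head; anything before it modifies it.
    return {"head": words[-1], "modifiers": words[:-1]}

print(parse_noun_phrase("the green lantern"))
print(parse_noun_phrase("the Lantern Green"))
```

The same two words come out with opposite roles depending only on their order, with no dictionary lookup to say whether "green" is a colour or a patch of grass. A real grammar has far more rules than this, but the principle is the same.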
Every time the spelling distinction between words breaks down, it becomes significantly more difficult for anything short of a person to get meaning out of a sentence.
Not really. The problem of people writing "their" instead of "they're" is absolutely trivial compared to the staggeringly difficult task of accurately parsing natural language, or machine translation, or any other NLP problem of similar complexity. For God's sake, just list "their" as a synonym for "they're" in your parser and it will figure out which meaning was intended from the grammatical structure (there are few, if any, syntactic contexts in which more than one of "there", "their" or "they're" is correct).
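As a rough illustration of the "treat them as one word and let the context decide" idea, here is a minimal Python sketch. It is not a parser: it assumes a made-up mini-lexicon and looks only at the single word that follows. All the names here (`NOUNS`, `BE_FORMS`, `THEYRE_CUES`, `pick_form`, `correct`) are hypothetical, and one word of lookahead is not enough in general ("their old car" vs. "they're old" needs more context).

```python
# Hypothetical mini-lexicon, just large enough for the demo sentences.
NOUNS = {"car", "house", "dog", "books", "idea"}
BE_FORMS = {"is", "are", "was", "were"}
THEYRE_CUES = {"not", "late", "happy", "wrong"}
CONFUSABLES = {"their", "there", "they're"}

def pick_form(next_word):
    """Guess the intended form from the following word, or None if unsure."""
    w = next_word.lower().strip(".,!?")
    if w in BE_FORMS:
        return "there"      # existential: "there is", "there are"
    if w in NOUNS:
        return "their"      # possessive before a noun
    if w.endswith("ing") or w in THEYRE_CUES:
        return "they're"    # "they are" + participle or predicate
    return None

def correct(sentence):
    """Replace each confusable with the form its right-hand context suggests."""
    words = sentence.split()
    out = []
    for i, word in enumerate(words):
        if word.lower() in CONFUSABLES and i + 1 < len(words):
            choice = pick_form(words[i + 1])
            if choice is not None:
                # Keep the capitalization of the original word.
                out.append(choice.capitalize() if word[0].isupper() else choice)
                continue
        out.append(word)  # leave everything else (and unsure cases) alone
    return " ".join(out)

print(correct("Their going to the store"))
print(correct("I saw they're car"))
```

When the following word gives no clue, the sketch leaves the original spelling untouched rather than guessing.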
If we don't, in a matter of just a few years, we'll get to the point where nobody can understand anything.
People have been saying this for hundreds of years.
So, basically, you've taken one of the most difficult areas of AI (NLP) and argued that it's really difficult these days because sometimes people spell "they're" incorrectly. Weird.
Freedom is not increased by mere diminution of government. Anarchy is freedom for the strong and slavery for the weak.