Open Source Grammar Checkers?
DaveBarr asks: "Maybe I'm more sensitive to this than most, but after continuing to see "it's" instead of "its" and "loose" instead of "lose" everywhere in the media and on web sites of supposedly
reputable origin, I began to wonder. Are there any Open Source
projects trying to develop a reliable grammar checker -- one
that would catch these common foibles? Are all these algorithms
proprietary? Are there any University research projects which
could be used as a basis for even a halfway-decent grammar checker?"
I don't know what the status of the software is now, but there was once a company called Reference Software which made a really good grammar checker called Grammatik. Unfortunately, they were bought by WordPerfect, which was subsequently bought by Corel. I think that Grammatik may have been integrated into WordPerfect. Since the guys at Corel are so big into Linux right now, maybe they would consider open-sourcing Grammatik?
Not everyone understands Perl?!
O.k...in the universal language of vi...
s/\([Ii]t\)'s/\1 is/g
What I read is that encoding meaning is a remarkably difficult task. The Fifth Generation Project in Japan was said to be really about making it easy to enter Japanese text into computers; however, it apparently turned out that cultural context was simply too difficult to define, and that goal couldn't be reached. There's an ambitious project in Texas (?) that's trying to encode meaning, as I remember, and they've found out it's a huge task.
s/([iI]t)'s/$1 its/g ?
NB
THIS IS A
S
M
A
R
G
L
E
ANNOUNCEMENT!
The smargle race is superior!
All furry creatures - bunnies, hamsters, sheep, muppets, kitties and fuzzy smarglegoblins -
realize this, for they are allies of the smargle.
Do not attempt to challenge the smargle, for then you will surely gain a smargle right in the *bleep*.
You may now look to your right.
Plekt you!
- smargle frep
I'm always amazed when a Slashdot article
is posted *without* any grammar mistakes.
I've often wondered what would happen if
the "preview" function for submitting an
article included something like
s/([iI]t)'s/$1 is/g
Can we force geeks to recognize "it's"
for what it is through technology?
--kyler
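For readers who don't speak Perl or vi, here is a minimal Python sketch of the same substitution. The function name `expand_its` is made up for illustration; `re.sub` replaces every match by default, which corresponds to the trailing `g` in the Perl version.

```python
import re

def expand_its(text):
    """Replace every "it's" / "It's" with "it is" / "It is".

    Same pattern as the Perl one-liner; \\1 is the captured "it" or "It".
    """
    return re.sub(r"([iI]t)'s", r"\1 is", text)

print(expand_its("It's late and it's raining."))
# The blunt rewrite also mangles cases it should leave alone:
print(expand_its("It's been a while."))   # "it's" here means "it has"
print(expand_its("Kit's hat"))            # no word boundary in the pattern
```

The last two calls show why this is only a nudge and not a grammar checker: the pattern cannot tell "it has" from "it is", and without a word boundary it also fires inside other words.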
Hmmm, I wonder if Dr. Bruce has any thoughts on designing an Open Source grammar checker? He probably could offer a lot of guidance to any group who wanted to start such a product.
Building upon spelling checker code, a fairly small dictionary could provide all the data needed to identify most homophones. At the user's choice, each homophone could be flagged with alternate spellings shown in a dialog box, with really-concise meanings for each. The user would select the intended meaning.
So far, this idea seems to have generated little interest, but it would help create fewer ridiculous bodies of text.
Far more ambitious would be a lexical analyzer that would try to deduce whether a given homophone seemed appropriate for the meanings of the words (a bottomless pit?) in the surrounding text. (Bloatware, anyone?)
Nicholas Bodley // nbodley@world.std.com
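A rough Python sketch of the homophone-flagging idea above, assuming a tiny hand-written dictionary. The word list, the meanings, and the names `HOMOPHONES` and `flag_homophones` are all invented for illustration; a real tool would hook into an existing spelling checker and show the alternatives in a dialog box.

```python
import re

# Hypothetical mini-dictionary: each set maps a spelling to a concise meaning.
HOMOPHONES = [
    {"its": "belonging to it", "it's": "it is / it has"},
    {"lose": "fail to keep or win", "loose": "not tight"},
    {"their": "belonging to them", "there": "in that place", "they're": "they are"},
]

# Index every spelling to the set it belongs to.
LOOKUP = {word: group for group in HOMOPHONES for word in group}

# A word is a run of letters, optionally followed by an apostrophe part.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def flag_homophones(text):
    """Return (offset, word, alternatives) for every homophone in text."""
    flags = []
    for match in WORD_RE.finditer(text):
        word = match.group(0)
        group = LOOKUP.get(word.lower())
        if group is not None:
            flags.append((match.start(), word, dict(group)))
    return flags

for offset, word, choices in flag_homophones("It's going to loose its way."):
    print(offset, word, choices)
```

Note that this flags every homophone, right or wrong, and leaves the decision to the user, which is exactly the modest version of the idea; deciding automatically is the "far more ambitious" part.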
It's true that most AI programs have a necessarily limited semantic model, based on a few logic predicates and deductive rules. Logic itself is a philosophical construct, derived from observations of how people reason and solve problems, but it's not really a model of how the brain works, and efforts to get computers to assemble their own sets of rules and facts have been largely unsuccessful.
"When people try to get computers to learn, the people do and the computers don't" - Alan Perlis
---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger
WordPerfect for Linux comes with a grammar checker, Grammatik, licensed from Novell.
Forgive me if I'm wrong here, but won't KOffice come with a grammar checker?
Common sense is a set of prejudices built up over a lifetime
Squad helps dog bite victim (a classic indeed).
Or:
The boy is hungry
The boy is a toad
Or:
The boy carried a sandwich to the playground and ate it. (the playground? Note that pronouns are among the most ambiguous words in the English language.)
It's easy for us to tell how to parse those, but a computer would have to maintain a database of the following:
playground is big
sandwich is small
people normally eat small things
when dogs bite, they harm humans
a noun indicating [a] human[s] (squad) would not harm humans.
One can argue that the purpose of learning is to fill in those pieces of knowledge, but:
1) The amount of knowledge that would have to be stored and recalled is *huge*.
2) Even if we have the storage and recall capacity, computers need to be able to interpret everything and know that, among other things, squad can be a group of people, "normally" may not always apply, etc. etc.
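To make the database idea concrete, here is a toy Python sketch that resolves "it" in the sandwich sentence using only the size facts listed above. The table `SIZE` and the function `resolve_eaten` are hypothetical, and the rule "people normally eat small things" is hard-coded, which is precisely the kind of shortcut that does not scale.

```python
# Hypothetical fact table, taken from the list above.
SIZE = {"playground": "big", "sandwich": "small"}

def resolve_eaten(candidates, size=SIZE):
    """Pick the one candidate that fits "people normally eat small things".

    Returns None when the facts do not single out exactly one noun,
    for example when a noun is missing from the table.
    """
    plausible = [noun for noun in candidates if size.get(noun) == "small"]
    return plausible[0] if len(plausible) == 1 else None

print(resolve_eaten(["sandwich", "playground"]))
print(resolve_eaten(["kite", "playground"]))  # "kite" is unknown to the table
```

The second call fails simply because nobody entered a fact about kites, which illustrates point 1: every noun, and every exception to "normally", has to be stored somewhere.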
void recursion (void)
{
recursion();
}
while(1) printf ("infinite loop");
if (true) printf ("Stupid sig quote");
Friends don't let friends misuse the subjunctive.
I'm very interested in the project of writing such a beast. I have been interested in natural language processing for years. I'm also a C coder (under *nix). Anyone interested please email me at joshr@netspace.net.au
:wq ~ ~ ~ ~ ~
"s/([iI]t)'s/$1 is/" is (ugh) Perl for "substitute `it is' or `It is' for every instance of `it's' and `It's'" I don't know why people expect everyone on /. to understand Perl. I only use it for fixing other ppl's broken Perl code.
The problem with parsing English lies in the nature of English itself. English was not designed to be parsed. It was not designed with a logical structure that has been consistently implemented.
The question is what you mean by a grammar checker. If you simply mean a program that reads text and tries to find obvious errors, you do not need to be able to parse English completely. To extend the example from above, you do not need to know exactly what "The cow is brown" means, only whether the tenses agree. That program would just need to be able to recognize certain patterns as wrong. That is not impossible.
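A minimal Python sketch of that pattern-recognition approach, with no parsing at all. The three rules and the names `RULES` and `check` are invented for illustration and would miss most real errors; the point is only that each rule is a surface pattern, not an understanding of the sentence.

```python
import re

# Hypothetical rules: (pattern for a suspicious word sequence, message).
RULES = [
    (re.compile(r"\b(?:to|will|would|might|can't|don't)\s+loose\b", re.I),
     'possible "loose" for "lose"'),
    (re.compile(r"\bit's\s+own\b", re.I),
     'possible "it\'s" for "its"'),
    (re.compile(r"\b(?:cows|dogs|boys)\s+is\b", re.I),
     "plural subject with singular verb"),
]

def check(text):
    """Return (offset, matched text, message) for every rule that fires."""
    hits = []
    for pattern, message in RULES:
        for match in pattern.finditer(text):
            hits.append((match.start(), match.group(0), message))
    return sorted(hits)

for hit in check("You will loose it's own data. The cows is brown."):
    print(hit)
print(check("The cow is brown."))  # nothing to report
```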
The other side of it, a program that actually understands what you are writing and figures out the best way to communicate it, is much more complex. It would be a very cool program if it could be completed. Besides, what better than OSS to harness the immense mindshare that would require?
That being said, my grammar is so horrible I would love to see either one working as soon as possible.
Nate Custer
"The poet presents his thoughts festively, on the carriage of rhythm; usually because they could not walk" Nietzsche
There is a program called diction.
:)
This is a GNU program still in development. It's available at:
this link
I've played with diction and it's not bad, not great but not bad.
Frankly, I'm surprised that I haven't seen a program that understands a spoken human language. The rules are codified in millions of textbooks and semantics should be parsable from WordNet, the OED or various other sources. And there are plenty of 'M-x doctor'-like programs that try to emulate conversation; and some of them, like megahal, can 'learn' well enough to fool some people.
I've even played with coding a C library that reads like English without proper writing mechanics. A natural language interpreter shouldn't be too hard, though it would be time consuming and would probably not produce a substantial return on investment to a financial sponsor.
I am inclined to think that the problem is ideological. There are so many disagreements among philosophers, linguists, and computer scientists as to the meaning of 'The cows are brown.' that unless one person is sufficiently savvy of all three and some other disciplines, no consensus or plan will ever be implemented.