A Useful Grammar Checker?
burtdub asks: "With the amount of raw text data available, there seems to be no shortage of ambitious language projects on the horizon, from Universal Language Translators to Junk Email Filtering. However, the mess that is the English language still seems to elude commercial attempts while being relatively ignored by the open source community. What would it take to make a useful, functional grammar checker?"
All you need is my 7th grade English teacher staring over your shoulder all day.
That'll get you twisted into shape real good.
The best way to write a useful grammar checker is to write it for a language with a rational syntax.
What would it take to make a useful, functional grammar checker?
How about a competently taught high school English class?
Seriously, people...learn to use the language...you'll be better off.
____
~ |rip/\/\aster /\/\onkey
Remember Linguo? Or am I dating myself? (ew)
Grammar can often only be determined by context, especially in English, where the rules of grammar change so much. Until a computer can for itself understand context, no grammar checker can be successful (or even marginally useful). Thus, my answer to your question is two words: "Artificial Intelligence." Artificial stupidity can also be used to simulate bad English.
My Systems
How about a dictionary and classes in English, like those given in schools? That should be all that is needed.
1 tbsp of crazy
1 ounce of nuts
4 cups of pure genius
1/2 tsp of wit
5 gallons of caffeine*
*Your product of choice.
Ahhh the irony of asking Slashdot how to build a grammar checker!
People are always making these grammar checkers that work "from the inside out": look at the words, surround them with expectations of what words can agree with them grammatically, and flag contradictions. But humans are interactive with language, like everything else we do. Proper speakers and writers of English are good listeners (and readers). When we hear what we've said, we imagine what that would mean to us if it had been said to us. When the words make us think of something different from what we thought before we said them, we correct ourselves. A better grammar checker might work "from the outside in": compose imagery or relationships between recorded objects as represented in the written words, and show implications to the writer, to match against their expectations.
That might be a mightily complex undertaking, akin to a machine "understanding" the words. But it would replicate the feedback we humans already use to keep our grammar correct, and to understand each other. If we aimed that high, we could probably find a less ambitious assistance that's easier to automate, but goes a long way towards helping us express our words to computers, and to each other using computers.
--
make install -not war
I have absolutely no idea what the appropriate requirements for a grammar checking engine would be.
However, I doubt Slashdot would be an appropriate place to seek advice on the subject.
English is a complex and "dirty" language; effective usage can involve breaking the accepted rules.
Where's the Kaboom?
There's supposed to be an Earth-shattering Kaboom.
Back when WordPerfect was actually giving MS Word a fight, Grammatik was a great grammar checking program for DOS, Windows, Macintosh and Unix & years ahead of anything which made it into MS Word. It was developed by Reference Software, before WordPerfect acquired them. I assume Corel still has this & uses it in their WordPerfect Office Suite.
Not perfect (our language is eccentric & computers are stupid), but the best I've seen.
so it will take a miracle.
Speaking of The Elements of Style, the full text of the book can be found here. It's online now. Use it.
One of the concepts that most people should realize is that the main success (and downfall) of the English language is that it can mutate quite easily.
Remember... English is the bastard child of Celtic, Latin, and various other Germanic languages. Language also affects the way we think and is the key limiting factor in grasping concepts.
If your language cannot express a certain concept then you need a way to bend the rules (which English has a bad habit of doing) so that you can share that idea with others.
To enforce a view or a proper method of speaking will often stagnate a society's ability to assimilate new ideas or methods. George Orwell pointed this out when he came up with the idea of Newspeak, in which a society restrains itself from unwanted ideas by removing its ability to even discuss them.
We obviously do not speak Elizabethan English or the olde English of the Middle Ages. Should our descendants be forced to speak an archaic language 200 years from now because we demanded that our software set in stone the proper way to express ideas and communicate?
Man, this sounds a bit hippy-esque, but hopefully you understand what I mean.
Still, there should be some ground rules for what proper English is and should be, so we can understand each other without going "Huh?", but it shouldn't be a hard-line stance that is unchangeable for the next 50 years.
"I am the king of the Romans, and am superior to rules of grammar!"
-Sigismund, Holy Roman Emperor (1368-1437)
A possibility is to assign every word in a sentence a number of descriptors (tense, part of speech, etc...) and see if they are in a logical order. For example:
I use a grammar checker.
Nominative pronoun, present tense transitive action verb, general article for non-vowel sounds, adjective, noun.
Similarly, "She kicks a red ball" would have the same pattern.
This assumes that an adequate dictionary is compiled, containing all the descriptors and relying on context for a word such as "grammar" (if it comes before a noun, "grammar" is an adjective; otherwise, it is a noun).
While this system would be very difficult to design, I believe that the basic approach would work.
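For anyone who wants to poke at the idea, here is a minimal sketch of the descriptor approach in Python. The lexicon, the tag names, and the list of allowed patterns are all made up for illustration; a real dictionary would need far more descriptors (tense, case, number) and far more patterns.

```python
# Toy lexicon: every entry is an illustrative assumption, not a real dictionary.
LEXICON = {
    "i": {"PRON"},
    "she": {"PRON"},
    "use": {"VERB"},
    "kicks": {"VERB"},
    "a": {"ART"},
    "red": {"ADJ"},
    "grammar": {"ADJ", "NOUN"},  # ambiguous: resolved by context below
    "checker": {"NOUN"},
    "ball": {"NOUN"},
}

# Tag sequences accepted as well-formed (hypothetical and far from complete).
ALLOWED_PATTERNS = {
    ("PRON", "VERB", "ART", "ADJ", "NOUN"),
    ("PRON", "VERB", "ART", "NOUN"),
}


def tag(words):
    """Return a tuple of tags, or None if any word is missing from the lexicon."""
    tags = []
    for i, word in enumerate(words):
        options = LEXICON.get(word.lower())
        if options is None:
            return None
        if len(options) == 1:
            tags.append(next(iter(options)))
            continue
        # Context rule (only handles ADJ/NOUN ambiguity):
        # before a noun the word is an adjective, otherwise a noun.
        next_options = LEXICON.get(words[i + 1].lower(), set()) if i + 1 < len(words) else set()
        tags.append("ADJ" if "NOUN" in next_options else "NOUN")
    return tuple(tags)


def check(sentence):
    """True if the sentence's tag sequence matches an allowed pattern."""
    words = sentence.rstrip(".!?").split()
    return tag(words) in ALLOWED_PATTERNS
```

Calling `check("I use a grammar checker.")` tags "grammar" as an adjective because a noun follows it, and the resulting sequence is in the allowed set. The obvious weakness is the pattern list: enumerating every legal order by hand does not scale, which is presumably the "very difficult to design" part.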
A grammar checker need I not.
AT&ROFLMAO
American and British English remain, for the most part, mutually intelligible. They have largely drifted together.
However, that has happened with a large English-speaking population.
I'm expecting it to split over time into an international English, which will be largely today's American English, and whatever the English-speaking countries drift into speaking. I suppose that they *could* be enough of an anchor to slow the mutation of the language, but I doubt it. I'm even more skeptical of the idea that the now established international English would follow the changes of the native speakers--there's no reason for a French speaker and a Korean speaker, both of whom speak English as an international language, to change their English due to Americans or Brits.
hawk
If the /. community provides any indication, good grammar checkers wouldn't be used even if they existed. Spell checkers work very well and no one seems to pay them any heed.
Chance 'em.
...saying "Just learn the grammar correctly in the first place", here's a question: can you really see no use in a computerised tool to help you learn correct grammatical usage?
It's like someone coming on asking about natural media painting apps being told "Just go to art school and learn how to use REAL paint, you lazy bastard!" - you're missing the point entirely. A grammar checker would be useful even for people with a decent grasp of grammar, as a double-check. Like spell checking, do you get it yet?
Game dev and music blog
1. break text source into a handful of slashdot comments, and submit each comment
2. wait for the inevitable uppity howling condescending grammar nazi to respond to whatever grammatical errors exist, however slight or unimportant
3. reassemble text source and apply grammar nazis' edits
voila! grammar checking via redundant network of distributed grammar nazis (tm)
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
I think *I* write grammar checker is ok?
Grammar checking is a thousand-fold more complicated than most people realize. English's hoary syntax, which pretty much boils down to "8 million exceptions in search of a rule", doesn't parse easily into computer code.
But I, too, would be interested in seeing this field develop - because it has the side effect of making bot AI better! Now, a voice-activated console that understood commands in plain, sloppy English would be worth striving for. Grammar-checking in a word-processor usually just provokes me: "How *dare* you red-line this sentence; I'm quoting *Shakespeare*, you illiterate rock!"
But we'll have perfect machine-generated grammar before we've reached the level of innovation required to put a spell-checker on the comment box on Slashdot!
A linguistics professor is giving a lecture. He explains that in English, prescriptive grammar dictates that a double negative creates a positive, for instance "I ain't got no money" would parse as "I have money." He then goes on to explain that in many languages, a double negative creates a more emphatic negative, for instance, in Russian "U menya nyet nichyevo" (literally, "By me is not had nothing") uses two negative phrases to create a stronger negative. Furthermore, the prof explains, in most languages, using two positives will create a more emphatic positive, or at the very least, will not change the meaning of a phrase, for instance "Yes, I have bananas" is fundamentally the same as "I have bananas." However, the professor concludes, in no language does a double positive create a negative.
A student, in the back of the class, muttering under his breath, was heard to utter "Yeah, right."
Rhapsody in Numbers
Why would open source people need grammar checkers? All we have to do is post a message to Slashdot and it will be prodded, poked, parsed, and insulted until nothing is left, it's great! Spelling, grammar, translation, jargon checking, and even *^%hole tests are available! We don't even have to be on topic, any message can be submitted....
Here's to losing my Karma Bonus again....
Yes, you right quite are, it's plenty enough superiorly good. Whom was I that did wanted to used they're opened source shit that to?
I use it all the time, it okay'd this posting.
ôó
Just look at Jamaican English
http://niceup.com/patois.txt
some sample phrases:
"No cup no broke, no coffee no dash wey". Even if disaster strikes your home it's always possible
that all may not be lost. (22)
you don't make a fuss there won't be a fight. (29)
"Wha eye no see, heart no leap" means that something terrible could happen but if you don't
see it, you are not frightened. (29)
"mi come here fi drink milk, mi noh come here fi count cow". A reminder
to conduct business in a straightforward manner. (22)
"The higher the monkey climbs the more him expose". A truly comic image if
you've ever been to the zoo, and comforting to any of us whose backs have been
used as a stepping-stone for someone else's success. (22)
"A city upon the hill cannot be hidden." same as above (29)
I personally believe that language will just evolve so that our children's children will be almost incomprehensible to us. As you can see, having Africans speak English for 400 years in Jamaica gave them their own particular flavour of the language.
I'll just use my special getting high powers one more time...
In French, for example, adjectives come after the noun they modify.
:)
Actually, that's only true for some adjectives. There is a rule to remember which ones go before the noun: 'BANGS'
B - beauty
A - age
N - numerical order
G - goodness (or badness)
S - size
Everything else goes after the noun.
This has been your online French grammar lesson for the day.
This requires some serious AI (or just plain I) to sort out. And that only gets you past the subject line. Now re-read each of the sentences in my opening paragraph, but literally this time. Each of them would choke a grammar checker, yet for most readers they will parse perfectly well within the context.
Easier just to pay attention in Grade 7 English class, as someone already pointed out.
Among other examples ....
.... neither the German pilot nor the young, inexperienced Cypriot co-pilot could speak the same language fluently, and each had difficulty understanding how the other spoke English, the worldwide language of air traffic control. ... ... ... ....
.... the crew at over 14,000 feet would already be experiencing some disorientation because of a lack of oxygen.
Crew confusion found in Athens plane crash
By Don Phillips International Herald Tribune
WEDNESDAY, SEPTEMBER 7, 2005
PARIS The crew members of a Cypriot airliner that crashed Aug. 14 near Athens became confused by a series of alarms as the plane climbed, failing to recognize that the cabin was not pressurizing until they grew mentally disoriented because of lack of oxygen and passed out....
The plane had a sophisticated new flight data recorder that provided a wealth of information.
At 10,000 feet, or 3,000 meters, as designed, an alarm went off to warn the crew that the plane would not pressurize.
At 14,000 feet, oxygen masks deployed as designed and a master caution light illuminated in the cockpit. Another alarm sounded at about the same time on an unrelated matter, warning that there was insufficient cooling air in the compartment housing avionics equipment.
The radio tapes showed that this created tremendous confusion.
During this time, the German captain and the Cypriot co-pilot discovered they had no common language and that their English, while good enough for normal air traffic control purposes, was not good enough for complicated technical conversation in fixing the problem....
Jeez, how you younguns forget! In my day, we had style and diction, and we liked it. None of that fancy-schmancy parsing irregular grammar, just pattern match a few of the worst cases, throw out a few statistics, and wow!
Of course, that was when the line printer was state of the art, and you had to cut your printout into sheets to turn your English assignment in, and two or three nroff submissions could bring the PDP 11-44 to its knees...
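For the younguns, a rough sketch in Python of what "pattern match a few of the worst cases, throw out a few statistics" can look like. This is not how style and diction are actually implemented; the phrase list and the function names are invented for illustration.

```python
import re

# Hypothetical list of wordy phrases and suggested replacements.
WORDY_PHRASES = {
    "in order to": "to",
    "at this point in time": "now",
    "due to the fact that": "because",
}


def flag_phrases(text):
    """Return (phrase, suggestion, offset) for each wordy phrase found, in text order."""
    hits = []
    for phrase, suggestion in WORDY_PHRASES.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((phrase, suggestion, match.start()))
    return sorted(hits, key=lambda hit: hit[2])


def simple_stats(text):
    """Crude counts: sentences split on ., ! and ?; words are runs of letters/apostrophes."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "sentences": len(sentences),
        "words": len(words),
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
    }
```

No parsing anywhere, just a lookup table and some counting, which is about all the approach needs.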
Envy my 5 digit Slashdot User ID!
A man's shirt is a feminine object, and a woman's blouse is a masculine object? Why?!
:)
Hey, anything that wants to be pressed against boobies all day can be assumed to be masculine.
What makes English such a pain in the backside is that the language has been so utterly simplified over the millennia that we have lots of words with identical spellings, but different parts of speech. This makes the word order critical.
Firstly, don't say it's been "simplified". Say rather that it has gained complexity in some areas and lost complexity in others.
Your point will help me illustrate:
<expound>
English used to have a larger set of grammatical suffixes (known as inflectional morphology), kind of like Latin. You put a particular suffix on a noun to mark it as the direct object; you put a particular suffix on a verb to mark its tense, number, or whatever. English has largely lost these endings, mostly due to some heavy phonological reduction of lots of its vowels during the late Old English and early Middle English periods, starting around 1000 CE and ending around 1200 CE. Basically, vowels in unstressed syllables turned to schwa (which is the first vowel in the word under, as pronounced by a typical American newscaster). Because of this, inflectional suffixes became ambiguous; because they were ambiguous, people stopped using them.
So English lost all that inflectional morphology. So what? Well, before this happened, English word-order was relatively free. Afterward, people could no longer disambiguate syntactic categories by the endings. So word-order took up that role, and English word-order became more fixed.
For more details, see [1].
</expound>
So just like a big game of whack-a-mole, a loss of complexity in one area led, in a rather straightforward manner, to an increase in complexity in another.
If we don't, in a matter of just a few years, we'll get to the point where nobody can understand anything.
This is patently untrue, but I forgive you. From an earlier post of mine:
<windbag>
This is a very common sentiment among educated people, cross-linguistically and cross-culturally. In basically every culture around the world, there is a group of people, usually middle-aged, that believes that people spoke their language "correctly" about a generation or two ago. They lament the imminent doom of their language. They blame the young, the uneducated, and the poor.
The fact is that languages change constantly, and lots of these changes can be pretty well understood as natural processes. For instance, if you're from the US, you probably pronounce the word butter with a d-like sound in normal speech (linguists call the sound a "voiced alveolar tap"). So it sounds just like "budder". When people started using that pronunciation, their elders probably thought them "lazy" as well. I can almost hear them saying, "Pronounce your t's properly!"
But think about it. In order to pronounce the word with a proper tt in the middle, you'd have to turn your voice on to say the b and the u, then turn it off to say tt, and then turn it back on to say er. It's much easier to just leave your voice on! And that's what people started doing. If you say the word with a "hard" t sound in America today, people will probably consider it strange.
</windbag>
People do not "mispronounce" and misspell words because they are stupid, lazy, poor, or young. (I realize the parent was not asserting that such is the case; however, the sentiment is common enough to warrant mentioning here.) The true reasons for these phenomena are remarkably subtle. Linguists have made great strides in understanding them, but there is still a very long way to go.
In any case, people have been misspelling words for a good healthy number of centuries now. Yet here we are, writing in English back and forth to each other. I'm not too worried.
References:
"It's one of the few languages in which you can scramble the order of the words in the sentence and not loose any meaning because the word carries enough meta-data in the form of all of the various endings."
:)
It's not like I'm a grammar nazi or anything, I just like the irony
"When the atomic bomb goes off there's devastation...but when the atomic bong goes off there's celebraaaaation!"
Man, I wish I had better karma, because I've got useful things to say here.
You can check grammar using a well-trained Hidden Markov Model and the Viterbi Algorithm. If I were to design such a program, I would have the part-of-speech tagger have a go at a sentence, and if it came back with a confidence below, say, x, then the sentence's grammar is probably not good.
This is nice because it also helps keep sentences from being awkward.
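A minimal sketch of that idea in Python, with heavy caveats: the tag set and every probability below are hand-picked toy numbers rather than a trained model, and using the average log-probability per word as the "confidence" is my own assumption about how x might be defined.

```python
import math

STATES = ["PRON", "VERB", "ART", "NOUN"]

# Toy HMM parameters, invented for illustration; a real tagger learns these from a corpus.
START = {"PRON": 0.6, "VERB": 0.1, "ART": 0.2, "NOUN": 0.1}
TRANS = {
    "PRON": {"PRON": 0.05, "VERB": 0.8, "ART": 0.05, "NOUN": 0.1},
    "VERB": {"PRON": 0.2, "VERB": 0.05, "ART": 0.6, "NOUN": 0.15},
    "ART": {"PRON": 0.02, "VERB": 0.03, "ART": 0.05, "NOUN": 0.9},
    "NOUN": {"PRON": 0.1, "VERB": 0.6, "ART": 0.1, "NOUN": 0.2},
}
EMIT = {
    "PRON": {"i": 0.5, "she": 0.5},
    "VERB": {"use": 0.5, "kicks": 0.5},
    "ART": {"a": 0.6, "the": 0.4},
    "NOUN": {"checker": 0.5, "ball": 0.5},
}
UNSEEN = 1e-6  # floor probability for word/tag pairs missing from EMIT


def viterbi(words):
    """Return (best tag path, its log-probability) for a non-empty list of words."""
    scores = {
        s: math.log(START[s]) + math.log(EMIT[s].get(words[0], UNSEEN))
        for s in STATES
    }
    backpointers = []
    for word in words[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best previous state for arriving in state s.
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[prev]
                + math.log(TRANS[prev][s])
                + math.log(EMIT[s].get(word, UNSEEN))
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(STATES, key=scores.get)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, scores[last]


def looks_grammatical(sentence, threshold=-2.0):
    """Flag a sentence when its average log-probability per word falls below threshold."""
    words = sentence.lower().rstrip(".!?").split()
    if not words:
        return False
    _, log_prob = viterbi(words)
    return log_prob / len(words) >= threshold
```

With these toy numbers, "I use a checker." passes and the scrambled "Checker a use I." falls below the threshold, because the tagger can only explain the scrambled order through unlikely transitions. Where to set the threshold on real data is the hard part, and a low score can just as easily mean "unusual words" as "bad grammar".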
Yes
(I'm guessing yes, but not significant ones).
That depends on what you mean by "significant". Japanese has two different parts of speech that correspond to English "adjectives". Some languages fail to make distinctions found in English; Choctaw has nouns, verbs, and adverbs -- and nothing else -- no prepositions, no adjectives, no quantifiers (i.e. numerals or logical quantifiers). All three of those English categories are verbs in Choctaw. Some people have argued that the Salishan languages and some varieties of Indonesian have only a single part of speech: nouns are verbs are adjectives etc. That's a very controversial position. Most linguists believe there is at least a universal distinction between nouns and verbs.
When you are modelling language, you are modelling the mind.
Sure -- perhaps most linguists alive today believe that, and have for a long time. The question is: when languages differ, does that reflect a difference in the minds of the people, or does it just show how incomplete our understanding of language and mind is? Some languages do not have different words for "blue" and "green" but careful tests show that their speakers do in fact distinguish the colors. On the other hand, some languages do not distinguish "left" from "right", and careful tests show that their speakers do not distinguish them (except perhaps in reference to opposing body parts). The connection between language and mind is there, but not very straightforward (in this humble linguist's opinion). I know you were talking about parts of speech, not individual lexical items, but you can apply the same issues to the differences pointed out above.
...and unless you're using English English instead of American English. The phrase "green campus" is an American English phrase, with no direct translation in English English variants.
I once had a US border security guard ask me whether I spoke English. The temptation to reply "My dear chap, I don't just speak it, I am English!" was almost unbearable, but the nearby box of latex gloves convinced me that the more concise "Yes sir" was more appropriate.
(Anyone who thinks that there is such a standard as "British" English has obviously never attempted a conversation with someone from Glasgow.)
Andrew Oakley - www.aoakley.com
What would it take to make a useful, functional grammar checker?
You'd have to find programmers who actually knew correct English grammar.
Geez, that must be like talking to a smurf.
-Eric
SJW: Someone who has run out of real oppression, and has to fake it.
You know, the funny thing is, I suspect the idea would work.
I prefer the "u" in honour as it seems to be missing these days.