Slashdot Mirror


A Useful Grammar Checker?

burtdub asks: "With the amount of raw text data available, there seems to be no shortage of ambitious language projects on the horizon, from Universal Language Translators to Junk Email Filtering. However, the mess that is the English language still seems to elude commercial attempts while being relatively ignored by the open source community. What would it take to make a useful, functional grammar checker?"

13 of 503 comments

  1. Make it for Latin by ari_j · · Score: 5, Interesting

    The best way to write a useful grammar checker is to write it for a language with a rational syntax.

    1. Re:Make it for Latin by parvenu74 · · Score: 5, Interesting

      Rational syntax? Latin? It's one of the few languages in which you can scramble the order of the words in a sentence and not lose any meaning, because each word carries enough metadata in the form of all the various endings. Heck, regular verbs alone have 140 different forms, and irregular verbs are exactly that, with unique endings per item. And who's to say that the "nominative-ablative-dative-accusative-verb" syntactical ordering is either correct or ideal? Cicero doesn't write like that half of the time, and Caesar almost never did in his "Gallic Wars." And consider that the Catholic Church, which has used Latin as its official language longer than the Romans did, has officially adopted a simplified vulgatum form, not that the various Popes and writers throughout the centuries have bothered to use it instead of the higher-browed Classical Latin. Whose rules are you proposing to follow?

      English might actually be an easier task than trying to parse Latin.
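To make the point about endings concrete, here is a toy sketch in Python. The five-word lexicon and the role labels are invented for illustration only; real Latin endings are often ambiguous (first-declension -a, for example, serves for both the nominative and the ablative singular), so an actual parser would need far more than a lookup table. The sketch only shows that when each form carries its own case, the roles come out the same regardless of word order.

```python
# Toy lexicon (hypothetical): each inflected form maps to its meaning and grammatical role.
LEXICON = {
    "puella": ("girl", "nominative"),
    "puellam": ("girl", "accusative"),
    "puer": ("boy", "nominative"),
    "puerum": ("boy", "accusative"),
    "amat": ("loves", "verb"),
}


def assign_roles(sentence):
    """Return (subject, verb, object) using only the inflection of each word, never its position."""
    roles = {}
    for word in sentence.lower().split():
        meaning, role = LEXICON[word]
        roles[role] = meaning
    return roles.get("nominative"), roles.get("verb"), roles.get("accusative")


# Every ordering of the same three words yields the same analysis.
for sentence in ("puella puerum amat", "puerum amat puella", "amat puella puerum"):
    print(sentence, "->", assign_roles(sentence))
```

Swapping the endings instead of the positions ("puer puellam amat") is what changes who loves whom.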

    2. Re:Make it for Latin by dgatwood · · Score: 5, Insightful
      The thing is that most Romance languages also have word order restrictions. In French, for example, adjectives come after the noun they modify.

      What makes English such a pain in the backside is that the language has been so utterly simplified over the millennia that we have lots of words with identical spellings, but different parts of speech. This makes the word order critical.

      Technically, word order isn't critical in English. I can say "Campus green and tow'ring trees" and you understand I'm talking about a green campus. This was actually common usage in the not-so-distant past.

      The problem, though, is that words have become overloaded and/or multiple words have been combined into a single term. For example, the green lantern is probably something you carry around to provide light when the power goes out. The Lantern Green is probably a place where they play cricket.

      We're seeing this happening with things like "it's vs. its" and "their vs. they're vs. there" in some people's usage as well. Every time the spelling distinction between words breaks down, it becomes significantly more difficult for anything short of a person to get meaning out of a sentence. That's why there are so many spelling/grammar nazis on Slashdot. If we don't, in a matter of just a few years, we'll get to the point where nobody can understand anything.

      There is another theory which states that this has already happened.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    3. Re:Make it for Latin by brpr · · Score: 5, Insightful
      It's depressing being a linguistics student. Every time a language-related topic is raised you have to listen to people who don't know what they're talking about spouting off and getting modded +5 insightful (or whatever the non-Slashdot equivalent of this accolade may be).

      What makes English such a pain in the backside is that the language has been so utterly simplified over the millennia

      No, it hasn't been simplified. At least, you won't find any linguist or student of Old or Middle English who'll claim that it has simplified as opposed to changed. Presumably you'll back up this outlandish statement with, say, a detailed analysis of the history of the case system in English from the Norman conquest onwards?

      that we have lots of words with identical spellings, but different parts of speech.

      Yeah, just like every other language. Do you have any data suggesting that English is unusual in this respect?

      This makes the word order critical.

      Word order isn't critical because of homographs, it's critical because the rules of English grammar are strict about word order. From a more practical point of view, it's critical because English is too poorly inflected for a parser to work out the structure of a sentence without reference to the order of the words. In any case, there's nothing particularly difficult about parsing languages with strict word order rules, or parsing languages with homographs and homophones, or parsing languages with both.

      Every time the spelling distinction between words breaks down, it becomes significantly more difficult for anything short of a person to get meaning out of a sentence.

      Not really. The problem of people writing "their" instead of "they're" is absolutely trivial compared to the staggeringly difficult task of accurately parsing natural language, or machine translation, or any other NLP problem of similar complexity. For God's sake, just list "their" as a synonym for "they're" in your parser and it will figure out which meaning was intended from the grammatical structure (there are few, if any, syntactic contexts in which more than one of "there", "their" or "they're" is correct).
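The idea of treating "there", "their" and "they're" as one interchangeable set and letting context pick the right spelling can be sketched in a few lines of Python. A full parser is out of scope here, so this sketch swaps in a simpler technique than the one described above: each member of the confusion set is scored by how often it appears next to its neighbouring words. The counts in BIGRAM_COUNTS are made-up placeholders; a real checker would estimate them from a large corpus.

```python
CONFUSION_SET = ("there", "their", "they're")

# Hypothetical counts of adjacent word pairs; a real checker would
# estimate these from a large corpus of correctly spelled text.
BIGRAM_COUNTS = {
    ("over", "there"): 50,
    ("there", "is"): 80,
    ("their", "car"): 40,
    ("their", "house"): 35,
    ("they're", "late"): 30,
    ("think", "they're"): 25,
    ("is", "their"): 20,
}


def score(prev_word, candidate, next_word):
    """Sum the bigram evidence on both sides of the candidate word."""
    return (BIGRAM_COUNTS.get((prev_word, candidate), 0)
            + BIGRAM_COUNTS.get((candidate, next_word), 0))


def correct_confusables(words):
    """Replace each confusion-set word with the best-scoring alternative (output is lowercased)."""
    fixed = list(words)
    for i, word in enumerate(words):
        if word.lower() not in CONFUSION_SET:
            continue
        prev_word = words[i - 1].lower() if i > 0 else "<s>"
        next_word = words[i + 1].lower() if i + 1 < len(words) else "</s>"
        best = max(CONFUSION_SET, key=lambda c: score(prev_word, c, next_word))
        # Only change the word when the context clearly favours another spelling.
        if score(prev_word, best, next_word) > score(prev_word, word.lower(), next_word):
            fixed[i] = best
    return fixed


print(" ".join(correct_confusables("i think their late".split())))  # i think they're late
```

Leaving the word alone on a tie keeps the checker from "correcting" text it has no evidence about.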

      If we don't, in a matter of just a few years, we'll get to the point where nobody can understand anything.

      People have been saying this for hundreds of years.

      So, basically, you've taken one of the most difficult areas of AI (NLP) and argued that it's really difficult these days because sometimes people spell "they're" incorrectly. Weird.

      --
      Freedom is not increased by mere diminution of government. Anarchy is freedom for the strong and slavery for the weak.
  2. Bask in it! by TheTranceFan · · Score: 5, Funny

    Ahhh the irony of asking Slashdot how to build a grammar checker!

  3. Re:The Elements of Style and a good eye. by iced_773 · · Score: 5, Informative


    Speaking of The Elements of Style, the full text of the book can be found here. It's online now. Use it.

  4. best solution: by circletimessquare · · Score: 5, Funny

    1. break text source into a handful of slashdot comments, and submit each comment

    2. wait for the inevitable uppity howling condescending grammar nazi to response to whatever grammatical errors exist, however slight or unimportant

    3. reassemble text source and apply grammar nazis' edits

    voila! grammar checking via redundant network of distributed grammar nazis (tm)

    --
    intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
    1. Re:best solution: by the phantom · · Score: 5, Funny

      there should be a comma between 'uppity' and 'howling'
      there should be a comma between 'howling' and 'condescending'
      'response' should be 'respond'
      'voila' should be capitalized
      should read: 'via [a|the] redundant' OR 'via redundant networks'
      there should be a period after '(tm)'

  5. Re:How about LEARNING the English language? by PitaBred · · Score: 5, Interesting

    And you wonder why people are stranded on the side of the road with a flat they can't change. You can't abstract out all the mechanics of anything, no matter how advanced.
    The problem is that "content" without proper mechanics loses all of its value, and without proper mechanics built into the content generation process, thoughts are muddled and incoherent. There's no structure enforced. That's why people start thinking crap like Scientology is a good idea. They have no rational thought processes; they're governed solely by "content", i.e. "emotion". Kinda like the gorillas and monkeys you see in zoo exhibits.

  6. Re:How about LEARNING the English language? by Deanalator · · Score: 5, Insightful

    Not to be a jerk, but how is that insightful? It's not even really that funny. An open source grammar checker would be extremely useful. Everyone mistypes from time to time, and oftentimes spellcheckers are unable to catch it.

    To the best of my knowledge, it's one of the harder open problems in the OSS community. I'm actually surprised that someone didn't enter something like that into the Google Summer of Code. If I had any idea where to start, I know I would have (and I did consider it). It's a very valid question, and I look forward to seeing if anyone here comes up with any good answers.

  7. Re:What would it take? by the phantom · · Score: 5, Funny

    A linguistics professor is giving a lecture. He explains that in English, prescriptive grammar dictates that a double negative creates a positive; for instance, "I ain't got no money" would parse as "I have money." He then goes on to explain that in many languages, a double negative creates a more emphatic negative; for instance, in Russian "U menya nyet nichyevo" (literally, "By me is not had nothing") uses two negative phrases to create a stronger negative. Furthermore, the prof explains, in most languages, using two positives will create a more emphatic positive, or at the very least, will not change the meaning of a phrase; for instance, "Yes, I have bananas" is fundamentally the same as "I have bananas." However, the professor concludes, in no language does a double positive create a negative.

    A student in the back of the class was heard to mutter under his breath, "Yeah, right."

  8. adjective-noun order in French by Tumbleweed · · Score: 5, Interesting

    In French, for example, adjectives come after the noun they modify.

    Actually, that's only true for some adjectives. There is a rule to remember which ones go before the noun: 'BANGS'

    B - beauty
    A - age
    N - numerical order
    G - goodness (or badness)
    S - size

    Everything else goes after the noun.

    This has been your online French grammar lesson for the day. :)
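The BANGS mnemonic translates almost directly into a lookup table. The Python sketch below hard-codes a handful of masculine adjectives per category (the word lists are illustrative picks, not exhaustive) and deliberately ignores gender agreement and the adjectives whose meaning changes with position, such as "un grand homme" (a great man) versus "un homme grand" (a tall man).

```python
# Illustrative (not exhaustive) adjective lists for each BANGS category.
BANGS = {
    "beauty": {"beau", "joli"},
    "age": {"jeune", "vieux", "nouveau"},
    "numerical order": {"premier", "deuxième"},
    "goodness": {"bon", "mauvais", "meilleur"},
    "size": {"grand", "petit", "gros"},
}


def goes_before_noun(adjective):
    """True if the adjective falls in one of the BANGS categories."""
    return any(adjective in words for words in BANGS.values())


def noun_phrase(noun, adjective):
    """Order a noun and a single adjective using the BANGS rule of thumb."""
    if goes_before_noun(adjective):
        return f"{adjective} {noun}"
    return f"{noun} {adjective}"  # everything else goes after the noun


print(noun_phrase("chien", "petit"))  # petit chien
print(noun_phrase("chien", "noir"))   # chien noir
```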

  9. Fruit flies like a banana by BlueStraggler · · Score: 5, Interesting
    Is fruit an adjective or a noun? Is flies a noun or a verb? Is like a verb or a preposition?

    This requires some serious AI (or just plain I) to sort out. And that only gets you past the subject line. Now re-read each of the sentences in my opening paragraph, but literally this time. Each of them would choke a grammar checker, yet for most readers they will parse perfectly well within the context.

    Easier just to pay attention in Grade 7 English class, as someone already pointed out.
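The ambiguity in the subject line can be made mechanical. The Python sketch below is a CYK chart parser over a tiny grammar in Chomsky normal form written just for this sentence; the rules and lexicon are hypothetical and not taken from any real grammar checker. It counts distinct parse trees and finds two: "fruit | flies like a banana" and "fruit flies | like a banana". Both are grammatical, so a checker that only knows the grammar has no principled way to pick one.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "N", "N"),    # noun compound: "fruit flies"
    ("NP", "Det", "N"),  # "a banana"
    ("VP", "V", "NP"),   # "like a banana"
    ("VP", "V", "PP"),   # "flies like a banana"
    ("PP", "P", "NP"),
]

# Each word lists every category it can take; bare nouns can also stand as NPs.
LEXICON = {
    "fruit": {"N", "NP"},
    "flies": {"N", "NP", "V"},
    "like": {"V", "P"},
    "a": {"Det"},
    "banana": {"N", "NP"},
}


def count_parses(words, start="S"):
    """Count the distinct parse trees rooted at `start` that span all of `words`."""
    n = len(words)
    # chart[i][j] maps a nonterminal to the number of trees covering words[i:j].
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for category in LEXICON[word]:
            chart[i][i + 1][category] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY_RULES:
                    left_count = chart[i][k].get(left, 0)
                    right_count = chart[k][j].get(right, 0)
                    if left_count and right_count:
                        chart[i][j][parent] += left_count * right_count
    return chart[0][n].get(start, 0)


print(count_parses("fruit flies like a banana".split()))  # 2
```

A grammar of realistic size admits many more readings per sentence, which is why picking the intended one takes the "serious AI (or just plain I)" mentioned above.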