Slashdot Mirror


Programmer's Language-Aware Spell Checker?

Jerry Asher writes "Not all of my coworkers are careful about spelling errors. Sometimes this causes real embarrassment as spelling errors creep into software interfaces. Does anyone know of spell checkers for programming languages? I don't want a text spell checker, I want a programming-language-aware spell checker: one that I can pass all of my code through and that will flag spelling errors in function names, variable names, and comments, but will ignore language keywords, language constructs and expressions, and the various programming styles (camel case, underscores, and so on). I want a spell checker that knows that void *functionSigniture(char *myRoutine) contains one spelling error. Does anyone have such a thing for Java or C++? Are there any Eclipse plugins that do this?"
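The request boils down to four steps: extract identifiers, drop keywords, split each identifier at camelCase humps and underscores, and look the pieces up in a word list. Below is a minimal POSIX shell sketch of that pipeline, not an existing tool. `C_KEYWORDS` is a hypothetical hand-maintained list standing in for a real lexer (it covers C keywords only), and the dictionary is assumed to be a one-word-per-line file such as /usr/share/dict/words. The sketch does not skip string literals and does not special-case acronyms.

```shell
# Minimal sketch of an identifier-aware spell check for C-like source.
# Assumptions: C_KEYWORDS is a hypothetical, hand-maintained keyword list
# standing in for a real lexer, and the dictionary is a plain
# one-word-per-line file such as /usr/share/dict/words.

C_KEYWORDS='auto break case char const continue default do double else enum
extern float for goto if int long register return short signed sizeof static
struct switch typedef union unsigned void volatile while'

# Read source text on stdin and print every word found inside identifiers,
# lowercased and one per line, with language keywords removed.
identifier_words() (
    LC_ALL=C; export LC_ALL                      # byte-wise [A-Z]/[a-z] ranges
    grep -oE '[A-Za-z_][A-Za-z0-9_]*' |
        KW="$C_KEYWORDS" awk 'BEGIN { n = split(ENVIRON["KW"], k)
                                      for (i = 1; i <= n; i++) skip[k[i]] = 1 }
                              !($0 in skip)' |   # drop keywords
        sed -E 's/([a-z0-9])([A-Z])/\1 \2/g' |    # functionSigniture -> function Signiture
        tr '_ ' '\n\n' |                          # underscores and spaces end a word
        tr 'A-Z' 'a-z' |
        grep -E '^[a-z]+$'                        # keep purely alphabetic words
)

# Usage: spellcheck_identifiers DICTIONARY < source-file
# Prints each distinct identifier word that is not in DICTIONARY,
# comparing case-insensitively.
spellcheck_identifiers() (
    LC_ALL=C; export LC_ALL
    sorted_dict=$(mktemp) || exit 1
    tr 'A-Z' 'a-z' < "$1" | sort -u > "$sorted_dict"
    identifier_words | sort -u | comm -23 - "$sorted_dict"
    rm -f "$sorted_dict"
)
```

For example, `spellcheck_identifiers /usr/share/dict/words < foo.c`. Given a dictionary that contains function, my and routine, the declaration void *functionSigniture(char *myRoutine) produces exactly one line: `signiture`.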

5 of 452 comments

  1. Sounds like a good idea by PhrostyMcByte · · Score: 4, Interesting

    And not too hard to implement - all you need is a lexer and a few functions to classify different naming styles. lexertl even comes ready with a full example for C++, so get to it ;)
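    One way to read "a few functions to classify different naming styles" is a classifier that tells the splitter where the word boundaries are. The sketch below uses `grep -E` patterns on a single identifier rather than lexertl; the style names and the patterns are assumptions made for illustration.

```shell
# Sketch: classify an identifier's naming style so a checker knows where to
# split it. Style names and patterns are illustrative assumptions.
style_matches() (
    LC_ALL=C; export LC_ALL                      # byte-wise [A-Z]/[a-z] ranges
    printf '%s\n' "$1" | grep -qE "$2"
)

naming_style() {
    if   style_matches "$1" '^[a-z][a-z0-9]*$';                  then echo lowercase
    elif style_matches "$1" '^[A-Z][A-Z0-9]*$';                  then echo UPPERCASE
    elif style_matches "$1" '^[a-z][a-z0-9]*(_[a-z0-9]+)+$';     then echo snake_case
    elif style_matches "$1" '^[A-Z][A-Z0-9]*(_[A-Z0-9]+)+$';     then echo SCREAMING_SNAKE_CASE
    elif style_matches "$1" '^[a-z][a-z0-9]*([A-Z][a-z0-9]*)+$'; then echo camelCase
    elif style_matches "$1" '^[A-Z][a-z0-9]+([A-Z][a-z0-9]*)*$'; then echo PascalCase
    else echo mixed                              # e.g. XMLparser: no reliable word boundaries
    fi
}
```

    A splitter can then cut snake_case at underscores and camelCase or PascalCase before each capital, while names reported as `mixed` (XMLparser, but also IUnknown) need a smarter rule or a human.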

  2. It's a good question ... by YeeHaW_Jelte · · Score: 4, Interesting

    We've got code here that refers to 'insurrances', 'insurances', 'insurrences' and 'insurences', I'm not kidding.

    People here who make fun of his request, saying that names should be set in stone in design documents or checked in peer code reviews, are obviously not working in a run-of-the-mill software company where there's neither the inclination nor the time to do everything the formal way. Also, I have yet to see the first design document that correctly enumerates all the requirements for the software, let alone all the names of the variables to be used.

    --

    ---
    "The chances of a demonic possession spreading are remote -- relax."
  3. How about this by Ed+Avis · · Score: 4, Interesting

    Yes, this is a legitimate problem. I work on code that has spelling mistakes embedded into interfaces and it's very annoying. The fashionable use of StudlyCaps in programming (why? who decided that TextLikeThis is more readable than text_like_this?) makes the job a little harder but not impossible, as long as you follow the sane rule of making each word start with a capital and continue in lowercase, even if it is an acronym (so XmlParser, not XMLParser or, God forbid, XMLparser; though of course XML_parser would be better than any of those).
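    The rule matters because a splitter has to guess where an all-capitals run ends. Here is a sketch with three `sed -E` rewrite rules: underscores become spaces, a break goes before a capital that follows a lowercase letter or digit, and a break goes before the last capital of a run when that capital starts a lowercase word. The rules are illustrative assumptions, not a standard algorithm.

```shell
# Sketch: split one identifier into words, treating a run of capitals that
# is followed by a capitalised word as an acronym (XMLParser -> XML Parser).
# Rules, in order: underscores -> spaces; break at lower/digit->upper;
# break before the last capital of a run that starts a lowercase word;
# finally trim leading/trailing spaces.
split_words() (
    LC_ALL=C; export LC_ALL                      # byte-wise [A-Z]/[a-z] ranges
    printf '%s\n' "$1" |
        sed -E -e 's/_+/ /g' \
               -e 's/([a-z0-9])([A-Z])/\1 \2/g' \
               -e 's/([A-Z]+)([A-Z][a-z])/\1 \2/g' \
               -e 's/^ +| +$//g'
)
```

    With these rules XMLParser, XmlParser and XML_parser all split cleanly, but XMLparser comes out as `XM Lparser`: from the letters alone there is no way to tell where the acronym ends.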

    Enough rant. How about this:

    perl -ne 's/([a-z])([A-Z])/$1 $2/g; tr/A-Za-z/ /c; foreach (split) { print qq{$_\n} unless $seen{lc $_}++ }' source_file...

    That will give a list of unique words in your source code (use find and xargs to scan the whole source tree). Then you can run that list of words through an ordinary spellchecker such as ispell. Unfortunately when you find a mistake you have to go back and grep for it to find where it occurs. You would also need a personal dictionary for things that are not English words but nonetheless appear in code.

    I would probably keep the private word list containing things like 'foreach' and 'const' with the program source code, and have a makefile target 'make spellcheck' that runs a command like the above and then prints out all words found that are not in /usr/share/dict/words or in the private word list. Indeed, why not this:

    find . -type f -name '*.c' | xargs perl -ne 's/([a-z])([A-Z])/$1 $2/g; tr/A-Za-z/ /c; print lc($_) . qq{\n} foreach split' | sort -u >found_words
    cat private_word_list /usr/share/dict/words | tr A-Z a-z | sort -u >allowed_words
    comm -13 allowed_words found_words

    The private word list can be kept under version control and checked in whenever you add a new non-English word like 'Frobule' to your source code.

    Adding filenames and line numbers to the output is left as an exercise for the reader. You might also want to change the perl command to ignore words with length < 5.
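    One possible answer to the exercise, sketched as a single awk pass instead of the perl one-liner so that the file name and line number are still at hand when a word fails the lookup. Assumptions: words are runs of ASCII letters split at lower-to-upper humps, acronyms are not special-cased, keywords such as 'const' live in the private word list, and fragments shorter than `MIN_LEN` (default 5, as suggested above) are ignored. `spell_locations` is a hypothetical name.

```shell
# Sketch: print FILE:LINE: word for every identifier fragment that is not in
# the dictionary, skipping fragments shorter than MIN_LEN (default 5).
# Usage: spell_locations DICTIONARY SOURCE_FILE...
# Assumptions: words are runs of ASCII letters split at lower->upper humps;
# acronyms are not special-cased and keywords belong in the dictionary.
spell_locations() (
    LC_ALL=C; export LC_ALL                      # byte-wise [A-Z]/[a-z] ranges
    dict=$1; shift
    awk -v min="${MIN_LEN:-5}" '
        function check(w) {
            if (length(w) >= min && !(tolower(w) in ok))
                print FILENAME ":" FNR ": " w
        }
        FILENAME == ARGV[1] { ok[tolower($0)] = 1; next }   # first file: word list
        {
            line = $0
            while (match(line, /[A-Za-z]+/)) {              # each run of letters
                ident = substr(line, RSTART, RLENGTH)
                line = substr(line, RSTART + RLENGTH)
                word = substr(ident, 1, 1)
                for (i = 2; i <= length(ident); i++) {
                    c = substr(ident, i, 1)
                    if (c ~ /[A-Z]/ && substr(ident, i - 1, 1) ~ /[a-z]/) {
                        check(word)                         # hump: previous word ends
                        word = c
                    } else {
                        word = word c
                    }
                }
                check(word)
            }
        }' "$dict" "$@"
)
```

    For example, `spell_locations allowed_words $(find . -name '*.c')` prints one FILE:LINE: word line per unknown fragment, so there is no need to grep for each mistake afterwards.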

    --
    -- Ed Avis ed@membled.com
  4. Annoying perhaps but by Taagehornet · · Score: 4, Interesting

    True, identifier names containing spelling errors can be a real annoyance, but I somehow doubt you'll ever find a usable solution, at least not as long as you need to interface with code beyond your control. What spell checker wouldn't choke on regular C++? Just pick a random declaration from MSDN (feel free to choose any other API; it won't change anything):

    HRESULT MFGetService(
        IUnknown* punkObject,
        REFGUID guidService,
        REFIID riid,
        LPVOID* ppvObject
    );


    You'll probably just end up spending all day removing false positives.
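    To put a number on the false positives, the sketch below runs the MSDN declaration above through an acronym-aware camelCase splitter and lists the distinct fragments a dictionary would be asked about. The splitting rules are assumptions, of the kind any such checker would need.

```shell
# Sketch: list the fragments a camelCase-aware splitter would feed to a
# dictionary for the MSDN declaration quoted above. The splitting rules
# (underscores, lower->upper humps, ACRONYMWord) are illustrative assumptions.
decl_fragments() (
    LC_ALL=C; export LC_ALL                      # byte-wise ranges and sort order
    printf '%s\n' 'HRESULT MFGetService(IUnknown* punkObject, REFGUID guidService,' \
                  '    REFIID riid, LPVOID* ppvObject);' |
        grep -oE '[A-Za-z_][A-Za-z0-9_]*' |
        sed -E -e 's/_+/ /g' \
               -e 's/([a-z0-9])([A-Z])/\1 \2/g' \
               -e 's/([A-Z]+)([A-Z][a-z])/\1 \2/g' |
        tr ' ' '\n' |
        grep -v '^$' |
        sort -u
)

decl_fragments
```

    It prints 14 distinct fragments. Only Get, I, Object, Service, Unknown and punk look like English words, and punk (from punkObject, the Hungarian prefix for a pointer to IUnknown) would pass a dictionary check without meaning anything here. The rest (HRESULT, LPVOID, MF, REFGUID, REFIID, guid, ppv, riid) would most likely all have to go into a private word list.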

  5. Ken Thompson and creat() by Maximum+Prophet · · Score: 4, Interesting

    Wow, 240 comments about spelling and programming and no-one's mentioned the famous Ken Thompson quote:

    "If I had to do it over again? Hmm... I guess I'd spell 'creat' with an 'e'."

    --
    All ideas^H^H^H^H^Hprocesses in this post are Patent Pending. (as well as the process of patenting all postings)