Programmer's Language-Aware Spell Checker?
Jerry Asher writes "Not all of my coworkers are careful about spelling errors. Sometimes this causes real embarrassment as spelling errors creep into software interfaces. Does anyone know of spell checkers for programming languages? I don't want a text spell checker, I want a programming-language-aware spell checker. A spell checker that I can pass all of my code through and will flag spelling errors in function names, variable names, and comments, but will ignore language keywords, language constructs and expressions, and various programming styles (camel code, or underscores, or...). I want a spell checker that knows that void *functionSigniture(char *myRoutine) contains one spelling error. Does anyone have such a thing for Java or C++? Are there any Eclipse plugins that do this?"
And not too hard to implement - all you need is a lexer and a few functions to classify different naming styles. lexertl even comes ready with a full example for C++, so get to it ;)
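For what it's worth, the "classify naming styles" half can be sketched in a few lines. This is only a rough illustration in Python, not anything lexertl provides: the function name split_identifier and the regular expression are my own assumptions, and a real tool would still need a lexer in front of it to skip keywords and string literals.

```python
import re

# An acronym run (capitals not followed by a lowercase letter),
# or an optionally capitalized lowercase word. Digits and
# underscores act purely as separators.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")

def split_identifier(name):
    """Split camelCase, StudlyCaps or snake_case into lowercase words."""
    return [word.lower() for word in _WORD.findall(name)]

print(split_identifier("functionSigniture"))  # ['function', 'signiture']
print(split_identifier("XMLParser"))          # ['xml', 'parser']
print(split_identifier("text_like_this"))     # ['text', 'like', 'this']
```

Each resulting word can then be handed to an ordinary dictionary lookup.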
We've got code here that refers to 'insurrances', 'insurances', 'insurrences' and 'insurences', I'm not kidding.
People here making fun of his request and saying that this should be set in stone in design documents, or be checked in peer code reviews, are obviously not working in a run-of-the-mill software company where there's neither the inclination nor the time to do everything the formal way. Also, I have yet to see the first design document that correctly enumerates all the requirements for the software, let alone all the names for the variables to be used.
---
"The chances of a demonic possession spreading are remote -- relax."
Yes, this is a legitimate problem. I work on code that has spelling mistakes embedded into interfaces and it's very annoying. The fashionable use of StudlyCaps in programming (why? who decided that TextLikeThis is more readable than text_like_this?) makes the job a little harder but not impossible, as long as you follow the sane rule of making each word start with a capital and continue in lowercase, even if it is an acronym (so XmlParser not XMLParser or, God forbid, XMLparser - though of course XML_parser would be better than any of those).
Enough rant. How about this:
perl -ne 's/([a-z])([A-Z])/$1 $2/g; tr/A-Za-z/ /c; foreach (split) { print qq{$_\n} unless $seen{lc $_}++ }' source_file...
That will give a list of unique words in your source code (use find and xargs to scan the whole source tree). Then you can run that list of words through an ordinary spellchecker such as ispell. Unfortunately, when you find a mistake you have to go back and grep for it to find where it occurs. You would also need a personal dictionary for things that are not English words but nonetheless appear in code.
I would probably keep the private word list containing things like 'foreach' and 'const' with the program source code, and have a makefile target 'make spellcheck' that runs a command like the above and then prints out all words found that are not in /usr/share/dict/words or in the private word list. Indeed, why not this:
find . -type f -name '*.c' | xargs perl -ne 's/([a-z])([A-Z])/$1 $2/g; tr/A-Za-z/ /c; foreach (split) { print qq{$_\n} unless $seen{lc $_}++ }' | sort -u >found_words
sort -u private_word_list /usr/share/dict/words >allowed_words
diff -u allowed_words found_words | grep -E '^[+][^+]'
The private word list can be kept under version control and checked in whenever you add a new non-English word like 'Frobule' to your source code.
Adding filenames and line numbers to the output is left as an exercise for the reader. You might also want to change the perl command to ignore words with length < 5.
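One possible take on that exercise, as a rough Python sketch rather than a Perl one-liner. The names unknown_words, allowed and min_length are made up for illustration, and it assumes the allowed word list has already been loaded into a set of lowercase words.

```python
import re

CAMEL_BOUNDARY = re.compile(r"([a-z])([A-Z])")
NON_LETTERS = re.compile(r"[^A-Za-z]+")

def unknown_words(lines, allowed, filename="<input>", min_length=5):
    """Yield (filename, line_number, word) for words missing from `allowed`."""
    for number, line in enumerate(lines, start=1):
        # Same two steps as the one-liner: split fooBar into foo Bar,
        # then turn everything that is not a letter into a space.
        spaced = CAMEL_BOUNDARY.sub(r"\1 \2", line)
        for word in NON_LETTERS.sub(" ", spaced).split():
            if len(word) >= min_length and word.lower() not in allowed:
                yield filename, number, word

source = ["void *functionSigniture(char *myRoutine)"]
allowed = {"function", "routine"}  # stand-in for dict words + private list
for hit in unknown_words(source, allowed, filename="example.c"):
    print("%s:%d: %s" % hit)  # example.c:1: Signiture
```

Unlike the word-list approach, this reports every occurrence, so there is no need to grep for the mistake afterwards.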
-- Ed Avis ed@membled.com
True, identifier names containing spelling errors can be a real annoyance, but I somehow doubt you'll ever find a usable solution, at least not as long as you'll need to interface to code beyond your control. What spell checker wouldn't choke on regular C++? Just picking a random declaration from MSDN (feel free to choose any other API, it won't change anything):
HRESULT MFGetService(
IUnknown* punkObject,
REFGUID guidService,
REFIID riid,
LPVOID* ppvObject
);
You'll probably just end up spending all your day removing false positives.
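To put a rough number on that, here is a small Python sketch that applies the naive split-on-case-change approach to the declaration above. The helper words_in and the tiny english set are assumptions for illustration only; a real dictionary would accept a few more tokens but not the prefixes or type names.

```python
import re

def words_in(declaration):
    """Split on lower-to-upper case changes and on non-letters."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", declaration)
    return re.sub(r"[^A-Za-z]+", " ", spaced).split()

declaration = (
    "HRESULT MFGetService(IUnknown* punkObject, REFGUID guidService, "
    "REFIID riid, LPVOID* ppvObject);"
)
english = {"get", "service", "object", "unknown"}  # stand-in dictionary

tokens = words_in(declaration)
flagged = [word for word in tokens if word.lower() not in english]
print(len(flagged), "of", len(tokens), "tokens flagged:", flagged)
```

Hungarian prefixes such as ppv and all-caps type names such as REFIID all end up flagged, so the private word list would have to absorb most of the API.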
Wow, 240 comments about spelling and programming and no-one's mentioned the famous Ken Thompson quote:
"If I had to do it over again? Hmm... I guess I'd spell 'creat' with an 'e'."
All ideas^H^H^H^H^Hprocesses in this post are Patent Pending. (as well as the process of patenting all postings)