Slashdot Mirror


Programmer's Language-Aware Spell Checker?

Jerry Asher writes "Not all of my coworkers are careful about spelling errors. Sometimes this causes real embarrassment as spelling errors creep into software interfaces. Does anyone know of spell checkers for programming languages? I don't want a text spell checker, I want a programming-language-aware spell checker. A spell checker that I can pass all of my code through and will flag spelling errors in function names, variable names, and comments, but will ignore language keywords, language constructs and expressions, and various programming styles (camel code, or underscores, or...). I want a spell checker that knows that void *functionSigniture(char *myRoutine) contains one spelling error. Does anyone have such a thing for Java or C++? Are there any Eclipse plugins that do this?"

24 of 452 comments

  1. Eclipse WTP 3.3 Europa seems to do this.. almost. by pringlis · · Score: 5, Informative

    The version of Eclipse I run, Eclipse WTP 3.3, does spell checking on comments as standard. Not for variable, function names and the like though. It's a decent first attempt though. In truth, I turned it off within the first few hours. It underlines any mistakes in red which I find really annoying when scanning code as I keep thinking I've seen syntax errors. More often than not my eyes are drawn to a spelling mistake, which in many cases isn't even really a mistake, which distracts me from what I'm actually trying to look at.

  2. How about eyeball Mk 1? by uucp2 · · Score: 5, Funny

    Some people call using it a "code review". If you are really serious about it, post the code to /. - plenty of people here seem to have time to point out any spelling errors.

    1. Re:How about eyeball Mk 1? by Anonymous Coward · · Score: 5, Insightful

      Um, let me introduce you to the famous spelling mistake: HTTP Referer. How about we let computers and people each do what they're good at. Computers are good at comparing strings in a spell checker, and people are good at producing typos, spelling mistakes, and approving fixes. Discipline isn't the solution, better tools are. (I bet there's a spelling mistake in here -- which proves my point that Opera needs a spell check like Firefox!)

    2. Re:How about eyeball Mk 1? by iapetus · · Score: 5, Insightful

      You're aware of the concept that a bug is cheaper to fix the earlier you spot it? If it's flagged up as soon as it happens I have to rename that one variable in one place, and I can do it at virtually no cost. If it's flagged up after I've finished the work and committed it for review, then I'll need to change it across multiple files (sure, an IDE will do refactorings like that in most cases, but there can be side effects) and recommit. That's a far greater expense.

      --
      ++ Say to Elrond "Hello.".
      Elrond says "No.". Elrond gives you some lunch.
    3. Re:How about eyeball Mk 1? by asc99c · · Score: 5, Funny

      I bet there's a spelling mistake in here
      That's a good bet in a post explicitly pointing out a famous spelling mistake :)
  3. Sounds like a good idea by PhrostyMcByte · · Score: 4, Interesting

    And not too hard to implement - all you need is a lexer and a few functions to classify different naming styles. lexertl even comes ready with a full example for C++, so get to it ;)
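
    A rough Python sketch of that idea follows. It is not lexertl and not a real lexer: a regular expression stands in for the tokenizer, and KNOWN_WORDS, KEYWORDS, split_identifier and misspelled_words are all made-up names with deliberately tiny word lists, so treat it as an illustration only.

```python
import re

# Hypothetical tiny dictionary; a real checker would load a full word list.
KNOWN_WORDS = {"function", "signature", "my", "routine", "xml", "parser",
               "text", "like", "this"}

# A few C/C++ keywords to skip; deliberately incomplete.
KEYWORDS = {"void", "char", "int", "return", "const"}


def split_identifier(name):
    """Split camelCase, StudlyCaps and snake_case names into lowercase words."""
    words = []
    for part in re.split(r"[_\W\d]+", name):
        # Either an uppercase run that is not followed by a lowercase letter
        # (an acronym such as XML), or an optionally capitalized lowercase run.
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part)
    return [w.lower() for w in words]


def misspelled_words(source):
    """Return words inside identifiers that are missing from KNOWN_WORDS."""
    bad = []
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        if token in KEYWORDS:
            continue  # language keywords are never spell checked
        for word in split_identifier(token):
            if word not in KNOWN_WORDS and word not in bad:
                bad.append(word)
    return bad


print(misspelled_words("void *functionSigniture(char *myRoutine)"))
```

    On the declaration from the question this prints ['signiture']. Comments and string literals are treated like any other text here; a real lexer would let you decide separately how to handle each.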

  4. Re:How about the Built-in OS X spell checker? by BadAnalogyGuy · · Score: 5, Funny

    How about the Built-in OS X spell checker?

    We're talking about programming, friend.

  5. It's a good question ... by YeeHaW_Jelte · · Score: 4, Interesting

    We've got code here that refers to 'insurrances', 'insurances', 'insurrences' and 'insurences', I'm not kidding.

    People here making fun of his request and saying that this should be set in stone in design documents, or be checked in peer code reviews, are obviously not working in a run-of-the-mill software company where there's neither the inclination nor the time to do everything the formal way. Also, I have yet to see the first design document that correctly enumerates all the requirements for the software, let alone all the names for the variables to be used.

    --

    ---
    "The chances of a demonic possession spreading are remote -- relax."
    1. Re:It's a good question ... by Corporate+Troll · · Score: 5, Informative

      As a non-native English speaker working in a non-native English-speaking team (mainly French-speaking people), it is a real problem. The biggest problem happens when you search for something and don't find it because you wrote it right and your coworker wrote it wrong. (Or the inverse; I don't claim to be perfect in English.)

      Sure, you might say, "Write your code in French", but that's not a solution. My mother tongue is Dutch, we have a German coworker, and you never know if the next guy will be Italian. There is also this team that has to maintain code written by Spanish people.... in Spanish.... and they don't know Spanish. Fun times, if you like to hear them curse....

      In multilingual environments this problem increases drastically.

  6. Re:May I suggest.... by DarkSkiesAhead · · Score: 5, Insightful

    if you want your code to read like english, you consider a language like COBOL? Not that it would help you with spell checking, per se...

    Responses like this entirely miss the point of the question. Same with the "just review your code" responses. It's not a matter of making the language more readable. It's a matter of making the code more usable. Certainly, correct spelling is pointless without other elements of good code practice. However, bad spelling can add a lot of frustration.

    I joined a project which already had a few misspelled class names. I'm a fast typer and often I've typed out more of a filename than is spelled correctly before hitting tab to complete the name. Needless to say, I've been trained to hit tab earlier for a few choice files. But it's certainly been an irritation. Similarly, I've been confounded more than once when a function or variable couldn't be found by the compiler, only to realize that I'd spelled a word correctly rather than how the actual name was spelled.

    We choose to use English words for our class, function, and variable names for a reason. That reason is mostly defeated by misspelling the English word. A dictionary is a great idea, even for coding languages that don't "read like English".

  7. Re:simple by YttriumOxide · · Score: 4, Insightful

    It's not so simple when you're not the one writing the code, but have to deal with the results. There's an SDK that I use as a part of my job, developed by our head office in Japan. It's a set of C# classes, and nothing annoys me more than typing "Connection foo = new Connection();", then noticing Visual Studio isn't highlighting it as I'd expect. I then hunt around for anywhere up to a minute before eventually finding out it is actually "Conectin" instead of "Connection". If there were a good "programmer's spellchecker", I may not need to use it myself, but I could give it to my Japanese colleagues to make MY life easier! (Note: the above example is fictitious, but it illustrates the type of error I deal with that this would prevent.)

    --
    My book about LSD and Self-Discovery
    Also on facebook as: DroppingAcidDaleBewan
  8. Re:Eclipse WTP 3.3 Europa seems to do this.. almos by Bastard+of+Subhumani · · Score: 5, Funny

    Also plenty of spellcheckers will ignore one or two letter words.
    So if you use fortran, you're screwed? No change there, then ...
    --
    Only three things are certain; death, taxes, and apocryphal quotations - Ben Franklin.
  9. Re:Visual Assist by nietsch · · Score: 4, Funny

    Please don't use the names of the tools the beast of Redmond uses to stupefy the world. This is /. after all; if you have to code on/for windos, then please be humble and shy about it.

    --
    This space is intentionally staring blankly at you
  10. Re:the problem is really prevalent by Hognoxious · · Score: 4, Funny

    I have seen large portions of source code, much of it commercial, containing not one or two but hundreds of spelling mistakes.
    Man, I was digging around in some SAP code recently and none of it made sense. Half of it didn't even look like English!
    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  11. How about this by Ed+Avis · · Score: 4, Interesting

    Yes, this is a legitimate problem. I work on code that has spelling mistakes embedded into interfaces and it's very annoying. The fashionable use of StudlyCaps in programming (why? who decided that TextLikeThis is more readable than text_like_this?) makes the job a little harder but not impossible, as long as you follow the sane rule of making each word start with a capital and continue in lowercase, even if it is an acronym (so XmlParser, not XMLParser or, God forbid, XMLparser - though of course XML_parser would be better than any of those).

    Enough rant. How about this:

    perl -ne 's/([a-z])([A-Z])/$1 $2/g; tr/A-Za-z/ /c; foreach (split) { print qq{$_\n} unless $seen{lc $_}++ }' source_file...

    That will give a list of unique words in your source code (use find and xargs to scan the whole source tree). Then you can run that list of words through an ordinary spellchecker such as ispell. Unfortunately when you find a mistake you have to go back and grep for it to find where it occurs. You would also need a personal dictionary for things that are not English words but nonetheless appear in code.

    I would probably keep the private word list containing things like 'foreach' and 'const' with the program source code, and have a makefile target 'make spellcheck' that runs a command like the above and then prints out all words found that are not in /usr/share/dict/words or in the private word list. Indeed, why not this:

    find . -type f -name '*.c' | xargs perl -ne 's/([a-z])([A-Z])/$1 $2/g; tr/A-Za-z/ /c; foreach (split) { print qq{$_\n} unless $seen{lc $_}++ }' | sort -u >found_words
    sort -u private_word_list /usr/share/dict/words >allowed_words
    diff -u allowed_words found_words | grep -E '^[+][^+]'

    The private word list can be kept under version control and checked in whenever you add a new non-English word like 'Frobule' to your source code.

    Adding filenames and line numbers to the output is left as an exercise for the reader. You might also want to change the perl command to ignore words with length < 5.
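
    As one possible answer to that exercise, here is a hedged Python sketch that prints file:line: word for each unknown word. It assumes the same two word lists as above (/usr/share/dict/words and a file named private_word_list), compares words case-insensitively, and skips words shorter than five letters. The names load_words and find_unknown are invented for this example.

```python
import re
import sys


def load_words(paths):
    """Read one word per line from each readable file, lowercased."""
    words = set()
    for path in paths:
        try:
            with open(path, encoding="utf-8", errors="replace") as fh:
                words.update(line.strip().lower() for line in fh)
        except OSError:
            pass  # a missing word list is simply skipped
    return words


def find_unknown(lines, allowed, min_len=5):
    """Yield (line_number, word) for each word that is not in allowed."""
    for number, line in enumerate(lines, start=1):
        # Same splitting as the perl command: put a space at each
        # lowercase-to-uppercase boundary, then keep only runs of letters.
        spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", line)
        for word in re.findall(r"[A-Za-z]+", spaced):
            if len(word) >= min_len and word.lower() not in allowed:
                yield number, word


if __name__ == "__main__":
    allowed = load_words(["/usr/share/dict/words", "private_word_list"])
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for number, word in find_unknown(fh, allowed):
                print(f"{path}:{number}: {word}")
```

    Unlike the perl command, this reports every occurrence rather than each unique word once, which is what you want when you have to go and fix them.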

    --
    -- Ed Avis ed@membled.com
  12. Re:Eclipse WTP 3.3 Europa seems to do this.. almos by somersault · · Score: 5, Funny

    His next project is to have a handy little helper with a RAM chip avatar. His name is chippy and he comes out with helpful phrases like:

    "You appear to be creating an infinite loop. Would you like me to increment your counter variable?"

    "You appear to be writing a virus, would you like a list of the latest Windows Vista sploits?"

    --
    which is totally what she said
  13. FxCop by Koyaanisqatsi · · Score: 5, Informative

    For .net languages, FxCop does some of this checking, even understanding camel casing and underscores in tokens. And a bunch more, since it is a static code analysis tool.

    http://www.gotdotnet.com/Team/FxCop/

  14. Annoying perhaps but by Taagehornet · · Score: 4, Interesting

    True, identifier names containing spelling errors can be a real annoyance, but I somehow doubt you'll ever find a usable solution, at least not as long as you'll need to interface to code beyond your control. What spell checker wouldn't choke on regular C++? Just picking a random declaration from MSDN (feel free to choose any other API, it won't change anything):

    HRESULT MFGetService(
    IUnknown* punkObject,
    REFGUID guidService,
    REFIID riid,
    LPVOID* ppvObject
    );


    You'll probably just end up spending all your day removing false positives.

  15. Man Dies Waiting for Eclipse to Launch by Anonymous Coward · · Score: 5, Funny

    Man Dies Waiting for Eclipse to Launch

    A software engineer in San Jose, CA was found dead at his desk yesterday, apparently having died while waiting for his Java editing program, Eclipse, to finish its boot process. Coworkers say the engineer came in that morning vowing to "get Eclipse working on his box or die trying." The last thing anyone heard him say aloud was the cryptic comment: "I see the splash screen is appropriately blue." Nobody knows what he meant. The man was then thought to have fallen asleep, but hours later it was discovered that the engineer had died suddenly of apparent natural causes. The forensics team's investigation that evening was reportedly interrupted unexpectedly when the dead man's Eclipse program suddenly finished launching. The team tried to interact with it to see if they could find clues about the man's death, but the program was unresponsive and the machine ultimately had to be rebooted. At this time, the police commissioner says there is no evidence of foul play, and they currently believe the man simply died of either boredom or frustration.

    1. Re:Man Dies Waiting for Eclipse to Launch by ravenlock · · Score: 5, Informative

      Credit where credit is due -- this is an excerpt from Stevey's Tech News, Issue #1.

  16. ego != good_open_minded_programmer by MindKata · · Score: 5, Insightful

    "Any douche who doesn't realise a misspelt function name will fail to compile clearly hasn't written any code yet."

    You clearly fail to see that a programmer can also create their own function names, as well as use other people's functions. So you prove you are a very inexperienced programmer (and closed-minded), which adds weight to the idea that you are either young or just arrogant. Also, your very apparent need to show hostility shows a degree of insecurity, where you are overcompensating by verbally hitting out at others in an attempt to appear more knowledgeable than you really are.

    The easiest way to become a better programmer is to be more open-minded. So far you have failed to demonstrate this.

    As a side note (back in the DOS days of programming), I found the spell checker in Multiedit very useful (especially when having to work very late at night, after the coffee stopped working! ;)

    --
    There are 10 kinds of people in the world... those who understand binary and those who don't.
    1. Re:ego != good_open_minded_programmer by KavyBoy · · Score: 5, Funny

      From the GP's website when viewed without Flash:
      We're the do-anything team that specialises in imaginging new ways for you to reach your audience.

      The word "pwned" doesn't spell check correctly either, but it is applicable.

  17. Re:What the fuck is the OP on? by Anonymous Coward · · Score: 5, Funny

    It's in the third word. You missed a letter.

  18. Ken Thompson and creat() by Maximum+Prophet · · Score: 4, Interesting

    Wow, 240 comments about spelling and programming and no-one's mentioned the famous Ken Thompson quote:

    "If I had to do it over again? Hmm... I guess I'd spell 'creat' with an 'e'."

    --
    All ideas^H^H^H^H^Hprocesses in this post are Patent Pending. (as well as the process of patenting all postings)