Programmer's Language-Aware Spell Checker?
Jerry Asher writes "Not all of my coworkers are careful about spelling errors. Sometimes this causes real embarrassment as spelling errors creep into software interfaces. Does anyone know of spell checkers for programming languages? I don't want a text spell checker, I want a programming-language-aware spell checker. A spell checker that I can pass all of my code through and will flag spelling errors in function names, variable names, and comments, but will ignore language keywords, language constructs and expressions, and various programming styles (camel code, or underscores, or...). I want a spell checker that knows that void *functionSigniture(char *myRoutine) contains one spelling error. Does anyone have such a thing for Java or C++? Are there any Eclipse plugins that do this?"
Um, let me introduce you to the famous spelling mistake: HTTP Referer. How about we let computers and people each do what they're good at. Computers are good at comparing strings in a spell checker, and people are good at producing typos, spelling mistakes, and approving fixes. Discipline isn't the solution, better tools are. (I bet there's a spelling mistake in here -- which proves my point that Opera needs a spell check like Firefox!)
Anything that may appear in a user Interface should be kept in dedicated files. Use a standard format such as CSV, XML...It may be reviewed by non-technical people with built-in spell checker software such as excel. This is a trick mainly use for multilanguages project, but it really helps.
I particularly like the spelling feature in new vim, right-click menu (:set mousemodel=popup) to select a corrected word or remember current word as correct. Perhaps writing a vim plugin as you explain could be possible? I'd be very glad to use it too ;)
#
#\ @ ? Colonize Mars
#
Responses like this entirely miss the point of the question. Same with the "just review your code" responses. It's not a matter of making the language more readable. It's a matter of making the code more usable. Certainly, correct spelling is pointless without other elements of good code practice. However, bad spelling can add a lot of frustration.
I joined a project which already had a few misspelled class names. I'm a fast typer and often I've typed out more of a filename than is spelled correctly before hitting tab to complete the name. Needless to say, I've been trained to hit tab earlier for a few choice files. But it's certainly been an irritation. Similarly, I've been confounded more than once when a function or variable couldn't be found by the compiler, only to realize that I'd spelled a word correctly rather than how the actual name was spelled.
We choose to use English words for our class, function, and variable names for a reason. That reason is mostly defeated by misspelling the English word. A dictionary is a great idea, even for coding languages that don't "read like English".
Also people tend to miss that our brains are very good at correcting spelling mistakes as we read, doing code review trying to catch spelling mistakes can be very tough.
It's not so simple when you're not the one writing the code, but have to deal with the results. There's an SDK that I use as a part of my job, developed by our head office in Japan - it's a set of C# classes, and nothing annoys me more than typing "Connection foo = new Connection();", then noticing Visual Studio isn't highlighting it as I'd expect. Hunting around for anywhere up to a minute and eventually finding out it is actually "Conectin" instead of "Connection". If there were a good "programmers spellchecker", I may not need to use it myself, but I could give it to my Japanese colleagues to make MY life easier! (note: the above example is fictitious, but is an illustration of the type of error that I deal with that this would prevent)
My book about LSD and Self-Discovery
Also on facebook as: DroppingAcidDaleBewan
The question wasn't about user interface strings. It was about spelling in APIs. e.g. One issue at my last company, which was British, is that they standardized on US spelling, but some British spellings crept in too. So sometimes you'd get a function containing "Initialize" and sometimes "Initialise".
It strikes me that the problem is that most spell checkers try to check everything, and that a lot of code has things that really shouldn't be spell checked at all, mixed with things that should. I imagine that one way to start would be to only alert on those errors that are almost correct -- if it looks like garbage, ignore it, but if it's close, assume it should be right. Perhaps ignore prefixes / suffixes as well -- pSomething is fine, pSometihng isn't. Also, CamelCase ought to be easy enough to detect -- treat it as word boundaries, and spell check the individual words. Again, egregious misspellings probably aren't -- nextObjFoo is ok, even though Obj isn't a word -- it's so far from being a word that we assume the programmer meant it that way.
Similarly, there should probably be a set of words added that aren't "English" but are used often enough to be worth adding to the dictionary. Things like Obj, Int, and Ptr.
I think the reason such spell checkers don't exist already is fairly simple -- everyone just assumes they're impossible, and doesn't try. Couple that with the fact that a mediocre quality one would be so annoying as to be worse than useless, and you have a recipe for a program that won't get written. I don't think either of those would have to be the case if someone sufficiently clever decided to attack the problem, though.
You're aware of the concept that a bug is cheaper to fix the earlier you spot it? If it's flagged up as soon as it happens I have to rename that one variable in one place, and I can do it at virtually no cost. If it's flagged up after I've finished the work and committed it for review, then I'll need to change it across multiple files (sure, an IDE will do refactorings like that in most cases, but there can be side effects) and recommit. That's a far greater expense.
++ Say to Elrond "Hello.".
Elrond says "No.". Elrond gives you some lunch.
I remember from spellchecking some html documents a while back ago that aspell is at least aware of html. I do not know how well it works with other kinds of documents.
2) If it is that big of a problem, he's probably writing too many LoC. Refactor, reuse, learn patterns. Learn a better language.
3) Maybe he should document his API. It's pretty hard to get a spelling mistake through when your whole team has the whole API automatically documented in HTML on your intranet.
It's not about code reading like English, but more about it being easy for another coder to work with. One of my co-workers is a brilliant coder but poor at spelling. While customising a standard project component he had written I was thoroughly impressed by how well thought out and tightly coded everything was.
Unfortunately when I tried to compile my customised version, there were hundreds of errors, the majority of which were where I had spelled words correctly. Mostly I fixed these by changing the spellings in my code, but there were a couple of places where I'd somehow ended up with compilable code that was using the wrong variable, resulting in run-time crashes.
The difficulty is, any spell checker would have to realise that TcpRcv, TcpRcvr, probably even TcpRecvr are valid abbreviations, but TcpReciever is a spelling mistake. I think rather than looking at whole words, it would probably have to look for problematic sequences of letters, such as 'cie' which should almost always be 'cei'.
Spell checkers are fine but they make mistakes as well. The best thing I have found, and this goes for any project, software or printed word, is to have someone who is not connected to the project or better yet not even connected with the subject proofread what the public sees. They will often catch mistakes that jump off the page but people working on the project just don't notice. I have made some really stupid mistakes that I never saw but were on the cover of a book I was publishing. I am SO glad it was proofread before it went to press.
Attempting to tell programs the correct grammar or spelling does not always go well. While most will thank you for your input on catching their mistakes, others take it like you step on their babies head.
"Any douche who doesn't realise a misspelt function name will fail to compile clearly hasn't written any code yet."
;)
You clearly fail to see a programmer can also create their own function names, as well as use other peoples functions. So you prove you are a very inexperienced programmer, (and close minded), which adds weight to the idea you are either young or just arrogant. Also your very apparent need to show hostility, shows a degree of insecurity, where you are over compensating, by verbally hitting out at others, in an attempt to appear to be more knowledgeable than you really are.
The easiest way to become a better programmer, is to be more open minded. So far you have failed to demonstrate this.
As a side note, (back in the DOS days of programming), I found the the spell checker in Multiedit very useful (especially when having to work very late at night, after the coffee stopped working!
There are 10 kinds of people in the world... those who understand binary and those who don't.
Just looking at that declaration makes me want to claw my eyes out. Bad C++ is almost (but not quite) as bad as Perl.
What? I seriously thought "referer" was how people in the US spelled "referrer". You guys drop double letters from other words so it makes sense. :) I'm talking about "canceled" looking like a funny version of "cancelled".
--
no sig for you. come back one year.
Remember... your code will run faster if you remove some, but not all, vowels from your variable names.
To the original question: is strncpy misspelled? What about foo? sqrt? exp? Impl? Programese has an interesting linguistic history and its lexicon contains much not found in English.
While misspelled variable and function names are annoying, a refactor tool and a compile make them relatively painless. Perhaps the best approach would be to take your API documentation, run a script to split CamelCase and words_with_underscores, then feed that document to the spell checker. If it's not in your public API, it shouldn't matter how it's spelled.
Also, externalize your strings so that people with English writing training can write your field labels and error messages. Even programmers who spell check strings often misgrammarize them.
Ceci n'est pas une signature.
You're a programmer. You want a programming spell checker. YOU'RE a PROGRAMMER.
So write one, lazy bones.
Yep. Programmers should know how to spell correctly in their native language. But hey, all through school those technonerds where likely the same ones who never missed a chance to whine about how they hated their English (or whatever) classes and thought that learning grammar and spelling were a waste of time when they could be doing cool geek stuff. The rise 1337-speak and txtspeak hasn't helped.
At least in the real writing business there are editors trained and paid to catch these errors.
Being unable to spell correctly makes you look really stupid to most people.
Just FYI, if you have a decent programming environment, it should at least flag cases where you've mistyped an existing identifier. If there's an ImmediateFlag in your code, you'd get a warning if you typed ImediateFlag or ImmediateFalg or whatever. Not much help when the programmer is creating new identifiers, of course. Although I've seen cases where the programmer in question for whatever reason decided that because ImediateFlag was undefined then they would just define it, even though ImmediateFlag existed and was what they meant. That ought to get you fired in my book.
Hey by the way, pair programming is a great way to have continuous code reviews and a check on some of the more typical fumble-finger errors.