Programmer's Language-Aware Spell Checker?
Jerry Asher writes "Not all of my coworkers are careful about spelling errors. Sometimes this causes real embarrassment as spelling errors creep into software interfaces. Does anyone know of spell checkers for programming languages? I don't want a text spell checker, I want a programming-language-aware spell checker. A spell checker that I can pass all of my code through and will flag spelling errors in function names, variable names, and comments, but will ignore language keywords, language constructs and expressions, and various programming styles (camel code, or underscores, or...). I want a spell checker that knows that void *functionSigniture(char *myRoutine) contains one spelling error. Does anyone have such a thing for Java or C++? Are there any Eclipse plugins that do this?"
The version of Eclipse I run, Eclipse WTP 3.3, does spell checking on comments as standard. Not for variable, function names and the like though. It's a decent first attempt though. In truth, I turned it off within the first few hours. It underlines any mistakes in red which I find really annoying when scanning code as I keep thinking I've seen syntax errors. More often than not my eyes are drawn to a spelling mistake, which in many cases isn't even really a mistake, which distracts me from what I'm actually trying to look at.
You can add common function names, types, etc to the custom dictionary -- but all variables, etc in English will be checked as-you-type. Works in any OS X application (simplest example: TextEdit)
Visual Assist for Visual Studio does this.
Next silly question, please.
Some people call using it a "code review". If you are really serious about it, post the code to /. - plenty of people here seem to have time to point out any spelling errors.
Patent this idea!
the easy thing to do, is to just spell things correctly in the first place.
portfolio
WTF is this guy on? So now 90% of my code will have red squigglies under the variable names?
i=0;
So now the 'i' has a red underline?
That is the most nuts thing I ever heard. The spell checker would need to be sentient to figure out the rules.
Hey, I got a good idea - LEARN TO SPELL.
monk.e.boy
(all spelling mistakes in this post were intentional ;-P )
Open source, flash charts
.... that if you want your code to read like English, you consider a language like COBOL? Not that it would help you with spell checking, per se... but if one is going to be so pedantic about making sure that their procedure names can be found in an actual English dictionary why not go the whole 9 yards and write the whole program that way?
File under 'M' for 'Manic ranting'
And not too hard to implement - all you need is a lexer and a few functions to classify different naming styles. lexertl even comes ready with a full example for C++, so get to it ;)
But not, I think, significant enough to warrant a separate program or IDE plugin.
It would certainly be a "pleasant surprise" addition in a new Visual Studio / Eclipse release, but I wouldn't hold my breath.
Anything that may appear in a user interface should be kept in dedicated files. Use a standard format such as CSV, XML... It can then be reviewed by non-technical people using software with a built-in spell checker, such as Excel. This is a trick mainly used for multi-language projects, but it really helps.
I particularly like the spelling feature in new vim, right-click menu (:set mousemodel=popup) to select a corrected word or remember current word as correct. Perhaps writing a vim plugin as you explain could be possible? I'd be very glad to use it too ;)
This isn't quite what you want because you have to select the text to be checked, but it's better than nothing!
Hope this helps
Art Makers Just an excuse to show photos of naked women !!
Vim 7 does this for c/c++ code. Just turn on spelling (something like :set spell) and it picks out spelling mistakes in comments and strings.
A small script to split up camelCase into separate words, then feed the result through a normal spell checker. After that, just whitelist certain words, like maybe the "m" found in "mSomeVariable".
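A minimal sketch of that idea in Python, not a finished tool. The function names, the `WHITELIST` set and the tiny dictionary in the usage note are all made up for illustration; a real run would load a proper word list.

```python
import re

# Hypothetical whitelist of convention prefixes, such as the "m" in "mSomeVariable".
WHITELIST = {"m"}

def split_camel_case(identifier):
    """Split 'mSomeVariable' into ['m', 'Some', 'Variable']."""
    # A run of capitals not followed by a lowercase letter is kept together
    # (an acronym); otherwise a word is an optional capital plus lowercase letters.
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", identifier)

def misspelled_parts(identifier, dictionary, whitelist=WHITELIST):
    """Return the parts of an identifier that are neither known nor whitelisted."""
    return [
        part
        for part in split_camel_case(identifier)
        if part.lower() not in dictionary and part.lower() not in whitelist
    ]
```

With a dictionary containing "function", "signature", "some" and "variable", `misspelled_parts("functionSigniture", dictionary)` reports only `Signiture`, and `mSomeVariable` passes because of the whitelisted prefix.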
We've got code here that refers to 'insurrances', 'insurances', 'insurrences' and 'insurences', I'm not kidding.
People here making fun of his request and saying that this should be set in stone in design documents, or be checked in peer code reviews are obviously not working in a run-of-the-mill software company where there's neither the inclination nor the time to do everything the formal way. Also, I have yet to see the first design document that correctly enumerates all the requirements for the software, let alone all the names for the variables to be used.
---
"The chances of a demonic possession spreading are remote -- relax."
How about using a language-tag-like system (something similar to gettext, http://www.gnu.org/software/gettext/) and spellchecking those files? It makes it easier to make an application support multiple languages as well.
If you want your software to be internationalizable, then you are going to need all your interface text in external text files anyway. Just spellcheck that. Your programmers really shouldn't be embedding any message text in the software itself, so you can just use grep to search for " marks :)
1. I'm a programmer
2. I want a program that does exactly X, Y, and Z
3. ????
4. Profit!
I am currently working on a Java-based universal spell checker (the kind that can do a decent job without involving knowledge of that language). By language, I mean English, Hindi, etc.
I am amused by the idea of being able to extend that to programming languages.
The most significant problem that I am facing has nothing to do with coding the spell checker. It's about getting a sizable dictionary of words (finding one, converting to UTF-8, etc.).
The trouble is that programming comes with a very different set of words that are not common in a normal dictionary
- acronyms (XML, JRAD etc.)
- weird keywords (foreach, esac etc.)
- regex
- portmanteau words like regex
etc.
I have a feeling that if someone can take the pain and build a dictionary of such things, it can be done. I am not much of a lexicographer. If you can find such a dictionary, get back to me, and I will see if I can get this to work for you.
Why idiots? A great developer may not necessarily have a commanding grasp of English, especially if it's a second language for them. That doesn't make them stupid.
Other than that, I agree with everything else you wrote. Planning things out through proper software development techniques should isolate those issues in advance.
Okay, so it's only for Managed Assemblies (C#, VB.NET, J#, etc), but it does spell-checking, acronym-checking, and case-checking, which is nifty. Along with the other slew of introspection rules (some of which are a PITA to implement, even if it does increase the quality of the finished product).
The $$$ version of Visual Studio (the Team Suite version) comes with an introspection engine for VC++ though, it's not as flexible as FxCop but does the basics.
Then there's the countless "Spellchecker" plugins available for IDEs everywhere, VS, Eclipse, NetBeans, etc...
Compile your code and the compiler will find spelling errors in variables etc.
TextMate on OS X has spell checking functionality that is semi-useful, but it's not really "aggressive" enough, and there doesn't seem to be a way to make it such with prefs/configuration.
You can right-click on any "word" (variable name, subroutine name, whatever, just generally a whitespace-delimited group of characters) and it will check the spelling and present alternatives in the context menu. It also recognizes things like perl's sigils so correcting '$teh' turns into '$the', not 'the'.
It _won't_ automatically check spelling except in strings (so e.g. if I have '$teh = "This is a tset.";', 'tset' will be underlined, '$teh' won't). It doesn't include comments in its automatic checking either, which is probably the most annoying part about it.
Overall I typically just don't bother with it, but someone _has_ thought along these lines, at least.
What you're wanting is something that is very difficult for a computer but very easy for a human being. You want to be able to discern which parts of your text file constitute machine-readable instructions (which have to be spelt the way the machine expects them when it's running the compiler/interpreter) and which parts constitute human-readable messages (which have to be spelt the way a human would expect).
The real point is, you shouldn't have user interface messages hard-coded right there into your program at all! It will make it much harder to produce a foreign-language version three years down the line when your company expand and open their first offshore branch. Instead, you should abstract all messages out into a separate file of their own, which can then be spell-checked separately; refer to them only by means of meaningful constants within the code (e.g. if message 6 happens to be "number too small" then use something like NUMTOOSML for 6). This will make the task of internationalisation much simpler -- it can be handed off to anyone who speaks both the current and target languages, not necessarily a programmer.
If your program really is one that you will never need to internationalise, then your users probably will be able to deal with the odd mis-spelt message.
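To make the separate-message-file idea concrete, here is a rough Python sketch. The `MESSAGES` catalogue, its second key and the `check_messages` helper are hypothetical; only the NUMTOOSML example comes from the post above.

```python
import re

# Hypothetical message catalogue: constant name -> user-visible text.
MESSAGES = {
    "NUMTOOSML": "number too small",
    "ERROCCURRED": "an error has occured",
}

def check_messages(messages, dictionary):
    """Map each message key to the words in its text that are not in the dictionary."""
    problems = {}
    for key, text in messages.items():
        words = re.findall(r"[A-Za-z']+", text)
        unknown = [word for word in words if word.lower() not in dictionary]
        if unknown:
            problems[key] = unknown
    return problems
```

Because the code only ever refers to the constants, the checker never has to understand the programming language at all; it just reads the message texts.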
Je fume. Tu fumes. Nous fûmes!
For the record, 'I' is a word. Also plenty of spellcheckers will ignore one or two letter words.
The idea isn't anywhere near as nuts as you think it is, provided you make a habit of using meaningful variable/class names.
++ Say to Elrond "Hello.".
Elrond says "No.". Elrond gives you some lunch.
Such a spellchecker might be useful and might even be possible, but it would be much more valuable if QA caught spelling mistakes in the user interface (menu labels, error messages, etc). "An error has occured" is far to common.
I know I'm going to get flamed for this, but what is so difficult about learning to spell? Ignoring obvious developmental deficiencies, nothing is stopping you from going to the library and getting out a big fat vocab book designed for foreigners or child natives.
There is a great emphasis today on offloading skills to machinery. Sometimes this is a good thing; sometimes it's no better than crying out for a wheelchair because you have never bothered to learn to walk.
Why don't more software developers take a leaf out of Knuth, who "only proved it correct, not tried it" [grammar]? He made the effort to develop his mind such that he could be confident in his own ability to combine memorised results with logical deductions - the key skills of programming, spelling and pretty much any other ability you'll ever attempt to acquire.
Seriously, I think this is a terrific idea.
I have a real problem with coders who can't spell (even in code review, either Eyeball Mk1 is fallible or neither coder can spell). Functions called "markOrderRecieved" (for example) are just harder to find when I'm looking for some functionality that I know is there, and are a bump on the road when reading.
I'm primarily a java coder these days, so (1) there's no need to abbreviate for the language (2) All decent IDEs offer auto-complete, so there's no "but I hate typing long names" excuse.
An Eclipse plugin which checks all function and variable names for camel-case delimited real words would be a Good Idea.
That said, I've never heard of such a thing. Anyone not currently with a deadline (although reading /. counts as R&R ;-) care to write one?
Justin.
You're only jealous cos the little penguins are talking to me.
... it's called a compiler.
Only three things are certain; death, taxes, and apocryphal quotations - Ben Franklin.
More like WTF are you on, man?? If a compiler is able to work out what a variable is, what piece of code does what, and which bits of text are going to be displayed, then a spell checking program can be written to recognise this too!! It would be tricky, and there are many circumstances where it could be circumvented, but why not still use it to prevent a possible spelling error? And in the circumstances where it cannot tell what the word is, so what. For those circumstances you learn to spell, but there's nothing wrong with another program to help prevent errors!
This is a good idea, and one that can be implemented. Just because it's hard to do right, and would need to be done separately for different languages, doesn't change the fact that it would still be useful and help prevent errors.
Who need's speling and grammar?
I'm guessing that your "far to common" is a subtle easter egg?
[x] auto-moderate all posts by this user as insightful
Cos I don't know what a Space compnay is.
"XML is like violence. If it doesn't solve your problem, use more." - Anonymous Coward
I simply use my gcc's spellchecker:
gcc foo.c -spell=en
la.c: In function 'main':
la.c:15: warning: ambigious string value 'hellp'. Did you mean: 'help' ?
la.c:17: warning: ambigious variable name 'i'. Did you mean 'index' ?
OK, I know this is a little off topic, but why is it that the guesses Microsoft's spell checker makes for my English and German misspellings are far better than those of MacOS, Firefox on MacOS, and iSpell on OpenBSD?
Is this that hard of a problem?
Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
I think you meant, "review their code"...
(Although clearly no automated spell checker would have caught that.)
I remember from spellchecking some HTML documents a while back that aspell is at least aware of HTML. I do not know how well it works with other kinds of documents.
The idea is nice and I think the problem is really prevalent. I have seen large portions of source code, much of it commercial, containing not one or two but hundreds of spelling mistakes. I also believe the problem must be more prevalent in closed source and in small businesses than open source and Free software. Another thing is that developers from countries with non-English languages often mix English with their first language in code, making it hard to maintain by other nationalities.
I expect my artists to have spelling mistakes. I expect my coders to know the semantics of a language that they are using. If they don't I question their ability on the semantics of the language the project is coded in.
The funny thing is, in my experience the most spelling errors (compared to e.g. poor grammar in comments) in code are made by those coders who speak English as their native language.
Get new coworkers... Preferably someone who's passed fifth grade.
I'm sure I've got a plugin at work that does this. It covers my code in yellow squiggly lines.
Wow, I should not post when knackered.
Well, I'm a total newbie in terms of compiler architectures and such, but throwing it out there for the purpose of discussion...
I assume a compiler will parse the source and in the process identify which tokens are key words and literals, and which are programmer-defined identifiers in the code. The spell checker would either use the same algorithm, or latch into that part of the algorithm to get at all of the identifiers. There are two possible word separators in typical code--either capital letters or underscores. (If you have something more bizarre, then I think it's a lost cause). So pass those identifiers through a filter that chops them up at each capital letter or underscore (with some exceptions, say, if the identifier is all caps). So, now you've got a pile of strings which are either oddball programming convention stuff, like "p" and "g" for pointers and globals, or things that should generally be words. The rules can include "toss out single character identifiers", "toss out everything up to first capital or underscore", etc. If you have coding guidelines that enforce variable naming conventions, this should get you most of the way.
Now you have English words that you can pass through your standard spelling engine, possibly with a dictionary tweaked for your field of endeavor to decrease false positives and escapes.
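The filter described above could look roughly like this in Python. It is only a sketch of the chopping rules, with a made-up function name and a made-up `strip_prefix` flag, and it assumes the identifiers have already been pulled out of the source by some other step.

```python
import re

def words_from_identifier(identifier, strip_prefix=False):
    """Chop one identifier into lowercase candidate words."""
    pieces = []
    for chunk in identifier.split("_"):
        if chunk.isupper():
            # Exception from the post: an all-caps chunk stays in one piece.
            pieces.append(chunk)
        else:
            # Otherwise start a new piece at each capital letter.
            pieces.extend(re.findall(r"[A-Z][a-z]*|[a-z]+", chunk))
    # Optional rule: toss out everything up to the first capital or underscore.
    if strip_prefix and len(pieces) > 1 and pieces[0][0].islower():
        pieces = pieces[1:]
    # Toss out single-character pieces such as "p" and "g".
    return [piece.lower() for piece in pieces if len(piece) > 1]
```

The prefix rule is off by default because it only makes sense where a naming convention guarantees a prefix; on plain names like functionSigniture it would throw away a real word.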
-- "This world is a comedy to those who think, a tragedy to those who feel."
The spelling error is that "functionSigniture" is not a word? Or are you suggesting that it should recognize mixed case and check each word individually? I just looked at some random code of mine, and it contains so many things that are perfectly legal that aren't English words, aside from language keywords.
For example, I have a function that provides a pretty-printed time called 'Gmtime'. The function that produces a GUID in BER form is called 'BERGUID'. In lots of places, "clear" is abbreviated as "clr". A few "Init*" functions are matched by corresponding "Deinit*" functions. An "operator new" assistant is called "OpNew".
C++ requires that a parameter passed to a function have a different name from the same class member. So it's not unusual to have two variables that do the same thing logically but must have different names. Deleting a letter from one variable name is not uncommon, like "first" and "frst".
A static version of a class member function that's used to hook into an API that doesn't support C++ natively often has an 'S' before its name, so you may have 'ShutDown' and 'SShutDown'.
I believe that most other languages have similar issues.
You'd either have to have one crazy set of coding standards or you'd have to expect your programmers to pick the legitimate errors out of pages and pages of nonsense.
Wow, I'm glad there's such agreement between those who read the article and left comments, and those who just tagged it. :)
I tagged this article badlytagged
- string literals (not what the poster wanted, but this is what needs spelchekars the most)
- identifiers
The former can be done by a simple regexp, the latter... you can do a LALR parser, but why even bother? Just look for _any_ potential identifier; in most languages, that's [a-zA-Z_][a-zA-Z_0-9]+; and simply add the few keywords which are not English words to your dictionary. In fact, this would be nearly programming language agnostic. When it comes to StudlyCaps, anything identified as an identifier can be split _before_ any uppercase letter. This would produce a lot of single-letter tokens for ALL-CAPS #defines and the like, but as a nearby post said, you're going to ignore one-two letter tokens anyway. The usual conventions say XMLHttpRequest or XML_http_request so I wouldn't bother with XMLhttpRequest (and thus "lhttp").
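Here is a short Python sketch of that parser-free approach, for illustration only. The identifier pattern is the one quoted above; the helper names, the `min_length` parameter and the sample dictionary are assumptions.

```python
import re

# Identifier pattern quoted in the post above.
IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]+")

def split_before_uppercase(identifier):
    """Split before every uppercase letter; underscores and digits are dropped."""
    return re.findall(r"[A-Z][a-z]*|[a-z]+", identifier)

def unknown_words(source_text, dictionary, min_length=3):
    """Return the sorted set of tokens that are long enough and not in the dictionary."""
    unknown = set()
    for identifier in IDENTIFIER.findall(source_text):
        for token in split_before_uppercase(identifier):
            word = token.lower()
            # One- and two-letter tokens (X, M, L, my, ...) are ignored.
            if len(word) >= min_length and word not in dictionary:
                unknown.add(word)
    return sorted(unknown)
```

Note that the dictionary has to contain keywords such as "void" and "char" as well, and that XMLhttpRequest does produce the "lhttp" token the post mentions.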
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
Gedit has a highlighting tool that shows mistakes by not coloring them and/or by giving them a red background (for instance, if you forget a ; at the end of a line it will mark the next thing red), and if you (due to fast typing) misspell void or float or any other command it will not color the text!
Yes, this is a legitimate problem. I work on code that has spelling mistakes embedded into interfaces and it's very annoying. The fashionable use of StudlyCaps in programming (why? who decided that TextLikeThis is more readable than text_like_this?) makes the job a little harder but not impossible, as long as you follow the sane rule of making each word start with capital and continue lowercase, even if an acronym (so XmlParser not XMLParser or, God forbid, XMLparser - though of course XML_parser would be better than any of those).
Enough rant. How about this:
perl -ne 's/([a-z])([A-Z])/$1 $2/g; tr/A-Za-z/ /c; foreach (split) { print qq{$_\n} unless $seen{lc $_}++ }' source_file...
That will give a list of unique words in your source code (use find and xargs to scan the whole source tree). Then you can run that list of words through an ordinary spellchecker such as ispell. Unfortunately when you find a mistake you have to go back and grep for it to find where it occurs. You would also need a personal dictionary for things that are not English words but nonetheless appear in code.
I would probably keep the private word list containing things like 'foreach' and 'const' with the program source code, and have a makefile target 'make spellcheck' that runs a command like the above and then prints out all words found that are not in /usr/share/dict/words or in the private word list. Indeed, why not this:
find . -type f -name '*.c' | xargs perl -ne 's/([a-z])([A-Z])/$1 $2/g; tr/A-Za-z/ /c; foreach (split) { print qq{$_\n} unless $seen{lc $_}++ }' | sort -u >found_words
sort -u private_word_list /usr/share/dict/words >allowed_words
diff -u allowed_words found_words | grep -E '^[+][^+]'
The private word list can be kept under version control and checked in whenever you add a new non-English word like 'Frobule' to your source code.
Adding filenames and line numbers to the output is left as an exercise for the reader. You might also want to change the perl command to ignore words with length < 5.
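One possible take on that exercise, sketched in Python rather than Perl. It repeats the same two text transformations as the command above, but the function name, the `filename` default and the `min_length` parameter are invented here, and unlike the Perl version it reports every occurrence instead of each word once.

```python
import re

def find_unknown_words(lines, allowed, filename="<input>", min_length=5):
    """Yield (filename, line_number, word) for each word not in `allowed`."""
    for line_number, line in enumerate(lines, start=1):
        # Same two steps as the perl command: split camelCase at a
        # lowercase-to-uppercase boundary, then blank out every non-letter.
        line = re.sub(r"([a-z])([A-Z])", r"\1 \2", line)
        line = re.sub(r"[^A-Za-z]", " ", line)
        for word in line.split():
            # Ignore words with length < 5, as suggested above.
            if len(word) >= min_length and word.lower() not in allowed:
                yield filename, line_number, word
```

Something like `find_unknown_words(open(path), allowed, filename=path)` would then give grep-style locations, where `allowed` is a set of lowercased words built from the system dictionary plus the private word list.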
-- Ed Avis ed@membled.com
His next project is to have a handy little helper with a RAM chip avatar. His name is chippy and he comes out with helpful phrases like:
"You appear to be creating an infinite loop. Would you like me to increment your counter variable?"
"You appear to be writing a virus, would you like a list of the latest Windows Vista sploits?"
which is totally what she said
"Ya" is clearly intentional and comes from a dialect, so that's ok. I'm not one either, so this means I get to shout at you, right?
For .NET languages, FxCop does some of this checking, even understanding camel casing and underscores in tokens. And a bunch more, since it is a static code analysis tool.
http://www.gotdotnet.com/Team/FxCop/
- string literals (not what the poster wanted, but this is what needs spelchekars the most)
- identifiers
Ook... and
- comments
but I hardly use these anyway
Doesn't Visual Assist from Whole Tomato do this? I've used it in the past and I'm sure spelling mistakes (and a whole host of other things) were pointed out.
:-)
I'm not associated with Whole Tomato, but if anyone from WT sees this, can I have a free subscription
No sharp objects, I'm a programmer!
Nothing personal, but it's not actually a programmer's job to make sure everything is speelled correktly. This is part of the QA process before a product rolls out the door. Sure, you should do your best, but you need another pair of eyes (or several pairs of eyes) looking at the UI in addition to your own. You can easily miss the forest for the trees.
$nice = $webHosting + $domainNames + $sslCerts
I'm sure you aren't using .NET, but if you were, FxCop will check for spelling mistakes in code and comments and strings, along with 1M other coding issues (like malformed variable names, parameters).
I had your problem once because I was working with people whose first language was not English. I don't write US English either and I always left English spellings in by mistake.
I used aspell and went through huge parts of the source, telling it what wasn't misspelled. It was an incredible pain in the neck because it got confused over all the variable names, bits of C syntax etc etc.
Once I had a dictionary, though, I could recheck the source periodically and although there were a lot of false warnings, we still caught a lot of problems that would have gone into the production release.
As you can work out, I didn't restrict the test to strings - this is because misspelled variable names can cause bugs too so I checked for them as well.
Cheers,
Tim
This is all just my personal opinion.
I made one and use it for my open source Java database. It is very simple so far, based on a word list. Supports camel case and so on. It is here: H2 Database Engine, src/tools/org/h2/tools/doc/SpellChecker.java. Or here: SpellChecker.java. It can also check XML, HTML, JSP,... Words shorter than 2 or less characters are ignored. If you want to spin off your own project go ahead, I can help you.
I have included it in the build script: Whenever you write more than a few lines of new code (or documentation) the spell checker will bark because it doesn't know the word. Maybe I should add an automated 'word list expander' that checks unknown words on the internet... Anyway, the hard part will be to convince your coworkers to use it.
Wow, first good question I've heard asked on Slashdot in a long time.
Very interesting idea.
Don't ever put interface text messages in your code! Use a separate messages file. Not only does it help a lot with internationalization, it also makes it really easy to spot and correct spelling errors. Also useful for logging.
-- EOF
I had a problem similar to this when I had to work with an offshore development team a few years ago. The solution we used (as did other people responding to this post) was to ensure that all the UI visible strings were moved to a resource file and we spell checked that instead of the source code. It allowed us to port to other languages quickly and cheaply too.
"Similarly, there should probably be a set of words added that aren't "English" but are used often enough to be worth adding to the dictionary. Things like Obj, Int, and Ptr."
Or they are "English", as when the checker flags a function named "setColour" as incorrect because the dictionary is US English and the spelling is British.
This is a non-trivial problem to do right. The spell checker has to be not only familiar with CamelCase, word fragments that might be added (like the Obj, Int, Ptr or various prefixes), and the programming language syntax, but it would also need to be familiar with the native spoken language.
One strategy might be to strip out all the programming syntax fluff (something like ctags) and then run a spell checker on that with a custom dictionary and a script to split up such things as CamelCase. You'd have to do the same for comments (which ctags normally ignores).
In any case, with ctags, something like aspell, and a bit of custom scripting and dictionary fiddling, it looks tricky but doable as a batch process. Doing it interactively in the editor would be slightly trickier, but if your editor can invoke programs, not hard.
Does that mean that we need an automated grammar checker for our code too?
Cheers, Chris
After you've tried to use functionSignature and study the compiler output, the spelling error will immediately become clear.
I don't understand the problem. Most often you need variables, objects etc. more than once.
Well, yeah, unless you misspell it the same way every time.
Finding other idiots on
What? Since when in what language will a function fail to compile if you have spelled it incorrectly? As a matter of fact, it will compile and most editors will pick up the spelling error as a legit function to be called from elsewhere in the code and will add it to list of autocompletable functions which will spread the error!
www.aleo.no
For the record, 'I' is a word.
Not exactly. I is a word. Lower case I should get a red line still.
It's not like comments matter much. If someone makes a few errors in a comment, it can still be read, if someone makes an error in the code, well...they cause an error.
I just read Slashdot for the articles.
"Nothing personal, but it's not actually a programmer's job to make sure everything is speelled correktly. "
Apparently not. Good thing it's not a programmer's job to do math right either.
"This is part of the QA process before a product rolls out the door."
Programmers are good at buckpassing, much like managers.
"Sure, you should do your best, but you need another pair of eyes (or several pairs of eyes) looking at the UI in addition to your own."
Yes, but are they really "doing their best"? One has to wonder sometimes when reading this forum.
"You can easily miss the forest for the trees."
Now when someone on slashdot complains about book publishers and the cost of books, remember why editors exist. Someone else "doing your job" is going to add to the bill.
True, identifier names containing spelling errors can be a real annoyance, but I somehow doubt you'll ever find a usable solution, at least not as long as you'll need to interface to code beyond your control. What spell checker wouldn't choke on regular C++? Just picking a random declaration from MSDN (feel free to choose any other API, it won't change anything):
HRESULT MFGetService(
    IUnknown* punkObject,
    REFGUID guidService,
    REFIID riid,
    LPVOID* ppvObject
);
You'll probably just end up spending all your day removing false positives.
Is it ironic that you don't know the meaning of the word "semantics"? Why yes. Yes, I do believe it is.
visual assist by http://www.wholetomato.com/ for visual studio
I thought slight mispellings of words in code was a form of job security?
class blah
{
    private var wokrDay:Date;
    public function testDate(workDay) : Boolean
    {
        return wokrDay.isValid();
    }
}
Or at least a way to punish maintenance developers.
The newest version of MyEclipse 6.0 has a spell checker. Now, if that's what you want, you can have it pretty cheaply. Personally, I think it's a question of education in general. If your co-workers are so poorly educated that they can't spell, then a spell checker is only going to solve the surface problem. Usually, bad spelling goes along with bad grammar and bad writing--which equates to bad thinking and logic. There are no absolute requirements for a bad speller to be an idiot, but I regret to say the two are correlated. I work in a large corporation and it is daily that I get some illiterate email from a co-worker that informs me that my colleague is really not that bright. So, with all respect, your peers' problem is larger than a spell checker could solve.
If you're an Emacs user, just turn on the flyspell minor mode.
I write code.
It's actually impossible for the computer to know whether you're creating an infinite loop.
The closest thing I've ever seen in an IDE to a spellchecker are the intellisense functions that essentially allow you to type the first few letters of a namespace, class, member, or function and have it type out the rest of the words for you. I know this isn't exactly what you are looking for, but if spelling is something you want to work on, it's time to break out the old noodle.
I'd like to reiterate that Textmate can do exactly what the OP is asking for. You can set spellchecking on or off for any scope in a language definition. What the parent post is describing is just the out-of-the-box setup. You need to jump in to the "bundle editor" and add some preferences. You could, for example, turn spellchecking on for comments, literal strings, function definitions and function calls, but off for variables and function arguments.
1. post to /.
2. use title "Sony rootkit source code 1/200"
3. read grammar nazi comments
4. profit!
Yeah, not to nitpick but, you see; 'i', being a variable-name, would be a properly camel-cased 'I' from the point of view of the spellchecker.
Religion is what happens when nature strikes and groupthink goes wrong.
Man Dies Waiting for Eclipse to Launch
A software engineer in San Jose, CA was found dead at his desk yesterday, apparently having died while waiting for his Java editing program, Eclipse, to finish its boot process. Coworkers say the engineer came in that morning vowing to "get Eclipse working on his box or die trying." The last thing anyone heard him say aloud was the cryptic comment: "I see the splash screen is appropriately blue." Nobody knows what he meant. The man was then thought to have fallen asleep, but hours later it was discovered that the engineer had died suddenly of apparent natural causes. The forensics team's investigation that evening was reportedly interrupted unexpectedly when the dead man's Eclipse program suddenly finished launching. The team tried to interact with it to see if they could find clues about the man's death, but the program was unresponsive and the machine ultimately had to be rebooted. At this time, the police commissioner says there is no evidence of foul play, and they currently believe the man simply died of either boredom or frustration.
Ben Hocking
Need a professional organizer?
Yes, it's called a "compiler"
There are some simple things that it could do as 'warnings' though: checking if the test variable is being referenced in the loop, or, if it's a global variable, checking if it's modified in any functions being called, etc. You could have a poorly constructed loop that will only repeat infinitely in weird conditions, but the computer won't know that that isn't intentional, of course. And in certain programs you want 'infinite' loops anyway, or loops that will run until you kill the app.
which is totally what she said
Sure, it's the halting problem. We all know that. But there are several common cases where you can deduce that there is an infinite loop in the code. It won't catch all infinite loops, but that doesn't make it useless.
(Sun's Java seems to be good at detecting some of those by default when it complains about an unreachable return statement)
Spell checkers are fine but they make mistakes as well. The best thing I have found, and this goes for any project, software or printed word, is to have someone who is not connected to the project or better yet not even connected with the subject proofread what the public sees. They will often catch mistakes that jump off the page but people working on the project just don't notice. I have made some really stupid mistakes that I never saw but were on the cover of a book I was publishing. I am SO glad it was proofread before it went to press.
Attempting to tell programmers the correct grammar or spelling does not always go well. While most will thank you for your input on catching their mistakes, others take it like you stepped on their baby's head.
(I keeed, I keeed)
Just then the floating disembodied head of Colonel Sanders started yelling Everything You Know Is Wrong!-Weird Al
You can find lists of words in various languages here:
ftp://ftp.ox.ac.uk/pub/wordlists/
I don't know anything about the quality or copyright status of this.
Just add all of your "Key Words" into the Spell Checker's locally recognized list.
e.g. add strlen, memset, alloc and your other favorite commands to the Spell Checker's list of recognized words.
You can also add any of your Variable Names (whether or not they are spelled correctly).
KDevelop uses KWrite as an internal editor, which offers autocompletion of words, so if you have a function, MyFunc, defined, it will autocomplete it after you type Myf. This cuts down considerably on fat-fingering function/object/variable names.
What GP meant is that if you misspell a function call (not a function definition), you will get a compiler error stating you called an undefined function. Technically speaking, there is no way to misspell a function definition. That is, you are creating the function, so any name will be OK. You can make grammatical or syntactical mistakes anyway.
The problem here is the assumption that a function name is English/Spanish/whatever. W R O N G. When you code, you are writing in a language, a programming language. If a function name is a reference to a word from another language (for example, MyCounter is a reference to the English word "counter"), that is intertextuality, and a spellchecker is not supposed to understand it.
Taking this into account, there is a spell checker for programming languages. It's called a compiler.
WTF am I doing replying to an AC at 5 A.M on a Friday night?
Ah, but will the compiler fix the grammar in your comments?
Cheers, Chris
"Any douche who doesn't realise a misspelt function name will fail to compile clearly hasn't written any code yet."
;)
You clearly fail to see that a programmer can also create their own function names, as well as use other people's functions. So you prove you are a very inexperienced programmer (and closed-minded), which adds weight to the idea that you are either young or just arrogant. Also, your very apparent need to show hostility shows a degree of insecurity, where you are overcompensating by verbally hitting out at others in an attempt to appear more knowledgeable than you really are.
The easiest way to become a better programmer is to be more open-minded. So far you have failed to demonstrate this.
As a side note (back in the DOS days of programming), I found the spell checker in Multiedit very useful, especially when having to work very late at night, after the coffee stopped working!
There are 10 kinds of people in the world... those who understand binary and those who don't.
It's actually simple: code a lexer and then feed all the variable names to a dictionary. You can also use standard lexers like Flex.
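A minimal sketch of the lexer idea, assuming Python source as the input language and using the standard library's `tokenize` module as the "lexer" (a Flex grammar would play the same role for C or Java). The function name `identifier_names` is made up for illustration. It only collects the names; checking them against a dictionary is a separate step.

```python
import io
import keyword
import tokenize


def identifier_names(source: str) -> list[str]:
    """Return the distinct identifiers in Python source, in order of first use."""
    seen = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # NAME tokens cover both identifiers and keywords; drop the keywords.
        # Comments and string literals are separate token types and are skipped.
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            if tok.string not in seen:
                seen.append(tok.string)
    return seen


sample = "def function_signiture(my_routine):\n    return my_routine\n"
print(identifier_names(sample))  # ['function_signiture', 'my_routine']
```

The resulting list is what you would hand to an ordinary spell checker, after splitting each name into its word parts.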
how about hacking the linker map file to generate a list of function/variable names? ie, "ld -M". then run the resulting word list through a standard spell checker. the thing is, all you really need is a way to generate a list of names...
I see, the problem is that function and variable names often aren't normal words. I can't think of a way to solve the issue 100% but I can suggest an approach that may cover a large and important part.
- first you'd want to have coding standards with naming standards. From this it should be possible to extract a set of acceptable abbreviations, nameparts and pre & postfixes.
- then look at the problem domain. Ideally there's a dictionary already of domain related verbs and nouns with their definitions, if only to improve communication between designers and programmers. Additionally you probably have a sort of general system wide design with items like classes, relations, files, devices, etc needed by several projects. These can be in plain language and therefore subjected to an ordinary spellchecker.
- both then lead to a project dependent collection of words. A context sensitive (able to recognize function names etc) word chopper breaks your variable names in parts and compares them to that collection like a spellchecker. It's not necessary to break according to capitalization or underscores but it would make things a little easier. I think it's do-able up to here, but it gets tricky when you want to leave language words out. Keywords aren't the problem of course; difficulties will arise with libraries which don't adhere to your standards. Perhaps it's possible to identify names and add them to an exceptions list.
- it'd be great when this works at least to the point where the main coding parts (APIs, classes, public methods/functions, etc.) are correct because too much detail will get too expensive (to develop, run and use) and your coders won't like it. If the 'outside' is correct then only a maintainer will see misspellings in the code and it'll be so local that it can be corrected if desired.
"I'm not much interested in interoperability. I want substitutability. I want to be able to throw your software out."
In 2007, providing a development tool that does not auto-correct and point out misspellings, syntax errors, etc., is like providing a car without a windshield because the first cars didn't use them, and technically the driver doesn't need it. For how many years (what is it now, well over a decade?) has Visual Studio had "Intellisense", which does exactly what the poster describes? I just don't understand this anti-MS holier-than-thou attitude when non-MS developers ask questions like these.
EditPadPro (www.jgsoft.com) is not free, in either sense, but it's very cheap for what it does. I have turned most of the development teams at my last three jobs onto it. One of its key features is configurable, user-extensible syntax highlighting. The highlighter includes the option to exclude matched language tokens from spellchecking. In the built-in highlighting schemes, for example, it will usually spellcheck inside comments but not much else, but as mentioned, you can easily take their color scheme and change it to suit.
--K
Why don't they just use an IDE? Visual Studio autocompletes variable and function names, and of course in most cases a misspelling in a string literal wouldn't cause an error.
How wonderfully ironic :)
Still, the point stands; if your developers can't form a coherent sentence using well-spelled function names I'd fear for their code in the first place. It only takes a couple of typos to make code readability drop through the floor. You don't want automated tools; you want to hire developers who can write.
But, the quality of education seems to have declined in recent years. I remember writing stuff for English class at school and you'd get your work scribbled in red ink for making spelling mistakes, all the time. I've looked at my brother's marked English homework (he's 15) and even the glaring mistakes are missed. Having to type everything rather than hand write it seems to be the source.
People need to be able to write, and not just trust a spell checker. But then again, this ALSO falls down when you don't have native English speakers on staff.
I've got a couple of projects on the radar right now where tiny spelling mistakes are in production code (API definitions, symbols that are exported) and just appear in every version. If someone had been reviewing and had an eye for it, they would have been fixed. What doesn't help is that none of the guys on the projects are English or American besides me.
Eclipse sucks ass because I can't split the display window - something I could do with Emacs say... 20 years ago...
If you mean you've made the same typo everywhere, either it wasn't noticeable enough to matter or you can just do a global find/replace on it.
They're not so much idiots, but I would find it INCREDIBLY difficult to name a function "functionSigniture". It's just WRONG.
There is absolutely no reason for spelling it that way, even if you're Hooked On Phonics, it isn't pronounced that way in any language approaching English.
If this is an exported API function then it would cause a huge problem. Now, consider that a guy who is writing a function which calculates some kind of signature (perhaps a hash or a certification routine) cannot even spell the word which describes what he is doing. Does that give you confidence that the signature function is correct?
I use the Eclipse spell check but with a custom dictionary. I just add my idiosyncrasies into the dictionary. That's on Eclipse 3.2.2. You need to download the dictionary I think, at least I did cos I wanted UK spelling... (am with Henry Higgins on English use in the US)
Technically, it is a lint.
-- Patent no.123456: A way to personalize
Spell checking variable names isn't exactly what IntelliSense does, and Eclipse is actually better than vanilla VS.Net at producing red squigglies under your code (I'm told that VS does it for Visual Basic, but for C# you need something like ReSharper).
I don't see this as a "MS bad" kind of thing, rather just a really low-impact "problem" that would be less than trivial to fix.
Why not simply create your own custom.dic file and use it with a text editor? Although my work is not with programming languages, I've found that having a 'legal.dic' containing legal terminology and 'audit.dic' containing auditor terminology has been invaluable with my work. I can't imagine not being able to do the same for program code, since much of the code is predefined in a lexicon and easily transferred to a dictionary file.
Yeah, but the article poster stated that he wants a spell checker that will note that there's an error in "void functionSigniture(some)", which does make kinda sense in a shared project where others will have to work with an existing error and fixing it can be tedious. My guess is that using aspell and just dividing function names on capital letters and _ it should be little problem to implement. It will however demand some structure to your function names, so no calculateWhereYuoarerightNow(), but that should usually be only mildly inconvenient. The compiler however is great at interpreting a programming language, but doesn't give a rat's arse if the function names are readable and correctly spelled....
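Roughly what the split-on-capitals-and-underscores idea could look like, as a sketch only. The tiny `WORDS` set stands in for a real dictionary (aspell's word list, say), and the function names are invented for this example.

```python
import re

# Tiny stand-in word list; a real run would load aspell's or another dictionary.
WORDS = {"function", "signature", "my", "routine", "calculate", "where"}


def split_identifier(name: str) -> list[str]:
    """Split camelCase and snake_case names into lowercase word parts."""
    parts = []
    for chunk in name.split("_"):
        # An uppercase run (acronym), a capitalized or lowercase word, or digits.
        parts += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk)
    return [p.lower() for p in parts]


def misspelled_parts(name: str) -> list[str]:
    """Return the alphabetic parts of a name that are not in WORDS."""
    return [p for p in split_identifier(name) if p.isalpha() and p not in WORDS]


print(misspelled_parts("functionSigniture"))  # ['signiture']
print(misspelled_parts("my_routine"))         # []
```

As the comment says, this depends on names having some structure: a name with no capitals or underscores between its words comes out as one long unknown part.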
www.aleo.no
I wrote a source code parser in PERL that did exactly this as just one of its many functions....
Clean code, pretty structure, correct English spelling, consistent naming conventions....
PERL is every man's friend.
I'm not sure spell-checking can really be made to work because, by definition, spell-checkers flag anything that is not in the allowed list (also called the dictionary) as an error. But source code always contains tons of identifiers that are not real words, like pid, ret, req, riid, etc. The problem is that there are hundreds if not thousands of them in a large project and that you get a ton of new ones, making the maintenance of a custom dictionary a pain.
But I've been annoyed by spelling errors too, and what I noticed is that the same errors come up over and over again. So what I did is write a script that specifically checks for common typos. And I've very imaginatively called it 'typos'.
What's great with this approach is that, no matter whether you're writing a C, Perl, PHP or HTML file, 'seperate' is never going to be a real word. So we can identify these with no cumbersome custom dictionary, and a very very low false positive rate.
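To illustrate the approach (this is not the 'typos' script itself, just a sketch in Python with a made-up starter list), a blacklist checker needs little more than a table of known misspellings and a case-insensitive search:

```python
import re

# Hypothetical starter list: known misspelling -> correction.
COMMON_TYPOS = {
    "seperate": "separate",
    "recieve": "receive",
    "occurence": "occurrence",
    "definate": "definite",
}

# Case-insensitive substring match, so it also fires inside identifiers.
PATTERN = re.compile("|".join(COMMON_TYPOS), re.IGNORECASE)


def find_typos(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, typo as written, suggested fix) for each hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((lineno, match.group(), COMMON_TYPOS[match.group().lower()]))
    return hits


code = "int RecieveData(void);\n/* keep these seperate */\n"
print(find_typos(code))
# [(1, 'Recieve', 'receive'), (2, 'seperate', 'separate')]
```

Because the list only contains strings that are wrong in every context, nothing needs to know which language the file is written in.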
Typos is open-source (GPL) and has no dependency that I know of (besides perl). So you can try it out just by downloading it, making the script executable, and running it with no argument on your source:
I'm working on an Eclipse plug-in that aims to go beyond spell-checking (although it will implicitly do that too), into verifying that the name you choose for your method fits the implementation. This is possible to do since you can extract the approximate semantics of method names from a large set of implementations -- in short, since most get methods tend to do roughly the same stuff, you can capture the essence of what "get" means. A nifty feature that I'd like to work in as well is the ability to automatically generate a reasonable name when performing an "extract method" refactoring. See my papers for details.
In principle, I'm with you on the notion of spelling. I think that proper diction is essential to properly communicating and to how people perceive you. Let's face it - a well worded reply, for example, is likely to be viewed more favourably than one that is littered with mistakes even if the actual message is the same. Similarly, properly worded code is going to inspire more confidence than code that looks like it was written by children.
But I can foresee situations in which the developers do make genuine mistakes. "Signiture" is probably a poor example because it relates to a very specific task if you're talking something like encryption. But with other less critical tasks, these can bubble up. Ultimately, developing is coming up with solutions to known problems - spelling has little to do with that core exercise. Of course, there's the issue that the commenting may likely follow suit, but that's another story.
I think that truly superb developers should have both qualities, but ideal characteristics are rarely easily found. That's why people with different proficiencies are required to spot such mistakes early on. Like the original parent said, a system design document (SDD) would easily nail this from the get-go, especially if multiple people are collaborating on a project. After all, a software development project really shouldn't have only one person involved.
Apparently you haven't lived in the south much. I know plenty of redneck hicks that would pronounce it sig-ni-chure and half of them would probably spell it that way.......how many of them are smart enough to be programmers is up to debate, but based on some of the contractors that have been sent to us that we wind up rejecting, I wouldn't be surprised if it was twice what should be......
Layne
This leaves the problems of ubiquitous abbreviations in code, e.g. QuatMult for quaternion multiplication, and non-English function names in pre-existing libraries over which we have no control. These problems can be solved by counting the occurrences of candidate errors and seeing if the count exceeds some threshold. If "quat" occurs 100 times in your code then it's a safe bet that it's a valid abbreviation and/or part of some widely used library. In that case we could consider automatically adding it to the dictionary. It's only likely to be a misspelling of quit if it occurs just three or four times. (I'm assuming that the spellchecking operation is performed frequently enough to catch all such errors "in the bud" before they propagate wildly. If not then it's likely to be a case of StabulDoorHorseBolted.)
If widely-used character sequences are automatically added to the dictionary, we could rely on this same process to add the keywords of the language to the spellchecker dictionary automatically, saving some manual effort. It would then be easy to add the few remaining false positives (rarely used keywords) to the dictionary by hand. Of course there's probably some code somewhere that does all this already.
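A rough sketch of that counting step, assuming the code has already been chopped into word parts. The function name and the default threshold of 100 are placeholders taken from the example above, not tuned values.

```python
from collections import Counter


def triage_unknown_words(words, dictionary, threshold=100):
    """Split unknown words into (auto_accepted, suspects) by occurrence count."""
    counts = Counter(w.lower() for w in words if w.lower() not in dictionary)
    # Frequent unknowns are assumed to be abbreviations or library names.
    accepted = {w for w, n in counts.items() if n >= threshold}
    # Rare unknowns are the likely misspellings worth showing to a human.
    suspects = {w: n for w, n in counts.items() if n < threshold}
    return accepted, suspects


dictionary = {"mult", "quit"}
words = ["quat"] * 100 + ["mult"] * 100 + ["qiut"] * 3
accepted, suspects = triage_unknown_words(words, dictionary)
print(accepted)  # {'quat'}
print(suspects)  # {'qiut': 3}
```

Run often enough, the accepted set can be merged into the dictionary so that only new, rare words get flagged.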
Only check words greater than 6-7 letters long. Find all dictionary words that are the same length +/- 1 and start with the same letter (nobody gets that wrong). From those, find all words that have almost all the same letters in the same place. (Search from both ends, and if you've covered 80% of the word by the time both searches find a difference, it's a hit.)
Flag if the difference is:
A single vowel replaced by another incorrect one - signiture, independant, definate, seperate
Repeated consonants where there should be only one, or vice versa. - bussiness, occurence
I bet that would catch 95% of these sorts of misspellings with very few false positives.
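Here is one way the two flagging rules could be sketched. It is a simplification: instead of the search from both ends described above, it generates the single-vowel and doubled-consonant variants of the word and looks each one up. The names and the minimum length of 7 are assumptions for illustration.

```python
VOWELS = set("aeiou")


def vowel_swaps(word):
    """Yield every variant of word with exactly one vowel replaced by another."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            for v in VOWELS - {ch}:
                yield word[:i] + v + word[i + 1:]


def consonant_doublings(word):
    """Yield variants with one consonant doubled, or one doubled consonant halved."""
    for i, ch in enumerate(word):
        if ch.isalpha() and ch not in VOWELS:
            yield word[:i] + ch + word[i:]          # single -> double
            if i + 1 < len(word) and word[i + 1] == ch:
                yield word[:i] + word[i + 1:]       # double -> single


def likely_misspelling(word, dictionary, min_length=7):
    """Return the dictionary word this looks like a slip of, or None."""
    word = word.lower()
    if len(word) < min_length or word in dictionary:
        return None
    for candidate in list(vowel_swaps(word)) + list(consonant_doublings(word)):
        if candidate in dictionary:
            return candidate
    return None


dictionary = {"signature", "separate", "business", "occurrence", "definite"}
for name in ["signiture", "seperate", "bussiness", "occurence", "signature"]:
    print(name, "->", likely_misspelling(name, dictionary))
```

Anything that is not within one of these two slips of a dictionary word (pid, ret, riid) returns None and is left alone, which is where the low false positive rate would come from.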
It's really awful to have to write a very diplomatically phrased email to a team leader to explain that one of his coders has created an API in which they have consistently and unfailingly used the word "Recieve", and that some day that API will probably be part of what we expose externally, and that you'd really appreciate it being fixed before people actually start using it. But it's even worse when the API is specced in a Word doc and the misspeelings are in there, too.
Worst of all, though, is trying to use the damn API while your brain is distracting you with fits of "I before E except after C!"
None the less, the shame I felt in raising the issue at all was matched only by my disappointment that no one else had caught it already.
I'm surprised that in all your arrogance you didn't bother to spell check your own post. 'misspelt' is not a word. The correct spelling is 'misspelled'.
Being a good programmer requires, DEMANDS, a meticulous and thorough approach, as well as dedication to doing things right. If you have those qualities, you don't make spelling errors to begin with. If you don't have them, or would rather use a crutch (spell checker) instead of developing them, get the fuck out of my profession and go flip burgers or something. I'm serious.
The Farewell Tour II
You have one again confirmed Hartman's Law (or Skitt's, depending on preference; see http://en.wikipedia.org/wiki/Hartman's_law).
"Misspelt" is a legitimate spelling in British English. It's in the OED, with examples from 1762 to 1990.
Since I have just corrected you, I assume I have made an error somewhere in this post, though I haven't managed to find it.
.sig withheld by request
Sounds sort of like what goes on in my code editor of choice.
A long time ago I created a spell check task for ant to do just this.
http://code.google.com/p/antspell/
It looks like it has been forked; http://code.google.com/p/bspell/ seems to be in active development.
It's in the third word. You missed a letter.
For internal use code, everyone should know what "Whch option are you having the most of?" means, anyway. Heh heh heh.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
For a big project that has time budgeted for this sort of thing, you could adapt any number of spell checking text editors which accept custom dictionaries that can have words added on the fly. The only requirement would be that it could read a plain text file without needing to convert it to some other format first. On OS X I use BBEdit for this type of thing.
In general the idea is to create a custom dictionary with your known set of function names, variables, etc. and have your QC team add new ones as they are doing the check each day. It would probably help to start with a library pre-populated with words of your chosen programming language of course.
A fool throws a stone into a well and a thousand sages can not remove it.
Actually, I'm pretty sure that LaTeX gets its unusual capitalization from TeX, which is capitalized that way based off its logo emphasizing its typesetting abilities. Of course, there are also quite a few derivatives: ConTeXt, TeX-XeT, MiKTeX, TeXeT, BibTeX, and others. And, lest you think you can screen for the existence of "TeX", there's also LyX. Still, dumping them into a user dictionary is a relatively painless way of dealing with them.
Ben Hocking
Need a professional organizer?
$ man creat
Citizens Against Plate Tectonics
I had an old professor who is German; he got his Ph.D. in the US. When he was a graduate student, he once wrote a Fortran program that didn't compile. He read through the code several times and could not find any problem. Then he called a TA from what passed for the CS department at that time. The TA couldn't figure out the problem either. After several hours, they got tired and gave up. That night, in bed, my professor suddenly realized that in all the places he should have written "Function", he had written "Funktion", which is German. So even when a program doesn't compile, we can't always tell what the problem is.
Um... Don't you do code reviews before code becomes mainstream or gets released?
Hey now, no need to be like that.
They can come be a SysAdmin, where we value practicality over chiseling code on a card punch because "That is the way REAL programmers do it."
Never answer an anonymous letter. - Yogi Berra
A spell-checker either sees the word as being in its dictionary or not, but doesn't know in what contexts it is valid. It doesn't know that a possessive pronoun doesn't have an apostrophe in it, but a contraction or possessive noun does; that there are pairs and triplets of homophones; and other ways in which words can be used incorrectly, yet still be valid spellings in other contexts.
And don't forget 'referer', 'umount', and similar misspelled words that are correct when dealing with computers.
[100% ISO 646 Compliant]
SVM, ERGO MONSTRO.
All right, I don't have any answers for you, and I'm not even a programmer (outside of QBasic games), but I thought I'd share this idea with how a programming spell-checker could work. It seems too simple so there's probably flaws with it, or someone would have done it by now, but anyway:
First, as normal, a syntax checker looks for issues when you enter a new line. If it sees what should be a variable* name, a little icon appears next to it which means "new". The programmer sees the icon and is satisfied as this is, indeed, the first time they used the variable.
Also at this point, the computer adds the variable to a variable list.
Ok, so on subsequent uses of the variable, the computer looks it up on the list, sees it there, and so doesn't display the "new" icon.
And of course, if you see the new icon when you've used the variable already, then you either made a spelling mistake now, or you did when you first made the variable. Either way, it's brought to your attention (assuming you remember using the variable before). If you clear the line with the "new" icon, it's removed from your variable list automatically.
*Variable is used in this example, there can be other lists for other stuff such as sub-routine, function, constant, etc.
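A bare-bones sketch of the "new" marker, assuming a regex is good enough to pick out names (it knows nothing about comments, strings or keywords unless you pass them in as already known). The function name `first_uses` is made up.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def first_uses(lines, known=()):
    """Yield (line number, name) each time a name appears for the first time."""
    seen = set(known)  # keywords or library names to treat as already known
    for lineno, line in enumerate(lines, start=1):
        for name in IDENTIFIER.findall(line):
            if name not in seen:
                seen.add(name)
                yield lineno, name


program = [
    "total = 0",
    "count = 5",
    "totl = total + count",
]
print(list(first_uses(program)))
# [(1, 'total'), (2, 'count'), (3, 'totl')]
```

The marker on line 3 is the one that should make the programmer pause: 'totl' shows up as new even though 'total' was what they meant.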
"When the atomic bomb goes off there's devastation...but when the atomic bong goes off there's celebraaaaation!"
No, that was not "redundant"... Come on mods, do it right, or don't bother.
The problem is that very autocompletion. That means that you are getting used to long function names where you only skim over the exact letters, so you can in fact get a spelling error in a public class, use it and release it only to actually find the error and fix it too late. And then you can't fix it, because you're breaking your interface!
I was doing work at NASA. NASA was still into punch cards years after very powerful text editors came into existence. I remember the day my girlfriend offered to keypunch onto cards the PDP-11 code I had written on a coding pad. "Honey, you sure can't spell very good. Good thing I caught it. Move is spelled with an 'e'." :-(
Stop offshoring your code and you'll get much better spelling. This is from experience.
I agree that it seems like you'd spend most of your time removing false positives. I'm not totally averse to the idea, but like a lot of people posting I just don't see how it could be done effectively - maybe that means I'm getting old...
Programming is different than languages that we use to speak - vocabulary and style are nearly unbounded, abbreviations are common-place, etc. You can make up the words as you go along...
Is "refererr" a spelling mistake? Maybe it refers to some error? Maybe they were trying for "referrer"?
What about "Srvc"? "Servc"? "Servce"?
What about "funcy" ? Is it a function pointer or the word "fancy" or the word "funky" ? Does "funCy" make a difference?
To get this anywhere near effective, it seems like you'd have to impose some restrictions on style and variable naming, and yes - I consider that restriction a bad thing.
Something Witty Goes Here
Wow, 240 comments about spelling and programming and no-one's mentioned the famous Ken Thompson quote:
"If I had to do it over again? Hmm... I guess I'd spell 'creat' with an 'e'."
All ideas^H^H^H^H^Hprocesses in this post are Patent Pending. (as well as the process of patenting all postings)
Still a great IDE after all these years...
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Have a look at Krazy, the EBN's code quality checker if you want more info. It detects common spelling errors and suggests the appropriate spelling (US English); it's done via a Perl script IIRC with different modules for the various check types.
Not really, this is pointless. It may be nice to fix comments, but the very code itself will be rife with specific jargon that no spell-checker will appropriately handle.
Anyone who's worked with a POS system knows the definition of PO. The spell-checker won't pick up on this, but everybody (users, analysts, programmers, testers, support staff) will refer to this object as a PO. So then you'll also need to make the dictionary jargon-aware. Don't know if you've ever been on a government or large-business project, but this is an issue in and of itself. It's typical for big companies to actually have an on-line list (wiki-style) of the "commonly-used" acronyms in the company, and they're not all 3 letters.
I mean, is this function name incorrect? TransferUrisPOToJiruSofToPrint. That means something to people I worked with, but means absolutely nothing to the spell-checker. So even if you or someone else has "the solution", this is definitely not some type of "hey, just add spell checker" problem; you're also pumping in 4 different acronyms just to make this function pass.
works for me. Does not cover identifiers though
Hardly any C library functions are real words. Plus, matching the actual name of a function is more important than being "spelled right", so far from checking actual code, it could essentially ONLY check function prototypes and variable declarations, and tell you to refactor-rename them.
We've secretly replaced Slashdot with new Folgers Crystals - let's see if it notices.
I have a spell checker that's extensible. It's in AppleWorks. I put in all the commands and some commonly used variables and arguments like D$ = CHR$(4) from AppleSoft BASIC. Works great. It'd do assembly too if I put in the 6502/65816 op codes.
As for "I don't want a text spell checker, I want a programming-language-aware spell checker", put down the bong and get away from the keyboard for a while. All spell checkers check text no matter what the content, as long as they're made aware of the text to be tested (i.e. can be extended via typed additions, linked text files containing the terms, or by asking whether terms from proven programs that are marked wrong really are wrong and whether you want to add them to the dictionary). If you're doing your editing in an unextensible, closed and proprietary editing routine built into a programming software package rather than linking to an external editor, you're hosed; stop it.
"I may be synthetic, but I'm not stupid." -- Bishop 341-B
With enough flags?
mark
no, clearly he meant you need to keep all your _identifiers_ in external files too, by "interface" he means API
We've secretly replaced Slashdot with new Folgers Crystals - let's see if it notices.
It's not free but IntelliJ IDEA can help solve your troubles (if you're using Java). It supports "rename functionality" with intelligent search and replace in Java. It will replace all uses of the code and even rename getters/setters if you're changing an internal variable. So while we can't stop the spelling errors when we make them, we can easily remove them.
There's also a spellchecker plugin for IDEA but it only checks String literals and comments.
-Peter
1. The halting problem is undecidable only on a device with infinite RAM; if you put an upper bound on the RAM, you get a decidable problem (in theory only).
2. There are some practical ways to construct proofs that a loop ends (remember the CS lectures). Sure, it's not a perfect solution, but if you can't construct a proof that the loop ends, you'd better rethink the loop, and possibly rewrite it.
You're a programmer. You want a programming spell checker. YOU'RE a PROGRAMMER.
So write one, lazy bones.
So far the best spell checker for Eclipse I have found is eSpell, http://www.bdaum.de/eclipse/
eSpell can do C++ and Java; I use it for C++. For some reason the main download page above does not list C++, but http://www.bdaum.de/eclipse/eSpell3/index.html does. It does work pretty well, though on large source files I have had it make Eclipse a bit sluggish.
here's a paper i wrote for sigplan notices in 2004 talking about the options, and how it works in practice: http://www.jessies.org/~enh/publications/checking-code-spelling.pdf
plus the editor i wrote
http://software.jessies.org/evergreen/
has this functionality.
'I' is a word, but 'i' isn't. And most spell checkers will catch that error.
Someone (TBL) was maybe enjoying a bit of the refer when codifying the HTTP specification...
Errors in programming (and technical documentation) become standards because the act of programming is creation and invention. Spell it "umount" and forever the act of "unmounting" is done via umount. No one bats an eye. When you define a variable everyone else must follow your lead.
If you say, "Hey, one who refers is not 'REFERER' but 'REFERRER'," your code will not work. HTTP_REFERER will forever be spelled that way until HTTP/1.0 and HTTP/1.1 retire. However, referring to the meaning of that key must be spelled correctly to be proper English. E.g.: "Check the referrer of the requested link with the optional HTTP_REFERER variable."
Don't want spelling mistakes in your Hungarian variable names? Be the first to create them.
-- @rjamestaylor on Ello
You might enjoy FindBugs. The project also offers an Eclipse plugin.
Why bother.
It's impossible for a computer program to be constructed which can do so for all cases (hence, the halting problem), but that doesn't mean that it's impossible to detect some infinite loops, or to detect constructs which are particularly likely to be infinite loops, either of which could, in theory, be useful features in an IDE.
Spelling/grammar checkers for human language aren't flawless, either, but they still have utility. The fact that perfection in a task is impractical or even provably impossible doesn't rule out useful applications.
Shouldn't this be from the HandsOffMyCamelCase department?
To
I need one that says...
"You appear to be creating an infinite loop. Would you like me to change if ($b=$a) to if ($b==$a)?"
(I did that the other day.)
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
Someone should start the open source project. Maybe the checker could compare words within the document to see if a word is similar to other common code. It could follow the standard for code practice. Shot in the dark from a newb.
I was wondering when someone would catch it!
Context is a huge problem when dealing with natural language. The great thing about spellchecking a program, however, is that context is made perfectly clear by the strict syntax. Otherwise, finding a reliable compiler would replace spellchecking as your top concern.
In Visual Studio (C#) you get:
"Assignment in conditional expression is always constant; did you mean to use == instead of = ?"
Kaetemi
If you are too damn lazy or too stupid to type your language properly, then you shouldn't be a programmer. Become an insurance adjuster or something less demanding.
I don't think I'd like to hire someone who can't spell. It speaks volumes about you.
Intelligence starts with a keen understanding and application of your language.
if you simply must have it, editplus has syntax highlighting and offers spellchecking dictionaries.
They're using their grammar skills there.
I would rather have a spell checker that checks contents of strings in my source code, but ignores words that do not have suggestions in the spell checker. The latter would make sure I do not get annoying notifications for non-specified acronyms or words that are not meant to be spelled correctly.
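A rough Python sketch of that "stay quiet when there are no suggestions" rule, in case it helps anyone. This is only an illustration: the function name, the tiny word list, and the 0.8 cutoff are my own assumptions, and difflib's fuzzy matching stands in for whatever suggestion engine a real spell checker would use.

```python
import difflib


def flag_misspellings(words, dictionary, cutoff=0.8):
    """Return {word: suggestions}, but only for unknown words that have suggestions."""
    flagged = {}
    for word in words:
        lower = word.lower()
        if lower in dictionary:
            continue  # spelled correctly
        suggestions = difflib.get_close_matches(lower, dictionary, n=3, cutoff=cutoff)
        if suggestions:
            flagged[word] = suggestions
        # No suggestions: probably an acronym or jargon, so say nothing.
    return flagged
```

With a word list like ["hello", "world"], a string containing "helo" gets flagged with "hello" as the suggestion, while an acronym that resembles nothing in the list passes silently.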
... seems to do this well for a good price. http://www.editpadpro.com/index.html
In PHP at least there is a function, token_get_all (http://www.php.net/manual/en/function.token-get-all.php). This will return an array of tokens contained within a string, which you can then loop through and do magick upon.
Please note: this is a very naive and hacky example.
Like so:
function getFunctionNames($source_file) {
    // Tokenize the whole file with PHP's built-in lexer.
    $source = file_get_contents($source_file);
    $tokens = token_get_all($source);
    $function_started = false;
    foreach ($tokens as $token) {
        // Single-character tokens come back as plain strings; skip them.
        if (!is_array($token)) {
            continue;
        }
        if ($token[0] === T_FUNCTION) {
            // Saw the "function" keyword; the next T_STRING is its name.
            $function_started = true;
        } elseif ($token[0] === T_STRING && $function_started) {
            echo $token[1], "\n";
            $function_started = false;
        }
    }
}
The general idea is fine, and I agree that misspellings are a problem.
...
...
Now I spell words the Australian way (mostly like the Brits): honour, colour, centre, kilometre, etc.
Also, I would naturally call a library of mathematical routines libmaths.so and its header maths.h
I find the term "math" foreign and unsightly, like "creat" instead of "create".
So what we really need is some kind of internationalisation in code.
A German-speaker should be able to read German keywords, a Spanish-speaker Spanish keywords, etc. A good source control package should be able to arrange this.
Maybe one day?
e.g. a German-speaking programmer might see:
DATEI *quelle =
and the English-speaking programmer sees:
FILE *source =
Wouldn't that be nice. Or else we translate all the keywords into Esperanto or Interlingua or suchlike.
I am anarch of all I survey.
I probably took an hour the other day going through and pounding the L everywhere I put fufill where I meant fulfill. *sigh* Same way every blasted time! (I can't seem to hit the Q lately either.)
Back in my day when we chiseled our bits into stone and sent them by mule train from village to village...
I think he's on http_referer :)
In Java, if you type
verible berd;
and then you write
bird = 1;
it will not compile.
A spell checker would be very hard to write, as there are many different ways of spelling variables, e.g.:
VaribleOne,
Varible_One,
Varible_one,
Varible1,
varible1,
varibleone,
varibleOne,
VARIBLE_ONE,
VARIBLEONE,
etc.
Other than dealing with camel case, there's no need to stand on your head in Perl; ispell can already spell-check software by using the "external deformatter" feature. It even comes with sample deformatters that handle C, C++ (in two ways), and sh/bash. In fact, one of the reasons I added the capability for external deformatters was to be able to spell-check program comments. To deal with camel case, one might change Ed's Perl script so that instead of converting "camelCase" to "camel Case" and downcasing the result, you instead converted it to something reversible like "camel___Case". Write that to a temp file, ispell it with the appropriate external deformatter, and convert it back to the original form.
That said, when I tested the C deformatter on my own code (I think it was the ispell source), the results quickly convinced me to give up. Look through your code again, and note all the variable names that contain unusual abbreviations ("ch", "cp", "ptr", etc.). Note the line comments that have been abbreviated to make them fit. Note the application-specific terminology in the comments, and the huge list of odd library function names. The first time through, you're going to get very tired of adding all those things to your personal dictionary, even if you remember to make a dictionary specific to the application.
Yes, there are ways to mitigate the problem, such as providing a predefined dictionary for popular libraries (but have you done ls /usr/lib | wc lately?). But even that's a problem, because a lot of libraries can have function names that will hide legitimate misspellings.
I'm not saying that it's an impossible wish. But it's nontrivial to get it right.
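For anyone who wants to try the reversible camel-case step, here is a minimal sketch in Python rather than Perl. The function names are made up for illustration, and it assumes the "___" marker never occurs in the original source; if it does, the round trip will eat it.

```python
import re

MARK = "___"  # marker suggested above; assumed not to occur in the source

# A boundary sits between a lowercase letter or digit and an uppercase letter.
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def mark_camel_case(text):
    """Insert MARK at camelCase boundaries: 'camelCase' -> 'camel___Case'."""
    return _BOUNDARY.sub(MARK, text)


def unmark_camel_case(text):
    """Undo mark_camel_case by removing every MARK."""
    return text.replace(MARK, "")
```

The marked text is what would go to the temp file and through ispell with the external deformatter; unmark_camel_case restores the original afterwards. Runs of capitals such as HTTPReferer are deliberately left alone here, which is one more corner a real tool would have to decide on.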
Yep. Programmers should know how to spell correctly in their native language. But hey, all through school those technonerds were likely the same ones who never missed a chance to whine about how they hated their English (or whatever) classes and thought that learning grammar and spelling were a waste of time when they could be doing cool geek stuff. The rise of 1337-speak and txtspeak hasn't helped.
At least in the real writing business there are editors trained and paid to catch these errors.
Being unable to spell correctly makes you look really stupid to most people.
Just FYI, if you have a decent programming environment, it should at least flag cases where you've mistyped an existing identifier. If there's an ImmediateFlag in your code, you'd get a warning if you typed ImediateFlag or ImmediateFalg or whatever. Not much help when the programmer is creating new identifiers, of course. Although I've seen cases where the programmer in question for whatever reason decided that because ImediateFlag was undefined then they would just define it, even though ImmediateFlag existed and was what they meant. That ought to get you fired in my book.
Hey by the way, pair programming is a great way to have continuous code reviews and a check on some of the more typical fumble-finger errors.
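To make the mistyped-identifier warning concrete, here is a small Python sketch of how an environment might produce it. It is only a guess at the idea, not how any particular IDE does it: suggest_identifier and the 0.8 cutoff are hypothetical, and difflib's similarity ratio stands in for whatever matching a real tool uses.

```python
import difflib


def suggest_identifier(name, known, cutoff=0.8):
    """Return the closest known identifier to `name`, or None.

    Returns None when `name` is already known or nothing is similar enough.
    """
    if name in known:
        return None
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Given a symbol table containing ImmediateFlag, both ImediateFlag and ImmediateFalg come back with ImmediateFlag as the suggestion, which is the moment the environment should ask "did you mean...?" instead of letting someone define a second flag.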
You're asking for the wrong feature.
All of your interface strings should be externalized as resources.
(Why you are not doing this is the bigger question...)
Once this is done, you have a list of strings without any programming language dependencies.
Pass it through to your favorite spell checker.
Code comments, though, are another issue. However, your customers should not be reading them.
(NT)
Heh yes. "You appear to think this is Pascal/BASIC, would you like me to temporarily erase that part of your memory?"
which is totally what she said
Yes, it works in Vim 7.0. It is the job of the plug-in handling syntax highlighting. The plug-in should switch the spell checker on and off depending on which language construct was detected.
Martin
In Vim the spell checking is handled by the syntax highlighting plug-in, which can switch spelling on and off depending on which highlight is used.
But then: all the programming language plug-ins I have seen so far do what xemacs does: string definitions and comments. But HTML and Wiki plug-ins tend to use more complex rules.
Martin
well personally although I don't do a lot of programming in my role (I work across the street, networking) I've always found a good notepad based editor does the trick - personally I use editplus which highlights all of the code and when I get something wrong, it's usually pretty obvious straight away. Still won't help you with naming your functions wrong (because that's really the crux of this /. article) but hey that's why you have to doublecheck and make sure you've got those right the first time!
Boy, I love these cool eye-dee-ee things. They really make text editors look primitive!
No, but neither will an automatic grammar checker - their great toys to play with if you want a laugh, but I have yet to see one that was actually capable of telling the difference between good and bad grammar.
(For example, Word's grammar checker completely missed the misuse of "their" for "they're" above - as trivial and glaring an error as can imagine. Oops, it didn't notice the missing subject in the previous sentence, either!)
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Your story about that man's death is somewhat funny (poor man!), but please stay on topic!
I am really interested in spell checking my programs.
On topic: Checking comments is nice, but the next step should be checking words in strings (and XML properties for programs supporting i18n).
It's actually impossible for the computer to know whether you're creating an infinite loop.
Oh really? My computer in Aleph-1. Stop being pedantic, dad, and get back to your Delphi coding.
Stick Men
Yup, but I still had to do it for several programs and interfaces built on top of them. I blame it on the UI, but it could be due to inline scripted replacement cowardice as well.
Wouldn't it be simpler to encapsulate the spell checker within the syntax highlighter, or call it from the syntax checker? That way you could define a list of do-not-spell-check words for the language and use the syntax checker to define what needs to be checked for spelling. The syntax checker would do most of the string analysis needed to identify what should be spell checked and what shouldn't, I assume?
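As a rough illustration of letting the lexer decide what gets checked, here is a sketch that uses Python's own tokenize module as a stand-in for the highlighter's tokenizer, so it works on Python source rather than Java or C++. The function name words_to_check and the word-splitting regex are my own assumptions; the keyword list plays the role of the do-not-spell-check words.

```python
import io
import keyword
import re
import tokenize

# Words inside an identifier, comment or string: runs of letters,
# split at camelCase humps and at anything that is not a letter.
_WORD = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])")


def words_to_check(source):
    """Yield (line_number, word) for text a spell checker should look at."""
    checked = (tokenize.NAME, tokenize.COMMENT, tokenize.STRING)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type not in checked:
            continue  # operators, numbers, whitespace tokens
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            continue  # language keywords are on the do-not-check list
        for word in _WORD.findall(tok.string):
            yield tok.start[0], word.lower()
```

Each word that comes out would then be handed to an ordinary spell checker. The sketch is naive about things like string prefixes and f-strings, but it shows that the tokenizer already does most of the sorting.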