Programmer's Language-Aware Spell Checker?
Jerry Asher writes "Not all of my coworkers are careful about spelling errors. Sometimes this causes real embarrassment as spelling errors creep into software interfaces. Does anyone know of spell checkers for programming languages? I don't want a text spell checker, I want a programming-language-aware spell checker. A spell checker that I can pass all of my code through and will flag spelling errors in function names, variable names, and comments, but will ignore language keywords, language constructs and expressions, and various programming styles (camel code, or underscores, or...). I want a spell checker that knows that void *functionSigniture(char *myRoutine) contains one spelling error. Does anyone have such a thing for Java or C++? Are there any Eclipse plugins that do this?"
The version of Eclipse I run, Eclipse WTP 3.3, does spell checking on comments as standard. Not for variable, function names and the like though. It's a decent first attempt though. In truth, I turned it off within the first few hours. It underlines any mistakes in red which I find really annoying when scanning code as I keep thinking I've seen syntax errors. More often than not my eyes are drawn to a spelling mistake, which in many cases isn't even really a mistake, which distracts me from what I'm actually trying to look at.
Visual Assist for Visual Studio does this.
Next silly question, please.
Some people call using it a "code review". If you are really serious about it, post the code to /. - plenty of people here seem to have time to point out any spelling errors.
.... that if you want your code to read like English, you consider a language like COBOL? Not that it would help you with spell checking, per se... but if one is going to be so pedantic about making sure that their procedure names can be found in an actual English dictionary why not go the whole 9 yards and write the whole program that way?
File under 'M' for 'Manic ranting'
And not too hard to implement - all you need is a lexer and a few functions to classify different naming styles. lexertl even comes ready with a full example for C++, so get to it ;)
How about the Built-in OS X spell checker?
We're talking about programming, friend.
Anything that may appear in a user interface should be kept in dedicated files. Use a standard format such as CSV or XML. It can then be reviewed by non-technical people using software with a built-in spell checker, such as Excel. This trick is mainly used for multi-language projects, but it really helps.
I particularly like the spelling feature in new vim, right-click menu (:set mousemodel=popup) to select a corrected word or remember current word as correct. Perhaps writing a vim plugin as you explain could be possible? I'd be very glad to use it too ;)
This isn't quite what you want because you have to select the text to be checked, but it's better than nothing!
Hope this helps
Art Makers Just an excuse to show photos of naked women !!
Vim 7 does this for C/C++ code. Just turn on spelling (something like :set spell) and it picks out spelling mistakes in comments and strings.
A small script to split up camelCase into separate words, then feed the result through a normal spell checker. Then after that just whitelist certain words like maybe "m" as found in "mSomeVariable".
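For what it's worth, the "small script" idea fits in a few lines. This is only a sketch: the DICTIONARY and WHITELIST sets are made-up stand-ins for a real word list, and split_identifier and misspelled are names invented for the example.

```python
import re

# Hypothetical stand-ins: a real run would load a full dictionary file.
DICTIONARY = {"some", "variable", "function", "signature", "my", "routine"}
WHITELIST = {"m"}  # project-specific prefixes such as the "m" in mSomeVariable


def split_identifier(name):
    """Split camelCase and snake_case identifiers into lowercase words."""
    words = []
    for chunk in name.split("_"):
        # Insert a space at each lowercase-to-uppercase boundary, then split.
        words.extend(re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", chunk).split())
    return [w.lower() for w in words]


def misspelled(name):
    """Return the words of an identifier found in neither word list."""
    return [w for w in split_identifier(name)
            if w not in DICTIONARY and w not in WHITELIST]


print(misspelled("mSomeVariable"))      # []
print(misspelled("functionSigniture"))  # ['signiture']
```

Keeping the whitelist separate from the dictionary means the project-specific prefixes can be reviewed on their own.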
We've got code here that refers to 'insurrances', 'insurances', 'insurrences' and 'insurences', I'm not kidding.
People here making fun of his request and saying that this should be set in stone in design documents, or be checked in peer code reviews are obviously not working in a run-of-the-mill software company where there's neither the inclination nor the time to do everything the formal way. Also, I have to see the first design document that correctly enumerates all the requirements for the software, let alone all the names for the variables to be used.
---
"The chances of a demonic possession spreading are remote -- relax."
I am currently working on a java-based universal spell checker (the kind that can do a decent job without involving knowledge of that language). By language, I mean, English, Hindi etc.
I am amused by the idea of being able to extend that to programming languages.
The most significant problem that I am facing has nothing to do with coding the spell-checker. It's about getting a sizable dictionary of words (finding one, converting to UTF-8 etc.).
The trouble is that programming comes with a very different set of words that are not common in a normal dictionary
- acronyms (XML, JRAD etc.)
- weird keywords (foreach, esac etc.)
- regex
- portmanteau words like regex
etc.
I have a feeling that if someone can take the pain and build a dictionary of such things, it can be done. I am not much of a lexicographer. If you can find such a dictionary, get back to me, and I will see if I can get this to work for you.
Why idiots? A great developer may not necessarily have a commanding grasp of English, especially if it's a second language for them. That doesn't make them stupid.
Other than that, I agree with everything else you wrote. Planning things out through proper software development techniques should isolate those issues in advance.
Okay, so it's only for Managed Assemblies (C#, VB.NET, J#, etc), but it does spell-checking, acronym-checking, and case-checking, which is nifty. Along with the other slew of introspection rules (some of which are a PITA to implement, even if it does increase the quality of the finished product).
The $$$ version of Visual Studio (the Team Suite version) comes with an introspection engine for VC++ though, it's not as flexible as FxCop but does the basics.
Then there's the countless "Spellchecker" plugins available for IDEs everywhere, VS, Eclipse, NetBeans, etc...
TextMate on OS X has spell checking functionality that is semi-useful, but it's not really "aggressive" enough, and there doesn't seem to be a way to make it such with prefs/configuration.
You can right-click on any "word" (variable name, subroutine name, whatever, just generally a whitespace-delimited group of characters) and it will check the spelling and present alternatives in the context menu. It also recognizes things like perl's sigils so correcting '$teh' turns into '$the', not 'the'.
It _won't_ automatically check spelling except in strings (so e.g. if I have '$teh = "This is a tset.";', 'tset' will be underlined, '$teh' won't). It doesn't include comments in its automatic checking either, which is probably the most annoying part about it.
Overall I typically just don't bother with it, but someone _has_ thought along these lines, at least.
For the record, 'I' is a word. Also plenty of spellcheckers will ignore one or two letter words.
The idea isn't anywhere near as nuts as you think it is, provided you make a habit of using meaningful variable/class names.
++ Say to Elrond "Hello.".
Elrond says "No.". Elrond gives you some lunch.
It's not what he means. He talks about problems like getBussinessObjekt() instead of getBusinessObject(). Apart from that, you're 100% right and it should even be that way if you don't want to internationalize it.
The OP's question was about spell-checking the source code itself (for consistency in variable names), not any UI messages that might or might not be contained in it.
It's not so simple when you're not the one writing the code, but have to deal with the results. There's an SDK that I use as a part of my job, developed by our head office in Japan - it's a set of C# classes, and nothing annoys me more than typing "Connection foo = new Connection();", then noticing Visual Studio isn't highlighting it as I'd expect. Hunting around for anywhere up to a minute and eventually finding out it is actually "Conectin" instead of "Connection". If there were a good "programmers spellchecker", I may not need to use it myself, but I could give it to my Japanese colleagues to make MY life easier! (note: the above example is fictitious, but is an illustration of the type of error that I deal with that this would prevent)
My book about LSD and Self-Discovery
Also on facebook as: DroppingAcidDaleBewan
The question wasn't about user interface strings. It was about spelling in APIs. e.g. One issue at my last company, which was British, is that they standardized on US spelling, but some British spellings crept in too. So sometimes you'd get a function containing "Initialize" and sometimes "Initialise".
Why don't more software developers take a leaf out of Knuth, who "only proved it correct, not tried it" [grammar]?
Yeah, that's great and all, but this is something you can only do for small algorithms. Proving correctness of large systems takes huge amounts of time and resources, so for most developers this is just not feasible. Also a while ago, someone notable in the Java world (Gafter? Bloch? Can't seem to find the link) blogged about how he had discovered that one of the core textbook examples taught the last 30 years in the courses about proving code was in fact incorrect... The assumptions no longer held for 64 bit systems. If the experts can't get 5 lines of code right, what use is it in practice?
He made the effort to develop his mind such that he could be confident in his own ability to combine memorised results with logical deductions - the key skills of programming, spelling and pretty much any other ability you'll ever attempt to acquire.
Oh, so the easy solution is to become a genius, then all problems will be trivial. That's very smart of you AC, wonder why no one thought of that before?
Being bitter is drinking poison and hoping someone else will die
Only three things are certain; death, taxes, and apocryphal quotations - Ben Franklin.
More like WTF are you on man?? If a compiler is able to work out what a variable is, what piece of code does what, which bits of text are going to be displayed, then another spell checking program can be written to recognise this too!! It would be tricky, and there are many circumstances where it could be circumvented but why not still use it to prevent a possible spelling error, and the circumstances where it cannot tell what the word is, so what. Those circumstances you learn to spell but there's nothing wrong with another program to help prevent it!
This is a good idea, and one that can be implemented. Just because it's hard to do it right, and would need to be done separately for different languages, doesn't change the fact it would still be useful and help prevent errors.
Who need's speling and grammar?
I'm guessing that your "far to common" is a subtle easter egg?
[x] auto-moderate all posts by this user as insightful
By saying this am I right in also assuming that you never make mistakes with your coding either? Or are you trying to say that mastery of the English language is a much simpler skill than that of programming? Either way, I'm still a bad speller but I know I'd much prefer to brush up on good coding practice than start training for the next spelling bee... but that's just me.
Cos I don't know what a Space compnay is.
"XML is like violence. If it doesn't solve your problem, use more." - Anonymous Coward
What you're wanting is something that is very difficult for a computer but very easy for a human being. You want to be able to discern which parts of your text file constitute machine-readable instructions (which have to be spelt the way the machine expects them when it's running the compiler/interpreter) and which parts constitute human-readable messages (which have to be spelt the way a human would expect).
Bollocks, what you describe is absolutely possible, otherwise syntax highlighting would not be possible. In Java, everything between /* and */ and everything starting with // are comments and thus should be spellchecked. Everything between two ", should also be spellchecked since those are literal strings.
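A rough sketch of that extraction, in case anyone wants to try it. It assumes Java-style comment and string syntax, uses one regular expression instead of a real lexer, and will be fooled by edge cases a proper parser would handle. The names TOKEN and spellcheckable_text are invented for the example.

```python
import re

COMMENT = r"//[^\n]*|/\*.*?\*/"       # line comment or block comment
STRING = r'"(?:\\.|[^"\\\n])*"'       # double-quoted string literal
CHAR = r"'(?:\\.|[^'\\\n])+'"         # char literal, matched only to skip it

# The alternatives are tried at each position from left to right, so a "//"
# inside a string literal is consumed as part of the string, not as a comment.
TOKEN = re.compile(
    f"(?P<comment>{COMMENT})|(?P<string>{STRING})|(?P<char>{CHAR})",
    re.DOTALL,
)


def spellcheckable_text(source):
    """Yield (kind, text) for every comment and string literal in source."""
    for match in TOKEN.finditer(source):
        if match.lastgroup == "comment":
            yield "comment", match.group().strip("/* \n")
        elif match.lastgroup == "string":
            yield "string", match.group()[1:-1]


sample = "\n".join([
    '/* Conection helper */',
    'String url = "http://example.com"; // not a coment start',
    "char quote = '\"';",
])

for kind, text in spellcheckable_text(sample):
    print(kind, "->", text)
```

The text that comes out can then go to any ordinary spell checker. Identifiers need separate handling.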
Of course, you're right that literal strings have no place in source code, but that was not the question.
He doesn't only want to spellcheck software interface messages etc, he also wants to spellcheck function names containing natural language.
I simply use my gcc's spellchecker:
gcc foo.c -spell=en
la.c: In function 'main':
la.c:15: warning: ambigious string value 'hellp'. Did you mean: 'help' ?
la.c:17: warning: ambigious variable name 'i'. Did you mean 'index' ?
Also a while ago, someone notable in the Java world (Gafter? Bloch? Can't seem to find the link) blogged about how he had discovered that one of the core textbook examples taught the last 30 years in the courses about proving code was in fact incorrect...
Found it... Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken. I like his final paragraph:
"We programmers need all the help we can get, and we should never assume otherwise. Careful design is great. Testing is great. Formal methods are great. Code reviews are great. Static analysis is great. But none of these things alone are sufficient to eliminate bugs: They will always be with us. A bug can exist for half a century despite our best efforts to exterminate it. We must program carefully, defensively, and remain ever vigilant."
I think you meant, "review their code"...
(Although clearly no automated spell checker would have caught that.)
I remember from spellchecking some html documents a while back ago that aspell is at least aware of html. I do not know how well it works with other kinds of documents.
I almost wrote exactly the same thing as you. Luckily, I decided to RTFA first and avoid making a total berk of myself.
It's true I tell you, feller at work's next door neighbour read it in the paper.
The idea is nice and I think the problem is really prevalent. I have seen large portions of source code, much of it commercial, containing not one or two but hundreds of spelling mistakes. I also believe the problem must be more prevalent in closed source and in small businesses than open source and Free software. Another thing is that developers from countries with non-English languages often mix English with their first language in code, making it hard to maintain by other nationalities.
I expect my artists to have spelling mistakes. I expect my coders to know the semantics of a language that they are using. If they don't I question their ability on the semantics of the language the project is coded in.
The funny thing is, in my experience the most spelling errors (compared to e.g. poor grammar in comments) in code are made by those coders who speak English as their native language.
I'm sure I've got a plugin at work that does this. It covers my code in yellow squiggly lines.
Wow, I should not post when knackered.
Well, I'm a total newbie in terms of compiler architectures and such, but throwing it out there for the purpose of discussion...
I assume a compiler will parse the source and in the process identify which tokens are key words and literals, and which are programmer-defined identifiers in the code. The spell checker would either use the same algorithm, or latch into that part of the algorithm to get at all of the identifiers. There are two possible word separators in typical code--either capital letters or underscores. (If you have something more bizarre, then I think it's a lost cause). So pass those identifiers through a filter that chops them up at each capital letter or underscore (with some exceptions, say, if the identifier is all caps). So, now you've got a pile of strings which are either oddball programming convention stuff, like "p" and "g" for pointers and globals, and things that should generally be words. The rules can include "toss out single character identifiers", "toss out everything up to first capital or underscore", etc. If you have coding guidelines that enforce variable naming conventions, this should get you most of the way.
Now you have English words that you can pass through your standard spelling engine, possibly with a dictionary tweaked for your field of endeavor to decrease false positives and escapes.
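To make the "latch into the compiler's tokenizer" idea concrete, here is a sketch that leans on Python's own standard tokenize and keyword modules. It only handles Python source, so it is an analogy for what a Java or C++ front end would provide, and identifier_words is a name invented for the example.

```python
import io
import keyword
import re
import tokenize


def identifier_words(source):
    """Return the set of lowercase words used in identifiers in Python source."""
    words = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # NAME tokens cover both keywords and identifiers; drop the keywords.
        if tok.type != tokenize.NAME or keyword.iskeyword(tok.string):
            continue
        if tok.string.isupper():
            parts = tok.string.split("_")  # ALL_CAPS: split on underscores only
        else:
            # Chop at each underscore and before each capital letter.
            parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+", tok.string)
        # "Toss out single character identifiers" such as p or g prefixes.
        words.update(p.lower() for p in parts if len(p) > 1)
    return words


source = (
    "def getBussinessObjekt(pUser):\n"
    "    for i in range(MAX_RETRIES):\n"
    "        return pUser\n"
)
print(sorted(identifier_words(source)))
```

Built-in names such as range come through as ordinary identifiers, so the dictionary (or a private word list) has to know them too.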
-- "This world is a comedy to those who think, a tragedy to those who feel."
The spelling error is that "functionSigniture" is not a word? Or are you suggesting that it should recognize mixed case and check each word individually? I just looked at some random code of mine, and it contains so many things that are perfectly legal that aren't English words, aside from language keywords.
For example, I have a function that provides a pretty-printed time called 'Gmtime'. The function that produces a GUID in BER form is called 'BERGUID'. In lots of places, "clear" is abbreviated as "clr". A few "Init*" functions are matched by corresponding "Deinit*" functions. An "operator new" assistant is called "OpNew".
C++ requires that a parameter passed to a function have a different name from the same class member. So it's not unusual to have two variables that do the same thing logically but must have different names. Deleting a letter from one variable name is not uncommon, like "first" and "frst".
A static version of a class member function that's used to hook into an API that doesn't support C++ natively often has an 'S' before its name, so you may have 'ShutDown' and 'SShutDown'.
I believe that most other languages have similar issues.
You'd either have to have one crazy set of coding standards or you'd have to expect your programmers to pick the legitimate errors out of pages and pages of nonsense.
Well, all that he found was that it is easy to overflow a 32 bit signed int value, which is nothing new. The concept of an array of 2^30+ bytes is what broke this code - recompile the code on a 64 bit platform and it just works, bug free (up until 2^63 arrays, but still..). The fact that 'nobody will ever need more than 2^30 (over 1 billion) array entries' isn't a bug in the software per se, just a bug in the specification.
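For anyone who hasn't seen the overflow in question, here is a small illustration. Python integers never overflow, so the to_int32 helper (invented for this example) imitates 32-bit two's-complement wraparound; the index values are arbitrary ones chosen to be large enough to trigger it.

```python
def to_int32(value):
    """Wrap an integer the way a 32-bit two's-complement int would."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


# Both indexes fit comfortably in a signed 32-bit int...
low, high = 1_500_000_000, 2_000_000_000

# ...but their sum does not, so the textbook midpoint goes negative.
naive_mid = to_int32(low + high) // 2

# Adding half the distance to low never leaves the 32-bit range.
safe_mid = low + (high - low) // 2

print(naive_mid)  # -397483648
print(safe_mid)   # 1750000000
```

The second form is the kind of fix the linked article recommends. It needs no wider integer type, only a different order of operations.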
Wow, I'm glad there's such agreement between those who read the article and left comments, and those who just tagged it. :)
I tagged this article badlytagged
- string literals (not what the poster wanted, but this is what needs spelchekars the most)
- identifiers
The former can be done by a simple regexp, the latter... you can do a LALR parser, but why even bother? Just look for _any_ potential identifier; in most languages, that's [a-zA-Z_][a-zA-Z_0-9]+; and simply add the few keywords which are not English words to your dictionary. In fact, this would be nearly programming language agnostic. When it comes to StudlyCaps, anything identified as an identifier can be split _before_ any uppercase letter. This would produce a lot of single-letter tokens for ALL-CAPS #defines and the like, but as a nearby post said, you're going to ignore one-two letter tokens anyway. The usual conventions say XMLHttpRequest or XML_http_request so I wouldn't bother with XMLhttpRequest (and thus "lhttp").
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
Yes, this is a legitimate problem. I work on code that has spelling mistakes embedded into interfaces and it's very annoying. The fashionable use of StudlyCaps in programming (why? who decided that TextLikeThis is more readable than text_like_this?) makes the job a little harder but not impossible, as long as you follow the sane rule of making each word start with capital and continue lowercase, even if an acronym (so XmlParser not XMLParser or, God forbid, XMLparser - though of course XML_parser would be better than any of those).
Enough rant. How about this:
perl -ne 's/([a-z])([A-Z])/$1 $2/g; tr/A-Za-z/ /c; foreach (split) { print qq{$_\n} unless $seen{lc $_}++ }' source_file...
That will give a list of unique words in your source code (use find and xargs to scan the whole source tree). Then you can run that list of words through an ordinary spellchecker such as ispell. Unfortunately when you find a mistake you have to go back and grep for it to find where it occurs. You would also need a personal dictionary for things that are not English words but nonetheless appear in code.
I would probably keep the private word list containing things like 'foreach' and 'const' with the program source code, and have a makefile target 'make spellcheck' that runs a command like the above and then prints out all words found that are not in /usr/share/dict/words or in the private word list. Indeed, why not this:
find . -type f -name '*.c' | xargs perl -ne 's/([a-z])([A-Z])/$1 $2/g; tr/A-Za-z/ /c; foreach (split) { print qq{$_\n} unless $seen{lc $_}++ }' | sort -u >found_words
sort -u private_word_list /usr/share/dict/words >allowed_words
diff -u allowed_words found_words | grep -E '^[+][^+]'
The private word list can be kept under version control and checked in whenever you add a new non-English word like 'Frobule' to your source code.
Adding filenames and line numbers to the output is left as an exercise for the reader. You might also want to change the perl command to ignore words with length < 5.
-- Ed Avis ed@membled.com
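Taking up the exercise for the reader: the sketch below does the same word extraction as the perl command above but keeps track of where each unknown word was first seen, and skips words shorter than five letters. It is an assumption-laden illustration, not a drop-in replacement. The script name, load_words and unknown_words are all invented, and the word list is assumed to hold one word per line.

```python
import re
import sys

WORD = re.compile(r"[A-Za-z]+")
BOUNDARY = re.compile(r"([a-z])([A-Z])")


def load_words(paths):
    """Read one word per line from each file into a lowercase set."""
    allowed = set()
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            allowed.update(line.strip().lower() for line in handle)
    return allowed


def unknown_words(lines, allowed, min_length=5):
    """Yield (line_number, word) for each unrecognised word, first sighting only."""
    seen = set()
    for number, line in enumerate(lines, start=1):
        # Same trick as the perl version: break camelCase at case boundaries.
        spaced = BOUNDARY.sub(r"\1 \2", line)
        for word in WORD.findall(spaced):
            key = word.lower()
            if len(key) >= min_length and key not in allowed and key not in seen:
                seen.add(key)
                yield number, word


if __name__ == "__main__":
    # Usage (hypothetical script name): spellcheck.py WORDLIST SOURCE...
    allowed = load_words(sys.argv[1:2])
    for source in sys.argv[2:]:
        with open(source, encoding="utf-8", errors="replace") as handle:
            for number, word in unknown_words(handle, allowed):
                print(f"{source}:{number}: {word}")
```

The file:line: word output format is the one most editors can jump to directly, which removes the grep step.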
His next project is to have a handy little helper with a RAM chip avatar. His name is chippy and he comes out with helpful phrases like:
"You appear to be creating an infinite loop. Would you like me to increment your counter variable?"
"You appear to be writing a virus, would you like a list of the latest Windows Vista sploits?"
which is totally what she said
The parent of your post understood that perfectly, unlike the vast majority of comments to this story. I think your remark should have been a reply to one of those, and not to this... :-)
What the... I'm an idiot, I could have sworn your comment was attached to #20461667
*searches for the delete button*
"Ya" is clearly intentional and comes from a dialect, so that's ok.I'm not one either, so this means I get to shout at you, right?
For .net languages, FxCop does some of this checking, even understanding camel casing and underscores in tokens. And a bunch more, since it is a static code analysis tool.
http://www.gotdotnet.com/Team/FxCop/
- string literals (not what the poster wanted, but this is what needs spelchekars the most)
- identifiers
Ook... and- comments
but I hardly use these anyway
Doesn't Visual Assist from Whole Tomato do this? I've used it in the past and I'm sure spelling mistakes (and a whole host of other things) were pointed out.
:-)
I'm not associated with Whole Tomato, but if anyone from WT sees this, can I have a free subscription
No sharp objects, I'm a programmer!
Nothing personal, but it's not actually a programmer's job to make sure everything is speelled correktly. This is part of the QA process before a product rolls out the door. Sure, you should do your best, but you need another pair of eyes (or several pairs of eyes) looking at the UI in addition to your own. You can easily miss the forest for the trees.
$nice = $webHosting + $domainNames + $sslCerts
I'm sure you aren't using .NET, but if you were, FxCop will check for spelling mistakes in code and comments and strings, along with 1M other coding issues (like malformed variable names, parameters).
I had your problem once because I was working with people whose first language was not English. I don't write US English either and I always left English spellings in by mistake.
I used aspell and went through huge parts of the source, telling it what wasn't misspelled. It was an incredible pain in the neck because it got confused over all the variable names, bits of C syntax etc etc.
Once I had a dictionary, though, I could recheck the source periodically and although there were a lot of false warnings, we still caught a lot of problems that would have gone into the production release.
As you can work out, I didn't restrict the test to strings - this is because misspelled variable names can cause bugs too so I checked for them as well.
Cheers,
Tim
This is all just my personal opinion.
I made one and use it for my open source Java database. It is very simple so far, based on a word list. Supports camel case and so on. It is here: H2 Database Engine, src/tools/org/h2/tools/doc/SpellChecker.java. Or here: SpellChecker.java. It can also check XML, HTML, JSP,... Words shorter than 2 or less characters are ignored. If you want to spin off your own project go ahead, I can help you.
I have included it in the build script: Whenever you write more than a few lines of new code (or documentation) the spell checker will bark because it doesn't know the word. Maybe I should add an automated 'word list expander' that checks unknown words on the internet... Anyway, the hard part will be to convince your coworkers to use it.
No, you do. Fowler's asserts "no agreement" and advises your approach in cases of ambiguity. To be specific: the comma would have been obligatory had there been a possibility of reading "spelling and pretty much any other ability..." as two items that make up "programming". This misinterpretation is impossible by virtue of the implied "etc." in the final item.
It is quite possible that your remark applies to the simplified version of English spoken across the pond
And they might carry on being misunderstood when the spellchecker ignores context or is not available. Every man has his own set of priorities, but only the most arrogant offloads the responsibility of making his language easily decipherable.
Don't ever put interface text messages in your code! Use a separate messages file. Not only does it help a lot with internationalization, it also makes it really easy to spot and correct spelling errors. Also useful for logging.
-- EOF
I had a problem similar to this when I had to work with an offshore development team a few years ago. The solution we used (as did other people responding to this post) was to ensure that all the UI visible strings were moved to a resource file and we spell checked that instead of the source code. It allowed us to port to other languages quickly and cheaply too.
"Similarly, there should probably be a set of words added that aren't "English" but are used often enough to be worth adding to the dictionary. Things like Obj, Int, and Ptr."
Or they are "English", such as a function that flags "setColour" as incorrect because it is a US English dictionary and British spelling.
This is a non-trivial problem to do right. The spell checker has to be not only familiar with CamelCase, word fragments that might be added (like the Obj, Int, Ptr or various prefixes), and the programming language syntax, but it would also need to be familiar with the native spoken language.
One strategy might be to strip out all the programming syntax fluff (something like ctags) and then run a spell checker on that with a custom dictionary and a script to split up such things as CamelCase. You'd have to do the same for comments (which ctags normally ignores).
In any case, with ctags, something like aspell, and a bit of custom scripting and dictionary fiddling, it looks tricky but doable as a batch process. Doing it interactively in the editor would be slightly trickier, but if your editor can invoke programs, not hard.
Does that mean that we need an automated grammar checker for our code too?
Cheers, Chris
"nearly all" is an overstatement.
The bug is a consequence of integer overflow. Which is a perfectly common type of bug in languages where "int" and the like overflow. But really, in most cases it's insane to expose messy internal details of the representation and computer to the programmer.
Yes, sometimes you need to. But most of the time you should be using a language and/or library that encapsulates details like that so that you can say "some integer" and have the language and/or library take care of whatever mess needs to be dealt with to present the abstraction: an integer.
In C, when you say "int", the thing you get only vaguely behaves like the stuff called "integer" in maths. Integers don't "overflow", and you sure as hell don't get a negative result if you add two positive parts.
So, nearly all binary searches and mergesorts that are written in a language that exposes the guts of the machine, without properly abstracting even simple stuff like integers, are broken. I bet most mergesorts and binary searches written in any saner language, or using any sane library, are free of this bug. And the next half-dozen bugs to be discovered in the C one.
Meanwhile, mergesorts and binary searches written in Python, Ruby, Perl, or any other higher-level language are immune to this, and many other classes of bugs that have literally been haunting the low-level messies for half a century.
Well, yeah, unless you misspell it the same way every time.
Finding other idiots on
What? Since when in what language will a function fail to compile if you have spelled it incorrectly? As a matter of fact, it will compile and most editors will pick up the spelling error as a legit function to be called from elsewhere in the code and will add it to list of autocompletable functions which will spread the error!
www.aleo.no
For the record, 'I' is a word.
Not exactly. I is a word. Lower case I should get a red line still.
Ah. Well, compilers generally tend to report errors like misnamed functions and variables when you try to compile the program.
.....
Anyway, it's hardly worth getting your knickers in a twist over. Either the code is going to be open, and everyone will get the ability to change the variable and function names to whatever they want if they so desire; or it'll be caged up, and nobody in the real world will ever so much as see them. If I had enough mind left to pay to petty stuff like function names, I'd be wondering what big stuff I'd missed
Je fume. Tu fumes. Nous fûmes!
It's not like comments matter much. If someone makes a few errors in a comment, it can still be read, if someone makes an error in the code, well...they cause an error.
I just read Slashdot for the articles.
When you work for a business equipment (read: "Photocopier") manufacturer, having your head office in Japan tends to be pretty normal!
>> How about the Built-in OS X spell checker?
We're talking about programming, friend.
Actually, it works in XCode like everywhere else. Click at the beginning of a comment, press cmd-; and it starts checking the spelling. Quite useful when you just typed in a lengthy comment, and found a few mistakes. And not completely useless when you just added things to a header file.
True, identifier names containing spelling errors can be a real annoyance, but I somehow doubt you'll ever find a usable solution, at least not as long as you'll need to interface to code beyond your control. What spell checker wouldn't choke on regular C++? Just picking a random declaration from MSDN (feel free to choose any other API, it won't change anything):
HRESULT MFGetService(
IUnknown* punkObject,
REFGUID guidService,
REFIID riid,
LPVOID* ppvObject
);
You'll probably just end up spending all your day removing false positives.
Is it ironic that you don't know the meaning of the word "semantics"? Why yes. Yes, I do believe it is.
The newest version of MyEclipse 6.0 has a spell checker. Now, if that's what you want, you can have it pretty cheaply. Personally, I think it's a question of education in general. If your co-workers are so poorly educated that they can't spell, then a spell checker is only going to solve the surface problem. Usually, bad spelling goes along with bad grammar and bad writing--which equates to bad thinking and logic. There are no absolute requirements for a bad speller to be an idiot, but I regret to say the two are correlated. I work in a large corporation and it is daily that I get some illiterate email from a co-worker that informs me that my collegue is really not that bright. So, with all respect, your peer's problem is larger than a spell checker could solve.
It's not a major problem, but it's annoying to work with code containing misspellings. I've seen things like appendToQue or serie (used as the singular of series) in code I've worked in; it adds to the number of things you have to remember.
Autocomplete helps, but not every platform has it and it frequently breaks on VC++ IME.
If you're an Emacs user, just turn on the flyspell minor mode.
I write code.
I must violently disagree with the Offtopic moderation to the above post.
Spelling is a very common problem in the so-called "developed countries" (hahahaaa.. *hrm*) today, to an extent that I, a non native English speaker, can even spot spelling or even grammar errors in comments on Slashdot. You should be ashamed, period.
Really, "learn to spell" is not offtopic, quite the opposite, it's dead on.
Some time ago, I was an operator on a French IRC channel and we decided to apply a "mode Pivot" (from Bernard Pivot, a renowned French TV presenter of a cultural program named "Apostrophes"). We were five ops and had a hard time keeping up with kicking the people making spelling/grammar errors. I lurk on IRC from time to time now, and it's even worse: people there just CAN NOT SPELL PROPERLY. Heck, IRC is not limited to 160 chars per message. "Kestu fé" is NO equivalent to "Que fais-tu", neither is "wt r u doin" to "what are you doing".
While typos can be "pardoned", plain misspelling from the start is a simple lack of proper education.
It's actually impossible for the computer to know whether you're creating an infinite loop.
1. post to /.
2. use title "Sony rootkit source code 1/200"
3. read grammar nazi comments
4. profit!
Yeah, not to nitpick but, you see; 'i', being a variable-name, would be a properly camel-cased 'I' from the point of view of the spellchecker.
Religion is what happens when nature strikes and groupthink goes wrong.
Man Dies Waiting for Eclipse to Launch
A software engineer in San Jose, CA was found dead at his desk yesterday, apparently having died while waiting for his Java editing program, Eclipse, to finish its boot process. Coworkers say the engineer came in that morning vowing to "get Eclipse working on his box or die trying." The last thing anyone heard him say aloud was the cryptic comment: "I see the splash screen is appropriately blue." Nobody knows what he meant. The man was then thought to have fallen asleep, but hours later it was discovered that the engineer had died suddenly of apparent natural causes. The forensics team's investigation that evening was reportedly interrupted unexpectedly when the dead man's Eclipse program suddenly finished launching. The team tried to interact with it to see if they could find clues about the man's death, but the program was unresponsive and the machine ultimately had to be rebooted. At this time, the police commissioner says there is no evidence of foul play, and they currently believe the man simply died of either boredom or frustration.
Ben Hocking
Need a professional organizer?
I don't understand the '0' score... using gettext() is a good idea (unless you come up with a better mechanism for internationalization).
There are some simple things that it could do as 'warnings' though: checking if the test variable is being referenced in the loop, or, if it's a global variable, checking if it's modified in any functions being called, etc. You could have a poorly constructed loop that will only repeat infinitely in weird conditions, but the computer won't know that that isn't intentional, of course. And in certain programs you want 'infinite' loops anyway, or loops that will run until you kill the app.
which is totally what she said
I can vouch for using the built-in spellchecker everyday while coding in TextMate.
Pluralitas non est ponenda sine neccesitate
Sure, it's the halting problem. We all know that. But there are several common cases where you can deduce that there is an infinite loop in the code. It won't catch all infinite loops, but that doesn't make it useless.
(Sun's Java seems to be good at detecting some of those by default, when it complains about an unreachable return statement.)
Note, however, that no spellchecker will catch homonyms or words that can be written in two words or one, such as "spell checker"/"spellchecker."
Spell checkers are fine but they make mistakes as well. The best thing I have found, and this goes for any project, software or printed word, is to have someone who is not connected to the project or better yet not even connected with the subject proofread what the public sees. They will often catch mistakes that jump off the page but people working on the project just don't notice. I have made some really stupid mistakes that I never saw but were on the cover of a book I was publishing. I am SO glad it was proofread before it went to press.
Attempting to tell programmers the correct grammar or spelling does not always go well. While most will thank you for your input on catching their mistakes, others take it like you stepped on their baby's head.
(I keeed, I keeed)
Just then the floating disembodied head of Colonel Sanders started yelling Everything You Know Is Wrong!-Weird Al
Use of last comma is not universal, ubiquitous or absolutely necessary.
http://en.wikipedia.org/wiki/Serial_comma
You can find lists of words in various languages here:
ftp://ftp.ox.ac.uk/pub/wordlists/
I don't know anything about the quality or copyright status of this.
KDevelop uses KWrite as an internal editor, which offers autocompletion of words, so if you have a function, MyFunc, defined, it will autocomplete it after you type Myf. This cuts down considerably on fat-fingering function/object/variable names.
Ah, but will the compiler fix the grammar in your comments?
Cheers, Chris
"Any douche who doesn't realise a misspelt function name will fail to compile clearly hasn't written any code yet."
;)
You clearly fail to see that a programmer can also create their own function names, as well as use other people's functions. So you prove you are a very inexperienced programmer (and closed-minded), which adds weight to the idea that you are either young or just arrogant. Also, your very apparent need to show hostility shows a degree of insecurity, where you are overcompensating by verbally hitting out at others in an attempt to appear more knowledgeable than you really are.
The easiest way to become a better programmer, is to be more open minded. So far you have failed to demonstrate this.
As a side note (back in the DOS days of programming), I found the spell checker in Multiedit very useful, especially when having to work very late at night, after the coffee stopped working!
There are 10 kinds of people in the world... those who understand binary and those who don't.
It's actually simple: code a lexer and then feed all the variable names to a dictionary. You can also use standard lexers like Flex.
how about hacking the linker map file to generate a list of function/variable names? ie, "ld -M". then run the resulting word list through a standard spell checker. the thing is, all you really need is a way to generate a list of names...
In 2007, providing a development tool that does not auto-correct and point out misspellings, syntax errors, etc., is like providing a car without a windshield because the first cars didn't use them, and technically the driver doesn't need it. For how many years (what is it now, well over a decade?) has Visual Studio had "Intellisense", which does exactly what the poster describes? I just don't understand this anti-MS holier-than-thou attitude when non-MS developers ask questions like these.
EditPadPro (www.jgsoft.com) is not free, in either sense, but it's very cheap for what it does. I have turned most of the development teams at my last three jobs onto it. One of its key features is configurable, user-extensible syntax highlighting. The highlighter includes the option to exclude matched language tokens from spellchecking. In the built-in highlighting schemes, for example, it will usually spellcheck inside comments but not much else, but as mentioned, you can easily take their color scheme and change it to suit.
--K
How wonderfully ironic :)
Still, the point stands; if your developers can't form a coherent sentence using well-spelled function names I'd fear for their code in the first place. It only takes a couple of typos to make code readability drop through the floor. You don't want automated tools; you want to hire developers who can write.
But, the quality of education seems to have declined in recent years. I remember writing stuff for English class at school and you'd get your work scribbled in red ink for making spelling mistakes, all the time. I've looked at my brother's marked English homework (he's 15) and even the glaring mistakes are missed. Having to type everything rather than hand write it seems to be the source.
People need to be able to write, and not just trust a spell checker. But then again, this ALSO falls down when you don't have native English speakers on staff.
I've got a couple of projects on the radar right now where tiny spelling mistakes are in production code - API definitions, symbols that are exported - that just appear in every version. If someone had been reviewing and had an eye for it they would be fixed. What doesn't help is none of the guys on the projects are English or American besides me..
If you mean you've made the same typo everywhere, either it wasn't noticeable enough to matter or you can just do a global find/replace on it.
They're not so much idiots, but I would find it INCREDIBLY difficult to name a function "functionSigniture". It's just WRONG.
There is absolutely no reason for spelling it that way, even if you're Hooked On Phonics, it isn't pronounced that way in any language approaching English.
If this is an exported API function then it would cause a huge problem. Now, consider that a guy who is writing a function which calculates some kind of signature (perhaps a hash or a certification routine) cannot even spell the word which describes what he is doing. Does that give you confidence that the signature function is correct?
Technically it's a lint.
-- Patent no.123456: A way to personalize
Spell checking variable names isn't exactly what IntelliSense does, and Eclipse is actually better than vanilla VS.Net at producing red squigglies under your code (I'm told that VS does it for Visual Basic, but for C# you need something like ReSharper).
I don't see this as a "MS bad" kind of thing, rather just a really low-impact "problem" that would be less than trivial to fix.
Why not simply create your own custom.dic file and use it with a text editor. Although my work is not with programming languages, I've found that having a 'legal.dic' containing legal terminology and 'audit.dic' containing auditor terminology has been invaluable with my work. I can't imagine not being able to do the same for program code, since much of the codes are predefined in a lexicon and easily transferred to a dictionary file.
Yeah, but the article poster stated that he wants a spell checker that will note that there's an error in "void functionSigniture(some)", which does kind of make sense in a shared project where others will have to work with an existing error and fixing it can be tedious. My guess is that by using aspell and just dividing function names on capital letters and _ it should be little problem to implement. It will, however, demand some structure to your function names, so no calculateWhereYuoarerightNow(), but that should usually be only mildly inconvenient. The compiler, however, is great at interpreting a programming language, but doesn't give a rat's arse if the function names are readable and correctly spelled....
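A minimal sketch of that splitting step in Python, for anyone who wants to try it. The tiny KNOWN_WORDS set is only a stand-in for a real dictionary (aspell or otherwise), and the function names are made up for this example:

```python
import re

# Stand-in word list (an assumption for this sketch); a real checker would
# load a full dictionary, for instance one exported from aspell.
KNOWN_WORDS = {
    "void", "function", "signature", "char", "my", "routine",
    "calculate", "where", "you", "are", "right", "now",
}

def split_identifier(name):
    """Split an identifier on underscores and camelCase boundaries."""
    words = []
    for chunk in name.split("_"):
        # Acronym runs (HTTP), capitalised words (Server), lowercase runs, digits.
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk)
    return [w.lower() for w in words]

def misspelled_parts(name, known=KNOWN_WORDS):
    """Return the alphabetic parts of an identifier that are not in the word list."""
    return [w for w in split_identifier(name) if w.isalpha() and w not in known]
```

With this, misspelled_parts("functionSigniture") flags only "signiture", while an unstructured name like calculateWhereYuoarerightNow yields one big unknown chunk, which is the structure requirement mentioned above.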
www.aleo.no
TextMate can check any scope, this is true (including comments, which I've turned on and it's saved me some pain from my team)... but the spell checker doesn't handle CamelCase. Oddly enough it does handle underline_vars.
RTFA. The author isn't interested in getting various strings correct, he's interested in getting class names, variable names and function names correct.
I'm not sure spell-checking can really be made to work because, by definition, spell-checkers flag anything that is not in the allowed list (also called the dictionary) as an error. But source code always contains tons of identifiers that are not real words, like pid, ret, req, riid, etc. The problem is that there are hundreds if not thousands of them in a large project, and that you get a ton of new ones, making the maintenance of a custom dictionary a pain.
But I've been annoyed by spelling errors too and what I noticed is that the same errors come over and over again. So what I did is write a script that specifically checks for common typos. And I've very imaginatively called it 'typos'.
What's great with this approach is that, no matter whether you're writing a C, Perl, PHP or HTML file, 'seperate' is never going to be a real word. So we can identify these with no cumbersome custom dictionary, and a very very low false positive rate.
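Roughly, the idea looks like this in Python. This is only an illustration, not the actual 'typos' script (which is Perl); the word list and the function name are invented for the example:

```python
import re

# Hypothetical blacklist of known misspellings and their corrections.
COMMON_TYPOS = {
    "seperate": "separate",
    "recieve": "receive",
    "occurence": "occurrence",
    "definate": "definite",
}

# One case-insensitive pattern; no word boundaries, so typos buried inside
# identifiers such as recieveData are found too.
_PATTERN = re.compile("|".join(COMMON_TYPOS), re.IGNORECASE)

def find_typos(text):
    """Return (line_number, typo_as_written, suggestion) for each known typo."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            typo = match.group(0)
            hits.append((lineno, typo, COMMON_TYPOS[typo.lower()]))
    return hits
```

Because the list only contains strings that are never correct, there is no dictionary of valid words to maintain, and identifiers like pid or riid are simply ignored.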
Typos is open-source (GPL) and has no dependency that I know of (besides perl). So you can try it out just by downloading it, making the script executable, and running it with no argument on your source:
I'm working on an Eclipse plug-in that aims to go beyond spell-checking (although it will implicitly do that too), into verifying that the name you choose for your method fits the implementation. This is possible to do since you can extract the approximate semantics of method names from a large set of implementations -- in short, since most get methods tend to do roughly the same stuff, you can capture the essence of what "get" means. A nifty feature that I'd like to work in as well is the ability to automatically generate a reasonable name when performing an "extract method" refactoring. See my papers for details.
In principle, I'm with you on the notion of spelling. I think that proper diction is essential to communicating properly and to how people perceive you. Let's face it - a well-worded reply, for example, is likely to be viewed more favourably than one that is littered with mistakes even if the actual message is the same. Similarly, properly worded code is going to inspire more confidence than code that looks like it was written by children.
But I can foresee situations in which the developers do make genuine mistakes. "Signiture" is probably a poor example because it relates to a very specific task if you're talking something like encryption. But with other less critical tasks, these can bubble up. Ultimately, developing is coming up with solutions to known problems - spelling has little to do with that core exercise. Of course, there's the issue that the commenting may likely follow suit, but that's another story.
I think that truly superb developers should have both qualities, but ideal characteristics are rarely easily found. That's why people with different proficiencies are required to spot such mistakes early on. Like the original parent said, a system design document (SDD) would easily nail this from the get-go, especially if multiple people are collaborating on a project. After all, a software development project really shouldn't have only one person involved.
Apparently you haven't lived in the south much. I know plenty of redneck hicks that would pronounce it sig-ni-chure and half of them would probably spell it that way.......how many of them are smart enough to be programmers is up to debate, but based on some of the contractors that have been sent to us that we wind up rejecting, I wouldn't be surprised if it was twice what should be......
Layne
This leaves the problems of ubiquitous abbreviations in code, e.g. QuatMult for quaternion multiplication, and non-English function names in pre-existing libraries over which we have no control. These problems can be solved by counting the occurrences of candidate errors and seeing if the count exceeds some threshold. If "quat" occurs 100 times in your code then it's a safe bet that it's a valid abbreviation and/or part of some widely used library. In that case we could consider automatically adding it to the dictionary. It's only likely to be a misspelling of quit if it occurs just three or four times (I'm assuming that the spellchecking operation is performed frequently enough to catch all such errors "in the bud" before they propagate wildly. If not then it's likely to be a case of StabulDoorHorseBolted.)
If widely-used character sequences are automatically added to the dictionary, we could rely on this same process to add the keywords of the language to the spellchecker dictionary automatically, saving some manual effort. It would then be easy to add the few remaining false positives (rarely used keywords) to the dictionary by hand. Of course there's probably some code somewhere that does all this already.
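As a rough Python sketch of that counting scheme (the threshold of 10 and the function name are arbitrary choices for illustration, not anything tested on real code):

```python
from collections import Counter

def classify_unknown(tokens, dictionary, threshold=10):
    """Split words missing from the dictionary into 'probably jargon' and 'suspects'.

    tokens: words already extracted from identifiers and comments.
    dictionary: set of lowercase words known to be correct.
    threshold: made-up cut-off; unknown words seen at least this often are
    assumed to be valid abbreviations or library names.
    """
    counts = Counter(t.lower() for t in tokens if t.lower() not in dictionary)
    accepted = {word for word, n in counts.items() if n >= threshold}
    suspects = {word: n for word, n in counts.items() if n < threshold}
    return accepted, suspects
```

The accepted set could be merged into the dictionary automatically, which also picks up language keywords for free; the suspects are what a human would actually review.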
Only check words greater than 6-7 letters long. Find all dictionary words that are the same length +/- 1 and start with the same letter (nobody gets that wrong). From those, find all words that have almost all the same letters in the same place. (Search from both ends, and if you've covered 80% of the word by the time both searches find a difference, it's a hit.)
Flag if the difference is:
A single vowel replaced by another incorrect one - signiture, independant, definate, seperate
Repeated consonants where there should be only one, or vice versa. - bussiness, occurence
I bet that would catch 95% of these sorts of misspellings with very few false positives.
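Here is one way the heuristic could be written down in Python. It is a sketch of the rules as described, with the 7-letter minimum and the 80% figure taken from the description above; the function names are invented and none of this has been tuned on real code:

```python
VOWELS = set("aeiou")

def _common_prefix_len(a, b):
    """Length of the shared prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def looks_like_misspelling(word, candidate):
    """True if word differs from the dictionary word candidate by one swapped
    vowel or one doubled/undoubled consonant."""
    if len(word) < 7 or word == candidate:
        return False
    if abs(len(word) - len(candidate)) > 1 or word[0] != candidate[0]:
        return False
    # Search from both ends; the suffix search never overlaps the prefix.
    p = _common_prefix_len(word, candidate)
    s = _common_prefix_len(word[p:][::-1], candidate[p:][::-1])
    if p + s < 0.8 * len(word):
        return False
    mid_w = word[p:len(word) - s]
    mid_c = candidate[p:len(candidate) - s]
    # A single vowel replaced by another vowel (signiture / signature).
    if len(mid_w) == 1 and len(mid_c) == 1:
        return mid_w in VOWELS and mid_c in VOWELS
    # One extra or missing copy of a consonant (bussiness, occurence).
    if {len(mid_w), len(mid_c)} == {0, 1}:
        extra = mid_w or mid_c
        longer = word if mid_w else candidate
        return extra not in VOWELS and p > 0 and longer[p - 1] == extra
    return False

def suggest(word, dictionary):
    """Dictionary words that word is probably a misspelling of, sorted."""
    return sorted(c for c in dictionary if looks_like_misspelling(word, c))
```

Short words and anything that differs by more than one letter fall straight through, which is where the low false-positive rate is supposed to come from.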
It's really awful to have to write a very diplomatically phrased email to a team leader to explain that one of his coders has created an API in which they have consistently and unfailingly used the word "Recieve", and that some day that API will probably be part of what we expose externally, and that you'd really appreciate it being fixed before people actually start using it. But it's even worse when the API is specced in a Word doc and the misspeelings are in there, too.
Worst of all, though, is trying to use the damn API while your brain is distracting you with fits of "I before E except after C!"
None the less, the shame I felt in raising the issue at all was matched only by my disappointment that no one else had caught it already.
You mean RTFS. There is no FA, only a summary!
The Farewell Tour II
You have one again confirmed Hartman's Law (or Skitt's, depending on preference; see http://en.wikipedia.org/wiki/Hartman's_law).
"Misspelt" is a legitimate spelling in British English. It's in the OED, with examples from 1762 to 1990.
Since I have just corrected you, I assume I have made an error somewhere in this post, though I haven't managed to find it.
.sig withheld by request
A long time ago I created a spell check task for ant to do just this.
http://code.google.com/p/antspell/
It looks like it has been forked: http://code.google.com/p/bspell/ seems to be in active development.
It's in the third word. You missed a letter.
For internal use code, everyone should know what "Whch option are you having the most of?" means, anyway. Heh heh heh.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
For a big project that has time budgeted for this sort of thing, you could adapt any number of spell checking text editors which accept custom dictionaries that can have words added on the fly. The only requirement would be that it could read a plain text file without needing to convert it to some other format first. On OS X I use BBEdit for this type of thing.
In general the idea is to create a custom dictionary with your known set of function names, variables, etc. and have your QC team add new ones as they are doing the check each day. It would probably help to start with a library pre-populated with words of your chosen programming language of course.
A fool throws a stone into a well and a thousand sages can not remove it.
Actually, I'm pretty sure that LaTeX gets its unusual capitalization from TeX, which is capitalized that way based off its logo emphasizing its typesetting abilities. Of course, there are also quite a few derivatives: ConTeXt, TeX-XeT, MiKTeX, TeXeT, BibTeX, and others. And, lest you think you can screen for the existence of "TeX", there's also LyX. Still, dumping them into a user dictionary is a relatively painless way of dealing with them.
Ben Hocking
Need a professional organizer?
Remember... your code will run faster if you remove some, but not all, vowels from your variable names.
To the original question: is strncpy misspelled? What about foo? sqrt? exp? Impl? Programese has an interesting linguistic history and its lexicon contains much not found in English.
While misspelled variable and function names are annoying, a refactor tool and a compile make them relatively painless. Perhaps the best approach would be to take your API documentation, run a script to split CamelCase and words_with_underscores, then feed that document to the spell checker. If it's not in your public API, it shouldn't matter how it's spelled.
Also, externalize your strings so that people with English writing training can write your field labels and error messages. Even programmers who spell check strings often misgrammarize them.
Ceci n'est pas une signature.
$ man creat
Citizens Against Plate Tectonics
Um... Don't you do code reviews before code becomes mainstream or gets released?
Hey now, no need to be like that.
They can come be a SysAdmin, where we value practicality over chiseling code on a card punch because "That is the way REAL programmers do it."
Never answer an anonymous letter. - Yogi Berra
A spell-checker either sees the word as being in its dictionary or not, but doesn't know in what contexts it is valid. It doesn't know that a possessive pronoun doesn't have an apostrophe in it, but a contraction or possessive noun does; that there are pairs and triplets of homophones; and other ways in which words can be used incorrectly, yet still be valid spellings in other contexts.
And don't forget 'referer', 'umount', and similar misspelled words that are correct when dealing with computers.
[100% ISO 646 Compliant]
SVM, ERGO MONSTRO.
All right, I don't have any answers for you, and I'm not even a programmer (outside of QBasic games), but I thought I'd share this idea with how a programming spell-checker could work. It seems too simple so there's probably flaws with it, or someone would have done it by now, but anyway:
First, as normal, a syntax checker looks for issues when you enter a new line. If it sees what should be a variable* name, a little icon appears next to it which means "new". The programmer sees the icon and is satisfied as this is, indeed, the first time they used the variable.
Also at this point, the computer adds the variable to a variable list.
Ok, so on subsequent uses of the variable, the computer looks it up on the list, sees it there, and so doesn't display the "new" icon.
And of course, if you see the new icon when you've used the variable already, then you either made a spelling mistake now, or you did when you first made the variable. Either way, it's brought to your attention (assuming you remember using the variable before). If you clear the line with the "new" icon, it's removed from your variable list automatically.
*Variable is used in this example, there can be other lists for other stuff such as sub-routine, function, constant, etc.
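A bare-bones Python sketch of the bookkeeping, assuming identifiers can be pulled out with a simple regular expression rather than a real syntax checker (the names here are made up, and a real tool would also have to handle scopes, strings and comments):

```python
import re

# Very rough notion of an identifier: a letter or underscore, then
# letters, digits or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def first_uses(lines, keywords=frozenset()):
    """Return (line_number, name) for each identifier the first time it appears.

    These are the places where the "new" icon would be shown.
    """
    seen = set()
    report = []
    for lineno, line in enumerate(lines, start=1):
        for name in IDENTIFIER.findall(line):
            if name in keywords or name in seen:
                continue
            seen.add(name)
            report.append((lineno, name))
    return report
```

For the three lines "total = 0", "count = 5", "totl = total + count", the report marks totl as new on line 3, which is the cue that something was misspelled.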
"When the atomic bomb goes off there's devastation...but when the atomic bong goes off there's celebraaaaation!"
What are you looking at while you are coding? Intellisense would have quit displaying anything as soon as you made your 'mistake', thus you'd know there was something wrong. "Hunting around for anywhere up to a minute" just makes it seem like you are either copying someone else's code (since you aren't looking at the screen) or have no idea how your IDE tool works.
The problem is that very autocompletion. That means that you are getting used to long function names where you only skim over the exact letters, so you can in fact get a spelling error in a public class, use it and release it only to actually find the error and fix it too late. And then you can't fix it, because you're breaking your interface!
I was doing work at NASA. NASA was still into punch cards years after very powerful text editors came into existence. I remember the day my girl friend offered to key punch the PDP-11 code I had written onto coding pad to cards. "Honey, you sure can't spell very good. Good thing I caught it. Move is spelled with an 'e'." :-(
I agree that it seems like you'd spend most of your time removing false positives. I'm not totally averse to the idea, but like a lot of people posting I just don't see how it could be done effectively - maybe that means I'm getting old...
Programming is different than languages that we use to speak - vocabulary and style are nearly unbounded, abbreviations are common-place, etc. You can make up the words as you go along...
Is "refererr" a spelling mistake? Maybe it refers to some error? Maybe they were trying for "referrer"?
What about "Srvc"? "Servc"? "Servce"?
What about "funcy" ? Is it a function pointer or the word "fancy" or the word "funky" ? Does "funCy" make a difference?
To get this anywhere near effective, it seems like you'd have to impose some restrictions on style and variable naming, and yes - I consider that restriction a bad thing.
Something Witty Goes Here
Wow, 240 comments about spelling and programming and no-one's mentioned the famous Ken Thompson quote:
"If I had to do it over again? Hmm... I guess I'd spell 'creat' with an 'e'."
All ideas^H^H^H^H^Hprocesses in this post are Patent Pending. (as well as the process of patenting all postings)
Still a great IDE after all these years...
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Have a look at Krazy, the EBN's code quality checker if you want more info. It detects common spelling errors and suggests the appropriate spelling (US English); it's done via a Perl script IIRC with different modules for the various check types.
Not really, this is pointless. It may be nice to fix comments, but the very code itself will be rife with specific jargon that no spell-checker will appropriately handle.
Anyone who's worked with a POS system knows the definition of PO. The spell-checker won't pick up on this, but everybody (users, analysts, programmers, testers, support staff) will refer to this object as a PO. So then you'll also need to make the dictionary jargon-aware. Don't know if you've ever been on a government or large-business project, but this is an issue in and of itself. It's typical for big companies to actually have an on-line list (wiki-style) of the "commonly-used" acronyms in the company, and they're not all 3 letters.
I mean, is this function name incorrect? TransferUrisPOToJiruSofToPrint. That means something to people I worked with, but means absolutely nothing to the spell-checker. So even if you or someone else has "the solution", this is definitely not some type of "hey, just add spell checker" problem; you're also pumping in 4 different acronyms just to make this function pass.
Let me guess. Your approach to debugging is to write bug free code, and your keyboard doesn't even have a backspace key.
;-)
Program Intellivision!
Hardly any C library functions are real words. Plus, matching the actual name of a function is more important than being "spelled right", so far from checking actual code, it could essentially ONLY check function prototypes and variable declarations, and tell you to refactor-rename them.
We've secretly replaced Slashdot with new Folgers Crystals - let's see if it notices.
OK, but is it smart enough to underline "bgcolr" if you typo for bgcolor?
(yeah, that's what syntax highlighting is for - vim syntax highlighting, at least, isn't smart enough to catch if it's a valid attribute but doesn't belong to this kind of element, say, )
I have a spell checker that's extensible. It's in AppleWorks. I put in all the commands and some commonly used variables and arguments like D$ = CHR$(4) from AppleSoft BASIC. Works great. It'd do assembly too if I put in the 6502/65816 op codes.
As for "I don't want a text spell checker, I want a programming-language-aware spell checker", put down the bong and get away from the keyboard for a while. All spell checkers check text no matter what the content, as long as they're made aware of the text to be tested (i.e. they can be extended via typed additions, via linked text files containing the terms, or by flagging terms from proven programs and asking if you want to add them to the dictionary). If you're doing your editing in a non-extensible, closed and proprietary editing routine built into a programming software package rather than linking to an external editor, you're hosed; stop it.
"I may be synthetic, but I'm not stupid." -- Bishop 341-B
I hate internationalized error messages. Unless someone reported the problem in your language, which is usually not the case, you have to first figure out what the error was in English before you can google anything.
With enough flags?
mark
no, clearly he meant you need to keep all your _identifiers_ in external files too, by "interface" he means API
It's not free but IntelliJ IDEA can help solve your troubles (if you're using Java). It supports "rename functionality" with intelligent search and replace in Java. It will replace all uses of the code and even rename getters/setters if you're changing an internal variable. So while we can't stop the spelling errors when we make them, we can easily remove them.
There's also a spellchecker plugin for IDEA but it only checks String literals and comments.
-Peter
1. The halting problem is undecidable only on a device with infinite RAM; if you put an upper bound on the RAM, you get a decidable problem (in theory only).
2. There are some practical ways to construct proofs that a loop ends (remember the CS lectures). Sure, it's not a perfect solution, but if you can't construct a proof that the loop ends, you'd better rethink the loop, and possibly rewrite it.
You're a programmer. You want a programming spell checker. YOU'RE a PROGRAMMER.
So write one, lazy bones.
So far the best spell checker for Eclipse I have found is eSpell, http://www.bdaum.de/eclipse/
eSpell can do C++ and Java; I use it for C++. For some reason the main download page above does not list C++, but http://www.bdaum.de/eclipse/eSpell3/index.html does. It works pretty well, though on large source files I have had it make Eclipse a bit sluggish.
here's a paper i wrote for sigplan notices in 2004 talking about the options, and how it works in practice: http://www.jessies.org/~enh/publications/checking-code-spelling.pdf
plus the editor i wrote
http://software.jessies.org/evergreen/
has this functionality.
'I' is a word, but 'i' isn't. And most spell checkers will catch that error.
Someone (TBL) was maybe enjoying a bit of the refer when codifying the HTTP specification...
Errors in programming (and technical documentation) become standards because the act of programming is creation and invention. Spell it "umount" and forever the act of "unmounting" is done via umount. No one bats an eye. When you define a variable everyone else must follow your lead.
If you say, "Hey, one who refers is not a 'REFERER' but a 'REFERRER'," your code will not work. HTTP_REFERER will forever be spelled that way until HTTP/1.0 and HTTP/1.1 retire. However, referring to the meaning of that key must be spelled correctly to be proper English. E.g.: "Check the referrer of the requested link with the optional HTTP_REFERER variable."
Don't want spelling mistakes in your Hungarian variable names? Be the first to create them.
-- @rjamestaylor on Ello
You might enjoy FindBugs. The project also offers an Eclipse plugin.
Why bother.
It's impossible for a computer program to be constructed which can do so for all cases (hence, the halting problem), but that doesn't mean that it's impossible to detect some infinite loops, or to detect constructs which are particularly likely to be infinite loops, either of which could, in theory, be useful features in an IDE.
Spelling/grammar checkers for human language aren't flawless, either, but they still have utility. The fact that perfection in a task is impractical or even provably impossible doesn't rule out useful applications.
Shouldn't this be from the HandsOffMyCamelCase department?
I need one that says...
"You appear to be creating an infinite loop. Would you like me to change if ($b=$a) to if ($b==$a)?"
(I did that the other day.)
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
This confuses me :(((((((((((((((
That's a serious question that deserves a non-rhetorical answer, which you are obviously disqualified from providing.
And what makes you think that's going to help? Have you considered the possibility that somebody without "obvious developmental deficiencies" may simply still be doomed to bad spelling? Also, what standard are you using to decide what counts as a "developmental deficiency," and does that standard fly in the face of actual average spelling skills? I.e., have you considered that perhaps you're expecting everybody to be above average?
Spelling only appears to be a trivial skill to those who master it. The relationship between the written representations of words and their phonological representations is very hard to unravel, and is typically only rational when you compare it against archaic stages of the language (i.e., when you compare it to the way people spoke 600 years ago).
Are you adequate?
I was wondering when someone would catch it!
Context is a huge problem when dealing with natural language. The great thing about spellchecking a program, however, is that context is made perfectly clear by the strict syntax. Otherwise, finding a reliable compiler would replace spellchecking as your top concern.
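To make that concrete, here is a rough sketch of letting the language's own lexer decide what is worth spell-checking. It uses Python's standard tokenize and keyword modules on Python source purely as a stand-in (for Java or C++ you would need that language's lexer), and the function name checkable_text is made up for illustration.

```python
import io
import keyword
import tokenize

def checkable_text(source):
    """Split Python source into identifiers and comment text.

    Keywords are dropped because the lexer already tells us they are
    part of the language, not something the programmer spelled.
    """
    names, comments = set(), []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            names.add(tok.string)
        elif tok.type == tokenize.COMMENT:
            # Strip the leading "#" and spaces so only the prose remains.
            comments.append(tok.string.lstrip("# "))
    return names, comments
```

The names and comments that come back are what you would hand to the actual spell checker; everything else (keywords, operators, punctuation) never reaches it.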
In Visual Studio (C#) you get:
"Assignment in conditional expression is always constant; did you mean to use == instead of = ?"
Kaetemi
If you are too damn lazy or too stupid to type your language properly, then you shouldn't be a programmer. Become an insurance adjuster or something less demanding.
I don't think I'd like to hire someone who can't spell. It speaks volumes about you.
Intelligence starts with a keen understanding and application of your language.
If you simply must have it, EditPlus has syntax highlighting and offers spellchecking dictionaries.
They're using their grammar skills there.
I would rather have a spell checker that checks contents of strings in my source code, but ignores words that do not have suggestions in the spell checker. The latter would make sure I do not get annoying notifications for non-specified acronyms or words that are not meant to be spelled correctly.
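A minimal sketch of that rule, under some loud assumptions: the DICTIONARY set is a tiny hypothetical stand-in for a real word list, string literals are found with a simple double-quote regex rather than a real lexer, and "has suggestions" is approximated with difflib.get_close_matches from the standard library.

```python
import difflib
import re

# Hypothetical stand-in for a real dictionary; a real tool would load a word list.
DICTIONARY = {"could", "not", "open", "file", "connection", "failed", "retry"}

# Double-quoted string literal, allowing backslash escapes inside it.
STRING_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"')

def misspelled_in_strings(source, dictionary=DICTIONARY):
    """Return {word: suggestions} for suspicious words inside string literals."""
    findings = {}
    for literal in STRING_LITERAL.findall(source):
        for word in re.findall(r"[A-Za-z]+", literal):
            lower = word.lower()
            if lower in dictionary:
                continue
            suggestions = difflib.get_close_matches(
                lower, sorted(dictionary), n=3, cutoff=0.8)
            # No suggestions: probably an acronym or a made-up token, so stay quiet.
            if suggestions:
                findings[word] = suggestions
    return findings
```

With this, a literal such as "Conection failed" would be reported with "connection" as the suggestion, while something like "XKCD" is skipped because nothing in the dictionary is close to it. Escape sequences and format specifiers inside strings would need extra handling in anything real.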
... seems to do this well for a good price. http://www.editpadpro.com/index.html
In PHP at least there is a function: token_get_all (http://www.php.net/manual/en/function.token-get-all.php). This will return an array of tokens contained within a string, which you can then loop through and do magick upon.
Please note...this is a very naive and hacky example
Like so:
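function getFunctionNames($source_file) {
    // Tokenize the whole file with PHP's built-in lexer.
    $source = file_get_contents($source_file);
    $tokens = token_get_all($source);
    $function_started = FALSE;
    foreach ($tokens as $token) {
        if (!is_array($token)) {
            // Single-character token such as "(": an anonymous function
            // has no name, so stop waiting for one.
            if ($token === "(") {
                $function_started = FALSE;
            }
            continue;
        }
        if ($token[0] === T_FUNCTION) {
            // Saw the "function" keyword; the next T_STRING is its name.
            $function_started = TRUE;
        } elseif ($token[0] === T_STRING && $function_started) {
            echo $token[1], "\n";
            $function_started = FALSE;
        }
    }
}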
I was doing a summer program in Brooklyn many years ago, and some of the people were having trouble with their Fortran declarations because the compiler wouldn't accept INTERGER. Sounds like a joke, but true.
The general idea is fine, and I agree that misspellings are a problem.
...
...
Now I spell words the Australian way (mostly like the Brits): honour, colour, centre, kilometre, etc.
Also, I would naturally call a library of mathematical routines libmaths.so and its header maths.h
I find the term "math" foreign and unsightly, like "creat" instead of "create".
So what we really need is some kind of internationalisation in code.
A German-speaker should be able to read German keywords, a Spanish-speaker Spanish keywords, etc. A good source control package should be able to arrange this.
Maybe one day?
e.g. a German-speaking programmer might see:
DATEI *quelle =
and the English-speaking programmer sees:
FILE *source =
Wouldn't that be nice. Or else we translate all the keywords into Esperanto or Interlingua or suchlike.
I am anarch of all I survey.
I probably took an hour the other day going through and pounding the L everywhere I put fufill where I meant fulfill. *sigh* Same way every blasted time! (I can't seem to hit the Q lately either.)
Back in my day when we chiseled our bits into stone and sent them by mule train from village to village...
Other than dealing with camel case, there's no need to stand on your head in Perl; ispell can already spell-check software by using the "external deformatter" feature. It even comes with sample deformatters that handle C, C++ (in two ways), and sh/bash. In fact, one of the reasons I added the capability for external deformatters was to be able to spell-check program comments.
To deal with camel case, one might change Ed's Perl script so that instead of converting "camelCase" to "camel Case" and downcasing the result, you instead converted it to something reversible like "camel___Case". Write that to a temp file, ispell it with the appropriate external deformatter, and convert it back to the original form.
That said, when I tested the C deformatter on my own code (I think it was the ispell source), the results quickly convinced me to give up. Look through your code again, and note all the variable names that contain unusual abbreviations ("ch", "cp", "ptr", etc.). Note the line comments that have been abbreviated to make them fit. Note the application-specific terminology in the comments, and the huge list of odd library function names. The first time through, you're going to get very tired of adding all those things to your personal dictionary, even if you remember to make a dictionary specific to the application. Yes, there are ways to mitigate the problem, such as providing a predefined dictionary for popular libraries (but have you done ls /usr/lib | wc lately?). But even that's a problem, because a lot of libraries can have function names that will hide legitimate misspellings.
I'm not saying that it's an impossible wish. But it's nontrivial to get it right.
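For what it's worth, the reversible "camel___Case" conversion mentioned above is only a few lines. This is a sketch in Python rather than Perl, and the helper names (split_camel, join_camel, words) are invented for illustration; it is not the script being discussed.

```python
import re

# Marker borrowed from the "camel___Case" idea; assumes no identifier
# already contains three underscores in a row.
MARK = "___"

def split_camel(identifier):
    """Insert MARK at each lower-to-upper boundary: camelCase -> camel___Case."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", MARK, identifier)

def join_camel(marked):
    """Undo split_camel by deleting every MARK."""
    return marked.replace(MARK, "")

def words(identifier):
    """Lower-cased pieces that a spell checker would actually look at."""
    return [w.lower() for w in split_camel(identifier).split(MARK) if w]
```

The round trip only holds if the original text never contains the marker itself, and runs of capitals such as "HTTPServer" are left as one piece, so a real deformatter would want a smarter boundary rule.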
Yes, it is smart enough to underline bgcolr if you typo for bgcolor.
... etc. And it handles much more than html, but it handles those things less rigidly. That is, if it is a php document and you use the function mysql_fetch_array() then it will color code to blue. PHP lets you make your own functions, but BBEdit may not know what those are but still must let those functions be created without fuss, so if you were to type mysqlfetcharray() instead, it would neither mark it as misspelled nor color code it as blue, which is a polite way of saying that it's neither wrong nor right. The problem comes in remaining flexible without sounding alarms all the time -- it's very easy to make typos in custom function names. Also, if one uses html in a php doc (for example) the above syntax spelling starts to break down and it treats html code more generically and fails to start pointing out typos in markup tags. It probably assumes that the tags are for anything-goes xml tags, but I don't know. And I also don't know how good it is for this sort of stuff outside of html/php/perl/mysql, but it's very nice for me.
It also color codes different parts of the tags. The default is blue for the tag itself (so "") and the properties in purple (such as "bgcolor=") and then a rust color for property values and gray for comments
Yep. Programmers should know how to spell correctly in their native language. But hey, all through school those technonerds were likely the same ones who never missed a chance to whine about how they hated their English (or whatever) classes and thought that learning grammar and spelling were a waste of time when they could be doing cool geek stuff. The rise of 1337-speak and txtspeak hasn't helped.
At least in the real writing business there are editors trained and paid to catch these errors.
Being unable to spell correctly makes you look really stupid to most people.
Just FYI, if you have a decent programming environment, it should at least flag cases where you've mistyped an existing identifier. If there's an ImmediateFlag in your code, you'd get a warning if you typed ImediateFlag or ImmediateFalg or whatever. Not much help when the programmer is creating new identifiers, of course. Although I've seen cases where the programmer in question for whatever reason decided that because ImediateFlag was undefined then they would just define it, even though ImmediateFlag existed and was what they meant. That ought to get you fired in my book.
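The "define ImediateFlag even though ImmediateFlag exists" case is also something a tool could catch. A rough sketch, assuming you already have the list of identifiers in scope; similar_existing is a made-up name, and the 0.85 cutoff is a guess rather than a tuned value. It relies on difflib.get_close_matches from the Python standard library.

```python
import difflib

def similar_existing(new_name, known_names, cutoff=0.85):
    """Return known identifiers that new_name is suspiciously close to.

    An empty list means the new name does not look like a typo of
    anything already defined.
    """
    candidates = [name for name in known_names if name != new_name]
    return difflib.get_close_matches(new_name, candidates, n=3, cutoff=cutoff)
```

Calling it with "ImediateFlag" against a scope containing "ImmediateFlag" returns that name, which is the point where an IDE could ask "did you mean...?" before letting the new identifier be created.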
Hey by the way, pair programming is a great way to have continuous code reviews and a check on some of the more typical fumble-finger errors.
Good point - this could be a colourful debate. Standardize on American or standardise on English? We need a few comments to gauge a reponse.
Heh yes. "You appear to think this is Pascal/BASIC, would you like me to temporarily erase that part of your memory?"
which is totally what she said
Yes, it works in Vim 7.0. It is the job of the plug-in handling syntax highlighting. The plug-in should switch the spell checker on and off depending on which language construct was detected.
Martin
In Vim the spell checking is handled by the syntax-highlight plug-in, which can switch spelling on and off depending on which highlight is used.
But then: all the programming language plug-ins I have seen so far do what xemacs does: string definitions and comments. But HTML and Wiki plug-ins tend to use more complex rules.
Martin
No, but neither will an automatic grammar checker - their great toys to play with if you want a laugh, but I have yet to see one that was actually capable of telling the difference between good and bad grammar.
(For example, Word's grammar checker completely missed the misuse of "their" for "they're" above - as trivial and glaring an error as can imagine. Oops, it didn't notice the missing subject in the previous sentence, either!)
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
It's actually impossible for the computer to know whether you're creating an infinite loop.
Oh really? My computer in Aleph-1. Stop being pedantic, dad, and get back to your Delphi coding.
Stick Men
Yup, but I still had to do it for several programs and interfaces built on top of them. I blame it on the UI, but it could be due to inline scripted replacement cowardice as well.