Who Needs Case-Sensitivity in Java?
David Barber asks: "I've just started learning Java, and to my exceptional disappointment it is as case-sensitive as C. I'd like to ask Slashdot readers to make the case for case-sensitivity in a programming language, because I can't see it. Although I've used C on and off since 1976, I also have a history of Fortran, COBOL, PL/I, assembler, and other legacy languages that were never case sensitive (perhaps due to the single case nature of card punches). Today I use modern languages including Visual Basic which preserves case for pleasing appearance, but is not case-sensitive itself (it will correct the case for you in the IDE, which is quite nice). In all my years of programming I have never seen the rationale for making a programming language case sensitive. It simply makes typing it in harder, and mistakes easier, yet we persevere with maintaining it in modern languages like Java. Without making this into a religious war, can someone make the argument of why case-sensitivity in a language is 'a good thing'? And don't confuse this with handling case-sensitive data, which is fine."
Here are some reasons I just made up (though a couple actually affected my programming).
Because it makes sense that all symbols are uniquely identified from a set of characters, rather than each symbol being identified by a huge set of names (var, vaR, vAr, vAR, etc). There may be a need for a "canonical name", which is it? All lowercase? All uppercase?
Because case-insensitivity makes dynamic programming and reflection even slower and/or more error-prone (I have experienced this in PHP, which is case-insensitive, and it bugged the hell out of me [and my program]).
Because it takes fewer CPU cycles when compiling or scanning source code.
Because some languages use case to indicate a different class of variable (Ruby for instance, issues a warning if you try and change a variable starting with uppercase).
Because many programmers' text editors are case-sensitive (I know, I know, chicken, egg, etc).
Because lowercase/uppercase could be a harder problem if you use a language which allows Unicode symbols (Perl6?). (Is this possible? I have no idea).
Because sometimes it actually is useful to have a symbol "ID" and another one "id" in the same symbol table.
Because stuff like case and the English language is not part of programming; programming is about precision and computers. Introducing ambiguity (whether for the compiler or the programmer) can't be good.
Because C is case-sensitive, and C is a popular language.
You might want to try PHP5 though, it's a lot like Java but case-insensitive.
When I make a class Person {..} I want the other developers to use
Person person = new Person(..); not
person person = new perSon();
It also becomes a mess when you have some people write
If(something){
}
later you see IF or if.
Case sensitivity preserves sanity and helps enforce coding standards.
It's a good thing, learn to deal with it.
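A minimal Java sketch of the convention described above, assuming a hypothetical Person class with a single name field (neither comes from the original post). Because the language is case-sensitive, the type Person and the variable person can sit side by side without ambiguity:

```java
// Hypothetical class, for illustration only.
class Person {
    final String name;

    Person(String name) {
        this.name = name;
    }
}

class PersonDemo {
    // The type is capitalized and the variable is lowercase; the compiler
    // treats them as two different identifiers.
    static String greet() {
        Person person = new Person("Ada");
        return "Hello, " + person.name;
    }

    public static void main(String[] args) {
        System.out.println(greet());
    }
}
```

Writing person or perSon in place of the type name would simply fail to compile, which is the enforcement the poster is asking for.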
In my opinion, case sensitivity allows for more readable code if using long variable or method names.
For instance:
MySteadfastObject.doSomeReallyBizarreParsing()
instead of
mYSTEadfasoBJEct.DOSomerEAllybizaReparsiNG()
Emphasizing readability instead of easy-writing is (mostly) a Good Thing (TM).
- SetSlower is a procedure that reduces the speed
- SetsLower is a function that gets a lower bound in a set of sets
These are completely unrelated identifiers which are rendered equivalent by BASIC and other case-insensitive languages. It may look like a stupid example, but I've been annoyed on several occasions by misinterpretations of VB code that were caused by case-insensitivity. As a C/C++/Prolog/Haskell/Modula/... -coder I'm probably biased toward liking case-sensitivity, but I can't see why liking case-insensitivity should be objectively better, or anything more than just a bias.
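To make the point concrete, here is a rough Java sketch in which both names coexist. The Motor class, its starting speed and the exact behaviour of the two methods are assumptions made up for illustration, and the names are lower-camel-cased to follow Java convention:

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical class, for illustration only.
class Motor {
    private int speed = 10;

    // "set slower": reduce the speed by one step, never below zero.
    void setSlower() {
        speed = Math.max(0, speed - 1);
    }

    // "sets lower": the smallest element found in any of the given sets.
    static int setsLower(Set<Set<Integer>> sets) {
        int lower = Integer.MAX_VALUE;
        for (Set<Integer> set : sets) {
            if (!set.isEmpty()) {
                lower = Math.min(lower, Collections.min(set));
            }
        }
        return lower;
    }

    int speed() {
        return speed;
    }
}
```

In a case-insensitive language these two declarations would collide, and one of them would have to be renamed.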
--
What is wanted is not the will to believe, but the will to find out, which is the exact opposite -- Bertrand Russell, "Skeptical Essays", 1928
Therefore, let's leave this issue as it is until someone comes up with good arguments to choose either one or the other.
Now let's think...what would happen if it didn't error out because of case-sensitive errors? Wouldn't that make it "easier" to make mistakes?
This set me thinking. The guy who posted the article would probably prefer Perl to Java. It is case sensitive, but will let him get away with his sloppy coding practices by simply creating a new variable every time it encounters one which differs from another only in case. Then once he's learnt the error of his ways, he'll either return to Java with a greater appreciation of its reasonably strict syntax, or become a fan of "use strict;" ...
I remember reading the Jargon File entry on "discipline and bondage" programming languages, which was quite disparaging about them. I found myself in wholehearted disagreement with that attitude, as most programmers I work with need to have discipline imposed on them - coding standards, style checkers and peer review help, but the quality of C code seems to be generally better than Perl or C++ simply because the language is much smaller and narrowly defined.
Chris
Being case-sensitive AND following capitalization guidelines makes code much easier to read. I don't see any reason to allow the same characters with different capitalization to refer to different variables, but I definitely think any references to a variable or function should be capitalized the same way it is defined and that all keywords should be capitalized consistently.
Since programming languages are meant for use by technical people, and since computer programming and mathematics are so intimately related, it pays to let computer programmers use the same tools that mathematicians do.
I only partly agree with this statement as I think there are two distinct types of programming - the technical, mathematical type and a far less mathematical 'data processing' (i.e. pretty forms for putting data in databases) type of programming.
With DP, case sensitivity just gets in the way - the programmers may not be very technical and any maths used would probably only be complicated by use of the 'same' name for different variables.
I think VB is definitely in the DP category of languages and hence has no need for case sensitivity. As Java is often marketed as a DP language, its case-sensitivity is a serious drawback (at least as far as DP is concerned - in other areas of Java's use it may be useful).
Tk
At some point, somewhere, the entire internet will be found to be illegal.
they should allow the use of subscripts, superscripts, and Greek letters too, to make the notation more powerful and more intuitive.
I think this should be handled on the editor side rather than in the language. For example, you'd still name a variable sigma_1 but the editor would display it as a sigma character with subscript index 1. This has a great benefit: it maintains compatibility. People can work on the code even if they only have access to a regular text editor, and it'll be encoding-safe.
It'd require strict naming conventions, of course, but people should use strict naming anyway.
C is carefully designed so that it does not assume that the underlying platform on which it runs is natively using ASCII. A number of relatively obscure features, especially trigraphs, were put into the language specifically to make this work.
While case-folding is fairly easy in ASCII because upper and lower case letters are exactly one bit distant, it would substantially complicate compilation on other platforms. It is relatively unnatural for the computer to allow case-insensitivity, even in ASCII, and in machines that natively use something other than ASCII it can be quite tedious.
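The "one bit distant" remark can be sketched in Java as follows. This is only an illustration that assumes plain ASCII input; the class and method names are made up, and nothing here carries over to other character sets:

```java
class AsciiCase {
    // In ASCII, 'A' is 0x41 and 'a' is 0x61: each pair differs only in bit 0x20.
    static char toggleCase(char c) {
        boolean isAsciiLetter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
        return isAsciiLetter ? (char) (c ^ 0x20) : c;
    }

    // Case-insensitive comparison that only folds ASCII letters.
    static boolean equalsIgnoreAsciiCase(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            char x = a.charAt(i);
            char y = b.charAt(i);
            if (x != y && toggleCase(x) != y) {
                return false;
            }
        }
        return true;
    }
}
```

Even in this easiest case the comparison needs a range check per character, where a case-sensitive compiler can compare the raw values directly.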
Having dealt with C implementations that are targeted for machines which are radically different from what most people are used to using, I have a lot of respect for the portability of C. For example, I once worked with a C implementation on an IBM mainframe processor that had no stack, so the C stack had to be synthesized using machine registers and memory conventions, but this worked!
C was designed to be small AND portable. Java was designed to be, well, portable. No matter how careful you try to be, dropping case-sensitivity from the language would lead to nightmares when trying to achieve portability.
Meaningful names are long to reduce ambiguity. It shouldn't be necessary to read the code to understand the intent of a variable or function.
i-name =twylite [http://public.xdi.org/=twylite], see idcommons.net
Must disagree. I can't think of a language that enforces clean code. Can anyone?
Conventions matter as far as clean code, even at the level of case choices. Language choice doesn't help here, does it?
Speakers of other languages are not quite as fortunate. I'll try to explain, but Slashdot's horrible lack of Unicode support, coupled with the extremely stupid character filter, will make this slightly more difficult than it should be.
German, for example, has a letter which is basically a "double s". This letter only has a lower case form; in upper case, this letter becomes "SS". However, "SS" in lower case becomes "ss", not "double s".
French has a character which is a lower case "e" with two dots above. The upper case form of this letter is the normal "E" in France, but in French Canada this letter becomes an upper case "E" with two dots.
There are other languages which have characters that only exist in upper case or lower case forms.
Do you realise just how complex the casing rules become when you have to take these things into consideration? Keep in mind that Java supports all Unicode characters in symbols.
The exact same argument can be used when explaining why the operating system kernel should not have case-insignificant file names. This is a localisation issue, and neither your Java compiler nor the operating system kernel should have to worry about what locale you have in order to determine how a certain string of characters should be interpreted. (Yes, encoding issues always creep in, but that's on a different level.)
Just think about it. Your program compiles properly if you select "french france" when you log in, but fails when you use "french canada".
Don't you think it's easier just to specify that symbols are case significant?
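The German "double s" case can be checked directly in Java. The sketch below uses only standard String and Character methods; the class name and the sample word are illustrative assumptions:

```java
import java.util.Locale;

class SharpS {
    static String upper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    static String lower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "strasse" spelled with the sharp s (U+00DF), written as an escape
        // so that the source file stays plain ASCII.
        String street = "stra\u00DFe";

        System.out.println(upper(street));                 // one letter becomes two
        System.out.println(lower(upper(street)));          // and does not come back
        System.out.println(lower(upper(street)).equals(street));
        // There is no single-character upper case form to map to.
        System.out.println(Character.toUpperCase('\u00DF') == '\u00DF');
    }
}
```

Upper-casing and then lower-casing does not round-trip, so a case-insensitive compiler would have to decide which of these spellings count as the same identifier.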
I couldn't disagree more. If I read DeviceEntity.java, and it says:
:-).
getDeviceName(customerId)
I for damn sure don't want to search for the same thing in DeviceServlet.java and be frustrated because it says:
GETDeviceNAME
ugh. At the most basic level, this is dumb. If I mistyped the name with incorrect capitalization, it *won't compile*! This is not a subtle error, it would be obvious! So it does not at all lead to errors.
Case sensitivity means I know a variable, method, whatever, is always going to "look" the same to me when I'm scanning the code or when I type a quick search into vi (/getDeviceName). I don't need my new intern who likes a different set of notation littering my code with GetDeviceName because the compiler lets him get away with it.
I'll agree with you that having both ThisIsImportant and thisisimportant in a module and relying on case-insensitivity to differentiate is probably not a good idea, though...
It's a strange world -- let's keep it that way
This is actually a problem with java ...
...
This is NOT possible in Windows NT
Some might argue that this is a problem with Windows, not Java.
With DP, case sensitivity just gets in the way - the programmers may not be very technical and any maths used would probably only be complicated by use of the 'same' name for different variables. [...] As Java is often marketed as a DP language, its case-sensitivity is a serious drawback (at least as far as DP is concerned - in other areas of Java's use it may be useful).
I think a better distinction is between scripting languages and programming languages. Scripting languages are meant for short bits of coding by non-experts. Programming languages are meant for large bases of code built by professionals.
It's a continuum, of course; no language is used for only one of those. But Java is clearly intended to be pretty far towards the professional end of the spectrum. Non-experts working on small projects should pick a language better suited to their needs; Java will seem to them to be balky and annoying.
And as an aside: non-experts should stick to small projects. I think the huge danger with scripting languages (in which category I'd include things like pre-dot-net VB) is that although they are great at getting non-programmers into doing a little programming, they let people get away with a lot of stuff that is dangerous on larger scales.
It's as if a guy who successfully changed a lightswitch in his house grabbed his trusty screwdriver and tried to tackle wiring a 500-rack server facility. He might get some stuff working, but it would be flaky, dangerous, and impossible to maintain. Just like so many code bases I've seen put together by "not very technical" programmers.
This is a non-issue. You could bring the same argument against strong typing and it would fail in exactly the same way, or you could ask that the compiler catch obvious spelling errors.
The fact of the matter is, if you had used a variable called foo somewhere, you should have no reason to call it Foo elsewhere and FOO later. No matter if the language allows case sensitivity or not, that would just be bad practice and would confuse the hell out of anybody who would try reading your code.
Not even going to mention what other posters touched upon, that casing is used to indicate the entity type (constant, variable, class, function etc).
Your snobbish hatred of a language that's really Not That Bad is indicative of the fact that you are not "A Real Programmer".
Some very powerful things have been done with Visual Basic, and the true test of a "Real Programmer" is doing those Powerful Things on time, under budget, and in Good Working Condition regardless of the environment of choice for the application.
Notice my hideous but Meaning Laden usage of capitalization. While I don't believe that a capitalization scheme should be enforced by the compiler, I do appreciate having it as a tool to enforce coding standard schemes.
Smurfy,
-T
Old truckers never die, they just get a new peterbilt
Consider this snippet of Java code:
And yet you see TransactionRolledBackException propagating to a higher level. You have looked at the code and swear you are catching it. Oops, missed that upper case B.
Arguments that you can do something like this:
are irrelevant. This is also perfectly legal in Java:
Case sensitivity does nothing to enforce preferred conventions, and actually enforces unreadable code.
Would I like to read code that says System.arrayCopy(...), or Color.BLACK? Sure I would, but that wouldn't compile. Case sensitivity enforces the wrong thing (literal consistency) rather than convention (conceptual consistency).
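A rough sketch of the kind of near-miss described above, under loud assumptions: VendorApi and StandardApi are hypothetical stand-ins for two libraries, and the two exception classes are invented here so that their names differ only in the case of one letter. It is not the poster's code:

```java
// Hypothetical library whose exception is spelled with an upper case B.
class VendorApi {
    static class TransactionRolledBackException extends RuntimeException {}

    static void commit() {
        throw new TransactionRolledBackException();
    }
}

// Hypothetical second library whose exception is spelled with a lower case b.
class StandardApi {
    static class TransactionRolledbackException extends RuntimeException {}
}

class CaseMismatchDemo {
    static String run() {
        try {
            try {
                VendorApi.commit();
            } catch (StandardApi.TransactionRolledbackException e) {
                // Looks like the right handler at a glance, but it is a
                // different class, so this branch never runs here.
                return "handled locally";
            }
        } catch (RuntimeException e) {
            // The exception that was actually thrown ends up here instead.
            return "propagated: " + e.getClass().getSimpleName();
        }
        return "no exception";
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Both exceptions are unchecked, so the compiler accepts the mismatched catch clause without complaint. With imports hiding the qualifiers, the two names would be even harder to tell apart.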
Would I like to rename that class which someone called subProductPanel to SubProductPanel? Sure I would, but that would involve changing the 15 classes that reference it, not to mention the file name, which would confuse CVS on Windows clients terribly.
So, because of case sensitivity, I am stuck with my subProductPanel class name unless I spend more effort than it is worth to fix it. If everything were case insensitive, I would be able to change it easily.
As for portability, case sensitive and case insensitive are incompatible and equally create portability problems. Windows having a case insensitive file system causes limitations in Java's portability because Java considers MyClass and myClass to be two separate names needing two separate files. Windows will not support that. So Java code that runs on Linux will not work on Windows, and vice-versa (think about code which reads from the file system).
Bottom line:
Case sensitivity creates an environment conducive to hard-to-find bugs because people aren't case sensitive. They may notice case when it is out of place, but not normally. (sOMEtHINgouToFplACE vs. somethingOutOfPlace will be noticed. Not, however, someThingOutOfPlace or SomethingOutOfPlace, and only programmers consider SOMETHINGOUTOFPLACE different.)
A case sensitive environment enforces only the non-use of conventions. A case insensitive environment lets you use the convention even if the author of the API forgot to.
As for BasicSend vs. BasicsEnd, that is an argument to use something other than case to differentiate words, not just hoping that people notice the difference. Wouldn't Basic_Send and Basics_End be much better?
As for differences in how languages handle case insensitivity, String.toLowerCase() figures it out, so would the computer language. You establish rules and stick to them. Will a language which targets the English speaking get all other languages right (the way most speakers of that language would intuit)? Probably not. That doesn't mean that the target audience has to spend thousands of hours chasing down unnecessarily hard to find bugs.
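For what it's worth, String.toLowerCase() with no argument uses the JVM's default locale, so what it "figures out" depends on where the code runs. A small sketch, with a made-up helper name, using the Turkish locale as the usual example:

```java
import java.util.Locale;

class LocaleCase {
    // Lower-cases s under the rules of the given language tag, e.g. "en" or "tr".
    static String lowerIn(String s, String languageTag) {
        return s.toLowerCase(Locale.forLanguageTag(languageTag));
    }

    public static void main(String[] args) {
        // English rules map 'I' to 'i'; Turkish rules map it to the
        // dotless i (U+0131), so the two results are different strings.
        System.out.println(lowerIn("TITLE", "en"));
        System.out.println(lowerIn("TITLE", "tr"));
        System.out.println(lowerIn("TITLE", "en").equals(lowerIn("TITLE", "tr")));
    }
}
```

A case-insensitive language would have to pin down one such rule set for identifiers, whatever the programmer's own locale happens to be.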
For heaven's sake, if you misspelled the case in a program, 99% of the time you'll pick it up in the first compile - problem solved!! Sounds like poor programming discipline and laziness more than anything else.
Case-insensitivity is probably the single most annoying thing with otherwise decent languages such as Scheme. In case-sensitive languages, errors in casing are usually caught by a compiler. With case-insensitive languages it's easy to make errors that are not caught, because the different looking identifiers are actually the same.
My .02 euros. YMMV.
Software should be free as in speech, but if we also get some free beer, all the better.
I agree that one should think carefully before using short variables, but there are a lot of cases where using longer ones just doesn't make sense. Concrete (although trivial) example (in Scheme):
There is simply no reason why this simple function would need a longer variable name than "x". The question is, can I use short variable names, and still write code that others can understand, or should I opt for longer names just for the sake of readability. For a function shorter than say, 5 lines, anybody can read it independent of the length of variable names (as long as you don't deliberately try to fill the lines of course). In fact, it might well be harder to read, were longer names used, because the overall code would be longer. (In Java it often happens that you have to cut a single function call to multiple lines simply because you have to deal with extensively long identifier names. Of course "import" can solve part of this, but then the reader of the code has to know what's imported if you ever have two classes with the same basename, which again is often impossible to prevent, even with base J2SDK.)
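A comparable trivial function, sketched in Java rather than Scheme. The squaring function is an assumption picked for illustration, shown once with a short parameter name and once with a long one:

```java
class Squares {
    // Short name: the whole function fits in one glance.
    static int square(int x) {
        return x * x;
    }

    // Long name: same behaviour, more characters to read.
    static int squareVerbose(int numberToBeSquared) {
        return numberToBeSquared * numberToBeSquared;
    }
}
```

Both versions compute the same thing; the longer name adds length without adding information at this scale.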
I agree though, that "Foobar" and "foobar" could just as well be made something like "foobar" and "foobar_obj" or whatever naming convention one happens to prefer. Experience has shown that a lot of people seem to like the "turn the first character to lower-case" convention.
PS. I realize perfectly well that the larger your codebase, and the more you have people working on it, the more descriptive names you need.
Software should be free as in speech, but if we also get some free beer, all the better.
ASCII has a relatively simple, consistent mapping from lower- to uppercase and vice versa. Letters all have a single upper-lower 1:1 pair, while all others are caseless and upper and lower are the same. It's a simple map.
But ASCII is obsolete. Java doesn't use it, even for variable names. When you go beyond ASCII, you have many different case maps to choose from. Different cultures have different case equivalency rules, some of which are rather complicated. And then, as others have mentioned, there are the coding conventions that make use of casing to make distinctions, but these conventions ride on top of the cultural conventions regarding which distinctions matter. (I don't want to get into the details, but what one man considers a single letter another might consider to be two, for example.)
You can avoid all of these thorny cultural disagreements by either limiting all identifiers to ASCII with its single equivalency map, or by just making the rule that identifiers are the same if and only if they are composed of the same sequence of characters and avoiding the issues of equivalency maps altogether. (Until you get to normalization, which is another can of worms....)
Since Java allows a much wider range of characters in its identifiers than the puny ASCII character set, it chose the second option: ignore equivalency maps and declare that "if it's not the same, it's different".
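The "same sequence of characters" rule, and the normalization can of worms mentioned above, can be sketched with java.text.Normalizer. The class and method names below are made up for the example, and plain strings stand in for identifiers:

```java
import java.text.Normalizer;

class IdentifierEquality {
    // The rule described above: same sequence of characters, or different.
    static boolean sameCharacters(String a, String b) {
        return a.equals(b);
    }

    // What an equivalence-aware comparison might do instead: normalize
    // both names to NFC before comparing them.
    static boolean sameAfterNfc(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String composed = "caf\u00E9";     // e with acute as one character
        String decomposed = "cafe\u0301";  // plain e plus a combining accent

        System.out.println(sameCharacters(composed, decomposed));
        System.out.println(sameAfterNfc(composed, decomposed));
        System.out.println(sameCharacters("id", "ID"));
    }
}
```

The two spellings look identical on screen but are different character sequences, which is the same kind of trap that case equivalence would add on top.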
"Those who have never entered upon scientific pursuits know not a tithe of the poetry by which they are surrounded."
IMHO it helps enforce coding consistency. Very helpful when writing and debugging code. If you wrote a program using Hungarian notation and, say, used: //my Java is rusty
String[] arrMyArray = new String[5];
It is easier if case sensitivity won't let the lazy type arrmyarray[x]. It almost disappears in the text. When debugging and examining code, consistent variable names help you pick stuff out of the fray, visually, easier.
I am a huge fan of languages such as C and Java which are fast, strongly typed, and rigidly enforce consistency in referencing variables.
Slop=hard to maintain.
Imagine reading a book where the case is all whacked. It wouldn't be as easy.
neilio
What if I'm iterating over the elements in a 2D array--say a bitmap? Should I not use x and y for my counters?
Why? Giving temporaries/counters descriptive names makes them look important, when they're generally just a detail. Besides that, in languages like C++ and C99, they're generally declared at the point of their use, so there's no question about what they are doing. Even when they're declared far away, we know that names like i are indices. What does counter tell me that I wouldn't already know?
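For the bitmap case, a minimal Java sketch; the Bitmap class and the row-major int[][] layout are assumptions for illustration:

```java
class Bitmap {
    // Sums every pixel value. x and y are declared where they are used,
    // so their meaning as column and row indices is clear from context.
    static long sumPixels(int[][] pixels) {
        long total = 0;
        for (int y = 0; y < pixels.length; y++) {
            for (int x = 0; x < pixels[y].length; x++) {
                total += pixels[y][x];
            }
        }
        return total;
    }
}
```

Renaming x and y to something like columnIndex and rowIndex would not tell the reader anything the loop headers don't already say.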