Slashdot Mirror


Should "B" be the Same as "b"?

joshua42 asks: "Although having used Linux and FreeBSD for many years, I have yet to come across anyone seriously questioning the traditional UNIX style file system name paradigm. With an Amiga background (It should be the same for people growing up with Windows, or those growing up with no computer at all (God forbid!).) it took me quite a while to get used to 'A' and 'a' being treated as different characters. This is of course fairly easy to accept and to understand if you have a technical background. I do however have a hard time to see how aunt Ginny will ever be able to distinguish between her 'Letter.txt', 'LETTER.TXT' and 'letter.txt' files. In real life, upper and lower case letters represent almost identical information to most people. Has any thought been given to this issue, now that our favorite OS is becoming increasingly mainstream? Does it need to be addressed? Have any attempts been made? What are the implications to parts outside the file systems?" This is an interesting point. As Unix grows more and more popular, the simple things we've taken for granted about the filesystem may stand in the way of general users adopting it. What ways can you think of that will mitigate this problem for new Linux users without actually affecting too much? Special shells for novice users, that can simplify much of the complexity may be the way to go, here.

22 of 330 comments

  1. Well, it's not like the OS chooses case for you.. by dmorin · · Score: 4, Funny
    Correct me if I'm wrong, but doesn't Aunt Ginny have to actually hold the Shift key, or press Caps Lock, in order to get anything beyond "letter.txt"? Therefore, can't it be assumed in the case of Letter.txt that she did it on purpose? Sure, I'll agree that the case of LETTER.TXT is probably a user who put capslock on and forgot about it. But why deny her the ability to express herself the way she intended, if that's what she intended?

    My solution is for the OS to ignore the caps lock key. Not only would it solve the case problem, but it would shut up a whole lot of AOL users.

    :-D

  2. Flame-baitey topic by SN74S181 · · Score: 3, Insightful

    This is a flamebait topic.

    Why don't we ask: "Should the convention for tapping threads in metal be switched to left hand threads by default?"

    Nothing will change as a result of the discussion, and nothing should change. It's the 'simplify UNIX and destroy it in the process' argument all over again.

    Good grief.

    1. Re:Flame-baitey topic by rtaylor · · Score: 3, Insightful

      End result, having l == L isn't right either. You're just going to piss off people in a non-en locale.

      What you really want is Unicode-enabled file naming, with collation patterns applied appropriately per the user's locale.

      --
      Rod Taylor
    2. Re:Flame-baitey topic by Paul Jakma · · Score: 4, Informative

      yes there is a good reason. and it is not about "removing" case-sensitivity, it's about adding it.

      The basic issue is that Unix filesystems at the kernel level do not interpret filenames in /any/ way whatsoever. The only restriction on Unix filenames is that no byte within a name may equal '\0' or '/'. You can put whatever characters or bits you want in a filename as long as no byte equals ASCII \0 or /.
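
      A minimal sketch of that rule in Python, assuming names are handled as raw bytes. `is_valid_name` is a made-up helper for illustration, not a kernel or libc API, and it ignores per-filesystem length limits and the reserved names "." and "..":

```python
def is_valid_name(name: bytes) -> bool:
    """Return True if `name` could be a single Unix path component.

    Hypothetical helper: only the two byte values mentioned above are
    rejected (plus the empty name). No encoding is assumed or checked.
    """
    return len(name) > 0 and b"\0" not in name and b"/" not in name


if __name__ == "__main__":
    candidates = (
        b"Letter.txt",
        "b\u00fccher.txt".encode("utf-8"),  # UTF-8 bytes are just bytes
        b"\xff\xfe",                        # not valid UTF-8, still allowed
        b"a/b",
        b"nul\0byte",
    )
    for candidate in candidates:
        print(candidate, is_valid_name(candidate))
```

      Nothing in the check looks at case or encoding, which is the point being made here.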

      This means no "tolower()" or case comparison overhead in the kernel. No complicated (and perhaps non-obvious) policy in the kernel.

      It also means filename schemes are easily extensible in userspace. Eg, Unix filesystems support Unicode, UTF-8, ISO8859-[0-9][0-9], and whatever other encoding system you want provided you respect '\0' and '/'. In fact Unix supported Unicode, UTF-8, etc.. almost from day one (ie 1970), literally /before/ these 'beyond ASCII' schemes were even invented. Unix filesystems also support many many other future encoding schemes that have not been invented yet. :)

      Basically, tolower() / case comparison can be easily done in userspace - hence that is the best place for it. Now, of course, userspace might not always agree on policy or how to implement it, but that is not a kernel problem.
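
      A rough sketch of what "do it in userspace" could look like, assuming Python and its `str.casefold()` as the folding policy. `find_ignoring_case` is a hypothetical name, and a different program could just as well pick a different policy:

```python
import os
from typing import List


def find_ignoring_case(directory: str, wanted: str) -> List[str]:
    """Return every entry in `directory` whose name equals `wanted`
    once case is folded.

    The filesystem is asked only for its raw listing; the
    case-insensitive policy lives entirely in this function.
    """
    key = wanted.casefold()
    return sorted(
        entry for entry in os.listdir(directory) if entry.casefold() == key
    )


if __name__ == "__main__":
    # Lists whatever in the current directory matches "readme" in any case.
    print(find_ignoring_case(".", "readme"))
```

      On a case-sensitive filesystem the result may contain several names; what to do then is exactly the policy question that stays out of the kernel.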

      Case sensitivity is a matter of taste, and as such it's best not done in the kernel (where it will be set in stone forever). That's actually a general Unix design principle "policy should be implemented in userspace" - and it's actually a very good design principle..

      now let's see how many slashdotters fail to realise it..

      --
      I use Friend/Foe + mod-point modifiers as a karma/reputation system.
  3. Apple.... by jeffy124 · · Score: 4, Informative

    Apple OSX is already case-insensitive in terms of filenames, probably for the reason mentioned. MS Windows/DOS have probably all done that for the exact same reason as well.

    Of course, in OSX this did cause a security hole in Apache, but it was small, required a specific setup, and was easily fixed.

    --
    The One Rule Of Chess You'll Ever Need: Don't play someone who carries a kit in their bookbag.
    1. Re:Apple.... by medcalf · · Score: 5, Informative

      Actually, OS X per se is not this way. The HFS+ filesystem used by OS X is this way. Using UFS on OS X (built-in and easily used if you want to) gives you a case-sensitive, rather than simply case-preserving, filesystem.

      --
      -- Two men say they're Jesus. One of them must be wrong. - Dire Straits
    2. Re:Apple.... by Van Halen · · Score: 3, Insightful
      Yes, the default in OS X is HFS+. Apparently UFS is much slower, and many Carbon applications have problems running from a UFS partition (I use HFS+ but have read this in many user reports). I'm not sure why - OS X handles resource forks on NFS filesystems by creating a ._File.Name file to hold the resource fork. I would assume it does the same for UFS, but I don't know. Or maybe some applications expected to be able to write to the same file using different capitalization. Don't know why they would be so sloppy, though.

      Back to the main topic, I agree with Apple's position on this. I used to wonder why they would put in such a ridiculous limitation when it would be so easy to "fix," but then I thought about it. There is absolutely no reason to have files named "readme.txt" and "Readme.txt" in the same directory. If they are truly different files, then name them differently - ie, "ReadmeFirst.txt" and "ReadmeLast.txt." Capitalization is mostly arbitrary and conveys no information in this context. It can only lead to confusion when a human has to interact with such files. Sort of like George Foreman's house, where his kids are named george, GEORGE, GeorgE, georgE, GeOrGe, etc. What a mess!

      If you're still not convinced, think of all the chaos and confusion that would happen if domain names were case sensitive.
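
      For anyone who wants to check whether a directory listing already contains names that differ only by case, here is a small sketch in Python. `case_collisions` is a hypothetical helper and uses `str.casefold()` as the notion of "same name", which is an assumption and not how HFS+ compares names:

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def case_collisions(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group names that become identical once case is folded.

    Only groups with more than one member are returned.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return {key: found for key, found in groups.items() if len(found) > 1}


if __name__ == "__main__":
    print(case_collisions(["readme.txt", "Readme.txt", "ReadmeFirst.txt"]))
```

      Names such as "ReadmeFirst.txt" stay out of the result because they differ in more than case.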

  4. Sort order by myawn · · Score: 3, Insightful
    'A' is different than 'a'. This isn't unique to computer filenames; we all learned this in the first grade. If there is any confusion, it is probably caused by those other computers that blurred a distinction we were all quite comfortable with.

    What is confusing is that "A" and "a" don't sort next to each other -- so, letter.txt doesn't end up following Letter.txt, but instead is down somewhere past Zebra.jpg. That defies reason; if something is to be fixed, let it be that.

    --
    Subscribers can see articles in the future? So what? Everyone gets to see them in the future.
    1. Re:Sort order by Papineau · · Score: 4, Informative

      That behavior depends on your locale.

      Let's say I have a file with the following (meaningless) unsorted content:
      asklhf
      Adjgd
      zaskd
      Zaoifh


      If I sort it with LC_ALL=posix sort myfile, here's what I get:
      Adjgd
      Zaoifh
      asklhf
      zaskd


      Now, that is exactly the kind of behavior that you dislike.
      Try this (LC_ALL=en_US sort myfile) now:
      Adjgd
      asklhf
      Zaoifh
      zaskd


      Much like you wanted it to be, right? The C locale seems to give the same results as posix, and fr_CA gives the same thing as en_US. I'll leave it to somebody else to explain it by looking in a specific standard, or in the source code.

      So in short, check that your locale is correctly specified, and sort should do what you want it to do. Or, you could just use the --ignore-case option of sort.

      Or were you talking about something system-wide, for ls, file selection boxes, etc.? Then it depends on where the sorting is done, and might be more difficult to fix (since you'll always miss one place).
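
      The same two orderings can be imitated in Python without touching the system locale. This is only a sketch: plain `sorted()` compares code points, and folding case in the sort key is a stand-in for locale collation, not a reproduction of what en_US does:

```python
lines = ["asklhf", "Adjgd", "zaskd", "Zaoifh"]

# Code-point order: every uppercase ASCII letter sorts before every
# lowercase one, like the posix/C result above.
by_code_point = sorted(lines)

# Folding case in the key interleaves upper and lower case, like the
# en_US result above (for this ASCII-only input).
ignoring_case = sorted(lines, key=str.casefold)

print(by_code_point)
print(ignoring_case)
```

      The first list comes out as Adjgd, Zaoifh, asklhf, zaskd and the second as Adjgd, asklhf, Zaoifh, zaskd, matching the two sort runs above.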

    2. Re:Sort order by Narchie Troll · · Score: 4, Funny

      Oh? Well, as a result of a single capitalized "C" in your above post, you just accused another poster of snorting a soft drink. Capitalization does matter. *chuckles*

  5. 'A' is not 'a' by mmynsted · · Score: 3, Interesting

    >Although having used Linux and FreeBSD for many years, I have yet to
    >come across anyone seriously questioning the traditional UNIX style
    >file system name paradigm.

    >With an Amiga background (It should be the same for people growing up
    >with Windows, or those growing up with no computer at all (God
    >forbid!).) it took me quite a while to get used to 'A' and 'a' being
    >treated as different characters. This is of course fairly easy to
    >accept and to understand if you have a technical background.
    >I do however have a hard time to see how aunt Ginny will
    >ever be able to distinguish between her 'Letter.txt', 'LETTER.TXT' and
    >'letter.txt' files.

    Just like how aunt Ginny was likely somehow able to grasp that her
    name is written aunt Ginny and not aunt gInNy, aunt gINNy, or other
    combination. Give her a little credit. Simply explain that the case
    is part of the file name. Your example Letter.txt file names would be
    a perfect way to show her the difference. Just make each contain
    different information, and open each one to show her they are
    different.

    File systems should be case sensitive. An upper case 'A' is a different
    character than a lower case 'a'. We should not confuse people by
    tricking them when they create file names.

    >In real life, upper and lower case letters represent almost identical
    >information to most people.

    Almost, but not identical.

    >Has any thought been given to this issue, now that our favorite OS is
    >becoming increasingly mainstream?
    >
    >Does it need to be addressed?

    No.

    >Have any attempts been made?

    I hope not. Mount a case insensitive file system if you want one.
    Leave existing file systems alone.

    >What are the implications to parts
    >outside the file systems?" This is an interesting point.

    >As Unix
    >grows more and more popular, the simple things we've taken for granted
    >about the filesystem may stand in the way of general users adopting
    >it.

    The sooner people accept that 'Ginny' and 'gInNy' are not the same the
    sooner they will understand how to interact with a computer.

    >What ways can you think of that will mitigate this problem for new
    >Linux users without actually affecting too much? Special shells for
    >novice users, that can simplify much of the complexity may be the way
    >to go, here.

    How about a mouse-click'n GUI like GNOME, KDE, etc.

  6. some rules of English by nwanua · · Score: 3, Informative

    Yes, I know this is English-specific, but perhaps other languages have similar distinctions:

    what's the difference between:

    "I went to school."
    and
    "I went to School." ?

    In the first sentence, school is being used as a regular noun: which school? Who cares? On the other hand, in the second sentence, School is being used as a proper name - there can be only one School.

    In other words, if English speakers can understand the nuance between school and School, then said English speaker (please avoid dissing the US publik skool edukashion sistem) can reasonably be expected to distinguish between letter.txt and Letter.txt (ie. "letter? Which one?" vs. "Letter? ahh yes, THE Letter").

    Anyhoo, an example of a totally confused implementation: Mac OS X: some things understand the difference, some don't:

    ie: /home/Dock and /home/dock go to the same place. Yet do a pwd or try tab completion and it's all confused. (the location in finder is /home/Dock, for clarity). My take on the issue is "I will remember how you named it; just kindly tell me the file you want, exactly how I told you it's called".

    Nwanua.

    ps. if the above is true ONLY for English, all you have to do is politely state that fact, and we'll all be better informed...

  7. Unix has it right by photon317 · · Score: 3, Interesting


    The problem is more complicated than the question makes it out to be. An ideal filesystem should allow any random binary bits to make up a filename, so that filenames can be Unicode, Chinese people can name files in Chinese, and math professors can use the Unicode for a math formula as the name of a document describing how to solve it. When you think in this bigger sense, it becomes a lot harder.

    Ideally the encoding method (Unicode in this example) should provide some way of seeing the equivalency of certain characters (two different representations of the equal sign, two different cases of the letter A, etc..), and the application should be able to make use of this during a regex search, or maybe even during a library wrapped "open() or readdir()" call, where the application is "Windows Explorer", "bash", or anything else.

    Ultimately this has to be resolved in userland tools and the libraries that support them - the best answer for the filesystem layer is to support all possible characters literally and meaningfully in filenames, so as not to restrict the schemes layered on top of it.
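
    As a sketch of what such a userland equivalence check might look like, here is one possible comparison key built from Python's `unicodedata` module. `fold_name` and `names_equivalent` are hypothetical names, and the choice of NFKC plus `casefold()` is an assumption; other tools could reasonably pick a stricter or looser rule:

```python
import unicodedata


def fold_name(name: str) -> str:
    """Hypothetical comparison key for filenames.

    Compatibility-normalize, fold case, then normalize again because
    case folding can leave text in a non-normalized form.
    """
    folded = unicodedata.normalize("NFKC", name).casefold()
    return unicodedata.normalize("NFKC", folded)


def names_equivalent(a: str, b: str) -> bool:
    """Compare two names under the key above; the stored names are untouched."""
    return fold_name(a) == fold_name(b)


if __name__ == "__main__":
    # Precomposed vs. combining accents, and different case.
    print(names_equivalent("R\u00e9sum\u00e9.txt", "Re\u0301sume\u0301.TXT"))
    # Fullwidth equals sign (U+FF1D) vs. the ASCII one.
    print(names_equivalent("x\uff1dy", "x=y"))
```

    The filesystem still stores whatever bytes it was given; only the lookup layer decides that two spellings count as the same name.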

    --
    11*43+456^2
  8. Preserve Case but don't make it case sensitive by topham · · Score: 5, Insightful

    The only reason why Unix is case sensitive is because it was easier, and faster to implement it as such in the early days.

    It is now quite efficient to preserve case without being case-sensitive.

    I used a file system like this (HPFS) for many years and much prefer it over the case-sensitive alternatives.

    It is also a security concern. If I have 2 files, which are identical except for case it is possible I could run the wrong one. Why? Point and Click interfaces barely show a difference between o and O, etc.

    There is also no need for 2 files with the same name, and different case when it comes to SOURCE CODE. I have seen more than 1 program implemented like this and it is downright confusing and stupid. "No no, not ubergeek.c, Ubergeek.c..." etc.

    Garbage. Crap. Total waste of resources.

    I've been working in a database language that is case-insensitive for a number of years as well. It is damn nice to not have to worry about somebody typing something in differently than expected. It isn't a problem. And I don't have to call UPPER every time I do something!

    case-sensitive is a pain in the ass.

    1. Re:Preserve Case but don't make it case sensitive by sigwinch · · Score: 4, Informative
      >The only reason why Unix is case sensitive is because it was easier, and faster to implement it as such in the early days.

      No, it's because case sensitivity is the Right Thing.

      >It is also a security concern. If I have 2 files, which are identical except for case it is possible I could run the wrong one. Why? Point and Click interfaces barely show a difference between o and O, etc.

      If by "Point and Click" you mean "the egregiously bad fonts chosen for Windows", I agree. They have other problems, such as "1Il" (one capital-eye lowercase-ell) and "O0" (oh zero). (Will the real Bruce Perens please stand up? ;-)

      >There is also no need for 2 files with the same name, and different case when it comes to SOURCE CODE. I have seen more than 1 program implemented like this and it is downright confusing and stupid. "No no, not ubergeek.c, Ubergeek.c..." etc.

      On the other hand, it is arguably useful to distinguish between file.c and file.C.

      >I've been working in a database language that is case-insensitive for a number of years as well. It is damn nice to not have to worry about somebody typing something in differently than expected. It isn't a problem. And I don't have to call UPPER every time I do something!

      All computers are not Vaxes. All text is not 7-bit ASCII. For a general purpose Unicode-compatible system, **THERE IS NO WAY TO BE CASE INSENSITIVE**. Period. End of story. No further discussion. How do you handle "Â" versus "â"? Or "Þ" versus "þ"? (Capital thorn versus small thorn.) Or "Æ" versus "aE"? Or similar things for terrorist languages like Arabic and Klingon? Or the Russian letter whose name escapes me that looks exactly like a capital "O" but *isn't*. (That one's good for all sorts of fun.) The answer is that you don't even try. Anything you do is going to break badly, and a system that is randomly broken is less useful than a system that is consistent.
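
      A few of the awkward cases can be poked at from Python, whose string methods follow the Unicode case mappings. This is a sketch of the kind of surprise being described, not a verdict on whether case-insensitivity is achievable; the variable names are made up for illustration:

```python
# German sharp s has no single-character uppercase in the default
# mapping, so upper-casing changes the length of the string.
sharp_s_upper = "\u00df".upper()

# Latin "O" and Cyrillic "О" (U+041E) look alike on screen but are
# different letters, so folding case never makes them equal.
lookalikes_match = "O".casefold() == "\u041e".casefold()

# Capital I with dot above (U+0130) lowercases to two code points.
dotted_i_lower_len = len("\u0130".lower())

print(sharp_s_upper, lookalikes_match, dotted_i_lower_len)
```

      None of this depends on the user's locale, which is a further complication: some languages expect different mappings for the same letters.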
      --
      Kuro5hin.org: where the good times never end. ;-)

  9. Think about it a second by PD · · Score: 4, Interesting

    iF yOU wROTE a lETTER tO yOUR aUNT gINNY lIKE tHIS wOULD sHE nOTICE sOMETHING wRONG wITH iT?

    If you think she would, then she can grasp the concept that case makes a difference. Give her a little credit.

  10. In the meantime... by dar · · Score: 5, Informative

    For bash users: Add the following to the .inputrc in your home dir.

    set completion-ignore-case on

    Then when hitting tab to complete a filename, it will fix the case for you. For example, typing "vi xf8" and pressing tab will get you "vi XF86Config" etc.
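
    The idea behind that setting can be modelled in a few lines of Python. This is a toy sketch only: `complete_ignoring_case` is a hypothetical function and says nothing about how readline actually implements completion:

```python
from typing import Iterable, List


def complete_ignoring_case(prefix: str, names: Iterable[str]) -> List[str]:
    """Return the names that start with `prefix` when case is ignored.

    The returned names keep their real spelling, which is how the
    typed text ends up "fixed" after completion.
    """
    wanted = prefix.casefold()
    return [name for name in names if name.casefold().startswith(wanted)]


if __name__ == "__main__":
    print(complete_ignoring_case("xf8", ["XF86Config", "xinitrc", "fstab"]))
```

    The filesystem stays case-sensitive throughout; only the matching step is relaxed.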

    --
    My other Slashdot ID is much lower.
  11. Linux file system is OBSOLETE by Jouni · · Score: 5, Interesting

    ... and now that I got your attention, let me specify that all other file systems as well are obsolete in the context of the USER INTERFACE.

    Frankly, aunt Ginny should *never* have to deal with files and file names. She should not need to know what a file is, nor choose to "save" or "discard" her work after she has written the letter to her friend Margaret. She does not know her HD from her RAM, and all for the better. She would worry to death over having her letter spun around on a magnetic disc, it would get all jumbled up for sure!

    The file system is an internal, abstract and archaic database that is familiar to programmers and geeks, but a lousy way to represent data for the general user. There are few things worse than navigating a blind hierarchy of unknown folders with no contextual guide to help.

    The system should remember the letter when it is written, keep tabs on when it was written, put the subject in a "recent letters" list and generally manage the internal filing transparent to the user. The storage capacity of a modern computer can last aunt Ginny for years, the real trouble is in FINDING her data, the file names alone do little good for that.

    For a wonderful example of how well you could do without a filesystem, look at the operation of the Palm OS devices. Anyone could learn to use them. No files in sight! It's only recently that the clever engineers at Palm jumped off the deep end by adding a file system for the flash carts. Anyone who has ever used those knows what a nightmare managing them is.

    Aunt Ginny knows fsck all about file systems. Let's keep it that way.

    (Oh, and the answer in the context of user interfaces? Go for the most HUMAN representation. People are not very sensitive at all to upper/lowercase letters. We should not punish them for this.)

    Jouni

    --
    Jouni Mannonen | Game Designer, Consultant
    1. Re:Linux file system is OBSOLETE by Kidbro · · Score: 3, Insightful

      >People are not very sensitive at all to upper/lowercase letters.

      lEt ME asK yOu OnE thiNG jOUNI, arE yOu aBSoLuTElY SURe abOuT thAt?

      I know for sure that I am sensitive about it, and it really gets on my nerves when they're not used properly. Of course, people are different, but most [non 1337 script kiddies] people do care...

  12. Aunt Ginney won't care! by zulux · · Score: 3, Insightful

    It's not like your aunt is going to use Emacs - she'll just point and click with whatever graphical software she is using.

    Leave case sensitivity alone - it's the right thing to do, just hide any ease-of-use problems it may introduce with a GUI.

    That keeps the smart people happy, and the dumb people happy. We're all happy!

    --

    Moneyed corporations, non-working 'poor' and criminal prisoners are turning productive citizens into tax-slaves.

  13. Re:Well, it's not like the OS chooses case for you by Cy Guy · · Score: 3, Informative

    >what is the advantage to having "Aaa.txt" exist in the same directory as "AAA.txt"

    Because the file name Aaa.txt is "0x41 0x61 0x61 0x2e 0x74 0x78 0x74" in hexadecimal which is not the same as "0x41 0x41 0x41 0x2e 0x74 0x78 0x74" the hexadecimal equivalent of "AAA.txt".

    When you get down to the core operation of the kernel, it shouldn't be burdened with having to do conversions of 0x41 to 0x61. If someone writing an application wants to make that distinction, that's fine (and could easily be incorporated into programming libraries), but it shouldn't be the job of the OS.
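
    The byte values above are easy to reproduce. A minimal sketch in Python, assuming ASCII names; `hex_bytes` is just an illustrative helper:

```python
def hex_bytes(name: str) -> str:
    """Render an ASCII filename as space-separated hex byte values."""
    return " ".join(f"0x{byte:02x}" for byte in name.encode("ascii"))


if __name__ == "__main__":
    for name in ("Aaa.txt", "AAA.txt"):
        print(name, "->", hex_bytes(name))
    # Without any folding step, the two names are simply different bytes.
    print("Aaa.txt".encode("ascii") == "AAA.txt".encode("ascii"))
```

    The two names differ in the second and third bytes (0x61 versus 0x41), so a byte-for-byte comparison treats them as unrelated.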

  14. Re:Unix has it (almost) right by gehrehmee · · Score: 3, Insightful
    >white space - Filenames with embedded white space work with most basic system commands, but will break shell scripts that aren't prepared for them (which means most shell scripts).

    Unfortunately, any such script will break on a lot more than just whitespace. Failing to catch this sort of thing is a bug in the script or application, not a shortcoming of the OS or filesystem.
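
    The failure and the fix can be shown with Python's `shlex` module, which follows shell-style quoting rules. This is a sketch of the general problem, with a made-up filename, and not a reproduction of any particular script:

```python
import shlex

filename = "my letter.txt"  # hypothetical name with an embedded space

# Naive word splitting, roughly what an unquoted variable expansion
# does in a shell script, turns one filename into two arguments.
naive_args = f"cat {filename}".split()

# Quoting the name keeps it together as a single argument.
quoted_command = f"cat {shlex.quote(filename)}"
safe_args = shlex.split(quoted_command)

print(naive_args)
print(quoted_command)
print(safe_args)
```

    The naive version ends up with three arguments and the quoted version with two, which is the difference between a script that breaks and one that was prepared for such names.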
    --
    "You know, Hobbes, some days even my lucky rocketship underpants don't help" -- Calvin