Linux/Mac/Windows File Name Friction
lessthan0 writes "In 1995, Microsoft added long file name support to Windows, allowing more descriptive names than the limited 8.3 DOS format. Mac users scoffed, having had long file names for a decade and because Windows still stored a DOS file name in the background. Linux was born with long file name support four years before it showed up in Windows. Today, long file names are well supported by all three operating systems though key differences remain. "
Long filenames aren't all they are cracked up to be. I got made fun of once for using one. I can remember it so clearly now, we were in music theory class in high school and we had to use Finale on a Mac (OS 7 at the time) for our composition projects. I named one of my projects something like "Suso's Music Theory assignment number 4 for Mr. Becker 1993-9-24.mus" and saved it. A week later I was on the same Mac and noticed a file that wasn't mine called "Making fun of people who use really long filenames for their music theory assignments.mus". Nobody was admit to doing it but I knew who it was. I was devastated and never felt comfortable again in that class.
Now I'm scarred for life. I should have listened to my parents and gone with 8.3.
But in this particular case, the summary has as much meat as the story, with the added benefit of saying it in a paragraph instead of several ( and even that's too long ).
For those of you who haven't read it, here it is: Windows, Linux and Mac OS X all support long file names, albeit differently. Linux is case sensitive, the others are not.
Tada! Two sentances. I imagine, were I a perl coder, I could have done it in half of one, but there you go.
Mod me down with all of your hatred and your journey towards the dark side will be complete!
Why are computer file names and conventions and protocols so messed up? It's bizarre -- and Microsoft has been one of the worst offenders with one of the most powerful positions and opportunities to make it a better filename-naming world.
I had worked in the DOS world long ago, and I'd always been frustrated with not only the restriction of the 8.3 naming convention, but the added imposition of:
Many years later, I had opportunity to consult in the Windows/DOS world after having worked in the Unix world for over a decade -- figured Microsoft had had enough time and money to work out the kinks in what had obviously been an early-technology constraint for the brain dead old DOS naming restrictions. Not. Sigh.
And then the transition was a nightmare, whoever conjured up the VFAT naming format and the "tilde" mapping backwards compatibility to FAT names should have been shot. A golden opportunity lost.
And then everything swings completely the other direction where anything goes. This may curry favor with users, but wreaks havoc on billions of lines of code which all of a sudden choke on what had been simple parsing routines -- fixable, but at great expense. I still think this was a paradigm shift that somehow could have accommodated the user space/community but still allowed some sanity in the machine world.
But layered on, or dovetailed into that quagmire is the Microsoft insistence they "know better than thou"... and the condescending insistence of dragging the ".3" extension nightmare into the new rules for file naming. Would have been okay to "allow" ".3" naming, but to impose the bizarre rules and behaviors Microsoft has? (How many of you have files named picture.jpg.jpg.jpg out there?)
Options to show extension, defaults to hide extensions, and continued reliance and semantics applied to extensions continue to make the filenaming world a landmine field.
And, Microsoft dares to allow mixed case naming, but does case insensitive handling of file names... don't even get me started about some of the bizarre results and buggy behavior I've traced to that. I only wish I'd had a chargeback code for all of the time I've spent fixing and debugging systems that all come back to the file naming. Sigh, again.
All of this isn't to let Unix and Unix style file naming skate. I've had problems, though fewer, there. But, at least it's seemingly (to me) more consistent and predictable, though there has been what I call "Windows" creep in that there have appeared some apps that somehow think managing and imposing "transparently" the extension to "file type" mapping is a good thing (it's not).
(One of the funniest Unix debacles I experienced was debugging a groups application -- they were moving files around and losing all but one each processing cycle... turned out they were remote copying from one Unix that had 14 (or more, can't remember) char limit on file names to an old SunOS system that allowed only 11. The remote copy that moved files from one system to the other for subsequent processing did so without complaint, the receiving side silently truncated the incoming files -- which were identical in name through 11 chars... essentially copying the incoming files over and over again on top of the same file... Sigh and sheesh!)
Too bad the article didn't mention what happens when you copy long filenames over the network. All kinds of crazy things happen in all kinds of client/server combinations.
Try copying a 40 character file from a windows server to a OSX client. What happens? Well... it depends if you used appletalk or SMB to connect with.
What about OSX server -> a windows client... depends on the version of windows AND OSX of course.
I've had nightmares.
Wanna be safe most of the time:
1) No spaces
2) Under 32 character filenames
3) Alphanumeric, underscore, period, or hyphen ONLY
4) Only a single period allowed.
I've got a CD-ROM that is unreadable under Windows XP because a Mac user put the files in a directory containing a '>' character.
If I can turn off Joliet comprehension I'll have access to the files in their original ISO9660 8.3 glory.
It's unfortunate that Microsoft's Joliet driver doesn't realize it's presenting names the OS can't tolerate. Otherwise it could replace the forbidden characters with % escapes before returning them to the OS. Or, alternately, handing the ISO9660 name to the OS if the Joliet name was forbidden by Windows' rules.
Did you somehow miss the link? It basically said to remove files with a preceding '-' (-filename) you do 'rm -- -filename' or 'rm ./-filename'. And to remove a file with unprintable characters try 'rm file?with?unprintable?characters'.
It's like sex, except I'm having it!
So, your OS supports long filenames, huh? Then why doesn't the vendor use them for all the cryptically named shared libraries, scripts, etc. that clutter up any modern os system directory?
They way I look at it, the day I look at something like "d3d8.dll" or whatever drek is fermenting in \WINDOWS32\ and it is actually named with a descriptive filename, then that OS will truly support long filenames.
Not sure where the Linux crown compares, but OS X is getting better with each revision. Classic Mac OS had this one down (mostly) cold.
The white zone is for loading and unloading only. If you need to load or unload go to the white zone. It's a way of life
Why not simply follow the POSIX standard*? You can avoid a lot of hassles that way. Isn't that why we have standards?? I know, it doesn't resolve the conflict with Windows case "insensitivity", but ... it does provide interoperability between POSIX-compliant OSes.
* upper/lower case alphabetic characters, numeric digits, underscore, dash, and period.
From the wikipedia entry on NTFS:
"Though the file system supports paths up to ca. 32,000 Unicode characters with each path component (directory or filename) up to 255 characters long, certain names are unusable, since NTFS stores its metadata in regular (albeit hidden and for the most part inaccessible) files; accordingly, user files cannot use these names."
The article incorrectly states "Windows file names can be up to 255 characters, but that includes the full path. A lot characters are wasted if the default storage location is used: "C:\Documents and Settings\USER\My Documents\"." I will grant that this may have been a limitation in the past, but XP has had NTFS from the start, and NTFS is by far the most common windows FS today.
Of course - this is a feature, not a bug.
Henderson_Presentation_2005.doc is HENDER~1.doc,
Henderson_Presentation_2006.doc is HENDER~2.doc,
Henderson_Presentation_2006 (unedited).doc is HENDER~3.doc.
Clearly, we are reaping the benefits of a well-thought-out platform here.
The problem with that is that it goes to the first one in alphabetical order. So if you had c:\program files\Microsaucer and c:\program files\Microsoft it will go to microsaucer.
C:ONGRTLNS.W95
If Jesus wants me it knows where to find me.
That is what the -- option is for. It signifies that there will be no further options, so anything following it that starts with '-' will be interpreted as a filename. rm -- -funny-named-file will do the trick.
Bogtha Bogtha Bogtha
Well, if MS hadn't decided to use \ as a directory separator, you could just backslash it. But no. MS had to be retarded.
Goten Xiao
cmd.exe's completion is a bit strange if you're used to bash, but once you get to know it, you can get around quite well.
Example:
Before Windows XP, you had to activate the tab character by changing a registry key. XP has this set by default.
WWTTD?
I just hit the file name issue trying to sync some stuff between unix / Windows XP using rsync.
The case insensitivity was annoying and the limited char set on XP was no good.
Again, you would think they would have fixed this on XP.
I want documents not files. Sometimes multiple files make up one document (webpage + stylesheet + media), sometimes there are multiple documents in one file (zip).
When will anyone come up with a persistant storage system which allows me to make random tags to documents and groups of documents. Drop the folders and give me 'search queries' on content and tags. Automatically save all data and don't bother me with giving it a name... When it's important I will give it the proper tags until then just remember it for me.
Do I have to name the paper before printing?
What I cannot create, I do not understand
Or perhaps more simply:
./-annoying_file
rm
---GEC
I'm but the humble pupil, seeking to snatch the scratchbuilt pebble from the master's fully articulated hand
In Terminal.app, you can create file names with colon, but such character is mapped to a forward slash when seen in Finder. On the other hand, you can use forward slash in Finder, and it is mapped to a colon in the command line.
Historically, Mac OSes use colon to separate folder names in a path.
There is a subtle restriction in HFS+. All files in HFS+ have their names in normalized unicode, and in order to normalize in the first place, file names must be in valid UTF-8 encoding. You cannot use random character string for file names.
There is no such restriction for UFS on Mac OS X. I think UFS supports roughly the same characters as in BSD and Linux and any other Unices. If you're transferring files from Linux with names in a legacy encoding, you can create a UFS disk image and convert file names to UTF-8 before copying them to HFS+.
I once had a signature.
I know this might sound a bit offtopic, but since the post mentioned windows filesystems, I felt it might be a good place to throw this question...
Not many people know or have even used this, but NTFS has support for multiple streams of data in a single file, which is something that borrows concepts from object-oriented filesystems. This is scarcely, if at all, included in the regular windows documentation (it is documented in the MS knowledge base http://support.microsoft.com/kb/105763/). I thought it to be a nice idea for, say, media files, to store the audio in one stream the the video in another, or adding subtitles or metainformation in different streams in a very standard way. But for some reason nobody used that, not even Microsoft who designed the feature.
Does anybody have a clue as to why this has not been used?
www.meneguzzi.eu/felipe
My favorite shell-expansion moment: when I was a new Unix user long, long ago (freshly coming over from VMS), I wanted to remove one funny-named file in a directory. I discovered that rm had this cool switch "-i" that would prompt for removal on each file. Great! I'd just say "yes" to the file named *, or whatever I'd accidentally created. So, being a VMS user (and thus used to switches that went anywhere on the command line), I typed this:
$ rm * -i
Have you read my blog lately?
Bring on the modding, my karma can take it.
Ah the magic words for +5
From here:
Use Short File Names
Every time you create a file with a long file name, NTFS creates a second file entry that has a similar 8.3 short file name. A file with an 8.3 short file name has a file name containing 1 to 8 characters and a file name extension containing 1 to 3 characters. The file name and file name extension are separated by a period.
If you have a large number of files (300,000 or more) in a folder, and the files have long file names with the same initial characters, the time required to create the files increases. The increase occurs because NTFS bases the short file name on the first six characters of the long file name. In folders with more than 300,000 files, the short file names start to conflict after NTFS uses all the 8.3 names that are similar to the long file names. Repeated conflicts between a generated short file name and existing short file names cause NTFS to regenerate the short file name from 6 to 8 times.
To reduce the time required to create files, use the fsutil behavior set disable8dot3 command to disable the creation of 8.3 short file names. (You must restart your computer for this setting to take effect.) For more information about disabling 8.3 short file names, see "MS-DOS-Readable File Names on NTFS Volumes" later in this chapter.
If you want NTFS to generate 8.3 names, improve performance by using a naming scheme in which long file names differ at the beginning of the name instead of at the end.
For more information about short file names, see "File Names in Windows XP Professional" later in this chapter.
liqbase
...you suck at scripting.
...
Typical shell scripting idioms like:
for $each in *glob.pattern* ; do
command "$each"
mv "$each" "$(echo \"$each\" | sed -e s/stuff/replace)"
done
The only extra quoting necessary is in commands with variable substitution. And (while it may seem confusing), that syntax works even when the filenames have quotes internally. The double quotes identifies the contents to be treated as a single token with interpolation to be performed before passing on to ``command'', which is what you wanted.
Also the $() syntax is your friend. But remember to give it ""s too, you don't want it to expand it AND THEN tokenize it.
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON