Linux/Mac/Windows File Name Friction

Damn this is exciting! by windowpain · 2006-07-10 01:47 · Score: 1, Insightful

Slow news day?

--
Insert witty sig here.

I RTFA by grasshoppa · 2006-07-10 01:53 · Score: 4, Insightful

But in this particular case, the summary has as much meat as the story, with the added benefit of saying it in a paragraph instead of several (and even that's too long).

For those of you who haven't read it, here it is: Windows, Linux and Mac OS X all support long file names, albeit differently. Linux is case sensitive, the others are not.

Tada! Two sentences. I imagine, were I a Perl coder, I could have done it in half of one, but there you go.

--
Mod me down with all of your hatred and your journey towards the dark side will be complete!

Re:spaces bad, special chars bad by Chris+Graham · 2006-07-10 02:00 · Score: 5, Insightful

You have some good points, but I really can't agree with two of your complaints...

"the semantic mapping of the extension to filetype, WTF"

It seems far better to me than mime-types or magic strings. Mime-types fail because they are not actually stored on the filesystem, and magic strings require users to use a hex editor to try and identify an alien file type.

"the case insensitive nature of file names"

Case sensitivity is a big usability issue for people, so burdening the few (the programmers) so that the majority (the users) don't get confused is a fair trade-off, IMHO.

If this had been fark... by Siberwulf · 2006-07-10 02:02 · Score: 1, Insightful

There would be about 40 "Why the hell was this greenlit".

That aside, isn't Slashdot "News for Nerds" ????

How about posting more info about why WinFS was scrapped, rather than how Windows/Linux/Mac have been for the past X years.

Bring on the modding, my karma can take it.

Bah - OS Vendor support of long filenames by Rauser · 2006-07-10 02:02 · Score: 5, Insightful

So, your OS supports long filenames, huh? Then why doesn't the vendor use them for all the cryptically named shared libraries, scripts, etc. that clutter up any modern OS system directory?

The way I look at it, the day I look at something like "d3d8.dll" or whatever drek is fermenting in \WINDOWS32\ and it is actually named with a descriptive filename, then that OS will truly support long filenames.

Not sure how the Linux crowd compares, but OS X is getting better with each revision. Classic Mac OS had this one down (mostly) cold.

--
The white zone is for loading and unloading only. If you need to load or unload go to the white zone. It's a way of life

Article is incorrect by joshv · 2006-07-10 02:03 · Score: 5, Insightful

From the wikipedia entry on NTFS:

"Though the file system supports paths up to ca. 32,000 Unicode characters with each path component (directory or filename) up to 255 characters long, certain names are unusable, since NTFS stores its metadata in regular (albeit hidden and for the most part inaccessible) files; accordingly, user files cannot use these names."

The article incorrectly states "Windows file names can be up to 255 characters, but that includes the full path. A lot characters are wasted if the default storage location is used: "C:\Documents and Settings\USER\My Documents\"." I will grant that this may have been a limitation in the past, but XP has had NTFS from the start, and NTFS is by far the most common Windows file system today.

Amiga by Dan+East · 2006-07-10 02:07 · Score: 2, Insightful

Amiga has had long filename support since it was first released in 1985.

Dan East

--
Better known as 318230.

Re:c:\progra~1\Micros~1\Powerp~1 by stupidfoo · 2006-07-10 02:09 · Score: 4, Insightful

The problem with that is that it goes to the first one in alphabetical order. So if you had c:\program files\Microsaucer and c:\program files\Microsoft it will go to microsaucer.

Re:NTFS WTF? by fruity_pebbles · 2006-07-10 02:12 · Score: 1, Insightful

NTFS paths can be up to 32767 characters. Or so I read - I'm too lazy to try it myself.

Re:Long filename horror story by mausmalone · 2006-07-10 02:24 · Score: 5, Insightful

Perhaps that's a bit long of a file name, but it's at least descriptive. I can't tell you how many times I've gotten files titled Agenda 01.doc when they should be more like Tech Committee Agenda 2006-05-01.doc -- it's not excessively long, but with a file name like that I know EXACTLY what's in that file.

--
-=-=-=-=-=
I'd rather be flamed than ignored.

Re:Any way to turn off Joliet support in Windows X by GotenXiao · 2006-07-10 02:25 · Score: 3, Insightful

Well, if MS hadn't decided to use \ as a directory separator, you could just backslash it. But no. MS had to be retarded.

--
Goten Xiao

Re:spaces bad, special chars bad by Tom · 2006-07-10 02:49 · Score: 4, Insightful

> It seems far better to me than mime-types or magic strings.

Seems, yes. Is? No way in hell.

The problem is that extensions are part of the filename, i.e. they are arbitrary. Mapping arbitrary data to meta information is stupid at best, dangerous usually and in combination with hidden extensions and automatic execution it is a blatant disregard of even the most basic security procedures.

aka "lookhereiamcertainlynotavirus.jpg.exe"
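To make the point concrete, here is a rough Python sketch of the kind of check a mail filter might run on a name like that one. The function name and both extension sets are made up for illustration; they are not taken from any real product, and a name-based check like this is only a stopgap for the underlying problem.

```python
from pathlib import PurePath

# Hypothetical lists for illustration only; a real filter would need far more.
EXECUTABLE_EXTENSIONS = {".exe", ".scr", ".com", ".bat", ".pif"}
HARMLESS_LOOKING_EXTENSIONS = {".jpg", ".gif", ".txt", ".doc", ".mp3"}


def looks_disguised(filename):
    """Return True if an executable extension hides behind a harmless one."""
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    if len(suffixes) < 2:
        return False
    return (suffixes[-1] in EXECUTABLE_EXTENSIONS
            and suffixes[-2] in HARMLESS_LOOKING_EXTENSIONS)


print(looks_disguised("lookhereiamcertainlynotavirus.jpg.exe"))  # True
print(looks_disguised("holiday.jpg"))                            # False
```

With hidden extensions switched on, the first name would be displayed as "lookhereiamcertainlynotavirus.jpg", which is exactly the trap.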

--
Assorted stuff I do sometimes: Lemuria.org

rsync by gatzke · 2006-07-10 02:50 · Score: 3, Insightful

I just hit the file name issue trying to sync some stuff between unix / Windows XP using rsync.

The case insensitivity was annoying and the limited char set on XP was no good.

Again, you would think they would have fixed this on XP.
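For anyone about to try the same sync, here is a small Python sketch of a pre-flight check. It assumes the usual Windows rules (names compared without regard to case, and the characters < > : " / \ | ? * not allowed in names). The function name is my own, lower() is only an approximation of how Windows folds case, and other restrictions such as reserved device names are ignored.

```python
# Characters Windows does not allow in a file name.
WINDOWS_RESERVED = set('<>:"/\\|?*')


def find_problems(names):
    """List (name, reason) pairs for names likely to cause trouble on Windows."""
    seen = {}      # lowercased name -> first spelling encountered
    problems = []
    for name in names:
        bad = sorted(set(name) & WINDOWS_RESERVED)
        if bad:
            problems.append((name, "reserved characters: " + "".join(bad)))
        key = name.lower()
        if key in seen:
            problems.append((name, "case-insensitive clash with " + seen[key]))
        else:
            seen[key] = name
    return problems


for name, reason in find_problems(["README", "readme", "a:b.txt"]):
    print(name, "->", reason)
```

Running it over a directory listing before the sync at least tells you which files will overwrite each other or fail to copy.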

Re:Unusual characters in filenames by MS-06FZ · 2006-07-10 03:17 · Score: 4, Insightful

Or perhaps more simply:

rm ./-annoying_file

--
---GEC
I'm but the humble pupil, seeking to snatch the scratchbuilt pebble from the master's fully articulated hand

Re:Any way to turn off Joliet support in Windows X by Anonymous Coward · 2006-07-10 03:22 · Score: 1, Insightful

> If I can turn off Joliet comprehension I'll have access to the files in their original ISO9660 8.3 glory.

Use IsoBuster. (The program is shareware, but has a free mode with fewer functions. The free mode is enough to access the 8.3 files.)

colon in Mac OS X file names by pikine · 2006-07-10 03:29 · Score: 3, Insightful

OS X supports up to 255 characters and can use the same characters as Linux, except for a colon (:).

In Terminal.app, you can create file names with a colon, but that character is mapped to a forward slash when seen in Finder. On the other hand, you can use a forward slash in Finder, and it is mapped to a colon on the command line.

Historically, Mac OSes use colon to separate folder names in a path.

There is a subtle restriction in HFS+. All files in HFS+ have their names stored in normalized Unicode, and in order to normalize in the first place, file names must be valid UTF-8. You cannot use an arbitrary byte string as a file name.

There is no such restriction for UFS on Mac OS X. I think UFS supports roughly the same characters as in BSD and Linux and any other Unices. If you're transferring files from Linux with names in a legacy encoding, you can create a UFS disk image and convert file names to UTF-8 before copying them to HFS+.
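A short Python sketch of the three ideas above: what normalization does to a name, why an arbitrary byte string gets rejected, and the conversion step before copying. The helper names are mine, Latin-1 is only an assumed legacy encoding, and standard NFD stands in for the decomposed form HFS+ uses, which as far as I know differs from it in some details.

```python
import unicodedata


def to_decomposed(name):
    """Decompose accented characters, roughly what a normalizing file system does."""
    return unicodedata.normalize("NFD", name)


def is_valid_utf8(raw):
    """Return True if the bytes can be decoded as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def legacy_to_utf8(raw, legacy_encoding="latin-1"):
    """Re-encode a file name from an assumed legacy encoding to UTF-8."""
    return raw.decode(legacy_encoding).encode("utf-8")


legacy_name = b"caf\xe9"                    # "café" in Latin-1
print(is_valid_utf8(legacy_name))           # False
print(is_valid_utf8(legacy_to_utf8(legacy_name)))  # True
print(len(to_decomposed("\u00e9")))         # 2: "e" plus a combining accent
```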

--
I once had a signature.

International support by b1t+r0t · 2006-07-10 03:34 · Score: 2, Insightful

There's a whole new dimension of fun when your file names include non-Roman characters, such as Japanese.

First of all, there is the matter of which encoding the file names are in. Lots of Japanese Windows installs and their utilities still use Shift-JIS for file names. OS X, on the other hand, uses Unicode, and typically expects UTF-8 for file names from programs. In fact, it not only expects it, it enforces it, returning an error when attempting to use a file name which is invalid UTF-8.

Many command utilities that deal with archive files utterly fail on OS X when given archives using Shift-JIS file names, and many others improperly translate them as 8-bit ISO Latin-1. A few (such as the command line RAR archiver) are actually smart enough to make a system call to translate the file name from Shift-JIS to UTF-8.

And then there is the issue of Shift-JIS MP3 tags. If you open those with iTunes, not only do they get interpreted as ISO Latin-1, but irreversibly so if you do something that writes them back to the .mp3 file. (They get written back as a UTF-8 representation of the ISO Latin-1.) I've had luck in the past using a hex editor and SimpleText in Classic to convert them with much work, but I'm not sure what I'll do with the new Intel Macs that don't support Classic.
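In principle the damage can be undone in software, provided the misreading really was a plain ISO Latin-1 decode and no bytes were dropped along the way. Here is a minimal Python sketch under that assumption; the function names are hypothetical, and this only works on the text once you have extracted it from the tag, it does not parse MP3 files.

```python
def misread_as_latin1(raw):
    """Simulate a program that treats Shift-JIS bytes as Latin-1 text."""
    return raw.decode("latin-1")


def undo_misreading(garbled, real_encoding="shift_jis"):
    """Turn the garbled text back into bytes, then decode them properly."""
    return garbled.encode("latin-1").decode(real_encoding)


title = "\u65e5\u672c\u8a9e"              # three kanji, written as escapes
raw_tag = title.encode("shift_jis")       # what the tag originally contained
garbled = misread_as_latin1(raw_tag)      # what the misreading program shows
written_back = garbled.encode("utf-8")    # what ends up in the file afterwards

recovered = undo_misreading(written_back.decode("utf-8"))
print(recovered == title)                 # True
```

If the program used some other 8-bit character set instead of true Latin-1, the same trick applies with that codec's name, as long as every byte value maps to something.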

--
"Open source is good." - Steve Jobs
"Open source is evil." - Microsoft

Think about it... by ratboy666 · 2006-07-10 04:26 · Score: 2, Insightful

The purpose of the "OS" (it's actually not the OS here, but let's use that term to make the following discussion clear) is to provide the set of tools needed to implement your "paradigm" (again, not true, but it will do).

Your way of thinking.

As it turns out, having multiple "files" composing a "document" is easily mapped in a hierarchical layout. As a simple idea, put all the files into a node and call that node the name of the document.

The "OS" should not impose upon the applications, but should provide ready services that map well into what the application(s) want.

Unix further provides "hard" and "soft" links to allow you to do (for example) sharing. Say you have a boilerplate logo image: it can be hard linked into your documents.

"Random" (I do not think you really want random) can be accomplished with soft links.

Content searching? Either "find" or "grep" will do (ok, for up to several hundred megabytes of content -- and if you have hit THAT limit, let me know -- it's a separate discussion).

You will have noticed that I have (so far) eschewed GUI tools in this discussion. The blatant omission of find/grep and other tools has mystified me. Either it is hard to do (semantic mapping of symbolic language to pictures) -- which is true, or the GUI designers are deliberately dumbing things down.

It would be nice to have an "assembler" in file open boxes: I would like to be able to say "Please open a file containing project in the name, whose contents include September 10, which was last modified in 2002, of type ASCII text".

Now, all the tools to do this are included in the "CLI" interface: find, grep, ls, file. But, when we hit the GUI, these tools vanish. "NO SYMBOLIC REASONING FOR YOU - STICK TO THE CONCRETE" is the slogan.
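As a rough sketch of how little it takes, here is that exact query in Python rather than a find/grep pipeline. The function and parameter names are mine, and "of type ASCII text" is approximated by simply trying to read the file as ASCII, which is cruder than what "file" does.

```python
from datetime import datetime
from pathlib import Path


def find_files(root, name_part, text, year):
    """Files under root whose name contains name_part, whose ASCII
    contents contain text, and which were last modified in year."""
    matches = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or name_part not in path.name:
            continue
        if datetime.fromtimestamp(path.stat().st_mtime).year != year:
            continue
        try:
            content = path.read_text(encoding="ascii")
        except (UnicodeDecodeError, OSError):
            continue  # not ASCII text, or not readable
        if text in content:
            matches.append(path)
    return matches


# Example call (the directory is whatever you want searched):
# find_files(".", "project", "September 10", 2002)
```

Nothing here is beyond what a file open box could offer; it is four filters chained together.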

Since the "classic" Unix GUI is basically X supporting XTERM, which in turn launches applications, the CLI is still there. But in modern Linux, Unix, and OS X environments, most users are never exposed to it, and in turn call for more "paradigms" to be created.

And, this is HARD. Witness Microsoft's failure with "WinFS". Witness that the largest jump has been to Plan9, which extends the Unix way (by putting more stuff under this control). Witness the success of mapping things like "/proc". There really HASN'T been a new "paradigm" that offers more.

The problem is that trying to utilize the filesystem is lost in the GUI translation. Apple indexes files, by content, for GUI consumption. This is NOT a new breakthrough -- Unix has had "locate" which is most of the way there for ages. Indexing by content? Again, not new. Merging these ideas is fair, and I wish that Apple had based the kit on CLI for maximum portability.

Ratboy

--
Just another "Cubible(sic) Joe" 2 17 3061

Re:c:\progra~1\Micros~1\Powerp~1 by RetroGeek · 2006-07-10 04:34 · Score: 2, Insightful

This happened because backup processes the files in alpha order by their long file name. And Windows does not allow a program to specify the short file name.

So "Program Files" becomes progra~1 and "Program Access" becomes progra~2.

"Program Access" gets backed up first with "Program Files" backed up next.

Upon restore, "Program Access" is first, and thus becomes progra~1, then "Program Files" which becomes progra~2.

Oops!

And many MANY applications still use 8.3 to locate files, including many MS applications.

So if you are going to do file backups, you should really do disk images, as file by file backups may cause serious system failure upon restore.

Disclaimer: AFAIK, this was the situation up to Win2K. It may have been resolved since then.
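The renumbering is easy to reproduce with a toy model in Python. This is a deliberately simplified sketch, not the real Windows algorithm: it assumes the alias is just the first six characters (spaces dropped, upper-cased) plus ~N, with N handed out in creation order.

```python
def assign_short_names(long_names_in_creation_order):
    """Toy model of 8.3 alias assignment: first come, first numbered."""
    counters = {}   # six-character stem -> how many aliases handed out so far
    aliases = {}
    for name in long_names_in_creation_order:
        stem = name.replace(" ", "").upper()[:6]
        counters[stem] = counters.get(stem, 0) + 1
        aliases[name] = "%s~%d" % (stem, counters[stem])
    return aliases


# Original install: "Program Files" was created first.
original = assign_short_names(["Program Files", "Program Access"])

# Restore from backup: directories are recreated in alphabetical order.
restored = assign_short_names(sorted(["Program Files", "Program Access"]))

print(original["Program Files"])   # PROGRA~1
print(restored["Program Files"])   # PROGRA~2
```

Anything that had the path progra~1 hard-coded now points at the wrong directory.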

--

- - - - - - - - - - -
I am a programmer. I am paid to produce syntax not grammar. Deal with it.

Re:spaces bad, special chars bad by Mattintosh · 2006-07-10 07:23 · Score: 2, Insightful

Magic strings are the "right way", or at least close to it.

Have you ever looked at the first 4 bytes of a Java .class file? It's CA FE BA BE. Guess what... even if it somehow gets named foobar.OMGWTFIsThisFileType, the JVM can still pick it out as a Java bytecode file. Why? How? All Java bytecode files always start with CAFEBABE. If it starts with CAFEBABE, the JVM can semi-safely assume that this is a valid bytecode file. But... what if some other file "collides" with that signature?
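In Python the whole check is a four-byte comparison. A minimal sketch, with a function name of my own choosing; note that the collision is not hypothetical, since Mach-O universal ("fat") binaries begin with the same four bytes, so a real tool has to look further into the header.

```python
JAVA_CLASS_MAGIC = bytes.fromhex("CAFEBABE")


def looks_like_java_class(data):
    """Return True if the data starts with the Java class file signature."""
    return data[:4] == JAVA_CLASS_MAGIC


# The file name never enters into it; only the leading bytes matter.
print(looks_like_java_class(bytes.fromhex("CAFEBABE00000034")))  # True
print(looks_like_java_class(b"GIF89a"))                          # False
```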

All Mac files (until 2001, with the introduction of OS X) had "type and creator codes" in a "resource fork". Now, if you just flatten the resource fork into a data header format, you suddenly have a standardized file header with type and creator metadata in it. But what if your FS doesn't support "resource forks"?

Do things the "Lisa" way. Apparently, the Apple Lisa had some sort of database-like file system (in 1983 - WinFS, eat your heart out) that would assign a file ID, but display a filename, and track a dozen or so metadata fields per file. So in the UI, in any given container, you could have 10 files named identically, but they wouldn't conflict with each other. They would also have a full complement of needed metadata maintained within their file-system wrapper (essentially, their file headers). This was arguably the "next level of file system" after the Unix-style hierarchy (and its timing was appropriate in 1983). It's similar to how MP3 files have ID3 tags embedded in them. Those ID3 tags are just metadata values. If every file in the file system had a minimum "simple set" of metadata tags in the header information, this would work beautifully. Someone should make a general standard for this sort of thing and write support for it into Linux. Apple could probably be persuaded to support it (especially if you allowed them to put their name on the standardization effort). Then MS would probably jump on the bandwagon and say they invented it. Let them (who cares? All we want is decent file metadata).

Your argument about file extensions, though, is not only naive, it's also incorrect. Extensions are part of the filename. I, as a user, can meddle with them in the same manner as the rest of a filename. They are completely arbitrary and can be removed entirely if I so choose. They are not metadata. And if I wanted to pack up a binary file with its own headers or signature and send it over a network, it would work perfectly fine. And if I were to design a file system that would work over a network, I would make the file header format a standard. And it would be every bit as reliable as any other system, except when data got to the recipient it would be guaranteed to have its metadata, rather than an arbitrarily-modified filename that may have lost its file type in the transfer.

I will agree with you about mime-types. Mime-types, as you note, are not reliable because they aren't stored with the file. They're more of a way for a server to tell a client what they're downloading before they download it. They work well for that, but are certainly not a good way of defining file metadata.

Wrong about utf-8 by spitzak · 2006-07-10 09:13 · Score: 2, Insightful

Despite your naive assumption that something with "16" in its name is better than something with "8", the facts are that UTF-16 cannot handle as many characters:

UTF-16 as originally designed handles 0xffff characters.

Because that was not enough characters, UTF-16 was modified to have "surrogate pairs". It is usually claimed to now handle 0x10ffff characters, but in fact that fails to subtract the surrogate half-characters (0x800). Also this deleted the only plausible claim that UTF-16 is better than UTF-8, namely that characters are all the same number of bytes long (it is in fact worse, because the variable-length characters are much more rare, so bugs in handling them are much less likely to be detected and thus more catastrophic when they do happen).

UTF-8 as originally designed handles 0x7fffffff characters.

Because of the UTF-16 braindamage, the standards for UTF-8 were modified to say that all encodings after 0x10ffff are illegal, so literally UTF-8 was downgraded to match UTF-16. It still is false to say that you can losslessly translate from UTF-8 to UTF-16, due to the surrogate pairs, so they are not equivalent even with this limitation.
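The arithmetic, for anyone who wants to check it, as a short Python sketch. The function names are mine; the constants are the standard surrogate ranges (0xD800 to 0xDBFF for the first half, 0xDC00 to 0xDFFF for the second).

```python
UNICODE_LIMIT = 0x110000        # code points 0x0 through 0x10ffff
SURROGATES = 0xE000 - 0xD800    # 0x800 values reserved for surrogate halves


def encodable_characters():
    """Code points UTF-16 can represent once surrogates are subtracted."""
    return UNICODE_LIMIT - SURROGATES


def to_surrogate_pair(code_point):
    """Split a code point above 0xffff into its two UTF-16 halves."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above 0xffff need a pair")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


print(hex(encodable_characters()))
print([hex(half) for half in to_surrogate_pair(0x1F600)])
```

The second function is where the rarely exercised variable-length path lives: any code that indexes UTF-16 by 16-bit unit will cut such a pair in half.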

The one positive benefit of the "Unix wars" was that it stopped a whole lot of politically-correct idiots from forcing "wide characters" on everybody, and thus Plan9's UTF-8 could take hold. Unfortunately Microsoft completely ignored all the proof that wide characters were a very bad idea and went and did it themselves in Windows. Still not as bad as if Unix had done it too...

well..... by ce33na66 · 2006-07-10 14:26 · Score: 2, Insightful

Long file names in Windows are kinda hokey. If you are at the command line, then you are stuck with the 8.3 format. Ending a directory name in "~1" is not my idea of long file name support.
