The Mac, Metadata, and the World
Rick Zeman writes: "ArsTechnica has posted yet another compelling article, this time on metadata, its history and the future of metadata storage as seemingly indicated by Apple in OS X. Extensions==Bad!"
So far, there are at least 3 fallacies in the "Fundamentals" section:
:-).
1) A file's size is not metadata: A file can best be defined as an ordered set of bytes (or bits, or words, or whatever atomic unit your system uses), and the size of that set is intrinsic to it, not external.
2) A file's modification time is conceptually unrelated to its contents. For example, most systems consider a file "modified" even when its contents are replaced by totally identical contents, and some systems provide means to change a file's contents without changing its modification time. Generally, systems use the modification time to note the time of an action that the user would see as causing a file to be modified, which is not always the same thing as noting the time that a file's contents are actually changed. I know of no system that records the latter time.
3) A file's type can change at will, not just to increase or decrease the "accuracy" of the typing. It's rare that a file would be useful when viewed as data of two or more independent data types, but there's nothing intrinsic in the concepts of files, their types, or metadata, to prevent this. Thus, for example, a hacker can get some perverse enjoyment from writing source code that works simultaneously in multiple programming languages.
In general, the author's categorization of metadata into "immutable" and "mutable" is nonsensical. File metadata, by definition, is independent of file data, and is therefore mutable independently of it. Sometimes systems create tighter links between metadata and data, for example when Photoshop causes files created with it to be of a certain type, or when users make sure the names of files important to them are in uppercase, but that's a characteristic of the system (Photoshop or user conventions in these examples), not an intrinsic characteristic of data and metadata... And in the introduction, the author warns against reading the "Fundamentals" section with an eye on system implementations.
I'm going to guess the author is reaching beyond logic to make this categorization so as to give file typing a role distinct from and more important than file naming. Needless to say, this is counter-productive.
Unix pipes. How else are you going to get file type metadata if it isn't in-band? That is what the magic number is all about. Pipes, stdin, stdout, etc.
I think this is purely an application level problem and not a filesystem problem.
It still matters in the gui world too. If we ever develop GUI drag and drop style graphics filters and such, say a webcam output into a filter into something else, that info is still in-band.
How would you represent the file type of a named pipe, or a socket?
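To make the in-band point concrete, here is a minimal sketch in Python of sniffing a type from the first bytes of a stream without consuming them, which is roughly what a filter at the end of a pipe has to do. The signature table is a tiny illustrative subset, and the labels and the names `MAGIC`, `sniff` and `demo` are made up for this example.

```python
import io

# A few leading-byte signatures; real tools know far more than these.
MAGIC = [
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"%!PS", "PostScript"),
    (b"#!", "script"),
]


def sniff(stream):
    """Guess a type label from the first bytes of a buffered binary stream.

    peek() does not advance the read position, so whatever reads the
    stream next still sees every byte, including the magic number.
    """
    head = stream.peek(8)
    for signature, label in MAGIC:
        if head.startswith(signature):
            return label
    return "unknown"


def demo():
    # Stand-in for a pipe; sys.stdin.buffer is the same kind of object.
    pipe = io.BufferedReader(io.BytesIO(b"GIF89a" + b"\x00" * 16))
    label = sniff(pipe)
    rest = pipe.read()  # the downstream consumer still gets all 22 bytes
    return label, len(rest)
```

Calling `sniff(sys.stdin.buffer)` at the receiving end of a pipe would work the same way, since no filename or filesystem metadata is available there.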
The UNIX file-system is brilliant compared to DOS, but ONLY compared to DOS. It is still designed for command-line users convenience. I am NOT criticizing the command line, I use it daily under OpenBSD, Linux, WinNT4, Win2K, and Mac OS X. It is nice to have the control of a CLI, as well as the ability to run scripts.
HOWEVER, the system of making things conveniently obvious for the CLI results in engineering decisions that give the OS less flexibility. GUIs can provide TREMENDOUS amounts of information BECAUSE the user decides when to get that information.
For example, the filename and type need easy access for the user. For a GUI user, they need the filename and the type deciding the application binding. For a CLI user, including the type with the filename makes it easier to manipulate.
While you could set up ls (or dir) with many flags to pick and choose the information, you create a minor mess. Additionally, something like changing the type by picking from a list in a database is one thing for a GUI with a dropdown box, but it's a nightmare to implement in a CLI. If you designed for the CLI, you made a tradeoff.
Additionally, UNIX was developed in a hardware environment more restricted than the DOS world. Early machines used in development are nothing compared to modern machines.
Take the NTFS file system. If you are on an NT4 machine, or a Win2K machine, (running NTFS of course, not braindead FAT/FAT32) you see filenames as normal. Inside the properties, there are MANY more options. Do it on a Win2K machine, and you see more information than on an NT4 machine if you look closely.
The UNIX approach is old and dated. Microsoft has moved on, it's important for the UNIX community to do so as well. ACLs (implemented on NT) are FAR more flexible than users/groups. Private user groups are an ugly hack to handle the user/group system. The whole UNIX model needs to be modernized. There are ACL UNIX systems, but they aren't the mainstream.
I love the power of UNIX-based servers; they give me tremendous capabilities. A proper CLI is awesome. But let's not kid ourselves. Beating Win95/Win98/WinME at ANYTHING was never impressive; they were ugly hacks onto DOS, which has its roots in the 8086 processor. Every time people tout the advantages of Linux, they compare it to Win9x. Beating a legacy desktop OS in terms of uptime, etc., is NOT impressive. Compared to Win2K, Linux's technical advantages are pretty minor. There are some, but not many. Compared to the BSDs or commercial UNIXes... well, Linux doesn't look that impressive. It has advantages and drawbacks, different engineering decisions.
The problem with UNIX is an LCD (lowest common denominator) and designed by committee problem. Having a common API that programmers can target is tremendous; it helps with portability. However, failing to keep moving that API forward is a mistake.
As it stands there are many applications that only work on one variant. Extending the UNIX common API once or twice a year to encompass vendor extensions would be a tremendous boost, and allow UNIX to escape this trap. If Sun has a great idea and incorporates it into Solaris, their ISVs should take advantage of it. The rest of the UNIX world should have it within a year (or two at most) so ISVs can port to other UNIXes. As it stands, you either write to an old standard OR to a particular UNIX. Neither is a good choice.
Alex
A lot of things would be better if the 'lower' OSes would just pay attention to MIME types. But there's one obvious situation where it falls apart.
Joe Mac User makes an HTML document referencing a bunch of JPEG and Flash images. The JPEG and Flash files don't have extensions in their names. He sends his HTML directory to his Windows-loving friend. Assuming that the Windows or Mac apps paid attention to the file types (either Just In Time on the Mac to add extensions, or the Windows app paid attention to MIME), the user's documents would have appropriate extensions added to them. The Windows user's HTML is busted.
While it royally bites that I have to put up with extensions in OS X, I can understand why Apple did this.
Your non-tech-savvy computer users (I'd think that's 80% of computer users out there) are damn clueless, and would be completely unable to fix that HTML example.
Moderators should have to take a reading comprehension test.
I don't think Windows does a bad job at storing data type information. It just doesn't try to. What Windows stores in the filename is file format information. A song tablature, ASCII art and C++ source code are very different things, but you can call them all TXTs and operate on them with no problem at all. The author really messes things up a bit in this matter. You can have, say, two LZW-compressed paletted images. One as a GIF and the other as a TIFF. Pretty much the same data type, but with different headers/tags, different LZW max. prefix length, maybe different byte-order. Same for a JPEG TIFF and a JFIF. Actually, what is the point in saying Image/gif when you can't have Sound/gif or Text/gif?
I really don't think Apple came up with an extensionless filename scheme purely out of conceptual considerations. Anyone who has ever tried to educate someone on how to use a computer for the first time knows that file extensions can be confusing! The Mac was built to be easy. I would go as far as to say it was built to reach people who were afraid of computers. The fact is that some other people do need a command prompt, and that interface does benefit from file extensions.
Now, Linux is not following Windows at all on this. Fire up Konqueror and see how it identifies most files, extension notwithstanding. Or try #man file.
But, what do I know?
I'm afraid you misunderstood his definition of immutable. In this example, you changed the data, and what was originally a plain text file became an HTML file. His definition of immutable was that if the file data changed, then its type did not change.
Also, it didn't mean that the metadata need be unchangeable, since it could be changed to reflect greater precision, or if it was wrong in the first place. For example, an html file is a text file (but more). So it is entirely reasonable to change the type from text to html (provided it actually is).
slashdot.pl and slashdot.txt should NOT collide on my desktop...
I agree that slashdot.pl and slashdot.txt should not collide, but that is just because they are part of the name. They should also not be required to be a given type.
How hard is it to change a bash script to a different shell? Change the first line.
I agree that metadata should be readily accessible. The only reason it is tough on a mac is because it was intended to be difficult, so that new users would have trouble shooting themselves in the foot.
How would you like it if you had to name all your executable perl scripts ending with .pl?
You don't, because the operating system specifies an (optional) header section to every executable file, which allows it to determine which program to run the file with. This is metadata, of the magic number variety. It is data added to the beginning of the file, for the sole purpose of determining its type (ok, in this case it also specifies the path to the perl executable and any flags to be passed it, but ignore that for a moment).
The reason we have such magic numbers (which are also in most other standard file types, ps, gif, jpeg, etc) is because there are no common operating systems which support file types, so applications are on their own, and are forced to include what is properly (in my opinion) metadata in the file data itself. As long as we are going to store this data, why not have it in a standard location where it can be used by the rest of the operating system?
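As a rough illustration of the optional header described above, here is a small Python sketch that pulls the interpreter path and its flags out of a script's first line. The function name `parse_shebang` is made up for this example, and treating everything after the interpreter as one argument mirrors how Linux handles the line; other systems may split it differently.

```python
def parse_shebang(first_line: bytes):
    """Return (interpreter, argument) for a '#!' line, or None otherwise.

    The two bytes '#!' act as the magic number; the rest of the line is
    the extra information the post mentions (path and flags).
    """
    if not first_line.startswith(b"#!"):
        return None
    rest = first_line[2:].decode("utf-8", errors="replace").strip()
    if not rest:
        return None
    # Split once on whitespace: interpreter path, then the remainder
    # of the line kept together as a single argument.
    parts = rest.split(None, 1)
    interpreter = parts[0]
    argument = parts[1] if len(parts) > 1 else None
    return interpreter, argument
```

For a first line of `#!/usr/bin/perl -w` this returns `("/usr/bin/perl", "-w")`, which is also why switching a script to a different shell only takes editing that one line.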
Actually, I lied. I don't hate MacOS; I just wanted to get your attention by yelling about it. Now that you're here, though, I have to say that I LOVE the MacOS, and have ever since I first used it, before it was even called MacOS. I started with System 7, which was so attractive and easy to use that it's still my bar for measuring other interfaces.
:-)
But if there is one thing I intensely dislike about MacOS, it's the metadata. I know I'm practically alone in the Mac camp, but I hate metadata. I have always thought it was just a space-hogging pain in my ass.
Now, the space issue is no longer a big concern since we have such big, cheap drives that a little filesystem metadata isn't such a burden on capacity. But back in the days of floppies I was pissed that I could fit so few files on a floppy when my friend with DOS could fit noticeably more. I was especially annoyed that even when I formatted a disk as a PC floppy, the Mac would still waste my space by creating and hiding from me files and folders on the disk to constitute the resource forks. I wanted every kilobyte, which counts when you're cramming a lot of small files onto a lot of small disks.
But of course this is no longer the big issue it used to be. But if I were storing large numbers of files and running out of space on a Mac, I'd still silently curse all that metadata wasting my capacity.
The part that still bothers me, now that capacity is no longer a substantial issue, is that in Windows or *nix I can instantly change file types from the interface, but not with Mac. It comes up a lot--many times a day. Click a filename, change three letters, and a text file is recognized as a script or batch file to be executed rather than opened. A click and three letters, and a file I just downloaded from USENET goes from text to UUencoded so that when I double-click it will be decoded for me. A click and three letters is all it takes to change a file's type and its application association from the GUI, without having to resort to some clunky special editor. And it's even better if I need to change the type/association of a great number of files--just open a CLI and type a quick line, and it's all done. What a pain it would be to have to use a metadata editor instead of just manipulating three letters in filenames. Simple file extensions put more power over the file within easy, simple, even automatable reach.
The advantage of metadata is something many Mac users, and theorists like this article's author, seem to believe in, but I cannot see it. For instance, it's thought a great advantage that you can set a file to open with any application, despite the filetype. I hate downloading things on a Mac because of this. Some idjit will have a file set to open in an application I don't have, and the computer may be too stupid to know that I always open that file type in Application X. A dialog pops up on any reasonably modern MacOS to help, but it's still a big pain in the ass compared to having a PC automatically know what I open that file type with. Even more annoying is when I really do have the application the file is set to open with installed, but I always want that file type to open in a different app. This most often happens with graphics files--I do not under any circumstances want to have Photoshop or Graphic Converter open a graphics file, just because that's what it was created in. I have a simple image viewer for viewing images. If I want to edit them, *then* I open them in Photoshop. Same for Premiere and others--I do not want a big, slow editor to open my files just because that's what they were created with; we have smaller-footprint and more versatile file viewers for that.
The other part of it is that the "simplistic" (sometimes the most simple designs are the most elegant, while the more complex are just gaudy) file typing systems also solve the problem of opening certain files of a given type in one application but others of the same type in another application. Metadata proponents always point out how "great" it is to have one, for example, JPEG open in JPEGview or whatever, while another JPEG opens in Photoshop; one .wav opens in a player, while another opens in an editor or burner. Well, I think the solution offered by Windows and by some *nix environments is better, easier, simpler, more elegant. A simple context menu, brought up by right or center-clicking, provides any options you could want. That way to open something in my viewer application, I just double-click--I know on my Windoze box that all image files (except .psd) will automatically open in my viewer, ACDSee (which recently became available for Mac, too)--no surprises, no metadata editors needed. If I want to edit it, I just right-click and choose the command "Edit" from the menu, which is set to open images with Photoshop. Same with .wav and other such--double-clicking opens in WinAMP, right-clicking and choosing "edit" opens in SoundForge. You can create any action, and choose any app to be associated with that action, for each file type--and then a list of all the possible actions for that file type will be displayed when you right-click a given file. But it will open in whatever you've set to be your standard viewer, by default, if double-clicked. Much better than relying on hidden metadata. But even better and simpler than having to set up the actions and associations in the Folder Options dialog, is just using the Send To sub-menu that is brought up on right-click--just drop shortcuts to the apps you usually use into the Windows\SendTo folder, and those apps will appear on the Send To submenu when you right-click. That way I can easily open any file with any application, by using only one right-click and one left-click. In terms of launching files, it's like having the flexibility of a CLI, but within the ease-of-use of a GUI. That's one feature the Windows GUI actually got right, and got right very early on. MacOS can keep its metadata, but this is easier, simpler, better. I love the Send To submenu, though it's usually under-utilized by most people.
I hate to say it, but the metadata folks are IMHO going the wrong way. I want more power and flexibility within my clicks, not less. I hate having to edit metadata when a simple three-letter change is all that would be needed in *nix and 'doze. And as I said, the advantages of metadata in terms of application/file association are entirely negated by the right-click menu and its Send To submenu in Windows, and similar functionality in some *nix GUIs. Metadata may have good uses, but none I can think of that can't be done more simply and elegantly. I also dislike the idea of my filesystem hiding things from me, which unfortunately is exactly what MacOS does and what the newer NTFS in Win2k and up can do (I believe Ars had an article when Win2k came out about the new NTFS and some of the still-largely-unused metadata fields). Ext2 or FAT32 all the way, baby--and before you poo-poo FAT32, it may have almost no modern features, but it is straightforward, simple, and actually very fast in performance (thanks to the fact that it implements no real modern features); I recall it beating out NTFS in terms of raw speed in an old Ars article. Poor crash recovery is its main weakness.
I like to keep things as easy to manipulate as possible. And contrary to what many make the mistake of thinking, file extensions are not just easy for CLIs--as I said, it makes sense in a GUI too, since it can be directly manipulated from within the GUI's file browser, without having to open the file in a metadata editor. It also makes the type of file crystal-clear--especially important if you don't want to accidentally run an executable that has an icon to make it look like a file. Unless OS X has some way which I haven't noticed to visually set executables apart from other file types, even when they're on the desktop or somewhere else that doesn't show details, I can't wait for someone to create lots of OS X viruses that have common file icons. That's already the case in the Windows world, where you'll find files called Report.doc.exe that have Word icons, but if you notice the trailing extension you won't mistakenly execute them (though the "show extensions for all file types" option isn't the Windows default anymore, alas). How can you tell by a glance in OS X, or any other place where metadata rules instead of file extensions?
Oh well. Windows may not have a lot right--but it does have its use of simple file extensions and simple context menus right. I always hated editing resource forks. It's just another *unnecessary* layer getting between a man and his hardware. Tell me one very useful thing that can be done with filesystem metadata, that can't be done easier and put more in direct control of the user. And before you say "labeling," like MacOS prior to X used to have--that's what folders/directories are for.
Chasing Amy
(We all chase Amy...)
"The more corrupt the state, the more numerous the laws"-Tacitus
The fact that you can run Unix apps may have removed a reason for you to avoid Mac OS, but it is not a compelling reason to switch in and of itself. If Mac OS X acts "just like Unix", why would you switch to it from Unix? Obviously there was some other compelling reason to switch--something that differentiates it from other OSes that are also Unix or Unix-like. Those differences are what make people switch. Features that are the same merely remove those features from the decision making process.
P.S.-If you read any of the reader mail from my OS X reviews, you'd know that I'm really a PC bigot ;-)
I think a lot of the metadata haters just don't know what metadata is doing for them. Every time you hear someone say "Mac OS is more elegant", they really mean that it doesn't keep popping up dialog boxes telling you off or warning you you're about to break something. Much of the time this is because that information is being stored and so applications just generally appear to know more about what's going on, so they don't have to ask the user.
Not having a satisfactory method for modifying metadata is hardly an argument against having metadata; it's just an argument for having a satisfactory method of modifying metadata. There are countless utilities on the Mac for doing this. All Mac OS X needs is to show a field for file type along with the field for filename and permissions and such in the file Inspector.
The argument the article makes is that we shouldn't just throw away all the metadata that's already attached to files just because it's inconvenient to store it on legacy filesystems.
You don't understand the difference between TYPE and CREATOR. Imagine the following.
dickbreath@toybox:~/dudes > ls -la
total 31337
-rw------- dickbreath users TEXT NPAD file1.txt
-rw------- dickbreath users TEXT NPAD file2.txt
-rw------- dickbreath users TEXT WORD file3.txt
-rw------- john yum JPEG WORD file4.txt
-rw------- sean yum JPEG GIMP file5.txt
dickbreath@toybox:~/dudes >
There are 5 files. Several of them have been MIS-named! Notice that "ls" has been cleverly modified to indicate the file TYPE and CREATOR metadata.
file1.txt is a text file. (type TEXT) When you doubleclick it, it will open in Notepad. (creator NPAD)
file3.txt is also text. (type TEXT) But when you double click it, it will open in -- surprise! -- Word!
file4.txt is not text at all (type JPEG) although the filename might deceive some into thinking it was a text file. But when you've NEVER had to use this stupid ".txt" naming suffix thing, you wouldn't be deceived. In fact, you would wonder why on God's green earth anyone would put ".txt" on the end of a filename? The icon would clearly show it is jpeg, belonging to word.
file5.txt is also not text (type JPEG), but surprise, it opens in a *different* application, this time, the GIMP! (Note type is JPEG, creator is GIMP)
Finally, the icon displayed for a file is determined by the application. Each application has a database of icons to assign. The icon displayed is determined by the unique COMBINATION of type and creator.
For instance, if GIMP can open JPEG, GIF, and PSD, then you might have a "family" of similarly styled gimp icons, yet each icon is visually distinct enough to make clear that the file is jpeg, gif, or psd. But another app, such as ImageView, might also have its own uniquely styled family of similar looking icons, but have "jpeg", "gif", and "psd" variations of those icons.
When a file is GIF/ImageView, it gets the "gif" icon from the ImageView application. When a file is GIF/GIMP, it gets the "gif" icon from the GIMP application. The icon visually distinguishes what kind of data it is, and what application is going to open it.
But you can always grab a GIF/ImageView file and drag-drop it onto GIMP. No sweat. In fact, if you then save the document from GIMP, the creator will be changed -- but type will still be GIF.
I apologize if I come off as frustrated that such an advanced concept, invented such a long time ago, is still so relatively unknown by so many people who are so technically brilliant. And a lot of it is entrenched thinking. "Well, this is how we've always done it!" We laugh at MS for lack of innovation, yet I hear many here talk about not liking GUIs despite their now finally commonly accepted advantages, and some of us stay stuck in the stone ages when it comes to how unix has always done things.
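The TYPE/CREATOR behaviour described above can be modelled in a few lines of Python. This is only a sketch of the idea, not how any real Finder implements it; the icon filenames and the names `FileInfo`, `ICONS`, `icon_for`, `double_click` and `save_from` are invented for illustration, and the codes follow the loose spelling used in this post.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FileInfo:
    name: str
    type: str     # what kind of data the file holds, e.g. "GIF"
    creator: str  # which application owns the file, e.g. "GIMP"


# Each application supplies its own family of icons, one per type it
# handles, so the lookup key is the COMBINATION of type and creator.
ICONS = {
    ("GIF", "GIMP"): "gimp-gif.icon",
    ("JPEG", "GIMP"): "gimp-jpeg.icon",
    ("GIF", "ImageView"): "imageview-gif.icon",
    ("JPEG", "ImageView"): "imageview-jpeg.icon",
}


def icon_for(info: FileInfo) -> str:
    return ICONS.get((info.type, info.creator), "generic.icon")


def double_click(info: FileInfo) -> str:
    # Double-clicking launches the creator, whatever the name says.
    return info.creator


def save_from(info: FileInfo, app: str) -> FileInfo:
    # Saving from another app changes the creator; the type stays put.
    return replace(info, creator=app)
```

Dropping a GIF/ImageView file on GIMP and saving corresponds to `save_from(f, "GIMP")`: the type is still GIF, but the icon and the double-click target now come from GIMP.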
Finally, other posters under this topic have complained about how hard it is to change the filetype compared to the filename. Really? They type "mv" to change the name, and "chown" and "chmod", but they can't change the filetype or creator? You have to (in KDE) right click, Properties to change the filename. Would it be so hard in the same dialog to edit the type and creator as well as the filename?
I bet the same programming genius who could modify "ls" to display the filesystem's type/creator could also write new "chtype" and "chcrtr" commands.
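For what it's worth, here is a toy Python sketch of what "chtype" and "chcrtr" might look like. Since no common filesystem offers type/creator fields, this version keeps them in a hidden JSON file per directory; that sidecar file (`.typecreator.json`), the `????` placeholder, and the helper `lstype` are all assumptions made purely for illustration, standing in for what a real filesystem would store itself.

```python
import json
from pathlib import Path

DB_NAME = ".typecreator.json"  # hypothetical stand-in for filesystem metadata
UNKNOWN = {"type": "????", "creator": "????"}


def _load(directory: Path) -> dict:
    db = directory / DB_NAME
    return json.loads(db.read_text()) if db.exists() else {}


def _set_field(path, field: str, value: str) -> None:
    path = Path(path)
    table = _load(path.parent)
    entry = table.setdefault(path.name, dict(UNKNOWN))
    entry[field] = value
    (path.parent / DB_NAME).write_text(json.dumps(table, indent=2))


def chtype(path, new_type: str) -> None:
    """Change a file's TYPE, leaving its name and contents alone."""
    _set_field(path, "type", new_type)


def chcrtr(path, new_creator: str) -> None:
    """Change a file's CREATOR, i.e. the app that opens it."""
    _set_field(path, "creator", new_creator)


def lstype(path) -> dict:
    """Look up the stored type and creator, as the modified 'ls' would."""
    path = Path(path)
    return _load(path.parent).get(path.name, dict(UNKNOWN))
```

With this, `chtype("file4.txt", "JPEG")` followed by `chcrtr("file4.txt", "WORD")` reproduces the file4.txt line from the listing without touching the filename.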
I'll see your senator, and I'll raise you two judges.