Slashdot Mirror


The Mac, Metadata, and the World

Rick Zeman writes: "ArsTechnica has posted yet another compelling article, this time on metadata, its history and the future of metadata storage as seemingly indicated by Apple in OS X. Extensions==Bad!"

7 of 307 comments (clear)

  1. Linux? by interiot · · Score: 4, Interesting
    The glaring message I got from this was: Windows implements file type metadata quite badly.

    And the glaring question was: why is Linux blindly following Windows? Linux's file type handling is still in a somewhat early stage, it wouldn't be inconceivable for the paradigm to change.

    1. Re:Linux? by David+Roundy · · Score: 2, Interesting

      Because linux is based on unix, and has blindly followed unix, not windows. Also, linux supports a vast number of file systems, which means that the metadata would have to either be stored on all those file systems, or the OS would have to be able to live without it, which would probably lead to it being ignored by most of the software.

      Unless metadata is implemented consistently, its use can do more harm than good. ("I copied this jpeg (named 'picture') to my windows partition and back again, and now I can't view it!")

  2. Poor technical expertise from a Mac Apologist by Anonymous Coward · · Score: 2, Interesting

    In a lot of ways this is a pretty good article, but there are a surprising number of instances when the author seems to have bent himself into thinking about a limited number of filesystems. Barring anything academic, experimental, or "fancy," it's pretty clear he's never tried to think about UNIX linked-list style filesystems: within the framework of his discussion, I would assert that a file's name is not part of its essential metadata in a UNIX-style FS. Why? All of the file information is contained in the file's inode and data blocks (the immediate decomposition into metadata and data being obvious). A name for the file is just an entry in a "directory," which is just another file. A given file might be listed in hundreds of different directories, nevertheless, there's still only one file. It might have a different name in each directory it's in; in which case it hardly makes sense to talk about "the filename," unless one is willing to assert that inode + data blocks don't constitute a file, and that each instance of a reference to a particular inode is to be considered a file.

    Furthermore, the examples of "immutable" metadata (ill-considered vocabulary in the first place, I think) are poorly considered. File size can be altered without altering the underlying data on BSD-style unices that provide truncation and extension system calls. Modification time often gets changed on many systems without any change to the underlying data: many, if not most, kernels will change mtime any time a file is opened for write or append even if no subsequent writes are done to the file. "File type" is essentially a nonsense notion on most UNIX filesystems (and DOS too, given the weak representation), a file's type being an interpretation an imaginary multipurpose file handler is to give the data. In such situations, "file type" is decided either by regexp matching of the file name (which can be anything, remember) and judicious use of magic (man magic if you don't get it). In many cases, this doesn't produce an unambiguous answer: 'file blah' produces 'blah: data' with amazing frequency. Arguably this just means that UNIX filesystems don't have an adequate mechanism to express the idea of "file type," but I would argue that regardless, the notion of file type is at least partially bogus. There's nothing to stop me from interpreting data many differnt ways: an XPM is something I can edit with an ordinary text editor, and hence a file of type "text," but it can also define pixmaps, so depending on what I want to do with it, it might be of at least two file types. Similarly, I can try to view a raw audio file as a compiled pixmap, or, to recapitulate the famous joke, 'cat /boot/vmlinuz > /dev/audio'. The results of such voluntary file polymorphism aren't always useful, but they sometimes are.

    There are further aspects of the article which are either incorrect, or at least fail to reflect my personal experience, but for the most part it's simply repetition of previous errors. It seems abundantly clear to me that the author is a thoughtful and well-educated person whose primary computing experience has been with Macs and post-DOS MS machines: and while he may have used UNIX-like operating systems, he doesn't know much about data representation of filesystems on them, and clearly hasn't considered more modern developments like filesystems with journaling or ACLs instead of permission bits.

    Perhaps my criticism is a little too sharp, I would like to emphasize that I liked much of the article and I laud the author for thinking about some important concepts in detail, but I feel the viewpoint adopted is one unnecessarily limited by the author's personal experience.

  3. Linux thoughts by iabervon · · Score: 4, Interesting

    Linux has traditionally not bothered very much with file type. The user generally knows what to do with the file, and does so. What look like extensions are actually just generically part of the filename; there are conventions for them, but they are no more strict than the conventions for filenames in general (Makefile is probably a makefile, README is plain text, foo.c is C source, etc.).

    An important thing to realize is that file type, like, for instance, size, can be determined from looking at the data. In fact, many programs look at data files and determine the file format from the data; "file" does a pretty good job of detecting non-human-readable formats, even without knowing any information at all about the file type.

    Where this all breaks down, of course, is when the user wants to omit the program name. On a Mac, you normally double-click on a data file to open it (and hope to get a program that does what you want). On *nix, you traditionally have to specify the program-- and much of the time, you select a different program depending on the desired result: for foo.c, I could use emacs, or gcc, or I might want gcc -M (get dependencies), or even wc (to see how big it is), not to mention less or grep or etags.

    I think part of the Mac fascination with file type is due to the monolithic program structure; you find the file, and then you open a single program that does to it anything that you will ever do to it. In this model, there is a right program, and which program is right is based on file type. Windows clearly suffers greatly from having this model but not having a more reliable fashion of determining file type than Linux.

    Incidentally, has anyone else noticed that the MacOS scheme is equivalent to having 4 character extensions which aren't displayed, with the corresponding problem of having malicious executables named README.txt (or even README)?

    1. Re:Linux thoughts by iabervon · · Score: 3, Interesting

      My point was that, under Linux, "creator" isn't very useful. Most of my files are created by "emacs" (or "cp" or "sed" or something, when I copy a template), but what I normally do with them is compile them. The filename extension only matters a bit (saves having to tell the compiler what language it is explicitly); having a type code would have the same effect, but having the creator wouldn't help at all.

      MacOS and Windows are designed such that you tend to use the same program to deal with a given file, no matter what you're doing with it. If you have a JPEG you made with the GIMP, you'll view it in the GIMP. If you're going to compile a source file, you edit it in the compile's IDE, and you view it in the IDE. *nix is designed such that you use different programs for different operations (edit, view, compile, render, etc), and use the same program for a given operation for a number of different file types (C, HTML, English text, etc).

      Of course, Windows gets the worst of both worlds-- you have monolithic applications which do everything to a given program, but you don't have the creator metadata, so it picks a program badly.

      I think it would be nice to have file types under Linux; currently, there are a number of partial solutions: emacs has an "Edit this file in -mode" directive, most binary types have magic numbers (e.g., GIF89a, , ^?ELF, JFIF, etc), and some programs look at some extensions. Of course, there would have to be a number of different types associated with a given file (Java source, UTF-8, plain text, etc), and it would have to be simple to specify the type of a new file when you create it, which is currently done partially by naming it in accordance with a convention and partially by putting in data which looks like a certain type, both of which you'd want to do anyway.

  4. Re:I HATE the MacOS and its stupid metadata! HATE by Fred+Ferrigno · · Score: 4, Interesting

    This isn't a problem with metadata, just a problem with MacOS' file typing.

    BeOS handled all this very well. Double click to open with the default app. Right click to see a list of every program on your hard drive that opens that kind of file or files like it. (IE, a text editor would show up as an option for an HTML file.) Choose another option and open a dialog to set a file-specific preference.

    I must have said "BeOS did it better" about six times today. I feel like an Amiga user.

  5. Object-oriented filesystems by nwetters · · Score: 2, Interesting

    The current problem is that filesystems don't make it easy to store properties of a file. The HFS made a brave attempt by dividing files into content and properties (using data and resource forks), but it still didn't objectify the filesystem. For example, creating children of a particular file necessitated converting the file into a folder and then wondering what the hell you were going to do with the folder properties - were they going to be placed within the folder or within the parent folder.

    A solution would be an object-oriented filesystem, that allows every file to have children without nasty conversions, and implements a simple store for properties (a Berkeley DB file would seem a natural solution).