The AppStore is for CONSUMERS, there will never be a full lockdown because forcing every software writer to release through the AppStore would kill OS X as a development platform. Even XCode requires a whole bevy of gnu utilities. OS X is a full fledged UNIX and as such, you'll always be able to do *Nixy things such as wget/curl a file, gunzip, configure and make.
I believe this is true for the time being. However, using words like "never" and "always" is a bit short-sighted. Desktop and laptop computers have traditionally been fairly open platforms in terms of what the user is allowed to do - but there is no reason to assume this will continue to be the case. If someone wants to change that, it will be a slow, difficult process to change user expectations to a point where they accept that loss of control - but it can be done. People have already accepted mobile phones as a fairly closed platform, and some contend that phone use is displacing most "personal computer" use - which means that the experience people get with their phones is redefining users' expectations of interaction with their computers.
OS X is currently a "full fledged UNIX" - this can change. XCode requires a bunch of GNU stuff - that can change. What do they gain from further restricting their platform? They gain a greater ability to simplify the user experience (which is a good thing for many users) and redefine various aspects of the OS that could be hard to do otherwise... And they gain status as a gatekeeper for the platform, a middleman who can extract money for every piece of software sold on the platform - much like what they enjoy on the iPhone platform, or what game console manufacturers enjoy.
One possible approach would be to give developers the same level of control they have now - but marginalize them. Charge them an extra $300 for the version of OS X that lets them do developerry things, or block developer machines from accessing the app store (apart from developer tools) - things like that. Things that would yield the desired level of control over most Mac systems, simply because most users wouldn't want the disadvantages (additional cost or reduced capabilities) that come with a development-capable machine.
I hesitate to say "Apple could do such-and-such" because I feel like that conveys the idea that I think this is likely to happen in the near future. My point is that it could, and it's silly to assume that it won't. The landscape of computing is changing, as it is bound to do over time. It's easy to assume that the status quo is some static, unchangeable thing, but it really isn't. Within the bounds of what users are willing to accept (even grudgingly, at first), the company in control of the platform can do whatever they like.
There's nothing wrong with the sandboxing model per se. It's probably the only way to make our computers more secure. That Apple is moving in that direction should not be surprising: they make idiot-ready software (also known as good software)
I take exception to this.
"idiot-ready" software is good software... for "idiots".
(Of course, they're not really idiots, most of them - they're regular people who desire a simple level of interaction with their computer. But I'm just running with the "idiot-ready" terminology there.)
That approach to software design is "one size fits most" - but it's not "one size fits all" because the limitations of a simple UI will inevitably interfere with (or at least fail to support) something that someone is trying to do. When your expectations and skills pass a certain threshold, a simple UI is not necessarily a good UI.
It's too cold for sand worms. Besides, sand worms would have to contend with the Ice Warriors, and they'd probably find the waters of Mars to be rather inhospitable as well.
i was there a few months ago and they still had the BASIC stamp :/
Well, of course they did, nobody bought it! :)
Personally, I always thought it was kind of cool that they had a microcontroller development kit in the store, even if it was the BASIC stamp. Maybe I should revisit that opinion, since the BASIC stamp is so damn old at this point... But when I first saw it, it was at a time when I'd really lost all confidence in Radio Shack as a place for electronics hobbyists, and I took this as kind of a positive sign.
Nowadays, though, Arduino is clearly something they have to be on board with if they want to address hobbyists.
Admitting you don't know something is admirable. Not knowing can be remedied. It is a lot better than pretending you know something when you don't.
Being actually proud of not knowing something is unfathomable to me. Nobody goes around boasting they can't read, or don't know how to use flush toilets. I don't know whether to pity them or despise them or both.
Pride in ignorance is part of the basis of anti-intellectualism and the backlash against science. There's a feeling of insecurity that comes from confronting the fact that other people know a lot more than you do about how the world works, and a knee-jerk reaction to dismiss such people as elitists, or claim that the products of generations of research are worth no more than the knowledge and beliefs they possess. Some people can't tolerate the feeling of being inferior to someone else (and I can relate to that, personally) and respond by trying to boost themselves up while dragging others down.
The process of boosting themselves up often doesn't involve real self-improvement, rather just an assertion that they are good enough already. Hence, pride in ignorance.
It seems to me that by putting metadata into the filesystem, you're creating some big problems with compatibility: different filetypes need different metadata. For instance, a PDF file might have information on author, title, etc. A jpeg file might have EXIF camera settings. Having the filesystem deal with metadata seems like it's pushing this stuff down into the OS, where it really should be left up to apps.
But what we're dealing with now is metadata very much as an OS-level concept. I think right now the implementations have a bit of a "strapped-on" feel, but it's going to become more and more central to how OS UI works.
Also, what if people decide they want different metadata? Back when jpegs were first made, they didn't include EXIF data, but now they frequently do thanks to the proliferation of digital cameras. Presumably, the standard was modified to allow this. But changing the standard is easy with metadata encapsulated in the file itself; just have a version number in the file heading saying what standard the file conforms to, and apps will read this and interpret the data accordingly. Changing an application or two is a lot easier than patching the whole OS to deal with a change in metadata standards.
It really isn't. At least if you're patching the OS, you can just make the change once, instead of again and again for every application that uses the file type. How many applications work with video files, or images?
OS-level metadata also tends to be very flexible. xattr support on Linux, for instance, lets you store name/value pairs with whatever name and value you want. There are limitations (I think there's a maximum value size measured in kilobytes, at least on some filesystems - so it's not a full "file fork" implementation at this point) but pretty much, if you want to add a new field, you just add a new field. The same is actually true of most forms of metadata stored within file contents as well - at least the modern ones. I think if a metadata system doesn't have the approximate flexibility of XML then it's pretty much rejected. :)
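As a rough illustration of the "just add a new field" point, here is a minimal sketch using Python's `os.setxattr` and `os.getxattr`, which are only available on Linux. The helper names `tag_file` and `read_tag` and the `rating` field are made up for the example, and the code assumes the filesystem accepts attributes in the `user.` namespace; where it doesn't, the helpers just report failure.

```python
import os
import tempfile

def tag_file(path, name, value):
    """Store a name/value pair in the "user." xattr namespace.

    Returns False when the platform or filesystem does not support xattrs.
    """
    try:
        os.setxattr(path, "user." + name, value.encode("utf-8"))
        return True
    except (AttributeError, OSError):
        return False

def read_tag(path, name):
    """Return the stored value, or None if it is missing or unsupported."""
    try:
        return os.getxattr(path, "user." + name).decode("utf-8")
    except (AttributeError, OSError):
        return None

if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    if tag_file(path, "rating", "5 stars"):
        print(read_tag(path, "rating"))
    else:
        print("xattrs not available here")
    os.unlink(path)
```

Nothing about the field had to be declared anywhere beforehand, and the file's contents are untouched.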
Or, what about files made by specialized programs that not many people use? Why should OS makers have to deal with metadata standards for every filetype in existence? The way it is now, only people who write apps dealing with a particular filetype have to deal with metadata standards.
That is not the way it is now. Desktop indexing (present in Windows, OS X, and at least optionally in Linux) monitors the filesystem, re-scanning the in-file metadata when a file is modified, so it can build a central database for quick searches. So the indexing system needs to know how to read these different file types.
Taking the metadata out of the file creates a lot of complexity, without any significant gain that I can see. Your examples of bittorrent files and files with changed metadata not md5-matching others just don't seem to be enough of a problem to warrant all these changes. In fact, these problems can be easily fixed by fixing the tools that use these files; BitTorrent, for instance, could be modified so that certain popular filetypes (e.g. video files like avi and mkv) are recognized by the tools and the metadata ignored when creating an md5sum. Modifying a tool that only some people use is a lot easier than modifying an entire OS.
I wouldn't exactly call that "easy", personally... :) Maybe you're right and my examples could be better. But it addresses a general issue that metadata is not conceptually part of the file contents - that's the whole point of metadata. Like the filename, it's just there to tell you what's in the file. If you change the filename or date stamp, it doesn't affect the file's contents. So by the same logic I'd say search tags and so on shouldn't be part of file contents either.
But you could come up with a million different examples of data, and how they are handled has to be on an application level because only the application knows how to deal with the data.
Well, I take your point, that it doesn't necessarily make sense for the OS to get too heavily involved in what would normally be application-level decisions. However, I think more flexibility in the structure of files could be useful. Give applications the tools and let them decide how to use them. Providing a feature like file forks is awkward because most software doesn't currently deal with it (as you point out) and there's a little bit of a technical challenge in implementing it and a logistical problem in getting it into all the various filesystems (the ones where it's possible to do so, anyway) - but it is not an insurmountable problem, nor is it a slippery slope into ever-increasing complexity, or a path leading to an unavoidable fate of excessive OS involvement in application file storage strategies. It is one useful organizational tool, allowing an application to have multiple "sequential byte range" abstractions within something that's treated as a single unit on the filesystem. The difference in implementation between file forks and directories would be very minor. The major difference would be in how UIs treat the "forked file".
And there is a reason the files are linear series of data, because that is what HDs are as well.
To the extent that this is true now, it is becoming less so over time. Hard disks aren't linear by nature - they have multiple platters, for starters, so they'd be more like multiple contiguous ranges. Then there's firmware in the drives that maps around bad sectors, quietly substituting other areas of the disk. One can certainly still treat the drive as a linear range of storage space, and it's a convenient way of dealing with the disk, but it's an abstraction that we're very quick to abandon... Filesystems, for starters. If you have a directory of files, you don't want to think about that in terms of sequential storage on disk. You want to be able to copy and move and erase and create them and never care about where on the disk they go. The filesystem layer hides the abstraction of the disk as a sequential thing, and then reintroduces it at the file level.
And there's no guarantee that the "sequential" file data even will be "sequential" on-disk. We try to minimize the fragmentation of files, but the "sequential" nature of the file is, again, really just an abstraction. We could exploit that - tell the filesystem to insert a block of disk space into the middle of a file, and the filesystem wouldn't really have to move any data around to perform an insertion - but as far as I know that's not a supported operation at the application level on any OS. So instead we read all the data out of later parts of the file into RAM, then write it back to disk somewhere else - all for an insertion operation that could be handled much better by the filesystem.
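To make the cost concrete, here is a minimal sketch of how an application typically performs an insertion today: read everything after the insertion point into RAM, then write it back further along. The function name `insert_bytes` is made up for the example. (I believe Linux's fallocate has since gained an insert-range mode on some filesystems, limited to block-aligned ranges, but I don't know of anything portable.)

```python
import os
import tempfile

def insert_bytes(path, offset, data):
    """Insert data at offset by rewriting everything that follows it."""
    with open(path, "r+b") as f:
        f.seek(offset)
        tail = f.read()      # the entire rest of the file is pulled into RAM
        f.seek(offset)
        f.write(data)        # the new bytes go where the tail used to start
        f.write(tail)        # and the tail is written back after them

if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.write(fd, b"hello world")
    os.close(fd)
    insert_bytes(path, 5, b",")
    with open(path, "rb") as f:
        print(f.read())
    os.unlink(path)
```

The work done is proportional to the size of the tail, not the size of the insertion, which is the whole complaint.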
Not that a file cannot be broken apart into different sections to fit/optimize performance but at the application level they have to be considered linear series of data if only because every programming language on earth is set up to read files linearly.
Things change.
I mean, I get your point here, too. I wasn't a Mac user back in the day, but from what I've seen interoperability was a bitch because of file forking. And that would still be true today.
But I can't accept "because that's the way it's been for 30 years" as an argument for why a design choice is good. Things will change in the future. I don't know how, or when, but it's bound to happen. Being a part of that change, rather than being left behind by it, requires openness to new ideas. Even the most fundamental concepts of computing, sooner or later, will be subject to revision.
And all this shows why you don't need the filesystem to track metadata, all you have to do is embed it into the file.
Well, my post really wasn't addressing that question at all. My post was about whether using metadata, as opposed to directory structure and filename, to find data was a reasonable sort of UI, or if people's tendencies to be lazy about writing metadata would undermine that too much. My point was that if the system is well-designed around the use of metadata, users will tend to keep their metadata well-ordered, because in that case it's actually useful and easy to do so.
To address the point of whether metadata should be part of the file structure, or adjacent to it - I think there are advantages to each approach. Presently, there's a lot of infrastructure that's just not geared to dealing with metadata that's not stored as part of the file itself. But if you copy an MP3 file, the ID3 tag will be preserved, because it's there in the file structure. So at present that's a definite advantage, and not one to be underestimated.
There are disadvantages to bundling the metadata: for starters, if you have two files with identical data but different metadata, tools like "diff" or "md5" would reflect that difference. Or if you modify a bit of metadata, you're changing the file's modification time as well. That could be undesirable. Suppose you download a file via bittorrent and tag it according to your preferences - you won't be able to seed the torrent from that file, because its checksum will have changed. Or what if you want to tag an HTML file with metadata, or some other file type for which metadata either isn't supported, isn't adequate, or you just plain don't want it there in the file contents? The reason why it's called metadata is because it's data in reference to the primary data of the file... not part of the primary data of the file. I don't claim that this, by itself, is a conclusive argument in favor of filesystem-level metadata but I hope you take my point that there is a logical basis supporting its separation from the primary data stream.
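A small sketch of the checksum problem, using an ID3v1 tag as the embedded metadata (ID3v1 is a 128-byte trailer starting with "TAG"). The byte strings below are stand-ins rather than real audio, and `strip_id3v1` and `content_digest` are names made up for the example. The second function shows the kind of per-format workaround a tool would need in order to ignore the tag.

```python
import hashlib

def strip_id3v1(data):
    """Drop a trailing 128-byte ID3v1 tag, if one is present."""
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        return data[:-128]
    return data

def content_digest(data):
    """Hash only the primary data, ignoring the embedded tag."""
    return hashlib.md5(strip_id3v1(data)).hexdigest()

if __name__ == "__main__":
    audio = b"\xff\xfb" + bytes(1000)        # stand-in for the audio stream
    mine = audio + b"TAG" + b"A" * 125       # tagged one way
    yours = audio + b"TAG" + b"B" * 125      # same audio, tagged differently
    print(hashlib.md5(mine).hexdigest() == hashlib.md5(yours).hexdigest())
    print(content_digest(mine) == content_digest(yours))
```

The plain checksums differ even though the audio is identical, and making them agree requires the tool to understand this particular format's tag layout.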
There are also the maintenance issues around supporting each new type of metadata for each new file format as it's introduced - and if, as part of the OS design, you make some decision about metadata or how it's used that doesn't fit well with how it's stored in a particular file format, then resolving that disparity could be a bit of a headache for the implementers as well as anyone who has to use that UI. If you provide metadata as part of the filesystem, its format can be changed to suit the way it's being used, and these changes can be transparent to applications and users.
But I think the bigger issue, and the main thrust of the articles and the main focus of current work in improving utilization of metadata in the UI, has more to do with how metadata is presented to the user, rather than whether it's stored in the file or adjacent to it. The indexing systems in present use can use both approaches: metadata within files for file formats the indexing system is designed to specifically support, and filesystem-level metadata for others. From a pragmatic standpoint that's probably the way to go: use file-level metadata where it's appropriate, use filesystem-level metadata where it's appropriate, and just do your best to resolve disparities as they crop up. It's hard to effect this kind of change, so there are always advantages to an approach that provides a smooth transition.
I looked at this problem once, in the context of distributed file systems. I'd divide files into the following categories, with slightly different semantics for each. This can be fitted into the standard UNIX/DOS/Windows model, but it resolves some issues that result in programs doing elaborate workarounds to get correct file semantics.
Unit files: A unit file is only meaningful once it has been completely written and closed. Once closed, it will never be rewritten, only replaced. Most files are unit files. The file system should guarantee that 1) unit file replacement by a new version is an atomic operation, 2) unit files not properly closed do not replace old versions, so that when a program fails while writing a unit file, the old version remains undamaged, and 3) a unit file being written is not visible to other processes until closed. This was a feature of some mainframe operating systems. Because the UNIX file system concept doesn't have this, there's much fussing around with ".part" files and renaming strategies to achieve unit file semantics.
Managed files: Managed files are written and read in a random access fashion, but only by programs which understand their format. Managed files typically contain databases of some type. The file system should guarantee that 1) managed files can be opened for full or partial exclusive use, and 2) additional functions for ensuring that file writes are flushed from cache to disk are available.
Managed files have the most complex semantics, but not many programs use them. The ones that do typically go to a lot of trouble to get the file semantics right, to maintain database integrity. Read what SQLite needs from a file system to get a sense of how managed files need to work.
Scratch files: Scratch files do not outlive the process group that created them, and are invisible outside that process group. They can be read and written freely, but do not have permanent existence in the file system. The file system should guarantee that when the process group goes away, so do its scratch files, so as not to clutter up the file system. This would stop the accumulation of abandoned temporary files.
We have everything or nearly everything we need to implement this - though admittedly it could be a bit cumbersome...
"Scratch files" can be created by creating a file, opening it, and unlinking (removing) it. The file continues to exist until the last open filehandle to it goes away. But since it's unlinked, it has no permissions and no one else can open it. But if you want to grant a filehandle to that file to another process, you can either fork() and the new process will get a copy of the filehandle, or you can send an open filehandle to another process via... a Unix Domain socket? (IIRC. I know there's a mechanism, I don't remember if Unix Domain Sockets are it. The dbus system uses filehandle passing for certain things, I believe.)
"Unit files" would be more or less the same thing: create the new version of the "unit file" with a temporary name and unlink it from the filesystem. Processes opening the file will see the old copy - until you're ready to commit the new version, at which point you unlink the old version and link the new version to the old filename. Anyone who opened the old version will continue to see it until the last open filehandle to it is closed. The main bit that's missing, I guess, is that this isn't an atomic operation. If you unlink the old file and then somehow fail to link the new version to the old filename, the file momentarily doesn't exist (until you link the old one back again...) And you don't get the "commit all or revert all" behavior you describe for a batch operation.
It's worth noting that "Unit files" take the same amount of resources as the current scheme but you lose the ability to peek at the new version before it's committed...
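For what it's worth, a common variation on this closes most of the atomicity gap: instead of unlinking the old name and then linking the new one, write the new version under a temporary name in the same directory and rename it over the old name. POSIX specifies that rename replaces the destination atomically, and Python exposes this as `os.replace`. This is a sketch of that variation, not the exact steps described above, and `replace_unit_file` is a made-up name.

```python
import os
import tempfile

def replace_unit_file(path, new_contents):
    """Write a complete new version, then swap it in with one rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)   # same directory, same filesystem
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new_contents)
            f.flush()
            os.fsync(f.fileno())                # push the data to disk first
        os.replace(tmp, path)                   # readers see old or new, never neither
    except BaseException:
        try:
            os.unlink(tmp)                      # a failed write leaves the old version alone
        except FileNotFoundError:
            pass
        raise

if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "unit.txt")
    replace_unit_file(target, b"version 1")
    reader = open(target, "rb")                 # opened before the replacement
    replace_unit_file(target, b"version 2")
    print(reader.read())                        # still the version it opened
    reader.close()
    with open(target, "rb") as f:
        print(f.read())
```

It still doesn't give you "commit all or revert all" across several files, and the temporary name is visible in the directory while the write is in progress.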
If only we had Steve Jobs to solve this problem for us. :-(
Hey, he had his chance. How long was he with Apple after his return? Ten years or something? He ushered in OS X and the shift to Intel - both of them representing fairly extensive breaks with previous Apple products, either of which could have been a great time to tackle something like this...
Apple brought us Spotlight, I guess. It's representative* of the way things have been going: not redefining the filesystem, but building a database of file information and using it for searches.
(* I don't know who pioneered the concept or what implementations came first, so I say only that Spotlight is an example of filesystem indexing.)
People who say that hierarchical filesystems suck probably have a big mess on their table in real life.
I have never tried organizing my table hierarchically. Tell me, where do I put my cell phone in that case? Do I group it with my desk phone, because it's a phone, or with my computer, because my phone is also a little computer that's plugged into my big computer?
When I download something from the web, it goes into ~/Downloads. I don't have to waste time telling the system what it is, I don't have to figure out where the system put it and if I want to see all the files I downloaded I just 'ls ~/Downloads'.
But sooner or later you're probably moving them somewhere, right? The analogous procedure in a "database" filesystem would be to recategorize the item from "uncategorized recently downloaded item" to something that'll be more useful in the long term. One important thing to remember is that right now tagging and categorizing a file is a chore because that's not something a lot of UI is designed to do... and stashing a file in a directory hierarchy is relatively easier because it's something we're used to, and something UIs are currently geared toward. That's also the only reason sticking files in a database could reasonably be called "hiding" them. It's just a question of what the UI is geared toward. (In the case of folks like me, and you apparently, the "UI" in this case would include the command shell and various mechanisms used to address files in other programs as well...)
My personal feeling is that the usefulness of a "database filesystem" would be greatest in (and perhaps limited to) certain domains. Media files, hell yes. Source code... Well, apart from version control I really don't think so. :) But having the hierarchy indexed could certainly be useful ("find me the files that implement this method of this virtual base class - because some knob on the programming team doesn't believe in grouping class definitions in a sensible way"... of course some IDEs already do this independently of any system-wide indexing support)
The thing with media files is that you tend to wind up with a huge collection of 'em. They usually don't need to reference one another the way source files do, they're self-contained and there's not always a good way to name them. Video and music files can usually be named with their title, of course, and it's not hard to put them into a hierarchy, but the hierarchy isn't always useful. If what I want is to play "Doppelganger" - I'm not likely to have to disambiguate a request like that. I can specify the full path from the base directory where I keep all my music, but I could as easily skip that. (And it's not that hard in a shell that supports the ** notation for "find" like searches during globbing...) But if I take a bunch of photos, there may not be any value to giving them filenames at all. It's more useful to organize photos by things like date and tags. "IMG_0286.JPEG" isn't useful in any way, and coming up with unique titles for images that may not even merit a title would get a bit crazy. The filename becomes an afterthought in that case - a necessary "evil" if it's something you have to deal with manually. (If it's dealt with automatically, the filename could still be useful as a unique identifier.)
I just want one thing: a file system that is part database for fast file searches. I don't want to manually build indexes or any other bullshit, just look at the file table and give me my fucking file. Even if you had 100,000 files with file names of 256 characters, it's only about 25 MB, how long does that take to parse? Maybe I don't understand file systems but even a 25 MB file table should only take a few seconds to scan. When I do a search of a directory or entire disk with tens of thousands of files it sometimes takes a minute or two. The disk is thrashing away as if the program is looking all over for the file names. Shouldn't they all be in one place pointing to where they are on disk?
We pretty much already have this. The way it works in practice is that there's some service on the machine that provides indexing, maintaining a central database of metadata. When a file changes, the metadata is re-scanned and the index updated. Then you can use the index to search for things.
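A toy version of the idea, to show why searches against the index are fast: walk the tree once, keep the names in one in-memory table, and answer queries from that table without touching the disk again. Real indexing services also watch for changes and read metadata out of the files, which this sketch doesn't attempt, and the names `build_name_index` and `search` are made up.

```python
import os
import tempfile
from collections import defaultdict

def build_name_index(root):
    """Walk the tree once and map each lowercased filename to its paths."""
    index = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            index[name.lower()].append(os.path.join(dirpath, name))
    return index

def search(index, fragment):
    """Return every indexed path whose filename contains the fragment."""
    fragment = fragment.lower()
    return sorted(
        path
        for name, paths in index.items()
        if fragment in name
        for path in paths
    )

if __name__ == "__main__":
    root = tempfile.mkdtemp()
    for sub, name in [("a", "003.jpeg"), ("b", "003.jpeg"), ("b", "notes.txt")]:
        os.makedirs(os.path.join(root, sub), exist_ok=True)
        open(os.path.join(root, sub, name), "w").close()
    index = build_name_index(root)
    print(search(index, "003"))
```

The slow part, the walk, happens once up front; each search after that is a scan of a table that's already in memory.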
I know this exists for Linux but I don't know to what extent it's actually supported by applications. (I never use the feature.) On Windows, these days, file manager windows show columns containing metadata fields (unless you turn that off - haven't had much luck so far, actually) and you can't swing a dead cat without hitting a search field.
It's not incorporated at the filesystem level, but for the most part it doesn't really need to be. As long as the feature is there, and you can rely upon it being there, and applications actually take advantage of it and work with it, it's just as good. Well, very nearly.
One thing I think could be improved is that sometimes file names just aren't meaningful. Filenames from a digital camera, for instance, tell me very little - and renaming the files, while keeping the names unique and making them meaningful is not always easy. I could have two different 003.jpeg's in two different directories with completely different contents. If I move the contents of one directory into the other, I don't really want one to overwrite the other, because that would be dumb. That filename is entirely meaningless, the only reason the file even has a name is because the filesystem requires files to have unique names. But that could be addressed at the UI level (and has been, on Windows anyway, which is how you wind up with things like "003 copy copy copy copy.jpeg") so it doesn't necessarily require a change to the underlying file paradigm.
Wouldn't it be possible to make a "universal" file container, in that any other file type could be embedded with a text file that listed: what type of file it is, what program it is associated with, owner, creation/mod dates, and especially, tags and other types of metadata?
Do you know that they tried this already? In 1985? (Well, I'm speaking specifically of IFF - but there were other efforts. Mac's file forks were kind of the same sort of thing, except that they maintained the abstraction all the way down to the filesystem layer.)
Now, just because they tried it already and more or less failed doesn't mean it couldn't work... But they were in a much better position in 1985 to make this work than they are now (we've gone too long and come too far without a "universal format", it'd be nearly impossible to get people to embrace that kind of change now...) so I think it's kind of a lost cause.
I found it absolutely fascinating, personally, when I read one of the original documents on IFF. The ambition, the hubris perhaps, with which they were trying to guide the future of personal computing. They weren't just seeking to create "a" format, they were aiming for it to be the format. And it would have been capable of just about everything you suggest - embed a FORM of whatever you like in a LIST, put in descriptive chunks, etc... I believe Amiga embraced the concept to a fairly high degree.
There are various historical and technical reasons why it didn't really pan out. I think one of the big ones is simply that IFF wasn't the right format for everything. Perhaps no one format can be. Among other things, IFF required that a four-byte payload size appear at the start of each chunk. That limits a chunk (and therefore a file) to 4GiB maximum (not such a big deal in 1985 or even 1995... But these days it'd be an unacceptable limitation) - but another problem is that sometimes you need to write out some data and you just don't know how big it's gonna be. Streaming audio and video are a pretty good example. You can discretize the stream, populate it with known-size chunks, but you don't know the size of the whole stream until it ends.
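Here's a simplified sketch of the chunk layout being described: a four-character ID, a four-byte big-endian size, the payload, and a pad byte if the payload length is odd. It treats the size as an unsigned 32-bit value, matching the 4GiB figure above, and leaves out the FORM/LIST grouping entirely. The function names and the `NAME`/`BODY` chunks in the demo are just for illustration.

```python
import io
import struct

def write_chunk(out, chunk_id, payload):
    """Write one chunk; the size has to be known before the payload goes out."""
    if len(chunk_id) != 4:
        raise ValueError("chunk ID must be exactly four bytes")
    if len(payload) > 0xFFFFFFFF:
        raise ValueError("payload does not fit in a four-byte size field")
    out.write(struct.pack(">4sI", chunk_id, len(payload)))
    out.write(payload)
    if len(payload) % 2:
        out.write(b"\x00")       # chunks are padded to an even length

def read_chunk(inp):
    """Read one chunk, or return None at end of input."""
    header = inp.read(8)
    if len(header) < 8:
        return None
    chunk_id, size = struct.unpack(">4sI", header)
    payload = inp.read(size)
    if size % 2:
        inp.read(1)              # skip the pad byte
    return chunk_id, payload

if __name__ == "__main__":
    buf = io.BytesIO()
    write_chunk(buf, b"NAME", b"abc")
    write_chunk(buf, b"BODY", b"1234")
    buf.seek(0)
    print(read_chunk(buf), read_chunk(buf), read_chunk(buf))
```

Because the size precedes the payload, a writer that doesn't know the final length either has to buffer everything first or seek back and patch the header afterwards, and neither works for a stream that can't be rewound.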
I think general-purpose data formats are a good thing - but I believe it's very important to consider that there may be cases where a particular format just isn't right for the problem. And that brings us back more or less to the current scenario, in which different applications tend to have totally distinct file formats, not even sharing an overall containment structure. From that perspective, it's wasteful to continue re-inventing metadata storage for each new file format that comes along, and wasteful to implement all these different methods of reading metadata out of different application-specific file formats. There's also the danger that we will want to change the format of the data in the metadata fields (just as we shifted from "whatever local variant of ASCII your region uses" to mostly using UTF-8 - which still isn't necessarily adequate for all regions, incidentally). Another all-new text encoding so soon after Unicode's introduction isn't too likely, but the OS, in defining how these metadata fields are defined and used, could change the requirements in ways that go beyond what the container format can provide (for instance, storing data that exceeds a particular format's "metadata region" size limit, or storing something that's better encoded in some binary form other than text). Decoupling the encoding of metadata from the definition of file formats eliminates a bunch of redundant work and leaves us more room to change what metadata contains and how those contents are used, as we get a better idea of how, ultimately, it will be used as the dust settles around this whole issue.
A file is simply a linear series of data. Period. End of story.
Engineers and interface designers have the ability to determine how these abstractions are implemented at the low level, and how they are presented to the user at the conceptual level. While the abstractions of the past and present are very useful, it is not sensible to assume they will continue to be the best course in the future.
Let's look at an example in kind of the middle ground, between implementation and conceptualization. Suppose you have a file containing plots of some value with respect to time.
Now, each plot, by itself, is very linear by nature. However, the plots together are not. They run in parallel. You could simply concatenate them or interleave them to turn them into a serial file, but the data is not linear by nature. As a result you pay a certain price, when you edit that data, maintaining that serial format. Suppose you want to add data to the end of the time sequence for each plot? If the plots are simply concatenated, you have to shift the contents of the file around each time. If the plots are interleaved, then you introduce a bunch of file seeks when you read the data back out. If you decide each plot should be a separate file, then each is no longer tightly coupled to the series of time values, unless that time sequence is duplicated in each file... And the collection is no longer a "unit" on the filesystem. The data is separated into multiple units, in that case, for reasons that have no relationship with the nature of the data itself or the ways it is intended to be used.
Therefore, I think there is a certain merit to the idea of separating the concept of creating "contiguous, linear" allocations of disk storage from the concept of creating a "unit" in the directory tree. Forked files allow you to shift the problem of expanding or shrinking these allocations to the filesystem layer - arguably a more appropriate place for it than in the application itself.
Why should I have to "save file" in an editing application?
Because, uh, when you totally screw things up without realising, you want to be able to abandon the current version and go back to the last good version? Because when you're editing on a laptop with an HDD you don't want it perpetually spinning and sucking up your battery power? Because saving is freaking slow in many applications even on an SSD?
In all likelihood the application is already doing "auto-saves" anyway, so it can do recovery if something goes unexpectedly and badly wrong. And then, in terms of resource usage, there's not much difference between an application that auto-saves when you close the window and one in which you manually hit "save" before closing the window. Pretty much the only technical snag is if a program tries to read the file while the application is still editing it - the other program may not get the latest version of the file. But mandatory locking on Windows pretty much blocks this scenario anyway, and advisory file locking could be used to provide the same sort of behavior on Linux. Or if you really want another application to be able to open the file while it is being edited, and be guaranteed the most up-to-date version - that is not an insurmountable problem.
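As a rough sketch of the advisory-locking idea on Linux, an editor could hold an flock() on the file while it has unsaved auto-save state, and other cooperating programs could test for that lock before reading. The helper names are hypothetical, and this only works between programs that agree to check the lock.

```python
import fcntl


def try_exclusive_lock(fileobj) -> bool:
    """Try to take an advisory exclusive lock without blocking.
    Returns False if another open file description already holds it."""
    try:
        fcntl.flock(fileobj, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


def release_lock(fileobj) -> None:
    """Drop the advisory lock so other cooperating programs can proceed."""
    fcntl.flock(fileobj, fcntl.LOCK_UN)
```

A reader that gets False back could wait, or ask the editing application for its current in-memory version instead.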
Resource usage isn't the issue here. The issue is user interface. Users have been trained to respect a difference between "in-memory" and "on-disk" data. But that doesn't mean it's necessarily the best choice moving forward, just that it's what people are used to. The situation could be changed (and already has been changed, on some platforms) as long as users were made to understand the change in paradigm.
PalmOS did this, and while this was in part just natural for early iterations of the platform (everything was in RAM anyway), it was also part of an effort to streamline the UI. I don't know if the same was true of Newton, but I believe it's true of current smart-phones as well.
There are other problems with auto-save, problems that weren't addressed on PalmOS: for instance, what if the user makes a mistake, which winds up getting auto-saved to the file? If they had been using a more traditional application that required explicit user action to save the data, then maybe (even if the change was beyond the reach of their "undo" history) they would be able to revert to the copy of the file on-disk. Unless they reflexively hit "save" at some point, in which case they're, again, boned.
The solution, probably, is file versioning incorporated into the app itself. At a very simple level this could mean "undo" history is very long, and saved to disk. (This raises other problems, of course - we have seen cases where someone released a document that contained metadata that they didn't want to release... Certainly releasing a document that included a lengthy history of your edits could be embarrassing and possibly dangerous... So teaching people to strip that data out of a file before they publish it, and incorporating that into the UI is important in that case.)
Look them up. They already allow you to attach arbitrary metadata to a file. Most modern filesystems and user-level utilities support them already. They're even used as the underpinnings for security mechanisms such as POSIX ACLs and SELinux. Sure, there are issues with performance when you have *lots* of xattrs on a file, and that's a fruitful area of research, but we sure don't need some brand-new Microsoft-invented thing to deal with metadata.
The issue isn't the underlying mechanism that provides the capability for assigning arbitrary metadata to a file: rather, the important issue is how we treat that metadata in the UI.
You could think of it like this: it's not necessarily about redefining the filesystem-level notion of what a "file" is, but rather about establishing conventions for how we treat files and work with metadata.
Hmm, all the ID3 tags on my MP3s are in order. Get your files from good people, and you'll get good metadata.
Yeah seriously. And you know what? When I had an MP3 file whose metadata wasn't in order... It messed up the sorting in iTunes (back when I used iTunes) - and so I'd invariably edit the metadata to fix it.
If you have a UI that incorporates the idea of metadata and relies on it, and helps you work with it, you're much more likely to maintain it properly.
We’ll end up with 10 different standards, and no one will bother keeping metadata accurate on all their files. At best metadata is useful for a single person on a small subset of files where they find it useful. Everything else, the only metadata anyone is going to care about (and be bothered to enter) is title, which is served fairly effectively by the file name.
Metadata becomes a lot more useful as your collection of files grows, and as the UI develops to better take advantage of the information. To a certain degree this has already happened (and it kind of makes me wonder where they've been the last several years) - though of course you don't see it in every program yet.
Often, yes, as you say, title is all you're likely to need - or else the few other pieces of information you need (album name, track or episode number) are easy to incorporate into the file name or directory structure. But there may be cases where a piece of media doesn't have a title. This is often the case with large numbers of photos: there may not be a basis for providing a unique title to each image. So photo management software deals with this by encouraging organization and search via metadata (basic stuff like the date, but also tagged events, locations, and individuals). These files probably have filenames, but they're probably meaningless stuff like IMG_1234.JPG - just a sequence number provided by the camera. If you think about how you'd want file operations to relate to file name in that case, the filename actually doesn't come into consideration.
Consider, for instance, if you're moving one collection of files into another directory. And in that target directory there happens to be another IMG_1234.JPG. Do you overwrite that other file? The traditional answer would be "yes", since files are uniquely identified by their filenames. But the filenames have no particular value in this case: they are purely artificial. In this particular case that's probably not what the user wants. If anything they'd want the files to overwrite only if they're the same file.
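One way a file manager could behave here, sketched under the assumption that "the same file" means "byte-identical content": compare a content hash on a name collision, drop true duplicates, and keep both files otherwise. The function names and the `_1`, `_2` suffix scheme are invented for this example.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def move_into(src: Path, target_dir: Path) -> Path:
    """Move src into target_dir without clobbering a different file of the same name."""
    dest = target_dir / src.name
    if not dest.exists():
        shutil.move(str(src), str(dest))
        return dest
    if file_digest(src) == file_digest(dest):
        src.unlink()  # identical content: the "overwrite" is a no-op
        return dest
    n = 1  # same name, different picture: keep both under a new name
    while (target_dir / f"{src.stem}_{n}{src.suffix}").exists():
        n += 1
    candidate = target_dir / f"{src.stem}_{n}{src.suffix}"
    shutil.move(str(src), str(candidate))
    return candidate
```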
There are other cases where there may be a sensible choice for the filename of each file, but it doesn't necessarily make a lot of sense to have it as a unique identifier. For instance, suppose you have a directory full of videos, and each has its title as its filename. Now suppose there's different variations of a few of the files: maybe two different versions of the same movie (a fansub and an official release) - maybe two different encodes (one for best quality, others to play on specific devices - and remember that different encodes could use the same container format, so the difference isn't necessarily ".AVI" vs. ".MKV" or whatever)... You could incorporate that information into the filename or directory structure - but it makes the filenames increasingly artificial, and the directory increasingly burdened with additional directories - different files treated as different items in the directory when in fact there is good reason to treat them as different versions of the same thing.
And then there's other cases where the traditional system of filenames works just fine and change will probably only foul things up - source code directories and so on in which it's very convenient to have a simple way to uniquely identify a particular piece of data, without having to address complicated questions like "but which one?" At most, you'd want something analogous to (or integrated with) a versioning system - you could specify "which one" if you wanted to but most of the time that question would already be answered.
So I think there are problems with the current system of files. There are cases where there is no useful information stored in the filename, and changing the filename to be both unique and informative is a much more cumbersome process than populating the file's metadata with relevant tags. There are cases where multiple "files" may just be different renderings of the same thing, in which case it may not be useful to treat them as separate entities at all, but rather as versions of the same thing. So I think it is worth thinking critically about how we approach the issue in the future.
I think it was pretty obvious that Netflix and Redbox were doomed once you start to see ads like "rent it 28 days before (insert competitor name here)!"
Right, can't underestimate the importance of that kind of value. By the time the competitor gets the movie, the world could be overrun by zombies.
The AppStore is for CONSUMERS, there will never be a full lockdown because forcing every software writer to release through the AppStore would kill OS X as a development platform. Even XCode requires a whole bevy of gnu utilities. OS X is a full fledged UNIX and as such, you'll always be able to do *Nixy things such as wget/curl a file, gunzip, configure and make.
I believe this is true for the time being. However, using words like "never" and "always" is a bit short-sighted. Desktop and laptop computers have traditionally been fairly open platforms in terms of what the user is allowed to do - but there is no reason to assume this will continue to be the case. If someone wants to change that, it will be a slow, difficult process to change user expectations to a point where they accept that loss of control - but it can be done. People have already accepted mobile phones as a fairly closed platform, and some contend that phone use is displacing most "personal computer" use - which means that the experience people get with their phones is redefining users' expectations of interaction with their computers.
OS X is currently a "full fledged UNIX" - this can change.
XCode requires a bunch of GNU stuff - that can change.
What do they gain from further restricting their platform? They gain a greater ability to simplify the user experience (which is a good thing for many users) and redefine various aspects of the OS that could be hard to do otherwise... And they gain status as a gatekeeper for the platform, a middleman who can extract money for every piece of software sold on the platform - much like what they enjoy on the iPhone platform, or what game console manufacturers enjoy.
One possible approach would be to give developers the same level of control they have now - but marginalize them. Charge them an extra $300 for the version of OS X that lets them do developerry things, or block developer machines from accessing the app store (apart from developer tools) - things like that. Things that would yield the desired level of control over most Mac systems, simply because most users wouldn't want the disadvantages (additional cost or reduced capabilities) that come with a development-capable machine.
I hesitate to say "Apple could do such-and-such" because I feel like that conveys the idea that I think this is likely to happen in the near future. My point is that it could, and it's silly to assume that it won't. The landscape of computing is changing, as it is bound to do over time. It's easy to assume that the status quo is some static, unchangeable thing, but it really isn't. Within the bounds of what users are willing to accept (even grudgingly, at first), the company in control of the platform can do whatever they like.
There's nothing wrong with the sandboxing model per se. It's probably the only way to make our computers more secure. That Apple is moving in that direction should not be surprising: they make idiot-ready software (also known as good software)
I take exception to this.
"idiot-ready" software is good software... for "idiots".
(Of course, they're not really idiots, most of them - they're regular people who desire a simple level of interaction with their computer. But I'm just running with the "idiot-ready" terminology there.)
That approach to software design is "one size fits most" - but it's not "one size fits all" because the limitations of a simple UI will inevitably interfere with (or at least fail to support) something that someone is trying to do. When your expectations and skills pass a certain threshold, a simple UI is not necessarily a good UI.
Nicely done.
nuff said
It's too cold for sand worms. Besides, sand worms would have to contend with the Ice Warriors, and they'd probably find the waters of Mars to be rather inhospitable as well.
- NO! Self-indulgence forever! -- Ok, but social acceptance will be illusive.
You have to be careful of that illusory social acceptance...
i was there a few months ago and they still had the BASIC stamp : /
Well, of course they did, nobody bought it! :)
Personally, I always thought it was kind of cool that they had a microcontroller development kit in the store, even if it was the BASIC stamp. Maybe I should revisit that opinion, since the BASIC stamp is so damn old at this point... But when I first saw it, it was at a time when I'd really lost all confidence in Radio Shack as a place for electronics hobbyists, and I took this as kind of a positive sign.
Nowadays, though, Arduino is clearly something they have to be on board with if they want to address hobbyists.
Admitting you don't know something is admirable. Not knowing can be remedied. It is a lot better than pretending you know something when you don't.
Being actually proud of not knowing something is unfathomable to me. Nobody goes around boasting they can't read, or don't know how to use flush toilets. I don't know whether to pity them or despise them or both.
Pride in ignorance is part of the basis of anti-intellectualism and the backlash against science. There's a feeling of insecurity that comes from confronting the fact that other people know a lot more than you do about how the world works, and a knee-jerk reaction to dismiss such people as elitists, or claim that the products of generations of research are worth no more than the knowledge and beliefs they possess. Some people can't tolerate the feeling of being inferior to someone else (and I can relate to that, personally) and respond by trying to boost themselves up while dragging others down.
The process of boosting themselves up often doesn't involve real self-improvement, rather just an assertion that they are good enough already. Hence, pride in ignorance.
It seems to me that by putting metadata into the filesystem, you're creating some big problems with compatibility: different filetypes need different metadata. For instance, a PDF file might have information on author, title, etc. A jpeg file might have EXIF camera settings. Having the filesystem deal with metadata seems like it's pushing this stuff down into the OS, where it really should be left up to apps.
But what we're dealing with now is metadata very much as an OS-level concept. I think right now the implementations have a bit of a "strapped-on" feel, but it's going to become more and more central to how OS UI works.
Also, what if people decide they want different metadata? Back when jpegs were first made, they didn't include EXIF data, but now they frequently do thanks to the proliferation of digital cameras. Presumably, the standard was modified to allow this. But changing the standard is easy with metadata encapsulated in the file itself; just have a version number in the file heading saying what standard the file conforms to, and apps will read this and interpret the data accordingly. Changing an application or two is a lot easier than patching the whole OS to deal with a change in metadata standards.
It really isn't. At least if you're patching the OS, you can just make the change once, instead of again and again for every application that uses the file type. How many applications work with video files, or images?
OS-level metadata also tends to be very flexible. xattr support on Linux, for instance, lets you store name/value pairs with whatever name/value you want. There are limitations (I think a maximum value size limit measured in kilobytes, at least on some filesystems - so it's not like a full "file fork" implementation at this point) - so pretty much, if you want to add a new field, you just add a new field. The same is actually true of most forms of metadata stored within file contents as well - at least the modern ones. I think if a metadata system doesn't have the approximate flexibility of XML then it's pretty much rejected. :)
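For what it's worth, here is roughly what "just add a new field" looks like with Linux xattrs from Python. This is a sketch: the helper names are made up, it assumes Linux and a filesystem mounted with user xattr support, and the helpers report failure rather than raising when that support is missing.

```python
import os


def tag_file(path: str, name: str, value: str) -> bool:
    """Store a name/value pair in the 'user.' xattr namespace.
    Returns False if the platform or filesystem doesn't support it."""
    try:
        os.setxattr(path, f"user.{name}", value.encode("utf-8"))
        return True
    except (OSError, AttributeError):  # AttributeError: os.setxattr is Linux-only
        return False


def read_tag(path: str, name: str):
    """Read a tag back, or None if it is absent or xattrs are unsupported."""
    try:
        return os.getxattr(path, f"user.{name}").decode("utf-8")
    except (OSError, AttributeError):
        return None
```

Nothing about the field has to be declared in advance; `tag_file(p, "event", "wedding")` works the first time anyone thinks of an "event" tag.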
Or, what about files made by specialized programs that not many people use? Why should OS makers have to deal with metadata standards for every filetype in existence? The way it is now, only people who write apps dealing with a particular filetype have to deal with metadata standards.
That is not the way it is now. Desktop indexing (present in Windows, OS X, and at least optionally in Linux) monitors the filesystem, re-scanning the in-file metadata when a file is modified, so it can build a central database for quick searches. So the indexing system needs to know how to read these different file types.
Taking the metadata out of the file creates a lot of complexity, without any significant gain that I can see. Your examples of bittorrent files and files with changed metadata not md5-matching others just doesn't seem to be enough of a problem to warrant all these changes. In fact, these problems can be easily fixed by fixing the tools that use these files; BitTorrent, for instance, could be modified so that certain popular filetypes (e.g. video files like avi and mkv) are recognized by the tools and the metadata ignored when creating an md5sum. Modifying a tool that only some people use is a lot easier than modifying an entire OS.
I wouldn't exactly call that "easy", personally... :) Maybe you're right and my examples could be better. But it addresses a general issue that metadata is not conceptually part of the file contents - that's the whole point of metadata. Like the filename, it's just there to tell you what's in the file. If you change the filename or date stamp, it doesn't affect the file's contents. So by the same logic I'd say search tags and so on shouldn't be part of file contents either.
Concepts don't always m
But you could come up with a million different examples of data, and how they are handled has to be on a application level because only the application knows how to deal with the data.
Well, I take your point, that it doesn't necessarily make sense for the OS to get too heavily involved in what would normally be application-level decisions. However, I think more flexibility in the structure of files could be useful. Give applications the tools and let them decide how to use them. Providing a feature like file forks is awkward because most software doesn't currently deal with it (as you point out) and there's a little bit of a technical challenge in implementing it and a logistical problem in getting it into all the various filesystems (the ones where it's possible to do so, anyway) - but it is not an insurmountable problem, nor is it a slippery slope into ever-increasing complexity, or a path leading to an unavoidable fate of excessive OS involvement in application file storage strategies. It is one useful organizational tool, allowing an application to have multiple "sequential byte range" abstractions within something that's treated as a single unit on the filesystem. The difference in implementation between file forks and directories would be very minor. The major difference would be in how UIs treat the "forked file".
And there is a reason the files are linear series of data, because that is what HDs are as well.
To the extent that this is true now, it is becoming less so over time. Hard disks aren't linear by nature - they have multiple platters, for starters, so they'd be more like multiple contiguous ranges. Then there's firmware in the drives that maps around bad sectors, quietly substituting other areas of the disk. One can certainly still treat the drive as a linear range of storage space, and it's a convenient way of dealing with the disk, but it's an abstraction that we're very quick to abandon... Filesystems, for starters. If you have a directory of files, you don't want to think about that in terms of sequential storage on disk. You want to be able to copy and move and erase and create them and never care about where on the disk they go. The filesystem layer hides the abstraction of the disk as a sequential thing, and then reintroduces it at the file level.
And there's no guarantee that the "sequential" file data even will be "sequential" on-disk. We try to minimize the fragmentation of files, but the "sequential" nature of the file is, again, really just an abstraction. We could exploit that - tell the filesystem to insert a block of disk space into the middle of a file, and the filesystem wouldn't really have to move any data around to perform an insertion - but as far as I know that's not a supported operation at the application level on any OS. So instead we read all the data out of later parts of the file into RAM, then write it back to disk somewhere else - all for an insertion operation that could be handled much better by the filesystem.
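The application-level workaround described above looks something like this sketch (the function name is made up). Everything after the insertion point gets read into RAM and written back, even though the filesystem could in principle just splice in new blocks. As far as I know, newer Linux kernels do expose a block-aligned variant of that splice through fallocate() with FALLOC_FL_INSERT_RANGE on some filesystems, but this portable version is still what most software ends up doing.

```python
def insert_bytes(path: str, offset: int, payload: bytes) -> None:
    """Insert payload at offset by rewriting the whole tail of the file."""
    with open(path, "r+b") as f:
        f.seek(offset)
        tail = f.read()      # everything after the insertion point, held in RAM
        f.seek(offset)
        f.write(payload)     # the new data
        f.write(tail)        # the old tail, shifted len(payload) bytes later
```

The cost is proportional to the size of the tail, not the size of the insertion.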
Not that a file cannot be broken apart into different sections to fit/optimize performance but at the application level they have to be considered linear series of data if only because every programming language on earth is set up to read files linearly.
Things change.
I mean, I get your point here, too. I wasn't a Mac user back in the day, but from what I've seen interoperability was a bitch because of file forking. And that would still be true today.
But I can't accept "because that's the way it's been for three decades" as an argument for why a design choice is good. Things will change in the future. I don't know how, or when, but it's bound to happen. Being a part of that change, rather than being left behind by it, requires openness to new ideas. Even the most fundamental concepts of computing, sooner or later, will be subject to revision.
And all this shows why you don't need the filesystem to track metadata, all you have to do is embed it into the file.
Well, my post really wasn't addressing that question at all. My post was about whether using metadata, as opposed to directory structure and filename, to find data was a reasonable sort of UI, or if people's tendencies to be lazy about writing metadata would undermine that too much. My point was that if the system is well-designed around the use of metadata, users will tend to keep their metadata well-ordered, because in that case it's actually useful and easy to do so.
To address the point of whether metadata should be part of the file structure, or adjacent to it - I think there are advantages to each approach. Presently, there's a lot of infrastructure that's just not geared to dealing with metadata that's not stored as part of the file itself. But if you copy an MP3 file, the ID3 tag will be preserved, because it's there in the file structure. So at present that's a definite advantage, and not one to be underestimated.
There are disadvantages to bundling the metadata: for starters, if you have two files with identical data but different metadata, tools like "diff" or "md5" would reflect that difference. Or if you modify a bit of metadata, you're changing the file's modification time as well. That could be undesirable. Suppose you download a file via bittorrent and tag it according to your preferences - you won't be able to seed the torrent from that file, because its checksum will have changed. Or what if you want to tag an HTML file with metadata, or some other file type for which metadata either isn't supported, isn't adequate, or you just plain don't want it there in the file contents? The reason why it's called metadata is because it's data in reference to the primary data of the file... not part of the primary data of the file. I don't claim that this, by itself, is a conclusive argument in favor of filesystem-level metadata but I hope you take my point that there is a logical basis supporting its separation from the primary data stream.
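A toy illustration of the checksum problem, using a made-up container in which tags are simply appended to the payload after a marker (the marker and function names are hypothetical, not any real tag format): two copies with identical audio but different tags hash differently as whole files, and only a tool that knows the format can recover the payload hash.

```python
import hashlib

TAG_MARKER = b"\x00TAGS\x00"  # hypothetical separator between payload and embedded tags


def embed_tags(payload: bytes, tags: str) -> bytes:
    """Bundle metadata into the file contents, ID3-style."""
    return payload + TAG_MARKER + tags.encode("utf-8")


def whole_file_md5(blob: bytes) -> str:
    """What md5sum or a torrent client sees: payload and tags together."""
    return hashlib.md5(blob).hexdigest()


def payload_md5(blob: bytes) -> str:
    """Hash of the primary data only; requires knowing the container format."""
    payload, _sep, _tags = blob.partition(TAG_MARKER)
    return hashlib.md5(payload).hexdigest()
```

With metadata kept outside the byte stream, `whole_file_md5` and `payload_md5` would be the same thing and no format-specific code would be needed.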
There's also the maintenance issues around supporting each new type of metadata for each new file format as it's introduced - and if, as part of the OS design, you make some decision about metadata or how it's used that doesn't fit well with how it's stored in a particular file format, then resolving that disparity could be a bit of a headache for the implementers as well as anyone who has to use that UI. If you provide metadata as part of the filesystem, its format can be changed to suit the way it's being used, and these changes can be transparent to applications and users.
But I think the bigger issue, and the main thrust of the articles and the main focus of current work in improving utilization of metadata in the UI, has more to do with how metadata is presented to the user, rather than whether it's stored in the file or adjacent to it. The indexing systems in present use can use both approaches: metadata within files for file formats the indexing system is designed to specifically support, and filesystem-level metadata for others. From a pragmatic standpoint that's probably the way to go: use file-level metadata where it's appropriate, use filesystem-level metadata where it's appropriate, and just do your best to resolve disparities as they crop up. It's hard to effect this kind of change, so there are always advantages to an approach that provides a smooth transition.
I looked at this problem once, in the context of distributed file systems. I'd divide files into the following categories, with slightly different semantics for each. This can be fitted into the standard UNIX/DOS/Windows model, but it resolves some issues that result in programs doing elaborate workarounds to get correct file semantics.
Unit files: A unit file is only meaningful once it has been completely written and closed. Once closed, it will never be rewritten, only replaced. Most files are unit files. The file system should guarantee that 1) unit file replacement by a new version is an atomic operation, 2) unit files not properly closed do not replace old versions, so that when a program fails while writing a unit file, the old version remains undamaged, and 3) a unit file being written is not visible to other processes until closed. This was a feature of some mainframe operating systems. Because the UNIX file system concept doesn't have this, there's much fussing around with ".part" files and renaming strategies to achieve unit file semantics.
Managed files: Managed files are written and read in a random access fashion, but only by programs which understand their format. Managed files typically contain databases of some type. The file system should guarantee that 1) managed files can be opened for full or partial exclusive use, and 2) additional functions for ensuring that file writes are flushed from cache to disk are available.
Managed files have the most complex semantics, but not many programs use them. The ones that do typically go to a lot of trouble to get the file semantics right, to maintain database integrity. Read what SQLite needs from a file system to get a sense of how managed files need to work.
Scratch files: Scratch files do not outlive the process group that created them, and are invisible outside that process group. They can be read and written freely, but do not have permanent existence in the file system. The file system should guarantee that when the process group goes away, so do its scratch files, so as not to clutter up the file system. This would stop the accumulation of abandoned temporary files.
We have everything or nearly everything we need to implement this - though admittedly it could be a bit cumbersome...
"Scratch files" can be created by creating a file, opening it, and unlinking (removing) it. The file continues to exist until the last open filehandle to it goes away. But since it's unlinked, it has no name in the filesystem and no one else can open it by path. But if you want to grant a filehandle to that file to another process, you can either fork() and the new process will get a copy of the filehandle, or you can send an open filehandle to another process via... a Unix Domain socket? (IIRC. I know there's a mechanism, I don't remember if Unix Domain Sockets are it. The dbus system uses filehandle passing for certain things, I believe.)
"Unit files" would be more or less the same thing: create the new version of the "unit file" with a temporary name and unlink it from the filesystem. Processes opening the file will see the old copy - until you're ready to commit the new version, at which point you unlink the old version and link the new version to the old filename. Anyone who opened the old version will continue to see it until the last open filehandle to it is closed. The main bit that's missing, I guess, is that this isn't an atomic operation. If you unlink the old file and then somehow fail to link the new version to the old filename, the file momentarily doesn't exist (until you link the old one back again...) And you don't get the "commit all or revert all" behavior you describe for a batch operation.
It's worth noting that "Unit files" take the same amount of resources as the current scheme but you lose the ability to peek at the new version before it's committed...
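Here is a minimal sketch of both recipes on a Unix-like system with Python 3.9 or later. The function names are invented for the example. Two things are worth flagging: descriptor passing does go over Unix domain sockets, and this version of the unit-file swap uses a single rename (via os.replace) rather than a separate unlink and link, since a POSIX rename over an existing name is atomic and closes the "momentarily doesn't exist" window. It still gives no batch commit/rollback.

```python
import os
import socket
import tempfile


def make_scratch_file() -> int:
    """Create a file and immediately unlink it. The data lives only as long
    as some descriptor for it stays open."""
    fd, path = tempfile.mkstemp()
    os.unlink(path)
    return fd


def hand_over(fd: int, sender: socket.socket, receiver: socket.socket) -> int:
    """Pass an open descriptor across a connected pair of Unix domain sockets.
    In real use the receiver would live in another process."""
    socket.send_fds(sender, [b"fd"], [fd])
    _data, fds, _flags, _addr = socket.recv_fds(receiver, 16, 1)
    return fds[0]


def replace_unit_file(path: str, data: bytes) -> None:
    """Write the new version under a temporary name in the same directory,
    then rename it over the old one. Readers that already opened the old
    version keep seeing it until they close it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".part-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # make sure the new contents are on disk first
        os.replace(tmp, path)      # atomic swap of the name on POSIX
    except BaseException:
        try:
            os.unlink(tmp)         # failed write: old version stays untouched
        except FileNotFoundError:
            pass
        raise
```

Unlike the unlinked scratch file, the temporary ".part-" file here is visible in the directory while it is being written, which is the price of being able to rename it into place.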
I'm not
If only we had Steve Jobs to solve this problem for us. :-(
Hey, he had his chance. How long was he with Apple after his return? Ten years or something? He ushered in OS X and the shift to Intel - both of them representing fairly extensive breaks with previous Apple products, either of which could have been a great time to tackle something like this...
Apple brought us Spotlight, I guess. It's representative* of the way things have been going: not redefining the filesystem, but building a database of file information and using it for searches.
(* I don't know who pioneered the concept or what implementations came first, so I say only that Spotlight is an example of filesystem indexing.)
People who say that hierarchical filesystems suck probably have a big mess on their table in real life.
I have never tried organizing my table hierarchically. Tell me, where do I put my cell phone in that case? Do I group it with my desk phone, because it's a phone, or with my computer, because my phone is also a little computer that's plugged into my big computer?
When I download something from the web, it goes into ~/Downloads. I don't have to waste time telling the system what it is, I don't have to figure out where the system put it and if I want to see all the files I downloaded I just 'ls ~/Downloads'.
But sooner or later you're probably moving them somewhere, right? The analogous procedure in a "database" filesystem would be to recategorize the item from "uncategorized recently downloaded item" to something that'll be more useful in the long term. One important thing to remember is that right now tagging and categorizing a file is a chore because that's not something a lot of UI is designed to do... and stashing a file in a directory hierarchy is relatively easier because it's something we're used to, and something UIs are currently geared toward. That's also the only reason sticking files in a database could reasonably be called "hiding" them. It's just a question of what the UI is geared toward. (In the case of folks like me, and you apparently, the "UI" in this case would include the command shell and various mechanisms used to address files in other programs as well...)
My personal feeling is that the usefulness of a "database filesystem" would be greatest in (and perhaps limited to) certain domains. Media files, hell yes. Source code... Well, apart from version control I really don't think so. :) But having the hierarchy indexed could certainly be useful ("find me the files that implement this method of this virtual base class - because some knob on the programming team doesn't believe in grouping class definitions in a sensible way"... of course some IDEs already do this independently of any system-wide indexing support)
The thing with media files is that you tend to wind up with a huge collection of 'em. They usually don't need to reference one another the way source files do, they're self-contained and there's not always a good way to name them. Video and music files can usually be named with their title, of course, and it's not hard to put them into a hierarchy, but the hierarchy isn't always useful. If what I want is to play "Doppelganger" - I'm not likely to have to disambiguate a request like that. I can specify the full path from the base directory where I keep all my music, but I could as easily skip that. (And it's not that hard in a shell that supports the ** notation for "find" like searches during globbing...) But if I take a bunch of photos, there may not be any value to giving them filenames at all. It's more useful to organize photos by things like date and tags. "IMG_0286.JPEG" isn't useful in any way, and coming up with unique titles for images that may not even merit a title would get a bit crazy. The filename becomes an afterthought in that case - a necessary "evil" if it's something you have to deal with manually. (If it's dealt with automatically, the filename could still be useful as a unique identifier.)
I just want one thing: a file system that is part database for fast file searches. I don't want to manually build indexes or any other bullshit; just look at the file table and give me my fucking file. Even if you had 100,000 files with file names of 256 characters, that's only about 25 MB; how long does that take to parse? Maybe I don't understand file systems, but even a file table that size should only take a few seconds to scan. When I do a search of a directory or entire disk with tens of thousands of files it sometimes takes a minute or two. The disk is thrashing away as if the program is looking all over for the file names. Shouldn't they all be in one place pointing to where they are on disk?
We pretty much already have this. The way it works in practice is that there's some service on the machine that provides indexing, maintaining a central database of metadata. When a file changes, the metadata is re-scanned and the index updated. Then you can use the index to search for things.
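A rough sketch of that arrangement, in Python with SQLite standing in for the indexing service's database. The table layout and the names `build_index` and `search` are made up for illustration; real indexers also watch for file changes rather than re-walking the tree.

```python
import os
import sqlite3


def build_index(root, db):
    """Walk `root` once and record each file's path, name, size and mtime."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT PRIMARY KEY, name TEXT, size INTEGER, mtime REAL)"
    )
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            # Re-indexing a changed file simply replaces its old row.
            db.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                (path, name, st.st_size, st.st_mtime),
            )
    db.commit()


def search(db, fragment):
    """Return indexed paths whose file name contains `fragment`.

    Only the database is consulted; the directory tree is not walked again.
    """
    rows = db.execute(
        "SELECT path FROM files WHERE name LIKE ? ORDER BY path",
        ("%" + fragment + "%",),
    )
    return [row[0] for row in rows]
```

The slow part (walking the disk) happens once, up front; after that a search is a single query against one compact table, which is more or less what the parent was asking for.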
I know this exists for Linux but I don't know to what extent it's actually supported by applications. (I never use the feature.) On Windows, these days, file manager windows show columns containing metadata fields (unless you turn that off - haven't had much luck so far, actually) and you can't swing a dead cat without hitting a search field.
It's not incorporated at the filesystem level, but for the most part it doesn't really need to be. As long as the feature is there, and you can rely upon it being there, and applications actually take advantage of it and work with it, it's just as good. Well, very nearly.
One thing I think could be improved is that sometimes file names just aren't meaningful. Filenames from a digital camera, for instance, tell me very little - and renaming the files, while keeping the names unique and making them meaningful is not always easy. I could have two different 003.jpeg's in two different directories with completely different contents. If I move the contents of one directory into the other, I don't really want one to overwrite the other, because that would be dumb. That filename is entirely meaningless, the only reason the file even has a name is because the filesystem requires files to have unique names. But that could be addressed at the UI level (and has been, on Windows anyway, which is how you wind up with things like "003 copy copy copy copy.jpeg") so it doesn't necessarily require a change to the underlying file paradigm.
Wouldn't it be possible to make a "universal" file container, in that any other file type could be embedded with a text file that listed: what type of file it is, what program it is associated with, owner, creation/mod dates, and especially, tags and other types of metadata?
Do you know that they tried this already?
In 1985? (Well, I'm speaking specifically of IFF - but there were other efforts. Mac's file forks were kind of the same sort of thing, except that they maintained the abstraction all the way down to the filesystem layer.)
Now, just because they tried it already and more or less failed doesn't mean it couldn't work... But they were in a much better position in 1985 to make this work than they are now (we've gone too long and come too far without a "universal format", it'd be nearly impossible to get people to embrace that kind of change now...) so I think it's kind of a lost cause.
I found it absolutely fascinating, personally, when I read one of the original documents on IFF. The ambition, the hubris perhaps, with which they were trying to guide the future of personal computing. They weren't just seeking to create "a" format, they were aiming for it to be the format. And it would have been capable of just about everything you suggest - embed a FORM of whatever you like in a LIST, put in descriptive chunks, etc... I believe Amiga embraced the concept to a fairly high degree.
There are various historical and technical reasons why it didn't really pan out. I think one of the big ones is simply that IFF wasn't the right format for everything. Perhaps no one format can be. Among other things, IFF required a four-byte payload size to appear at the start of each chunk. That limits a chunk (and therefore a file) to 4 GiB maximum (not such a big deal in 1985 or even 1995... but these days it'd be an unacceptable limitation). Another problem is that sometimes you need to write out some data and you just don't know how big it's gonna be. Streaming audio and video are a pretty good example. You can discretize the stream, populate it with known-size chunks, but you don't know the size of the whole stream until it ends.
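To make the size-field problem concrete, here is a minimal Python sketch of an IFF-style chunk: a four-byte ID, a four-byte big-endian size, the payload, and a pad byte when the payload length is odd. It treats the size as unsigned, matching the 4 GiB figure above; the helper names `pack_chunk` and `unpack_chunk` are mine, not anything from the IFF documents.

```python
import struct

MAX_PAYLOAD = 2**32 - 1  # largest value a four-byte unsigned size field can hold


def pack_chunk(chunk_id, payload):
    """Serialize one chunk: 4-byte ID, 4-byte big-endian size, payload, pad to even length."""
    if len(chunk_id) != 4:
        raise ValueError("chunk ID must be exactly four bytes")
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload does not fit in a four-byte size field")
    # The size goes into the header, so it must be known before any payload is written.
    header = struct.pack(">4sI", chunk_id, len(payload))
    padding = b"\x00" if len(payload) % 2 else b""
    return header + payload + padding


def unpack_chunk(data, offset=0):
    """Read one chunk at `offset`; return (chunk_id, payload, offset_of_next_chunk)."""
    chunk_id, size = struct.unpack_from(">4sI", data, offset)
    start = offset + 8
    payload = data[start:start + size]
    next_offset = start + size + (size % 2)  # skip the pad byte, if any
    return chunk_id, payload, next_offset
```

Note that `pack_chunk` needs the whole payload in hand before it can emit the header. A writer handling a live stream would have to buffer everything, or leave a placeholder and seek back to patch the size once the stream ends, which only works on seekable output.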
I think general-purpose data formats are a good thing - but I believe it's very important to consider that there may be cases where a particular format just isn't right for the problem. And that brings us back more or less to the current scenario, in which different applications tend to have totally distinct file formats, not even sharing an overall containment structure. From that perspective, it's wasteful to continue re-inventing metadata storage for each new file format that comes along, and wasteful to implement all these different methods of reading metadata out of different application-specific file formats. There's also the danger that we will want to change the format of the data in the metadata fields (just as we shifted from "whatever local variant of ASCII your region uses" to mostly using UTF-8 - which still isn't necessarily adequate for all regions, incidentally). Another all-new text encoding so soon after Unicode's introduction isn't too likely, but the OS, in defining how these metadata fields are defined and used, could change the requirements in ways that go beyond what the container format can provide (for instance, storing data that goes beyond a particular format's "metadata region" size limit, or storing something that's better encoded in some binary form other than text). Decoupling the encoding of metadata from the definition of file formats eliminates a bunch of redundant work and leaves us more room to change what metadata contains and how those contents are used, as we get a better idea of how, ultimately, it will be used as the dust settles around this whole issue.
A file is simply a linear series of data. Period. End of story.
Engineers and interface designers have the ability to determine how these abstractions are implemented at the low level, and how they are presented to the user at the conceptual level. While the abstractions of the past and present are very useful, it is not sensible to assume they will continue to be the best course in the future.
Let's look at an example in kind of the middle ground, between implementation and conceptualization. Suppose you have a file containing plots of some value with respect to time.
Now, each plot, by itself, is very linear by nature. However, the plots together are not. They run in parallel. You could simply concatenate them or interleave them to turn them into a serial file, but the data is not linear by nature. As a result you pay a certain price, when you edit that data, maintaining that serial format. Suppose you want to add data to the end of the time sequence for each plot? If the plots are simply concatenated, you have to shift the contents of the file around each time. If the plots are interleaved, then you introduce a bunch of file seeks when you read the data back out. If you decide each plot should be a separate file, then each is no longer tightly coupled to the series of time values, unless that time sequence is duplicated in each file... And the collection is no longer a "unit" on the filesystem. The data is separated into multiple units, in that case, for reasons that have no relationship with the nature of the data itself or the ways it is intended to be used.
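A toy Python illustration of the two serial layouts, using plain lists as a stand-in for the bytes of a file (the function names are hypothetical). It only shows where the costs land; it doesn't touch a real disk.

```python
def append_concatenated(data, n_plots, samples):
    """Layout: all of plot 0, then all of plot 1, and so on.

    Adding one sample per plot means inserting in the middle of the file,
    so everything after each insertion point has to be shifted (rewritten).
    """
    length = len(data) // n_plots
    out = []
    for i in range(n_plots):
        out.extend(data[i * length:(i + 1) * length])
        out.append(samples[i])
    return out


def append_interleaved(data, samples):
    """Layout: one sample from each plot per time step.

    Adding a time step is a plain write at the end of the file.
    """
    return data + list(samples)


def read_plot_interleaved(data, n_plots, plot):
    """Reading a single plot back means striding through the whole file,
    which on disk translates to a seek (or a wasted read) per time step."""
    return data[plot::n_plots]
```

With two plots, `[1, 2, 3, 10, 20, 30]` is the concatenated form and `[1, 10, 2, 20, 3, 30]` the interleaved one. Either way the application ends up doing bookkeeping that has nothing to do with the data itself.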
Therefore, I think there is a certain merit to the idea of separating the concept of creating "contiguous, linear" allocations of disk storage from the concept of creating a "unit" in the directory tree. Forked files allow you to shift the problem of expanding or shrinking these allocations to the filesystem layer - arguably a more appropriate place for it than in the application itself.
Why should I have to "save file" in an editing application?
Because, uh, when you totally screw things up without realising you want to be able to abandon the current version and go back to the last good version? Because when you're editing on a laptop with an HDD you don't want it perpetually spinning and sucking up your battery power? Because saving is freaking slow in many applications even on an SSD?
In all likelihood the application is already doing "auto-saves" anyway, so it can do recovery if something goes unexpectedly and badly wrong.
And then, in terms of resource usage, there's not much difference between an application that auto-saves when you close the window and one in which you manually hit "save" before closing the window.
Pretty much the only technical snag is if a program tries to read the file while the application is still editing it - the other program may not get the latest version of the file. But mandatory locking on Windows pretty much blocks this scenario anyway, and advisory file locking could be used to provide the same sort of behavior on Linux. Or if you really want another application to be able to open the file while it is being edited, and be guaranteed the most up-to-date version - that is not an insurmountable problem.
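For the Linux side, a minimal Python sketch of advisory locking with `flock`. The helper names `try_lock` and `unlock` are just for illustration, and the whole thing only works if every program touching the file agrees to ask for the lock first; that's what "advisory" means.

```python
import fcntl


def try_lock(fileobj):
    """Try to take an exclusive advisory lock without blocking.

    Returns True if the lock was acquired, False if someone else holds it.
    """
    try:
        fcntl.flock(fileobj, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


def unlock(fileobj):
    """Release a lock previously taken with try_lock."""
    fcntl.flock(fileobj, fcntl.LOCK_UN)
```

An auto-saving editor would hold the lock while the file is in an inconsistent state, and a cooperating reader would call `try_lock` (or block on `LOCK_EX`/`LOCK_SH`) before reading. A program that never asks for the lock can still read the file whenever it likes.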
Resource usage isn't the issue here. The issue is user interface. Users have been trained to respect a difference between "in-memory" and "on-disk" data. But that doesn't mean it's necessarily the best choice moving forward, just that it's what people are used to. The situation could be changed (and already has been changed, on some platforms) as long as users were made to understand the change in paradigm.
PalmOS did this, and while it was in part just natural for early iterations of the platform (everything was in RAM anyway), it was also part of an effort to streamline the UI. I don't know if the same was true of Newton, but I believe it's true of current smartphones as well.
There are other problems with auto-save, problems that weren't addressed on PalmOS: for instance, what if the user makes a mistake, which winds up getting auto-saved to the file? If they had been using a more traditional application that required explicit user action to save the data, then maybe (even if the change was beyond the reach of their "undo" history) they would be able to revert to the copy of the file on-disk. Unless they reflexively hit "save" at some point, in which case they're, again, boned.
The solution, probably, is file versioning incorporated into the app itself. At a very simple level this could mean "undo" history is very long, and saved to disk. (This raises other problems, of course - we have seen cases where someone released a document that contained metadata that they didn't want to release... Certainly releasing a document that included a lengthy history of your edits could be embarrassing and possibly dangerous... So teaching people to strip that data out of a file before they publish it, and incorporating that into the UI is important in that case.)
Look them up. They already allow you to attach arbitrary metadata to a file. Most modern filesystems and user-level utilities support them already. They're even used as the underpinnings for security mechanisms such as POSIX ACLs and SELinux. Sure, there are issues with performance when you have *lots* of xattrs on a file, and that's a fruitful area of research, but we sure don't need some brand-new Microsoft-invented thing to deal with metadata.
The issue isn't the underlying mechanism that provides the capability for assigning arbitrary metadata to a file: rather, the important issue is how we treat that metadata in the UI.
You could think of it like this: it's not necessarily about redefining the filesystem-level notion of what a "file" is, but rather about establishing conventions for how we treat files and work with metadata.
Hmm, all the ID3 tags on my MP3s are in order. Get your files from good people, and you'll get good metadata.
Yeah seriously. And you know what? When I had an MP3 file whose metadata wasn't in order... It messed up the sorting in iTunes (back when I used iTunes) - and so I'd invariably edit the metadata to fix it.
If you have a UI that incorporates the idea of metadata and relies on it, and helps you work with it, you're much more likely to maintain it properly.
We’ll end up with 10 different standards, and no one will bother keeping metadata accurate on all their files. At best metadata is useful for a single person on a small subset of files where they find it useful. Everything else, the only metadata anyone is going to care about (and be bothered to enter) is title, which is served fairly effectively by the file name.
Metadata becomes a lot more useful as your collection of files grows, and as the UI develops to better take advantage of the information. To a certain degree this has already happened (and it kind of makes me wonder where they've been the last several years) - though of course you don't see it in every program yet.
Often, yes, as you say, title is all you're likely to need - or else the few other pieces of information you need (album name, track or episode number) are easy to incorporate into the file name or directory structure. But there may be cases where a piece of media doesn't have a title. This is often the case with large numbers of photos: there may not be a basis for providing a unique title to each image. So photo management software deals with this by encouraging organization and search via metadata (basic stuff like the date, but also tagged events, locations, and individuals). These files probably have filenames, but they're probably meaningless stuff like IMG_1234.JPG - just a sequence number provided by the camera. If you think about how you'd want file operations to relate to file name in that case, the filename actually doesn't come into consideration.
Consider, for instance, if you're moving one collection of files into another directory. And in that target directory there happens to be another IMG_1234.JPG. Do you overwrite that other file? The traditional answer would be "yes", since files are uniquely identified by their filenames. But the filenames have no particular value in this case: they are purely artificial. In this particular case that's probably not what the user wants. If anything they'd want the files to overwrite only if they're the same file.
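Here's roughly what "overwrite only if they're the same file" could look like, as a Python sketch. The name `move_without_clobbering` and the " (1)" renaming scheme are my own choices for the example, and comparing SHA-256 digests is just one way to decide that two files are "the same".

```python
import hashlib
import os
import shutil


def file_digest(path):
    """Hash the file contents so two files can be compared without trusting their names."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def move_without_clobbering(src, dest_dir):
    """Move `src` into `dest_dir` and return the final path.

    If a same-named file already exists there, it is treated as a duplicate only
    when the contents match; otherwise the incoming file gets a new, unused name.
    """
    name = os.path.basename(src)
    target = os.path.join(dest_dir, name)
    if os.path.exists(target):
        if file_digest(src) == file_digest(target):
            os.remove(src)  # identical copy is already in place
            return target
        stem, ext = os.path.splitext(name)
        n = 1
        while os.path.exists(target):
            target = os.path.join(dest_dir, f"{stem} ({n}){ext}")
            n += 1
    shutil.move(src, target)
    return target
```

The filename ends up being a label the system manages rather than the thing that decides whether data gets destroyed.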
There are other cases where there may be a sensible choice for the filename of each file, but it doesn't necessarily make a lot of sense to have it as a unique identifier. For instance, suppose you have a directory full of videos, and each has its title as its filename. Now suppose there are different variations of a few of the files: maybe two different versions of the same movie (a fansub and an official release) - maybe two different encodes (one for best quality, others to play on specific devices - and remember that different encodes could use the same container format, so the difference isn't necessarily ".AVI" vs. ".MKV" or whatever)... You could incorporate that information into the filename or directory structure - but it makes the filenames increasingly artificial, and the directory increasingly burdened with additional directories - different files treated as different items in the directory when in fact there is good reason to treat them as different versions of the same thing.
And then there are other cases where the traditional system of filenames works just fine and change will probably only foul things up - source code directories and so on in which it's very convenient to have a simple way to uniquely identify a particular piece of data, without having to address complicated questions like "but which one?" At most, you'd want something analogous to (or integrated with) a versioning system - you could specify "which one" if you wanted to but most of the time that question would already be answered.
So I think there are problems with the current system of files. There are cases where there is no useful information stored in the filename, and changing the filename to be both unique and informative is a much more cumbersome process than populating the file's metadata with relevant tags. There are cases where multiple "files" may just be different renderings of the same thing, in which case it may not be useful to treat them as separate entities at all, but rather as versions of the same thing. So I think it is worth thinking critically about how we approach the issue in the future.
Matrix by Dreamworks? Did not know that.
My ridiculous and probably un-funny joke is not strictly bound by facts and general matters of reality.
But, yeah, I kind of forgot we're talking specifically about a Netflix-Dreamworks deal here rather than Netflix in general. My bad.
I think it was pretty obvious that Netflix and Redbox were doomed once you start to see ads like "rent it 28 days before (insert competitor name here)!"
Right, can't underestimate the importance of that kind of value. By the time the competitor gets the movie, the world could be overrun by zombies.
Yes but by then the deal will have changed and you will only be able to see videos from before 1999.
On the bright side, that gets you all the Star Wars films, plus The Matrix... They really should have made some sequels to that.
All about Dreamworks...