Rethinking the Nature of Files
An anonymous reader writes "Two recent papers, one from Microsoft Research and one from the University of Wisconsin (PDF), are providing a refreshing take on rethinking 'what a file is.' This could have major implications for next-gen file system design, and will probably cause a stir among Slashdotters, given that it will affect the programmatic interface. The first paper has some hints as to what went wrong with the previous WinFS approach. Quoting the first paper: 'For over 40 years the notion of the file, as devised by pioneers in the field of computing, has proved robust and has remained unchallenged. Yet this concept is not a given, but serves as a boundary object between users and engineers. In the current landscape, this boundary is showing signs of slippage, and we propose the boundary object be reconstituted. New abstractions of file are needed, which reflect what users seek to do with their digital data, and which allow engineers to solve the networking, storage and data management problems that ensue when files move from the PC on to the networked world of today. We suggest that one aspect of this adaptation is to encompass metadata within a file abstraction; another has to do with what such a shift would mean for enduring user actions such as "copy" and "delete" applicable to the deriving file types. We finish by arguing that there is an especial need to support the notion of "ownership" that adequately serves both users and engineers as they engage with the world of networked sociality.'"
I'm sorry, but MS issuing a paper on the "issues of file ownership" and the cloud sends a little chill up my spine. Makes me think that engineering may not be the only impetus behind their paper. It also makes me wonder if someone isn't looking to take a little more "ownership" of what has traditionally been considered *my* data.
It's bad enough I'm already forced into "buying" software and media that I can never resell. Now they want my fucking Word files too I guess.
SJW: Someone who has run out of real oppression, and has to fake it.
Sounds familiar...
I couldn’t make it through the first paper. It came across as meandering and very academic. I didn’t try the second.
Either way, of all the stuff that is currently broken, files are one of the few things that still mostly work. Yes, it would be nice to have more standardization and maybe metadata, but I don’t foresee it happening. And yes, users sometimes get confused, but they generally figure stuff out, and nothing described in the article seemed any more intuitive; it would probably be just as misunderstood by users.
We’ll end up with 10 different standards, and no one will bother keeping metadata accurate on all their files. At best metadata is useful for a single person on a small subset of files where they find it useful. Everything else, the only metadata anyone is going to care about (and be bothered to enter) is title, which is served fairly effectively by the file name.
I've always thought it would be useful if you could mark a file as automatically deleting at a certain date. If you create a temporary file, it would be nice to flag it as "delete after 60 days" so it doesn't need attention in the future. (The same functionality would be really useful for emails: I want to save this email until after the event, or whatever it's about, and then have it automatically deleted.) I once saw the file functionality on a custom Cray operating system in 1977.
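For what it's worth, the "delete after 60 days" flag can be approximated in user space today. Below is a minimal Python sketch, assuming a JSON sidecar registry rather than real file system support; the names `mark_for_deletion` and `sweep` and the registry layout are made up for illustration, and something like a daily cron job would have to call `sweep`.

```python
import json
import os
import time

SECONDS_PER_DAY = 86400

def _load(registry_path):
    """Read the registry (path -> expiry timestamp); empty if it does not exist yet."""
    if not os.path.exists(registry_path):
        return {}
    with open(registry_path, "r", encoding="utf-8") as handle:
        return json.load(handle)

def _save(registry_path, registry):
    with open(registry_path, "w", encoding="utf-8") as handle:
        json.dump(registry, handle)

def mark_for_deletion(registry_path, file_path, days, now=None):
    """Record that file_path should be deleted `days` days after `now`."""
    now = time.time() if now is None else now
    registry = _load(registry_path)
    registry[os.path.abspath(file_path)] = now + days * SECONDS_PER_DAY
    _save(registry_path, registry)

def sweep(registry_path, now=None):
    """Delete every registered file whose expiry has passed; return the removed paths."""
    now = time.time() if now is None else now
    registry = _load(registry_path)
    removed = []
    for path, expires_at in list(registry.items()):
        if expires_at <= now:
            if os.path.exists(path):
                os.remove(path)
            del registry[path]
            removed.append(path)
    _save(registry_path, registry)
    return removed
```

The obvious weakness is that the expiry lives outside the file, so it is lost the moment the file is moved or copied.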
The current understanding of a file is too conducive to local storage and user ownership for giant corporations who want to assume control of our data and rent it back to us for monthly fees or advertising intrusions.
The delete function is a feature. It means I do not want that data to exist any more. I wonder why Google or Facebook might have a problem with that.
How is this any different from Files-11 (VMS native FS), NTFS, or HFS+?
We suggest that one aspect of this adaptation is to encompass metadata within a file abstraction
this before? Are resource forks coming back into vogue?
I am Slashdot. Are you Slashdot as well?
really?
How is this any different from Files-11 (VMS native FS), NTFS, or HFS+?
(re-posting my AC comment, logged in this time)
Is a file (a) the original data or (b) the original data plus annotation data held in other databases? What does a user expect when he/she creates a "copy" of a file? I have never seen a discussion like: "I downloaded a picture from Facebook and now all the likes and comments are missing!" I suppose it will pop up in the near future.
Have you ever heard of Unix? You know, that strange system where files are more than just collections of bytes.
Devices can be files, IPC can be files, even kernel hooks can be modeled by files...
In the *NIX world, we don't have many problems with files, as everything is a file. But when a directory move in Windows is not atomic (each child is moved one after the other), I can understand why they say the current implementation is broken.
3... 2... 1...
Meta data would really help keep the porn collection sorted.
I like fuzzy folder structures where I can tag, or label files and find them in any tag/label.
Like one does with Gmail or photo-management software. If I have schematics for the Pentagon, I want to be able to tag those files as "Pentagon" and "Schematics" and "Operation Zesty Lemon". No matter which tag I look under, I can retrieve my files easily.
"That's the way to do it" - Punch
A file is essentially just a collection of data - no more and no less. To try and add attributes to that makes little sense and seems as futile as trying to say that each collection of molecules should have a tag saying what it is, who it belongs to and what it's for. Sure, you can add abstractions and structure on top of the basic form, but when you do that you are adding a layer - not redefining the basic building block.
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
Every so often, someone steps up to the plate to get rid of the file metaphor because people can't find their files.
But they don't need to abstract away the notion of files.
Here's what to do: Give us an unlimited Most Recently Used (MRU) list. That's both for files and folders. Not the 9 or so in OpenOffice. How much space would it take to save some inodes?
You should be able to go back in time and answer the question "What file was I working on a week ago?"
If you do that, you might not even need continual disk-thrashing full indexing.
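A rough sketch of what such an unlimited MRU list could look like, in Python. This assumes the shell or application calls `record` on every open and that timestamps arrive in increasing order; the `AccessLog` class is hypothetical and not part of any existing desktop.

```python
import bisect

class AccessLog:
    """Append-only list of (timestamp, path) pairs, assumed to arrive in time order."""

    def __init__(self):
        self._times = []
        self._paths = []

    def record(self, path, timestamp):
        """Remember that `path` was opened at `timestamp` (seconds since the epoch)."""
        self._times.append(timestamp)
        self._paths.append(path)

    def opened_between(self, start, end):
        """Paths opened in [start, end), oldest first, each listed once."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_left(self._times, end)
        seen = []
        for path in self._paths[lo:hi]:
            if path not in seen:
                seen.append(path)
        return seen
```

"What was I working on a week ago?" then becomes a binary search over the log with last week's start and end timestamps, with no full-disk index involved.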
I don't see any need to change, although the 3-letter filename extension to determine the type of file is getting a bit long in the tooth. (I was using an OS and filesystem in the late '80s that didn't have that problem.)
... then I'll start looking for analogies other than a "file" (or something similar like a notebook) to use with computers.
Think about it. The objects we use most often (books, physical files, CDs, musical instruments, notecards, kitchen gadgets, etc.) all have a discrete identity that makes their representation by a file on a file system quite intuitive.
Only when reality starts presenting itself as something other than individual entities with their own discrete identity will most people move to a different paradigm.
Smells like MS is laying the foundation for a whole new tangle of patents.
I never thought I would see the "Paradigm Shift" return to the common corporate lexicon. Of course, there is also the "Paradigm Shift for Paradigm Shift's sake."
This, right here, is the kind of blue-sky thinking that can create a paradigm shift that will empower key contributors to cover all directions of the compass in the realization of the critical program objectives. The kind of solution that will be the result of joined-up thinking will easily land and expand across all verticals in a process-oriented organization. However it will be a key component of the storyboard to collect the buy-in from key stakeholders to ensure 100% coverage in gating milestones.
I read the entire paper (the second article), which was essentially an analysis of Apple software that concludes that "Apple writes a lot to the hard drive and we don't know why" and "this raises more questions than it answers".
Can someone please explain if either article is actually proposing an applicable solution, or simply stating "things need to change!" like a 19-year-old Occupy Wall Street protester?
My compiler does not create OBJ files, just BINary directly. Actually, you usually go source code --> memory ready for execution.
My bad.
Vescere bracis meis.
They kinda sorta work, *if* you manage 100 file extensions. Forget the Ribbon, the other disaster from Office 2007 was the 'glorious basterd' new file names, docx xlsx and the others. But of course 'file extensions are too hard for users' so those differences get hidden. One of my 'mission critical' programs from work FINALLY added support for those filenames ... *this past April*.
So yeah, there's probably a scorpion barb in the Microsoft article.
including especially the question mark, quotation mark, colon, forward slash and asterisk. For clarity and accuracy, any punctuation mark that's commonly used in everyday writing should be available for use in filenames going forward.
There isn't an issue with files. Files are essentially the atomic structures of the filesystem -- the dividing points between different pieces of content. You can add all the abstraction you want, but if you can't find Piece of Information X at the end of it, it's still a worthless abstraction. Redesign the file system, sure, but the nature of files isn't in question here, but rather how they're accessed.
First, they're using bloated programs on poorly optimized file systems and they then complain about performance.
Second, a better optimization would result if you took into account what the file type was. You'd lose some compatibility, but you'd gain a surprising amount of performance. The solution has been sitting around for decades: Anyone remember the infamous Record Management System from DEC? It existed as a layer between the kernel and the user space.
It would answer the concerns of these researchers, but it would require a massive rewrite of all the programs that use the file systems.
We're headed back to the future...
Nearly fifty percent of all graduates come from the bottom half of the class!
Pretend for the moment that Microsoft has found a way of storing data that completely does away with the directory tree and file concept that have been a basic piece of operating system design since the 1970's. Now, to make Windows do that would require major changes and break backwards compatibility, so why would they possibly want that?
Well, imagine another family of operating systems had made files the key component of what they do, so much so that it makes practically everything look like a file, whether a network socket or a hardware device. It's even gone the extra mile on compatibility to support using a wide variety of other OS's file systems, including Windows' preferred file system, so that those who want to run multiple OS's on the same machine can do so relatively painlessly.
Now imagine that Microsoft wants to break that compatibility in an attempt to maintain its market position. Now, their first try (which is a lot cheaper) is to occasionally redesign their filesystem so that the other family of operating systems has to adjust their compatibility layers. But those jerks seem to be keeping up with you, reverse-engineering what you did. So now, to really break compatibility, you have to go after the concept of having a file system, so that instead of something coherent that the other OS can build a driver for, you need special proprietary code to turn the gobbledygook on disk into something a user can read.
Of course, the only part of this that's really imaginary is that last bit. But my guess is that what they're aiming for is "Want to read data from a Windows machine? You need a copy of a certain Windows DLL running, which will only run on Windows."
I am officially gone from
While XML is annoying, it shows us the importance of both data and data describing that data (including "meta-data").
My guess is that we'll take a page from object-oriented computing and in the future, see data as stored only within object types, with associated description data and possibly transformation data (something like XSLT).
In particular, this would open up all file formats to the end user, as understanding the structure of a data object is a lot more sensible than hand-coding a parser for binary files.
The influence of the semantic web, object-oriented thinking, and the inevitable inclusion of high-capacity databases as part of the operating system (we already see this with LAMP as a popular platform not only for development, but for daily use) will drive this change.
Personally, I think it's about time. A file is a low-level format, basically a giant string of data between two points. We should not be using files as end users; that's for the operating system. And at the same time, we'd like our data to be there in a form we can manipulate, not dependent on file-types and specific applications.
Back in the 80s, there was more of this thinking but no one got it to catch on. The original Macintosh file system used a "data fork" and a "resource fork" for objects included with the file. There were other experiments, most notably Taligent and OpenDoc (http://en.wikipedia.org/wiki/OpenDoc).
A good discussion of what open data formats might mean can be found here:
http://www.malcolmgroves.com/blog/?p=633
Do NOT "improve" the file. I'd like to continue to be able to use my computer and other devices.
Please do not read this sig. Thank you.
Look them up. They already allow you to attach arbitrary metadata to a file. Most modern filesystems and user-level utilities support them already. They're even used as the underpinnings for security mechanisms such as POSIX ACLs and SELinux. Sure, there are issues with performance when you have *lots* of xattrs on a file, and that's a fruitful area of research, but we sure don't need some brand-new Microsoft-invented thing to deal with metadata.
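For anyone who hasn't played with xattrs, here is a minimal Python sketch. It assumes Linux, where `os.setxattr` and `os.getxattr` are available, and a file system mounted with user xattr support; the helper names `tag_file` and `read_tag` are invented for this example, and both simply report failure where xattrs aren't supported.

```python
import os

def tag_file(path, key, value):
    """Store value under the xattr user.<key>; return True if it was stored."""
    if not hasattr(os, "setxattr"):
        return False  # platform without the Linux xattr calls
    try:
        os.setxattr(path, f"user.{key}", value.encode("utf-8"))
    except OSError:
        return False  # file system without user xattr support
    return True

def read_tag(path, key):
    """Return the value of user.<key>, or None if it is missing or unsupported."""
    if not hasattr(os, "getxattr"):
        return None
    try:
        return os.getxattr(path, f"user.{key}").decode("utf-8")
    except OSError:
        return None
```

On the command line, `setfattr` and `getfattr` from the attr package do the same job.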
Slashdot - News for Herds. Stuff that Splatters.
Why do we even think in terms of files or more particularly "file operations"?
Why should I have to "save file" in an editing application? That's a holdover from the days of slow mass storage, where you don't want to take up time in the middle of your other work.
It took me three tries to get any meaning out of that 'quote from the first paper' mentioned above. Seems too much verbiage is spent on trying to prepare my brain to agree with the ideas before it actually tells me what the idea is.
As for the paper itself, I am nonplussed. A "file" is a sequence of bytes, with a defined start location and length, recorded on a storage device to be retrieved from that storage device at a later date. What the paper describes doesn't change this idea; it just insists that every "file" should have a wrapper around it and that users should not be able to access the "file" without the wrapper.
.txt, .doc?
There _has_ to be!
Otherwise the Mac OS X engineers will look like idiots for dismantling the Mac system of file data types in favor of using file suffixes for file content identification.
"Copy," "delete," and "ownership" being three points they're trying to address? Why does this sound like a submarine attempt to embed some sort of IP protection in the lowest levels and very concepts of files on computers, framing it all as merely a technical re-engineering of the "file" concept?
Liberty in your lifetime
A file is simply a linear series of data. Period. End of story.
I don't care where you store the ownership rights, the metadata, or what new fancy things you want to be able to do with files; that is not a groundbreaking new concept.
Troll is not a replacement for I disagree.
Personally, I've found that the biggest issue with all the "metadata" systems that try to improve on the basic file/folder system is that they don't transfer anywhere. Send the file once through Samba, NFS, email, FTP, rsync or whatever and the metadata is lost. The only systems that actually get used are those that are embedded in the file, like EXIF for JPG, ID3 for MP3 and so on.
The stupid thing is that we didn't make that a generic part of all file formats; a simple key-value list appended to the file would do. But today that'd break almost everything, plus most things working on the file system would have to know that each file has a data and metadata part. Maybe use a compatibility layer for metadata-unaware applications, where they only see the data part?
That way we really could have a standard form of metadata. It might not cover every use but it'd sure cover a lot. Copy the file, copy the metadata (if you want, of course). Of course most of these researchers seem to want to get rid of the file altogether and replace it with some sort of cloud service, but I'd rather not. I'd rather know where I have my stuff and be able to put it where I want.
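To make the appended key-value idea concrete, here is a minimal Python sketch. The trailer layout (JSON payload, then a 4-byte length, then an 8-byte marker) and the `KVMETA01` marker are pure assumptions invented for this example, not an existing standard. `data_only` plays the role of the compatibility layer for metadata-unaware applications.

```python
import json
import struct

MAGIC = b"KVMETA01"             # hypothetical marker, not an existing standard
FOOTER = struct.Struct(">I8s")  # payload length + marker, stored at the very end

def attach(data: bytes, metadata: dict) -> bytes:
    """Return data with a JSON key-value trailer appended."""
    payload = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return data + payload + FOOTER.pack(len(payload), MAGIC)

def split(blob: bytes):
    """Return (data, metadata); blobs without a valid trailer come back unchanged."""
    if len(blob) < FOOTER.size:
        return blob, {}
    length, marker = FOOTER.unpack(blob[-FOOTER.size:])
    end = len(blob) - FOOTER.size
    if marker != MAGIC or length > end:
        return blob, {}
    payload = blob[end - length:end]
    return blob[:end - length], json.loads(payload.decode("utf-8"))

def data_only(blob: bytes) -> bytes:
    """What a metadata-unaware application would see."""
    return split(blob)[0]
```

Because the trailer travels inside the byte stream, it survives FTP, email and rsync the same way EXIF and ID3 do. The catch is that every tool that appends to or truncates the file has to know about it.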
Live today, because you never know what tomorrow brings
away from my data!
Yes I would. If I deliberately transmit a message to someone else, then I have no expectation of being able to 'untransmit' that message. The logic error here is thinking that files are like objects. They are not (only), they are also like messages. Big business wants files to be like objects so they can own them. Everyone knows they can't do it, and this effort will fail like all others, due to the nature of reality. Files are not objects.
Korma: Good
and the standard open, close, read, write, seek have been replaced by something else, there is no need to update the concept of files.
The filesystem is an important backend component of almost everything which is done in operating systems. Therefore, like TCP/IP, it should be layered, relatively stupid and stateless. This way, the capacity of it is easy to extend, reliability is easy to achieve, and the backing technologies will enjoy economies of scale.
History has borne this out. Don't fuck with this.
As far as "types," the problem is that files can claim to be one type, but really be another type. Trusting files to represent what they say they represent is a security vulnerability. Since any decent program should be verifying data coming from an untrusted source such as a random file from a random location, you might as well let the program determine the type by looking at magic numbers (or gasp, supporting a standard such as XML) anyway.
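Checking magic numbers takes only a few lines. A minimal Python sketch follows; the signature table covers just a handful of well-known formats, and the function names and the extension alias table are assumptions made for this example (the real `file` command knows vastly more).

```python
# Leading bytes of a few well-known formats.
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),  # also the container used by .docx and .xlsx
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]

def sniff(header: bytes) -> str:
    """Guess the type from the first bytes of the content, ignoring the file name."""
    for prefix, kind in MAGIC_NUMBERS:
        if header.startswith(prefix):
            return kind
    return "unknown"

def claimed_type_matches(filename: str, header: bytes) -> bool:
    """True if the extension agrees with what the content actually looks like."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    aliases = {"jpg": "jpeg", "docx": "zip", "xlsx": "zip"}
    return aliases.get(ext, ext) == sniff(header)
```

A mismatch (say, `invoice.pdf` whose first bytes are `PK\x03\x04`) is exactly the case where trusting the claimed type would be the vulnerability.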
So now on top of wondering if my backup DVDs will still be readable in 20 years, and if I'll have the right program to interpret the file, I have to wonder if the very concept of a "file" will remain stable over that time?
Ownership is stupid when it comes to files. Or to many other things. If a developer has written 90% of a config file on my system, is it his or mine? But that is not the question. The question is: Who *lovemaking* cares?
Nae king! Nae laird! Nae yurrupiean pressedent! We willna be fooled again!
"Quoting the first paper: 'For over 40 years the notion of the file, as devised by pioneers in the field of computing, has proved robust and has remained unchallenged. Yet this concept is not a given, but serves as a boundary object between users and engineers. In the current landscape, this boundary is showing signs of slippage, and we propose the boundary object be reconstituted. New abstractions of file are needed, which reflect what users seek to do with their digital data, and which allow engineers to solve the networking, storage and data management problems that ensue when files move from the PC on to the networked world of today.'"
They pretty much peppered the report with bullshit and buzz words to make "meta data" and "internet based storage" sound all new and shiny for the brain dead market droids and managers.
This reminds me of that MIT operating system hoax that was going to take current file system ideas and throw them out the window. Face it, how else do you organize bits of information? The concept of a file is simple: an organized arrangement of bits that contains data which can be moved, re-sized or deleted. How do you change that? The only thing that can change is the method in which they are stored on physical media (file system) or cataloged and indexed.
I just want one thing: a file system that is part database for fast file searches. I don't want to manually build indexes or any other bullshit; just look at the file table and give me my fucking file. Even if you had 100,000 files with file names of 256 characters, it's only about 25 MB (100,000 × 256 bytes); how long does that take to parse? Maybe I don't understand file systems, but even a 25 MB file table should only take a few seconds to scan. When I do a search of a directory or entire disk with tens of thousands of files it sometimes takes a minute or two. The disk is thrashing away as if the program is looking all over for the file names. Shouldn't they all be in one place pointing to where they are on disk? Maybe I don't understand file systems in general; does someone care to explain?
And one thing that just popped into my mind is a better method to tag and store files. When I download a file or save a document/image/whatever, I shouldn't have to dig through a huge directory hierarchy. I should be able to type the name of a directory, and something along the lines of Google's autocomplete or IntelliSense will begin to complete my search, regardless of what volume it's stored on. As I type "vacation", it should list all directories beginning with that string or tag. Maybe I am ignorant of similar functionality for Windows and Linux. The tags and file/directory names should be system-wide and accessible to all programs and commands that interact with files, not just a built-in shell.
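A rough idea of the "names all in one place" lookup, sketched in Python. This assumes the index is built once by walking the tree and then kept in memory; `build_index` and `complete` are hypothetical names, and a real implementation would also need to watch for changes.

```python
import bisect
import os

def build_index(root):
    """Walk root once; return a sorted list of (lowercased name, full path)."""
    entries = []
    for directory, subdirs, files in os.walk(root):
        for name in subdirs + files:
            entries.append((name.lower(), os.path.join(directory, name)))
    entries.sort()
    return entries

def complete(index, prefix):
    """Return every path whose file or directory name starts with prefix."""
    prefix = prefix.lower()
    # (prefix,) sorts before any (prefix..., path) entry, so this finds the first candidate.
    start = bisect.bisect_left(index, (prefix,))
    matches = []
    for name, path in index[start:]:
        if not name.startswith(prefix):
            break
        matches.append(path)
    return matches
```

Once the index exists, each keystroke costs a binary search plus the matches themselves. The slow part is the initial walk, which is the disk thrashing described above.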
A file is an outdated concept. We should have objects, with attributes and relations (pointers) to other objects.
We should have an object-oriented database system, not files.
...through your fingers... Whenever M$ speaks of "ownership", I see M$ as thinking of it as theirs and you are renting it, even if you wrote the document from the ground up. I'll just pass on their Kool-aid....
Yes, that's how it's done. The name is also metadata that isn't even in the file, available for any file. And the directory it's in is metadata! And with symbolic or hard links, you can have the same file name metadata and content in different "directory" contexts!
Oh, PS, when you see a creation date later than the modification date, it's been copied and if you want to know which one is the most recent edit, you check the modification date: the copy won't supersede an earlier edit.
But I guess you don't like anything.
One argument the paper makes is the ability to export Facebook photos off Facebook, presumably onto your own personal hard drive. It would be a file with metadata that includes friend tags, comments, etc. Doesn't this miss the entire point they try to make in rethinking files for the Cloud? With everything available, wouldn't we lose the "export" grammar in the first place, not to mention personal hard drives?
I like the Unix approach most. I have loved it for 30 years already; it's beautiful and simple.
Looks to me like they are wanting to model "files" after "things", essentially abstracting away basic computing.
It will make IP much more of a reality than it is today.
Lessig was right on with "Code".
Blogging because I can...
I can see this as being valuable for corporate and academic use, as often having direct metadata can really help, especially if small changes to a file can wipe the metadata completely.
For that matter, if you take a look at certain types of files, such as ESRI shapefiles, picking out and parsing metadata is a chore, since it's all in XML and the schema is hardly ever 100% consistent. Having some form of implementation for metadata to be tagged into the files would be useful for this sort of thing. It would make projects such as GIS in the cloud a much more feasible system to create, support and scale. I used to work on a project where we needed to upload a lot of Geographical/geospatial data to a server, and the hardest part was always collecting and re-working the metadata. I think a structure like this could work, provided the service itself doesn't imply that users should own what they are using.
While I agree that cloud-style data shouldn't be implemented for regular users (not counting services like Dropbox, where it's a minor component), I do think that this sort of file restructuring can be beneficial for businesses and academics, and likely save a lot of money.
My first thoughts when reading this (which is not to discount the fact that I've thought about the subject many times before, including concepts like the resource fork in older Mac systems, etc.) are:
Why have files at all? Files are only there as abstractions because we are familiar with the physical concept of files and documents.
I'm sure I'm not the only one to think down these lines. Are there "files" in the volatile memory of your computer? Generally, no. There are blocks of memory, with address pointers and such to chain them together in a way they can be found. First off, instead of translating back and forth between memory and disk files, why not just have one huge addressing space and store everything in that, and let the system decide what to move to persistent storage? The hard drive, or whatever, is just a big cache/persistent store that you rarely think about (of course you'd want to be able to 'hint' the system about what should be persisted right now). Once you've done this, there are no files anymore.
Of course, you need to be able to locate things. And we have tools for this. They're called databases. Whether it's a relational SQL database or a key-value store NoSQL type or even something else, they don't use the file analogy. The actual persisted storage unit would probably just be a database directly stored to the media using whatever formatting was optimal (rather than stuffing a database into a file).
Instead of opening a file, you pose a query and the database finds the data for you. Really, not much different than now. open('filename','r') is really just a query too and could still work the same way in such a system for compatibility.
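As a toy illustration of "open is just a query", here is a Python sketch using the standard `sqlite3` module. The `ObjectStore` class, its table layout and its method names are all assumptions made for this example. Note that SQLite itself normally lives in a file, so this only models the interface, not the on-media layout.

```python
import io
import sqlite3

class ObjectStore:
    """Named blobs in a database, with an open() shim for file-style access."""

    def __init__(self, database=":memory:"):
        self._db = sqlite3.connect(database)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS objects "
            "(name TEXT PRIMARY KEY, kind TEXT, body BLOB)"
        )

    def put(self, name, kind, body: bytes):
        """Insert or overwrite an object inside one transaction."""
        with self._db:
            self._db.execute(
                "INSERT OR REPLACE INTO objects (name, kind, body) VALUES (?, ?, ?)",
                (name, kind, body),
            )

    def query(self, kind):
        """Ask a question about the data: which objects are of this kind?"""
        rows = self._db.execute(
            "SELECT name FROM objects WHERE kind = ? ORDER BY name", (kind,)
        )
        return [name for (name,) in rows]

    def open(self, name):
        """Compatibility shim: a lookup by name that returns a readable file-like object."""
        row = self._db.execute(
            "SELECT body FROM objects WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise FileNotFoundError(name)
        return io.BytesIO(row[0])
```

Old code keeps calling `store.open("notes.txt").read()`, while newer code can ask `store.query("image")` without knowing any names or directories.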
There's a lot of work to get to where this would be efficient I think, and requires changing how we think about some things. Letting go of preconceived notions of physical files etc. But that's happening more and more. Things are going all digital. It may be 10 years or more but I think it will happen...
At some point there will be no more "files" or "directories", there will just be information, and questions about the information.
Just wait until Gnome 3 and Unity start doing away with files! They've ruined the desktop, so they need somewhere to go next. Why not design a new user interface that doesn't use files? They'll make computers unusable, or die trying! They're on a mission.
Microsoft won't be adding this to Win8, by the way, because they've been trying to use a relational database for a decade or more as the Windows file system, and it's never gotten off the ground. That's probably music to the ears of Unity and Gnome 3, though, since they can make a huge mess disaster catastrophe out of the idea!
Ownership is just Microsoft-speak for access control; it's a security feature. And the idea of metadata and access control in a filesystem is not really new.
While they might be new for MS these are certainly not new ideas. But at least a move in the right direction.
PhD's of course. Just because they have done all this study, doesn't mean the idea is necessarily bad.
I agree, one infinitely long tape ought to suffice for everybody.
the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff
http://en.wikipedia.org/wiki/Plan_9_from_Bell_Labs
Geeks like to think that they can ignore politics, you can leave politics alone, but politics won't leave you alone.-rms
So if someone sends me a .txt file from a Windows machine, I need to find a Windows machine to open it; I can't open the damn thing on Linux with vim anymore.
I'd like to take this opportunity to point out the brilliance of the "file" command (in *nix). All its smarts, plus all the details mentioned in its manpage, are all I ever needed to know about any file's technical details. This BS from Microsoft is re-inventing the wheel, badly and foolishly, with suspiciously strange priorities. No surprise there.
The "file(1)" manpage is a great read, including potshots at SysV, BSD, and mention that it (or at least Debian's version) was written by a fellow Canuck (Ian F. Darwin).
FYI, a point & click interface to manpages:
xman -notopbox -bothshown &
Enjoy the odd behaviour of the Athena Widget Set's scrollbars. :-)
"Tongue tied and twisted, just an Earth bound misfit
Microsoft plans to reinvent the file system......again
Keep It Simple Stupid. A file is a container for digital data. Add a unique identifier (file name) to locate it. Add external meta data to describe it if you wish. But why does it have to be more complicated when it does nothing more than hold unspecified unstructured digital data? All that these more complicated proposed systems have accomplished is to spend a lot of money with nothing yet to show for it, and delay new operating systems for years.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
Is this just an attempt to rescue Bill Gates' besmirched reputation as a technical visionary?
Have you got your LWN subscription yet?
If only we had Steve Jobs to solve this problem for us. :-(
I looked at this problem once, in the context of distributed file systems. I'd divide files into the following categories, with slightly different semantics for each. This can be fitted into the standard UNIX/DOS/Windows model, but it resolves some issues that result in programs doing elaborate workarounds to get correct file semantics.
The file system could unduplicate unit files based on their content. One approach would be that the real name of a unit file is its cryptographic hash; any other name it has is an alias. Backup programs can usefully use such information.
The ability to close and commit a group of unit files as an atomic operation would be useful. This should be the last step of an "install", so that if anything goes wrong, the install is automatically backed out, like a database rollback.
This is the default form of file.
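The "real name is the cryptographic hash" idea for unit files can be sketched in a few lines of Python. This is an in-memory toy, assuming SHA-256 as the hash; the `ContentStore` class and its methods are invented for illustration.

```python
import hashlib

class ContentStore:
    """Unit files stored under their SHA-256 digest; human names are aliases."""

    def __init__(self):
        self._blobs = {}    # hex digest -> content
        self._aliases = {}  # human-readable name -> hex digest

    def write(self, alias, data: bytes) -> str:
        """Store data once per distinct content and point alias at it."""
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)  # identical content is not stored twice
        self._aliases[alias] = digest
        return digest

    def read(self, alias) -> bytes:
        return self._blobs[self._aliases[alias]]

    def stored_copies(self) -> int:
        """Number of distinct contents actually held."""
        return len(self._blobs)
```

A backup program working on such a store only needs to compare digests to know which content it already has.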
This is how UNIX/Linux files ought to work in "append" mode.
Managed files have the most complex semantics, but not many programs use them. The ones that do typically go to a lot of trouble to get the file semantics right, to maintain database integrity. Read what SQLite needs from a file system to get a sense of how managed files need to work.
Scratch files do not outlive the process group that created them, and are invisible outside that process group. They can be read and written freely, but do not have permanent existence in the file system. The file system should guarantee that when the process group goes away, so do its scratch files, so as not to clutter up the file system. This would stop the accumulation of abandoned temporary files.
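The closest thing most programs have today is an anonymous temporary file. Here is a minimal Python sketch using the standard `tempfile.TemporaryFile`; it is tied to one open handle in one process rather than to a process group, so it only approximates the semantics described above. The `sorted_lines` helper is just an example workload.

```python
import tempfile

def sorted_lines(lines):
    """Spool lines through a scratch file, then return them sorted."""
    # The file is removed automatically when the with-block closes it.
    with tempfile.TemporaryFile(mode="w+", encoding="utf-8") as scratch:
        for line in lines:
            scratch.write(line + "\n")
        scratch.seek(0)
        return sorted(line.rstrip("\n") for line in scratch)
```

Nothing is left behind for a cleanup job to find, which is the property the scratch-file category is after.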
This is how UNIX and Linux should have worked. Today, programs struggle to get those semantics across platforms, but don't always succeed, leaving behind truncated files, partial failed installations, junk files, and database disasters where two programs accessed the same managed file.
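The all-or-nothing "install" can be approximated today by staging everything next to the destination and committing with a single rename. Below is a minimal Python sketch, assuming the target directory does not exist yet and that staging and target are on the same file system; `install_atomically` is a made-up helper, not an existing API.

```python
import os
import shutil
import tempfile

def install_atomically(target_dir, files):
    """Write files (relative name -> bytes) so that all appear under target_dir, or none do."""
    parent = os.path.dirname(os.path.abspath(target_dir))
    staging = tempfile.mkdtemp(prefix=".staging-", dir=parent)
    try:
        for name, data in files.items():
            with open(os.path.join(staging, name), "wb") as handle:
                handle.write(data)
                handle.flush()
                os.fsync(handle.fileno())  # push the bytes to disk before committing
        os.rename(staging, target_dir)     # the single commit step
    except BaseException:
        shutil.rmtree(staging, ignore_errors=True)  # roll back the partial install
        raise
```

If anything fails before the rename, the staging directory is thrown away and the target never appears, which is roughly the database-style rollback asked for above.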
As for metadata, the original MacOS "resource fork" concept was a good one. But the original implementation was botched. The resource fork was a badly implemented tree-type database store, one that was corrupted if a program failed to close the file properly. If the resource fork had been implemented so that at the end of each write, the resource fork was guaranteed to be in a usable state, the whole concept might have been more successful. It took a long time for Apple to fix this; the phrase "damaged resource fork" appears tens of thousands of times in Google until 2006.
New abstractions of file are needed, which reflect what users seek to do with their digital data, and which allow engineers to solve the networking, storage and data management problems that ensue when files move from the PC on to the networked world of today, and provide us with another trivial little idea that we can claim as our imaginary property, patent, lock down, control, and squeeze licensing fees out of in anything that implements what we are going to try to spew out.
This space unintentionally left blank.
Would you be opposed to a DRM scheme that would allow you to totally and irrevocably delete a picture you posted to Facebook because it allows you to retain total ownership?
Yes, because it's a fairy tale. Anybody can take a picture of a screen using a camera.
Is anyone outside m$ really considering letting them define what a freaking file is from now on after this? But let's not get ahead of ourselves here. Surely once Microsoft has bullied their shit into everything once again, we can all trust them and no one will end up having to pay any kind of extortion racket like this, and this, this, this and this.
I haven't noticed anyone bring this up yet, but I thought this was the main goal of the Nepomuk project? (in KDE)
Well, maybe not do away with files themselves but rather how we store and access them. Rather than digging around in folders, we tag new files when we save them and search based on tags, i.e., what the file is used for or what it contains. I think that's a fantastic idea that needs more time to grow. I hate dealing with folder hierarchies, especially because I often run into the situation where a certain document can properly belong in one of several folders, and then I never know where to keep it so I don't lose it. The ability to throw all my documents in one folder and tag them with as many tags as necessary and then search for what I need (or rather, create "virtual folders" that sort things based on tags no matter their location) is a really great step out of that annoyance.
We’ll end up with 10 different standards, and no one will bother keeping metadata accurate on all their files. At best metadata is useful for a single person on a small subset of files where they find it useful. Everything else, the only metadata anyone is going to care about (and be bothered to enter) is title, which is served fairly effectively by the file name.
Metadata becomes a lot more useful as your collection of files grows, and as the UI develops to better take advantage of the information. To a certain degree this has already happened (and it kind of makes me wonder where they've been the last several years) - though of course you don't see it in every program yet.
Often, yes, as you say, title is all you're likely to need - or else the few other pieces of information you need (album name, track or episode number) are easy to incorporate into the file name or directory structure. But there may be cases where a piece of media doesn't have a title. This is often the case with large numbers of photos: there may not be a basis for providing a unique title to each image. So photo management software deals with this by encouraging organization and search via metadata (basic stuff like the date, but also tagged events, locations, and individuals). These files probably have filenames, but they're probably meaningless stuff like IMG_1234.JPG - just a sequence number provided by the camera. If you think about how you'd want file operations to relate to file name in that case, the filename actually doesn't come into consideration.
Consider, for instance, if you're moving one collection of files into another directory. And in that target directory there happens to be another IMG_1234.JPG. Do you overwrite that other file? The traditional answer would be "yes", since files are uniquely identified by their filenames. But the filenames have no particular value in this case: they are purely artificial. In this particular case that's probably not what the user wants. If anything they'd want the files to overwrite only if they're the same file.
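A rough sketch of that "overwrite only if they're the same file" rule, done at the tool level rather than in the filesystem. The helper names `file_digest` and `move_without_clobbering` and the " (1)" renaming convention are inventions for this example.

```python
import hashlib
import os
import shutil
import tempfile


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def move_without_clobbering(src, dest_dir):
    """Move src into dest_dir; replace a same-named file only if identical."""
    name = os.path.basename(src)
    target = os.path.join(dest_dir, name)
    if not os.path.exists(target):
        shutil.move(src, target)
        return target
    if file_digest(src) == file_digest(target):
        os.remove(src)  # same content is already there; nothing is lost
        return target
    # Same name, different content: keep both under distinct names.
    stem, ext = os.path.splitext(name)
    n = 1
    while True:
        candidate = os.path.join(dest_dir, f"{stem} ({n}){ext}")
        if not os.path.exists(candidate):
            shutil.move(src, candidate)
            return candidate
        n += 1


root = tempfile.mkdtemp()
dir_a = os.path.join(root, "a")
dir_b = os.path.join(root, "b")
os.makedirs(dir_a)
os.makedirs(dir_b)
with open(os.path.join(dir_a, "IMG_1234.JPG"), "wb") as f:
    f.write(b"beach")
with open(os.path.join(dir_b, "IMG_1234.JPG"), "wb") as f:
    f.write(b"mountain")

result = move_without_clobbering(os.path.join(dir_a, "IMG_1234.JPG"), dir_b)
```

Here the two cameras' IMG_1234.JPG files differ, so both survive in the target directory.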
There are other cases where there may be a sensible choice for the filename of each file, but it doesn't necessarily make a lot of sense to have it as a unique identifier. For instance, suppose you have a directory full of videos, and each has its title as its filename. Now suppose there's different variations of a few of the files: maybe two different versions of the same movie (a fansub and an official release) - maybe two different encodes (one for best quality, others to play on specific devices - and remember that different encodes could use the same container format, so the difference isn't necessarily ".AVI" vs. ".MKV" or whatever)... You could incorporate that information into the filename or directory structure - but it makes the filenames increasingly artificial, and the directory increasingly burdened with additional directories - different files treated as different items in the directory when in fact there is good reason to treat them as different versions of the same thing.
And then there's other cases where the traditional system of filenames works just fine and change will probably only foul things up - source code directories and so on in which it's very convenient to have a simple way to uniquely identify a particular piece of data, without having to address complicated questions like "but which one?" At most, you'd want something analogous to (or integrated with) a versioning system - you could specify "which one" if you wanted to but most of the time that question would already be answered.
So I think there are problems with the current system of files. There are cases where there is no useful information stored in the filename, and changing the filename to be both unique and informative is a much more cumbersome process than populating the file's metadata with relevant tags. There are cases where multiple "files" may just be different renderings of the same thing, in which case it may not be useful to treat them as separate entities at all, but rather as versions of the same thing. So I think it is worth thinking critically about how we approach the issue in the future.
Bow-ties are cool.
At first glance, I thought they were talking about implementing something like the HURD, where every "file" could potentially be a service behind the scenes. Reading a little further, I thought they were talking about implementing something like the old Macintosh HFS resource forks.
But then I kept reading and realized they were just making some noise about DRM. Nice try, Microsoft; you almost had me going, there.
Secession is the right of all sentient beings.
The problem is that a filesystem has its own way of dealing with types of files.
I say keep everything as binary, and allow the tool opening that file (binary) to have a centralized view of dealing with that data.
I have a binary file which could be Word, or could be HTML, or could be PDF, but the file itself knows what it is... this would just require that ALL file systems keep files unfragmented, which is not possible at this point unless using Linux distros... FAT and NTFS, which are 80% or more of the market share, fragment the files so that the OS knows the file info and where it resides.
Why break a file at all? If you do, you have no way of knowing what is what... somewhere along the way the file could have recognizable pointer markers to let you know the start and end of a file, within the file, so as to allow for quick data loss recovery as well... I think ext3 does this, if I remember correctly, but it has been so long since I touched Linux, I feel almost like a virgin... fragile and vulnerable... jk.
A file is a file; keep it together, even when you work with it. Then rebuilding file structure is easier, and keeping info inside the file itself becomes doable as well.
The filesystem is a database with a schema that's baked into the operating system or (in the case of savvy operating systems) bolted onto it.
Oracle has made a mint by doing away with files and using bare disks to hold data in schemas their users develop (or buy), but all they're doing is generalizing disk access. They do this because dealing with both the filesystem schema and the user schema is redundant and a waste of time, and time is money to users with megajumbo databases. They also do it because it involves a lot of proprietary middleware they can overcharge for, even though it's pretty simple and even Larry Ellison could implement it.
If Microsoft wants to beef up the filesystem trope beyond directories and inodes and open/close/read/write, then let them. They've cut several versions of the Windows filesystem over the years, there's no reason they can't roll out a new one and let the market vote with its feet.
A file is simply a linear series of data. Period. End of story.
Engineers and interface designers have the ability to determine how these abstractions are implemented at the low level, and how they are presented to the user at the conceptual level. While the abstractions of the past and present are very useful, it is not sensible to assume they will continue to be the best course in the future.
Let's look at an example in kind of the middle ground, between implementation and conceptualization. Suppose you have a file containing plots of some value with respect to time.
Now, each plot, by itself, is very linear by nature. However, the plots together are not. They run in parallel. You could simply concatenate them or interleave them to turn them into a serial file, but the data is not linear by nature. As a result you pay a certain price, when you edit that data, maintaining that serial format. Suppose you want to add data to the end of the time sequence for each plot? If the plots are simply concatenated, you have to shift the contents of the file around each time. If the plots are interleaved, then you introduce a bunch of file seeks when you read the data back out. If you decide each plot should be a separate file, then each is no longer tightly coupled to the series of time values, unless that time sequence is duplicated in each file... And the collection is no longer a "unit" on the filesystem. The data is separated into multiple units, in that case, for reasons that have no relationship with the nature of the data itself or the ways it is intended to be used.
Therefore, I think there is a certain merit to the idea of separating the concept of creating "contiguous, linear" allocations of disk storage from the concept of creating a "unit" in the directory tree. Forked files allow you to shift the problem of expanding or shrinking these allocations to the filesystem layer - arguably a more appropriate place for it than in the application itself.
Bow-ties are cool.
Wouldn't it be possible to make a "universal" file container, in that any other file type could be embedded with a text file that listed: what type of file it is, what program it is associated with, owner, creation/mod dates, and especially, tags and other types of metadata?
Do you know that they tried this already?
In 1985? (Well, I'm speaking specifically of IFF - but there were other efforts. Mac's file forks were kind of the same sort of thing, except that they maintained the abstraction all the way down to the filesystem layer.)
Now, just because they tried it already and more or less failed doesn't mean it couldn't work... But they were in a much better position in 1985 to make this work than they are now (we've gone too long and come too far without a "universal format", it'd be nearly impossible to get people to embrace that kind of change now...) so I think it's kind of a lost cause.
I found it absolutely fascinating, personally, when I read one of the original documents on IFF. The ambition, the hubris perhaps, with which they were trying to guide the future of personal computing. They weren't just seeking to create "a" format, they were aiming for it to be the format. And it would have been capable of just about everything you suggest - embed a FORM of whatever you like in a LIST, put in descriptive chunks, etc... I believe Amiga embraced the concept to a fairly high degree.
There are various historical and technical reasons why it didn't really pan out. I think one of the big ones is simply that IFF wasn't the right format for everything. Perhaps no one format can be. Among other things, IFF required four-byte payload sizes appear at the start of each chunk. That limits a chunk (and therefore a file) to 4GiB maximum (not such a big deal in 1985 or even 1995... But these days it'd be an unacceptable limitation) - but another problem is that sometimes you need to write out some data and you just don't know how big it's gonna be. Streaming audio and video are a pretty good example. You can discretize the stream, populate it with known-size chunks, but you don't know the size of the whole stream until it ends.
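For anyone who hasn't looked at the format, the chunk layout being described is easy to sketch: a four-byte ID, a four-byte big-endian size, the payload, and a pad byte when the payload length is odd. The helper names below are made up for the example, and the size field is treated as unsigned to match the 4GiB figure above, which is an assumption about how a given reader interprets it.

```python
import struct


def pack_chunk(chunk_id, payload):
    """Build one IFF-style chunk: 4-byte ID, 4-byte big-endian size, payload."""
    if len(chunk_id) != 4:
        raise ValueError("chunk ID must be exactly 4 bytes")
    if len(payload) > 0xFFFFFFFF:
        raise ValueError("payload does not fit in a 4-byte size field")
    header = struct.pack(">4sI", chunk_id, len(payload))
    pad = b"\x00" if len(payload) % 2 else b""  # chunks are padded to even length
    return header + payload + pad


def iter_chunks(buf):
    """Yield (chunk_id, payload) pairs from a buffer of back-to-back chunks."""
    offset = 0
    while offset + 8 <= len(buf):
        chunk_id, size = struct.unpack_from(">4sI", buf, offset)
        start = offset + 8
        yield chunk_id, buf[start:start + size]
        offset = start + size + (size % 2)  # skip the pad byte if present


data = pack_chunk(b"NAME", b"Doppelganger") + pack_chunk(b"ANNO", b"odd")
chunks = list(iter_chunks(data))
```

Note that `pack_chunk` needs the whole payload in hand before it can write the header, which is exactly the streaming problem: a writer that doesn't know the final size has to either buffer everything or seek back and patch the size afterwards.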
I think general-purpose data formats are a good thing - but I believe it's very important to consider that there may be cases where a particular format just isn't right for the problem. And that brings us back more or less to the current scenario, in which different applications tend to have totally distinct file formats, not even sharing an overall containment structure. From that perspective, it's wasteful to continue re-inventing metadata storage for each new file format that comes along, and wasteful to implement all these different methods of reading metadata out of different application-specific file formats. There's also the danger that we will want to change the format of the data in the metadata fields (just as we shifted from "whatever local variant of ASCII your region uses" to mostly using UTF-8 - which still isn't necessarily adequate for all regions, incidentally). Another all-new text encoding so soon after Unicode's introduction isn't too likely, but the OS, in defining how these metadata fields are defined and used, could change the requirements in ways that go beyond what the container format can provide (for instance, storing data that goes beyond the limit of a particular format's "metadata region" size limit, or storing something that's better encoded in some binary form other than text). Decoupling the encoding of metadata from the definition of file formats eliminates a bunch of redundant work and leaves us more room to change what metadata contains and how those contents are used, as we get a better idea of how, ultimately, it will be used as the dust settles around this whole issue.
Bow-ties are cool.
I just want one thing: a file system that is part database for fast file searches. I don't want to manually build indexes or any other bullshit just look at the file table and give me my fucking file. Even if you had 100,000 files with file names of 256 characters, it's only about 25 MB; how long does that take to parse? Maybe I don't understand file systems but even a 25 MB file table should only take a few seconds to scan. When I do a search of a directory or entire disk with tens of thousands of files it sometimes takes a minute or two. The disk is thrashing away as if the program is looking all over for the file names. Shouldn't they all be in one place pointing to where they are on disk?
We pretty much already have this. The way it works in practice is that there's some service on the machine that provides indexing, maintaining a central database of metadata. When a file changes, the metadata is re-scanned and the index updated. Then you can use the index to search for things.
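As a toy version of what such an indexing service does, here's a sketch that walks a tree once, stores names and basic stat info in an in-memory SQLite table, and then answers searches from the table instead of the disk. The function names and schema are invented for the example; real indexers also watch for changes and extract per-format metadata, which this doesn't attempt.

```python
import os
import sqlite3
import tempfile


def build_index(root):
    """Walk root once and record every file in an in-memory SQLite table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE files (path TEXT PRIMARY KEY, name TEXT, "
        "size INTEGER, mtime REAL)"
    )
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            db.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                (path, name, st.st_size, st.st_mtime),
            )
    db.commit()
    return db


def search(db, fragment):
    """Return paths whose file name contains fragment (ASCII case-insensitive).

    The fragment goes straight into a LIKE pattern, so % and _ inside it
    behave as wildcards.
    """
    rows = db.execute(
        "SELECT path FROM files WHERE name LIKE ? ORDER BY path",
        (f"%{fragment}%",),
    )
    return [row[0] for row in rows]


root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "photos"))
for rel in ("notes.txt",
            os.path.join("photos", "IMG_0286.JPEG"),
            os.path.join("photos", "IMG_0287.JPEG")):
    with open(os.path.join(root, rel), "w") as f:
        f.write("x")

db = build_index(root)
hits = search(db, "jpeg")
```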
I know this exists for Linux but I don't know to what extent it's actually supported by applications. (I never use the feature.) On Windows, these days, file manager windows show columns containing metadata fields (unless you turn that off - haven't had much luck so far, actually) and you can't swing a dead cat without hitting a search field.
It's not incorporated at the filesystem level, but for the most part it doesn't really need to be. As long as the feature is there, and you can rely upon it being there, and applications actually take advantage of it and work with it, it's just as good. Well, very nearly.
One thing I think could be improved is that sometimes file names just aren't meaningful. Filenames from a digital camera, for instance, tell me very little - and renaming the files, while keeping the names unique and making them meaningful is not always easy. I could have two different 003.jpeg's in two different directories with completely different contents. If I move the contents of one directory into the other, I don't really want one to overwrite the other, because that would be dumb. That filename is entirely meaningless, the only reason the file even has a name is because the filesystem requires files to have unique names. But that could be addressed at the UI level (and has been, on Windows anyway, which is how you wind up with things like "003 copy copy copy copy.jpeg") so it doesn't necessarily require a change to the underlying file paradigm.
Bow-ties are cool.
When I download something from the web, it goes into ~/Downloads. I don't have to waste time telling the system what it is, I don't have to figure out where the system put it and if I want to see all the files I downloaded I just 'ls ~/Downloads'.
But sooner or later you're probably moving them somewhere, right? The analogous procedure in a "database" filesystem would be to recategorize the item from "uncategorized recently downloaded item" to something that'll be more useful in the long term. One important thing to remember is that right now tagging and categorizing a file is a chore because that's not something a lot of UI is designed to do... and stashing a file in a directory hierarchy is relatively easier because it's something we're used to, and something UIs are currently geared toward. That's also the only reason sticking files in a database could reasonably be called "hiding" them. It's just a question of what the UI is geared toward. (In the case of folks like me, and you apparently, the "UI" in this case would include the command shell and various mechanisms used to address files in other programs as well...)
My personal feeling is that the usefulness of a "database filesystem" would be greatest in (and perhaps limited to) certain domains. Media files, hell yes. Source code... Well, apart from version control I really don't think so. :) But having the hierarchy indexed could certainly be useful ("find me the files that implement this method of this virtual base class - because some knob on the programming team doesn't believe in grouping class definitions in a sensible way"... of course some IDEs already do this independently of any system-wide indexing support)
The thing with media files is that you tend to wind up with a huge collection of 'em. They usually don't need to reference one another the way source files do, they're self-contained and there's not always a good way to name them. Video and music files can usually be named with their title, of course, and it's not hard to put them into a hierarchy, but the hierarchy isn't always useful. If what I want is to play "Doppelganger" - I'm not likely to have to disambiguate a request like that. I can specify the full path from the base directory where I keep all my music, but I could as easily skip that. (And it's not that hard in a shell that supports the ** notation for "find" like searches during globbing...) But if I take a bunch of photos, there may not be any value to giving them filenames at all. It's more useful to organize photos by things like date and tags. "IMG_0286.JPEG" isn't useful in any way, and coming up with unique titles for images that may not even merit a title would get a bit crazy. The filename becomes an afterthought in that case - a necessary "evil" if it's something you have to deal with manually. (If it's dealt with automatically, the filename could still be useful as a unique identifier.)
Bow-ties are cool.
People who say that hierarchical filesystems suck probably have a big mess on their table in real life.
I have never tried organizing my table hierarchically. Tell me, where do I put my cell phone in that case? Do I group it with my desk phone, because it's a phone, or with my computer, because my phone is also a little computer that's plugged into my big computer?
Bow-ties are cool.
If only we had Steve Jobs to solve this problem for us. :-(
Hey, he had his chance. How long was he with Apple after his return? Ten years or something? He ushered in OS X and the shift to Intel - both of them representing fairly extensive breaks with previous Apple products, either of which could have been a great time to tackle something like this...
Apple brought us Spotlight, I guess. It's representative* of the way things have been going: not redefining the filesystem, but building a database of file information and using it for searches.
(* I don't know who pioneered the concept or what implementations came first, so I say only that Spotlight is an example of filesystem indexing.)
Bow-ties are cool.
Well that's actually a derivative work, an unlicensed derivative work.
In some cases, the law permits an unlicensed derivative work. For example, as my country's copyright law puts it: "The fair use of a copyrighted work, for purposes such as criticism, [...] is not an infringement of copyright." Granted, your ad example probably isn't a fair use.
And all this shows why you don't need the filesystem to track metadata, all you have to do is embed it into the file.
Well, my post really wasn't addressing that question at all. My post was about whether using metadata, as opposed to directory structure and filename, to find data was a reasonable sort of UI, or if people's tendencies to be lazy about writing metadata would undermine that too much. My point was that if the system is well-designed around the use of metadata, users will tend to keep their metadata well-ordered, because in that case it's actually useful and easy to do so.
To address the point of whether metadata should be part of the file structure, or adjacent to it - I think there are advantages to each approach. Presently, there's a lot of infrastructure that's just not geared to dealing with metadata that's not stored as part of the file itself. But if you copy an MP3 file, the ID3 tag will be preserved, because it's there in the file structure. So at present that's a definite advantage, and not one to be underestimated.
There are disadvantages to bundling the metadata: for starters, if you have two files with identical data but different metadata, tools like "diff" or "md5" would reflect that difference. Or if you modify a bit of metadata, you're changing the file's modification time as well. That could be undesirable. Suppose you download a file via bittorrent and tag it according to your preferences - you won't be able to seed the torrent from that file, because its checksum will have changed. Or what if you want to tag an HTML file with metadata, or some other file type for which metadata either isn't supported, isn't adequate, or you just plain don't want it there in the file contents? The reason why it's called metadata is because it's data in reference to the primary data of the file... not part of the primary data of the file. I don't claim that this, by itself, is a conclusive argument in favor of filesystem-level metadata but I hope you take my point that there is a logical basis supporting its separation from the primary data stream.
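The checksum point is easy to demonstrate. In the sketch below, the "container" is a made-up format (a length-prefixed text tag in front of the payload), standing in for something like an ID3 tag, and the dict with separate "data" and "meta" entries is a stand-in for metadata the filesystem keeps outside the byte stream. Both are assumptions for illustration only.

```python
import hashlib

payload = b"\x00\x01audio frames\x02\x03"


def with_embedded_tag(data, tag):
    """Hypothetical container: 4-byte tag length, UTF-8 tag, then the payload."""
    encoded = tag.encode("utf-8")
    return len(encoded).to_bytes(4, "big") + encoded + data


def digest(data):
    return hashlib.sha256(data).hexdigest()


# Metadata inside the file: retagging changes the bytes that get hashed.
original = with_embedded_tag(payload, "artist=Unknown")
retagged = with_embedded_tag(payload, "artist=Someone")
embedded_changed = digest(original) != digest(retagged)

# Metadata beside the file: retagging leaves the hashed bytes alone.
file_before = {"data": payload, "meta": {"artist": "Unknown"}}
file_after = {"data": payload, "meta": {"artist": "Someone"}}
external_changed = digest(file_before["data"]) != digest(file_after["data"])
```

With the embedded tag the digest changes on retagging; with the separate metadata it does not, so a torrent client or "diff" would still see the same file.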
There's also the maintenance issues around supporting each new type of metadata for each new file format as it's introduced - and if, as part of the OS design, you make some decision about metadata or how it's used that doesn't fit well with how it's stored in a particular file format, then resolving that disparity could be a bit of a headache for the implementers as well as anyone who has to use that UI. If you provide metadata as part of the filesystem, its format can be changed to suit the way it's being used, and these changes can be transparent to applications and users.
But I think the bigger issue, and the main thrust of the articles and the main focus of current work in improving utilization of metadata in the UI, has more to do with how metadata is presented to the user, rather than whether it's stored in the file or adjacent to it. The indexing systems in present use can use both approaches: metadata within files for file formats the indexing system is designed to specifically support, and filesystem-level metadata for others. From a pragmatic standpoint that's probably the way to go: use file-level metadata where it's appropriate, use filesystem-level metadata where it's appropriate, and just do your best to resolve disparities as they crop up. It's hard to effect this kind of change, so there are always advantages to an approach that provides a smooth transition.
Bow-ties are cool.
While MS has its research department thinking up old thoughts everyone and his dog has had for the past 20 years, we already have metadata in half a dozen non-MS filesystems, and we have resource forks, extended attributes and user-presentation layers that will happily show the user a directory as the application contained inside because really that's what he cares about.
What we don't have is some of the other interesting ideas we had 40 years ago. Some of them went out rightfully, some of them we simply lost because they were good ideas that weren't ported to our modern operating systems.
So, you want to re-invent the file? How about you come up with one idea that's actually new? Because otherwise it's re-hashing, not re-inventing. :-)
Assorted stuff I do sometimes: Lemuria.org
But you could come up with a million different examples of data, and how they are handled has to be on an application level because only the application knows how to deal with the data.
And there is a reason the files are linear series of data, because that is what HDs are as well.
Not that a file cannot be broken apart into different sections to fit/optimize performance but at the application level they have to be considered linear series of data if only because every programming language on earth is set up to read files linearly.
Troll is not a replacement for I disagree.
It seems to me that by putting metadata into the filesystem, you're creating some big problems with compatibility: different filetypes need different metadata. For instance, a PDF file might have information on author, title, etc. A jpeg file might have EXIF camera settings. Having the filesystem deal with metadata seems like it's pushing this stuff down into the OS, where it really should be left up to apps. Also, what if people decide they want different metadata? Back when jpegs were first made, they didn't include EXIF data, but now they frequently do thanks to the proliferation of digital cameras. Presumably, the standard was modified to allow this. But changing the standard is easy with metadata encapsulated in the file itself; just have a version number in the file heading saying what standard the file conforms to, and apps will read this and interpret the data accordingly. Changing an application or two is a lot easier than patching the whole OS to deal with a change in metadata standards. Or, what about files made by specialized programs that not many people use? Why should OS makers have to deal with metadata standards for every filetype in existence? The way it is now, only people who write apps dealing with a particular filetype have to deal with metadata standards.
Taking the metadata out of the file creates a lot of complexity, without any significant gain that I can see. Your examples of bittorrent files and files with changed metadata not md5-matching others just don't seem to be enough of a problem to warrant all these changes. In fact, these problems can be easily fixed by fixing the tools that use these files; BitTorrent, for instance, could be modified so that certain popular filetypes (e.g. video files like avi and mkv) are recognized by the tools and the metadata ignored when computing the checksum. Modifying a tool that only some people use is a lot easier than modifying an entire OS.
MacOS tried to do this very thing long ago, and finally gave it up. There's probably a good reason for that: the benefits weren't worth the costs.
But you could come up with a million different examples of data, and how they are handled has to be on an application level because only the application knows how to deal with the data.
Well, I take your point, that it doesn't necessarily make sense for the OS to get too heavily involved in what would normally be application-level decisions. However, I think more flexibility in the structure of files could be useful. Give applications the tools and let them decide how to use them. Providing a feature like file forks is awkward because most software doesn't currently deal with it (as you point out) and there's a little bit of a technical challenge in implementing it and a logistical problem in getting it into all the various filesystems (the ones where it's possible to do so, anyway) - but it is not an insurmountable problem, nor is it a slippery slope into ever-increasing complexity, or a path leading to an unavoidable fate of excessive OS involvement in application file storage strategies. It is one useful organizational tool, allowing an application to have multiple "sequential byte range" abstractions within something that's treated as a single unit on the filesystem. The difference in implementation between file forks and directories would be very minor. The major difference would be in how UIs treat the "forked file".
And there is a reason the files are linear series of data, because that is what HDs are as well.
To the extent that this is true now, it is becoming less so over time. Hard disks aren't linear by nature - they have multiple platters, for starters, so they'd be more like multiple contiguous ranges. Then there's firmware in the drives that maps around bad sectors, quietly substituting other areas of the disk. One can certainly still treat the drive as a linear range of storage space, and it's a convenient way of dealing with the disk, but it's an abstraction that we're very quick to abandon... Filesystems, for starters. If you have a directory of files, you don't want to think about that in terms of sequential storage on disk. You want to be able to copy and move and erase and create them and never care about where on the disk they go. The filesystem layer hides the abstraction of the disk as a sequential thing, and then reintroduces it at the file level.
And there's no guarantee that the "sequential" file data even will be "sequential" on-disk. We try to minimize the fragmentation of files, but the "sequential" nature of the file is, again, really just an abstraction. We could exploit that - tell the filesystem to insert a block of disk space into the middle of a file, and the filesystem wouldn't really have to move any data around to perform an insertion - but as far as I know that's not a supported operation at the application level on any OS. So instead we read all the data out of later parts of the file into RAM, then write it back to disk somewhere else - all for an insertion operation that could be handled much better by the filesystem.
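This is roughly what an application ends up doing today to insert into the middle of a file: pull the tail into memory, write the new bytes, then write the tail back further along. The helper name `insert_bytes` is made up for the sketch, and it reads the whole tail at once, which only works for files that fit in RAM.

```python
import os
import tempfile


def insert_bytes(path, offset, new_data):
    """Insert new_data at offset by rewriting everything after it.

    Returns the number of existing bytes that had to be moved.
    """
    with open(path, "r+b") as f:
        f.seek(offset)
        tail = f.read()      # everything after the insertion point goes through RAM
        f.seek(offset)
        f.write(new_data)
        f.write(tail)        # and is written back further along
    return len(tail)


fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"HEADER....TRAILER")

moved = insert_bytes(path, 6, b"[new]")
with open(path, "rb") as f:
    result = f.read()
os.remove(path)
```

Five bytes were inserted, but eleven existing bytes had to be read and rewritten; for a large file with an insertion near the front, that ratio gets much worse.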
Not that a file cannot be broken apart into different sections to fit/optimize performance, but at the application level they have to be considered linear series of data if only because every programming language on earth is set up to read files linearly.
Things change.
I mean, I get your point here, too. I wasn't a Mac user back in the day, but from what I've seen interoperability was a bitch because of file forking. And that would still be true today.
But I can't accept "because that's the way it's been for 30 years" as an argument for why a design choice is good. Things will change in the future. I don't know how, or when, but it's bound to happen. Being a part of that change, rather than being left behind by it, requires openness to new ideas. Even the most fundamental concepts of computing, sooner or later, will be subject to revision.
Bow-ties are cool.
It seems to me that by putting metadata into the filesystem, you're creating some big problems with compatibility: different filetypes need different metadata. For instance, a PDF file might have information on author, title, etc. A jpeg file might have EXIF camera settings. Having the filesystem deal with metadata seems like it's pushing this stuff down into the OS, where it really should be left up to apps.
But what we're dealing with now is metadata very much as an OS-level concept. I think right now the implementations have a bit of a "strapped-on" feel, but it's going to become more and more central to how OS UI works.
Also, what if people decide they want different metadata? Back when jpegs were first made, they didn't include EXIF data, but now they frequently do thanks to the proliferation of digital cameras. Presumably, the standard was modified to allow this. But changing the standard is easy with metadata encapsulated in the file itself; just have a version number in the file heading saying what standard the file conforms to, and apps will read this and interpret the data accordingly. Changing an application or two is a lot easier than patching the whole OS to deal with a change in metadata standards.
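A rough sketch of the "version number in the file heading" idea, assuming an invented container format: the DEMO signature, the header layout and both metadata versions are hypothetical, not any real standard.

```python
import struct

MAGIC = b"DEMO"  # hypothetical 4-byte signature for an invented format

def pack_file(version, metadata, body):
    """Header layout: magic, 2-byte version, 4-byte metadata length."""
    meta = metadata.encode("utf-8")
    return MAGIC + struct.pack("<HI", version, len(meta)) + meta + body

def read_metadata(blob):
    """Interpret the metadata block according to the declared version."""
    if blob[:4] != MAGIC:
        raise ValueError("not a DEMO file")
    version, meta_len = struct.unpack_from("<HI", blob, 4)
    raw = blob[10:10 + meta_len].decode("utf-8")
    if version == 1:
        # Version 1 (hypothetical): the block is just a title.
        return {"title": raw}
    if version == 2:
        # Version 2 (hypothetical): key=value lines, so new fields can appear.
        return dict(line.split("=", 1) for line in raw.splitlines() if "=" in line)
    raise ValueError("unsupported version %d" % version)

old_file = pack_file(1, "Holiday", b"...pixels...")
new_file = pack_file(2, "title=Holiday\ncamera=X100", b"...pixels...")
old_meta = read_metadata(old_file)
new_meta = read_metadata(new_file)
```

Every reader of the format needs its own copy of that version switch, which is the cost the argument turns on.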
It really isn't. At least if you're patching the OS, you can just make the change once, instead of again and again for every application that uses the file type. How many applications work with video files, or images?
OS-level metadata also tends to be very flexible. xattr support on Linux, for instance, lets you store name/value pairs with whatever name and value you want, so pretty much, if you want to add a new field, you just add a new field. There are limitations (I think a maximum value size measured in kilobytes, at least on some filesystems, so it's not like a full "file fork" implementation at this point). Most forms of metadata stored within file contents are just as flexible, at least the modern ones. I think if a metadata system doesn't have the approximate flexibility of XML then it's pretty much rejected. :)
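For the curious, a small sketch of what "just add a new field" looks like with Linux extended attributes from Python. os.setxattr and os.getxattr are Linux-only, and the filesystem has to allow user attributes, so the hypothetical helper tag_file gives up quietly and returns None when that is not the case. The attribute name "subject" is made up.

```python
import os
import tempfile

def tag_file(path, name, value):
    """Try to store user.<name> = value; return what was stored, or None."""
    if not hasattr(os, "setxattr"):
        return None                      # not Linux, no xattr calls in os
    attr = "user." + name                # user attributes live in the "user." namespace
    try:
        os.setxattr(path, attr, value.encode("utf-8"))
        return os.getxattr(path, attr).decode("utf-8")
    except OSError:
        return None                      # filesystem refused or lacks support

# Usage: tag a scratch file with an invented "subject" field.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"photo bytes")
stored = tag_file(tmp.name, "subject", "Sarah")
os.remove(tmp.name)
```

Note that the file contents are never touched, so a checksum of the data stays the same whatever tags are attached.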
Or, what about files made by specialized programs that not many people use? Why should OS makers have to deal with metadata standards for every filetype in existence? The way it is now, only people who write apps dealing with a particular filetype have to deal with metadata standards.
That is not the way it is now. Desktop indexing (present in Windows, OS X, and at least optionally in Linux) monitors the filesystem, re-scanning the in-file metadata when a file is modified, so it can build a central database for quick searches. So the indexing system needs to know how to read these different file types.
Taking the metadata out of the file creates a lot of complexity, without any significant gain that I can see. Your examples of bittorrent files and files with changed metadata not md5-matching others just don't seem to be enough of a problem to warrant all these changes. In fact, these problems can be easily fixed by fixing the tools that use these files; BitTorrent, for instance, could be modified so that certain popular filetypes (e.g. video files like avi and mkv) are recognized by the tools and the metadata ignored when creating an md5sum. Modifying a tool that only some people use is a lot easier than modifying an entire OS.
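As a sketch of what "ignore the metadata when creating an md5sum" would involve for just one format, here is a checksum over MP3 audio that skips a leading ID3v2 tag and a trailing ID3v1 tag. The function names are invented, ID3v2 footers are not handled, and MD5 is used only because that is what the comment mentions (BitTorrent itself hashes pieces with SHA-1).

```python
import hashlib

def strip_id3(data):
    """Return the bytes between an optional ID3v2 header tag and ID3v1 trailer."""
    start, end = 0, len(data)
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for b in data[6:10]:             # 4 "syncsafe" bytes, 7 bits each
            size = (size << 7) | (b & 0x7F)
        start = min(10 + size, end)      # 10-byte header plus tag body
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128                       # ID3v1 is a fixed 128-byte trailer
    return data[start:end]

def content_md5(data):
    """MD5 of the audio payload only, ignoring ID3 tags."""
    return hashlib.md5(strip_id3(data)).hexdigest()

# Usage with fake audio bytes and hand-built tags.
audio = b"\xff\xfb" + bytes(range(200))
id3v2 = b"ID3\x03\x00\x00" + bytes([0, 0, 0, 20]) + b"x" * 20
id3v1 = b"TAG" + b"\x00" * 125
tagged = id3v2 + audio + id3v1
same_content = content_md5(tagged) == content_md5(audio)
same_raw = hashlib.md5(tagged).hexdigest() == hashlib.md5(audio).hexdigest()
```

That is one format. Every other container (avi, mkv, jpeg and so on) would need its own stripping logic in every tool that wants this behaviour.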
I wouldn't exactly call that "easy", personally... :) Maybe you're right and my examples could be better. But it addresses a general issue that metadata is not conceptually part of the file contents - that's the whole point of metadata. Like the filename, it's just there to tell you what's in the file. If you change the filename or date stamp, it doesn't affect the file's contents. So by the same logic I'd say search tags and so on shouldn't be part of file contents either.
Concepts don't always m
Bow-ties are cool.
I use my files to sharpen things - knives, mower blades, machete, hoes, spades, etc..
Simple, huh?
Actually I think it creates complexity for the OS and filesystem, but simplicity for the application. Apps don't need to worry about anything other than the contents of the file, and knowing how to structure the data appropriately.
Nah, they didn't give it up. HFS+ still uses it. What they did do is start relying on filename extensions to be more compatible with other platforms in a networked environment. Too much for my taste, actually. Filename extensions are a convenience, but they're also a relic that should have been ditched in 1984 (and was, by Apple). There is absolutely no reason the filesystem should have to rely on the filename to identify the file format.
They never were used by Unix/Linux, to my knowledge. The "file" command will quickly tell you what kind of file you're dealing with, regardless of its name or extension.
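A toy version of the idea behind "file", for illustration: look at the first few bytes rather than the name. The real tool uses a much larger magic database; the short table and the function name sniff below are only a sketch.

```python
# A few well-known leading signatures ("magic numbers").
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"%PDF-", "PDF document"),
    (b"\x1f\x8b", "gzip compressed data"),
    (b"PK\x03\x04", "ZIP archive"),
]

def sniff(header):
    """Guess a file type from its leading bytes, ignoring any filename."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"

pdf_guess = sniff(b"%PDF-1.4\n%...")
gzip_guess = sniff(b"\x1f\x8b\x08\x00")
text_guess = sniff(b"hello world")
```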
True, they've never held any special meaning to the filesystem/OS on Unix/Linux, but they have been in common usage for decades.
How many times have you found an old program distributed as a .tar.gz ?
Not this shit again :-(
Terminals, network PCs, cloud, whatever.
If companies want to put their stuff out "there," feel free.
I want my grubby hand on my files. AND I use whiteout on them. (PITA to take the drives in and out of the enclosure all the time, though)
Vote monkeys into Congress. They are cheaper and more trustworthy.
The big problem with files is that they get disconnected from context far too easily, especially when you share them with others. This realization is why I want to build the inter-tubes protocol. It syncs up collections of files, deals with permissions, and makes a set of services available to get thumbnails of photos, etc.
My use case goes like this:
I have 330,000 photos I've taken in the last 14 years. I'd like to share them. The current choices are
What I'd like to do instead is to give them a small file which contains permissions to access my tube containing my photos. It would be a very small file, with just a few cryptographic signatures, probably less than 20k. However, this would then allow the user to list all of my photos, and use the thumbnail service associated with it to pull across thumbnails of things (instead of the full size images).
If they then find a file they like, they can get the full size version. If they then add tags or comments to the file, those would get synced back to me via the tubes.
What do you all think of this idea?
Forget the Ribbon, the other disaster from Office 2007 was the 'glorious basterd' new file names, docx xlsx and the others. But of course 'file extensions are too hard for users' so those differences get hidden. One of my 'mission critical' programs from work FINALLY added support for those filenames ... *this past April*.
Are you minimum wage IT?
I'd HOPE you realise that renaming 'something.docx' into 'something.doc' isn't going to allow you to magically open it in Word 2003. DOCX is a ZIP file with XML files inside of it. DOC is a binary, legacy clusterfuck of OLE garbage; Microsoft themselves have trouble maintaining compatibility with their files between versions, at least DOCX makes it easier to import that crap into LibreOffice.
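A quick way to see the difference, sketched with Python's zipfile module: a DOCX opens as a ZIP archive containing word/document.xml, while a legacy DOC starts with the OLE compound file signature and is not a ZIP at all. The helper name looks_like_docx and the in-memory sample files are made up for the demo.

```python
import io
import zipfile

def looks_like_docx(data):
    """True if data is a ZIP archive that contains word/document.xml."""
    buf = io.BytesIO(data)
    if not zipfile.is_zipfile(buf):
        return False
    with zipfile.ZipFile(buf) as zf:
        return "word/document.xml" in zf.namelist()

# Build a minimal stand-in for a DOCX entirely in memory.
fake = io.BytesIO()
with zipfile.ZipFile(fake, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("word/document.xml", "<w:document/>")
docx_bytes = fake.getvalue()

# Legacy DOC files begin with the OLE compound file signature instead.
doc_bytes = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00" * 504

docx_ok = looks_like_docx(docx_bytes)
doc_ok = looks_like_docx(doc_bytes)
```

Renaming the file changes neither result, since the check never looks at the name.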
It is like citing hearsay. I love wiki for learning new things but I find it ridiculous to cite a source that is not peer-reviewed.
If metadata would be part and parcel of files on proper GNU/Linux filesystems, it would be so very much easier to find and browse your stuff. Now all we have is folders. And that's making the files dumb, like actual silly pieces of paper that can only be put into stupid folders to avoid a mess. But files are not clumsy physical objects but shining ideas.
Instead, give me all files related to Sarah on my harddisk, please. Now give me all indecent pictures of her. Now give me her pictures in 2007. Or give me all FLACs longer than 3 minutes without vocals. Give me all videos shot in (tagged with) Norway and featuring James... The possibilities for sorting files are endless. Having to come up with one single ontology (your strict directory tree structure) and use it for all your stuff all the time is absolutely insane. Nobody cares where a file is; everybody just wants to find it every now and then.
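Queries like these are easy to express once the metadata exists somewhere. A toy sketch, assuming a made-up in-memory catalogue (the Entry fields, paths and tags are all invented, and no real filesystem offers this exact interface):

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    path: str
    kind: str                      # e.g. "photo", "video", "flac"
    year: int
    tags: set = field(default_factory=set)

def find(entries, kind=None, year=None, tags=()):
    """Return paths of entries matching every criterion that was given."""
    wanted = set(tags)
    return [e.path for e in entries
            if (kind is None or e.kind == kind)
            and (year is None or e.year == year)
            and wanted <= e.tags]  # all requested tags must be present

catalogue = [
    Entry("/pics/a.jpg", "photo", 2007, {"Sarah", "beach"}),
    Entry("/pics/b.jpg", "photo", 2009, {"Sarah"}),
    Entry("/vids/c.mkv", "video", 2007, {"Norway", "James"}),
]

sarah_2007 = find(catalogue, kind="photo", year=2007, tags={"Sarah"})
norway_james = find(catalogue, kind="video", tags={"Norway", "James"})
all_sarah = find(catalogue, tags={"Sarah"})
```

The location of each file never enters the query; the path is only what comes back.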
Metadata adds massive value. Without it, a van Gogh is just an old painting.
There has to be an easy interface for searching for a specific file or for a specific theme when you need it. And there has to be a sensible browsing mode when you don't know what you're looking for or are trying to figure out what all there is. Now making smart searches is absolutely impossible. You're bound to miss some stuff and get embarrassing false positives.
Of course, you will have to input the metadata for it to be used but that could be a semi-automatic process. Just let us descend finally from the goddamn directory tree to the solid ground of smart metadata.
And Micro$oft and other cloudy rip off artists can go fuck themselves.
Or, what about files made by specialized programs that not many people use? Why should OS makers have to deal with metadata standards for every filetype in existence? The way it is now, only people who write apps dealing with a particular filetype have to deal with metadata standards.
That is not the way it is now. Desktop indexing (present in Windows, OS X, and at least optionally in Linux) monitors the filesystem, re-scanning the in-file metadata when a file is modified, so it can build a central database for quick searches. So the indexing system needs to know how to read these different file types.
Very dishonest argument.
Desktop search engines have it very easy. Just consider the possibility - they can simply run /usr/bin/strings on the general area of the file where metadata is likely to be found and index the resulting data. In any order whatsoever, without making a distinction between "comment" like metadata (e.g. John's photo) and actionable metadata (e.g. photo taken when camera is at an angle of 47 degrees from the vertical). Even doing so can make a very good desktop search engine. And not supporting specialized file formats is very much a possibility.
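A minimal sketch of that heuristic, written as a pure-Python stand-in for running strings over the head of a file. The 4096-byte window, the 4-character minimum and the sample bytes are assumptions for the demo, not what any real indexer does.

```python
import re

def printable_strings(data, window=4096, min_len=4):
    """Return runs of printable ASCII found in the first `window` bytes."""
    pattern = re.compile(b"[ -~]{" + str(min_len).encode("ascii") + b",}")
    return [m.decode("ascii") for m in pattern.findall(data[:window])]

# Fake file head: binary junk with two readable runs in it.
sample = (b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01"
          b"\x00\x02John's photo\x00\x01\x02")
found = printable_strings(sample)
```

Everything found is treated the same way: the scan cannot tell a format marker from a caption, and it sees nothing useful in compressed or binary-encoded fields.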
This is much much less difficult than the program which has to make sense out of the data. To show the image appropriately rotated according to a piece of actionable metadata for instance. And not supporting specialized file formats is not an option - the particular program is for that specialized file format.
There is no comparison. At all. Especially when the GP already talked about specialized file formats, "that not many people use".
So yes, it is true to a great extent that "only people who write apps dealing with a particular filetype have to really deal with the nitty-gritty of metadata standards."; my alterations in italics.
Bingo Dictionary - Pragmatist, n. A myopic idealist.
Kinda Snarky there AC. Sure I know you can't just rename the file, you have to Save-As back to the older version. But I'd send a contract to a colleague in 2003 and it would come back in 2007 that I'd have to backport again.
--Tao
This way you can make sure you do not copy your file to people Microsoft or the government does not want you to copy files to.
The filesystem will check whether the file being copied to it is allowed to be copied there. And both filesystems will check whether, once the copy completes, the file on the source and/or destination storage device must become uncopyable, or whether it should be deleted after having been copied, even if you meant only to copy it.
In short: DRM for every file ever created. In other words, the back door for the DRM no one wants and the MPAA wants everyone to have/use/abide by.
That all files will end up in some form of container in which the metadata is embedded. There's probably already a patent for a similar system currently in use and the very idea has RIAA and MPAA people drooling all over themselves in anticipation of finally owning everything in the box.
The new right fascists are bilingual. They speak English and Bullshit.
Or, what about files made by specialized programs that not many people use? Why should OS makers have to deal with metadata standards for every filetype in existence? The way it is now, only people who write apps dealing with a particular filetype have to deal with metadata standards.
That is not the way it is now. Desktop indexing (present in Windows, OS X, and at least optionally in Linux) monitors the filesystem, re-scanning the in-file metadata when a file is modified, so it can build a central database for quick searches. So the indexing system needs to know how to read these different file types.
Very dishonest argument.
Desktop search engines have it very easy. Just consider the possibility - they can simply run /usr/bin/strings on the general area of the file where metadata is likely to be found and index the resulting data.
Great, so the table's gonna have a lot of entries for "JFIF". :) And a lot of good "strings" is gonna do on compressed source data...
But in fact, this isn't what current desktop search engines do: they recognize known file types and process them specifically, so they know which ID3 field is the artist and which is the title, etc.
Bow-ties are cool.
It's called a Dynamic Database:
http://c2.com/cgi/wiki?DynamicRelational
http://c2.com/cgi/wiki?MultiParadigmDatabase
"parent=rowID" references would make it hierarchical; or put another way, provide a hierarchical view.
they recognize known file types
Yes, for known. Whereas we are talking about "specialized programs that not many people use". Not likely to be known. And not a single desktop search engine "knows" about all file types.
For such types of files, the heuristic I suggested is still the best after a few filterings. And like I also said, for desktop search it is an easy possibility to ignore rare file types. But not for a program whose purpose is to read those rare file types.
And of course other arguments of mine that you didn't address.
Bingo Dictionary - Pragmatist, n. A myopic idealist.