WinFS' Spot on Back Burner Nothing New
osViews.com writes "Charles Arthur of Independent.co.uk has an interesting editorial analyzing Microsoft's recently postponed 'WinFS,' the file system Microsoft had been planning to implement in Longhorn. His editorial reminds us that this technology, previously referred to as the 'NT Object Filing System,' was intended for an earlier Microsoft operating system code-named 'Cairo.' Microsoft first spoke of the 'NT Object Filing System' in 1992 and scheduled a beta release for 1996 and a full release for 1997. But limitations have kept it delayed ever since."
Yeah, something like TiVo. Once you get it working and get used to it, you'd feel like you'd lost a hand without it.
Just my 2c
If programs were read like poetry, most programmers would be Vogons.
Is it possible that NTFS's metadata was the first foray into actually implementing this? Like WinFS, they might have started out wanting to categorise everything and link it together, but settled on GUIs for editing ID3 tags and other metadata (like Word's), planning to implement the search engine and filesystem service layer (WinFS) later on.
However, NTFS works fine as it is. Like the parent, I too question the need for WinFS when some of its features have been implemented over several iterations of Windows. Perhaps that's why they dropped it.
From what I know of WinFS, it really won't be all that important anyway. It is supposed to provide a way for all files to be treated the same by the OS (roughly) right? Thus making it easier for users to search, browse, or otherwise find these files?
Well, I don't know all of the juicy details of WinFS but I have played with the new Longhorn build. The search tool that is in the Alpha release (MSDN) is much improved over the current WinXP search. It was pretty cool, although some of it can be chalked up to eye candy. It still had a certain ease of use to it.
I doubt WinFS will ever be complete, personally. But I am sure some of the innovation and development benefits will still reach us as consumers. I know where I work, we spend time doing things the customers will never see. But they will still reap many of the benefits.
I'm working on an object file system right now, and it's really not easy.
It's a simple concept:
Store the binary data on a standard journaled B-tree (or similar) filesystem, and store all sorts of meta-information about that data in a database. If you want, also store a reverse index of the textual info, and maybe another 'index' of image features if it's an image.
Then, if you want to find anything, there's no need to walk the filesystem's tree; you can hit the DB indexes and get the info instantly.
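A minimal sketch of that split, in Python with the standard-library `sqlite3` module standing in for the metadata database. The `files` table, its columns, and the helper names are hypothetical illustrations, not anything WinFS or the poster's system actually uses; the point is only that a lookup by attribute is answered from an index rather than by walking directories.

```python
import sqlite3

# Hypothetical schema: file contents stay on the ordinary filesystem;
# the database holds only each file's path plus searchable metadata.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    path   TEXT PRIMARY KEY,
    size   INTEGER NOT NULL,
    artist TEXT,
    genre  TEXT
);
CREATE INDEX IF NOT EXISTS idx_files_artist ON files(artist);
CREATE INDEX IF NOT EXISTS idx_files_genre ON files(genre);
"""


def open_index(db_path=":memory:"):
    """Open (or create) the metadata index."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def add_entry(conn, path, size, artist=None, genre=None):
    """Record metadata for a file that already exists on disk."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "INSERT OR REPLACE INTO files (path, size, artist, genre) "
            "VALUES (?, ?, ?, ?)",
            (path, size, artist, genre),
        )


def find_by(conn, column, value):
    """Look files up by one indexed metadata column instead of a tree walk."""
    if column not in ("artist", "genre"):  # never splice arbitrary SQL
        raise ValueError(f"unsupported column: {column}")
    rows = conn.execute(
        f"SELECT path FROM files WHERE {column} = ? ORDER BY path", (value,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_index()
    add_entry(conn, "/music/punk/green_day/dookie/01.mp3", 4_200_000, "Green Day", "punk")
    add_entry(conn, "/music/jazz/miles_davis/kind_of_blue/01.mp3", 9_800_000, "Miles Davis", "jazz")
    print(find_by(conn, "artist", "Green Day"))
```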
The real problem is keeping all of this in sync, with almost flawless atomic operations (of course it's pretty much impossible to be flawlessly atomic, but one should come as close as current journaled filesystems do).
So if you're using 2 components, let's say a filesystem and a SQL database, you need to open a SQL transaction, do your inserts/updates/deletes, then do the filesystem operation, then commit the SQL transaction. If anything fails, you can revert the SQL modifications and everything goes back to normal. But if the filesystem has problems, you can't keep the damn DB synchronized, and at some point you'll have to resync both.
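The ordering described above can be sketched in Python, again with `sqlite3` as the stand-in database (the schema and function names are hypothetical). The connection is opened with `isolation_level=None` so the code issues `BEGIN`/`COMMIT`/`ROLLBACK` itself. One detail is an added assumption rather than part of the post: the file is written to a temporary name and renamed into place, so a failed write never leaves a half-written file behind.

```python
import os
import sqlite3
import tempfile


def open_index(db_path=":memory:"):
    # isolation_level=None: autocommit mode, so transactions are explicit.
    conn = sqlite3.connect(db_path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT PRIMARY KEY, size INTEGER NOT NULL, genre TEXT)"
    )
    return conn


def store_file(conn, dest_path, data, genre):
    """Open a transaction, change rows, touch the filesystem, then commit.

    If the filesystem step fails, the row changes are rolled back.
    """
    try:
        conn.execute("BEGIN")
        conn.execute(
            "INSERT OR REPLACE INTO files (path, size, genre) VALUES (?, ?, ?)",
            (dest_path, len(data), genre),
        )
        # Filesystem step: write a temp file in the target directory, flush
        # it to disk, then atomically rename it over dest_path.
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(dest_path) or ".")
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp_path, dest_path)
        except BaseException:
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
            raise
        conn.execute("COMMIT")
    except BaseException:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

The remaining gap is exactly the one the post complains about: if `COMMIT` itself fails after the rename, the file is on disk with no row describing it, and only a later resync will notice.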
On 100k files, no problem. On 200 million files (what I'm aiming for), you're pretty much screwed. Then you have to start thinking about a self-healing system with a constantly running checker that has to be very resource-efficient, etc...
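A naive version of that checker: walk the tree, compare it with the index, drop rows for files that vanished and add rows for files the index never saw. This Python sketch works against a hypothetical `files(path, size)` table and assumes the index covers exactly one directory tree. A full walk like this is precisely what stops scaling at the 200-million-file mark; a real checker would have to be incremental, so treat this only as the reconciliation logic.

```python
import os
import sqlite3


def scan_tree(root):
    """Return the set of non-directory paths currently under root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.join(dirpath, name))
    return found


def reconcile(conn, root):
    """Bring the index back in line with the disk; return what was fixed.

    Assumes every row in `files` describes a path under `root`.
    """
    on_disk = scan_tree(root)
    indexed = {row[0] for row in conn.execute("SELECT path FROM files")}
    stale = sorted(indexed - on_disk)      # rows whose file is gone
    untracked = sorted(on_disk - indexed)  # files the index never recorded
    with conn:  # apply both fixes in one transaction
        conn.executemany("DELETE FROM files WHERE path = ?", [(p,) for p in stale])
        conn.executemany(
            "INSERT INTO files (path, size) VALUES (?, ?)",
            [(p, os.path.getsize(p)) for p in untracked],
        )
    return stale, untracked
```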
It's just a huge problem. Supposedly Apple is solving this by Q1 2005, but I wouldn't be surprised if we see a massive increase in filesystem corruption bugs for a while on OS X (unless the DB indexing piece is just that, an indexer that runs x times a day and isn't atomically joined to the filesystem operations).
And that's why it's taking so long. Accessing filesystems as SQL data has always been a dream of anyone who has had many files. They just never knew about it.
WinFS is the 'real' solution IMO to all things like iTunes playlist managers, expensive Content Management Systems, yada yada.
Sure, no consumer is expected to actually use SQL statements, but that doesn't mean that user mode programs should *implement* SQL features. User mode programs should only be the 'translation' layer between the user's point and click GUI, and the OS' internal implementation of the db. Surely, anyone can see that collecting meta data from the file system, and duplicating it in usermode so that you can have search capabilities on it is wasteful.
This article wasn't news to me, I've actually been waiting for this damn WinFS since just about 1996... And by god, is it ever turning into Duke Nukem Forever, but you know what, it's such a cool feature that I still can't wait for it to come out... (figuratively speaking)
Let's put this in perspective. In '92 MS was looking at the Sybase source code and thinking about building a new filesystem around a database engine. Chicago AKA Win95 was almost out the door and it seemed reasonable to shoehorn this into Cairo (NT4). They were absolutely the dominant and fastest growing player.
I commented to a colleague in '93 (paraphrasing Robert Heinlein) that I did business with MS for the same reason I obeyed Newton's laws.
What happened around 1995? The internet became a commercial entity. Suddenly, MS needed to provide new applications (like IIS, IE, Outlook Express, an SMTP aware Exchange server, etc.) not just dork with cool OS technologies. A few years later, they are comfortable again after playing catch-up and start thinking about filesystems again, this time in "Longhorn". Again, they started talking about the capability two OS releases into the future.
However, this isn't a feature that is going to drive sales. MS needs to keep developers of home and office apps happy so they develop yet another new graphics system to replace DirectX. The perception of Windows security has never been lower and is starting to affect sales. IIS is losing ground again to Apache/Linux.
It's time to focus on revenue streams again and the revolutionary, expensive, difficult-to-build features get axed. It's probably not a bad idea. Think about the problems they've had with MS-SQL and ask yourself if you want a similar technology built into every teenager's game and grandmother's email box.
If someone were to create a basic version of WinFS, good enough for the average Joe to think they might not need the fancy features of WinFS, and were to make it readily available (open sourced or at least really cheap) and easily installed, people might be even LESS inclined to upgrade to Win2008 or whenever MS releases this technology. Add on top of this easy integration of search technologies for OS X and Linux (desktop) computers, and WinFS might be less desirable.
Yeah. They also mentioned vaporware's early Atari history. It was really Atari that brought vaporware to the masses.
Anybody remember the Graduate keyboard for the 2600? How about the Mindlink?
The Atari 2700 with ergonomic wireless joysticks was ready for production, then was killed. Let's see... what else? The 7800 keyboard was fully developed, then killed. An advanced "Amy" sound chip for the 8-bit computers... yep! Oh yeah, and then there was one of my favorites. They had an expansion cage ready to go that would let you add cards to the XL line of machines, just like the Apple II. Come to think of it, it was only a few oddball third-party devs that made use of the "Parallel Bus Interface" that Atari promised soooo many nifty things were going to connect to.
Yeah, Atari got me salivating a few times back in the day before I finally learned my lesson.
IIRC "NT Object Filing System != WinFS"
WinFS is supposed to be based on SQL Server; when NTOFS was announced, Microsoft hadn't yet acquired SQL Server.
I thought NTOFS was what morphed into the fast-find thingie that shipped with Office.
I don't need no instructions to know how to rock!!!!
Until the system can extract reasonable and meaningful metadata about the contents of files and documents, it will always be up to the user to do the bulk of this.
For better or worse, the most (only) "meta" tag 99.5% of people use is the file name. Word has a very flexible metadata ability, but it is never used. It was turned on by default in Word 97, but probably quickly turned off by all users. Same for Excel, PPT, etc. This is one of the things that Office Search ("Find Office Files quickly!") keeps indexed. But of course, that is usually quickly turned off as well, because no one really uses it.
It's a user-space problem, not a system problem.
They should just instead include 'awk' or 'perl' (something a little more sophisticated than 'find -i') on the system, with some sort of natural language-to-regexp converter front-end for them.
No, "My Documents" is the windows quasi-equivalent to $HOME/docs.
Too bad MS needed to assume that computers only have one hard drive with one partition. While it is possible to hack one's registry to sort of do things the way Unix/Linux does, including creating symbolic links to different partitions rather invisibly, they just do what they can to make this a non-feature, unfortunately.
NT/XP system management would be SOOOOOO much easier if the OS could be protected on a single partition, and ALL applications and their libraries be stored on another partition. Why? Since you probably can't move the Registry, a corrupt registry would, in an ideal world, simply necessitate recovering the Windows partition, either from backup or reinstallation. Then, an installation log on the applications partition could be consulted, which would effectively reinstall applications w/o having to find the original media, etc. Your documents (My Documents), spreadsheets, etc., would be stored on yet another partition (actually, C:\winnt\profiles), with the possibility of migrating user directories as needed, just like a Linux/Unix system.
But, no. With $50 billion in cash, and arguably a good chunk of the best programmers and tech writers (PPT slide show developers...) in the world, it's not sexy or cool enough. Instead, one of the uber-geeks there comes up with something else and gets SteveB and BillG to buy off on it.
Heck, they could probably even buy iFS from Oracle for chump change and make 2006.
But iFS sort of has been a big flop, hasn't it, and it's NIH anyways.
Exchange4Linux does the same thing, but with Python and PostgreSQL. At first I hated the idea but man it does work remarkably well.
This article is remarkably similar in many respects to the recent one by Joe Barr at LinuxWorld. But he makes a more Linuxy point: Linux cannot/should not compete against the non-existent figment of Microsoft's imagination.
One correction: filesystems (at least most UNIX filesystems) are not constrained to a tree structure; leaf nodes may have any number of parents, i.e. a file may be in any number of directories simultaneously (use the "ln" command). And using ln -s you can practically place a directory in any number of parent directories.
I use this to organize my music collection alphabetically by artist, by genre, and by the date I got the music simultaneously. (I tend to be most interested in music I got recently, because I'm not tired of it yet).
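The hard-link trick is easy to script as well. This Python sketch uses `os.link` (the same operation plain `ln` performs) to put one file into several 'view' directories; the directory and function names are illustrative only. Hard links work only within a single filesystem and cannot point at directories, which is why `ln -s` is the tool when a whole directory should appear in several places.

```python
import os


def link_into_views(src, view_dirs):
    """Hard-link src into each directory in view_dirs; return the new paths.

    Every link names the same inode, so no extra space is used and an edit
    through one name is visible through all of them.
    """
    created = []
    for view_dir in view_dirs:
        os.makedirs(view_dir, exist_ok=True)
        dest = os.path.join(view_dir, os.path.basename(src))
        os.link(src, dest)  # equivalent to `ln src dest`
        created.append(dest)
    return created


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as root:
        song_dir = os.path.join(root, "by-artist", "Green Day")
        os.makedirs(song_dir)
        song = os.path.join(song_dir, "01.mp3")
        with open(song, "wb") as f:
            f.write(b"fake mp3 data")
        link_into_views(song, [os.path.join(root, "by-genre", "punk"),
                               os.path.join(root, "by-date", "2004-10")])
        print(os.stat(song).st_nlink)  # one original name plus two links
```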
I know people tend to organize files and directories in a tree structure anyway. If you ask me, that's because people are happy to maintain the analogy of a physical item that can only be in one place at a time; so what does that mean for WinFS?
Every filesystem is a database at heart. They already contain other attributes like permissions, creation and modification dates, etc. The place to store this stuff is in the FS, because the database is already there. All you need to do is add some more fields, like an extended description and a few topic reference fields, and slap a query engine on it. The query engine does not need to be very complex either. You can get away with little or no formatting/sorting/grouping support, as the user-space app that performs the query should take care of that. All you need is basic boolean logic and string comparison. Most of this code already exists out there under free licenses. I am not saying it would be a copy-paste job, but there are examples of the required algorithms that developers can look at safely, without running afoul of any IP.
The one tough thing WinFS aims to do, which would be simple in user space, is to look inside files and glean some attributes from them. This is great if you can hook into some of the libraries from Office or Adobe et al.; it saves you from having to implement parsing for all that stuff. I am not quite sure how you solve that one at the FS level. I just fear a user-space system will get really crufty really fast and break when major changes occur to the files and their real attributes on disk that the DB can't know about: say, a mount point gets moved, or everything is restored from a tarball and the dates or permissions change a little because someone was careless. I think overall, getting the necessary info from the user when new files are created would be a fair compromise; the only issue is rule one of DATA: "crap in, crap out."
Then there are all the problems you mostly have to deal with whether you do it in the FS or as some user-space hack/bloatware thing:
Note that, since file creation is where this info gets collected, for efficiency you would want/need archives to contain all that info for the files in them, so the user does not have to enter it. Makefiles and the like would have to be updated to do magic and fill in that data for their output files. Then, naturally, you have to fix all the GUI toolkits so their file I/O dialogs support that info; any apps with custom dialogs will need to be patched, as will console apps. Some sort of default values would also be needed for apps that just can't reasonably support collecting that info. I don't want to have to fill in values every time I "cat" something I mean to unlink moments later.
I think it's clear there are lots of difficult usability problems to solve. Someone could probably extend any of the major OSS filesystems to include some extra attributes and add a crude query system; it's all a question of what you really do with it once you have it. I am sure R&D at Microsoft is just as perplexed on that point as I am. I feel sorry for them, since the marketing dept has been pushing this as the next big thing for almost a decade now; the pressure must be intense.
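To make the "basic boolean logic and string comparison" claim concrete, here is a tiny Python query evaluator over per-file attribute dictionaries. The nested-tuple query format and the operator names `eq`, `contains`, and `prefix` are invented for this sketch; sorting and grouping are left to the caller, as the post suggests.

```python
def matches(attrs, query):
    """Evaluate a nested-tuple query against one file's attributes.

    Query forms (hypothetical format):
      ("and", q1, q2, ...)   ("or", q1, q2, ...)   ("not", q)
      ("eq", field, value)   ("contains", field, text)   ("prefix", field, text)
    Missing attributes compare as the empty string.
    """
    op = query[0]
    if op == "and":
        return all(matches(attrs, q) for q in query[1:])
    if op == "or":
        return any(matches(attrs, q) for q in query[1:])
    if op == "not":
        return not matches(attrs, query[1])
    _, field, value = query
    actual = str(attrs.get(field, ""))
    if op == "eq":
        return actual == value
    if op == "contains":  # case-insensitive substring test
        return value.lower() in actual.lower()
    if op == "prefix":
        return actual.startswith(value)
    raise ValueError(f"unknown operator: {op!r}")


def run_query(catalog, query):
    """Return the paths in catalog (path -> attrs) that satisfy query."""
    return sorted(path for path, attrs in catalog.items() if matches(attrs, query))


if __name__ == "__main__":
    catalog = {
        "/home/j/letter.doc": {"topic": "taxes", "description": "Letter to the IRS"},
        "/home/j/budget.xls": {"topic": "taxes", "description": "2004 household budget"},
        "/home/j/trip.jpg": {"topic": "vacation", "description": "Beach photo"},
    }
    q = ("and", ("eq", "topic", "taxes"), ("not", ("contains", "description", "budget")))
    print(run_query(catalog, q))  # ['/home/j/letter.doc']
```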
Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
Except it is relevant because Reiser4 has metadata built-in. WinFS is supposed to be built on top of NTFS but its (NTFS+WinFS) purpose is similar to that of Reiser4.
Time makes more converts than reason
NTFS has always had metadata built in. That's not what WinFS provides.
Coming soon - pyrogyra
The key here is this: I am not at all interested in a system that fundamentally assumes I am stupid. I will be utterly devoted to a system that fundamentally assumes I am lazy.
WinFS and masses of metadata assume the stupid, not the lazy. The reason I don't want complicated trees of directories is that I am too damn lazy to create and maintain them. Requiring me to add masses of metadata instead of a directory hierarchy does not address the problem: I am lazy!
Such a system will work well for limited uses: anything that has self-populating metadata (such as music collections, where files will either come with suitable metadata attached or, if I rip a CD, I'll automatically attach suitable metadata via FreeDB or what have you). Similarly for a certain amount of video, etc.
Such a system will work passingly well when you have a reasonable amount of attached metadata automatically, for instance email.
It won't work well for general user created documents and the like.
In the end, a lot of data is purely user-created, from spreadsheets and letters to photos downloaded off digital cameras.
Find a way for me to be lazy and still have quick and easy access to all of those, and then you'll have my interest.
Jedidiah.
I don't know about you, but the features in WinFS (or the proposed features, at least) can't "manage music" any better than my current setup.
Root music dir: "music" - I know, that's pretty counter-intuitive.
Under that:
hardcore
emo
punk
techno
jazz
80's
celtic
rock
orchestral
themes
... and then, within each of those directories, I've got directories of band names; within those, directories of albums. How could WinFS provide anything beyond the current Windows file manager and its various views (tree, flat with previews, etc.) that would meaningfully supplement this method of organization?
If people aren't going to bother organizing things into directories or giving intelligent file names to their data, they're sure as hell not going to bother with meta data. Unless WinFS has a full slew of data identification algorithms and a massive database of known matches, there's nothing that WinFS could offer here.
Organization is a mechanism we employ to help us find things. WinFS cannot add to this ability; it can only provide a different mechanism for organization. I don't see it helping much, however, as humans have been grouping things into categories and subcategories since the beginning of time, and that is how we think. Unless we're talking about an obscene amount of data, where this would be a "poor man's database", I can't see any practical use for anyone with half a brain.
It would be better for MS to actually enforce the use of "My Documents".
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
You just illustrated my point.
If you have your music sorted by genres, the catch is to keep the folders general.
i.e.
/music/rock & alternative
/music/rock & alternative/G&R
From there, there is no need to sub-classify the different types of rock.
The key to this is sane sorting. It is easy to over-classify your information. No song should ever be more than 4 levels deep:
/genre/artist/album/song.mp3
This is an overly specific example. The same applies to any user-created files on the OS. (I think I heard the same rule of thumb about web pages too.)
Email outta whack? Organize it.
mail -
  /personal
    /friends
    /relatives
  /work-related
    /corporate bullshit
    /boss is a jerk
    /the chick in the mail-room is hot
  ...etc...
This way all your email falls under 1 of 2 categories, just like a good DB design.
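The "no more than 4 levels deep" rule of thumb is easy to check mechanically. A Python sketch, counting depth as the number of path components below the chosen root, so `genre/artist/album/song.mp3` sits exactly at depth 4; the function name and the default limit are this sketch's own choices.

```python
import os


def too_deep(root, max_depth=4):
    """List files (relative to root) nested more than max_depth levels down.

    Assumes max_depth >= 1; a file directly in root has depth 1.
    """
    offenders = []
    for dirpath, _dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        # Depth of the directory itself: 0 for root, 1 for root/genre, ...
        dir_depth = 0 if rel_dir == os.curdir else len(rel_dir.split(os.sep))
        if dir_depth + 1 > max_depth:
            offenders.extend(os.path.join(rel_dir, name) for name in filenames)
    return sorted(offenders)
```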
I have been fighting with iTunes because it has forced me into using my files this way, and I hate it. But I am trying to learn. I am trying to like it.
It's just like linux for me. I wanna love it & use it, it's just so much work.
"The price good men pay for indifference to public affairs is to be ruled by evil men." ~Plato (427-347 BC)
Theoretically, you can download BeOS for free now (AFAIK, the link is broken though).
Otherwise, you can take a look at Mac OS X.4 when it comes out next spring (or grab the beta now).
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
iTunes is an excellent example of this. *(disclaimer: if all your ID3 tags are complete & accurate)*
iTunes allows you to search, play, and arrange your music very quickly.
iTunes is a great example, since aside from using meta-tags, it also organizes music into Artist/Album directories. As long as you know which CD the song is from, you can locate the file manually from the Finder in a few mouse clicks. If not, you resort to Cmd-F to do a search, or use iTunes and then Cmd-R to get to the file. It's up to the user. Flexibility is good.
What I don't want is only one way. I don't want to have to type search keywords or else sort through a bunch of files to find what I want.
Now, in Microsoft's defense, *shudder*, they do have to keep user files apart from the system files. In any multi-user environment, user files should be kept in user directories. You can't really avoid ~/ in front of Music or Pictures.
Commercial attempts at object-oriented filesystems predate Microsoft's concepts. They include not just Pink and Copland but also Sun's Spring, a research project always just short of becoming a product. All of these efforts suffered from being dead slow. So again we see an original idea attributed to Microsoft that, for anyone with a long enough memory, has precursors.
It is also interesting to note that Jim Allchin was in charge of Cairo at Microsoft way back when. The curiosity is whether there have been real speed optimizations for WinFS beyond those supplied by Moore's Law. Perhaps not enough to justify its immediate deployment.
Having a single My Documents folder does make one thing a lot easier though, and that's backup.
My "My Documents" folder contains basically my entire life--papers dating back through high school, address book and email archive, all my pictures, music, save games, application backups, drivers & other updates, backup of my Palm files, scans of important life documents (birth certificate, etc.) All neatly sorted of course.
I sync my My Documents folder between my desktop, laptop, and an external drive, plus occasional off-site DVD or CD backups of really important stuff. I take great comfort knowing that in the event of a major disaster, I can just grab my laptop and run, without worrying about what important data might be left behind.
I don't see it like this. File metadata and content are already tracked by journaling filesystems like XFS and Reiser, and they do not result in "one blob of unmaintainable code." If implemented properly (and I am not proposing that WinFS would or will be), neither should this.
Again, I don't see how this is "inseparable" and from what. You should be able to specify when you create a filesystem what type of optional data you want indexed, if any. One good change that should result from this type of filesystem is making static directory trees obsolete; and defining a better, more intuitive interface to a filesystem.
These are valid arguments for today but may not hold up for future. There are a lot of features implemented in most layers of software today that would have been a big waste of resources just a decade ago.
I wrote the first design document for ReiserFS in 1984....
The nice thing about being slow in solving a hard problem is that others are also slow....
Just to use your example for one type of problem that NTFS/FAT16/FAT32 users have right now (although there are several types of problems if you think about it for a while):
You have some mp3s by a band called "Green Day". Do they go under emo, punk, rock (or even 'pop')? You may have strong feelings one way or the other as to which category they fall under, and therefore be able to save these files in one place and find them again at a later time. But will other people who use your computer/network have the same feelings about what kind of music Green Day play? How will other users find those Green Day mp3s if they don't know which directory to look under?
This is at the heart of the arguments for metadata and multiple inheritance. The ReiserFS home page has lots of good information on the issues involved with filesystems.
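The Green Day question is what a many-to-many mapping answers: instead of picking one directory, a file carries several category tags and can be found through any of them. A minimal in-memory Python sketch; the `TagIndex` class is hypothetical and says nothing about how ReiserFS or WinFS actually store such data.

```python
from collections import defaultdict


class TagIndex:
    """Map each tag to the set of files carrying it (and back again)."""

    def __init__(self):
        self._files_by_tag = defaultdict(set)
        self._tags_by_file = defaultdict(set)

    def tag(self, path, *tags):
        for t in tags:
            key = t.lower()  # "Punk" and "punk" are the same category
            self._files_by_tag[key].add(path)
            self._tags_by_file[path].add(key)

    def find(self, tag):
        return sorted(self._files_by_tag.get(tag.lower(), set()))

    def tags_of(self, path):
        return sorted(self._tags_by_file.get(path, set()))


if __name__ == "__main__":
    idx = TagIndex()
    song = "/shared/music/green_day/dookie/01.mp3"
    # Each user files it where they think it belongs; everyone still finds it.
    idx.tag(song, "punk", "rock")
    idx.tag(song, "Pop")
    print(idx.find("punk") == idx.find("pop") == [song])  # True
    print(idx.tags_of(song))  # ['pop', 'punk', 'rock']
```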
When MS Access 1.0 was launched, MS's Access team said MS's ultimate vision was to have everything in the system relationally stored - which makes sense, see stuff like Gnome Storage.
Problem is, MS Access (and MS SQL Server, and their engines Jet and... I forget) are poor implementations of SQL, and SQL isn't relational at all. SQL is a misimplementation of a few of the relational ideas, carrying severe arbitrary limitations.
Most probably MS will never come to push this until they get the relational theory right. But with the MS Access and MS SQL Server pushing the party line of 'SQL is relational, but objects are better', they most probably will never get there.
Perhaps Gnome Storage has a better chance, because PostgreSQL is such a nimble system. But it is still SQL. Rel looks like a conceptually better solution as far as the data language side goes, but it still needs a huge amount of work on the storage engine side.
Leandro Guimarães Faria Corcete DUTRA
DA, DBA, SysAdmin, Data Modeller
GNU Project, Debian GNU/Lin
Fact-index.com not only puts up Wikipedia content with Google ads, it's actually started making substantial financial donations to Wikipedia!
http://rocknerd.co.uk