Slashdot Mirror


Tux2: The Filesystem That Would Be King

To all but a handful of a handful, a perfect filesystem would be boring: files would be where you expect them to be, corrupt files would never be an issue, and a power outage would result in nothing more than a few moments of darkness rather than minutes or hours of lost work. Even so -- perhaps a clue that the perfect example is far, far away -- the news on the Linux filesystem front is pretty exciting of late. In a low-key technical session Friday morning at ALS, kernel hacker Daniel Phillips announced to the world the minor revolution he's planning -- which could end up replacing Linux's old standby ext2fs (and its coming replacement ext3fs) with his Tux2 filesystem. Though Tux2 is an ext2 cousin in many ways, it carries at least one crucial improvement: according to Phillips, you can literally pull the plug on a system running Tux2 and expect not to lose files or spend minutes watching fsck crawl by.

Phillips, an expatriate Canadian now employed by Berlin-based Innominate AG, claims 25 years of computer programming experience. He's had stints in everything from database design and game programming to embedded controller system development, and in a dual life which may sound familiar to many computer programmers, Phillips worked through music school by hacking Fortran code. With that background, perhaps it's unsurprising that just a few years after first encountering Linux, and a year after joining the ranks of the kernel hackers (there's a +5 informative thread in Zack Brown's excellent Kernel Traffic), he's come up with what could be a sea change in Linux filesystems.

A Filesystem You Can Live With And Pull The Plug On

The central point of a journaling file system is that in exchange for a small hit in performance, file integrity is assured by an ingenious mechanism: rather than being written directly (and riskily), filesystem changes are instead first recorded sequentially in a running list -- the journal -- the contents of which are then acted upon in turn. If the system should crash for any reason while a change is not yet accomplished, the recovery time upon reboot is greatly abbreviated, as long as this "edit decision list" remains intact. Journaling file systems are on the way from multiple projects, and rather than being theoretical, wouldn't-it-be-nice daydreaming, at least one is available right now: the ReiserFS developed by Hans Reiser is even an option at install on some recent Linux distributions.
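The journaling idea described above can be sketched in a few lines. This is only a toy model under simplifying assumptions: the class name JournaledStore and its methods are hypothetical, the "disk" is a Python dict, and nothing here is taken from ReiserFS or ext3.

```python
# Toy write-ahead journal: an illustration only, not ReiserFS or ext3 code.
class JournaledStore:
    def __init__(self):
        self.blocks = {}    # stand-in for on-disk data: block number -> contents
        self.journal = []   # sequential list of changes not yet applied

    def log(self, block_no, data):
        """Record an intended change in the journal before touching the data."""
        self.journal.append((block_no, data))

    def apply_journal(self):
        """Act on the journal entries in order, then clear the journal."""
        for block_no, data in self.journal:
            self.blocks[block_no] = data
        self.journal.clear()


store = JournaledStore()
store.log(7, "new inode table")
store.log(9, "new directory entry")
# A crash at this point leaves store.blocks untouched and the journal intact,
# so recovery only has to replay a short list rather than scan the whole disk.
store.apply_journal()
```

Note that in this toy every change is written twice, once to the journal and once to its final location, which is the kind of extra work a journal-free design tries to avoid.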

Why another, then? Wrong question: Tux2 is not a journaling filesystem. Phillips says that Tux2 offers Linux users the chief advantage of a journaling filesystem (namely, keeping files safe in the event of a system crash) but without a journal, and does so more efficiently.

"The big deal is when you compare it to journaling, which is a popular solution, and you see that it's just plain writing less blocks. That's a big savings. It's also not constantly going back to wherever the journal is on the disk to write to the journal, so there's a lot less seeking involved. So those two things together means that it should significantly outperform journaling."

Perhaps more importantly, Tux2 is not actually a wholly new filesystem per se; it shares so much in common with ext2 that it is built as a patch to ext2, with the filesystem converted at runtime.

How does Tux2 get around keeping a journal to do the things that a journaling filesystem does? Atomic updates are the key. (See also: soft updates) Instead of a journal, Tux2 uses what Phillips terms a "Phase Tree algorithm."

"I originally called it Tree Phase," he says, "and then Alan Cox mentioned it on the Linux kernel list. He called it Phase Tree on the Linux kernel list, and I decided I liked that better."

The Phase Tree algorithm is simple at heart, but takes a little while to grasp -- at least it did for me. Happily, Phillips has written a lucid tutorial on his project site, which is probably the best explanation available. The excerpts I found most illuminating are these:

All accesses to filesystem data are performed by descending through a filesystem tree starting at its metaroot.

Normally, three filesystem trees exist simultaneously, each with its own metaroot. One is recorded on disk with a complete, consistent tree descending from it. A consistent second tree, the 'recording' tree, in the process of being recorded to disk, descends from a metaroot in memory, and some of its blocks are in dirty buffers. A third tree, the 'branching' tree, is in the process of being accessed and updated by filesystem operations, also with its metaroot in memory. The branching tree is not required to be internally consistent at all times. In particular, some blocks that are free in the branching tree may not be marked as free in its block allocation maps but held on a 'deferred free' ('defree') list instead.

At some point the recording tree will be fully recorded on disk and its metaroot can be written to disk so that it replaces the metaroot of the recorded tree. This causes the filesystem to move atomically between states, as desired. At this point, the recording tree becomes the recorded tree, and the branching tree's metaroot is copied to become the new recording tree. This event is called a 'phase transition' and the interval between two such events is called a 'phase'.
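To make the three-tree arrangement a little more concrete, here is a minimal sketch of it. It is an assumption-laden toy, not Phillips' code: the names Metaroot and PhaseTreeFS are hypothetical, each "tree" is flattened to a name-to-block mapping, and the rule for releasing blocks from the defree list (free them once no tree references them) is a simplification chosen for illustration.

```python
# Toy model of the recorded / recording / branching trees. Illustration only.
class Metaroot:
    def __init__(self, phase, tree):
        self.phase = phase   # phase number, always increasing
        self.tree = tree     # name -> block number (stand-in for a real tree)


class PhaseTreeFS:
    def __init__(self):
        self.disk = {}                     # block number -> data
        self.next_block = 0
        self.recorded = Metaroot(0, {})    # complete, consistent tree on disk
        self.recording = Metaroot(1, {})   # consistent tree being flushed
        self.branching = Metaroot(2, {})   # tree that receives new updates
        self.defree = []                   # blocks whose freeing is deferred

    def write(self, name, data):
        """Update the branching tree without overwriting any older block."""
        new_block = self.next_block
        self.next_block += 1
        self.disk[new_block] = data
        old_block = self.branching.tree.get(name)
        if old_block is not None:
            self.defree.append(old_block)  # older trees may still point at it
        self.branching.tree[name] = new_block

    def phase_transition(self):
        """Promote recording to recorded, and a copy of branching to recording."""
        self.recorded = self.recording
        self.recording = Metaroot(self.branching.phase, dict(self.branching.tree))
        self.branching.phase += 1
        # Release deferred blocks that no tree references any longer.
        live = set()
        for root in (self.recorded, self.recording, self.branching):
            live.update(root.tree.values())
        still_deferred = []
        for block in self.defree:
            if block in live:
                still_deferred.append(block)
            else:
                del self.disk[block]
        self.defree = still_deferred


fs = PhaseTreeFS()
fs.write("a", "v1")
fs.phase_transition()
fs.write("a", "v2")          # goes to a new block; "v1" is left alone
fs.phase_transition()
after_crash = fs.disk[fs.recorded.tree["a"]]   # what a crash here would recover
fs.phase_transition()
settled = fs.disk[fs.recorded.tree["a"]]
```

In this toy, a crash after the second transition would recover "v1", because the recorded tree still points at the old, untouched block; only after one more transition does "v2" become the recorded state and the old block get released.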

"The problem is, it's not nice to block filesystem transactions. If you're using a KDE desktop or similar, you find your desktop moving in a very jerky way while the blocks are getting written -- no good. That's why we make another tree by copying the metaroot -- that's how we always start, we never start one by going up the tree -- meanwhile this second tree is undisturbed by that and can be written to the disk in peace."

This additional copy allows the user to work without noticing a system slowdown, while the intermediate branch is copied. Thus, there are always three "trees," and in the event of a system crash, recreating the system's correct state is as easy as identifying the latest successfully written tree. "Each new tree is always incremented higher, so this is easy," Phillips says.
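Picking "the latest successfully written tree" might look something like the sketch below. The article only says that each new tree gets a higher number; the on-disk slot layout, the CRC check used to detect a half-written metaroot, and the helper names make_slot, is_valid and latest_metaroot are all assumptions made for the sake of the example.

```python
import zlib


def make_slot(phase, root_block):
    """Build a hypothetical metaroot slot with a checksum over its fields."""
    payload = f"{phase}:{root_block}".encode()
    return {"phase": phase, "root_block": root_block, "crc": zlib.crc32(payload)}


def is_valid(slot):
    """A slot counts as successfully written if its checksum matches."""
    payload = f"{slot['phase']}:{slot['root_block']}".encode()
    return zlib.crc32(payload) == slot["crc"]


def latest_metaroot(slots):
    """Return the valid slot with the highest phase number."""
    valid = [slot for slot in slots if is_valid(slot)]
    if not valid:
        raise ValueError("no valid metaroot found")
    return max(valid, key=lambda slot: slot["phase"])


slots = [make_slot(41, 1000), make_slot(42, 2000), make_slot(43, 3000)]
slots[2]["crc"] ^= 1   # simulate a metaroot write interrupted by power loss
chosen = latest_metaroot(slots)
```

Here the newest slot fails its check, so recovery falls back to phase 42; no scan of the rest of the filesystem is needed.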

"There are a couple of other places where [Phase Tree] is obviously better than journaling. For instance, removeable media -- your removeable media is usually slowest, and you don't put a journal on it, because if you did, it would be really, really slow. So you put the journal on your hard disk, and the data on the removeable media. As soon as you pull your removeable media out, you have instant corruption, because you've removed yourself from your backup. Phase Tree doesn't do that -- you can just pull out your removeable media and you have something current up to the last tenth of a second, quarter of a second."

Sleepless nights and database integrity

Phillips' work with Phase Trees began a decade ago, when he implemented a system with similar functionality for a specialized database called Nirvana which he had developed on his own. "I would have implemented this on a Unix filesystem at the time as well, except I didn't have one available."

Was there a Eureka moment in 1989? "Oh yeah. I dimly recall having a week of sleepless nights, tossing and turning, trying to figure out if it was even possible to do something about this, and eventually convinced myself that it was. And as I recall, it was quite tricky to get it to a hundred percent state, not 99.99. I could smell the idea in there, but I couldn't find its actual realization for some time. After that, the generalization of its application to a general file system is pretty obvious."

Still, the idea stayed with him until he realized it would be an interesting way to improve the performance of Linux systems.

Like the puzzle with square pieces sliding around a single missing square, only scant disk resources are used to accomplish the extra data's movement because the information is moved incrementally -- in blocks rather than all at once. That means, says Phillips, that "It really adds very little [disk] overhead. Something on the order of 1 percent."

Additionally, it has one more feature which may appeal to the fsck-hater in you: "Really, it's nearly a defragmenter already," Phillips says. "It would be trivial to add that functionality."

The dual advantages of lower overhead and -- most importantly -- a close relation to the ext2 file system should make it an easier transition for most users. Tux2 is actually built as a patch to the ext2 filesystem; standard ext2 filesystems are converted to Tux2 at mount time. According to Phillips, that conversion takes on the order of a tenth of a second per gigabyte on a typical system.

Fly In The Ointment

Though Phillips downplays their significance, patent difficulties may lie ahead for Tux2 as well. Network Appliance applied for a patent in the early 90s which covers similar ground -- a few years after Phillips had implemented it in his database.

"What really steams me in this is that their [patent] application came three years after my invention," says Phillips. "I hate to use the word infringe, because that makes me sound like the bad guy -- but it seems as though my [method] doesn't infringe because it uses a different algorithm. In fact," he says, "I've got two things: I've got prior art, and I've got a better algorithm ... We can fence them in [legally], so their best strategy is to be nice, but they haven't figured that out yet."

"I don't want to suggest that NetApp got the idea from me -- I don't think they did, I think they developed it independently. The only little problem is the chronology of it. I conceived the whole thing, essentially everything that they've written in their patent, so I was kind of upset when I saw it. I would have gone on to do it on a Unix file system at the time, if I'd only had one available. We know it's stupid, but you see people patenting things all the time on the web -- just because it is a business idea that is now being done on the web." The approach that Phillips has to the dispute is to simply keep working. "I don't want it to become a distraction, I just keep doing what I'm doing."

Do penguins have calendars?

Phillips says that Hans Reiser has approached him regarding integrating the file protection capabilities of Tux2 with the additional features of ReiserFS. "But it's pretty obvious where the priority has to be," he says, noting that ext2 is the default file system, and isn't going away any time soon. "Ext2 is what everyone has by default, and that's too big to ignore."

Does Phillips anticipate Tux2 becoming the default file system in Linux systems? "Well, who knows what's going to happen?" he laughs. "It could. But you can be sure of one thing, Tux2 will live a fairly long life as an independent patch that people apply, and I will be the first to apply it. But sure, of course I'd like that."

With a caution that fits someone whose last job was in embedded controls, Phillips warns against putting Tux2 in too soon: "It has to be proven, it has to be 100 percent. Because that's the whole point of this, is to be 100 percent. So I think any bug which is not an ext2 bug already is just not acceptable."

And ultimately, like any other possible low-level change, "It's up to his high penguiness." Besides which, "it's quite clear what the next Linux filesystem standard is going to be. Well, it's my opinion that ext3 is going to be the most popular standard linux filesystem next year. And a couple years after that, well, I certainly will be using tux2 all the time, and we'll see where it goes."

The current status is heavy development: "I want to give it as a Christmas present to myself and start using it in my root system for my own development," says Phillips, "as soon as I port it to [the 2.4 kernel]." Soon after that, the code will be released to the developers on the Tux2 mailing list which Phillips has been assembling, who will work to make a public release in the months that follow, a process which Phillips says will likely take six months to a year.

"There is a prototype for kernel 2.2.13. I'm not going to release it -- I have my reasons for that, and the main reason is that the amount of cleanup to make it presentable to the public is roughly the same as the amount of work I have to do to bring it to [a newer kernel]. Probably if I'd done nothing else but worked on it for a couple of months, I'd be using it now, but I've done a few other things [in those months], like change from an industrial control systems job where they wanted me to do the next version of the control system in Windows NT to a nice linux job where I can hack the kernel."

Does this have anyone else itching for 2.5?

252 comments

  1. Ordered writes possible with normal HDs? by Anonymous Coward · · Score: 1

    It seems to me that the crucial point about the strategy is not so much using "atomic operations", but to a much larger extent the well-ordered execution of those operations.

    How, in general, can the ordering of the atomic processes (block writes to the HD) be guaranteed? The caching strategy of most HDs is probably completely invisible to the OS. The only possibilities I see right now would be to disable caching on writes, or to flush the drive's cache between operations that need to remain ordered. Wouldn't this imply a significant performance penalty? Or do IDE/SCSI HDs actually provide a good mechanism for ordering write operations (other than disabling write-back cache)?

    I'm aware that journaled filesystems have to cope with the same problem. I'm just wondering.

  2. READ THE ARTICLE by Anonymous Coward · · Score: 1

    It's good "first Post" material you have there, but if you had READ THE ARTICLE, you would notice that this filesystem is not a journaling filesystem, and therefore is NOT comparable to BFS, obviously. Particularly, it has the potential to be rather faster.

    "Unfair" to the idiots who modded you up.

    Thank you for your time.

    1. Re:READ THE ARTICLE by Rand+Race · · Score: 1
      If you would read the post, you would notice that he was referring to the old "database-like" Be file system not the current journaled BeFS.

      --
      Insanity is the last line of defence for the master diplomat. But you have to lay the groundwork early.
    2. Re:READ THE ARTICLE by ranessin · · Score: 1


      When discussing filesystem performance, I think discussion of any modern filesystem, and the features it contains, is worthy.

      "Unfair" to the idiot(s) that moderated your parent post as "Offtopic"

      Ranessin

  3. Routine FUD by Cardinal · · Score: 1

    They'll say that it's unproven and/or untested, too new or too young to demonstrate that it's actually a reliable filesystem. Assuming it does prove itself, of course. If there are problems with it, they'll either point out and exaggerate those problems, or simply ignore Tux2 altogether.

    1. Re:Routine FUD by cyber-vandal · · Score: 2

      Wouldn't that sort of contradict the 'Linux is based on 30-year old technology' argument though?

  4. Re:Pulling the plug by Alex · · Score: 1
    Actually I did the pulling the plug test on an E250 yesterday with logging enabled and the box booted as if nothing had happened. It boots quite a bit faster as well.

  5. Re:Is this filesystem immune to the "rhnsd factor" by Phroggy · · Score: 1
    um, you do know that you can just use a swap file, instead of a partition? It's a bit less efficient, but works in a pinch, and obviously swap files can be created and deleted at any time. You can have multiple swap files active, so if your 400MB swap partition isn't enough, just make a 600MB swap file, run swapon, and you've got that gig of swap you needed.

    A new spiffy filesystem can't change the fact that your hard drive is partitioned. Within each partition lives a filesystem. You can't just simply change partition sizes on the fly. I don't know how Partition Magic works, and half the people I've heard from who've used it said it wiped their hard drive.

    --
    $x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
    $x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
  6. Re:Nice algo, but.. by spacey · · Score: 1

    Most journals are written asynchronously for performance reasons. The performance of doing otherwise is considered unacceptable.

    So what makes you feel that a journal necessarily has any more current and consistent information than a tree that's 2 phases out at the point in time of a disaster?

    -Peter

    --
    == Just my opinion(s)
  7. Re:Pulling the plug by spacey · · Score: 1

    I believe that the speedup is the result of gathering writes into groups. The more data you can throw on the disk in sequence, the better off you are in terms of speed. This is one of the ways that write-back caches speed up your life. All writes can be arranged so that data is written in a linear fashion to the disk.

    Using a data and metadata log can make some operations faster, but you do suffer with sun's ufs+logging the problem of having to write to the log, then copy the log to the filesystem. This can be an issue in an environment where the filesystem is used for a lot of transactions.

    IMO I believe that tux2 can do better in situations like this because of the elimination of an unneeded copy.

    Also note that sun's ufs filesystem has long been horribly, horribly slow relative to its cousin, the bsd FFS. I'm going to guess that when adding the logging feature, sun's developers decided it could be seen as an opportunity to add other speed enhancements besides just logging.

    -Peter

    --
    == Just my opinion(s)
  8. Even more surprising omission! by Enahs · · Score: 1

    Daniel Phillips states that it's possible to pull the plug on such a machine, and not spend several minutes waiting for fsck to fix everything. Also surprising is the fact that it's intended as a replacement for ext2 (and the upcoming ext3).

    --
    Stating on Slashdot that I like cheese since 1997.
  9. Re:MIME types... by ReinoutS · · Score: 1

    ...and OS/2's HPFS has been using Extended Attributes for about, say, 8 years now. It can be (and is) used for exactly the kind of thing you're proposing.

    Oh, and HPFS fragmentation is negligable (sp?).

    By the way, did anyone hear of HPUFS? I hear the FreeBSD guys are working on it, seems to be an interesting project too.

  10. Re:One evil due to the Linux infrastructure. by iabervon · · Score: 1

    One reason to use a swap partition is so that your disk usage doesn't vary widely, depending on memory usage. Also, if your file system tends to get corrupted if the OS crashes while writing to it, it's a bad idea to have the OS write to a file system whenever memory is low. It's somewhat safer if you crash while only writing to a different partition.

    Actually, using anything on the same disk is a generally poor idea. When you're low on memory, it's generally because something is putting a lot of stuff there. Where does this stuff come from? Generally, disk. So you're pulling stuff in from disk and writing other stuff out to disk. This creates a major performance bottleneck. Whenever possible, I use a swap disk (or even disk chain, if there's one not in use), which is clearly going to be a fixed size.

  11. Re:can user processes schedule phase transitions? by Urmane · · Score: 1

    sync

    --
    "I find your lack of faith disturbing." -- Darth Vader
  12. Re:MIME types... by HBK-4G · · Score: 1

    BeOS's filesystem uses an extension of MIME to differentiate file types. Very nicely done.

  13. Re:Version control system by Gus · · Score: 1
    This was done in several pre-Unix operating systems, and a few post-Unix systems; however, it went by the wayside. If I recall correctly, the omission of this particular feature earned its own chapter in the UNIX Hater's Handbook. Of course, there's no reason why this feature couldn't be added to a source-available Unix, but no one seems to have yet.

    --
    --Gus
  14. Re:Is this filesystem immune to the "rhnsd factor" by Chang · · Score: 1

    I used Server Magic for the first time the other day and had a bad experience.

    I resized a 34GB NTFS partition, adding another 7GB or so.

    You have to reboot, so I did and when the system came up it started the resize process. It got to 54% complete almost immediately, then sat there for about 50 minutes doing nothing as far as I could tell (almost no disk activity). Then finally, it rocketed up to 94% and then failed with a "too many clusters" error?

    So I rebooted thinking that I was going to have to restore the damn thing. It came up ok, but changed the drive letter.

    I've learned my lesson. I should have known better than to trust the crapshoot known as 3rd party tools on Windows.

  15. Re:Version control system by planet_hoth · · Score: 1

    VMS systems have something similar. (I don't know what their filesystem is called, though.) It was a nice feature, but kinda disk space intensive, obviously.

  16. Re:WOW by RelliK · · Score: 1
    I haven't noticed a performance hit but we aren't a big company and our servers are not taxed too terribly much.

    Gee, wonder why. Might it have something to do with the fact that Reiserfs is faster than ext2? Much faster in fact.
    ___

    --
    ___
    If you think big enough, you'll never have to do it.
  17. Re:I am not wrong. You didn't read close enough. by RelliK · · Score: 1
    Also, wouldn't multiple swap partitions seriously fragment the memory, as well as causing intermittent instances of downtime just to copy the contents while the partitions are being made

    uhhm, what? Where the hell did you get that idea? If you create an additional swap partition or swap file, it just gets added to your total swap space. And if you create the swap partitions on different disks, this will even increase performance. Finally, the whole reason for putting swap on a separate partition as opposed to a file is precisely to avoid fragmentation (well, there's also no fs overhead).
    ___

    --
    ___
    If you think big enough, you'll never have to do it.
  18. Re:What is wrong with Reiserfs? by RelliK · · Score: 1

    Well, by that I meant the only one available in non-alpha/beta stage. None of the fs's you mentioned have been released in stable version yet.
    ___

    --
    ___
    If you think big enough, you'll never have to do it.
  19. Re:What is wrong with Reiserfs? by RelliK · · Score: 1

    Yes, but bottom line is they *are* faster. Significantly faster in fact. Anyhow, my main point was: why not concentrate on including Reiserfs in 2.4 since it is already available and works great? If SourceForge runs it, then it sure is stable enough for my workstation.

    ___

    --
    ___
    If you think big enough, you'll never have to do it.
  20. Re:What is wrong with Reiserfs? by RelliK · · Score: 1

    Then how come Reiserfs is so much faster than ext2? And I seem to recall that xfs and jfs also tout their speed.
    ___

    --
    ___
    If you think big enough, you'll never have to do it.
  21. Troll alert! by mortonda · · Score: 1
    As with most good trolls, it's hard to discern the difference between a good troll and a misinformed post. So here's a play by play analysis:

    Remember the problem in RedHat 7 where rhnsd would chew up all the file descriptors in a process of three weeks? I'm hoping that a fiasco like that never happens again. It's sad to see such a good distro company make such a stupid mistake like that and only have a pathetic excuse for it. I think that the problem with filesystem limits is in how they are always surpassed too quickly. Remember FAT16? Its first limit was 33 MB. With DOS 5.0, it became 2.1 GB. Then came FAT32, which has no absolute limit; however, Windows 2000 refuses to format a partition above 32 GB in FAT32 because of the greater efficiency of NTFS.

    This paragraph is way off-topic, and completely unrelated to Tux2. The only link I can see is the mistaken notion that file descriptors are related to a filesystem. Well, they're not. They exist in memory, as a part of a different subsystem. Apart from this, this post looks like a typical troll - inflammatory remark, bound to create some discussion. This paragraph makes this look more like a troll than a misinformed remark.

    One bone I have to pick with ext2 is how the swap partition cannot be adjusted on the fly.

    Swap partitions don't use ext2, to the best of my knowledge. The whole point of a swap partition is to eliminate the overhead of a filesystem. And changing partition sizes on the fly is just mad.

    My Win2000 machine can adjust the swap file pretty well (with 7 windows of Quake 3, I forced the swap file size from 400 MB to almost 1 GB [!]).
    Ouch! why in the heck did you do that? At any rate, this is still OT.

    Will Tux2 have a dynamic swap partition?

    No. Filesystems live inside partitions - as such tux2 has nothing to do with swap partitions.

    After all, it's in the damndest situations where you realize that you made the swap partition too small.

    Well maybe, but the simple answer to that is, stop running 7 copies of Quake 3! I don't think that every computer system needs to be designed to handle every situation - I certainly don't need a car that has been designed to have a nuke dropped on it, and still function.

  22. Re:WOW by IRNI · · Score: 1

    Well mission critical is debatable.. we host websites for about 20 companies and my whole need for getting the server back up quickly is so I don't hear the phone ring off the hook with "Why is my website down?!". It's a large hard drive and yeah FSCK took way too long when it was on the old box.

    I didn't recompile a kernel specifically for reiserFS I just made a new box, installed mandrake 7 with reiser and migrated all the websites to the new box.

    We have a tape backup of everything just in case. But its quite nice to have Reiser. And as one guy pointed out apparently it is supposed to be faster. But again I wouldn't know because I don't have any way to push it to test. And another guy mentioned hardware level preventative measures to stop morons from touching my machines. That will come in handy... if i can get in there on a sunday morning at like 3 am when none of our clients are watching :)

    I also turned off the captured ctrl-alt-del because we used to have an NT box and my boss would pick up the wrong keyboard to log in and reboot my server. Damn NT.... anyway KVMs and separation helped.

  23. Re:Version control system by John+Whitley · · Score: 1
    Yes, the Bell Labs folks created something like that for Plan9. It's not a filesystem, however, but rather the main fileserver. It's basically a WORM jukebox with a RAID caching the WORM and a bunch of RAM caching the RAID -- and the whole thing essentially just implements the 9P filesystem protocol.

    On a daily basis, the RAID cache is synched back to WORM. At any time, a user of the system can just cd to the fileserver's state for that day and inspect those files. E.g.

    cd /history/1997/October/13/...

    When they had the WORM (hmm.. might have been MO, don't have the paper handy) juke upgraded, the team noted that the capacity of the jukebox was expanding faster than they were using it, even with the system state snapshots. 8-)

  24. Re:TYPE & CREATOR CODES by gb · · Score: 1

    Even ignoring the Desktop Database, if it worked how you say, it may not be a separate database of filetypes, but it is still a database - just that it is the entire filesystem. I think it unlikely that a scan of an entire filesystem tree is going to be faster than a registry type system (NB. not that I'm advocating a Win32 style binary registry file - isn't my idea of a robust system !).

  25. Re:TYPE & CREATOR CODES by gb · · Score: 1

    Erm, this is exactly what the Mac does.

    I fail to see how it helps since you replace one three letter 'type of file' descriptor to be looked up in a database with two four letter ones which also have to be looked up in some form of database. Storing mime types explicitly might be marginally more useful, but in any case you'd have to change all user-space apps to understand the extra features of the filesystem. Good featuritis for filesystems is at best transparent to existing user-land apps and at worst optional.
    On a related note, would there be any sense in providing the filename extension->application mapping via a virtual filespace ? I know that both KDE and GNOME must do this, but a kernel level solution that allows you to 'open' an entry for each file/mime type and get paths to applications that can handle the file might be neat. (Not that I'm volunteering to code it :-) )

  26. These concepts are hardly unique by hpp · · Score: 1

    The techniques described (Phase Trees) were tried in databases in the 1970s (see the survey article by Verhofstadt in ACM Computing Survey, volume 10, issue 2, 1978). Databases have moved away from them because they either do not offer undo (roll-back) support, or do so at the cost of efficiency.

    One of the big innovations in databases of the late 1970s is that a transaction log can actually reduce I/O costs because you can (a) cluster unrelated writes and (b) avoid seeks - because all updates are written to one log, and the main data is updated lazily.

    If you don't want transactions and roll-back, the "careful replacement" strategies are definitely the way to go.

  27. Re:WOW by paulbort · · Score: 1

    If you can't have physical security at the room level, you should be able to set it up at the box level. ATX power supplies can usually be wired for always on, and AT's that don't have the switch built into the power supply can be fixed with a little bit of solder. (If they have the spade connectors on the back of the switch, you can just push the females together behind the switch, and you are once again idiot-resistant.)
    (Don't forget the reset switch. I like to wire it to the keylock, so I can still use it if I have to.)

    --
    -- Spring: Forces, coiled again!
  28. Re:Version control system by T-Ranger · · Score: 1
    What if someone implemented a FS that combined versioning + data migration to near line, or even off line storage?

    Sure, drives are cheap, but tape is cheaper..

  29. Re:Don't forget the cache by Omnifarious · · Score: 1

    This is a good idea. I remember thinking of it a few months back. I should've sent some e-mail to the reiserfs development list. I had just assumed that it was obvious.

    The idea of detecting a power failure and automagically dumping to flash RAM is also excellent, and an even better idea. :-)

  30. Re:Sounds great, but BFS... by Lx · · Score: 1

    True enough. Have to chime in with my BeOS elitism and say we've been pulling the plugs on our machines for years with no ill effects. Be does this at Comdex and similar shows during demos, and of course then they get to show off it booting in under 10 seconds. As far as it not being a database, it's probably a good thing. From what I gather, they moved away from that for reasons of performance.

    To be fair, there are other filesystems that can beat BFS in raw performance, but BFS performance on its own is pretty damn good, and the additional features are a nifty bonus.

    -lx

  31. Re:Volume size limitations? by ShinGouki · · Score: 1

    actually, it means "second extended" (hence the 2) :)

    just take a look in filesystems when doing a kernel config, you'll see it listed as "second extended filesystem" support


    -dk

    --
    -dk
    Dream with the feathers of angels stuffed beneath your head.
  32. Re:What is wrong with Reiserfs? by ShinGouki · · Score: 1

    reread the article man, 80% of the focus was on the speed hit you take with a jfs that would be alleviated by this phase tree method


    -dk

    --
    -dk
    Dream with the feathers of angels stuffed beneath your head.
  33. KT by Poe · · Score: 1
    --
    Thank you for not thinking.
  34. Re:Is this filesystem immune to the "rhnsd factor" by ArsonSmith · · Score: 1

    swap partitions are raw partitions in linux, giving the kernel the added advantage of not having to go through the file system code in order to write to swap.

    swap files on the other hand must be kept as a regular file somewhere on the system which means that every time something has to be swapped in it must go through both the swap code and the file system code in order to be completed.

    this may not seem like much but in swap speed is EVERYTHING

    ArsonSmith

    --
    Paying taxes to buy civilization is like paying a hooker to buy love.
  35. Re:Not Comfortable... by sith · · Score: 1

    Well, it's not that you're not running an fsck with a journalling system (dunno about with tux2, not sure how he does that). What happens is the journal keeps track of all the files the system is using at a given time. When the power drops, or the system spontaneously reboots, the system only has to check the integrity of the files you were using. The only downside is the performance hit you take by having the drive continually writing the files the system is using. But that is a fairly small hit, easily made up for the first time you get to skip the fsck on a 160gb raid array or something..

  36. Re:Is this filesystem immune to the "rhnsd factor" by E_D · · Score: 1

    You are confusing issues...

    In addition, Windows 2000 uses a 'lite' version of Veritas' Volume Manager, which (IMHO) is an excellent product that allows for all sorts of otherwise unheard of file system adjustments.

  37. Re:Version control system by meridian · · Score: 1

    sco also has versioning built into htfs. im not recommending sco as a solution tho :)

    --
    meridian at tha.net
  38. max file sizes by meridian · · Score: 1

    is there any support in tux2 for file sizes over 2gig? i have tried using lfs with 2.2.16/17 which allowed me to create a >2gig file except many utils (eg from util-linux) will still not work with files of this size.

    --
    meridian at tha.net
  39. Re:*BSD SoftUpdates provide crash resistance NOW by Chris+Pimlott · · Score: 1

    Thank you for including the irrelevant compile time comparison.

  40. Re:*BSD SoftUpdates provide crash resistance NOW by Pinc · · Score: 1

    Yes, it should, though at least with NetBSD's implementation (called "SOFTDEP"), it is still advised to do a fsck at bootup (and you still get the "file system not clean, please fsck" message when mounting a not-previously-umounted fs).

    But NetBSD has the Log-structured Filesystem (LFS), which is old, but active development has been resumed for quite some time and it will be quite mature and fully functional and stable in the upcoming 1.5 release.

    It also goes beyond journalling, in that it ensures both metadata and data consistency at any time, thus you can also pull the plug during write operations and re-mount it at bootup without fsck.

    It also handles file creation and deletion very efficiently, comparable to asynchronous filesystems, and possibly beaten only by ReiserFS.

    Compare this to the ext2fs filesystem, which, being mounted asynchronously in its standard use, cannot even guarantee metadata consistency, resulting in a possible large number of files in /lost+found after a crash/power loss.

    --
    Bernd
  41. Re:Growable swap by Elladan · · Score: 1

    You can make a swap file in linux, and link it in on the fly. Eg:

    $ dd if=/dev/zero of=/root/swap.100M bs=1k count=100000
    $ mkswap /root/swap.100M
    $ sync # Yes, you have to sync
    $ swapon /root/swap.100M

    So, if you were running low on swap, you could make some more. I recall in times of yore, someone went and wrote a daemon to do this for you when you ran out of memory (though not, presumably, demanding it out from the kernel as necessary, like windows does).

    You can also do wild and crazy things with swap, like say running them over the network on a network block device, etc...

  42. Re:That command would be 'purge' I believe... by DGolden · · Score: 1

    I think you're missing the point. My post was talking about a rudimentary versioning system, as well as a trashcan. I don't use a script like that either, but I'm a lot more experienced than, say, my little brother.
    An OPTIONAL trashcan would be a conceptual crutch that might make the novice a little more comfortable, while not getting in the way of real work. Combined with a version history, it could be quite useful.
    The other implicit point which you didn't pick up on was that it'd be "the unix way" of happy little component tools chained together in user space to achieve the same end as some grandiose plan for a new filesystem.

    --
    Choice of masters is not freedom.
  43. Re:Just at the command prompt by DGolden · · Score: 1

    I don't think intercepting the delete calls is a good idea. A new pseudo-delete call with similar semantics would be better, so that programs can be updated to use the trashcan/versioning facility if they so wish.

    --
    Choice of masters is not freedom.
  44. Re:TYPE & CREATOR CODES by Tam-Lin · · Score: 1

    Sounds nice, but what if, after creating the file, I don't want to open it up with whatever application created it to start with?

    --

    Silly signature limit . . .
  45. Re:Tux2 and patent issues by AT · · Score: 1

    Maybe you should read the whole article before posting. Hint: look for the "Fly in the Ointment" heading.

  46. I'd prefer extended file attributes alla SGI's XFS by bofh23 · · Score: 1
    I think the ability to store metadata associated with a file is useful, but I'd prefer not to enforce the available attributes and instead would like a system similar to that provided by SGI's XFS filesystem. In addition to the normal Un*x attributes you can supply arbitrary name=value attributes. Unfortunately, NFS 2 and 3 don't make those extended attributes available although I believe the designers of NFS4 will be considering this issue. see also:
    attr(1) - manipulate Extended Attributes on filesystem objects

    rfc2624 - NFS Version 4 Design Considerations
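
    As a rough illustration of arbitrary name=value attributes, here is a minimal sketch using Python's os.setxattr and os.getxattr, which wrap the Linux extended attribute calls. The attribute name "mime_type" and the helper names are made up for the example, and the code simply reports failure on platforms or filesystems that don't accept user attributes.

```python
import os
import tempfile

def tag_file(path, name, value):
    """Attach a name=value pair to a file; return False if unsupported."""
    if not hasattr(os, "setxattr"):
        return False  # platform without the Linux xattr calls
    try:
        # Unprivileged attributes live in the "user." namespace on Linux.
        os.setxattr(path, "user." + name, value.encode("utf-8"))
    except OSError:
        return False  # this filesystem refuses user attributes
    return True

def read_tag(path, name):
    """Read back a value stored by tag_file."""
    return os.getxattr(path, "user." + name).decode("utf-8")

def demo():
    # Tag a scratch file and read the tag back, or None if unsupported.
    with tempfile.NamedTemporaryFile() as handle:
        if tag_file(handle.name, "mime_type", "image/tiff"):
            return read_tag(handle.name, "mime_type")
        return None
```

    The value travels with the file on the local filesystem, but as noted above it would not survive a trip over NFS 2 or 3.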

  47. Re:Version control system by Atomizer · · Score: 1

    VMS also lets you set how many versions you want. You can have none, 10 or whatever. I used to use this when I used RS/1 on VMS. All my critical tables with source code had a 100 versions. If I ever wanted to go back you could just rename the old versions to a newer version. You also had to use weird looking commands like DEL FILE.*;* to delete all the files.

    Windows has a program called GoBack that lets you set aside a percentage of your drive and you can recover any files, or the whole system from a previous time. So you could install a bad video driver that wrecks Windows, and not have to re-install. Just tell GoBack to revert to yesterday or something and you have your system back to that time. I think WinnyME has a system like GoBack, but more limited.

  48. Re:Version control system by tnegres · · Score: 1

    Many of the high-end "Appliance" type boxen have this feature. Usually called snapshots. You can set it up so that it keeps hourly, weekly, monthly snapshots of filesystems.
    Then all you have to do is go to /some/lame/dir/.snapshot/weekly.1 and look at last weeks snapshot of /some/lame/dir

    Not sure how it does it, but it has sure saved my butt on more than one occasion.

    -d

  49. Re:can user processes schedule phase transitions? by Polo · · Score: 1

    Would this also help when copying large quantities of files to an empty file system?

    Doesn't it take longer to write directory information than file data? If you have a power failure (unlikely), you could just start over.

  50. Re:Version control system by tongue · · Score: 1

    As several other posters have noted, VMS has/had this feature, which incidentally was the only one I liked during the brief period in which i had to develop code on VMS (COBOL, too---talk about the worst of both worlds!).

    Perhaps someone with some experience in the filesystem development area could comment on this for me:
    as someone else noted, VMS's implementation of versioning saves the full version of each file. How feasible would it be to be able to mark certain files as 'versioned' (with non-versioned files ignored in this scheme) and upon a save of a versioned file, the fs would perform some sort of a diff on it and save the patch? Obviously you wouldn't want that on all files, because i would think that a diff cycle on each and every file of a heavily used system would eat some serious cpu time. but for important ones: devel code, system files, etc., i could see that as a usable feature. How practical would something like that be?

  51. ADFS by Red+Moose · · Score: 1

    I say forget trying to make new filesystems, simply get hacking the Acorn Disk Filing System, which whups this Tux2 thing. I mean, what kind of idiot would test the filesystem by pulling out the power plug? Why not test how it runs over a bad hard-disk, with tonnes of bad sectors and 11-byte files. Then tell me it's good. None of this crap.

    --

    Acting stupid isn't much fun when there's someone around who knows better

    1. Re:ADFS by Abcd1234 · · Score: 1

      But the point of a good file system is to avoid getting into a state with this kind of inconsistency in the first place. Granted, you want to be able to recover gracefully when there are problems, but the best situation is where, even in a sudden power-down or other strange situation, the filesystem can survive relatively unscathed.

  52. Re:TYPE & CREATOR CODES by Another+MacHack · · Score: 1
    > actually, in the mac implementation, there IS NO EXTERNAL DATABASE.

    The Desktop Database is the external database, and if it gets corrupted, then applications lose their icons and documents forget which applications they should open under. Unlike the registry, it can be rebuilt with no loss of data by scanning every application on the drive (or at least those with BNDL resources)

  53. Re:(OT)Directory permissions by Another+MacHack · · Score: 1

    He's talking about a per-directory umask, not a periodic permissions update. The periodic chmod -R is only safe if it adds permissions rather than taking them away (i.e. your super-secret file had better start out as 600, not 644 until the script happens to run). Plus, you have to run a script every so often, which is wasteful since most of the time it won't have anything to do.
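
    The "start out as 600" point can be shown from the application side. This is only a sketch of creating a file with restrictive permissions from the first instant, not the per-directory umask being asked for; create_private and demo are illustrative names.

```python
import os
import stat
import tempfile

def create_private(path, data):
    """Create a new file that is never readable by group or others."""
    # The mode is applied at creation time, so there is no window in
    # which the file exists with looser permissions.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)

def demo():
    # Returns the permission bits of a freshly created private file.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "secret")
        create_private(path, b"super-secret")
        return stat.S_IMODE(os.stat(path).st_mode)
```

    A later chmod -R can then only ever widen access, which is the safe direction.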

    Personally I like Netware's system; inherited rights filters kick ass.

  54. Re:Version control system by thrig · · Score: 1

    Snapshots are a spiffy technology, but aren't the same as version control.

    Version control would establish a history of revisions to a file, with logs and the ability to patch between various revisions, and so forth.

    NetApp snapshots, on the other hand, require a section of the disk to be dedicated to storing recently changed files, and occur manually at the schedule set by the administrator-- two changes to a file inside the minimum snapshot period, and only the later change will get recorded.

    However, you can get the snapshot period down quite low (e.g. every 10 minutes) so as to approach the immediate "snapshotting" that a real version control system would allow.

  55. Thanks! by timothy · · Score: 1

    my fault, fixed shortly ago, but (dog ate my homework / the system is slow);)

    I think I also went through and "corrected" all of my double-els to single els before realizing that it was the other way around.

    Appreciate it --

    timothy

    --
    jrnl: http://tinyurl.com/c2l8yr / foes: http://tinyurl.com/ckjno5
  56. Re:Version control system by ttfkam · · Score: 1

    As long as you didn't try to version your log files (sheesh! wouldn't THAT get ugly) and were to focus all attentions on user and config data (/home and /etc), I would assume that there were innumerable benefits to this even with a speed decrease.

    I guess the sticking point would be that copies of the complete files should never be kept; only the difference deltas are important. The most recent file is the "real" file and all previous revisions are converted to diff outputs. For fundamentally different changes, this would be of no benefit. However in the real world, most config files, source files, etc. are usually minor changes which would occupy only a small fraction of the original file size. Slap on a max-revision flag or LRU algorithm when disk space gets tight.
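
    A minimal sketch of that scheme, assuming an in-memory model rather than a real filesystem: the newest revision is kept whole, each older one is stored as a delta against its successor, and max_revisions stands in for the max-revision flag. The class and function names are invented for the example; the deltas are built with difflib from the Python standard library.

```python
import difflib

def make_delta(new_lines, old_lines):
    """Describe how to rebuild old_lines from new_lines."""
    matcher = difflib.SequenceMatcher(a=new_lines, b=old_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))  # reuse lines of the new version
        elif tag in ("replace", "insert"):
            delta.append(("insert", old_lines[j1:j2]))  # store old-only lines
        # "delete": lines that exist only in the new version, nothing to keep
    return delta

def apply_delta(new_lines, delta):
    """Rebuild the older revision from the newer one plus its delta."""
    old = []
    for op in delta:
        if op[0] == "copy":
            old.extend(new_lines[op[1]:op[2]])
        else:
            old.extend(op[1])
    return old

class VersionedFile:
    def __init__(self, lines, max_revisions=10):
        self.current = list(lines)  # the "real" file, stored in full
        self.deltas = []            # newest first, each steps one revision back
        self.max_revisions = max_revisions

    def save(self, lines):
        lines = list(lines)
        self.deltas.insert(0, make_delta(lines, self.current))
        del self.deltas[self.max_revisions:]  # drop the oldest history
        self.current = lines

    def revision(self, steps_back):
        """Return the content as it was steps_back saves ago."""
        lines = self.current
        for delta in self.deltas[:steps_back]:
            lines = apply_delta(lines, delta)
        return lines
```

    For a config file where one line changes per save, each delta holds little more than that one line, which is where the space saving would come from.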

    Who knows? Right now this might be a little too slow. Then again, people are looking at 3D user interfaces for the personal computer. Fifteen years ago this would have been laughed out of consideration let alone actually attempted. Now with the advent of Voodoo, TNT, GeForce, et al chipsets this isn't such a crazy idea anymore.

    An idea is only a bad idea while it hasn't produced anything useful. As soon as something useful emerges, it ceases to be a bad idea.

    Just a thought...

    --

    - I don't need to go outside, my CRT tan'll do me just fine.
  57. Re:Version control system by journey- · · Score: 1

    Well, similar but not quite the same ... VMS has versioned files which basically mean when you save a file, it gets to be one version higher than the last time you saved it. If you request a file without specifying the version you get the newest one. The downside is it keeps the entirety of each file i think.

    Dont quote me on this though, just remembering some of the oddities i found when i received a VAX-4100, and tried to learn some stuff about VMS.

  58. Not quite right by Craig+Davison · · Score: 1

    The 32 MB limit was the FAT12 file system, which survives today on MSDOS and VFAT format floppies.

    [OT] IIRC, the minimum size of a FAT32 volume is 512 MB. The limit of FAT32X (FAT32 with support for > 8 GB partitions) is 1 TB.

    Also, for the trivia buffs, NT (or is it OS/2?) can create and use 4 GB FAT16 volumes. The cluster size is pushed to 64 K.

  59. Wrong: Desktop Database by tmoertel · · Score: 1

    Wrong. MacOS HFS and HFS+ maintain per-volume "desktop databases" that contain (among other things) a mapping of creators to their respective applications. Hold down Command and Option keys as a volume is mounted, and the Finder will give you a chance to rebuild its desktop database.

    See:
    http://developer.apple.com/techpubs/mac/MoreToolbox/MoreToolbox-483.html

  60. Re:TYPE & CREATOR CODES by johnrpenner · · Score: 1


    > The problem is, how do you preserve this information when you
    > copy the file? 'cp' doesn't understand it. Existing programs
    > won't understand it. HTTP doesn't understand it. FTP doesn't.
    > NFS doesn't, etc, etc...
    >
    > You'd have to rewrite all existing software to become aware
    > of this extended information. So far, no one has taken up
    > this task.

    why are you saying 'it hasn't been done before,
    so we can't do it now'?

    software can always be improved. you've got to start somewhere.
    this is an excellent opportunity. this would be the basis that
    would need to exist FIRST so that it would become possible to
    write this into the 'cp' command sometime in the future.

    j.

  61. Re:TYPE & CREATOR CODES by johnrpenner · · Score: 1


    > I fail to see how it helps since you replace one three
    > letter 'type of file' descriptor to be looked up in a
    > database with two four letter ones which also have to be
    > looked up in some form of database.

    actually, in the mac implementation, there IS NO EXTERNAL DATABASE.
    it works like this: you have an image file - say: myImage.tif
    well, you can name it 'myImage' instead, and the creator and type
    info will be: TIFF/8BIM -- when you open the file (i.e. double-click),
    then the file system scans the directory tree for a file of
    type 'APPL' (i.e. an application) which has a corresponding
    TYPE resource (i.e. '8BIM') - then opens the file with that app.

    in other words:

    - the data file = 'myImage' - TIFF/APPL
    - the application = TYPE='APPL', and reads '8BIM'.

    therefore, no external database needed,
    AND - no need to open files and read the first few bytes
    in order to figure out what they are.

    results:
    - faster and more flexible file type determination over REGISTRY.
    - more reliable (because no external databases to get corrupted).
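
    The lookup described above can be modelled in a few lines. This is a toy sketch of the idea only, not how the Finder is actually implemented; the Entry class, find_opener and the application names are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    name: str
    file_type: str  # four-character type code, e.g. "TIFF" or "APPL"
    creator: str    # four-character creator code, e.g. "8BIM"

def find_opener(document, volume):
    """Return the application whose creator code matches the document's."""
    for entry in volume:
        if entry.file_type == "APPL" and entry.creator == document.creator:
            return entry
    return None  # no matching application on this volume

# Hypothetical volume contents for the example.
VOLUME = [
    Entry("ReadMe", "TEXT", "ttxt"),
    Entry("ImageEditor", "APPL", "8BIM"),
    Entry("TextEditor", "APPL", "ttxt"),
]
```

    With this model, an Entry("myImage", "TIFF", "8BIM") resolves to ImageEditor even though its name carries no extension.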

  62. Re:TYPE & CREATOR CODES by johnrpenner · · Score: 1


    WHOOPS! -- i mean:

    - the data file = 'myImage' - TIFF/8BIM
    - the application = TYPE='APPL', and reads '8BIM'

  63. Re:TYPE & CREATOR CODES by Egotistical+Rant · · Score: 1

    > Sounds nice, but what if, after creating the
    > file, I don't want to open it up with whatever
    > application created it to start with?

    Same way the Mac does it: rather than double-click, either drag-and-drop or explicitly open the file in the new application.

    I've always felt that file extensions were a contrivance, and the Registry concept only made this even more brain-damaged. I must admit, amidst all the other sloth that is MacOS, the type/creator scheme is among the system's human-centric gems.

  64. Re:can user processes schedule phase transitions? by eric17 · · Score: 1

    More interesting would be system calls to perform multiple-file transactions for any file system, but optimized for the case when the file system supports phase trees or similar ideas.

    Yes, it's good that individual files can't be corrupted, but sometimes it is important to keep multiple files in sync. This sort of corruption is just as annoying and looks the same to the average user - their app doesn't work right.

    -- Eric

  65. Re:Version control system by jefu · · Score: 1


    Back in the mists of prehistoric time I used a Univac 1100 system running some Univac OS whose name I forget for the moment. It had the (um) feature that in text files, the last five versions of the file were stored in the file itself with each line marked with its version. So you could easily move backward - up to five versions back.

    Of course, this did rather complicate reading in a text file a bit.

  66. Re:*BSD SoftUpdates provide crash resistance NOW by redelm · · Score: 1

    SoftUpdates is not a new filesystem. *BSD still uses the same FFS it always has. It's just a flag to modify OS behaviour.

    Simplified, SoftUpdates is nothing more than ordered writes: Data hits the disk before metadata is altered. And lower level metadata is written before higher level. So when powerfail happens, there may be some unattached inodes, but chains are not broken. They just have old data.
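
    The same "data first, then the metadata that points at it" ordering can be imitated from user space. This is only an application-level analogy, assuming POSIX-style fsync and rename semantics; it is not how SoftUpdates works inside the kernel, and ordered_replace is a name made up for the sketch.

```python
import os
import tempfile

def ordered_replace(path, data):
    """Replace a file so a crash leaves either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # 1. the data reaches the disk first
        os.replace(tmp_path, path)     # 2. only then is the name switched over
    except BaseException:
        # Clean up the scratch file if anything failed before the switch.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

    If the power fails between the two steps, the name still refers to the old, complete file, much like the "chains are not broken, they just have old data" behaviour described above.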

    Tux2 is a whole new "Phase Tree" filesystem. That means that both the FS proper and the tree algorithm have to be debugged in parallel. Fun!

  67. Re:*BSD SoftUpdates provide crash resistance NOW by redelm · · Score: 1


    Erm ... my wife has the UPS. She _hates_ it when MS-Windows95b crashes due to the rotten power around here [Houston] - 1-2 dips/month. Believe it or not, I can keep her box [sic] stable iff she'll let me near it for the preventative maintenance it requires :)

  68. Re:No, magic numbers are the way. by vectro · · Score: 1
    Take a look at /usr/share/magic sometime.

    From the 'file' manpage:
    The magic file entries have been collected from various sources, mainly USENET, and contributed by various authors. Christos Zoulas (address below) will collect additional or corrected magic file entries. A consolidation of magic file entries will be distributed periodically.
  69. Re:Version control system by ddmckay · · Score: 1

    VMS (Ick!) has this.

  70. Re:One evil due to the Linux infrastructure. by charon.de · · Score: 1


    Thx for cleaning up this thread...

    In case I recognize some swapping on the machines I admin, I just order some more RAM, shut down, plug it in, nice moment to get a new and fresh compiled kernel running and finally the problem is solved...

    Linux should never swap to disk and it will fly...:-)

    Michael

  71. Re:Is this filesystem immune to the "rhnsd factor" by pyros · · Score: 1

    I never used ServerMagic, never felt good about doing that kind of thing on a live server. But I've never been let down by PartitionMagic. I've used it to resize/move/copy/convert partitions plenty of times. The only time it doesn't work is when using the boot floppy that PM4 creates. It boots Caldera DR-DOS. No matter what operations I choose, when it tries to apply them, I'm told the BIOS and the Windows environment are reporting different drive geometries. It works fine running from windows, with all the bells and whistles, just not from DOS. I never tried the Linux tools it came with, I really like the GUI too much.

    --

  72. Re:Wrong: Desktop Database by toh · · Score: 1

    I know it has preferred applications, but I believe they're stored in an external database equivalent to the Windows file type registry - there's no space in the filesystem directory entry for "creators". That means you can't have a different creator for two files of the same type. Unless you mean

    By contrast, the Finder's desktop database just records where creator applications are (if they're present), and what they advertise as filetypes they can open (in their BNDL resources). But I'm not sure what you mean about getting out of sync, that's not something I've genuinely seen happen since System 7.5.x or so. In fact the creator-app tracking and alias tracking is damn near perfect for local volumes in 8-9 (network volumes are a different story, but that's not a filesystem issue).

    Thanks for the link.

    --
    -- Life is short. Forgive quickly. Kiss slowly. ~ Robert Doisneau
  73. Re:Version control system by toh · · Score: 1

    There are quite a few hierarchical storage management systems that do exactly this. Most of the ones you'd want to rely on are commercial and typically bundled with expensive hardware, but I think you'll find a couple of more modest open source projects at freshmeat or sourceforge.

    --
    -- Life is short. Forgive quickly. Kiss slowly. ~ Robert Doisneau
  74. Re:Wrong: Desktop Database by toh · · Score: 1

    Aha - that's what I was getting at, then. It sounds like the storage for the preferred app is actually in the filesystem, rather than in an external database. That's cool. Although it's possible they've done it with BFS indices within that particular filesystem (as mentioned before) instead of something like enduring unique creator types. Still, nice touch.

    --
    -- Life is short. Forgive quickly. Kiss slowly. ~ Robert Doisneau
  75. Re:MIME types... by runswithd6s · · Score: 1

    See also file(1), magic2mime(1), /etc/magic, /usr/share/misc/magic, mailcap(5), /etc/mailcap, ~/.mailcap, metamail(1)

    Don't take this the wrong way, but you sound like a desktop user who's not explored enough of a UNIX system from "console land" to know what tools are available for your disposal. Did you know that you can set up your own ~/.mailcap to specify what applications get run based on mimetypes? Did you know that you can figure out what type of file a given file is by using the file command?

    I should be easier on you. What you're talking about isn't entirely off base with some of the advancements available in newer file systems. It is not inconceivable to store a magic number in the first 32bits of a file entry on the ReiserFS system. Doing so would allow applications like file much quicker access to file type signatures without having to actually open the file.

    So your comment isn't too far off the mark with regards to the topic at hand, filesystems. Where your comment would have better application would be in reference to a unified approach to maintaining MIME-to-application lists. Just remember to look at the applications I suggested above.

    --
    assert(expired(knowledge)); /* core dump */
  76. Has anyone noticed... by zappe · · Score: 1

    The complete lack of code, zero references to published literature or hard documentation on this filesystem? How is a filesystem supposed to become king of an open source OS without any source?

    And I thought Microsoft was bad with hyping nonexistent features...

    ...Now with anti-gravity packaging!...

    And, by the way, journaled filesystems don't typically let you do a "begin; make install ; commit" sequence. The transactions are only for metadata updates. Otherwise accesses to the filesystem would have to be locked. Transaction isolation of that sort is usually relegated to the world of databases.

    Mike

    1. Re:Has anyone noticed... by zappe · · Score: 1

      Oh, you think you can stop my flame war with your simple "Hitler" message! Hahahaha! I am Mojo-Jojo! I will rule! There shall be only one flamer, and his name is Mojo-Jojo, the one and only flamer. Because Mojo-Jojo is my name, and I am Mojo-Jojo, and I will flame greatly on Slashdot!

    2. Re:Has anyone noticed... by zappe · · Score: 1

      Ehhh, the intelligence/credibility of this whole thread is questionable, to say the least, so I figure Mojo-Jojo is about as applicable as Tux2.

  77. Re:More wishes by psergiu · · Score: 1

    AUCH ! And the disk will grind and grind all day long slowing down everything.

    A similar thing is implemented in hardware in the HP AutoRaid 12H storage boxen. The data is kept in Raid5, the unused space is used to Raid1/0 the most accessed blocks. In periods with no disk activity it will "optimise" the data by moving the most accessed blocks in Raid1/0. This way you have a "fast" raid5 storage and a "space wise" raid 1/0 :)


    --

    --
    1% APY, No fees, Online Bank https://captl1.co/2uIErYq Don't let your $$$ sit in a no-interest acct.
  78. Re:Version control system by ericbusboom · · Score: 1

    Yes -- it has been done. The product is called ClearCase. The product has been bought and sold many times, and is now owned by Rational. www.clearcase.com will get you a link inside Rational's site.

    ClearCase works very well, but being infinitely flexible, it is also very complex and fairly slow. At my last company, misuse and misunderstanding led to several disasters that lost several weeks of work -- precisely the event that the product is supposed to prevent.

  79. Re:That command would be 'purge' I believe... by phutureboy · · Score: 1

    IIRC, the Delphi online service used to run under VMS on a honking big VAXCluster, as did the WELL (minus the honking big part).

    I'll say one thing about VMS... it was/is mega-reliable. I don't recall our 2-node cluster hiccuping even once in the course of 6 years.

    --

  80. Re:TYPE & CREATOR CODES by Stonehand · · Score: 1

    Use magic numbers, not extensions. Perhaps you've been wandering in the Windows world too much, but you must not have tested your ideas on a 'nix box. Try renaming, for instance, a JPEG file to have a .txt extension -- and xv handles it fine.

    Why?

    Because the first few bytes in the file conform to what is expected of a JPEG. Open one up -- and there's a header inside. It really DOES NOT CARE about the extension.

    And, this is much saner than altering the filesystem...
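
    A small sketch of that header check, for anyone who wants to try it. The signatures listed are the well-known leading bytes of JPEG, PNG and GIF files; sniff_type and demo are names made up here, and a real tool like file(1) consults a far larger magic database.

```python
import os
import tempfile

# Leading byte signatures for a few common image formats.
MAGIC_SIGNATURES = [
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]

def sniff_type(path):
    """Guess a file's type from its first bytes, ignoring its name."""
    with open(path, "rb") as handle:
        header = handle.read(16)
    for signature, kind in MAGIC_SIGNATURES:
        if header.startswith(signature):
            return kind
    return "unknown"

def demo():
    # A JPEG-looking header saved under a misleading .txt name.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "holiday.txt")
        with open(path, "wb") as handle:
            handle.write(b"\xff\xd8\xff\xe0" + b"\x00" * 32)
        return sniff_type(path)
```

    The extension never enters into it; only the bytes at the start of the file do.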

    --
    Only the dead have seen the end of war.
  81. Re:*BSD SoftUpdates provide crash resistance NOW by ptbrown · · Score: 1

    Very true, no one is about to deny that ext2 is a dinosaur.

    But how does soft-updates compare to Tux2? Which, believe it or not, happens to be what this article was about.

    --
    Any sufficiently advanced civilization is indistinguishable from Gods.
  82. Don't forget the cache by ptbrown · · Score: 1

    I don't think journaling, phase-trees, soft-updates, whatever are quite the panacea the hype seems to be making them out to be. Not that they aren't useful or shouldn't be pursued -- they are and should. But the ultimate problem we're dealing with is that disks are slow. So writes are cached, and that's what causes problems when there's an unexpected poweroff. And caching occurs at many levels: the disk drive, possibly the disk controller, the device driver, and the file system driver. So the thing to remember is even the writes made as part of journaling/phase-tree/soft-update/whatever are, at some point, cached. So the potential for data loss is still very much present, and I fear (well, not really, since I don't give a damn if your computer crashes while downloading porn) that this hype over FS technologies might be presenting a false sense of security.

    So in short, the problem with file system inconsistencies won't go away until we have storage devices fast enough to allow for immediate committal of writes. (But will we then have such incredibly faster RAM and bus speeds that even <1ms will be considered "slow"?)

    --
    Any sufficiently advanced civilization is indistinguishable from Gods.
    1. Re:Don't forget the cache by SDSFracture · · Score: 1

      This functionality has been available on higher-end RAID controllers for years (read Compaq Intel servers). You can have a power supply cook off, kill the system board, and if it doesn't take out the array controller, pull the dead system board and power supply, replace them, and (theoretically) have it finish the disk writes once the system powers back up. I believe this was the SmartArray product - it's been 6 years or so since someone tried to sell me one for a 50 seat novell network.

    2. Re:Don't forget the cache by dangermouse · · Score: 2

      Ooh.. I wish I had mod points, 'cause this is a pretty good idea. Suppose the system could then detect the power failure and write the contents of this RAMdisk to flash RAM? That'd make it semi-permanent, and it would matter little how long it took to get the machine back up... upon coming back, the system could check the flash RAM for updates and make them before continuing.

      Or something. I sure am making this up as I go, without knowing much at all. ;)

    3. Re:Don't forget the cache by Guy+Harris · · Score: 2
      > It is true that disks themselves have caches, and I'm not sure what guarantees the hardware makes about those, but I believe that the idea is that if the os block driver asks for a write of a single block, the drive is pretty much guaranteed to have enough power to finish that write. I'm not totally sure on this, though.

      Probably a wise decision - I'm not sure I'd trust write-caching disk drives not to lose data on power failures. I suspect many OSes simply tell the drives not to do their own write caching.

    4. Re:Don't forget the cache by jshalott · · Score: 2

      No one's forgetting the cache. The whole point of this is that it's a way to deal with the fact that filesystem caching creates a disparity between the os-view of the file system and the actual state of the filesystem on the disk. Instead of just dirtying and writing individual pages willy-nilly, we keep an idea of "snapshots" (not to be confused with NetApp snapshots...) of the whole filesystem, and we ensure an entire snapshot of the filesystem is written out to disk atomically, not just single blocks/pages. So we don't guarantee the contents of your files, but we do guarantee the consistency of your filesystem. It is true that disks themselves have caches, and I'm not sure what guarantees the hardware makes about those, but I believe that the idea is that if the os block driver asks for a write of a single block, the drive is pretty much guaranteed to have enough power to finish that write. I'm not totally sure on this, though. Of course, as stated in the article, we in the free BSD world have been enjoying soft updates for a long time now, so this idea isn't all that revolutionary.
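
      To make the whole-snapshot idea concrete, here is a toy in-memory model. It is a sketch under heavy assumptions and not Tux2's actual on-disk algorithm: blocks are only ever added, never overwritten, and a commit is a single change of the root pointer. Disk, Snapshot and read_committed are invented names.

```python
class Disk:
    """Toy block store: blocks are never overwritten, only added."""

    def __init__(self):
        self.blocks = {}
        self.next_id = 0
        self.root = None  # the single pointer updated at commit time

    def write_block(self, content):
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = content
        return block_id

class Snapshot:
    """A pending view of the filesystem, invisible until commit."""

    def __init__(self, disk):
        self.disk = disk
        # Start from the committed directory, a {name: block_id} mapping.
        committed = disk.blocks[disk.root] if disk.root is not None else {}
        self.directory = dict(committed)

    def write_file(self, name, data):
        # New data goes to a fresh block; the old block stays untouched.
        self.directory[name] = self.disk.write_block(data)

    def commit(self):
        new_root = self.disk.write_block(dict(self.directory))
        self.disk.root = new_root  # the one atomic step

def read_committed(disk, name):
    """Read a file as of the last committed snapshot."""
    directory = disk.blocks[disk.root]
    return disk.blocks[directory[name]]
```

      A crash at any point before the root pointer changes leaves the previous snapshot fully intact, which is the consistency guarantee described above: recent file contents may be lost, but the structure never is.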

    5. Re:Don't forget the cache by be-fan · · Score: 3

      One thing I was thinking is that journaling data or disk cache could be written to a battery backed up RAM-disk. That way, if the power fails, all the data would still be on the RAM disk, and the disk could be properly updated. The RAM disk would be as fast as regular RAM (well, depending on where it is located) so writes would be nice and fast. RAM is cheap, batteries are cheap, so what's holding it up?

      --
      A deep unwavering belief is a sure sign you're missing something...
  83. Re:Version control system by jhines · · Score: 1

    It was very nice in a development environment, since you had the last xx (settable) versions of your work available to you.

  84. Re:this is not going to make microsoft happy... by Drestin · · Score: 1

    NTFS 5 is a commercial quality journaling file system.

    What a pack of idiots. I don't see how those linux guys can operate screw-top caps.

  85. Re:this is not going to make microsoft happy... by Drestin · · Score: 1

    You wrote: "but when an NTFS partition corrupts so badly that..." - well, gee, if we're just making up scenarios that's pretty unfair! I mean, why can't I say: "but when Linux screws up so badly that the pc not only won't boot but has actually suffered physical damage..." - you didn't address likelihood ...

  86. Re:*BSD SoftUpdates provide crash resistance NOW by AntiBasic · · Score: 1
    That paper you are referring to is Here.

    Soft Updates: A Solution to the Metadata Update Problem in File Systems by Gregory R. Ganger and Yale N. Patt

    Now onto my opinion...

    ffs with softupdates is great. It has the rare combination of speed AND stability. One of the major problems with that behemoth known as ext2fs is that it's either slow and stable or fast and extremely unstable. While some of the "alternative" Linux filesystems fix some of the problems inherent with ext2fs, they still won't touch ffs w/ softupdates for a while.

  87. Re:You can get crash-proofness from ext3 now by AntiBasic · · Score: 1

    That's nice and all, but ext3 has been shown to be slower than ext2. Not only that, it doesn't have the maturity that ffs with softupdates has. I'll have to admit the mount compatibility sounds neato.

  88. In fact... by Freedent · · Score: 1

    At trade shows the Be guys used to demonstrate the FS by actually yanking out the cord and rebooting. Not just once or twice in a show, mind, but repeatedly as necessary.

  89. Re:Version control system by Troed · · Score: 1
    At my previous company, we used ClearCase until all the developers (myself included) stopped working until we had another solution.

    Reason? Well .. it kind of bluescreened NT whenever there was "too much" disk access ... like ... uh ... when we compiled our software ...

    Getting a bluescreen every second time you compiled your work, losing all that work and needing to rebuild the system (took a few hours) meant we never got any real work done ..

    (That said, I'm sure the bug is fixed now)

  90. Use a LRU replacement scheme... by MfA · · Score: 1

    Treat old versions as temporary data which can be purged at will, the more space you have the more history you keep. If it can be done with a small enough overhead it would be interesting.

    1. Re:Use a LRU replacement scheme... by jmv · · Score: 2

      ...Only if you have some activity on the drive, you'll need to wipe the history often. You can have a "daemon" that deletes the history when the disk is too full, but then there wouldn't be much difference from the DOS delete/undelete (well, it would be a bit more powerful, but not that much).

  91. Ehmmm the "daemon" would be part of the filesystem by MfA · · Score: 1

    As I said in the original title, which I hate wasting, you could use a LRU replacement scheme. The history would not be static, it would form its own actively maintained part of the filesystem.
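The LRU idea can be sketched roughly as follows: old versions live in a fixed budget of otherwise-free space, and the least recently used ones are purged when room is needed. This is a toy in-memory model under assumptions; `VersionHistory`, `remember`, and `recall` are hypothetical names, not part of any real filesystem.

```python
from collections import OrderedDict


class VersionHistory:
    """Old file versions kept in spare space, evicted least-recently-used first."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.versions = OrderedDict()   # (name, version) -> data, least recent first

    def remember(self, name, version, data):
        """Store an old version, purging LRU entries until it fits."""
        if len(data) > self.capacity:
            return                      # larger than the whole budget: not kept
        key = (name, version)
        if key in self.versions:
            self.used -= len(self.versions.pop(key))
        while self.used + len(data) > self.capacity:
            _, evicted = self.versions.popitem(last=False)
            self.used -= len(evicted)
        self.versions[key] = data
        self.used += len(data)

    def recall(self, name, version):
        """Fetch an old version and mark it as recently used."""
        key = (name, version)
        if key not in self.versions:
            return None
        self.versions.move_to_end(key)
        return self.versions[key]


history = VersionHistory(capacity_bytes=10)
history.remember("notes.txt", 1, b"aaaa")
history.remember("notes.txt", 2, b"bbbb")
history.recall("notes.txt", 1)               # touch version 1
history.remember("notes.txt", 3, b"cccc")    # needs room, so version 2 is purged
kept = sorted(history.versions)
```

The more free space the budget gets, the deeper the history; no separate cleanup pass is needed because eviction happens on the write path.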

  92. Re:TYPE & CREATOR CODES by PSC · · Score: 1


    so, if you have a text file, you don't need to put .tif on the end of it, simply, you would have the type and creator of the file set to: 'TIFF' and '8BIM' which would mean that its a TIFF file, and it should be opened by photoshop if in a GUI you go and double-click it.


    Why would this information belong to the file system?

    IMHO generating a huge number of arbitrary file "types" in filesystem space (read: kernel space) when it can be painlessly done in user space is a Bad Thing(tm). I mean, these types would have to be registered in an /etc/services-style database anyway, which (again IMHO) is *way* more registry-like than a naming convention. And don't forget that it IS just a convention -- you can name your files whatever you like. Store an image without any extension, and xv will still get things right, for example.


    this approach makes it much more difficult for any accidental SEPARATING of the file type info from the info that determines which app should open it


    On my account, I alone determine what application should open TIFF files. Thus a system-wide mapping of type to app on a multi-user O/S is simply unacceptable. (It's a typical single-user line of thought, after all.) Even if you go with file system-integrated file types, you would still need per-user mappings.

    Anyway, there's no real need for this meta-information, be it in the file name or in the inode. man file(1).

    Cheers, patrick

    --
    --- The light at the end of the tunnel is probably a burning truck.
  93. tux2 website by jmd! · · Score: 1

    URL in the article is a 404.

    correct url:
    http://innominate.org/~phillips/tux2/

    mailing list is here:
    http://innominate.org/mailman/listinfo/tux2-dev

  94. More wishes by heikkile · · Score: 1

    This sounds wonderful, but may I humbly propose more features. Linux memory management seems to believe in the old saying that unused memory is wasted memory. Why not the same about disk space? It should not require too clever an AI to see which files keep changing, and to keep backup copies in the "unused" disk space - or complete version histories... Likewise, seldom-used files could be compressed away, transparently to the user. And perhaps files that are often used together could be located in the same area of the disk, to make them more accessible, or whatever. Just don't waste the "unused" disk!

    --

    In Murphy We Turst

  95. Re:TYPE & CREATOR CODES by iankerickson · · Score: 1

    That's called HFS, the MacOS file system, not an original idea.

    --
    Democracy. Whiskey. Sexy. Pick any two.
  96. Re:Version control system by iankerickson · · Score: 1

    RTFH.

    SET FILE/VERSION=n

    where n is the max number of old versions you want VMS to retain. Otherwise, you have to play janitor with your disks every so often or keep a purge job retained in a batch queue.

    PURGE/KEEP=n will delete all but the last n versions.
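As a rough illustration of the "keep the last n versions" behaviour described here, the following sketch models it on plain version numbers. It is not DCL or RMS code; `purge` is a hypothetical helper name.

```python
def purge(versions, keep=1):
    """Return (kept, deleted) version numbers, keeping the highest `keep`."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ordered = sorted(versions)
    return ordered[-keep:], ordered[:-keep]


# Five versions of one file exist; keep only the newest two.
kept, deleted = purge([1, 2, 3, 4, 5], keep=2)

# Asking to keep more versions than exist deletes nothing.
kept_all, deleted_none = purge([7, 8], keep=5)
```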

    Also, versioning on vms requires the app to implement it correctly. You can easily update a file without creating a new version with a little rms know-how. One of our vendor systems does it wrong. It will only execute version 1 of a file, read from the latest version, and save to the latest version without creating a new file. This is where that set file/version=1 comes in handy... VMS can be made to do what you want, except save just the diff between versions. AFAIK, it makes a completely new copy each time.

    I thought someone wrote a cvs filesystem as an add-on to ext2. You'd have to have a really clear idea when to clean out old diffs, or your file system would start looking like a master's thesis written in MS Word on "Fast Save".

    --
    Democracy. Whiskey. Sexy. Pick any two.
  97. Re:Magic 404s by great+throwdini · · Score: 1

    See above -- formatting went goofy, but the link is fine.

  98. Re:Version control system by ekidder · · Score: 1

    Aye, that be the truth. It has only been two months since I stopped working on a VAX (thank the Gods) and that was one of the things I had to get used to when I was playing with it. And periodically deleting old versions :|

  99. Re:TYPE & CREATOR CODES by jallen02 · · Score: 1

    I think the point is that you identify the file creator in there, but you should be able to change that easily, say from Photoshop to GIMP. Anyway, it's easy enough to do.

    Jeremy

  100. Re:There are serious problems with this idea by __donald_ball__ · · Score: 1

    The basic problem is that information vital to the use of the file is not stored in the data that you get with read()/write(). This makes it impossible to cleanly store this data on another system or to transfer it. Yes, you can "encode" (or "binhex") it, but if you do that, why not just store the encoded version on the disk, and remove a large and complex mess from the OS?

    So you add new system calls - read_creator(), or read_meta("creator"). what's the big deal with that? applications which don't know about those system calls blithely ignore the meta information.

  101. Re:Wrong: Desktop Database by Wesley+Felter · · Score: 1

    You can have two different preferred apps for two different files of the same type, because there is a (optional) preferred app attribute on every file.

  102. Re:There are serious problems with this idea by Wesley+Felter · · Score: 1

    That's an interesting idea, but I guess it depends on the definition of a file's "data". If the data is everything that you can get from read(), then storing all the attributes in the file data will cause most apps to think files are corrupted. OTOH, if attributes are contained in a new kind of file data that isn't visible to read(), then I don't see how that's much different from a system like BFS or XFS where an inode can hold as many named attributes as you want.

  103. Re:There are serious problems with this idea by Wesley+Felter · · Score: 1
    Let's make a few changes here:

    The basic problem with file permissions under Unix is that information vital to the use of the file is not stored in the data that you get with read()/write(). This makes it impossible to cleanly store this data on another system or to transfer it. Yes, you can "encode" (or "tar") it, but if you do that, why not just store the encoded version on the disk, and remove a large and complex mess from the OS?

    In fact there is absolutely no reason for timestamps to be stored in any way that the OS sees. The data is only used by user-level programs.

    My point is that the decision of what metadata should be stored natively by the filesystem is mostly arbitrary; file types are as reasonable as mod times. (Permissions are an exception because they're necessary for system security, but I couldn't resist.)
  104. Re:One evil due to the Linux infrastructure. by Wesley+Felter · · Score: 1

    Yeah, I don't like swap partitions either; I'm not sure why all the Linux distributions I've seen are using them.

  105. Re:Wrong: Desktop Database by Wesley+Felter · · Score: 1

    BeOS has creator types (except they're called something like "preferred applications"). Now that I think about it, BeOS probably also uses BFS indices to find an app given its signature; this is cool because the filesystem's indices never get out of sync like the Finder's desktop database can.

    See BRoster::FindApp() for some details.

  106. Re:There are serious problems with this idea by Wesley+Felter · · Score: 1

    Actually, permissions are necessary for insecurity - if you didn't have them, people would just be limited to working with their own files and never be aware of any others.

    This is true, but sometimes users want to share files. I think per-process namespaces (as in Plan 9, Inferno, Spring, EROS, etc.) are a good compromise that allow some things to be private and some to be shared.

  107. Some corrections by Wesley+Felter · · Score: 1

    If rhnsd was leaking file descriptors, then no filesystem could fix that, because file descriptors only exist in memory.

    One bone I have to pick with ext2 is how the swap partition cannot be adjusted on the fly.

    This has nothing to do with ext2. In general, you can't safely resize partitions at runtime. Swap files are probably more flexible, but I haven't used them under Linux.

  108. Re:Version control system by f5426 · · Score: 1

    I have toyed with a similar idea a dozen times already. I'd like to write a driver for that sometime (if I have time).

    My idea would be to do it at the block level of the device. Writes would be stored into a separate device. Stored, not logged, in the sense that two writes in the same block would be stored at the same place. Blocks would be indexed by a btree.

    The idea would be to be able to take a 'snapshot' of a device, which would become read-only (and could be mounted somewhere). From time to time, checkpoints could be taken, which would also appear as read-only versions. Checkpoints could be merged together, or into the original. When merging into the original, you should be able to produce a reverse-merge, which is data you could use to get back to the stored state of the device (i.e. it consists of the previous image of the data). Amusingly, reverse-merges would have the very same format as forward logs. They make it possible to store previous versions somewhere.

    An extra twist would be to store the previous image in a separate file and doing the writes in the real device (ie: keeping an undo-log instead of a redo-log)
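A minimal sketch of the block-level scheme described so far, assuming an in-memory list stands in for the frozen device and a plain dict stands in for the btree index. `SnapshotDevice` and its methods are hypothetical names for illustration only.

```python
class SnapshotDevice:
    """Read-only base device plus a redo store of overwritten blocks."""

    def __init__(self, base_blocks):
        self.base = list(base_blocks)   # the frozen snapshot
        self.redo = {}                  # block number -> latest data (dict instead of a btree)

    def write(self, block, data):
        # Stored, not logged: a second write to the same block replaces the first.
        self.redo[block] = data

    def read(self, block):
        return self.redo.get(block, self.base[block])

    def merge(self):
        """Fold the redo store into the base; return the undo log that reverses it."""
        undo = {block: self.base[block] for block in self.redo}
        for block, data in self.redo.items():
            self.base[block] = data
        self.redo = {}
        return undo


dev = SnapshotDevice(["A", "B", "C"])
dev.write(1, "b1")
dev.write(1, "b2")
before_merge = (dev.base[:], dev.read(1))

undo = dev.merge()
after_merge = dev.base[:]

# The undo log has the same shape as a redo log, so it replays the same way.
for block, data in undo.items():
    dev.write(block, data)
dev.merge()
restored = dev.base[:]
```

The reverse-merge falls out of the merge almost for free, which is the "same format" observation made in the comment.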

    Speed would get slower and slower as the number of checkpoints increase, but I don't mind this.

    A goal is to be able to do system-wide modifications and check you are happy with them before committing (say upgrading various parts of the OS, or when you are ready to 'make install' as root, but fear that the software package you are about to install may put zillions of files in the wrong directories)

    Another goal would be taking snapshot of filesystem, to guarantee consistent archiving, for instance. I hate the feeling I have when dumping an active filesystem...

    Cheers,

    --fred

    --

    1 reply beneath your current threshold.

  109. Magic 404s by Klowner · · Score: 1

    I got a 404 on http://www.innominate.org/~philips/tux2 :(

    Klown
    LAST POST! er, no wait...

  110. Re:Version control system by Sunda666 · · Score: 1

    the filesystem from VMS (the OS of the good old old VAXes) does exactly this.

    --


    ``If a program can't rewrite its own code, what good is it?'' - Mel
  111. Re:Version control system by shepd · · Score: 1

    Ahhh, everybody talking about VMS having this feature.

    Netware has had a similar feature for a long time that is (IMHO) better implemented because it is simpler.

    When you delete something, the delete attribute is set, unless the purge attribute is already set, in which case the file is actually unlinked.

    When HDD space is low (I think there's a high/low watermark you can set on the server) files start getting purged from the system. If you are desperate and want everything purged immediately, you can always use "purge /a".

    This way you can salvage your "deltree /y *.*" mistakes. :-)
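The delete/purge scheme described above can be modelled in a few lines. This is a toy accounting model under assumptions, not NetWare's implementation; `SalvageVolume`, its methods, and the single low watermark are hypothetical simplifications.

```python
class SalvageVolume:
    """Deleted files stay recoverable until free space runs low."""

    def __init__(self, capacity, low_watermark):
        self.capacity = capacity
        self.low_watermark = low_watermark   # purge when free space drops below this
        self.live = {}                       # name -> size
        self.deleted = []                    # (name, size), oldest deletion first

    def free_space(self):
        held = sum(size for _, size in self.deleted)
        return self.capacity - sum(self.live.values()) - held

    def create(self, name, size):
        self.live[name] = size
        # Purge the oldest deleted files until free space is back above the watermark.
        while self.deleted and self.free_space() < self.low_watermark:
            self.deleted.pop(0)

    def delete(self, name):
        # Only marks the file as deleted; its space is not released yet.
        self.deleted.append((name, self.live.pop(name)))

    def salvage(self, name):
        for index, (deleted_name, size) in enumerate(self.deleted):
            if deleted_name == name:
                del self.deleted[index]
                self.live[name] = size
                return True
        return False


vol = SalvageVolume(capacity=100, low_watermark=20)
vol.create("a", 30)
vol.create("b", 30)
vol.delete("a")
vol.delete("b")
vol.create("c", 30)              # free space would be 10, so "a" is really purged
recovered_b = vol.salvage("b")
recovered_a = vol.salvage("a")
```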

    C'mon, a set of purge/delete attributes can't be hard to implement in ext2fs, eh? (No, I have NO CLUE about programming filesystems, that's why I'm asking).

    --
    If you could be told what you can see or read, then it follows that you could be told what to say or think - BoC
  112. Re:One evil due to the Linux infrastructure. by cthulhubob · · Score: 1

    > Personally, I'd prefer a setup where you define the maximum limit of the swap file by the size of its partition.

    umm, that is how it's defined...
    $ free
                 total       used       free
    (cutting out the stuff about actual RAM...)
    Swap:       145112      20328     124784

    I'm using 20 megs of swap out of a possible 145 megs (or so).

    but in any case, if you need to add more swap space (say I needed an extra 55 megs of swap), it's not too hard.

    dd if=/dev/zero of=swapfile bs=1024768 count=55
    mkswap swapfile
    swapon swapfile

    --

    In post-9/11 America, the CIA interrogates YOU!
  113. Re:One evil due to the Linux infrastructure. by cthulhubob · · Score: 1

    Doh! ::slaps himself on the forehead::

    I need more coffee...

    --

    In post-9/11 America, the CIA interrogates YOU!
  114. Re:Version control system by jameshowison · · Score: 1

    Can anyone tell me how this relates to WebDAV?

  115. Re:Version control system by duggy_92127 · · Score: 1

    Rational's ClearCase product is implemented as a filesystem, called MVFS. It's not a generic filesystem for everyday use, but it does have versioning (and branching, and labels, and hyperlinks, and metadata...) all built in.

    Doug

  116. Re:TYPE & CREATOR CODES by 2Bits · · Score: 1
    Yeah, then you should have also three forks, just like the Mac filesystem.

    While at it, why don't you just use Mac instead, and leave the Unix file as it is?

    Keep the OS an OS, and everything else on the application level, let the applications handle it, would you?

  117. Re:I am not wrong. You didn't read close enough. by itarget · · Score: 1

    Don't be fooled by the appearance of a single swap file. That sucker is fragmented all over the place. A resizable swap file is definitely a hands-off convenience, but it's a Very Bad Thing(tm).

    Each time windows needs more swap, it will just grab a chunk of available HD space. Each time it does this your swap file fragments more and more. This is compounded by the fact that windows tends to free up chunks of these chunks and/or grab more elsewhere as memory needs change. Not only does this fragment your swapped memory, it fragments your HD as well. It becomes an absolute mess in short order so you need frequent defrags.

    Even on my win98 machine I've just guesstimated the amount of swap I'll ever need and made the swap file a fixed 400mb. No more HD grinding during resizes for me, and my overall performance is better for it. If I ever need more, I'll increase the size (though this is a pain under windows because it requires a reboot).

    On my linux machines I've got swap partitions (approx 2x physical memory), because not only do they not resize and fragment, they eliminate the filesystem overhead associated with paging to swap files. If I'm in a real pinch and need more I can just create swap files and activate them without requiring a reboot. These augment existing swap space, not replace it. When the swap file(s) have outlived their usefulness and I want the space back I can just deactivate and remove them (and no, you don't lose anything that might still be in them, that stuff gets shunted to live memory when you deactivate the swap file).

    A resizing swap file that doesn't fragment, and maintenance downtime just for running swapon? What I want to know is: What are you smoking, and why aren't you sharing?
    ---
    Where can the word be found, where can the word resound? Not here, there is not enough silence.

    --

    "Where shall the word be found, where will the word resound? Not here, there is not enough silence." -T.S. Eliot
  118. Re:can user processes schedule phase transitions? by cdg72 · · Score: 1

    I emailed Daniel about this a few weeks ago.
    The simple answer is yes. This will be trivial to implement. And very cool ... despite my out-dated, unused ansi-c background, I'm reading the developer list emails simply because I think it sounds fantastic.
    There is an option for OSF/Digital/Tru64 Unix called AdvFS that has a clonefs command that makes a read-only copy of the file system that can be mounted like any other partition. We use it to get a good, coherent snapshot of large database files. It takes a few seconds and then everyone can get working again -- new/changed data uses new blocks, and the directories and inodes point to the new blocks for these changes only and no longer have references to the old data (freed when the clone is dismounted and released), while we back up the database (hours even using DLT) ... technically, there is no need to close the database, the snapshot will be good data, but the DB engine will freak out because it's not used to being able to trust a file system.

    T.A.G.* You're IT
    Inspired by Heinlein's 'Stranger in a Strange Land'
    * - Hint: The "T.A." stands for "Thou Art"

  119. YAFS by broody · · Score: 1

    Tux2 sounds interesting and all, but the other horses are so far along, why bother? I suppose if you just can't stand to back up and reinstall, then this might be FM.

    I suppose there are geek creds for the whole "Phase Tree" thing, but how long is it going to take to catch up?

    That said, it does sound like a cool project and something fun to play with but it will be a long, long time before I stick something so new and groundbreaking on a box I rely on.

    As soon as it gets outta beta, I'm jumping on XFS' bandwagon.

    As the SGI guys say:

    • Sub-second filesystem recovery after crashes or power failures (never wait for long fscks again)
    • 64-bit scalability: millions of terabytes, millions of files, and a million files per directory (no more 2 GB limits)
    • High reliability and performance from journaling and other advanced algorithms

    C'mon, this filesystem rocks and SGI is releasing it GPL; game over. SGI made a killing on this thing. I doubt we would be seeing it if it weren't for the expected storage projections and CXFS, but it's not like most of the machines I am messing with need that much storage.

    Today I'm using Reiserfs, tomorrow I am using XFS, and if this one catches up in performance and reliability then maybe Tux2.

    --
    ~~ What's stopping you?
  120. Re:Version control system by ameoba · · Score: 1

    This sounds a lot like the way the Plan9 fileserver (at least the one at the lab) is done. The file storage node has a multi-tiered system: Local Disk Cache, Server Disk Cache (I think the thing has 100+MB dedicated to that), Server Active Storage (several GB of hdds), then Permanent storage (a WORM jukebox, IIRC). Each tier trades an order of magnitude of access time for an order of magnitude of storage, dumping data to the next tier incrementally on some period.

    How realistic, and practical it is, who knows, but it sure is cool. Say, do you think talking about a Plan9 network is close enough to mentioning a Beowulf?

    --
    my sig's at the bottom of the page.
  121. Re:Is this filesystem immune to the "rhnsd factor" by InsaneGeek · · Score: 1

    Why don't you use a swap file on your Linux box

    dd if=/dev/zero of=/swap1 bs=1024 count=51200
    mkswap /swap1
    swapon /swap1

    On others (Solaris, Irix) it's even easier

    mkfile 50m /swap1
    swap -a /swap1

    And you've added an additional 50meg to swap.

    Your Win2k doesn't have a swap partition, it has a swap file, which is exactly what you are doing here. Pretty much every recent flavor of unix can create swap files and add them on the fly; some even have the very nice feature of virtual swap, so you never actually allocate physical disk (which is great for all those apps that reserve the 2gig of swap they want but never use it)

  122. Re:Rebooting without thinking first... by nagora · · Score: 1

    4. Frustrated Linux user looks at reset button and thinks "What does that do again?".

    --
    "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
  123. Re:TYPE & CREATOR CODES by Enigma2175 · · Score: 1

    You seem to be contrasting Windows and Mac ways of typing files. I have no idea whatsoever what this has to do with a Linux filesystem. Linux doesn't care what extension you have on the end of your file, it can be .bat, .blowme or .foo, it doesn't matter. I think people are quite aware of the limitations of the Windows registered extension system, but at least come up with an original idea instead of copying the Mac idea and presenting it as your own.

    Enigma
    .sigless


  124. Re:A bit OT question from non-hacker by matthe1 · · Score: 1

    AFS, the Andrew File System, does this. It has a lot of neat features. One minus: you cannot do per-file ACLs. You have to use directories.

  125. Re:Version control system by SubtleNuance · · Score: 1

    The problem is how do you define a version? At the operating system level when the OS gets a request to write some physical block, does it count that one request as a new version, a string of requests as a new version?

    I'm not sure how it works, but VMS will present the most recent (highest version) if none is specified, i.e. somefile.txt;5 rather than somefile.txt;1. The semantics work perfectly with Digital's VMS; anyone interested can go read about Files-11 & Spiralog here

    FYI: VMS/VAX blows (IMHO)... I can't wait to pull the plug on my cluster...

  126. journaling not slower by zurab · · Score: 1

    The central point of a journaling file system is that in exchange for a small hit in performance, file integrity is assured by an ingenious mechanism...

    I hope this was not meant to imply that ReiserFS is slower than Ext2fs. Here are the ReiserFS benchmarks showing how it beats Ext2fs in virtually every category tested, a lot of times pretty convincingly. Also, I found Hans Reiser's Future Vision a pretty interesting read too.

  127. Re:can user processes schedule phase transitions? by Abcd1234 · · Score: 1

    This wouldn't be necessary in a fail-safe environment. The idea is that the last file that was being compiled by the kernel wouldn't quite make it to disk, but the rest of the files would be fine. Just deleting that one last file and then restarting make should begin the compile process from where it left off... ideally. :) Even better, if the makefile was designed so that each object file was built in a separate directory and then moved into place after it was properly compiled, you wouldn't have to worry about deleting the dirty file before restarting the compile. Much nicer than having to start right back from the beginning, as in your example.
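The "build elsewhere, then move into place" trick mentioned here can be sketched with a temporary file and a rename. This is a minimal sketch, assuming a POSIX-style filesystem where a rename within one filesystem is atomic; `write_atomically` is a hypothetical helper, not part of make or any build tool.

```python
import os
import tempfile


def write_atomically(path, data):
    """Write data to a temp file next to `path`, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())      # push the contents to disk before the rename
        os.replace(tmp_path, path)      # readers see the old file or the new one, never half
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as build_dir:
    target = os.path.join(build_dir, "main.o")
    write_atomically(target, b"object code")
    with open(target, "rb") as f:
        result = f.read()
    leftovers = os.listdir(build_dir)
```

With this pattern an interrupted build leaves at most a stray temp file behind, so restarting make never picks up a truncated object file.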

  128. Re:TYPE & CREATOR CODES by darksmurf · · Score: 1

    root@linux:/usr/share/pixmaps# file xchat.png
    xchat.png: PNG image data, 50 x 50, 8-bit/color RGBA, non-interlaced

    The file command did not look at the extension to get that data, it looked at the file. In Linux the program opening the file needs to be smart enough to know what it's opening, that's not hard.
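Content-based detection of this kind can be sketched by checking the leading "magic" bytes. The PNG and TIFF signatures below are the standard ones, but `sniff` is a hypothetical helper that covers only two formats and is far simpler than what file(1) actually does.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
TIFF_SIGNATURES = (b"II*\x00", b"MM\x00*")   # little-endian and big-endian TIFF


def sniff(data):
    """Guess a file type from its first bytes, ignoring the file name entirely."""
    if data.startswith(PNG_SIGNATURE) and len(data) >= 24 and data[12:16] == b"IHDR":
        # Width and height are the first two big-endian fields of the IHDR chunk.
        width, height = struct.unpack(">II", data[16:24])
        return f"PNG image data, {width} x {height}"
    if data[:4] in TIFF_SIGNATURES:
        return "TIFF image data"
    return "data"


# A hand-built PNG header: signature, IHDR chunk length, chunk type, 50 x 50 size.
png_header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 50, 50)
png_guess = sniff(png_header)
tiff_guess = sniff(b"II*\x00" + b"\x08\x00\x00\x00")
unknown_guess = sniff(b"just some text")
```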

    The functionality you want of a file being opened by clicking on it in XWindows depends on how "smart" the software you are using to browse the filesystem is in the first place, and I believe most take care of that request in their design.

    The 8.3 curse lives with us still in the minds of the uninitiated ;)

    -Nathan

  129. Re:What is wrong with Reiserfs? by darksmurf · · Score: 1

    Journaling is not the feature of those FSs that makes them faster.

    They do other tricks that ext2 does not to get the speed.

  130. Re:Not Comfortable... by RedWizzard · · Score: 1

    Tux2 doesn't prevent you from running fsck or a similar recovery tool to try to recover pieces of an interrupted write. You just don't need to run it to be guaranteed a consistent filesystem.

  131. Re:*BSD SoftUpdates provide crash resistance NOW by RedWizzard · · Score: 1

    It's been discussed on the kernel mailing list recently. The general consensus is that since Linux will soon have a bunch of "real" journaling filesystems which are more robust, soft updates are not necessary, and also it'd be hard to port anyway because the VFS layers are quite different. Here is a summary from KernelTraffic.

  132. Re:Version control system by billcopc · · Score: 1

    I think a fully versioned file system would be overkill; you'd just end up wasting a lot of space and would require some sort of garbage collector to wipe out the antique, useless files. What I've been doing for the past few years is simply to use a version-controlling archiver, but since I'm more of a Windows geek than a Linux geek, I have no idea if they make Jar32 for Linux. I use it to archive all my important projects since it incrementally stores everything with cross-compression, so modifying a 10mb file a dozen times doesn't cost you 120mb. It only stores what's been changed, and in a very efficient way, making it perfect for keeping a permanent history of your entire project including all associated media files, not just the code. So if your marketing moron likes the old v1.0 logo better than the latest v3.2 logo, you can still pull it out. It's pretty sweet.

    --
    -Billco, Fnarg.com
  133. Phase tree reminds me of... by boy+case · · Score: 1
    Just the wrong side of off-topic, but this whole phase tree thing, particularly the atomic switching of roots, reminds me of a system I made up to allow recording of TV soap opera episodes for later playback! No not TiVo, it's a manual procedure using two video tapes.

    Scoff at it here.

  134. Re:*BSD SoftUpdates provide crash resistance NOW by Denial+of+Service · · Score: 1
    I deliberately hit the power-bar off switch during four FreeBSD 3.3 kernel SMP compiles fairly late in the process...

    I'm sure what you meant to say was:

    I deliberately hit the UPS off switch during four FreeBSD 3.3 kernel SMP compiles fairly late in the process...

    ---

    --

    ---
    Slashdot: News For Zealots. Stuff That's Hypocritical.
  135. Re:Version control system by Probashi · · Score: 1

    Network Appliance has done it. I believe (from memory) they keep seven versions of your files. So, you can get back the files you have deleted/modified without going to your tape backup. NetApp demoed this feature, and of course the yanking-power-but-not-losing-data feature, a while back to my last company (~1996-97). Netware does something that mimics what you are asking for. It can be set so that when you delete/modify the file, it copies the older/removed version to a hidden directory called deleted.sav. You can use their 'salvage' utility to get them back again.

  136. Nice algo, but.. by nickol · · Score: 1

    But the latest update at the disk, one that will be restored when you 'pull out the cord', will be 2 phases old. A bit more than in journalling systems.

  137. Mod down that website!! (a joke, folks, relax) by vslashg · · Score: 1
    Uh, oh. I sense conspiracy!

    I followed the link, and I got a list of files. And guess what one of them was?

    tux2.first.post.txt

    Aaaaugh! The FPers are taking over!!

  138. Is this the end of 'updatedb'? by cbwsdot · · Score: 1

    Or is that only for journaling filesystems?
    or do I not know what I'm talking about?

    --

  139. Great, by Bender+Unit+22 · · Score: 1

    Great, the filesystem on Linux could use some improvement. Linux is great for many things, but I have seen some nasty crashes followed by filesystem errors that could have been avoided with a proper disk system.
    I'm looking forward to yet another strong argument for using it as a server.
    --------

  140. Re:WOW by The+Troll+Catcher · · Score: 1

    Doesn't MP3.com use reiserfs? They certainly get a lot of traffic...

    I've been using Reiserfs for a few months, and it's great - I run development kernels and live in a place with less-than-reliable power - fscks are hella-fast.

  141. Re:One evil due to the Linux infrastructure. by The+Troll+Catcher · · Score: 1

    You have an interesting number for 2^20 :)

    2^20 is 1048576, not 1024768 (that's a resolution ;)
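The block-size arithmetic is easy to check: one mebibyte is 2 to the 20th power bytes. The snippet below compares the intended 55 MB swap file with what the mistyped `bs` value from the earlier dd command would have produced; the variable names are just for illustration.

```python
MIB = 2 ** 20            # bytes in one mebibyte
typo_bs = 1024768        # the block size as typed in the dd command
count = 55               # number of blocks requested

intended_bytes = count * MIB
actual_bytes = count * typo_bs
shortfall_bytes = intended_bytes - actual_bytes
```

So the typo would have produced a swap file a little over 1 MB smaller than intended, which is harmless but not what was meant.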

  142. Re:I am not wrong. You didn't read close enough. by AFCArchvile · · Score: 1
    Didn't you see me mention that my swap file changed sizes? I do know the main difference between the Windows and Linux virtual memory subsystems.

    Someone else before you brought up the fact that you can change the size of the swap partition with a script. No karma points for you. Also, wouldn't multiple swap partitions seriously fragment the memory, as well as causing intermittent instances of downtime just to copy the contents while the partitions are being made? I don't know about you, but I HATE it when a server has downtime just to do some file system maintenance.

    --
    "Ancillary does not mean you get to rule the world." --U.S. Circuit Judge Harry Edwards, speaking to the FCC's lawyer
  143. Re:One evil due to the Linux infrastructure. by AFCArchvile · · Score: 1
    "Graphics rendering systems would be slowed down many orders of magnitude if this much data had to be swapped per frame."

    Damn right; when I ran those 7 instances of Quake 3, they played back the cinematics at worse than 1 frame per second! That's what happens when a rendering program is forced to use virtual memory! I could've had even more instances of Q3, but I set com_hunkmegs to 128, and the eighth instance couldn't reserve that much.

    --
    "Ancillary does not mean you get to rule the world." --U.S. Circuit Judge Harry Edwards, speaking to the FCC's lawyer
  144. Re:Whoa, you're right and I have proof! by AFCArchvile · · Score: 1
    "Also, dynamically creating and removing swap files (or extending and shrinking them) is going to cause your filesystem to become massively fragmented very quickly..."

    I just checked the fragmentation of the partition with WINNT on it: here's the summary:

    Total fragmentation = 37 %
    File fragmentation = 75 %
    Free space fragmentation = 0 %

    Ouchie. Looks like I'm defragging this weekend.

    --
    "Ancillary does not mean you get to rule the world." --U.S. Circuit Judge Harry Edwards, speaking to the FCC's lawyer
  145. Re:One evil due to the Linux infrastructure. by AFCArchvile · · Score: 1
    "In general, you can't safely resize partitions at runtime."

    Well, gee, isn't that why Windows uses a swap file? How else could my virtual memory be boosted by 600MB in less than two minutes? It seems that Linux is too bound to its infrastructure to make any changes for the better. Personally, I'd prefer a setup where you define the maximum limit of the swap file by the size of its partition. The OS would then gauge how much it would need, and adjust appropriately. If multiple instances of huge programs execute, the system can adjust accordingly. If someone thought of this need before me, then a similar setup probably already exists; I wouldn't be surprised if Solaris or Irix were like this.

    --
    "Ancillary does not mean you get to rule the world." --U.S. Circuit Judge Harry Edwards, speaking to the FCC's lawyer
  146. Is this filesystem immune to the "rhnsd factor"? by AFCArchvile · · Score: 1
    Remember the problem in RedHat 7 where rhnsd would chew up all the file descriptors in a process of three weeks? I'm hoping that a fiasco like that never happens again. It's sad to see such a good distro company make such a stupid mistake like that and only have a pathetic excuse for it. I think that the problem with filesystem limits is in how they are always surpassed too quickly. Remember FAT16? Its first limit was 33 MB. With DOS 5.0, it became 2.1 GB. Then came FAT32, which has no absolute limit; however, Windows 2000 refuses to format a partition above 32 GB in FAT32 because of the greater efficiency of NTFS.

    One bone I have to pick with ext2 is how the swap partition cannot be adjusted on the fly. My Win2000 machine can adjust the swap file pretty well (with 7 windows of Quake 3, I forced the swap file size from 400 MB to almost 1 GB [!]). Will Tux2 have a dynamic swap partition? After all, it's in the damnedest situations where you realize that you made the swap partition too small.

    --
    "Ancillary does not mean you get to rule the world." --U.S. Circuit Judge Harry Edwards, speaking to the FCC's lawyer
  147. Volume size limitations? by AFCArchvile · · Score: 1
    What's the size limit on a Tux2 partition? I know that the limit to NTFS5 is 4 exabytes (!) and that FAT32 starts to be less efficient than NTFS at 32GB. I never got a chance to take ext2 to the limit (mainly because it was only a 10GB hard drive).

    Personally, I like the name to this one. It has a clear connection to Linux in some way, while just looking at the ext2 name makes you wonder how the hell it got that name. Anyone care to reveal the nomenclature origins of ext2?

    (I apologize ahead of time if this gets posted twice; the college network is performing like a 28.8 modem today.)

    --
    "Ancillary does not mean you get to rule the world." --U.S. Circuit Judge Harry Edwards, speaking to the FCC's lawyer
    1. Re:Volume size limitations? by scrytch · · Score: 2

      > I sure hope tux2 is capable of at least 2^31 or 2^32 * blocksize.

      I sure hope you meant filesize. Otherwise you could have a 2 gig drive with a real big superblock and nothing else on it :)

      --
      I've finally had it: until slashdot gets article moderation, I am not coming back.
    2. Re:Volume size limitations? by Skapare · · Score: 2

      I sure hope tux2 is capable of at least 2^31 or 2^32 * blocksize. I don't see why it wouldn't be. OTOH, if it's cleanly enough written, you should be able to redefine a few macros and have a version capable of 2^63 or 2^64 * blocksize, and with larger blocks, too.

      Other limitations on size, such as limiting a single file to 2GB, tend to be more a problem with APIs trying to conform to standards (you have to be able to address a location in a file via the API to the byte level, including negative values for relative seeking), and using the variable type called off_t, which probably could not have been equated to the type long long (though the new C99 now makes that a standard one).
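
      To put rough numbers on those limits, here is a minimal sketch. The block sizes are illustrative assumptions, not Tux2 parameters, and the helper names are made up for this example.

```python
# Rough upper bounds implied by the width of a block counter and of off_t.
# Block sizes are illustrative assumptions, not Tux2 parameters.

def max_volume_bytes(usable_bits: int, block_size: int) -> int:
    """Largest volume addressable when block numbers have `usable_bits` bits."""
    return (2 ** usable_bits) * block_size


def max_file_bytes(off_t_bits: int) -> int:
    """Largest byte offset a signed off_t of the given width can hold."""
    return 2 ** (off_t_bits - 1) - 1


TIB = 2 ** 40
for bits in (31, 32):
    for block_size in (1024, 4096):
        size = max_volume_bytes(bits, block_size)
        print(f"2^{bits} blocks x {block_size} B = {size // TIB} TiB")

# A signed 32-bit off_t tops out just under 2 GiB, hence the 2GB file limit.
print(max_file_bytes(32))
```

      The sign bit matters because seeking relative to the current position needs negative offsets, which is why a 32-bit off_t gives 2^31 - 1 rather than 2^32 - 1.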

      From what I recall, ext, which came before ext2, meant "extended", and probably refers to going upwards from the rather limited minix. The filesystem in use by BSD may have at that time still been a licensing issue.

      --
      now we need to go OSS in diesel cars
  148. MIME types... by DagSverre · · Score: 1

    ...is what I miss most in the current filesystems. Currently, I need to set up my filetypes again and again, Apache, gmc, you name it, and I really think there should be a better solution for it. Anyone else feel that deciding filetypes by extension isn't necessarily the optimal solution? What if filetype was included as an attribute for each file in the filesystem itself? I guess changes would need to be done to the actual Linux system and not just the FS, but implementing it in the FS is at least a start. Example: I've got a set of HTML I'm currently editing. I change their type to HTML-edit instead of HTML so that when I double-click I open it in a text editor instead of a browser... I believe it could be used for lots of other stuff too...

  149. Re:Nice sentiments, but... by Not+Fragile · · Score: 1

    I think that this is really aimed at smallish companies that cannot afford dedicated IT staff, and that are inept enough to be unable to run fsck and also inept enough to "pull the wrong plug" occasionally.

    I am not sure that enterprise customers are really the aim. More the "sub enterprise" ones....

    Nonetheless, whilst I applaud the growth of such things, I think that there is a huge market out there for OS extensions that do much of the revision/version control work, and intelligent disaster recovery work on behalf of the (dumb) user that pulls the wrong plug, or trashes the f/s...

    --
    Not Fragile
  150. Re:*BSD SoftUpdates provide crash resistance NOW by Daniel+Phillips · · Score: 1
    Furthermore, should others report a bug before I suffer data loss, I can revert to plain old boring ext2 by just editing my fstab. Now that is a feature you don't get with other journaling fs (ext3? I'm not sure).

    Going from ext2 to tux2 is easy, as you know. Going back is harder because some of the metadata will have moved to non-traditional locations. Turning it back into ext2 requires a filesystem defrag to put all the metadata (e.g., inode table) back into its original locations. For Tux2 this is an easy and safe operation, but you may have to wait a while before you see it because getting the basic reliable updating working, tested and benchmarked is more important. You'll see this feature before version 1.0 comes out but not in the first developer versions.
    --
    Have you got your LWN subscription yet?
  151. Re:Patents? This algorithm was published in 1977 by Daniel+Phillips · · Score: 1
    The only reason that the metaroot has to be cloned and updated now is that the allocation table is linked there.

    And because Tux2 aims to give you consistency that extends across files, so you want to be able to represent a filesystem state by just the metaroot. The possibility does exist to save a few blocks per phase by relaxing this and allowing subtree modifications... but... I'd see it more as an experimental kind of thing you could try after the basic atomic update is in place and solid, then you'd see how much you actually save. I'm thinking about these kinds of things too, and nice analysis by the way.
    --
    Have you got your LWN subscription yet?
  152. Re:You still need an fsck program. by Daniel+Phillips · · Score: 1
    Unexpected power-off is NOT the only thing which can happen to a filesystem. What about these disasters?

    1) Bad block takes out part of your disk unexpectedly.
    2) Your OS screws up and spews a mess onto your filesystem before it crashes. (there ARE bugs in the kernel!)
    3) You have a minor headcrash which takes out one of your tracks, but the disk is still functional.

    What're you gonna do? Tux2 isn't gonna help you.

    You could restore your latest dump. You could also attempt to repair the filesystem.

    Yes, sure, in this case you want to run fsck to attempt a repair. Tux2 places a little extra information (one byte) in each inode to help fsck put things back together after such a disaster. After normal loss of power, intentional or otherwise there is no need to do fsck, any more than you would do a surface scan on your hard disk. Paranoia is a reason, I suppose. The point is, it's your choice.
    --
    Have you got your LWN subscription yet?
  153. Failed experiment with ReiserFS by Immorphal · · Score: 1

    I once installed Mandrake 7.1 with ReiserFs. Everything worked fine until I wanted to try and experiment. I started up a few programs in X (using Gnome) and then pulled the plug on the PC. It booted up nicely ... file check went by in a flash ... and I was back in Linux. I tried to start X up again and oops.... no X. Just loads of error messages. Being a Linux newbie there wasn't much I could do but to re-install Mandrake again... this time without ReiserFs ! :-)

  154. Great by Big+Ol'+Troll · · Score: 1

    just thought I'd mention that I was planning on oiling my hinges on my car doors. Just hope that there is no additional accumulation of dirt and that it has a high level of cheviness after I'm done.

  155. Rebooting without thinking first... by Depressive+Cyborg · · Score: 1

    This filesystem might cause this problem:

    1. Linux user knows: reset button is not dangerous
    2. Linux user is helping winblows user doing something
    3. Winblows hangs...
    4. Frustrated Linux user pushes reset button without even thinking about Ctrl-Alt-Del (!)
    5. scandisk will chew the 30 GB disk until next weekend


    I am a fan of Marvin, Mozart and Nirvana.

  156. Some of M$ Linux Myths may now become false by linuxlad · · Score: 1

    One of the major myths on the M$ webpage is that Linux does not have a journaling file system as stable as NT. This may help to defeat the M$ empire.

  157. You are wrong. You are ignorant of the issues. by Anonymous Coward · · Score: 2

    Actually, you ARE wrong, on several levels, and for different reasons.

    First of all: your swap file in win2k does not magically change size. In fact, what actually happens is very similar to the mechanism of making a new swap file. When Windows is running out of memory it allocates a large contiguous block and adds it to the VM. So it DOES seriously fragment the memory, since you have what really amounts to perhaps a dozen different swap files.

    Second of all, what the above poster described would not require any down time at all. The data from the "old" swap file does not have to be copied into the "new" one. The new one simply has to be created and added to VM. The kernel can certainly handle more than one being active at a time.

    Third of all: this has nothing whatsoever to do with the filesystem, be it FAT, NTFS, or ext2. This is a direct vm->disk interaction.

    Thank you for your time.

  158. Re:Version control system -- CVS/Podfuk by Erich · · Score: 2
    I'm pretty sure that you could fairly easily write some VFS code to get CVS working with the Gnome Midnight Commander VFS stuff (the vfs libraries that Gnome uses), with all the code out there that does things with CVS.

    After you have a CVS GMC VFS library (Go-Go-Gadget-TLAs!) you can use the excellent podfuk to instantly allow you to use the CVS archive as a filesystem!

    --

    -- Erich

    Slashdot reader since 1997

  159. No, magic numbers are the way. by Erich · · Score: 2
    Magic number collisions happen only when people are stupid and don't check the magic database before plunking down their own format -- virtually nobody has this problem.

    To associate a program with a filename (why would you want to do this? It's backwards), you can do it at the filemanager level. And I believe that you're wrong in believing that you want to open up files with the same program that made them... I want to make images with the gimp, but I want to view them with xv or xloadimage or ee.

    What happens on a Mac when your little four-letter-codes have a collision? What happens if two programs have the same app code?

    AFAIK, on the mainstream unix filemanagers you can configure what program opens what kind of file, but there is a default for each file that is supported.

    The magic database is ubercool. Learn it. Love it. Use it. MacOS- and Windows-style file-type resolution sucks. As you said, extension-based types suck. But keeping creator info / 4-letter file codes (the mac way) sucks, too.
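
    For anyone who hasn't seen how magic-number detection works, here is a minimal sketch. It is not the file(1) implementation or its database format; the table is a tiny hand-picked set of well-known signatures and the sniff() helper is made up for this example.

```python
# Identify a file by its leading bytes instead of its name.
# The table is deliberately tiny; the real magic database is far larger
# and can test bytes at other offsets as well.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x1f\x8b", "gzip data"),
    (b"\x7fELF", "ELF object"),
    (b"#!", "script with interpreter line"),
]


def sniff(data: bytes) -> str:
    """Return a type name based on the leading bytes, ignoring any filename."""
    for signature, name in MAGIC:
        if data.startswith(signature):
            return name
    return "unknown"


print(sniff(b"GIF89a\x01\x00\x01\x00"))
print(sniff(b"just some text"))
```

    When two formats share a prefix, the fix is a longer or more specific test placed earlier in the table, which is the "look at more bytes" approach.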

    --

    -- Erich

    Slashdot reader since 1997

    1. Re:No, magic numbers are the way. by scrytch · · Score: 2

      > Magic number collisions happen only when people are stupid and don't check the magic database before plunking down their own format -- virtually nobody has this problem.

      "the magic database". A central repository of every last file format known to computing. How charmingly quaint.

      --
      I've finally had it: until slashdot gets article moderation, I am not coming back.
    2. Re:No, magic numbers are the way. by alannon · · Score: 2

      >What happens on a Mac when your little four-letter-codes have a collision? What happens if two programs have the same app code?

      Apple maintains a database of all creator codes. Each creator code is a 32 bit number. You can search their database to see if the one you want is taken. If it's not, you simply request it and it is given to you.
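
      A four-letter code and a 32-bit number are the same thing seen two ways. A minimal sketch, assuming big-endian packing and plain ASCII; "ABCD" is a made-up code, not a registered creator:

```python
def fourcc_to_int(code: str) -> int:
    """Pack a four-character code into a big-endian 32-bit integer."""
    raw = code.encode("ascii")
    if len(raw) != 4:
        raise ValueError("code must be exactly four ASCII characters")
    return int.from_bytes(raw, "big")


def int_to_fourcc(value: int) -> str:
    """Unpack a 32-bit integer back into its four-character code."""
    return value.to_bytes(4, "big").decode("ascii")


n = fourcc_to_int("ABCD")  # hypothetical code, for illustration only
print(hex(n))              # 0x41424344
print(int_to_fourcc(n))    # ABCD
```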

  160. Re:One evil due to the Linux infrastructure. by spacey · · Score: 2

    Linux can use swapfiles, but they're less efficient than a swap partition because the various VM and FS policies that deal with normal user files are very different from the optimal use of a swap file. I.e. you really don't want or need a cache layer between your memory and swap file.

    Also, dynamically creating and removing swap files (or extending and shrinking them) is going to cause your filesystem to become massively fragmented very quickly, causing many multiples worse performance than the already horrendously nasty and unreasonable performance of having to page out 600MB, then do something, then page it back in.

    What kind of glutton for punishment are you that you want to do this to yourself?

    Actually, can you give an example of an application that really does require this kind of swap? When you run a large database you try to pin its memory (shm, cache, and table buffers) into active memory so it can't be paged out. Graphics rendering systems would be slowed down many orders of magnitude if this much data had to be swapped per frame.

    The only reason I can think of for the need for 600 MB of swap in most systems is because of an application with a lot of leaking memory. Please let slashdot know if you've got another reason :)

    -Peter

    --
    == Just my opinion(s)
  161. Re:What is wrong with Reiserfs? by Christopher+B.+Brown · · Score: 2
    The reason ReiserFS is not included in 2.4 (yet) is that it wasn't yet quite ready (and yes, there were political disputes over that) when they froze 2.3.x in preparation for 2.4, and New Stuff Doesn't Get Put Into Even Releases.

    That being said, the day that 2.5.0 gets released, there will doubtless be a flurry of activity to get ReiserFS in there, as well as to backport it to 2.4.1 or 2.4.2. If there be further political disputes at that time, there will doubtless be considerable flaming. There have been some pretty dramatic flames surrounding ReiserFS already...

    As for the focus, or lack thereof, resulting from introducing Tux2 as an additional option, I think this is entirely a healthy thing.

    I doubt that all of ext3, XFS, JFS, ReiserFS, and Tux2 will prove "totally successful." On the one hand, if one of them became dominant, that would effectively "shut out" the others. On the other hand, it's not likely that all of them will be considered equal, at the end of the process.

    Reality is that a couple of them are likely to become very popular, and the others are likely to eventually languish unmaintained.

    At first blush, that sounds wasteful. I don't think it is. I think it a very good thing that a bunch of groups are independently trying out some differing approaches to filesystems. This allows any to individually "succeed" or fail without resulting in Disaster For Linux.

    As with Gnome versus KDE versus GnuStep versus Berlin, the different systems can learn both from each others' successes and from each others' mistakes.

    As with many projects, there would not necessarily be benefit to trying to conglomerate these all into One Big Project; that certainly can lead to unworkable bureaucracy.

    I'd rather see five attempts that try radically different approaches to "reliable fast FSes," and see a couple provide tangibly useful results than for them to try to cooperate more than they successfully can, and risk having NO journalling filesystem at all.

    --
    If you're not part of the solution, you're part of the precipitate.
  162. Pulling the plug by ChrisRijk · · Score: 2

    SunWorld recently did an article on Solaris's journaling fs. They did a "pull the plug test" too, both with and without journaling on. They also found that with journaling on the file-system was quite a bit faster. (I've done some testing myself, and found some things to be about 10x faster...)

    1. Re:Pulling the plug by ceswiedler · · Score: 2

      I believe that the speedup is the result of gathering writes into groups. The more data you can throw on the disk in sequence, the better off you are in terms of speed. This is one of the ways that write-back caches speed up your life. All writes can be arranged so that data is written in a linear fashion to the disk.

      Linux already does this. It's called the "elevator" mechanism. It does exactly what you say, and it has nothing to do with journalling.
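
      The idea of gathering writes can be sketched in a few lines. This is not the kernel's elevator code, just a simplified sort-and-merge over pending requests; the (start_block, block_count) tuples and the function name are assumptions for the example.

```python
def elevator_order(requests):
    """Sort (start_block, block_count) writes and merge ones that touch or overlap."""
    merged = []
    for start, count in sorted(requests):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            # This request begins inside or right after the previous run:
            # extend the run instead of issuing a separate write.
            prev_start, prev_count = merged[-1]
            end = max(prev_start + prev_count, start + count)
            merged[-1] = (prev_start, end - prev_start)
        else:
            merged.append((start, count))
    return merged


pending = [(90, 2), (10, 4), (14, 2), (50, 1), (92, 3)]
print(elevator_order(pending))  # [(10, 6), (50, 1), (90, 5)]
```

      Five scattered writes become three sequential runs in ascending block order. A real elevator also has to account for where the head currently is and for requests that arrive while it is sweeping.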

  163. Re:can user processes schedule phase transitions? by aheitner · · Score: 2

    This is actually completely not what you want.

    If you did this, you wouldn't save your .o's if you crashed in the middle. So you'd lose all your work. So an hour into that m18 compile ... and you'd have to do it all over (yuck!)

    What you want is not to write your .o's unless you can safely write all of them (remember, you hit ^C in the middle of a compile, your object files are in good shape, so you can continue later).

  164. Norton Filesave by Pseudonymus+Bosch · · Score: 2

    Norton Utilities 4.5 for DOS had FileSave (I think) that implemented this. You set a limit to the space of files not-really-deleted, and you could say that .TMP, or .BAK shouldn't be stored.

    A TSR intercepted the "delete-file" and "how much space free" calls.

    If the really free space went too low, the program would really delete files. So it was transparent.
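
    The behaviour described above can be modelled roughly like this. It is a sketch from memory of how such a tool behaves, not Norton's actual design; the SalvageBin class and its method names are made up, and file contents are held in memory only to keep the example self-contained.

```python
from collections import OrderedDict


class SalvageBin:
    """Keeps deleted files until their total size exceeds a quota, oldest purged first."""

    def __init__(self, quota_bytes, skip_extensions=(".tmp", ".bak")):
        self.quota_bytes = quota_bytes
        self.skip_extensions = tuple(skip_extensions)
        self.kept = OrderedDict()  # name -> contents, oldest first

    def used(self):
        return sum(len(data) for data in self.kept.values())

    def delete(self, name, data):
        """Stand-in for the intercepted delete call; returns names purged for good."""
        if name.lower().endswith(self.skip_extensions):
            return [name]  # excluded types are really deleted at once
        self.kept.pop(name, None)  # a newer deletion replaces an older one
        self.kept[name] = data
        purged = []
        while self.kept and self.used() > self.quota_bytes:
            old_name, _ = self.kept.popitem(last=False)
            purged.append(old_name)
        return purged

    def salvage(self, name):
        """Give back a file that was deleted but not yet purged."""
        return self.kept.pop(name)


trash = SalvageBin(quota_bytes=10)
print(trash.delete("a.txt", b"12345"))    # []
print(trash.delete("b.tmp", b"xx"))       # ['b.tmp']
print(trash.delete("c.txt", b"1234567"))  # ['a.txt']
print(trash.salvage("c.txt"))             # b'1234567'
```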

    Something similar was put into MS-DOS 6.0 and in OS/2. If Unix doesn't have it, I call it a shortcoming. But Unix was never designed for fallible beings (hence, case-sensitive filenames).
    __

    --
    __
    Men with no respect for life must never be allowed to control the ultimate instruments of death.
    GW Bu
  165. Just at the command prompt by Pseudonymus+Bosch · · Score: 2

    I have seen a similar script somewhere. But it wouldn't work out of the command line.

    What's needed is a program intercepting every call to the "delete file" system call. It has been done on DOS.
    __

    --
    __
    Men with no respect for life must never be allowed to control the ultimate instruments of death.
    GW Bu
  166. Re:TYPE & CREATOR CODES by Pseudonymus+Bosch · · Score: 2

    Perhaps you've been wandering in the Windows world too much,

    Perhaps you've been wandering in the Unix world too much. Can magic-based systems distinguish English plain-text from German plain-text? Somebody could find it useful.

    A possible solution without external metadata would be an in-file header like XML and HTML, but I find it cumbersome.
    __

    --
    __
    Men with no respect for life must never be allowed to control the ultimate instruments of death.
    GW Bu
  167. file and OS/2 by Pseudonymus+Bosch · · Score: 2

    The file command did not look at the extension to get that data, it looked at the file. In Linux the program opening the file needs to be smart enough to know what it's opening, that's not hard.

    OS/2 Rexx scripts must start with a Rexx comment.
    Like:

    /* example.cmd Rexx Script to do something */
    a= 1
    [...]


    In spite of these "magic bytes", file can't distinguish them from C or C++ header and code files (.h, .c and .cpp)

    And file can call "English text" things that are not text and are not English.

    IBM OS/2 does implement (on FAT and HPFS filesystems) Extended Attributes. You have up to 64 kB associated to any file, where you can store attributes (official or your own), for example, URL where I downloaded it from, date of creation, date of last read, date of last update,...

    One of the official attributes is type, you can label a file as "text", "OS/2 command file", "DOS command file". You can even assign your own type.

    Some OS/2 programs (not those ported from Unix) can use them to ignore extensions.

    OS/2's Workplace Shell works both with extensions and file types. You can assign .jpg, jpeg, image/jpeg and JPEG file to the same or different program objects. You can even associate an extension to several programs. There can be conflicts as to which gets open by default, but the icon tells you which it will be.

    It is not perfect, because many programs ignore extended attributes. But I think it's a good idea.
    BeOS did it better because it has no limit to the size of the attributes.
    __

    --
    __
    Men with no respect for life must never be allowed to control the ultimate instruments of death.
    GW Bu
  168. Re:A bit OT question from non-hacker by Guy+Harris · · Score: 2
    Setting the permissions on a per directory basis. So that if I put a file in my www_docs, it'll be 644, if I put it in a directory where several people help editing web pages, then it gets 664, my personal stuff is 600, and so on.

    Some file systems that support access control lists (NTFS and the Solaris version of the BSD file system, I think) give directories a second access control list which is the ACL to give to files created in that directory. (I seem to remember that Multics had this - I think it originally had "common ACLs" for directories, which were combined with the ACLs of files in the directories to give the ACL that gives permissions for access to those files, and that those were replaced with an "initial ACL" of that sort.)

  169. You can get crash-proofness from ext3 now by Bruce+Perens · · Score: 2
    I've been running my laptop on ext3 for a month or so. It is mount-compatible with ext2 and provides journaling. Just add the patch to the kernel, create the journal file, and mount your old ext2 filesystem as ext3. Debian's current release is ext3-ready and properly runs the journal and refrains from running fsck after a crash. It does have lower performance than what is proposed here, especially in this early version, as it writes a lot of things twice. However, my laptop is not write-bound and the writes happen in the background, so I don't notice how much faster or slower they are, anyway.

    Phase tree filesystems sound like a better way to do this, but you don't have to wait. Get crash-proof today.

  170. Re:There are serious problems with this idea by spitzak · · Score: 2
    EXACTLY! The Unix decision is arbitrary. The only non-arbitrary decision is NO ATTRIBUTES. Certainly adding a few more arbitrary attributes is just going to make it worse!

    I would like to see a system where the file permissions, the file name, the date, everything is stored in the data in the file. In your attempt to disagree with me I think you reinforced my position.

    I think there is some work being done on this. Permissions are controlled by the parent directories as well as the file, since you are allowed to set the permission and user of your file to anything you want with this.

  171. There are serious problems with this idea by spitzak · · Score: 2
    The basic problem is that information vital to the use of the file is not stored in the data that you get with read()/write(). This makes it impossible to cleanly store this data on another system or to transfer it. Yes, you can "encode" (or "binhex") it, but if you do that, why not just store the encoded version on the disk, and remove a large and complex mess from the OS?

    In fact there is absolutely no reason for this information to be stored in any way that the OS sees. The data is only used by user-level programs (for instance a file browser that selects what program to launch).

    Another problem is that the id space gets used up quickly and then only commercial software vendors who talk to the official Linux ID assignment consortium can make new IDs. With magic bytes in the file, if there is a collision, you just make a more complex test for distinguishing files that looks at more bytes.

    The biggest problem is that there is zero chance that once you add this database feature that there will not be dozens or even thousands of new id/value pairs added to the system, and dozens of standards for encoding these so that files can be copied. I would much rather force everybody to use simple files and thus get all this mess into user space.

    Personally I feel that everything about the file, even its name, could be stored in the data somehow, though I'm not sure how. Some of the ReiserFS stuff is looking at this, I think, since the Unix overhead of name/permission/date is larger than most of the files they want to use.

    1. Re:There are serious problems with this idea by spitzak · · Score: 2
      The data would be completely visible to read(). Like you say, if this were not true you would lose the whole point of it. Yes this would require some apps to be rewritten to skip over it.

      In most cases I expect the format to be flexible enough that the data can be hidden in a comment in existing file formats. A good example is the "#!" syntax used by executable scripts in Unix.
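
      The "#!" case shows how little code it takes to pull such in-band metadata back out. A minimal sketch; the function name is made up, and it splits arguments on whitespace, which is a simplification of how a kernel treats the rest of the line:

```python
def interpreter_line(data: bytes):
    """Return the interpreter and arguments named on a "#!" first line, or None."""
    if not data.startswith(b"#!"):
        return None
    first_line = data.split(b"\n", 1)[0]
    return first_line[2:].decode("utf-8", "replace").strip().split()


print(interpreter_line(b"#!/bin/sh -e\necho hi\n"))  # ['/bin/sh', '-e']
print(interpreter_line(b"echo hi\n"))                # None
```

      Since "#" starts a comment in most scripting languages, the interpreter itself skips the line without being rewritten.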

    2. Re:There are serious problems with this idea by toh · · Score: 2

      Actually, permissions are necessary for insecurity - if you didn't have them, people would just be limited to working with their own files and never be aware of any others. I've used systems that worked this way (some VMS installations, the ancient THEOS). Permissions, as the name implies, allow you to relax security below that unsharing level and give other people a peek.

      But your main point is entirely correct and well-made. See some earlier spiel by me for one suggested solution I've been mulling over, an extensible tagged metadata format within the filesystem (actually HFS+ could be theoretically capable of this, but no one has ever used the third file fork AFAIK).

      --
      -- Life is short. Forgive quickly. Kiss slowly. ~ Robert Doisneau
  172. Re:Patents? This algorithm was published in 1977 by K-Man · · Score: 2

    Having multiple transactions active does not reduce the recursion. Recursive cloning of blocks is reduced by re-use, that is, by having large enough transactions that the same block may be updated many times, and not require branching at each update. Since having large transactions requires large flushes, the phasing is helpful, but not directly.

    Recursion could also be reduced by maintaining allocation information near the data in the inode/file tree. Since a transaction only involves cloning blocks and saving the resultant allocation changes, it would be possible to localize an entire transaction at a node fairly far down the tree. The only reason that the metaroot has to be cloned and updated now is that the allocation table is linked there.

    i.e.

    The current tree structure:

    metaroot
      allocation table
        subtable 1
        subtable 2*
        ...
      inodes
        inodes 2
        inodes 3
          index block
            ...data
          index 2
            ...data2*
    Changing data2 above would require changing something in, say, subtable 2 above. Since changing = moving, we recurse up (from the nodes marked with "*" above), generating cloned blocks with new pointers, until we identify a common parent node which can be updated in *one* atomic operation. In the above picture, that one block is the metaroot.

    If instead we have something like this:

    metaroot
      inode1
        allocation1*
        index1
        index2
          data1*
      inode2
        ...etc.

    The "*" nodes have a common parent at the inode1 block. We clone data1 into a new block listed in allocation1, and clone parent nodes until we find ourselves at inode1. At this point, we can flush the cloned blocks, and then overwrite inode1 with pointers to the new subtrees in one atomic operation.

    The idea in doing this is that free space tracked at each node would be close to the data at the node, so that data locality would be maintained, or at least helped. I'm not sure how well space could be managed in such a framework, however.
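
    The "clone up to a common parent, then switch one pointer" step can be modelled in memory. This is only a sketch, not Tux2's on-disk format: the Node class and clone_path() are made up, the block names are borrowed from the second layout, the nesting of data1 under index2 is an assumption, and for simplicity the common parent here is the metaroot.

```python
class Node:
    """A tree block: a name, optional data, and child blocks (never modified in place)."""

    def __init__(self, name, data=None, children=()):
        self.name = name
        self.data = data
        self.children = tuple(children)


def clone_path(node, path, new_data):
    """Return a copy of `node` in which the block reached by `path` holds new_data.

    Only blocks along the path are cloned; every other subtree is shared
    with the old tree, which stays intact and readable.
    """
    if not path:
        return Node(node.name, new_data, node.children)
    index = path[0]
    children = list(node.children)
    children[index] = clone_path(children[index], path[1:], new_data)
    return Node(node.name, node.data, children)


allocation1 = Node("allocation1")
index1 = Node("index1")
data1 = Node("data1", b"old")
index2 = Node("index2", children=[data1])
inode1 = Node("inode1", children=[allocation1, index1, index2])
inode2 = Node("inode2")
metaroot = Node("metaroot", children=[inode1, inode2])

# Build the new version off to the side: metaroot -> inode1 -> index2 -> data1.
new_root = clone_path(metaroot, [0, 2, 0], b"new")

# The single atomic step is switching which root counts as current.
current_root = new_root

print(metaroot.children[0].children[2].children[0].data)      # b'old'
print(current_root.children[0].children[2].children[0].data)  # b'new'
print(current_root.children[1] is inode2)                     # True
```

    If the machine dies before the final assignment, the old root still describes a complete, consistent tree. Stopping the cloning at inode1 instead of the metaroot would mean overwriting inode1 in place as that one atomic step.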

    --
    ---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger
  173. Re:Sounds great, but BFS... by pergamon · · Score: 2

    I don't remember all the specifics, but from what I do remember, the filesystem was really more of a DBMS rather than just a bunch of named inodes. The BeBoxes used to have a way, in firmware, to rebuild the indices of the database which would get corrupted every once in a while (this was actually a very quick operation).

    This was all rewritten, I believe, before any public (non DR) releases got to the general public...

    Sorry I can't remember much off the top of my head, but it's been a while since I wrote an app for a Be release before the FS rewrite.

    Here's some more info:

    http://www.68k.org/mirror/BeBook_DR7/StorageKit/index.html

  174. Re:Sounds great, but BFS... by pergamon · · Score: 2

    oops. perhaps i should have actually read through that old bebook before posting.

    anyway, this page has a short description of the API side of it at least:

    http://www.68k.org/mirror/BeBook_DR7/StorageKit/intro.html

    in any case, the current BeFS still has some DBesque traits, in that anything in the FS can have attributes, and the filesystem supports queries for those attributes. check the new stuff out here:
    http://www-classic.be.com/documentation/be_book/The%20Storage%20Kit/index.html

  175. WOW by IRNI · · Score: 2

    What more can you say? I personally think this will give journaling a run for its money. I use Reiser(advertisement)FS on my server at work because of a few uncontrollable things that happen. Server room isn't locked.... nothing I can do about it. So after all the "oops... I turned off the wrong machine, Larry"'s I get from people, turning on the machine and having it back up in a few seconds is crucial. I haven't noticed a performance hit, but we aren't a big company and our servers are not taxed too terribly much. But that's no reason to not do better. This filesystem looks like it offers stability and performance. I am glad to see it happening. :) Thanks.

    1. Re:WOW by scrytch · · Score: 2

      > You entrust your mission critical data to ReiserFS?

      MP3.com effectively entrusts its entire business to it. And backups of course. Once you can verify a backup, you should be able to restore at least the data (if not fs-specific metadata) to another filesystem in case it goes tits-up somehow. And the fact that it's open source means that while you personally might not have the skill to fix any such bugs (I sure don't), you can at least try to hire someone outside the original manufacturer who can.

      I would trust my data to ReiserFS more than any new version of NTFS or FAT.

      --
      I've finally had it: until slashdot gets article moderation, I am not coming back.
    2. Re:WOW by jovlinger · · Score: 2

      Really. You entrust your mission critical data to ReiserFS? I must admit I didn't think it was mature enough for that.

      How are the version updates to the fs code? Do you just recompile the kernel, or do you have to buy a new disk and copy the data from one format to the other?

      I'm constantly stepping on the powerstrip and flicking the main cutoff switch (don't ask -- small apartment) and killing my fileserver. Fsck of 25 gigs is no fun. It's getting to the point where I'm about to repartition and use apmd just to avoid rechecking all of it.

      yet to have ext2 lose me any info, tho.

  176. Re:Not Comfortable... by John+Whitley · · Score: 2
    > Stable filesystem or not, I'd still be running a filesystem check.

    This is not necessary in a properly designed filesystem. The only reason for fsck is that the f/s algorithms permit the filesystem image on disk to be in an invalid state. The goal of more modern f/s design is to ensure that before, during, and after all operations, the filesystem is in a valid state. I.e. no operation ever, even temporarily, corrupts the filesystem.

    E.g. journalling filesystems such as XFS perform the same constant-time check routine every time they start, to inspect and clean up the journal (and commit transactions, if necessary) in case we crashed last time. The journal may not be complete, and some modest amount of data may have been lost, but the filesystem is not corrupt.

  177. Re:TYPE & CREATOR CODES by AArthur · · Score: 2

    Magic Numbers are a good idea, but they are far from perfect. Like it or not, every once in a while, the magic number will be the same for multiple types of files. Or a single type of file will have many different magic numbers (as they start differently).

    Another problem with magic numbers is that apps can't own a certain type of file. For example, if you created a file with GIFconverter, you would probably want that app to open it, and not another one (that might be selected as the default for that file type).

    That said, Magic Numbers do work fairly well, but they aren't a be all end all solution to file to app matching.
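
    For reference, a minimal Python sketch of what magic-number matching looks like. The MAGIC table and the sniff helpers are illustrative names, not any existing tool's code, and the table covers only three well-known image signatures. GIF already shows the "one type, several magic numbers" case mentioned above.

```python
# Leading byte signatures for a few image formats.
MAGIC = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),          # same type, different magic number
    (b"\x89PNG\r\n\x1a\n", "PNG"),
]


def sniff(header):
    """Return the type whose signature starts `header`, or None."""
    for magic, kind in MAGIC:
        if header.startswith(magic):
            return kind
    return None


def sniff_file(path):
    """Read only the first few bytes of a file; the name is ignored."""
    with open(path, "rb") as f:
        return sniff(f.read(16))


jpeg_guess = sniff(b"\xff\xd8\xff\xe0\x00\x10JFIF\x00")
gif_guess = sniff(b"GIF89a\x01\x00\x01\x00")
text_guess = sniff(b"hello world\n")
```

    Note that nothing here says which application created the file, which is the ownership problem described above.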

  178. Re:this is not going to make microsoft happy... by GypC · · Score: 2

    OK, it is technically a journalling filesystem, but when an NTFS partition corrupts so badly that a journal replay won't fix it and you have no option but to reinstall... I would hardly call that commercial quality. Or even a real journalling filesystem.

    "Free your mind and your ass will follow"

  179. Group commits by Cato · · Score: 2

    DBMSs such as Oracle also gather multiple transactions (each a set of writes) and do a 'group commit' - the transactions are committed atomically by writing to a journal.

    The point is that you can group writes based on transactions and performance, or based purely on performance - Oracle and some journalling filesystems do the former, Linux and others do the latter.

    In both cases you end up with a long sequential write to the journal file - certainly Oracle claimed a big speedup in transactions per second when this was introduced.
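
    A toy Python sketch of the group-commit idea, under the assumption that a journal is just an append-only list and that one "flush" stands for one long sequential write. The Journal class and its method names are made up for illustration and do not reflect how Oracle or any filesystem implements this.

```python
class Journal:
    def __init__(self):
        self.records = []   # stands in for the on-disk journal
        self.pending = []   # transactions waiting for the next flush
        self.flushes = 0    # number of sequential writes issued

    def submit(self, writes):
        """Queue one transaction (a list of writes) without touching disk."""
        self.pending.append(list(writes))

    def group_commit(self):
        """Append every queued transaction in one sequential write."""
        if not self.pending:
            return 0
        batch, self.pending = self.pending, []
        self.records.extend(batch)
        self.flushes += 1
        return len(batch)


journal = Journal()
journal.submit(["debit A", "credit B"])
journal.submit(["update C"])
journal.submit(["insert D", "insert E"])
committed = journal.group_commit()
```

    Three transactions reach the journal with a single flush, which is where the claimed transactions-per-second gain would come from.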

  180. Re:TYPE & CREATOR CODES by Pig+Hogger · · Score: 2
    Why don't you say that you are essentially describing the filesystem of a Macintrash???

    --
    Americans are bred for stupidity.

  181. Re:Version control system by Lord+of+the+Files · · Score: 2

    Many commercial version control systems have this feature available - particularly older ones designed for UNIX. They will pretend to be an nfs export, and you can mount that and work on it.
    I think in general the conclusion was that it's not worth the trouble. It's not much harder for a user to run checkout.
    I don't think it's ever been done to the extent you describe (although, as other people have mentioned, VMS apparently stored old versions of files) because while reading could be done reasonably quickly, writing is slow (you have to run diff every time), a nuisance to implement, and uses a lot of space. Imagine the added cost of version control on (say) your mail spool. Or if you overwrite a large file.

    --

    God does not play dice - Einstein

    Not only does God play dice, he sometimes throws them where they

  182. Wouldn't it be nice.... by ge · · Score: 2

    if you could keep an old tree around in tux2 so you could get a consistent backup of your system without having to shut it down? I'm sure this will complicate freelist management, though: you can only free a block if all references to it are gone.

  183. Re:Version control system by joshv · · Score: 2

    The problem is how do you define a version? At the operating system level when the OS gets a request to write some physical block, does it count that one request as a new version, a string of requests as a new version? It really cannot know about 'versions' at the application level without some changes to the APIs, and that won't happen.

    What you could do is create manual 'check points' or snapshots. By default all disk writes go to an 'undo' log. The 'real' data is elsewhere on disk. When a file is requested the OS first looks for writes to that sector in the undo log and returns that data if present.

    At any point you could blow away the undo log and go back to the previous state. You could also 'commit' the undo log, writing it to the 'real' data area of the disk and starting with a fresh undo log.
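
    A rough Python sketch of that checkpoint scheme, assuming sectors are dictionary keys and everything lives in memory. LoggedDisk and its methods are hypothetical names, and commit here is not itself crash-safe; it only shows the read, rollback and commit paths.

```python
class LoggedDisk:
    """Block store with a pending-write log layered over the 'real' data."""

    def __init__(self, blocks):
        self.real = dict(blocks)  # sector -> committed data
        self.log = {}             # sector -> uncommitted data

    def write(self, sector, data):
        self.log[sector] = data   # never touches the real area

    def read(self, sector):
        # Pending writes shadow the committed copy.
        if sector in self.log:
            return self.log[sector]
        return self.real.get(sector)

    def rollback(self):
        self.log.clear()          # back to the previous checkpoint

    def commit(self):
        self.real.update(self.log)  # fold the log into the real area
        self.log.clear()


disk = LoggedDisk({0: b"boot", 1: b"old"})
disk.write(1, b"new")
after_write = disk.read(1)
disk.rollback()
after_rollback = disk.read(1)
disk.write(1, b"newer")
disk.commit()
after_commit = disk.read(1)
```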

    One could also imagine keeping more than one undo log. But this might get space prohibitive.

    AFAIK, vmware supports something like this with its virtual disks: you can choose to roll back all disk transactions when you shut down the virtual machine.

    -josh

  184. Re:Version control system by EricWright · · Score: 2

    I haven't used a VMS machine since the summer of 1993, but you've pretty much got it right. Each archived version was a complete copy (as opposed to some sort of smart diff file). purge was your friend, especially if you were working on a class project that had to be compiled over and over to kill all bugs!

    Eric

  185. Re:*BSD SoftUpdates provide crash resistance NOW by Straker+Skunk · · Score: 2

    He meant to say that the timestamp gets written into the kernel, thus making MD5 not feasible for comparing the earlier-built kernel to the new one.

    --
    iSKUNK!
  186. Re:That command would be 'purge' I believe... by DGolden · · Score: 2

    It's been said before but anyway:

    How about making "gb" (garbage) or something a short script that moves files to a trashcan location, say /var/trash, cloning the directory hierarchy as it goes (i.e. /var/trash/usr/ etc.)? If you want to be smart, make it deal with different types of files appropriately, and make sure that it's got correct permission control. Then just train yourself to type "gb" instead of "rm", unless you really mean it. Schedule a cron job to delete the contents periodically, and/or make a script "emptytrash" that runs rm -rf /var/trash/*

    It would be fairly trivial to extend this to /var/trash.n/ where n is the n-th previous "gb", so that each time you delete, it gets versioned out. The interaction could go, say:

    $ echo "Jeans" >/home/david/pants
    $ gb /home/david/pants
    pants trashed.
    $ more /var/trash/home/david/pants
    Jeans
    $ echo "Baggy" >/home/david/pants
    $ gb /home/david/pants
    pants trashed.
    $ more /var/trash/home/david/pants
    Baggy
    $ more /var/trash.1/home/david/pants
    Jeans

    Have a ugb command to reverse the delete.
    Yes, the idea needs work, and sounds space wasting, but HD space is cheap these days.
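
    One way the "gb" idea could look, sketched in Python rather than shell. The function names are made up, the generation naming (trash, trash.1, trash.2, ...) follows the session above, and the demo uses a temporary directory instead of /var/trash so it runs without special permissions. Permission handling and "ugb" are left out.

```python
import os
import shutil
import tempfile


def trash_dir(trash_root, generation):
    # Generation 0 is trash_root itself, generation n is trash_root.n
    return trash_root if generation == 0 else f"{trash_root}.{generation}"


def gb(path, trash_root="/var/trash"):
    """Move `path` into the trash, shifting older trashed copies of the
    same path to trash_root.1, trash_root.2, and so on."""
    rel = os.path.abspath(path).lstrip(os.sep)
    # Count how many generations of this path already exist.
    n = 0
    while os.path.exists(os.path.join(trash_dir(trash_root, n), rel)):
        n += 1
    # Shift them up, oldest first, so nothing is overwritten.
    for gen in range(n, 0, -1):
        src = os.path.join(trash_dir(trash_root, gen - 1), rel)
        dst = os.path.join(trash_dir(trash_root, gen), rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.move(src, dst)
    dst = os.path.join(trash_dir(trash_root, 0), rel)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.move(path, dst)
    return dst


# Replay the "pants" session inside a scratch directory.
base = tempfile.mkdtemp()
trash = os.path.join(base, "trash")
pants = os.path.join(base, "pants")
for text in ("Jeans\n", "Baggy\n"):
    with open(pants, "w") as f:
        f.write(text)
    gb(pants, trash)

rel = os.path.abspath(pants).lstrip(os.sep)
with open(os.path.join(trash, rel)) as f:
    newest = f.read()
with open(os.path.join(trash + ".1", rel)) as f:
    previous = f.read()
```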

    --
    Choice of masters is not freedom.
  187. Re:I am not wrong. You didn't read close enough. by mindstrm · · Score: 2

    1) When windows changes the swapfile size, this causes fragmentation.

    2) If you know the difference between windows & linux virtual memory, why do you think that Tux2 has *anything whatsoever* to do with swap partitions? It's a filesystem... not a partition.

    3) One of the first 'recommendations' for tuning Windows servers is to lock the size of the swap file so it *doesn't* resize, as resizing causes fragmentation and hence, slowdown.

    4) Adding additional swap files to linux has no worse an effect than enlarging the swap file on windows.

    5) I said nothing about *partitions* I said swap *FILE*. Linux can do BOTH.

    Downtime? Creating a new swap file takes seconds, and causes *NO* downtime whatsoever; activating it is virtually instant, with a single command.

  188. You are confusing two issues. by mindstrm · · Score: 2

    You are confusing two issues.

    1) Windows does not use a swap 'partition', it uses a swap 'file'. Linux can use either. And if you use a swapfile, you cannot necessarily resize it on the fly, but you can make another one and add it in.. effectively the exact same thing.

    2) How linux deals with swap (be it file or partition) has *nothing* to do with Tux2, Ext2fs, NTFS, or any other filesystem. They are not related in any way whatsoever.

    3) If you find your swap partition is too small, simply make a swap file and activate it with swapon, on the fly, to add additional swap space.

  189. Re:Sounds great, but BFS... by EvlG · · Score: 2

    I'm curious about your comment that BFS is "no longer a true database."

    What did you mean by that? How was it a database? Inquiring minds want to know :)

  190. Re:Version control system by hanway · · Score: 2

    Rational ClearCase installs a file system with extensive versioning properties, but it would be more analogous to NFS than any local disk FS, since transactions are handled by a separate database server. Yes, it's a good idea, even though it's slow (and expensive).

  191. Re:Not Comfortable... by rcw-work · · Score: 2
    You will still be able to do that, but as for the people running servers with disk arrays that are a significant fraction of a terabyte, well, they need the choice of being able to bring the server up now rather than waiting several hours (or days) for an fsck to finish.

    For an array where you've promised 99.99% uptime (an hour a year), you simply can't check it like that. You wait until you can upgrade the array to new hardware that you can start with a fresh filesystem on.

    For the less extreme circumstances, it's still nice to be able to plan downtime for this. That way you can schedule it to automatically happen Thanksgiving day instead of when someone trips over the power cord.

    And yes, you are correct that having filesystem integrity does not necessarily mean you also have file integrity. You can't do much about that unless you go the VMS route of keeping versions of files around.

  192. off_t by SpinyNorman · · Score: 2

    The old fseek(), ftell() which used a long to represent the file offset are being replaced by fseeko() and ftello(), which use off_t, specifically as part of phasing in Large File Support (LFS). This API is available on Sun (others too, I'm sure) as well as Linux.

    In order to enable fseeko()/ftello(), you should compile your code with -D_LARGEFILE_SOURCE, which will give you a default (currently 32-bit) off_t. If you add -D_FILE_OFFSET_BITS=64, then off_t will be 64-bit, and fseeko()/ftello() will be redefined as their 64-bit cousins. These definitions are part of the LFS standard.

    glibc6 already has 64bit support, but of course you also need a new kernel (2.4) to get the >2GB support. AFAIK there's no 2.2.x backport.

    BTW Mandrake 7.1 has a buggy stdio.h that doesn't support _LARGEFILE_SOURCE (I believe it's been fixed in a more recent version). You can use -D__USE_UNIX98 instead to enable fseeko()/ftello() support.

  193. Re:Version control system by sprag · · Score: 2
    The filesystem is called Files-11. The format on the disk is ODS-2 or ODS-5 (depending on the version).

    It is a nice feature, and it is only as space intensive as you want it to be. By default, there is no limit to the number of backups, but using set file/version_limit=2 foo.bar you can limit it to (automatically) 2 versions on the disk. The count is always incremented...so you can have foo.bar;32 and foo.bar;33 and when a change is made, foo.bar;34 is created and foo.bar;32 is erased.
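
    The version-limit behaviour described above, as a small Python sketch. save_version is a hypothetical helper that only models the bookkeeping (which version numbers stay on disk); it says nothing about how Files-11 actually stores them.

```python
def save_version(versions, limit):
    """`versions` is the sorted list of version numbers on disk for one file.
    Create the next version, then drop the oldest ones beyond `limit`.
    A limit of 0 means "no limit"."""
    next_version = versions[-1] + 1 if versions else 1
    versions = versions + [next_version]   # the count always increments
    erased = versions[:-limit] if limit else []
    kept = versions[-limit:] if limit else versions
    return kept, erased


# foo.bar;32 and foo.bar;33 exist, the limit is 2, and the file is changed.
kept, erased = save_version([32, 33], limit=2)
unlimited_kept, unlimited_erased = save_version([1, 2, 3], limit=0)
```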

    There have been times when this would be nice on unix. Didn't RMS put VMS-style versioning on the list of reasons why a new OS was needed when the HURD first appeared?

  194. Re:TYPE & CREATOR CODES by johnrpenner · · Score: 2

    > Sounds nice, but what if, after creating the file,
    > I don't want to open it up with whatever application
    > created it to start with?

    it's easy to change the file extension by renaming the file. likewise, it's easy to change the file TYPE & CREATOR with a command-line or gui 'Get File Info' type of command.

    > Try renaming, for instance, a JPEG file to have a .txt
    > extension -- and xv handles it fine.
    >
    > Why?
    >
    > Because the first few bytes in the file conform to what is
    > expected of a JPEG. Open one up -- and there's a header
    > inside. It really DOES NOT CARE about the extension.
    >
    > And, this is much saner than altering the filesystem...

    this just formalises the process and makes it more reliable than depending on the not always reliable scanning of the first few bytes of a file. it's faster to read the type & creator off the directory than to scan the first few bytes of the file itself - you don't have to open it for a read then.

    this technique has been used successfully for over ten years in the mac's HFS and HFS+ filing systems - so its reliability (of this one technique - not the whole OS itself!!) has been proven to be effective in eliminating the need for a registry.

    regards,
    john.

  195. Patents? This guy works in Berlin. by Simon+Brooke · · Score: 2

    Uhhmmm.... why are we worried about patents, here? The guy works in Berlin. No European country currently allows patents on software (although we've a fight on our hands to keep things that way). Oh, you mean you lot in the benighted US of A won't be able to (legally) use it? Write to your representative and get the law changed. Software should not be patentable, and you (and, I think, the Japanese) are about the only places where it is.

    --
    I'm old enough to remember when discussions on Slashdot were well informed.
  196. That command would be 'purge' I believe... by devphil · · Score: 2

    I got my introduction to the Internet -- email, Usenet, FTP, you name it -- on a VAX running VMS. I dearly missed the versioning filesystem when I moved to Unix, especially when I discovered that when you type 'rm' you had better by damn mean it.

    Then I discovered that Unix actively encourages exploration and experimentation, whereas VMS seems to place many obstacles in the way. I never looked back. :-)

    --
    You cannot apply a technological solution to a sociological problem. (Edwards' Law)
  197. Re:Version control system by jovlinger · · Score: 2

    > Has anyone produced a file system that was essentially a version control system

    This is what net app has, I think.

    Here at CCS, I believe our NFS needs are served by a net app server. Whatever it is serving them, it does automatic snapshots every hour, so that at any point you can access the .snapshot directory and find the state of your files one, two, three, four hours ago, and one, two, or three days ago, or one week ago.

    So not quite a version system, but mighty cool.
    And really useful. It lets you dive in with exploratory programming (as long as you wait for the top of the hour so the stable code is snapshotted), 'cause restoring your files is as easy as cp.

    I agree it would be cooler if you could request a snapshot of this tree now please, but systems informs me that this is not possible/too much hassle.

  198. Re:*BSD SoftUpdates provide crash resistance NOW by jovlinger · · Score: 2

    I thought it was very cool how Tux2 is implemented as a mount-time upgrade over ext2. This really increases my confidence in the system, as presumably it reuses all the low-level integrity routines.

    Furthermore, should others report a bug before I suffer data loss, I can revert to plain old boring ext2 by just editing my fstab. Now that is a feature you don't get with other journaling fs (ext3? I'm not sure).

  199. Re:Version control system by jovlinger · · Score: 2


    I had completely forgotten about that.
    Reminds me of the std "blue sky" storage solutions where there is no distinction between cache/volatile/persistent storage; each level is merely a faster cache for the next level down.

    Has anyone made this work, on a performance basis, or is it inherently blue sky?

    BTW, go ahead and mod up this parent (post #57).

  200. Re:Version control system by jovlinger · · Score: 2

    You don't have to guarantee that you can rollback a file; just being smart about not overwriting previous versions until you have to (and then in a LIFO/LRU order) should be a neato 90% solution.

  201. Re:*BSD SoftUpdates provide crash resistance NOW by jovlinger · · Score: 2

    Erm. I take that back. Further reading hints that while the modification is performed at mount time, it is apparently performed once only and modifies the underlying disk. It is unclear whether this is reversible.

  202. this is not going to make microsoft happy... by romco · · Score: 2
    "Phillips says that Tux2 offers Linux users the chief advantage of a journaling filesystem (namely, keeping files safe in the event of a system crash) but without a journal, and does so more efficiently."

    From the Microsoft website:

    "Linux lacks a commercial quality Journaling File System. This means that in the event of a system failure (such as a power outage) data loss or corruption is possible. In any event, the system must check the integrity of the file system during system restart, a process that will likely consume an extended amount of time, especially on large volumes and may require manual intervention to reconstruct the file system."



    I wonder what Microsoft will say if Tux2 takes off?
    --
    AdFuel
    1. Re:this is not going to make microsoft happy... by GypC · · Score: 3

      That's a pretty funny quote, especially since NTFS is not a journalling filesystem.

      What a pack of liars. I don't see how those Marketing guys can look at themselves in the mirror.

      "Free your mind and your ass will follow"

  203. *slower* than ext2? by be-fan · · Score: 2

    The next generation, be-all, end-all filesystem for Linux is...SLOWER? than Ext2? Are you kidding me? I don't really understand what problem he is trying to solve. Current journaling filesystems are a good deal FASTER than ext2. The one I have most experience with is BeOS. On Bonnie (one of the only disk tests available on BeOS) BFS beats ext2 by nearly 20% with about 40% less processor usage (I'll post the actual results if you want) on the read tests, and by a smaller margin on the write tests. It has problems on the read-back tests, but that is due to BeOS's VM issues. As for ReiserFS, I've used that too and that is also faster than Ext2. So if journaling FSs are faster than the current ones, why bother making one that is slower?

    --
    A deep unwavering belief is a sure sign you're missing something...
  204. Re:Version control system by norton_I · · Score: 2

    Tux2 will eventually support snapshots as well. Basically, the way it works is that you have a "snapshot" command that clones the metadata tree. Then, like the name implies, you can access the filesystem as it was the moment the snapshot was made.

    First, however, we are going to do the data integrity.

    VMWare works at the block device level for its rollback system--all dirty blocks on a device are stored in memory while VMWare is running, then at shutdown you can either discard them or flush them. While it allows you to bail out of a horked session, it makes no guarantees about data integrity while the block device is actually being written to.

  205. wheel reinvention by toh · · Score: 2

    My reading of the tutorial really does make this sound a lot like running softupdates on UFS/FFS - same concepts, same benefits and guarantees, somewhat different methodology and wholly different implementation. It looks like one should be able to do something equivalent to snapshots to reclaim lost junk while other stuff is going on, that being one of the cool things you get for free with softupdates's guarantees. One thing UFS+softupdates offers that Tux2 apparently can't is compatibility with pre-softupdates filesystem code, mostly because 4.4BSD UFS was more cleanly designed in the first place. That means a UFS filesystem flagged for softupdates will still work with any preexisting UFS implementation, while a Tux2-converted ext2fs is basically useless to an old Linux box.

    At the risk of waking the Linux and BSD zealots and trolls, why do this at all? The view from above has always made ext2 look very much like a middling attempt to reproduce what UFS already did very well, while at the same time creating a lot of installed ELFish Unix boxes that are annoyingly incompatible with it. I should be able to boot Linux, FreeBSD, or even Solaris from the *same* filesystem, just as I used to boot different Suns from the same disk.

    Why not just finish porting a reliable UFS implementation, incorporate the (now firmly BSD-licenced) softupdates code into it, and make that the default filesystem for new Linux installations? You can still have XFS or some other journalling system where it makes sense, but let ext2 die in peace. The only dubious benefit it ever offered over UFS was the dangerous performance increase from willy-nilly asynchronous writes, and now that's not an issue (both softupdates and phase tree can be nearly as fast while also being safe).

    You might be able to teach an old dog some new tricks, but it's just sick to try and do the same with a dead one. ;)

    --
    -- Life is short. Forgive quickly. Kiss slowly. ~ Robert Doisneau
  206. Re:Wrong: Desktop Database by toh · · Score: 2

    The Desktop database really has nothing to do with HFS[+]. It's maintained and used by the Finder, which is merely an application. HFS and HFS+ just provide slots in the directory entry to store the type and creator attributes, which the Finder (and a lot of other Mac apps) then makes use of. This is why the command-option signal you mention takes effect when the Finder loads (or sees a new volume mounted), rather than at boot time.

    The Mac's four-letter types and creators work very well, but they're unnecessarily cryptic by current standards of disk space, memory, and CPU register width. The BeFS uses MIME types for filetypes, but doesn't record creators in the filesystem AFAIK (it uses an external database like the Windows registry - correct me if I'm wrong!). What I'd really like to see in some upcoming filesystem is a more flexible scheme that can store an arbitrary number of tags, so you could flexibly encapsulate whatever metadata came with the file and is relevant to the OS - including all of the filesystems discussed above (none of which can completely describe a file from the others).

    --
    -- Life is short. Forgive quickly. Kiss slowly. ~ Robert Doisneau
  207. Re:Version control system by toh · · Score: 2

    > First, however, we are going to do the data integrity.

    Uh, don't you have to do the data integrity first? If you didn't have a guarantee that the only things that could be wrong were unallocated blocks and inodes, working from a metadata snapshot could be painful (or just a waste of time).

    --
    -- Life is short. Forgive quickly. Kiss slowly. ~ Robert Doisneau
  208. Re:Is this filesystem immune to the "rhnsd factor" by blakestah · · Score: 2

    Hmm...

    Labelled a troll for pointing out accurate areas in which RH and Mandrake distributions look like they were rolled by rookies.

  209. Re:Version control system by jmv · · Score: 2

    I just thought that what I would find much more useful is a filesystem that simply knows how to work with other version control systems, like CVS. You could tell the filesystem that for a certain number of files in your CVS, it should ask cvs to commit every time the file is changed. That has two advantages:

    1) You select which files are version-controlled. Most of the files on a fs shouldn't be.

    2) Your history is compatible with other version control systems. (and can be remote, ...)

  210. Re:Version control system by jmv · · Score: 2

    > Is this a good idea?

    IMHO, it's a very bad idea. I don't know about speed; the problem is capacity loss. Imagine, it basically means that you cannot delete files from the drive. Simple operation: download source code, untar, compile and install, delete source code. That (now useless) source code will live on your drive forever. Also, when your disk is full, it's full, and it'll stay full until you buy a larger one. Anyway, you get the idea.

    You could have a RC filesystem that has a "real delete" option, but then why not just use CVS, as for most of the files, you don't want revision control.

  211. (OT)Directory permissions by yerricde · · Score: 2

    > So that if I put a file in my www_docs, it'll be 644, if I put it in a directory where several people help editing web pages, then it gets 664, my personal stuff is 600, and so on.

    It's possible on any filesystem that supports POSIX permissions (not FAT32). All you have to do is write a shell script to do chmod -R on the directories in question.
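
    A sketch of such a script, written in Python instead of shell. apply_file_mode and the policy table are illustrative assumptions; unlike a bare chmod -R, this touches only regular files, so directories keep their execute (search) bits. It fixes modes after the fact (say, from cron) rather than at file creation.

```python
import os
import stat
import tempfile


def apply_file_mode(root, mode):
    """Set `mode` on every regular file under `root`; directories and
    symlinks are left alone."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            os.chmod(path, mode)
            changed.append(path)
    return changed


# Hypothetical per-directory policy, using the modes from the question.
policy = {"www_docs": 0o644, "shared": 0o664, "private": 0o600}

# Demo in a scratch directory: one file per policy directory.
base = tempfile.mkdtemp()
modes = {}
for dirname, mode in policy.items():
    directory = os.path.join(base, dirname)
    os.makedirs(directory)
    path = os.path.join(directory, "page.html")
    with open(path, "w") as f:
        f.write("hi\n")
    apply_file_mode(directory, mode)
    modes[dirname] = stat.S_IMODE(os.stat(path).st_mode)
```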

    --
    Will I retire or break 10K?
  212. e2fsck good by dizee · · Score: 2

    Actually, the time it takes for e2fsck to complete if a box goes down gives me time to check slashdot. ;)

    Mike

    "I would kill everyone in this room for a drop of sweet beer."

  213. A bit OT question from non-hacker by KjetilK · · Score: 2

    Talking about filesystems, I'm wondering about something that would seem like a useful feature: Setting the permissions on a per directory basis. So that if I put a file in my www_docs, it'll be 644, if I put it in a directory where several people help editing web pages, then it gets 664, my personal stuff is 600, and so on. It's not possible where I sit now, I've bothered every sysadmin I've met about it, is it possible on some file systems in widespread use? Is it something for future file systems, like this one? Or is it simply a Bad Idea [tm]?

    --
    Employee of Inrupt, Project Release Manager and Community Manager for Solid
  214. LOL by 2nd+Post! · · Score: 2

    Microsoft would probably say, if Tux2 takes off...

    "Linux lacks a commercial quality Journaling File System."

    The nick is a joke! Really!

  215. Nice sentiments, but... by Not+Fragile · · Score: 2

    Having grown up with many different OS's and their file-system issues - AS/400, ICL CSM, Unix, DOS, Windows etc. I have never once had a screwed file system that I have not been able to recover from quickly using the tools that the OS provides.

    One demonstration that we used to do on a regular basis to show the power of our crash recovery in a Progress application was to pull the plug on a Xenix machine, mid-transaction! In hundreds of demos, the worst issue we had was a power supply that started to make "odd" noises.

    Now if you backup your system whenever you make changes, and you distribute your file systems over multiple platters, and ensure that the crash recovery processes are in place, you will be fine.

    I welcome crash recovery tools, and even file-systems that do not shit themselves if you "pull the wrong plug", but simple things like labels that say "Do not pull this plug", and UPS devices, even battery-backed caches on disk controllers, veritas file systems, RAID 10 mirrors etc. all help, and negate the need to develop this kind of stuff.

    FWIW my Linux boxes have never screwed their filesystems; they have many of the above precautions implemented, but even then, there are no issues.

    Now if you want to invent a new filesystem, look at change control, look at saving OS files that have changed and easy go-backs, look at mirrors. Oh most of that can be done already.....

    ./nf

    --
    Not Fragile
  216. Not Comfortable... by Accipiter · · Score: 3
    Maybe it's because I grew up on unstable filesystems, but....

    The Tux2 filesystem project has the following goals:

    [SNIP]

    Eliminate the need to perform fsck after an interruption
    [SNIP]


    If I was saving a file, and my computer decided to take a shit and die on me, I'd want to run an integrity check on the file system whether it's stable or not. If for nothing else, then for my own sanity. I mean, you were in the middle of saving a file. If that was a large file, and the computer died.....well, logically, the saved data should be recoverable. However, experience says that the file would most likely be corrupted.

    Stable filesystem or not, I'd still be running a filesystem check. (When Windows 95 died on me, I ran scandisk as soon as it was finished booting - even before OSR2. Just to be SURE everything was cool.)

    --

    -- Give him Head? Be a Beacon?
    (If you can't figure out how to E-Mail me, Don't. :P)

    1. Re:Not Comfortable... by mindstrm · · Score: 3

      But that's the whole point of how this new stuff works: You don't run an fsck. There is absolutely no point in doing so.

      All the complex mechanisms behind the filesystem ensure that, if the FS thinks a file is there, then it *IS* there, period. If the power was yanked halfway through writing a file, it simply won't be there.

      In the case of a Journalling system, this works because, instead of a fsck, you simply look at the journal. If there is stuff there, you know what hasn't been written (and now can't be, cause you crashed) and you can make the appropriate adjustments.
      In the case of phase tree, it's even simpler to check: it appears to work something like... the new trees are written backwards, root last.. so if the root is there, the write is complete. If it's not, you don't see it anyway!
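
      To make the journal case concrete, here is a minimal Python sketch of replay after a crash. It is an illustration only: the record layout (`write` and `commit` tuples) and the `replay` helper are invented for this example and do not reflect the on-disk format of any real journaling filesystem.

```python
def replay(journal, disk):
    """Apply only those journaled transactions that reached their commit record.

    journal: list of tuples, either ("write", txid, block_no, data)
             or ("commit", txid). Hypothetical layout, for illustration only.
    disk:    dict mapping block number -> data, standing in for the device.
    """
    committed = {rec[1] for rec in journal if rec[0] == "commit"}
    for rec in journal:
        if rec[0] == "write" and rec[1] in committed:
            _, _txid, block_no, data = rec
            disk[block_no] = data
    return disk


# Transaction 1 committed before the crash; transaction 2 did not.
journal = [
    ("write", 1, 10, "inode for a.txt"),
    ("write", 1, 11, "contents of a.txt"),
    ("commit", 1),
    ("write", 2, 12, "inode for b.txt"),
    # power lost here: no ("commit", 2) record was written
]
disk = replay(journal, {})
```

      After the replay, `a.txt` is fully there and `b.txt` simply is not, which is the "if the FS thinks a file is there, it is there" property described above.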

  217. You still need an fsck program. by PeterM+from+Berkeley · · Score: 3

    Unexpected power-off is NOT the only thing which can happen to a filesystem. What about these disasters?

    1) Bad block takes out part of your disk unexpectedly.
    2) Your OS screws up and spews a mess onto your filesystem before it crashes. (there ARE bugs in the kernel!)
    3) You have a minor headcrash which takes out one of your tracks, but the disk is still functional.

    What're you gonna do? Tux2 isn't gonna help you.

    You could restore your latest dump. You could
    also attempt to repair the filesystem.

    You need fsck or some other means of filesystem repair.

  218. TYPE & CREATOR CODES by johnrpenner · · Score: 3



    TYPE & CREATOR CODES

    i really hope they use this excellent opportunity to
    be able to get rid of REGISTRY TYPE TRACKING once and
    for all.

    basically all those little three letter extensions
    that are used to keep track of the file type like
    .txt .tif .jpg etc. are a kludge.

    if you simply make one extra entry in the file directory
    system (in addition to filename, date, block pointers) itself:
    TYPE & CREATOR -- then you will never again need to keep track
    of file types externally by a sort of 'Registry' file.

    so, if you have a TIFF file, you don't need to put .tif
    on the end of it, simply, you would have the type and
    creator of the file set to: 'TIFF' and '8BIM' which would
    mean that it's a TIFF file, and it should be opened by
    photoshop if in a GUI you go and double-click it.
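
    a rough Python sketch of what such a directory entry might look like. everything here is made up for illustration: the `DirEntry` fields, the `APPS` table (a stand-in for however a GUI would map creator codes to programs) and the `app_for` helper are assumptions, not part of any existing filesystem.

```python
from dataclasses import dataclass


@dataclass
class DirEntry:
    name: str
    type_code: str     # four-character file type, e.g. "TIFF"
    creator_code: str  # four-character code of the application that made the file


# Hypothetical creator-code -> application table kept by the GUI.
APPS = {"8BIM": "photoshop"}


def app_for(entry, default="ask the user"):
    """Pick the application to launch on double-click, based on the creator code."""
    return APPS.get(entry.creator_code, default)


# No ".tif" in the name: the type travels with the directory entry itself.
scan = DirEntry(name="holiday scan", type_code="TIFF", creator_code="8BIM")
```

    here `app_for(scan)` picks photoshop from the creator code alone, so renaming the file cannot separate it from its type.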

    this approach makes it much more difficult for any
    accidental SEPARATING of the file type info from the
    info that determines which app should open it - and thus
    makes the user-experience and OS less prone to error and
    frustration.

    it would be simple to add - if only someone bothered to
    put it in now - while the system is being determined.

    please consider this.

    regards,
    johnrpenner@earthlink-NOSPAM-.net

  219. This is terrible! by Lepidoptera · · Score: 3

    First they don't allow dogs in dorms, now our computers won't eat our papers! How am I ever going to get another paper extension again???

  220. Sounds great, but BFS... by dayeight · · Score: 4

    The BeOS Filesystem, which is now at least five years old, has been offering these sorts of essentials all along. The only real regret is that it's no longer a true database....

    Still, this beats the pants off of FAT :)

  221. Re:Version control system by phutureboy · · Score: 4

    VMS has had this for many, many years. Files are listed as such:

    ex: README.TXT;4 would be version 4 of README.TXT

    There's a command you can type to purge all but the 'x' most recent versions, but I don't remember what it is, as I'm actively trying to forget I ever even used VMS. Anyway, you could really eat up some disk space if you didn't run this command every so often.
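
    The effect of that cleanup can be sketched in Python. This is not the VMS command itself, just a toy `purge` function (a name chosen for this example) that works on a list of `NAME;version` strings:

```python
from collections import defaultdict


def purge(filenames, keep=1):
    """Return the names that survive when only the `keep` highest
    versions of each file are retained."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    versions = defaultdict(list)
    for name in filenames:
        base, _, ver = name.rpartition(";")   # "README.TXT;4" -> ("README.TXT", "4")
        versions[base].append(int(ver))
    survivors = []
    for base, vers in versions.items():
        for ver in sorted(vers)[-keep:]:
            survivors.append(f"{base};{ver}")
    return survivors


listing = ["README.TXT;1", "README.TXT;2", "README.TXT;3", "README.TXT;4", "LOGIN.COM;7"]
kept = purge(listing, keep=2)
```

    With `keep=2`, versions 1 and 2 of README.TXT are dropped and everything else stays.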

    I always found the versioning to be a pain in the ass to deal with, but I guess it did come in handy occasionally. I think the negatives outweigh the benefits though.

    --

  222. Re:Patents? This algorithm was published in 1977 by dpk · · Score: 4

    The Phase Tree algorithm is actually an improvement on this idea. Instead of recursively updating trees for every file change, you keep 3 'phases' of the tree and cache which blocks have changed - then atomically write the metablock (et al.) after X changes or X milliseconds. This drastically reduces the amount of work that must be done and removes the constant recursion problem. This is better described in the info given on the Tux2 site.
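
    The "after X changes or X milliseconds" part can be sketched roughly as follows. This is a toy model under loud assumptions: `PhaseBatcher`, its thresholds and the caller-supplied clock are invented for this example, it tracks only a single recording phase rather than three, and it says nothing about how Tux2 actually lays out its metablocks.

```python
class PhaseBatcher:
    """Collect dirty blocks and flush them in one commit per phase (toy model)."""

    def __init__(self, max_changes=3, max_age_ms=100):
        self.max_changes = max_changes
        self.max_age_ms = max_age_ms
        self.dirty = set()          # blocks changed in the current phase
        self.phase_start_ms = None  # time of the first change in this phase
        self.commits = []           # one sorted block list per metablock write

    def change(self, block_no, now_ms):
        if self.phase_start_ms is None:
            self.phase_start_ms = now_ms
        self.dirty.add(block_no)    # repeated changes to a block cost nothing extra
        too_many = len(self.dirty) >= self.max_changes
        too_old = now_ms - self.phase_start_ms >= self.max_age_ms
        if too_many or too_old:
            self.commit()

    def commit(self):
        if self.dirty:
            self.commits.append(sorted(self.dirty))
        self.dirty = set()
        self.phase_start_ms = None


b = PhaseBatcher(max_changes=3, max_age_ms=100)
b.change(5, now_ms=0)
b.change(5, now_ms=10)    # same block again: still one dirty block
b.change(9, now_ms=20)
b.change(7, now_ms=30)    # third distinct block: phase is committed
b.change(1, now_ms=40)
b.change(2, now_ms=150)   # phase is older than 100 ms: committed
```

    Six changes end up as two commits instead of six separate walks up the tree.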

    Put your hand in the puppethead

  223. Correct Link to "Tux2" by great+throwdini · · Score: 4

    Appears to be this:

    http://innominate.org/~phillips/tux2/

    Two els

  224. Another case... by 2nd+Post! · · Score: 4

    Of a user trained by the machine, and not the machine designed to accommodate the user!

    Of course there should always be system integrity checks available to the user for the paranoid among us (scandisk, fsck, etc)...

    But one would imagine a properly designed computer system has the capability of *never* having corrupted data! The machine would be pointing out to the user that FileA.ext was lost due to problems, and that the user needs to check on the integrity of the data, or that the data seems to be okay, does the user want to double check, or that nothing seems to be wrong.

    It's like... driving your car to the grocery, and then checking the oil, air, gas, transmission fluid, and brake fluid. The analogy is broken because the car didn't die, ala Windows, but it should be that the machine should be smart enough to tell you when something is wrong. I think.

    The nick is a joke! Really!

  225. Patents? This algorithm was published in 1977 by K-Man · · Score: 5

    I read about tux2 a few weeks ago, and noticed that it was implementing an algorithm I had read about a few weeks before, in a book on database architecture. Hopefully the discovery of some prior art will counter any patent claims.

    Tux2's reliability algorithm essentially goes as follows:

    1. At the beginning of a transaction, the "metablock" (including the block allocation table) at the root of the filesystem tree is copied into a buffer.

    2. Whenever a block in a file is updated, the updated image of the block is written to a newly allocated block, and the "new" metablock is updated with the new allocation. Blocks pointing to the old block may also be updated, in recursive fashion, eventually copying and updating an entire subtree from the original. The blocks in the "old" subtree are marked as free in the new metablock. The newly allocated blocks can live in memory, but must be written to disk before commit.

    3. At commit time, the new subtree replaces the old one. This operation simply involves overwriting the original metablock with a new one, which contains pointers to the new subtree as well as to the other subtrees which have not changed. If this operation does not complete, the complete picture of the old metablock, the old subtree linked to it, and free blocks where the new subtree was written, is maintained. If the operation does complete, the new image of the filesystem with the new, updated subtree, and free blocks where the old subtree used to be, is obtained.

    This is a good algorithm, and it's the only way to achieve atomicity and reliability without any logging, but it does have a few tradeoffs. Each update necessitates allocating a new block, so, for instance, changing one byte in the middle of a 2G, contiguous file will require allocating a block at least 1G away (and putting a hole where the old block was). There is also a ripple effect as pointers are updated up the tree, so changing one byte of data may well mean cloning a block, then cloning the blocks that point to the block, and so on up to the root.
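
    Steps 1 to 3 and the ripple effect can be sketched in a few lines of Python. This is a toy in-memory model, not Tux2's on-disk layout: `CowStore` and its methods are names invented here, a dict stands in for the disk, and marking the old blocks as free is left out.

```python
class CowStore:
    """Toy copy-on-write block store with a single root pointer."""

    def __init__(self):
        self.blocks = {}    # block number -> payload (dict of children, or file data)
        self.next_free = 0
        self.root = None    # stands in for the metablock's pointer to the tree

    def alloc(self, payload):
        n = self.next_free
        self.next_free += 1
        self.blocks[n] = payload
        return n

    def update(self, path, data):
        """Write `data` at `path`, cloning every block on the way up to the root.

        Returns (new_root, cloned) without touching self.root, so a crash
        before commit() leaves the old tree fully intact.
        """
        cloned = []

        def clone(block_no, remaining):
            if not remaining:
                new = self.alloc(data)                  # new image of the data block
            else:
                children = dict(self.blocks[block_no])  # copy, never edit in place
                key = remaining[0]
                children[key] = clone(children[key], remaining[1:])
                new = self.alloc(children)
            cloned.append(new)
            return new

        return clone(self.root, path), cloned

    def commit(self, new_root):
        self.root = new_root    # the single atomic "metablock" overwrite

    def read(self, path, root=None):
        node = self.blocks[self.root if root is None else root]
        for key in path:
            node = self.blocks[node[key]]
        return node


store = CowStore()
leaf = store.alloc("old bytes")
sub = store.alloc({"file": leaf})
other = store.alloc("untouched")
store.root = store.alloc({"dir": sub, "other": other})
old_root = store.root

new_root, cloned = store.update(["dir", "file"], "new bytes")
before_commit = store.read(["dir", "file"])
store.commit(new_root)
after_commit = store.read(["dir", "file"])
```

    Changing one leaf clones three blocks here (the data block, its directory and the root), while the `other` subtree is shared untouched between the old and new roots. The old root stays readable until its blocks are reused.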

    --
    ---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger
  226. Re:can user processes schedule phase transitions? by Omnifarious · · Score: 5

    While that would be nice, using it to create a system snapshot for backup would be even nicer. You could tell all of your applications to write everything they need to write and freeze temporarily. Then you start a backup application as a transaction to your filesystem and it gets the frozen snapshot while your apps are unfrozen and work merrily away. The unfrozen applications see all their updates, and the backup sees the frozen filesystem.

  227. can user processes schedule phase transitions? by sethg · · Score: 5
    It would be kind of cool if, for example, I could do something like:

    # tux2_transaction && make && make test && make install && tux2_commit

    and then if there was a power failure in the middle of the build, I wouldn't have a build directory half-full of compiled files.
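
    Something in this spirit can be approximated in user space today by building in a scratch directory and renaming it into place at the end. The Python sketch below is only that, an approximation: `install_atomically` is a name made up for this example, it assumes the target directory does not exist yet and sits on the same filesystem as the scratch directory, and whether the rename survives a power cut still depends on the underlying filesystem.

```python
import os
import tempfile


def install_atomically(target_dir, build):
    """Run `build(staging)` in a scratch directory, then rename it into place.

    Assumes target_dir does not exist yet; the scratch directory is created
    next to it so that both live on the same filesystem.
    """
    parent = os.path.dirname(os.path.abspath(target_dir))
    staging = tempfile.mkdtemp(prefix=".staging-", dir=parent)
    build(staging)                  # a failure here leaves only the scratch dir behind
    os.rename(staging, target_dir)  # atomic on POSIX within one filesystem
    return target_dir
```

    If `build` fails partway, the target directory never appears, so nothing downstream sees a half-finished build.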

    On the other hand, I'm not sure how useful this would be; it would be easy (I assume) to defer phase transitions for an entire file system until a moment convenient for the superuser, but it could degrade performance for all other users on the system, and to get around that problem, you'd have to do all the grunt work of implementing a multi-user relational database within your file system.
    --
    send all spam to theotherwhitemeat@ropine.com
  228. *BSD SoftUpdates provide crash resistance NOW by redelm · · Score: 5

    IMHO, *BSD with its "soft updates" by Kirk McKusick is far superior to Linux ext2 in crash fs corruption resistance. I deliberately hit the power-bar off switch during four FreeBSD 3.3 kernel SMP compiles fairly late in the process
    (78 sec total vs 125 sec Linux 2.2.14) to make sure that data was going to disk.

    In all four cases I ran, the fsck upon repowering was fast, minor and automatic, mostly freeing unattached blocks whose metadata presumably wasn't fully written at powerout. More surprising, in three of the four trials, `make -j 4` _resumed_ the compile and as best as I could tell completed the interrupted kernel compiles without error. (Same ksize. md5 doesn't work because of timestamp) About 30-45 seconds
    worth of data was lost in dirty buffers at poweroff. In the fourth case, I got compile errors, but only had to `make clean`.

    I am seriously impressed. I've had poweroffs during Linux kernel compiles and had manual fsck work to do. There's some info at Kirk's site http://www.mckusick.com/softdep/index.html
    and there's a very interesting paper whose URL I don't have handy.

  229. Version control system by pallex · · Score: 5

    Has anyone produced a file system that was essentially a version control system (MS SourceSafe, MKS SI etc)...you could delete a file, write a new version of a file etc, and be assured you could go back and get older versions if needed (ie each time you boot up your os you `label` the existing version numbers of all files so you could go back to that state if you screw up before the next `label`).
    Do you follow me? Is this a good idea? Has it been done? Too slow? Etc.. :)
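
    Roughly what I mean, as a toy Python model. The `VersionedStore` class and its method names are made up for illustration and keep everything in memory; a real filesystem would have to do this with blocks on disk.

```python
class VersionedStore:
    """Keep every version of every file, plus named labels to roll back to."""

    def __init__(self):
        self.history = {}   # name -> list of contents; None marks a deletion
        self.labels = {}    # label -> {name: index into history[name]}

    def write(self, name, data):
        self.history.setdefault(name, []).append(data)

    def delete(self, name):
        self.history.setdefault(name, []).append(None)

    def read(self, name):
        versions = self.history.get(name)
        return versions[-1] if versions else None

    def label(self, tag):
        # Remember which version of each file is current right now.
        self.labels[tag] = {n: len(v) - 1 for n, v in self.history.items()}

    def read_at(self, tag, name):
        idx = self.labels[tag].get(name)
        return None if idx is None else self.history[name][idx]


fs = VersionedStore()
fs.write("paper.txt", "draft 1")
fs.label("boot-1")              # e.g. taken at boot time
fs.write("paper.txt", "draft 2")
fs.delete("paper.txt")          # the screw-up
```

    After the delete, `fs.read("paper.txt")` finds nothing, but `fs.read_at("boot-1", "paper.txt")` still returns the first draft. The obvious cost is that nothing is ever freed.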

    a.