Slashdot Mirror


Ext3cow Versioning File System Released For 2.6

Zachary Peterson writes "Ext3cow, an open-source versioning file system based on ext3, has been released for the 2.6 Linux kernel. Ext3cow allows users to view their file system as it appeared at any point in time through a natural, time-shifting interface. This can be very useful for revision control, intrusion detection, preventing data loss, and meeting the requirements of data retention legislation. See the link for kernel patches and details."

36 of 241 comments

  1. So which is it? by EveryNickIsTaken · · Score: 3, Interesting

    Ext3cow, an open-source versioning file system based on ext3, has been released for the 2.6 Linux kernel. Ext3cow allows users to view their file system... Well, is it the file system, or the file system manager?
    1. Re:So which is it? by Bob54321 · · Score: 4, Informative

      From the example screenshot it appears it is a file system. You take a snapshot of your system at some point in time and it stores this data even when files change. Of course, with any file system it is important to have functionality that allows you to view the files as well...

      --
      :(){ :|:& };:
    2. Re:So which is it? by hpavc · · Score: 2

      You don't take a snapshot; that's the big deal with it.

      --
      members are seeing something, your seeing an ad
  2. What a name by Anonymous Coward · · Score: 3, Funny

    So is it EXT or is it just a FAT cow?

  3. Overhead? by HateBreeder · · Score: 2, Interesting

    Couldn't find real-world information about space and performance overhead.

    Does it store many copies of each file? or only the differences between the old and the new version?

    --
    Sigs are for the weak.
    1. Re:Overhead? by JoeD · · Score: 3, Informative

      Check the "Publications" link. The first one is an article in "ACM Transactions on Storage".

      It's a bit dry, but there is an explanation of how it stores the versions, plus some performance benchmarks.

    2. Re:Overhead? by DaveCar · · Score: 2, Informative


      Couldn't read TFA (slashdotted), but I would *imagine* that 'cow' is copy on write and that it just uses new blocks for the changes - so only the differences, but not minimal differences.
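
      A rough sketch of the "only the differences, but not minimal differences" point, assuming changes are tracked in fixed-size blocks. The 4096-byte block size and both function names are illustrative assumptions, not taken from ext3cow itself.

```python
def blocks_touched(offset: int, length: int, block_size: int = 4096) -> int:
    """Count the fixed-size blocks overlapped by a write of `length` bytes at `offset`."""
    if length <= 0:
        return 0
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return last - first + 1


def cow_bytes_copied(offset: int, length: int, block_size: int = 4096) -> int:
    """Bytes of new storage used if every touched block gets a fresh copy."""
    return blocks_touched(offset, length, block_size) * block_size


if __name__ == "__main__":
    # A 1-byte change still costs one whole block.
    print(cow_bytes_copied(offset=10, length=1))
    # A 2-byte change that straddles a block boundary costs two blocks.
    print(cow_bytes_copied(offset=4095, length=2))
```

      Under these assumptions a one-byte edit consumes a full block of new storage, which is why the stored difference is not minimal.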

    3. Re:Overhead? by anilg · · Score: 3, Informative

      COW has been present for a long time in ZFS, since Solaris 10. The overhead there is negligible and it's quite stable. Last I heard, it was being ported to FUSE on Linux, and it is upcoming in the next releases of FreeBSD and OSX. Wiki has a pretty good article.

      --
      http://dilemma.gulecha.org - My philospohical short film.
  4. CVS/Subversion replacement ? by BuR4N · · Score: 5, Interesting

    This might be far-fetched, but how far off is it to use these filesystems as a revision control system replacement?

    Never tinkered with any of these filesystems, but wouldn't it be very comfortable for at least us developers to have a filesystem that worked something like Subversion? Just hook up something on the network and use it as the central code repository.

    --
    http://www.intellipool.se/ - Intellipool Network Monitor
    1. Re:CVS/Subversion replacement ? by scottv67 · · Score: 2, Interesting

      We should probably ask some VMS users about that. They had a versioned filesystem 20 years ago.

      It's actually closer to 30 years ago. I can't believe VMS is celebrating its thirtieth birthday this year.

      http://h71000.www7.hp.com/openvms/25th/index.html

      Having multiple versions of a file is *extremely* handy. That feature saved my bacon many a time. For those of you who have never been fortunate enough to log in to a VMS system, the file versioning looks like this to the user:

        scott_file.txt;5
        scott_file.txt;4
        scott_file.txt;3
        scott_file.txt;2

      and so on. The file version incremented each time you modified the file. You could set the number of file versions that the OS would keep for you. I don't remember the maximum number of versions of a file that you could keep, but I remember seeing version numbers that were five digits wide. The version number wrapped after a while. Thanks, -Scott
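
      A toy model of the behaviour described above (a new `;N` version on every save, with a configurable number of versions kept) might look like the following. The class and method names are made up for illustration, and version-number wraparound is not modelled.

```python
from typing import Dict, List, Optional


class VersionedDirectory:
    """Toy model of VMS-style 'name;N' versions with a per-file version limit."""

    def __init__(self, version_limit: int = 3) -> None:
        self.version_limit = version_limit
        # Maps file name -> {version number -> content}.
        self._files: Dict[str, Dict[int, str]] = {}

    def save(self, name: str, content: str) -> str:
        """Store a new version and drop the oldest ones beyond the limit."""
        versions = self._files.setdefault(name, {})
        new_version = max(versions, default=0) + 1
        versions[new_version] = content
        while len(versions) > self.version_limit:
            del versions[min(versions)]
        return f"{name};{new_version}"

    def read(self, name: str, version: Optional[int] = None) -> str:
        """Read a specific version, or the newest one if none is given."""
        versions = self._files[name]
        if version is None:
            version = max(versions)
        return versions[version]

    def listing(self) -> List[str]:
        """List every kept version, newest first within each file."""
        return [
            f"{name};{version}"
            for name in sorted(self._files)
            for version in sorted(self._files[name], reverse=True)
        ]


if __name__ == "__main__":
    directory = VersionedDirectory(version_limit=3)
    for i in range(1, 6):
        directory.save("scott_file.txt", f"draft {i}")
    print(directory.listing())
```

      With a limit of three, five saves leave only versions 5, 4 and 3 in the listing.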

  5. True undelete by ex-geek · · Score: 4, Insightful

    Undelete, not half-assed, desktop based trash can implementations, is something I've always been missing on Linux. And yes, I generally know what I'm doing, but I'm also human and do make mistakes.

    1. Re:True undelete by xenocide2 · · Score: 3, Informative

      There's a couple reasons for it not being in the kernel. First, it misleads users who expect some degree of data security. The good news is that sort of person likely follows kernel patches to the FS and would likely be aware of the problem, possibly even writing a script that replaces rm with a real-rm.

      The second argument is that it's better handled in user space, so the OS doesn't have to make that sort of policy. There's no reason you can't just alias rm to some .Trash, or configure your Desktop Environment to do so (GNOME does, for example). There's all sorts of things you have to decide that might not suit everyone. For example, if I delete a file on a USB drive, does it go in a .Trash storage in the USB drive, or do we copy it over to a main .Trash folder? Many people don't realize they have to empty the trash to reclaim space on their thumbdrive in GNOME.

      The final argument I can come up with is security problems. We can't have one global .Trash bin in a multiuser system. And quotas. And permissions.

      Reading historic archives of the LKML suggests it's at least come up once. I guess Torvalds's opinion is that anything that CAN go in the userspace SHOULD. Can't explain the webserver in kernel though. Perhaps that opinion has changed some time in the last 10 years?

      --
      I Browse at +4 Flamebait

      Open Source Sysadmin

    2. Re:True undelete by jonadab · · Score: 2, Interesting

      Undelete isn't what makes this really cool, IMO. I don't generally delete stuff I still want, so that isn't really a big issue.

      What I want, that a versioning filesystem can deliver, is the ability to revert a file back to an earlier version, after I've saved changes that turn out to be undesirable. This is a mistake I *do* make from time to time, often enough that I have been really hoping for a versioning filesystem in modern operating systems. This, to me, is a killer feature. I'm currently using FreeBSD, but this feature would by itself be enough to bring me back to using a Linux distribution, once it gets to the point of being included. Without it, once you save your changes and exit the application you can't go back. The past is lost. With a versioning filesystem, that's no longer true. I consider this to be *THE* feature for filesystems, far more important than things like journaling, much less performance tweaks. I have been wanting it ever since I saw the automatic versioning on OpenVMS, and I've been waiting, waiting, hoping, wondering why we don't have it in modern operating systems. I *want* this.

      --
      Cut that out, or I will ship you to Norilsk in a box.
  6. Re:Hooray for repeating history! by Anonymous Coward · · Score: 2, Interesting

    So because it was a good idea 20 years ago, it somehow isn't good that it's been implemented now? Sure, in an ideal world we'd all have been using versioned filesystems since the advent of VMS, but we haven't.

    Actually, I tell a lie; the ISO9660 spec copies the VMS design and also allows files to have a version number, using the exact same scheme, i.e. the version # is appended to the file name following a semicolon. So "FOO.BAR;1" is a valid ISO9660 filename.
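
    As a small illustration of the "name;version" scheme, here is a sketch that splits such a filename into its two parts. The function name is hypothetical, and it only handles the plain `NAME;N` form shown above, not every rule in the ISO9660 spec.

```python
from typing import Optional, Tuple


def split_version(filename: str) -> Tuple[str, Optional[int]]:
    """Split 'FOO.BAR;1' into ('FOO.BAR', 1); names without ';N' get version None."""
    base, separator, suffix = filename.rpartition(";")
    if separator and suffix.isascii() and suffix.isdigit():
        return base, int(suffix)
    return filename, None


if __name__ == "__main__":
    print(split_version("FOO.BAR;1"))
    print(split_version("FOO.BAR"))
```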

  7. VMS file versions someone? by ntufar · · Score: 4, Interesting

    It reminds me of VMS file versions.

    In VMS if you had a file named article.txt, each time you modified and saved it in an editor, a new version was created named article.txt;1 article.txt;2 article.txt;3 and so forth. So after a long session of edits and saves you could end up with a hundred copies of the file in your directory. A lot of clutter in the directory but easy access to older versions of the files.

    With Ext3cow you basically get the same functionality in a slightly different way. By default you see only the article.txt file. If you need to access a previous version of the file you need to specify a cryptic code like this: article.txt@10233745. A bit cumbersome but, hey, how often do you access an older version of your file anyway? Looks better than VMS' approach.

    This filesystem seems like a perfect solution for me as I am writing my Ph.D thesis. Currently I take a backup every day and name it thesis20070420.tar.bz2, thesis20070421.tar.bz2, thesis20070422.tar.bz2 and so forth in case I need to go back and see how it looked some time ago.

    However, in my home directory I have a lot of large audio and video files that I would never want to be versioned. I wonder if Ext3cow keeps extra copies of the files if I move them around or change file names but do not modify the content. Probably I would have to make a new partition and put the text files I am working on there under Ext3cow and leave my media files on ext3.
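
    The "cryptic code" after the @ can be produced from an ordinary date. The sketch below assumes the number is seconds since the Unix epoch, which is an assumption on my part rather than something stated in this thread; the helper name is also made up.

```python
from datetime import datetime, timezone


def past_name(path: str, when: datetime) -> str:
    """Return 'path@<seconds since the Unix epoch>' for a timezone-aware datetime."""
    if when.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time surprises")
    return f"{path}@{int(when.timestamp())}"


if __name__ == "__main__":
    # Noon UTC on the day of the first thesis backup mentioned above.
    noon = datetime(2007, 4, 20, 12, 0, tzinfo=timezone.utc)
    print(past_name("article.txt", noon))  # article.txt@1177070400
```

    A shell wrapper around something like this would make the time-shifted names less cumbersome to type.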

    1. Re:VMS file versions someone? by physicsnick · · Score: 3, Interesting

      Hmm, when I read your post I thought I'd come here and suggest Subversion. Seems everyone else has done the same.

      You really should use it. It's much easier to set up than you'd think, especially if you're on a Debian/Ubuntu box. If you use the file:/// syntax, you don't even need any kind of daemon or http server running; the client can do everything on its own. Say your thesis is currently sitting in ~/thesis, it's this easy to set up:

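      sudo apt-get install subversion
      # Create an empty repository next to the thesis folder
      svnadmin create ~/thesisrepo
      # Load the current thesis into the repository
      svn import ~/thesis file:///home/${USER}/thesisrepo -m "Initial import"
      # Keep the original folder as a backup, then check out a working copy
      mv ~/thesis ~/thesisbackup
      svn co file:///home/${USER}/thesisrepo ~/thesis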


      That's it, you're done. ~/thesis is now a working copy of your repository, the repository itself (which will hold all versions of your files) is contained in ~/thesisrepo, and your original folder is backed up as ~/thesisbackup.

      To work on your thesis, go into ~/thesis and start writing as you've always done. When you want to save a snapshot of the current state of your thesis (i.e. commit your changes), open a bash terminal, go into ~/thesis and type svn ci -m "some message". That's it. Much easier than running a backup; you can just stick it in a daily (even hourly) cron job. To back up all versions of the thesis on removable media, tar up the ~/thesisrepo folder and put it somewhere safe.

      There's a bit more to know about it; namely you need to tell subversion when you add, remove, move or rename files. A good source for that is the Subversion Book, specifically Chapter 2.

  8. The C in CVS. by SharpFang · · Score: 4, Informative

    Concurrent...

    Sure you can "go back in time", but two users working on the same file at the same time would be a pain. Networking would require additional layers - even plain SAMBA/NFS, but still. Plus a bunch of userspace utilities as UI to access it easily.

    It's not bad as a backend for such a system, just like MySQL is good as a backend for a website, but by itself it's pretty much worthless.

    --
    45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
  9. Re:So by psbrogna · · Score: 2, Interesting

    I don't think it's supposed to be new (it's one of the things I miss most about VMS). It's outstanding functionality to have both for end users and sysgeeks/devs; built right into the file system level (i.e. LOW). I prefer this approach to the hacks that other O/S's have implemented at a higher level. It's always harder to do something like this down deep at the roots rather than add it on as superficial gloss later. Granted, the end users don't usually notice or appreciate the diff but over time it keeps complex sys's like O/S's from becoming a teetering tower of shims and bolted on widgets.

  10. Smells like dirvish by Zekat · · Score: 2, Interesting

    This sounds like http://www.dirvish.org/, which is nearly as nice as the automatic file snapshots done by the "Network Appliance" fileserver boxes I've used at the last 2 out of 3 workplaces.

    --
    Mmm, donuts.
  11. Re:Can No One Else INNOVATE? by heffrey · · Score: 2, Insightful

    What evidence do you have that this is reverse engineering?

    Or do you mean that they are re-implementing Time Machine?

  12. Security, backups by Midnight+Thunder · · Score: 2, Interesting

    This solution certainly helps if you accidentally delete something or need to go back to an older version. SVN is one solution, but it is a bit more explicit, while solutions like this and Apple's Time Machine help avoid needing to remember to update your repository. It should be noted that this doesn't replace backups, since this does not protect against hard-drive corruption. I do have a few questions though:
        - what are the security considerations here?
        - can you delete all trace of a file, so as to ensure that it is not easily found again?
        - what are the effects on hard-disc storage space, i.e. are there any estimates of how much extra storage is needed for this?

    --
    Jumpstart the tartan drive.
  13. Re:So by TodMinuit · · Score: 2, Informative

    It's more like Plan 9's Fossil, only without the extremely cool Venti.

    --
    I wonder if I use bold in my signature, people will notice my posts.
  14. Re:Excellent work but... by ajs318 · · Score: 2, Insightful

    The Linux kernel will never, ever have a stable ABI. Compatibility across versions is guaranteed only at the Source Code level, not the binary level. This is 100% intentional, and the only people it really hurts are those who would deny us access to the Source Code. And they deserve it.

    --
    Je fume. Tu fumes. Nous fûmes!
  15. some background by pikine · · Score: 4, Informative

    I'm answering questions that people posted so far altogether.

    Is it a file system or a file manager?

    It is a file system. You access an old snapshot by appending '@timestamp' to your file name. You have to first instruct ext3cow to take a snapshot before you can retrieve old copies, otherwise it simply behaves like ext3. It appears that a snapshot is always performed on a directory and applies to all inodes (files and subdirectories) under it.

    My complaint is its use of '@' to access snapshot. Why not use '?' and make it look like a url query? Better yet, use a special prefix '.snapshot/' like NetApp file servers.

    Does it store many copies of each file? or only the differences between the old and the new version?

    How far off is it to use these filesystems as a revision control system replacement?

    ext3cow takes its name from "copy on write," and it does this on the block level. When you modify a file, it appears to the file system that you're modifying a block of e.g. 4096 bytes. COW preserves the old block while constructing a new file using the blocks you modified plus the blocks you didn't modify.

    You can think about it as block-level version control. However, when you save a file, most programs simply write a whole new file (I'm only aware of mailbox programs that try to append or modify in-place). Block-level copy on write is unlikely to buy you anything in practical use.

    Does it provide undelete?

    Only when you remember to make a snapshot of your whole directory. An hourly cron-job would do, maybe. There is always the possibility you delete a file before a snapshot is made.
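
    To make the block-level copy on write described above concrete, here is a minimal in-memory sketch. It is not how ext3cow is implemented; the class, the tiny 4-byte block size and the integer timestamps are all illustrative assumptions.

```python
from typing import Dict, List, Optional

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow


class CowFile:
    """Toy file whose snapshots share unmodified blocks with the live version."""

    def __init__(self, data: bytes) -> None:
        self.blocks: List[bytes] = [
            data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)
        ]
        self.snapshots: Dict[int, List[bytes]] = {}

    def snapshot(self, timestamp: int) -> None:
        """Record the current block list; block contents are shared, not copied."""
        self.snapshots[timestamp] = list(self.blocks)

    def write_block(self, index: int, data: bytes) -> None:
        """Point the live file at a new block; the old block stays in any snapshot."""
        self.blocks[index] = data

    def read(self, timestamp: Optional[int] = None) -> bytes:
        """Read the live file, or the file as of a snapshot timestamp."""
        blocks = self.blocks if timestamp is None else self.snapshots[timestamp]
        return b"".join(blocks)


if __name__ == "__main__":
    f = CowFile(b"aaaabbbbcccc")
    f.snapshot(100)
    f.write_block(1, b"XXXX")
    print(f.read())      # live contents after the write
    print(f.read(100))   # contents as of the snapshot
```

    Only the modified block differs between the snapshot and the live file, and without the `snapshot` call there would be nothing to go back to, which matches the undelete caveat above.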

    --
    I once had a signature.
  16. Re:Can No One Else INNOVATE? by beezly · · Score: 3, Insightful

    Go away MacTroll...

    Veritas VxFS has had this for years. Snapshotting has been implemented in the Linux LVM layer for ages. This is just another way to do it.

    I don't know anything about the technical implementation of Vista Shadow Copies or Apple's Time Machine, but if it's anything like ZFS then I'll be impressed. I believe there are rumours about the next release of OS X using ZFS (which was developed by Sun), but I'll believe it when I see it.

  17. Re:Can No One Else INNOVATE? by siride · · Score: 2, Informative

    Because it wasn't REVEALED until 2006, so even if Apple was working on it in 2002 (not likely, since Open Source projects generally have longer cycles than proprietary ones due to manpower issues), the ext3cow people would not have been aware of it. Why do you think people are stealing this from Apple? It's a good idea that follows logically from ideas found in revision control software such as Subversion and its predecessors. And as others have pointed out, VMS had this 20 years ago. The idea certainly has been in existence for longer than Apple has. The wikipedia article indicates that the TENEX operating system in the 60s first had versioning filesystems. In any case, Apple hardly invented it, Apple was hardly the first to use it, and Linux implementations have been released before Apple even demoed Time Machine. So, basically, you are 100% wrong.

  18. Can't tell, it's slashdotted by tinkertim · · Score: 2, Informative

    Well, is it the file system, or the file system manager?

    I can't tell, the site is experiencing the /. effect.

    Mirror of the patch (I grabbed it when I saw this in the firehose) can be grabbed here until my server gets sluggish too.

    in /usr/src type : patch -p1 < linux-2.6.20.3-ext3cow.patch

    The site said it's not been tested with other kernel versions, but if you feel brave just s/linux-2\.6\.20\.3/your-version/g. Haven't tried it, but it should work.
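
    For anyone who would rather script that substitution, a sketch of the same search-and-replace follows. The function name and the sample patch header lines are hypothetical, and rewriting the paths does not guarantee that the hunks apply to a different kernel version.

```python
import re


def retarget_patch(patch_text: str, new_version: str) -> str:
    """Replace every 'linux-2.6.20.3' in the patch text, like s/linux-2\\.6\\.20\\.3/.../g."""
    # A function replacement avoids backslash escapes being interpreted in new_version.
    return re.sub(r"linux-2\.6\.20\.3", lambda _match: new_version, patch_text)


if __name__ == "__main__":
    # Hypothetical header lines; the real patch contents are not shown in this thread.
    sample = (
        "--- linux-2.6.20.3/fs/Kconfig\n"
        "+++ linux-2.6.20.3-ext3cow/fs/Kconfig\n"
    )
    print(retarget_patch(sample, "linux-2.6.21"), end="")
```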

    It went dark just around the time I was getting the docs and utilities.

    Did anyone happen to grab the utilities? Got a link?
  19. Re:Excellent work but... by oliverthered · · Score: 3, Insightful

    You're wrong, it also hurts those people who write drivers that aren't accepted into the kernel. And it also hurts end users, or haven't you noticed the lack of Linux drivers for a lot of hardware?

    --
    thank God the internet isn't a human right.
  20. Re:Excellent work but... by Toffins · · Score: 4, Insightful

    Compatibility across versions is guaranteed only at the Source Code level

    (Disclaimer: Linux is excellent) But is compatibility even guaranteed at source code level?
    Here are some specific examples where source level API changes have occurred:

    1. Consider that up to linux-2.6.6 all SATA disks were treated as IDE PATA disks accessible via /dev/hd*, but in linux-2.6.7 they started to be treated as SATA disks only accessible via /dev/sd*. This changeover caused existing SATA disk systems to become unbootable after upgrading to linux-2.6.7 because the boot device at /dev/hd* was no longer accessible. Never documented in kernel/Documentation/*

    2. And between linux-2.6.15 and linux-2.6.20 the way the usb subsystem handled usb devices was changed so that usermode usb drivers like the usermode speedtouch driver were broken, due to the kernel returning EINVAL from each USBDEVFS_SUBMITURB command which is required after a USBDEVFS_CONTROL command issued by the modem_run ADSL line monitoring process. This generates thousands of error messages per second via syslogd. No news of this particular aspect of the usb changes was ever documented in kernel/Documentation/*.

  21. Re:Excellent work but... by ajs318 · · Score: 2, Insightful

    How about us who don't want to recompile everything whenever a new kernel release comes out? It is a freaking pain in the butt.
    No it isn't. That's a filthy lie made up by people who want to sell you pre-compiled binaries and stop you mucking about with the Source Code, and nobody who can spell 'make clean && make install' believes it. (Or you could use Gentoo, which automates the recompilation; or a distribution using pre-compiled .rpm or .deb binary packages, which will have been recompiled for you by the distro's own team.)

    Anyway, not everything will change at one time. You only need to recompile such applications and libraries as actually break.
    --
    Je fume. Tu fumes. Nous fûmes!
  22. Re:So by samkass · · Score: 2, Informative

    Apple's Time Machine isn't just a *file* backup system. It's a *record* recovery system. Neither MS Shadow Copies nor this provides an API for software to search records back through time and pull a single record back to the present (ie. a single address book entry or photo). It's frustrating having people equate them so closely when it misses half the point of Time Machine.

    --
    E pluribus unum
  23. Re:Can No One Else INNOVATE? by jonadab · · Score: 2

    Actually, filesystem versioning is older than Apple as a company, much less OS X. ITS had it in the sixties, and VMS has had it since the late seventies. Nonetheless, it's an undeniably useful feature, and I'm glad it's finally making its way into the major OSes.

    --
    Cut that out, or I will ship you to Norilsk in a box.
  24. Re:Excellent work but... by cyclop · · Score: 2, Informative

    If someone writes kernel drivers correctly, those drivers will end up in the kernel mainline. Linux supports out of the box more hardware than every other OS, no matter how obsolete and obscure. If you don't have your drivers accepted, AFAIK it's a problem with your code not being of good enough quality, nothing else.

    --
    -- Patent no.123456: A way to personalize /. comments with a sig attached to the end.
  25. Interesting - I have a couple of questions by ratboy666 · · Score: 2, Interesting

    No flaming -- I don't have the time to research this, so I'll just post the questions!

    1 - What happens to large databases? I am assuming a delta storage method, but that might slow down the database (specifically, I use mysql).

    2 - Large files? Specifically, deletion (I store lots of videos)

    3 - Usenet spools? (Lots of small files, deleted regularly).

    I suspect that I would have to segregate my files...

    --
    Just another "Cubible(sic) Joe" 2 17 3061
  26. Re:Can No One Else INNOVATE? by sofla · · Score: 2, Informative

    Given that Unix has had the concept of file "versioning" since I don't know when (but a long-azz time!)

    *scratches head* Unix? Versioning? Never seen it myself. Not to say it isn't there, but over the years I've used several *ix flavors and fs versioning isn't something I've come across. I suppose next you'll tell us Unix has file locking (afaik it doesn't, unless you count advisory locks. I don't).
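
    For readers unsure what "advisory" means here, a small sketch using Python's `fcntl.flock` on a Unix system: a second attempt to take the lock is refused, yet a write that never asks for the lock still goes through. The function name is made up, and the behaviour shown assumes a local file system.

```python
import fcntl
import os
import tempfile
from typing import Tuple


def demo_advisory_lock() -> Tuple[bool, int]:
    """Return (second_lock_refused, bytes_written_without_holding_the_lock)."""
    with tempfile.NamedTemporaryFile() as tmp:
        # Two independent opens of the same file, as two programs would have.
        holder = os.open(tmp.name, os.O_RDWR)
        other = os.open(tmp.name, os.O_RDWR)
        try:
            fcntl.flock(holder, fcntl.LOCK_EX)
            try:
                # A cooperating program asks for the lock and is turned away.
                fcntl.flock(other, fcntl.LOCK_EX | fcntl.LOCK_NB)
                second_lock_refused = False
            except OSError:
                second_lock_refused = True
            # A program that ignores the lock can still write to the file.
            written = os.write(other, b"hello")
            return second_lock_refused, written
        finally:
            os.close(other)
            os.close(holder)


if __name__ == "__main__":
    print(demo_advisory_lock())
```

    The lock only protects against programs that bother to check it, which is the distinction being drawn above.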

    This is a reverse-engineering of Apple's Time Machine, through and through.

    I hate to be one to point this out, but, er... Time Machine is a BACKUP tool. Don't believe me? Go to http://www.apple.com/macosx/leopard/timemachine.html and read the copy yourself, being sure to pay special attention to use of phrases like "the drive you're backing up to". How on earth you could possibly confuse a backup tool with a versioned file system is beyond me.

  27. Re:Excellent work but... by Doug+Neal · · Score: 2, Insightful

    A huge number of problems in Windows can be attributed to its lack of package management. Every installer is pretty much allowed to do whatever it wants, put files where it wants, change registry keys, whatever.. and when was the last time you saw a Windows program with an uninstaller that worked? I mean really worked? They all leave crap lying around afterwards that they "couldn't" remove for some vague/unspecified reason. Sometimes you don't even get an uninstaller at all. There's no version tracking, and no management of dependencies. Everything just has to ship with all the libraries it needs and hope it doesn't break anything else, which it doesn't always manage. You end up with a total mess, even if you're careful. Your average Windows PC needs reinstalling at least once a year to stay usable. On almost all of the occasions that someone's asked me to check out a few problems on their computer, I've ended up reinstalling just because it would be quicker than clearing up the mess.

    The "dependency hell" that you speak of has been a non-issue for years, even Red Hat makes a passable stab at it these days.. there's plenty of issues stopping Linux becoming a mainstream desktop OS but package management isn't one of them. Users don't want to have to run installers from CDs or whatever as you described, it's just what they're used to doing at the moment. If you showed a complete computer novice Synaptic or Click N Run, and then showed them the equivalent in Windows, which do you think they'd prefer?