Slashdot Mirror


File Systems Best Suited for Archival Storage?

Amir Ansari asks: "There have been many comparisons between various archival media (hard drive, tape, magneto-optical, CD/DVD, and so on). Of course, the most important characteristics are permanence and portability, but what about the file systems involved? For instance, I routinely archive my data onto an external hard drive: easy to update and mirror, but which file system provides the best combination of reliability, future-proofing, data recovery, and availability across multiple platforms (Linux, OS X, BeOS/Zeta and Windows, in my case)? Open Source best guarantees the future availability of the standard and specification, but are file systems such as ext2 suitable for archival storage? Is journaling important?"

105 comments

  1. If you are worried about interoperability use FAT by Xner · · Score: 2, Insightful
    It's simple and supported by almost all machines and devices. If worst comes to worst, you can hunt for your data with grep and dd.

    If you are not constantly editing the information (and you won't be, it's for archival purposes), the admittedly major downsides of not being journalled and being prone to fragmentation are non-issues. You might run into problems with capacity limits and/or file size limits, though.

    --
    Pathman, Free (as in GPL) 3D Pac Man
  2. The best archival filesystem by Helvidius · · Score: 4, Funny
    I have heard that the most permanent way of preserving data for a long, LONG time is to write your data in stone, granite being one of the best. Aside from that, computer data will last a much shorter time than even the printed word. So buy some acid-free, archival quality paper and print those bits out!

    Of course, that's just my opinion--then again, I could be wrong.

    --
    "Care about people's opinions and you will be their prisoner." ~~Tao Te Ching~~
    1. Re:The best archival filesystem by jamesh · · Score: 2, Interesting

      I'm sure that a while ago I read about a system that could print encoded data onto paper at a reasonably high density (i.e. not readable by a human, but easily decoded with a scanner). At a 'plucked out of the air' figure of 0.25mm x 0.25mm per 'bit', and an equally 'plucked out of the air' figure of 11 bits of data per byte (to allow for clocking and maybe some error correction), you'd fit about 80 kbytes on a single page of A4, and about 40 MB per 500-sheet ream. Not that high (and possibly much higher or much lower once you stop plucking figures out of the air :), but if you had some stuff that you wanted stored for a seriously long time it might be feasible. Add in a few pages describing the encoding you have used and store it properly, and it might still be useful in thousands of years...
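
      For anyone who wants to check that arithmetic, here is a rough sketch using the post's own guesses of 0.25 mm per bit and 11 bits per byte. The one assumption not in the post is that the whole 210 x 297 mm sheet is printable; allowing for margins pulls the result down toward the quoted figures.

```python
# Back-of-the-envelope capacity of printed data on A4 paper.
# Assumption (not from the post): the full 210 x 297 mm sheet is printable.
A4_WIDTH_MM = 210
A4_HEIGHT_MM = 297
DOT_MM = 0.25          # side of one printed 'bit', from the post
BITS_PER_BYTE = 11     # 8 data bits plus clocking/error-correction overhead, from the post
SHEETS_PER_REAM = 500


def bytes_per_page(width_mm=A4_WIDTH_MM, height_mm=A4_HEIGHT_MM,
                   dot_mm=DOT_MM, bits_per_byte=BITS_PER_BYTE):
    """Return how many whole bytes fit on one side of a page."""
    dots_across = int(width_mm / dot_mm)
    dots_down = int(height_mm / dot_mm)
    return (dots_across * dots_down) // bits_per_byte


page = bytes_per_page()
ream = page * SHEETS_PER_REAM
print(f"{page} bytes per page, {ream / 1e6:.1f} MB per ream")
```

      Calling bytes_per_page(190, 277) models a 10 mm margin on every side, which is one hypothetical way to see how sensitive the estimate is to the printable area.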

    2. Re:The best archival filesystem by _Sharp'r_ · · Score: 2, Insightful

      Stone? Easily chipped or cracked if dropped, low tensile strength, not very portable? No thanks.

      Try thin metal plates. A little more difficult to etch by hand (which can be alleviated by using the right malleability of gold), but well worth it for the long-term benefits of damage-resistance and portability.

      --
      The party of stupid and the party of evil get together and do something both stupid and evil, then call it bipartisan.
    3. Re:The best archival filesystem by Anpheus · · Score: 1

      Gold is actually taking out two birds with one stone, as it is known for its lack of reactivity. For that reason it is used in a great deal of computing equipment.

      Unfortunately you'll have to alloy it heavily in order to get the $/byte down!

    4. Re:The best archival filesystem by jamstar7 · · Score: 1

      Yup. 20 million Mormons can't be wrong.

      --
      Understanding the scope of the problem is the first step on the path to true panic.
    5. Re:The best archival filesystem by theshowmecanuck · · Score: 1

      Remember though that they must be used in conjunction with small (possibly granite) stones to read them.

      --
      -- I ignore anonymous replies to my comments and postings.
    6. Re:The best archival filesystem by Jugalator · · Score: 1

      Hmm, how many stones to store 1 TB?

      And let's define "byte" as "inscribed letter". :-)

      --
      Beware: In C++, your friends can see your privates!
    7. Re:The best archival filesystem by mrchaotica · · Score: 2, Insightful

      The downside of gold is that invading Conquistadors (or otherwise no-good people) might try to melt it down into bars or bullion, destroying your data.

      --

      "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

    8. Re:The best archival filesystem by clydemaxwell · · Score: 1

      har, har!
      seriously, though, I was just commenting about this. Don't use HDDs for archives -- they fail. Use tape, DVD, or some other media-only (i.e. no embedded electronics to fail) device.

      --
      Browsing with classic discussion, noscript, at -1 and nested
      no hidden comments and I only mod UP
    9. Re:The best archival filesystem by jamstar7 · · Score: 1

      That too. Lose the stones, yer SCREWED.

      --
      Understanding the scope of the problem is the first step on the path to true panic.
    10. Re:The best archival filesystem by jcaplan · · Score: 1

      Stone definitely has a nice record for permanence, having preserved hieroglyphics for millennia, but I quibble with your choice of granite. Living in New England and seeing many old gravestones has allowed me some observations of performance of granite. Granite is nice to carve and pretty, too, but it suffers from weathering and the details become indistinct over time, becoming difficult to read after a century or so. Slate was popular before granite came into vogue and is found in the older sections of cemeteries. The fine line carvings of willows and skulls (I seem to recall that this was some sort of Greek revival symbology) are still beautifully preserved along with the names of the deceased and often a line of verse.

      So, go carve up some slate or mod an old dot matrix printer to work with discarded slate roofing tiles and archive away. Archaeologists of the future will thank you for your foresight in being the only one to permanently record the Slashdot discussion archives to a non-volatile medium so that geek wisdom could be used to rebuild a new glittering tech society following the Great Collapse of the 25th century.

    11. Re:The best archival filesystem by Anpheus · · Score: 1

      That's not a downside, that's taking security through obscurity to a new level.

    12. Re:The best archival filesystem by Kadin2048 · · Score: 1

      I remember the article you're talking about; it was here on Slashdot a while back. It was widely assumed to be a hoax, at least in the advertised implementation. He was talking about TB per page, and replacing Blu-Ray discs with paper, etc.

      But in theory there's no reason why you can't do a 2D bar code at high resolution across a page. You wouldn't want to use regular toner though, since it sticks to the pages; you'd want to use real ink that sinks in. Preferably pigment inks instead of dyes, too.

      The problem you get into is even if you can make the paper last a few thousand years, how long are the readers going to last? If you have to use a scanner to retrieve the information from the pages, then you run the risk of your information being irretrievably lost if the scanners all break down (say if civilization temporarily loses the ability to produce them). This requires that you keep around a lot of "bootstrapping" information on how to build the decoders, etc. in a natural-language format.

      --
      "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
    13. Re:The best archival filesystem by ITMagic · · Score: 1

      There was a utility, some time ago, available for Win/MSDOS to do just this. At the time, it was shareware. Sadly, I cannot find any reference to it now. I wish there was a FOSS equivalent, as of all the archival media we have, this (with decent ink) at least has a proven track record of hundreds, if not thousands, of years.

    14. Re:The best archival filesystem by Helvidius · · Score: 1

      Your plan has one flaw--if one tried to record data in gold, it might just happen that people in the future would find the medium (gold) much more valuable than the message.

      One only needs to look at ancient civilizations for historical precedent.

      Alas, one man's gold is another man's date with a prostitute.

      --
      "Care about people's opinions and you will be their prisoner." ~~Tao Te Ching~~
  3. Don't overlook popularity by fromvap · · Score: 3, Insightful

    I would say that ubiquity is the most important factor in being able to read something in the future, not it being open source. FAT32 is certain to be easily, if not legally, accessible for the very short expected lifetime of an external hard drive. To improve data recovery capabilities, you might like to create some archives in RAR format for error checking, with PAR2 files for redundancy and recovery. Hard drive space is cheap, so for safety keep the uncompressed files as well as the archives. Since hard drives fail, you should have more than one of them. And ideally, make DVDs also. I created some files with early betas of OpenOffice 2, and it was not at all easy to open them once the file format changed before the final release. As another example, despite it being open source, the legal problems of Reiser may cause that file system to be inconvenient to access in the future. An outdated but very popular legacy format will have support that will last far longer than people want it to. Because of the high market share that WordPerfect had in the days of Noah, even now you can open WordPerfect files in Word and OpenOffice. If you think FAT32 will be unreadable anytime soon, think again.

    1. Re:Don't overlook popularity by name*censored* · · Score: 1
      Since hard drives fail, you should have more than one of them
      You could set it up in a RAID-0 (if it's only 2 disks) or RAID-5 with 1 redundancy (3 or more disks). But you could also consider CDFS or UDF... put your data on CDs/DVDs. If you have overly large files, you could use RAR to break them up into little pieces to burn to CD. If you want free online storage for a few gigs of very important files, just open up as many Gmail accounts as you need, compress your files with RAR into 1MB pieces and upload each of them attached to an email. You'd have to remember to visit your account at least every 90 days though - I think that's how long they give you before they close your account down.
      --
      Commodore64_love: I don't comprehend people who're so frightened of death that they'll bankrupt themselves to stay alive
    2. Re:Don't overlook popularity by piranha(jpl) · · Score: 2, Insightful

      Does anyone use RAR outside of the copyright infringement scene?

    3. Re:Don't overlook popularity by MrHanky · · Score: 2, Insightful

      I second this. FAT-32 isn't the most robust file system out there, but it's ubiquitous and well understood. Robustness is probably not the most important aspect for archival storage, if that means write once and store, and it's meaningless if you can't read the format. It's not a modern file system, though, and has some problems (4 GB file size limit, etc.).

      I wouldn't say the same goes for RAR. It's a proprietary format, owned by a company and used mainly for piracy. I know you can extract it on many OSes today, but I wouldn't trust it for tomorrow. Neither would I trust Word to open WordPerfect files -- I've received RTF files created by the latter that I couldn't open in Word. Market share alone doesn't guarantee anything; you need a format that is well known. Sadly, neither WP nor Word documents are.

    4. Re:Don't overlook popularity by RupW · · Score: 4, Informative

      Does anyone use RAR outside of the copyright infringement scene?

      Yep, I do. It's widely accepted, better than zip and better than .tar.gz or .tar.bz2 because it orders the files more intelligently than tar before trying to compress them. tar.rz goes some way to address that but you have to do it in two steps because rzip doesn't pipe. .tar.rz compression is about equivalent for large numbers of small files but rzip will often beat rar on single large files.

      The killer feature back in the day was the first good implementation of disk splitting. But the compression still stands up now.

      On my 'if I ever get free time' list is to implement rar's file ordering in GNU tar to see if that helps gzip and bzip2 catch up to RAR's compression ratio. But I've no idea if/when I'll ever get around to that.

      -- paid-up RAR user since 1996.
    5. Re:Don't overlook popularity by Ucklak · · Score: 1

      Isn't RAR proprietary? Isn't that what the poster is trying to get away from?

      Last I checked, RAR compression isn't available on any default installation of Linux, Windows, or Mac.

      RAR may be the best or most versatile, but every time I've had to un-RAR something, I either used a trial version or a cracked version.
      Not something I want to trust 15 years later.

      --
      if you steal from one source, that is plagiarism, if you steal from many, well, that's just research.
    6. Re:Don't overlook popularity by Lehk228 · · Score: 1

      7zip can unRAR, so can other compression utilities

      --
      Snowden and Manning are heroes.
    7. Re:Don't overlook popularity by jZnat · · Score: 1

      In my experience, even a registered version of RAR (legit actually :O) compressing with max compression is still beaten by tar/bz2 for textual things. 7z can do even better, so sometimes RAR isn't that great.

      What RAR is good for is cross platform file splitting, parity files (borrowed concept from RAID 5 and co), and the ability to archive without compressing. This is why it is used in the "scene" all the time.

      --
      'Yes, firefox is indeed greater than women. Can women block pops up for you? No. Can Firefox show you naked women? Yes.'
    8. Re:Don't overlook popularity by AdamPiotrZochowski · · Score: 1
      At work we used rar to compress all of the source code nightly, including each dev's own copy. We had 2 GB of source code compressed down to 100 MB, all because rar has much better compression methods and, as another poster noted, a better file ordering mechanism.

      The command we used:

      rar a -m5 -s -mc63:128t -mdg -mcc -en -tsm0 -tsa0 -tsc0 -ri1:10 ${todaysDate}.rar "*"

      -m5 == maximum compression
      -s == solid archive, the real saver for multiple copies of same file
      -mc63:128t == text compression (PPM algorithm), the real saver for source files
      -mcc == image compression (our source also had images)
      -mdg == increase the dictionary size to max
      -en == don't include end of archive
      -ts[mac]0 == don't include file modification/access/creation dates
      -ri1:10 == be nice to the system, sleep 10ms for every file operation

      There are other switches we could have used, storing NTFS file properties / permissions / compression bits / junctions, but we didn't need a backup for system recovery, just belts and suspenders. After all, there was source control as well.

      The only other compression software that can rival this power is 7z, but I have not used it a lot.

      The only other compression software that can rival the compression is paq6; unfortunately it's still experimental, so I have not used it a lot.



      cheers

    9. Re:Don't overlook popularity by fritsd · · Score: 1

      I disagree; what's ubiquitous *now* might not be in 20 years (which is nothing). Can you still find the source for an executable for your current architecture to unpack your old zoo, lharc, and .Z files? (yes, I know gzip does .Z and zoo is in Debian, but you get my point). I think it's much, much more important that it's described in detail in a lot of locations (such as PK-ZIP and US-tar format). And when you mention Wordperfect, let me mention Wordstar :-) BTW, where's the royalty-free standards document that describes fully what the structure of a FAT32 filesystem is?

      --
      To be, or not to be: isn't that quite logical, Slashdot Beta?
    10. Re:Don't overlook popularity by Cecil · · Score: 1

      RAID-0 is striping. RAID-0 is a tongue-in-cheek way of saying "Zero RAID" or "Not RAID". It's not redundant at all.

      You meant to say RAID-1 (mirroring), I'm sure.

  4. How Archival? by Stone+Rhino · · Score: 4, Insightful

    Is this going to be relatively live, with data being mirrored onto it regularly, or is this going to be written once and accessed occasionally from then on? If you're only going to write to it a very small portion of the time, (or even WORM), journaling will be useless to you, since anything that takes out your data won't be stopped by it.

    How far into the future are you going to need it? I understand the whole "not wanting to become unreadable," but honestly, no one's going to bother re-implementing a filesystem to look at their old vacation photos. Pick a popular filesystem, and you'll be sure of support down the line. FAT's still doing just fine for itself, and the ISO filesystems for CDs and DVDs will be readable as long as people are making drives for them.

    All of the data integrity features on filesystems aren't going to protect against disk failure/media wearing out, and error correction on that scale is beyond the scope of any one disk to handle. Like the department jokingly advised, parity files and other methods can handle this in a robust, media-spanning manner, and protect against everything from a few flipped bits to a whole-disk data loss (assuming you have enough parity data).

    I think the reason not much talk about filesystems has been going on is because they're mostly irrelevant for this task. They're designed to handle the issues of a live environment; the issues that archives face are beyond the capability of how you choose to store your data on each piece of media to solve.

    --


    Remember, there were no nuclear weapons before women were allowed to vote.
    1. Re:How Archival? by larien · · Score: 3, Interesting
      Just to be pedantic, ISO isn't the filesystem, it's either ISO9660 (CD-ROM) or UDF (DVD).

      However, you're correct that both are ubiquitous standards and likely to be readable by all modern operating systems and should be for some time to come.

    2. Re:How Archival? by Anonymous Coward · · Score: 0

      FAT and FAT32 have the drawback that they can't store files larger than 4 GB (or was that 2GB?). There's a similar problem with ISO9660 in that many implementations are incomplete or buggy and fail with large files as well. Linux's UDF support is currently broken in that one cannot write files larger than 1GB (that was a workaround for a security problem and may be fixed by now). tar archives appear to have portability problems with files > 2GB.

      I don't have any good solution for large files. If you really have to make them accessible in the future and don't mind some extra work, you may want to consider splitting files into chunks smaller than 1GB.
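
      A minimal sketch of that splitting idea, in case it helps. The function names, the numbered-suffix naming scheme and the 1 MiB copy buffer are all made up for illustration; the demo uses 1000-byte pieces as a stand-in for pieces just under 1 GB.

```python
import os
import tempfile


def split_file(path, chunk_size, out_dir, buffer_size=1024 * 1024):
    """Split `path` into numbered pieces of at most `chunk_size` bytes each."""
    parts = []
    index = 0
    with open(path, "rb") as src:
        while True:
            part = os.path.join(out_dir, f"{os.path.basename(path)}.{index:04d}")
            written = 0
            with open(part, "wb") as dst:
                # Copy in small buffers so a large piece never sits in memory.
                while written < chunk_size:
                    data = src.read(min(buffer_size, chunk_size - written))
                    if not data:
                        break
                    dst.write(data)
                    written += len(data)
            if written == 0:          # source exhausted: drop the empty piece
                os.remove(part)
                break
            parts.append(part)
            index += 1
    return parts


def join_files(parts, out_path, buffer_size=1024 * 1024):
    """Concatenate the pieces, in order, back into a single file."""
    with open(out_path, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                while True:
                    data = src.read(buffer_size)
                    if not data:
                        break
                    dst.write(data)


# Tiny demonstration with 1000-byte pieces standing in for ~1 GB ones.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "big.bin")
    with open(original, "wb") as f:
        f.write(os.urandom(2500))
    pieces = split_file(original, 1000, tmp)
    restored = os.path.join(tmp, "restored.bin")
    join_files(pieces, restored)
    with open(original, "rb") as a, open(restored, "rb") as b:
        round_trip_ok = a.read() == b.read()
    piece_sizes = [os.path.getsize(p) for p in pieces]
```

      Plain concatenation is the whole "format" here, so the pieces can be rejoined decades later with nothing more exotic than cat.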

  5. No Filesystem is Best by xanalogical · · Score: 2, Interesting

    If you're only using it for archive, writing anew each time, then skip the file system altogether. Treat the media like a block device, tar or otherwise archive your backup and just write the tar as a single, linear sequence of bytes. And don't compress it, so that a bit error early in the sequence doesn't mess up later blocks.

    Now which archive format is best - tar, cpio, etc.? I've heard that cpio is a much simpler underlying format.

    And if you have the space, write the archive sequence multiple times onto the block device, so if one block is destroyed you can pick it up from a peer location.
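
    A rough sketch of the scheme described in this comment, with an in-memory buffer standing in for the raw block device. The helper names are invented for the example, and a real device would need sector alignment and a record of the archive length kept somewhere safe.

```python
import io
import tarfile


def build_tar(files):
    """Return an uncompressed tar archive (as bytes) from {name: content}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:   # mode "w" means no compression
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def write_copies(device, archive, copies):
    """Write the same archive back to back `copies` times."""
    for _ in range(copies):
        device.write(archive)


def read_copy(device, archive_len, index):
    """Read the copy at position `index`, counting from zero."""
    device.seek(index * archive_len)
    return device.read(archive_len)


archive = build_tar({"notes.txt": b"hello archive"})
device = io.BytesIO()            # stand-in for something like a raw disk
write_copies(device, archive, 3)
```

    Because every copy has the same length, a damaged region in one copy can be patched from the same offset in another, which is the "peer location" idea above.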

    -Jeff

    1. Re:No Filesystem is Best by Aladrin · · Score: 4, Insightful

      You'd be MUCH better off creating PAR2 files for the archive set, instead.

      If you made 2 copies of the archive on the media, and piece 10 of both sets die, you've lost everything. If you made 1 copy of the archive, and a 10% par set, any 10% of the pieces (data and parity both) could die and you'd still have your data. If you made a 100% par set, you could lose half of the data and parity and still recover. And it doesn't matter which portions.

      Add to that the fact that if you lost piece 10 in archive 1, and piece 9 in archive 2, it would be not much fun to figure out the dead pieces and make a full archive again. With PAR2, the tool will do the work for you.
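
      To make the comparison concrete, here is a toy model of the two strategies. It assumes an idealised erasure code in which any surviving set of blocks at least as large as the data set is enough to rebuild everything, which is the property PAR2's Reed-Solomon recovery aims for. The function names are invented for the sketch.

```python
def survives_two_copies(lost_in_copy1, lost_in_copy2):
    """Two plain copies survive only if no piece index is lost in both."""
    return not (set(lost_in_copy1) & set(lost_in_copy2))


def survives_parity(data_blocks, parity_blocks, lost_blocks):
    """Idealised erasure code: recovery works while at least
    `data_blocks` blocks of any kind (data or parity) remain."""
    remaining = data_blocks + parity_blocks - len(set(lost_blocks))
    return remaining >= data_blocks


# Losing piece 10 in both plain copies is fatal.
both_lost_piece_10 = survives_two_copies([10], [10])

# 100 data blocks plus 100 parity blocks: any 100 losses are fine, 101 are not.
full_parity_ok = survives_parity(100, 100, range(100))
full_parity_too_many = survives_parity(100, 100, range(101))

# 100 data blocks plus 10 parity blocks tolerate any 10 lost blocks.
ten_percent_ok = survives_parity(100, 10, range(10))
```

      Note that with a 10% par set the tolerable loss is 10 blocks out of 110, so a little under 10% of everything on the media.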

      --
      "If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
    2. Re:No Filesystem is Best by hey! · · Score: 1

      I like the idea of PAR, but the advantage of tar is that it has been around forever and will probably be around forever, even though "better" solutions like PAR have been created. I'd be concerned that somebody will come up with a "better" solution than PAR and implementations of PAR might be hard to find in the distant (e.g. decades away) future.

      --
      Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
    3. Re:No Filesystem is Best by Anonymous Coward · · Score: 0

      What? PAR would be something you use on top of a tar archive. PAR is not an archiving format.

    4. Re:No Filesystem is Best by Anonymous Coward · · Score: 2, Informative

      Depends, a 100% par set for a 100GB archive would take forever even on the faster machines. Even a simple "small" 4GB par set for a DVD backup takes hours on an Opteron 250.

    5. Re:No Filesystem is Best by Aladrin · · Score: 1

      That is a valid concern as the author of PAR has indeed been working on a better implementation. But in the past, he has kept the parchive program backwards compatible. (The PAR2 version also handles PAR, even though they are quite different.)

      --
      "If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
    6. Re:No Filesystem is Best by Anonymous Coward · · Score: 0

      Correct. The poster is arguing apples and oranges, but because of the hatred the morons with moderation points have for unregistered users, they punish you for posting something smart and reward the registered user for posting something stupid. This is yet more proof how the moderation system is ruining Slashdot.

  6. Son, look... by Project2501a · · Score: 1, Flamebait

    I don't know how to tell you this, but you're an idiot.
    "There have been many comparisons between various archival media". ARCHIVAL is the key word. As in "we don't move these files around a lot". As in "It doesn't make much sense as an end user to discuss the underlying filesystem of something which is used to just have files sit around, as long as it's stable". Buy something that has dedicated commercial support for the next 20-40 years, like the LTO standard, and call me in the morning.

    --
    ----
  7. There is no best by information_storage · · Score: 1

    There is really no best file system for every purpose.

    Since you want reliability and portability, I recommend DVD+Rs. They are certainly more reliable than an external hard drive, and more portable too.

    It really depends on how you use your archive. Since you carry your archive around, I would recommend against an external hard drive since they can be quite fragile.

    Your file system choice depends a lot on the storage technology choice. Of course, for the previously mentioned DVD+Rs, ISO9660 would meet your needs the best: "reliability, future-proofing, data recovery, and availability across multiple platforms". You're going to be able to find a DVD writer for a while to come. Furthermore, there are very minimal DVD incompatibilities across the platforms.

    Like I said, application matters a lot. If you're using hard drives, then what you're storing can have a big effect on which file system you should use. For example, if you require stationary backup of very large files on a UNIX OS, then it benefits you to store the information using JFS or another file system that handles large files efficiently. You're not going to want to change the file system once you've stored everything on it, and since a certain file system was more efficient to store the information with in the first place, it's more effective to keep it in that format for the long term.

    File systems depend on the storage technology they are used with, and the utility of a file system depends on how you use it.

  8. What about error correction? by F00F · · Score: 5, Interesting

    I've been wondering lately why no common file systems seem to implement error correcting codes (ECC/EDAC).

    In hardware, there's often a checksum, ECC/Hamming code, parity bit, Reed-Solomon code, etc. to detect and/or correct for inadvertent bit flips. But, as far as I know, no error correcting information is ever stored within the filesystem itself. Certainly the filesystem tracks how many blocks are dedicated to a particular file, and how many bytes long the file is, and one can always hash the file twelve ways to Sunday to assure that it hasn't changed since it was originally hashed, but none of that helps repair errors to the file should the medium that's being used to store it decay beyond what's already correctable via the medium access hardware.

    I can imagine scenarios where, for example, the RAM buffer in a hard drive is upset and perfectly encodes the wrong bit into a file (or even multiple stripes + parity in a RAID). In this case, the medium access hardware is useless (the data was, after all, encoded perfectly wrong), but ECC in the filesystem would detect and potentially correct the error the next time the file was read back, even if it were decades later. I appreciate that it would add overhead, and thus maybe shouldn't be the default, but I don't see it being even an option anywhere, and some people would pay the performance penalty to get the data integrity benefit.

    Especially in instances like encrypted (or compressed, or both) loopback file systems where one bad bit can destroy an entire partition, why don't we have more data assurance layers available? Or have I just not found them?

    Whining of which, what was the deal with GNU ecc? Everyone speaks of "oh, yeah, the algorithm was deeply flawed, bummer..." but I don't ever see any details ...

    1. Re:What about error correction? by Anonymous Coward · · Score: 0

      zfs

    2. Re:What about error correction? by whovian · · Score: 2, Informative

      zfs supports checksums (http://en.wikipedia.org/wiki/Comparison_of_file_systems#Allocation_and_layout_policies) but it is incompatible with GPL (http://linux.inet.hr/zfs_filesystem_for_linux.html). However, Ricardo Correia has an alpha version of zfs for FUSE/Linux (http://zfs-on-fuse.blogspot.com).

      --
      To-do List: Receive telemarketing call during a tornado warning. Check.
    3. Re:What about error correction? by bgat · · Score: 1

      Checksums let you detect errors, but don't let you do anything to correct them. I think what he's after, and what I'd like to see, is a filesystem that offers "forward error correcting" codes--- information that lets you actually _correct_ bitflips.

      In an archival setting, I'd rather get back corrupted data than no data at all. A filesystem that aborts on checksum errors would therefore be a bad choice when faced with that problem.

      The question isn't so theoretical. NAND flash requires forward error correcting codes today, since that medium is not 100% reliable. A FEC-capable filesystem might be a good choice for CD-Rs that will need to sit on the shelf for a few years. And it might even be a requirement for those multi-terabyte hard drives I'm hearing about, given that their data density is so high that there's bound to be some data squished into oblivion somewhere over the course of a day...
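
      As an illustration of the difference between detecting and correcting, here is a small sketch of the classic Hamming(7,4) code, which spends three parity bits per four data bits and can repair any single flipped bit in a codeword. This is a textbook example chosen for brevity, not what any particular filesystem or flash controller uses.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    # Codeword positions 1..7 hold: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); position 0 means no error was seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # parity over positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # parity over positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # parity over positions 4, 5, 6, 7
    position = s1 + 2 * s2 + 4 * s3  # the syndrome spells out the bad position
    if position:
        c[position - 1] ^= 1         # flip the damaged bit back
    return [c[2], c[4], c[5], c[6]], position


word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[4] ^= 1                       # simulate one bit rotting on the medium
recovered, where = hamming74_decode(stored)
```

      A plain checksum over the same seven bits could only report that something changed; the syndrome here also says which bit, which is what makes repair possible. Two flipped bits in one codeword would defeat this particular code.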

      --
      b.g.
    4. Re:What about error correction? by ir · · Score: 0

      you could write a shell script to automatically generate par2 files for your backups.

      personally, though, i think raid HD mirroring would be a better solution for reliability

      --
      Irina Romanov
    5. Re:What about error correction? by Cajal · · Score: 1

      If you are using ZFS in a mirror or raidz configuration, then the checksums do let the fs detect and correct corrupted data.

    6. Re:What about error correction? by Wesley+Felter · · Score: 1

      The probability that a disk will fail completely is much higher than the probability that it will corrupt a few sectors. ECC only protects against the latter case, while RAID+checksums protects against both cases. Unsurprisingly, RAID+checksums is what the industry tends to offer.

    7. Re:What about error correction? by vrmlguy · · Score: 1

      I don't know of any filesystems but there are applications that implement error detection: "Oracle ensures the data block's integrity by computing a checksum on the data value before writing the data block to the disk. This checksum value is also written to the disk. When the block is read from the disk, the reading process calculates the checksum again and then compares against the stored value. If the value is corrupted, the checksums will differ and the corruption revealed."
       
      My understanding is that there were some very large, very static databases that had disk blocks corrupted by, yes, cosmic rays. RAID systems usually trust the data on the disk, assuming that a successful read means valid data, so errors of this type won't be caught at that level. The checksum doesn't correct the error, but it does allow the application to prevent the error from propagating. Once caught, the data can be recovered from backup media, or (if the RAID system permits it) by triggering a rebuild of the bad block.
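
      The write-then-verify pattern in that quote can be sketched in a few lines. CRC-32 and the 4-byte prefix layout are assumptions made for the example; the quote does not say which checksum Oracle uses or where it is stored.

```python
import struct
import zlib

CHECKSUM_SIZE = 4  # bytes taken up by the stored CRC-32


class CorruptBlockError(Exception):
    """Raised when a block's stored checksum does not match its payload."""


def pack_block(payload: bytes) -> bytes:
    """Prefix the payload with its CRC-32 before it is written to disk."""
    return struct.pack(">I", zlib.crc32(payload)) + payload


def unpack_block(block: bytes) -> bytes:
    """Recompute the CRC-32 on read and refuse to return corrupted data."""
    (stored,) = struct.unpack(">I", block[:CHECKSUM_SIZE])
    payload = block[CHECKSUM_SIZE:]
    if zlib.crc32(payload) != stored:
        raise CorruptBlockError("checksum mismatch")
    return payload


on_disk = pack_block(b"customer row 42")
read_back = unpack_block(on_disk)
```

      When unpack_block raises, the caller knows to fetch the block from a backup or a mirror instead of passing bad data along, which matches the "prevent the error from propagating" point above.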

      --
      Nothing for 6-digit uids?
  9. This question keeps popping up by rjforster · · Score: 3, Insightful

    In one form or another anyway. People keep asking about the _best_ way to store data for a long time (for some definition of best)

    My take on this problem is that you should use the best you reasonably can today. Then in 5 years' time, when there is a new technology out there, move over to that for archiving your new data AND move your old data over while you still have working hardware.
    I went from floppy disks to LS-120 drives. From LS-120 drives to CDs. From CDs to DVDs. I'll go from DVDs to whichever of HD or BD seems best in a couple of years (unless something else crops up). I might use hard drives instead but I'm not sure yet. The point is I don't need to decide until I need to store that much.
    If you're playing in the big leagues do the same with the various formats of giganto capacity tape storage etc.

    Plan around the shelf-life and working life of the hardware you can get and the answer drops out.

  10. Simple... by evilviper · · Score: 3, Interesting
    I routinely archive my data onto an external hard drive: easy to update and mirror, but which file system provides the best combination of reliability, future-proofing, data recovery, and availability across multiple platforms (Linux, OS X, BeOS/Zeta and Windows, in my case)?

    Ext2 fs mounted rw,sync. When just reading, or just writing, async can't possibly help performance. You're strictly limited by disk I/O. Async will, however, cause irrecoverable corruption if there's a system crash or power failure, which was a source of great frustration with Linux before the journaling filesystems came along.

    Ext2 can be read by nearly even operating system out there, and doesn't have the numerous limitations of FAT32.

    Which, incidentally, is the exact same answer I gave a few months ago, when the last guy wrote an Ask Slashdot to ask the exact same question...
    --
    Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    1. Re:Simple... by jonesy16 · · Score: 1

      Having immersed myself in Linux for half a decade I, too, believed that Ext2 was the perfect filesystem for this sort of thing. But the hoops you have to jump through to get it working on a non-linux platform are insane. There are drivers "available" for Windows XP (who knows if those will be rewritten to support Vista or not), and to date there is no official support for the latest versions of OSX. Now that our company is transitioning to Mac computers I'm realizing the shortcoming of having most of our storage on Ext2 formatted drives.

      With that in mind, I don't think it's a good idea to recommend Ext2 and definitely wouldn't say "can be read by nearly even[sic] operating system out there" when 2 of the 3 biggies don't support it natively. You don't want to rely on one or two individuals' works to support that filesystem 5 years from now if you need to get archival data back. FAT32 / ISO9660 / UDF / and NTFS (read-only) are the only filesystems I can think of that will work out-of-the-box on mac/linux/windows.

    2. Re:Simple... by evilviper · · Score: 1
      But the hoops you have to jump through to get it working on a non-linux platform are insane.

      Downloading Explore2fs isn't all that difficult.

      and to date there is no official support for the latest versions of OSX.

      Official or not, with Darwin and Ext2 both being open source, it should be quite easy for anyone who cares enough to want to do it.

      You don't want to rely on one or two individuals' works to support that filesystem 5 years from now if you need to get archival data back.

      With any archival process... The currently available software is only of minor importance. The real issue is having specifications freely available.

      Microsoft is already deprecating FAT32, making 32GB partitions the limit. I wouldn't be surprised if the version after Vista disables the creation of FAT32 partitions entirely, and future versions stop reading FAT32 (forcing vendors to license NTFS).

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    3. Re:Simple... by jonesy16 · · Score: 2, Insightful

      Explore2fs is written and supported by one person and currently doesn't list support for Vista. I would find it hard to recommend to someone else that they use this and expect it to be a reliable solution 5 to 10 years down the road. And if it was so easy to support ext2 on OSX then why is there no reliable support for Tiger. Last I checked into it (about a month ago) there was ONE person who was working on the project and it had been sitting idle for a while. Given that a lot of Mac users are also linux users, I don't see why there wouldn't be widespread support if it was "quite easy". The advantage to the FAT filesystem is that it has been around forever with little changes. It will support MOST archival requirements for file size, etc.

    4. Re:Simple... by evilviper · · Score: 1
      And if it was so easy to support ext2 on OSX then why is there no reliable support for Tiger.

      As I said... *IF* somebody cares enough. Apparently, no one does.

      Given that a lot of Mac users are also linux users, I don't see why there wouldn't be widespread support

      You're making a lot of assertions and speculation there. Most of which I don't happen to believe.

      Even if your premise was true, there's no way I could possibly guess why nobody has felt the need to do it. And the fact that it doesn't exist certainly doesn't prove it's difficult, any more than it proves that no Linux users use OSX, that OSX has a lousy driver model, that Ext2 is a filesystem that nobody wants, or any other possible reasons you could think up on the spot...

      The advantage to the FAT filesystem is that it has been around forever with little changes.

      No. You're proving your inexperience here ("the grass is always greener" syndrome?). There have been plenty of changes, besides the FAT12/16/32 changes, just in the recent past.

      Multiple times I've created and formatted a partition with fdisk from Win98, only to find it creates something that Win95 can't read for some reason (I suspect it's boundary alignment, but I never really tried to track it down). Ditto for NT4 created (FAT16) partitions.

      FAT has been a moving target the entire time it was actively in-use. It's only around Windows 2000, when NTFS was being heavily favored, that it seems Microsoft has stopped making incompatible changes to FAT... Now, instead, they're introducing other artificial limits like I've mentioned.

      So, if you're a manufacturer of Flash memory, or similar, you can write your own ultra-compatible FAT implementation for formatting your devices (or you could use the oldest version of DOS you can find) that will work on every version of Windows. But short of taking very careful steps like these, using FAT for archival purposes is a rather risky proposition. It's a mirror of the reasons you shouldn't depend on proprietary software for your backups... And unless you're writing your own FAT drivers for all your systems, you _are_ depending heavily on Microsoft's proprietary software for your archives.

      Ext2, OTOH, is about as stable of a target as you can get. The old file-size limit of 2GB is the only issue you might need to worry about, and your archives created by the latest software should be readable by the earliest Ext2 implementations.
      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  11. Is "sticktion" still a problem? by Marrow · · Score: 2, Interesting


    If you leave a drive in a closet for 10 years, will it still spin up?

    1. Re:Is "sticktion" still a problem? by evilneko · · Score: 1

      I'm not sure if it was quite ten years in a closet or not, but only a few months ago, I helped my granddad clean up and prep a 486/25 for donation. Yeah, someone actually wanted the thing. And yes, it booted up, Windows 3.1 and all. Again I'm not sure how long the machine actually spent in the closet (unpowered), but it had to have been close.

      --
      Slashdot - where to disagree, is to be a troll
    2. Re:Is "sticktion" still a problem? by LoRdTAW · · Score: 1

      Hell I still have 2 AT&T unix PC's, one with a 20MB hard disk and one with a 40MB. They still boot and work fine.

  12. Non-IT answer by Overzeetop · · Score: 4, Interesting

    The best file system for archival purposes is the one you're using today. Why? Because if you want that archive to be readable in any expedient manner, you are going to have to constantly monitor and update the media on which it is stored. All media will degrade over time, and you will have no idea how bad that degradation has been until you re-read it. No vendor will compensate you for the loss of your data, because there is some data which simply cannot be recreated.

    If you want archival storage, you need to have your data on- or near-line, and rewrite the data to the "new" hardware every couple of years. By choosing a filesystem that is current, you are more likely to be able to read it in a couple years than if you (try to) stick with a single filesystem. I know this sounds like a lot of work, but if the data is truly worth archiving, it's worth keeping both the storage mechanism and format up to date.

    --
    Is it just my observation, or are there way too many stupid people in the world?
    1. Re:Non-IT answer by freedom_india · · Score: 1

      Hey! I use an iBook (with Mac OS X Tiger FYI) and a Windows Desktop running XP Professional. Both have different file systems (HFS,NTFS). What do you suggest for me? Both have valuable information i need. The Mac contains all my p0rn and my XP contains all my SG-Atlantis episodes.
      I think it is better to use FAT32 since both can read it.
      What do you think?

      --
      "Doing what i can, with what i have." ~ Burt Gummer
  13. Worry about the hardware, not software by MightyYar · · Score: 4, Interesting

    Thanks to the emulation community, I can read data from an old Commodore 64, Apple ][e, Atari, etc. on any modern computer running any mainstream operating system. What I cannot do is easily hook up an old Apple ][e disk drive to my modern hardware. The filesystem will not really matter so much, because even if Wintel goes the way of the Commodore 64, someone will make a DOSBOX-esque emulator for it. Getting data off of an ATA, SATA, USB, or Firewire drive might be more challenging once new hardware ceases to support those standards.

    Personally, I just throw stuff on external hard drives. 3-5 years later, the new drives are so much bigger, faster, and cheaper that it becomes economical to consolidate to a new drive. I still have data from a 286 that had nothing but floppies, an Apple ][e, and 2 dead Macintoshes. Even my old Windows 95 computer lives on as a VirtualPC image. I don't really use them that much, but the Apple ][e and 286 stuff is under 50 megs, and the VirtualPC image is 2GB. The images of the old Mac hard drives total less than 1GB... it's simply not worth deleting them and it's kind of fun to have my old computers still around, if only "virtually".

    --
    W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
    1. Re:Worry about the hardware, not software by Gothmolly · · Score: 2, Funny

      Dude, I'm sure you could find all that pr0n on the Internet again if you had to. Let it go.

      --
      I want to delete my account but Slashdot doesn't allow it.
    2. Re:Worry about the hardware, not software by donaldm · · Score: 1

      Backing up to a larger disk is fine for a personal environment (i.e. a PC), and I do this myself, but it is useless and expensive in the academic, business and scientific worlds where proper backups are important. This means a strategy needs to be in place to take into account disaster scenarios.

      The problem that many organisations face today is the long term storage of data; however, it is not a simple matter of just archiving data, it is knowing whether you can retrieve and reuse that data. Alright, I have oversimplified it, but try writing a Disaster Recovery plan: if it is anything less than a few hundred pages you have not done your homework.

      A trivial but important example is to consider a database. You religiously back up all appropriate data to some sort of media (e.g. tape, CD, DVD, HD-DVD, Blu-ray or HVD*) and adopt recommended backup procedures such as "on" and "off-site" storage. Now a time comes when the company decides to move to a different database, such as moving from Oracle to MySQL. Basically your old data is effectively useless unless you are going to spend a fortune on transferring the data or keeping the old database accessible, which could be very expensive. What do you do?

      To sum up, you need to decide what data is useful for archiving purposes, and this is a major issue in the corporate world. In the PC world it is much simpler, but if you can honestly say you can recover from flood, fire and theft then you are reasonably ok. Many PC owners cannot do this because all their data is in one place.

      Note: HVD* is Holographic Versatile Disk and if you believe the hype will eventually replace tape as a backup media, of course the tape companies are fighting back.

      --
      There ain't no such thing as proprietary standards only proprietary formats. Standards are by definition open.
  14. Ext2, GNU tar by gweihir · · Score: 1

    Ext2 is suitable, because it is very likely to have really long-term support. And as long as you can boot Linux and copy the files over to some other filesystem, that is enough. Of course there may come a time when Linux drops backwards compatibility, but considering that the 2.0 kernels are still supported, can run on current hardware and all kernels are still available, I would say this will not be anytime soon. Same for FAT through Linux. It is not going away, and since it is not under development, maintaining the code is very little effort.

    As archiver, I would recommend GNU tar, which (AFAIK) still supports every format it could ever create in its (long) history. Compression by either gzip (if you are not concerned about bit-errors) or better bzip2. Both compile with a plain C compiler. Both are used for archiving Linux kernels, so I expect they do have long-term support as well. And both have been stable for a significant time now.

    Journalling is not needed for archiving at all. It is a feature designed to give better remount times after crashes.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    1. Re:Ext2, GNU tar by zyzzx0 · · Score: 1

      Very recently, a gentleman brought in his NAS device that had files he accidentally deleted but wanted back. He had already taken the drive out, connected it to his windows box, discovered it was an ext2 partition, downloaded some application for his windows machine to read the partition and used a second application to try to recover the deleted files.

      For most people this is a recipe for disaster. He was smart enough to know what an ext2 partition was and just smart enough to destroy most all of the accidentally deleted data on the drive.

      My suggestion: stick w/ your external HD. Hard Drives, while they're the bottleneck and the most likely part to fail, are pretty reliable. There may be up-sides to certain filesystems; here are my 2 cents: fat and fat32 are a no-go if you're using outlook w/ a large pst file that is regularly being backed up. ntfs is a poor choice if you're doing anything outside of windows. If you're a seasoned linux user, or at least able to find a solution to use ext2 and get data on/off the drive inexpensively (and easily) on your windows network, it's the only real choice IMO. If you're only using windows, your external drive is usb or firewire (non-NAS), and you're not comfortable doing advanced tasks on an ext2 partition, you should probably use NTFS.

      Open Source best guarantees the future availability of the standard and specification, but are file systems such as ext2 suitable for archival storage?

      Many NAS devices use ext2. It is probably best suited for your questions of reliability, future-proofing, data recovery, etc.
      But are you experienced enough to fight w/ a failing ext2 drive?

  15. I use ext3 by rduke15 · · Score: 2, Insightful

    I use ext3 on my external backup disks because:
    - it is much better and more reliable than FAT32
    - it is both open source and (relatively) widely used, so I expect there will always be some way to read it
    - it can easily be read by attaching it to any machine and booting some Linux LiveCD or bootable USB.
    - the OS which traditionally can read ext2/3 is itself open source and also widely used, so there is no fear that it would become unavailable

    For archival and backup, I feel all these advantages far outweigh the slight inconvenience that the disks are not readable directly by Windows and Mac, requiring either a driver or a reboot into Linux.

    The important point is to label the disks very clearly. Otherwise, someone connecting them to a Windows or Mac machine may believe the disk is empty and re-partition/re-format it! I would not only put a big explanatory label on the disk's case, but also name the volume something like "Linux-..." or "Linux-ext3-...", and also explain to persons involved (manager(s) + people handling the disks) that they are not readable in Windows (some people don't read even big labels...).

  16. Don't use FAT by AusIV · · Score: 2, Interesting
    FAT has issues with partitions larger than 32 GB and files larger than 4 GB. It's nice for Flash drives that you're taking from a Windows PC to a Mac to a Linux box, but if you're talking about serious archives, you'll definitely run into the first problem, and quite possibly run into the second.
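
    If you want to know in advance whether the second problem will bite, here is a rough Python sketch (the function name and the `limit` parameter are my own, purely for illustration) that walks a directory and lists the files that would not fit. The FAT32 per-file ceiling is 2^32 - 1 bytes, i.e. one byte short of 4 GiB.

```python
import os

# Largest file FAT32 can hold: 4 GiB minus one byte.
FAT32_MAX_FILE_SIZE = 2**32 - 1


def files_too_big_for_fat32(root: str,
                            limit: int = FAT32_MAX_FILE_SIZE) -> list[tuple[str, int]]:
    """Walk root and return (path, size) for every file larger than limit."""
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file; skip it
            if size > limit:
                oversized.append((path, size))
    return oversized
```

    Run it over the archive directory before copying; anything it reports (DVD images and video captures are the usual suspects) would have to be split or stored on a different filesystem.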

    I use Ext3 for my backup drive, and this driver for when I need to attach it to a Windows box.

    1. Re:Don't use FAT by dgatwood · · Score: 1

      The first limit is almost a non-issue. The FAT-32 filesystem supports up to 8 TB. Of course, Windows XP can't format a volume over 32 GB, but you can always create the volume in another way---in Windows 98, ME, or Vista; in Linux; or using a third-party formatting tool. Once you have created a larger volume, Windows (even XP) should be able to handle it just fine.

      FAT Limitations

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    2. Re:Don't use FAT by crossmr · · Score: 1

      It says ext2, does it work with ext3? I just installed it, and it seemed to automatically call my 20 GB linux partition on my HD D:, but I can't see a single thing on it. It also shows it as a CD icon not a hard drive. Maybe I'll try a restart.

    3. Re:Don't use FAT by Simon80 · · Score: 1

      it definitely works with ext3, but as of the last time I used it, you have to manually mount the drive yourself using the mount command it installs.

    4. Re:Don't use FAT by crossmr · · Score: 1

      Yeah it was bad. I restarted, and I couldn't login. When I clicked my username, it said userinit.exe failed to initialize then locked up the machine, I had to go to safe mode and uninstall it.

    5. Re:Don't use FAT by Simon80 · · Score: 1

      Ouch, I've never seen it that bad. My biggest gripe with it was the manual mounting.

    6. Re:Don't use FAT by Storlek · · Score: 1

      Not to sound like an Apple fanboy, but ext2/3 just isn't very well supported in Mac OS. Sure, there's an ext2 driver, but it's unstable and buggy at least in 10.4 -- if I'm mounting my ext3 partition, I have to kill Spotlight first, else it'll try to index it, and about half the time I'll end up with a kernel panic. There doesn't seem to be much development on it, either. I'm not sure the developers are even interested in the project anymore.

      It's a fine and stable filesystem in itself, but if it doesn't work on the second most popular commercial OS (using that title liberally of course), I can't keep my data on it.

      --
      Bears don't normally eat things that talk and move backwards.
    7. Re:Don't use FAT by AusIV · · Score: 1
      That hardly sounds fanboyish, I was under the impression ext2/3 were at least as well supported on OSX as they are on Windows, and I'd hope no one would consider a file system that's hardly compatible with their OS.

      I'm not really much of a fan of ext3. I recommend it for cross compatibility, but nothing else. I tried using ext3 for storage of MythTV recordings, and that turned out to be problematic - it would frequently crash if there was too much going on. Recording one show, transcoding another and watching a third was pretty risky. For lots of storage that only needs to be used on Linux, I use JFS. The downside is that the partition can't be reduced, but the stability and handling of large files is superb.

    8. Re:Don't use FAT by Storlek · · Score: 1

      Extfs unfortunately isn't very well supported in a lot of places besides Linux. It's the unfortunate truth. Nor are lots of filesystems, to be honest.

      By "fanboy" I was more referring to my bringing up the Mac in a Windows/Linux-centered discussion, although I suppose I could put in my 2 cents and suggest using HFS... OS X and Ubuntu have coexisted nicely on my Powerbook for a few months now and I have yet to see any problems with the Linux support for HFS; on the other hand the Mac ext2 driver crashes constantly. I have seen HFS-reading tools for Windows; I haven't researched extensively since I haven't had Windows installed on my own computer since before XP came out, so I can't vouch for their quality, but I also haven't seen any reports of major problems like with ext2 on the Mac. It might be something to look into.

      (Now that's probably closer to fanboyism :)

      --
      Bears don't normally eat things that talk and move backwards.
    9. Re:Don't use FAT by Anonymous Coward · · Score: 0

      That's open-sores in action. Who needs testing when it's pretty shiny open-sores?

  17. bad advice by oohshiny · · Score: 3, Insightful

    Buy something that has dedicated commercial support for the next 20-40 years

    You mean like DEC or any of the other out-of-business dinosaurs?

    As someone who has been through this, I can only say: do NOT buy anything that depends on "dedicated commercial support"; the companies and industry standards you think are going to be around for "20-40 years" are probably either not going to be, or they are not going to give a damn about you.

    Use open standards and open formats, with multi-vendor support; that's the only way to go. And you need to keep your eyes open and move to new formats and standards as the world changes.

    If LTO is the right choice, it's the right choice because of that. But I'm not convinced that LTO is going to be long-lived enough as a standard, no matter how many companies have tied their fortunes to it right now.

    1. Re:bad advice by Nutria · · Score: 1
      You mean like DEC or any of the other out-of-business dinosaurs?

      DEC might be gone, but companies still support DLT hardware.

      --
      "I don't know, therefore Aliens" Wafflebox1
  18. Keep it simple by Phaid · · Score: 1

    Tar.

  19. Tape by vadim_t · · Score: 2, Informative

    Here's why: IMO, unless you're doing it for a company, the most important thing is convenience.

    If it's your job, sure, you'll do it whether it's convenient or not.

    If it isn't, you'll quickly get tired of messing with CDs, plugging/unplugging hard drives, etc. So I went with the most convenient media possible: tape. Stick a tape into the drive, walk away, store when it spits it out. It doesn't interfere with the computer's usage since nothing else uses tape.

    For absolute convenience, get a tape robot from ebay. Then it can be completely automatic.

    Filesystem: use plain tar to write to the tape. If you must use compression, compress files individually, not the whole tape.

    Paranoid implementation: Tapes have file marks. You can ask the tape drive to give you file #1 for instance. You can use this to store some useful stuff in a format that will always be recoverable so long as you have a drive that can read the tape. Store like this:

    File 1: Text document explaining what's all this stuff, and what's on the tape.
    File 2: RFC for tar format
    File 3: RFC for compression format
    File 4: source for tar program
    File 5: source for decompression program
    File 6: backup

    A tape formatted like this should be readable so long as a drive capable of reading the data in it survives. To ensure that, go with a popular tape format, which is reliable, open, and has a high capacity (so that it's unlikely to become obsolete too fast)
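
    For the "compress files individually, not the whole tape" part, a rough sketch in Python's standard library, under my own assumptions: the function name is made up, it writes to an ordinary file rather than a tape device, and it does nothing about file marks. Each file is gzipped on its own and the results go into a plain, uncompressed tar, so a bad spot on the media costs you one file instead of everything after it.

```python
import gzip
import io
import tarfile
from pathlib import Path


def write_archive(src_dir: str, tar_path: str) -> list[str]:
    """Gzip each regular file separately, then store the results in a plain tar.

    Keep tar_path outside src_dir, or the archive would try to include itself.
    """
    src = Path(src_dir)
    names = []
    # Mode "w" means an uncompressed tar: no whole-archive compression.
    with tarfile.open(tar_path, "w") as tar:
        for path in sorted(p for p in src.rglob("*") if p.is_file()):
            # Reads the whole file into memory; fine for a sketch, not for huge files.
            compressed = gzip.compress(path.read_bytes())
            info = tarfile.TarInfo(name=path.relative_to(src).as_posix() + ".gz")
            info.size = len(compressed)
            info.mtime = int(path.stat().st_mtime)
            tar.addfile(info, io.BytesIO(compressed))
            names.append(info.name)
    return names
```

    Every member can then be pulled out with any tar and unpacked with any gunzip, independently of the others.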

  20. Dad, look... by Anonymous Coward · · Score: 0

    I don't know how to tell you this, but you're a jerk.
    (Or you're at least acting like one right now.)

    Admittedly this is not a brilliant question -- If the guy is at all savvy, it should be obvious with a moment's reflection that journaling and fragmentation are non-issues for an archival filesystem. But the reason he's asking is because he's not savvy on this topic. Is that okay with you? I have as little patience as any of us when it comes to idiot users, but aside from not knowing a lot about filesystems there's no evidence that this guy is an idiot. He looks to me like a newbie trying to be responsible and better himself. Oh, how I wish my lusers could be more like that!

    Despite it not being a *great* question, it's still a reasonable one, for which there are good answers and bad answers. FAT and ISO9660 are examples of good answers because they are straightforward, well-documented, without irrelevant features for the task at hand, and both have 20-something years of ubiquity behind them and are still very much alive and kicking -- they will be supported on all platforms long into the foreseeable future.

    You mention the LTO standard, and maybe that's a good answer too, but I don't know because you don't explain what it is or give any pointers to additional information. Do you really expect that answer to be helpful to the guy asking the question? (I found what I *think* you're talking about on Wikipedia, no thanks to you, and it doesn't look very relevant to the original poster's question.)

  21. Take a look at Venti by asdavis · · Score: 1

    The guys over at Bell Labs developed Venti as a part of their Plan9 Operating System. If you are not adventurous enough to install Plan9, they have a great set of ports called Plan9 Port that has most of the exciting bits of Plan9 for other *nix like Operating Systems including Linux and Mac OS X. Venti is an archival storage server, utilities and filesystem. It works with both magnetic and optical media.

    --
    TECMATIC - Intelligent Technology News
  22. ZFS - FTW by GuyverDH · · Score: 3, Informative

    While not as widely used (yet), it will eventually become the de-facto standard in safe filesystems.

    I've thrown all kinds of problems at it, and it has yet to lose a single byte of data.
    Add to that, taking snapshots every (x) minutes, you can look back in time as easily as reading a folder.

    With RAIDZ2 in the latest releases, you can set up sets that can withstand the loss of 2 physical drives. If you couple multiple RAIDZ2 sets into a single pool, you've increased the redundancy even further. With plain old JBOD and multiple controllers, you can reach levels of availability that only expensive EMC/Hitachi/StorEdge systems have reached in the past.

    It's opensource as well (although it's the Sun flavor at this time), and being worked on at www.opensolaris.org. I believe Sun is contemplating switching it to GPL at this time.

    --
    Who is general failure, and why is he reading my hard drive?
    1. Re:ZFS - FTW by PhunkySchtuff · · Score: 1

      I'll second that - with 256bit checksums on all data stored, journalling on metadata AND DATA, and now it's no longer Sun-only: it's been implemented in the latest builds of Mac OS X 10.5 Leopard.

  23. Is one copy enough? by Anonymous Coward · · Score: 0

    What appears to have been established here is that electronic storage media does not last for long (compared with, as mentioned in previous comments, written text or stone engravings), irrespective of the filesystem used on that media. Therefore perhaps a single archive copy is not enough. Perhaps a distributed system such as LOCKSS http://www.lockss.org/lockss/Home would be better for archive storage, as it essentially abstracts the hardware and filesystems that the data is stored on.

  24. What are your parameters? by davidwr · · Score: 2, Interesting

    Do you need data-readback in a matter of seconds? Minutes? hours? days?
    Do you need storage for years, decades, centuries, millennia, 10,000 years, or longer?
    Do you need an indexing system based on content or just on title/filename?
    Can the data be printed out or carved into stone without losing important information?
    Is this a go-to-jail-if-you-don't legal requirement, a may-go-bankrupt-if-you-don't business requirement, or a save-us-a-bunch-of-money-nice-thing-to-have requirement?
    Do you think the cost of researching the "best" solution is worth the improvement over the 2nd- or 3rd-best solution?

    Let's assume you need it for 50 years, access is infrequent, and you can wait 24 hours for data recovery. Talk to the folks at Iron Mountain and other data-retention warehouses, they are experts in the field and will be happy to consult with you or do the entire job turn-key.

    My hunch:
    For most applications involving less than 50 year data retention, making 2 copies of the raw data, to a currently supported stable media such as tape or archival DVD, stored in separate locations, is key. Make sure the data is both in the original format and in a published-standard format which is widely supported.
    Keep multiple machines that can read the data around for as long as you need the original format. Every few years or as needed, verify the data is intact, re-convert the data from the original format or, if that format is unreadable, the highest-fidelity published-standard format, to a currently-supported published standard, and save it to a currently-supported archival format.
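
    The "verify the data is intact" step is easy to automate when you have two copies. A rough Python sketch, with function names of my own invention and SHA-256 as an arbitrary choice of hash: digest every file in both copies and report anything missing or different.

```python
import hashlib
from pathlib import Path


def tree_digests(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    base = Path(root)
    digests = {}
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        h = hashlib.sha256()
        with path.open("rb") as f:
            # Read in 1 MiB chunks so large files do not need to fit in memory.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        digests[path.relative_to(base).as_posix()] = h.hexdigest()
    return digests


def compare_copies(copy_a: str, copy_b: str) -> dict[str, list[str]]:
    """Report files missing from either copy and files whose contents differ."""
    a, b = tree_digests(copy_a), tree_digests(copy_b)
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "differ": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

    A mismatch tells you that one copy has rotted but not which one; keeping the digests from the day the archive was made alongside the data settles that.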

    Ideally, in 50 years time, you will have the original media plus several updated copies. You may or may not be able to read the original media but your most recent copies will be close enough to the original to be useful. If you are very lucky, the most recent copies will be identical to the originals AND you will still have the software and hardware to read them.

    Oh, for anything REALLY important, print it out on archival paper, or carve it into stone.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
    1. Re:What are your parameters? by Vombatus · · Score: 1

      My hunch: For most applications involving less than 50 year data retention, making 2 copies of the raw data, to a currently supported stable media such as tape or archival DVD, stored in separate locations, is key. Make sure the data is both in the original format and in a published-standard format which is widely supported. Keep multiple machines that can read the data around for as long as you need the original format. Every few years or as needed, verify the data is intact, re-convert the data from the original format or, if that format is unreadable, the highest-fidelity published-standard format, to a currently-supported published standard, and save it to a currently-supported archival format.

      Interestingly enough, this is very similar to the process developed by the National Archives of Australia http://www.naa.gov.au/recordkeeping/preservation/digital/summary.html/. They are saving the 'original' document and a version converted to an open format (eg Open Document Format for word processing documents). If the format changes, they will use the converted version to generate something in the new format. They will be doing it for stuff that needs to be kept a lot longer than your arbitrary 50 years.

      Ideally, in 50 years time, you will have the original media plus several updated copies. You may or may not be able to read the original media but your most recent copies will be close enough to the original to be useful. If you are very lucky, the most recent copies will be identical to the originals AND you will still have the software and hardware to read them.

      But if you convert to an open document format, you will not need the original software or hardware to read them. If your business depends on it, you would also want to be pretty sure that the copy is an authentic replica of the original. You do not want to rewrite history inadvertently.

      Oh, for anything REALLY important, print it out on archival paper, or carve it into stone.

      That's a lot of paper or stone.
      --
      This sig is intentionally blank
  25. Matrix codes by tepples · · Score: 1

    I'm sure that a while ago I read about a system that could print encoded data onto paper at a reasonably high density (eg not readable by a human, but easily decoded with a scanner). At a 'plucked out of the air' figure of .25mm x .25mm per 'bit', and an equally 'plucked out of the air' figure of 11 bits of data per byte (to allow for clocking and maybe some error correction), you'd fit about 80kbytes on a single page of A4

    What you read about is a matrix code (no, not backwards kana). 4 pixels per mm times 25.4 mm per inch = 101.6, or roughly 100 dpi. Olympus Dot Code, used by Nintendo's e-Reader, is 3 times finer than that, at a bit over 300 dpi, improving data density by roughly an order of magnitude (3 times finer in each direction is about 9 times the bits per page).
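
    For anyone who wants to redo the arithmetic, here is a rough sketch. The cell sizes and the 11 bits per byte are the "plucked out of the air" figures from the quoted post, and the page is assumed to be borderless A4, so treat the output as an upper bound rather than a measurement of any real matrix code.

```python
# Back-of-the-envelope capacity of a printed matrix code on A4 paper.
# Assumptions (from the quoted post, not measured): square cells,
# 11 printed bits per stored byte, no margins or alignment marks.
A4_WIDTH_MM = 210
A4_HEIGHT_MM = 297
MM_PER_INCH = 25.4


def page_capacity_bytes(cells_per_mm, bits_per_byte=11):
    """Bytes that fit on one borderless A4 page at the given cell density."""
    cells_across = A4_WIDTH_MM * cells_per_mm
    cells_down = A4_HEIGHT_MM * cells_per_mm
    return (cells_across * cells_down) // bits_per_byte


def dots_per_inch(cells_per_mm):
    """Convert a cells-per-millimetre density to dots per inch."""
    return cells_per_mm * MM_PER_INCH


coarse = page_capacity_bytes(4)    # 0.25 mm cells, about 100 dpi
fine = page_capacity_bytes(12)     # 3 times finer, about 300 dpi
print(coarse, fine, fine / coarse)
print(500 * coarse)                # one 500-sheet ream at 0.25 mm cells
```

    With no margins that comes to roughly 90 KB per page at 0.25 mm, which is in line with the "about 80kbytes" guess once you leave a border.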

  26. for hard disk media? Sun's ZFS, hands down by toby · · Score: 1
    --
    you had me at #!
    1. Re:for hard disk media? Sun's ZFS, hands down by Slashcrap · · Score: 1

      And because it's been around for nearly a whole year, you know it's perfect for long term archival storage.

      Look! There's even some blogs about it. What could go wrong?

    2. Re:for hard disk media? Sun's ZFS, hands down by GuyverDH · · Score: 1

      ZFS has been around for much longer, and used in production systems (at least internally to Sun for years - much longer than the latest ReiserFS).

      Now, couple this with Sun's test lab, where they've subjected ZFS to MILLIONS of intentionally data-disrupting incidents, such as reformatting hard drives in the pools, removing power from hard drives, writing random data to disks in the pools, pulling SCSI cables from systems, physically powering off the system, and re-cabling the disks and boxes so that they are on different controllers, channels, and SCSI IDs, all without a single byte of corrupted data.

      How many other filesystems can you say this of?

      --
      Who is general failure, and why is he reading my hard drive?
  27. Pack unpar+untar+gunzip along with each part by tepples · · Score: 1

    the advantage of tar is that it has been around forever and will probably be around forever, even though "better" solutions like PAR have been created. I'd be concerned that somebody will come up with a "better" solution than PAR and implementations of PAR might be hard to find in the distant (e.g. decades away) future.

    Any harder than implementations of tar, or even implementations of sh if you want to put GNU tar in .shar format? At least .shar is reasonably human readable and can be unpacked if you don't have a Bourne shell handy. My recommendation: On each volume, include two archives: a .shar archive containing the source code for a PAR reassembler, .tar unpacker, and .gzip decompressor in a widely supported language such as C, and a "part" of your .tar archive.

  28. PSA: Worse comes to worst by Anonymous Coward · · Score: 1, Interesting

    Worst come to worst... The expression is "Worse comes to worst" as in "should the condition arise such that what was considered 'worse' is so bad that it's now the worst thing that could happen..."
  29. My recommendation: HFS Extended by RedBear · · Score: 1

    I also looked into this problem for storing files on large external hard drives. The conclusion that I came to in the end was that at this point in time there really is only one option if you want to be able to access the drive from Windows, Mac OS X and Linux. That option is the Mac HFS Extended file system. Yes, you do have to purchase MacDrive in order to access HFS+ with Windows, but it is a very well-established and popular product that works well and is going to be around for a long time, so it's a safe investment. There is also a driver for Linux 2.4 and 2.6 kernels here. HFS+ has been made more popular because of all the people who want to be able to access their Mac-formatted iPods under Linux.

    Your only other choices for cross-platform compatibility are FAT32, NTFS, or Ext2, and they all failed my requirements in various ways.

    FAT32 is the most universally readable/writable by most operating systems, but it has serious problems. The main issue is the 4GB file size limit, which was absolutely ridiculous even several years ago. The other problem is that Windows simply won't allow you to format a drive larger than 32GB as FAT32 anyway, but the file size limit is much more of a problem. The only other option that Windows offers you natively for formatting large drives is NTFS. That solves the filesystem and file size limits but then you block stable read-write access from any other OS. There is no read-write NTFS support under Mac OS X, and the read-write NTFS support under Linux is still experimental.

    I know a lot of people are recommending Ext2/3, and I also used to think that was the answer, but unfortunately the support for Ext2 on non-Linux platforms is dismal. There is supposedly an Ext2 driver for Mac OS X but it is basically alpha quality and highly unstable based on the user reports I've seen. Thus, Ext2 fails right there, for me. There are a couple of different options for Ext2 in Windows, and they work fairly well for the moment, but it seems to be one of those situations where one guy took some time to whip up some basic support a few years back when he had some free time. That's not the kind of thing that makes me feel good about being able to access my files easily from Windows in the future. There is no guarantee that it will be updated to work with Vista. There is no dedicated crew of people out there making sure that there is continuing stable support for Ext2 in Windows.

    I was quite frankly surprised to see just how poor the cross-platform support for Ext2 was. I really was hoping that the open source world would put a little more effort into making Linux filesystems more accessible to other platforms. Instead what I've found is a situation where basically one or two hobbyists have played around with creating some support for like a summer project, and everyone else just sits back and whines that Microsoft or Apple haven't built support in for Ext2 on their own.

    So in the end I plunked down my $49 for MacDrive (and $9.99 for a second license) and started formatting all my storage drives as HFS+. An added benefit is that I can copy files to and from my Mac without having tons of those dot files show up, since HFS+ is the native Mac filesystem and supports the Mac resource forks. A final and very nice benefit is that if the drive is hooked up directly to a Mac running OS X it can support journaling, just as Ext3 does under Linux. For me, HFS+ was the only feasible solution for file storage and archiving, and it's working out pretty well. YMMV, of course.

    It's not the most perfect situation, but until everyone is able to agree upon a single, standard filesystem for all platforms I don't see any other workable option. I don't see it happening. Mac OS X and Linux may converge on ZFS in a couple of years, but I doubt Windows will ever join the fold and start supporting an open standard filesystem unless somehow the market learns to demand more standardization.

  30. .par2 by tepples · · Score: 1

    I've been wondering lately why no common file systems seem to implement error correcting codes (ECC/EDAC).

    Because user mode tools such as PAR2 already implement them.

    I can imagine scenarios where, for example, the RAM buffer in a hard drive is upset and perfectly encodes the wrong bit into a file

    Likewise, I can see scenarios where, for example, the RAM buffer in an application's main memory or in the file system's buffer is upset and perfectly encodes the wrong bit into a file.

  31. ZFS by jafo · · Score: 1

    ZFS checksums everything on the file-system. If you are using RAID-Z with ZFS, it can detect corruption of the underlying data and correct it. For example, if you have a RAID-Z+ZFS with 3 drives, you can "dd if=/dev/urandom of=/dev/sdX" on one of them and then do a "zpool scrub" and it will figure out what was corrupted and fix it. This is one of the standard demos they show with ZFS.

    This is great. Previously I had implemented a fax archive for a client and it was getting corrupted periodically because of some ext3 file-system bugs. Luckily, I had put file checksums in place, and we could generate a report on corrupted files daily, so we could pull them back from the backups.
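
    For anyone stuck on a file system without built-in checksums, the file-checksum approach can be sketched in a few lines. This is not the code from the fax archive, just a minimal illustration: the function names are made up, SHA-256 is an arbitrary choice, and the manifest is kept in memory where a real setup would store it somewhere safer than the archive itself.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def find_corrupted(root, manifest):
    """Return the relative paths that are missing or no longer match."""
    bad = []
    for rel_path, expected in sorted(manifest.items()):
        full = os.path.join(root, rel_path)
        if not os.path.isfile(full) or sha256_of(full) != expected:
            bad.append(rel_path)
    return bad
```

    Run build_manifest once when the archive is written, then run find_corrupted from a daily job and restore whatever it lists from backup. Note that this only detects damage; repairing it still needs a second copy or something like PAR2.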

    Sean

  32. Yeah but... by Anonymous Coward · · Score: 0

    "I have heard that the most permanent way of preserving data for long, LONG time is to write your data in stone. Granite being one of the best."

    But Granite can crack, and crumble causing file fragmentation.

    So carving the PAR files is a lot of extra work!

  33. Agreed by Rob+Simpson · · Score: 1
    What I'd like to see would be a filesystem that would look like a read-only FAT32 drive with hidden files or an extra partition to an OS that didn't support it, but to an OS with the correct driver would have error correction transparently built-in.

    Being able to transparently divide files above 4gig and have them look like a single file to a supported OS would be gravy.
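
    Until such a filesystem exists, the splitting can at least be done by hand in user space. A rough sketch, with made-up function names and a numbered-suffix naming scheme that is purely an assumption; it is not transparent to the OS, it just keeps each piece under the FAT32 per-file limit of 4 GiB minus one byte.

```python
import os

FAT32_MAX_FILE = 2**32 - 1  # largest single file FAT32 can hold, in bytes


def split_file(path, part_size=FAT32_MAX_FILE, buffer_size=1 << 20):
    """Write path.000, path.001, ... each at most part_size bytes long."""
    parts = []
    number = 0
    with open(path, "rb") as source:
        while True:
            part_path = f"{path}.{number:03d}"
            remaining = part_size
            written = 0
            with open(part_path, "wb") as part:
                while remaining > 0:
                    chunk = source.read(min(buffer_size, remaining))
                    if not chunk:
                        break
                    part.write(chunk)
                    remaining -= len(chunk)
                    written += len(chunk)
            if written == 0:
                # Source was already exhausted; drop the empty part and stop.
                os.remove(part_path)
                break
            parts.append(part_path)
            number += 1
    return parts


def join_parts(parts, destination, buffer_size=1 << 20):
    """Concatenate the parts, in the order given, back into one file."""
    with open(destination, "wb") as target:
        for part_path in parts:
            with open(part_path, "rb") as part:
                for chunk in iter(lambda: part.read(buffer_size), b""):
                    target.write(chunk)
```

    Something like split_file("backup.iso") on the way onto the drive and join_parts(parts, "backup.iso") on the way off would do it; the file name here is only an example.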

  34. Hardware isn't everything by Rob+Simpson · · Score: 2, Insightful
    Even with hardware that seems to be working perfectly fine, in the process of storing and repeatedly transferring stuff between different types of storage I've had errors crop up.

    Sure, I could use archives with checksums or RAID, but it'd be nice if there was an option to sacrifice some speed and space on a single form of storage to improve the reliability without going to such cumbersome lengths.

    1. Re:Hardware isn't everything by rjforster · · Score: 1

      I've seen these too. DVDs I've burned have been most reliably read back on the same drive that burned them. Less reliably on other drives. Even a single one that doesn't work is a pain, but I don't want to burn multiple copies or anything like that.
      I've not tried using dvdisaster but it does seem to fit these requirements.

  35. The only format universally accessible by rfc1394 · · Score: 1

    Is the VFAT 32-Bit MS-DOS file system made available in Windows 95. On CD-ROM/DVD-ROM it's essentially the same as the "Joliet" format. It supports file names up to 63 characters, subdirectories, and blanks in file names. Now, I could be wrong but I think journaling is only important where you have transactional-based file systems, where you are doing update writes and want faster performance with the ability to recover in the event of failure of a transaction to finish, i.e. the computer is rebooted before all of the I/O is done to the file but after the journal was written. (You recover the data by replaying the journal.) For the purpose of creating archival backup I don't believe journaling buys you much of anything since typically you write a whole file as a single transaction (several blocks one after the other in a copy operation) and you restore the specific file the same way (as one copy of the whole file).

    Whether we like it or not, while NTFS and ext2/ext3 and a bunch of other file systems might provide better reliability, whenever you look at any system, anywhere, one thing they all have in common, on virtually all media: hard drives, floppy drives, usb thumb drives, removable media cards such as Smartmedia, CF, Memory Stick, cell phones, is the Windows VFAT file system. And everyone: Windows 95, 98, NT, 2000, XP, BSD, Linux, OS-9, you name it, every operating system can read MS-DOS VFAT format file systems.

    --
    The lessons of history teach us - if they teach us anything - that nobody learns the lessons that history teaches us.
  36. My Suggestion by ratboy666 · · Score: 1

    Archival meaning -- read-only. Multiple OS support meaning -- standard.

    This cuts the field down. ISO 9660 would be a good bet, but is a bit "overkill". TAR format (which can be viewed as a "primitive" filesystem) would be my choice. Simple, can be read on all your target systems. If a tar client is not (for whatever reason) easily available, the data can still be simply extracted.

    Bad point: the "directory" can only be obtained by scanning the entire byte stream. If that is tolerable (and, by indexing the files stored, is mostly just fine), it's the one.
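
    To show what that indexing might look like, here is a small sketch using Python's standard tarfile module. It assumes an uncompressed tar stream on a seekable device or file, and the function names are my own. The index is built with one pass over the stream, after which a single file can be pulled out with a plain seek and read, no tar client needed.

```python
import io
import tarfile


def index_tar(fileobj):
    """Scan an uncompressed tar stream once; map name -> (data offset, size)."""
    index = {}
    with tarfile.open(fileobj=fileobj, mode="r:") as archive:
        for member in archive:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index


def read_member(fileobj, index, name):
    """Fetch one file's bytes straight from the stream using the saved index."""
    offset, size = index[name]
    fileobj.seek(offset)
    return fileobj.read(size)


# Tiny demonstration on an in-memory archive.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as archive:
    for name, payload in (("a.txt", b"hello"), ("b.txt", b"second file")):
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))

buffer.seek(0)
demo_index = index_tar(buffer)
print(read_member(buffer, demo_index, "b.txt"))
```

    Storing the index as a text file next to the tar stream means the slow scan only has to happen once per volume.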

    If you need fast directory searching as well, consider ISO 9660. Again, clients to read the format are available (although they may be limited to 700MB at a time).

    "tar files" (a bit of a misnomer -- should be "tar format byte stream") can be recorded on any device - floppy, disc, tape, CD, DVD, USB stick. It may even provide amusement -- if someone tries to read a recordable CD with a tar image on it, rather than an ISO image :)

    --
    Just another "Cubible(sic) Joe" 2 17 3061
  37. define really important by davidwr · · Score: 1

    The amount of paper or stone is related to how important "really important" is.

    I'd say "really important" is stuff that needs to survive a collapse of technology or even civilization, but not the collapse of literacy. Things like the Rosetta Stone or the modern equivalents, basic instructions for subsistence farming, core religious texts such as the Bible and Koran, dictionaries, some history books, instructions for making a printing press and other basic inventions that could have been built 4,000 years ago if someone had basic instructions. These need to be printed on 500+ year archival materials and stored in multiple copies around the globe.

    "Really really" important is the stuff that needs to survive the collapse of literacy or even the human race. Stuff like "Welcome to the Nevada Nuclear Waste Dump." This needs to be on 10,000+ year archival material in a form that is recognizable by all people literate or not.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
  38. Long term FS format by Anonymous Coward · · Score: 0

    I hear that Reiser is good for 20 to life.

  39. no expects the spanish inquisition! by Jose · · Score: 1

    gotta be careful.

    --
    The basic sleazeware produced in a drunken fury by a bunch of UCBerkeley grad students was still the core of BIND. --PV
  40. individual file size limit of 4gb though... by RMH101 · · Score: 1

    ...for FAT32. this is noticeable in an age of dvd backups...