Slashdot Mirror


Which OSS Clustered Filesystem Should I Use?

Dishwasha writes "For over a decade I have had arrays of 10-20 disks providing larger than normal storage at home. I have twice suffered a complete loss of data, once due to accidentally not re-enabling the notification on my hardware RAID and having an array power supply fail, after which the RAID controller was unable to recover half of the entire array. Now I run RAID-10, manually verifying that each mirrored pair is properly distributed across each enclosure. I would like to upgrade the hardware but am currently severely tied to the current RAID hardware and would like to take a more hardware agnostic approach by utilizing a cluster filesystem. I currently have 8TB of data (16TB raw storage) and am very paranoid about data loss. My research has yielded 3 possible solutions: Lustre, GlusterFS, and Ceph." Read on for the rest of Dishwasha's question. "Lustre is well accepted and used in 7 of the top 10 supercomputers in the world, but it has been sullied by the sale of Sun to Oracle. Fortunately the creator seems to have Lustre back under control via his company Whamcloud, but I am still reticent to pick something once affiliated with Oracle and it also appears that the solution may be a bit more complex than I need. Right now I would like to reduce my hardware requirements to 2 servers total with an equal number of disks to serve as both filesystem cluster servers and KVM hosts."

"GlusterFS seems to be gaining a lot of momentum now having backing from Red Hat. It is much less complex and supports distributed replication and directly exporting volumes through CIFS, but doesn't quite have the same endorsement as Lustre."

"Ceph seems the smallest of the three projects, but has an interesting striping and replication block-level driver called Rados."

"I really would like a clustered filesystem with distributed, replicated, and striped capabilities. If possible, I would like to control the number of replicas at a file level. The cluster filesystem should work well with hosting virtual machines in a highly available fashion, thereby supporting guest migrations. And lastly it should require as little hardware as possible, with the possibility of upgrading and scaling without taking down data."

"Has anybody here on Slashdot had any experience with one or more of these clustered file systems? Are there any bandwidth and/or latency comparisons between them? Has anyone experienced a failure and can share their experience with the ease of recovery? Does anyone have any recommendations and why?"

224 of 320 comments

  1. Repeat after me: by Anonymous Coward · · Score: 5, Insightful

    RAID is not a backup solution!

    1. Re:Repeat after me: by NFN_NLN · · Score: 5, Insightful

      Parent currently is marked as "0" but is dead on. His opening statement talks about a data loss (x2), is "very paranoid about data loss" and his closing remarks talk about "ease of recovery". Your statements suggest you are primarily concerned about data loss.

      Clustered filesystems are complex software that specialize in concurrent server access, not increased redundancy.

      You need to research backups and/or remote replication. Or buy an enterprise file server that does everything including call-home when it detects a hardware issue.. not waste time on a CFS.

    2. Re:Repeat after me: by NFN_NLN · · Score: 2

      And don't forget about RPO. If you want synchronous file replication over any useful distance we're talking $$$. If asynchronous is acceptable then decide what an acceptable RPO is, along with your data change rate. With those you can decide if you can afford offsite replication. Most businesses decide nightly tapes are acceptable at that point.

    3. Re:Repeat after me: by Enfixed · · Score: 1

      Heh, just buy a bunch of 3TB My Books on Black Friday and call it a day. ;)

      --
      Sigs are bad for you...
    4. Re:Repeat after me: by NFN_NLN · · Score: 5, Insightful

      Except when they do support redundancy:

      http://www.gluster.com/community/documentation/index.php/Gluster_3.2:_Creating_Replicated_Volumes - Replicated volumes replicate files throughout the bricks in the volume. You can use replicated volumes in environments where high-availability and high-reliability are critical.

      RAID is still NOT A BACKUP!

      I have a 500 node replicated filesystem... and I just overwrote the wrong file, or a virus infected a file, or the file got corrupted...

      The good news is my 500 replicated nodes are all consistent. The bad news is... where's my fucking file!

    5. Re:Repeat after me: by afabbro · · Score: 2

      Clustered filesystems are complex software that specialize in concurrent server access, not increased redundancy.

      Bingo. Spot on perfect answer.

      --
      Advice: on VPS providers
    6. Re:Repeat after me: by Anonymous Coward · · Score: 1

      Two key values need to be defined: the RPO (Recovery Point Objective - how much data can you afford to lose?), and the RTO (Recovery Time Objective - how long a recovery process is acceptable?)

      Low RPO ==> big dollars.
      Low RTO ==> big dollars.
      Both together ==> huge dollars.

      To me, the description of the problem screams out, "Get a backup solution in place!" I wrangle TSM (Tivoli Storage Manager) for a living, but that's probably too expensive for most people. AMANDA is probably the option I'd look at for home use. LTO4 drives can be had relatively cheap second hand, now that LTO5 is on the market (and has been for over a year; lots of people are jumping over), and the media's not bad on a dollar/GB basis - around $AU50 for an 800 GB (native capacity, 1.6 TB if you believe the 2:1 compression hype) LTO4 cartridge - compare that with $AU132 for a 2 TB HDD, and then consider that tape is much more reliable when it's moved around frequently. Organise something with friends or family (people you trust - remember, this is a copy of your system data we're talking about here) so you can keep a copy offsite. You might lose a day or two worth of data in the worst case, but that's probably as good as you'll get without spending very big bikkies on a high speed link to a data centre somewhere.
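The dollar/GB comparison above can be checked with a few lines of arithmetic. This is only a sketch: the prices and capacities are the ones quoted in the comment, 1 TB is taken as 1000 GB, the 8 TB target is the figure from the original question, and the cost of the tape drive itself is left out.

```python
import math


def cost_per_gb(price_aud, capacity_gb):
    """Return the media cost in AUD per gigabyte."""
    return price_aud / capacity_gb


lto4_native = cost_per_gb(50, 800)       # LTO4 cartridge at native capacity
lto4_compressed = cost_per_gb(50, 1600)  # same cartridge if 2:1 compression holds
hdd_2tb = cost_per_gb(132, 2000)         # 2 TB hard disk, assuming 1 TB = 1000 GB

# Cartridges needed for one full copy of 8 TB at native capacity.
cartridges_for_8tb = math.ceil(8000 / 800)

print(f"LTO4 native:     {lto4_native:.4f} AUD/GB")
print(f"LTO4 compressed: {lto4_compressed:.4f} AUD/GB")
print(f"2 TB HDD:        {hdd_2tb:.4f} AUD/GB")
print(f"Cartridges for 8 TB: {cartridges_for_8tb}")
```

At native capacity the two media come out close per gigabyte, so on these figures the case for tape rests mostly on compression and on how well it survives being moved around.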

      Ultimately, the OP's description is too vague. Step one: define what it is you need to defend against. Is it system downtime? Data loss due to hardware failure? Data loss due to fat-fingered sysadmins? Something else? You can't define an effective solution until you know what the problem is.

    7. Re:Repeat after me: by Anonymous Coward · · Score: 1

      At least you have backups. Whatever possessed you to choose RAID 5 though?

    8. Re:Repeat after me: by rwa2 · · Score: 2, Informative

      Yeah, subby just needs to:

      • delete some porn. Sure, it's a good feeling to know it's all there, but you really just watch the top 1% over and over. "The redyouwankjizzhutdb Cloud" will do when you just want some random fix.
      • compress the rest. There's no reason you need lossless 1080p masters of all your home videos of your kids spitting up. A nice h264 compressed archive can be enjoyed more often, is more portable to all your mobile devices, and you'll barely notice the loss of quality when someday you might get to run it through an upsampling 3D holodeck reconstruction algorithm
      • Once it's a bit more manageable than 8TB, sort and periodically rsync the "important" stuff (i.e. the stuff you created yourself and can't re-download from someone else) to a backup server, preferably offsite at a friend/relative's house. You can start it with a 2TB "sneakernet" and just do incremental updates over the net thereafter.
      • don't worry about the unimportant stuff, random movies/mp3s. Let go of the hoarding, gotta catch 'em all mentality. Your time is much better spent elsewhere than collecting and organizing other people's crap for some false sense of "completion" achievement.
      • Here's a nice goal to keep in mind, if some burglar breaks into your house and swipes your file server and holds it for ransom (more than the cost of the hardware), you should be able to just say "meh, I can just restore from the offsite backup"
    9. Re:Repeat after me: by houghi · · Score: 2

      When I hear people talk about backup, the first thing I do is start talking about restoring. To me restoring is much more important than the backup.

      Many people think that a backup is a copy of files. Partly: if I overwrite a file I still want the original and not the overwritten file when I notice it after a month, so incremental is a must.

      Most (ok, till now all) restores I do is because of human stupidity. I delete or overwrite the wrong file. So I want to be able to do an easy restore. For my home directory that would be something like "cp backup/file file" or with any file browser as a GUI for the latest version available.

      So when I started looking for a backup, I first looked at how I wanted my restore to behave, and then looked for the backup solution that best achieved that. Besides the easy restore, my parameters were:

      • No programs needed for restore, except for standard stuff like cp and mount
      • Running from cron
      • No GUI
      • Workable over a network
      • Incremental

      This excluded already a lot of programs.

      At this moment I use storeBackup without the compression. I understand that other people will have other requirements and will get to something else (including writing their own program).
      What I took away from all this is that the important part is restore, not backup. When you start looking from that angle, many things are already a lot easier to decide.
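One way to get incremental backups that still restore with a plain "cp backup/file file" is to make every snapshot a full directory tree and hard-link the files that have not changed. The sketch below illustrates that general idea only; it is not how storeBackup is implemented, the function name and layout are made up for the example, and it assumes a filesystem that supports hard links.

```python
import filecmp
import os
import shutil
from typing import Optional


def snapshot(source: str, dest: str, previous: Optional[str] = None) -> None:
    """Copy `source` into a new tree at `dest`. Files identical to the one in
    the `previous` snapshot are hard-linked instead of copied, so each
    snapshot looks complete but only changed files use new space."""
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        target_dir = os.path.normpath(os.path.join(dest, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_dir, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)       # unchanged: share data with the old snapshot
            else:
                shutil.copy2(src, dst)  # new or changed: store a fresh copy
```

Because every snapshot directory is an ordinary tree, an overwritten file can be pulled back from any earlier snapshot with cp or a file browser, which matches the "no programs needed for restore" requirement.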

      --
      Don't fight for your country, if your country does not fight for you.
    10. Re:Repeat after me: by vikingpower · · Score: 1

      Amen. Mod that guy up. His answer sums up the entire sense-making part of the discussion here.

      --
      Religious speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace
    11. Re:Repeat after me: by somersault · · Score: 1

      For those that only need a tiny set of files backed up, you can use stuff like Dropbox, Ubuntu One, etc. as a convenient addition to any other backup system you have. Those automatically synch to other devices, and keep a cloud backup with previous versions.

      To do your own Dropbox-like backup system, I suppose you could use a content versioning system like svn/git. I hadn't really considered doing something like that until now. It would be less user friendly than Dropbox, but you wouldn't have to pay a subscription, and could have >100GB repositories. You'd be able to restore using normal commands from a disk if you wanted, but you could also restore from your own remote server.. I may have to try this once my backup requirements grow :)

      --
      which is totally what she said
    12. Re:Repeat after me: by TheRaven64 · · Score: 2

      So you revert to the last snapshot. Or you mount the last snapshot and recover that file (you are making regular snapshots of your volumes, right?). That is not the problem with RAID. The problems that RAID does not address are:

      • What happens if there is a bug in the filesystem driver that causes the disk to be slowly filled with nonsense? Or the machine is compromised and malware overwrites the existing data.
      • What happens when thieves steal the server? Or when lightning strikes and fries all of the disk controllers?

      The second point is addressed by distributed filesystems - they may steal the server, but they (probably) won't steal the servers in both data centres at the same time. The first one can only really be addressed by off-site backups onto write-only media, or at least onto media that are removed from any machine that can write to them and stored safely.

      RAID is not a complete backup solution, but RAID and snapshots do provide most of the benefits at a fraction of the cost. If you want the other benefits, you need to be willing to spend a lot more money.

      --
      I am TheRaven on Soylent News
    13. Re:Repeat after me: by GameboyRMH · · Score: 1

      Beware that these disks come with an unusual partition layout. The 3TB Mybook works more like a NAS than a regular enclosure. A friend of mine took the disks out and plugged them into a Windows box, which didn't like the partition layout and asked some poorly-worded question that was really asking him whether he wanted to write a new partition table to the disk, and he hit Yes. I couldn't recover anything, I even tried photorec but the drive was full of Usenet downloads that weren't preallocated so they were too fragmented to recover.

      --
      "When information is power, privacy is freedom" - Jah-Wren Ryel
    14. Re:Repeat after me: by Anarke_Incarnate · · Score: 1

      Loose is the opposite of tight. Lose is when something is lost. RAID is NOT evil. BAD RAID is evil. If you have proper RAID then you get checksums and can recover. You should also have backups.

    15. Re:Repeat after me: by Slashdot+Parent · · Score: 1

      RAID is not a backup solution!

      OP never said it was.

      It must be hard to walk with your knees jerking around like that.

      --
      They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
    16. Re:Repeat after me: by Bastardchyld · · Score: 1

      Parent and GP are correct, I think you are looking at this all wrong.

      Now I personally use a Solaris 11 Express box at home for my storage needs, you can serve iSCSI, FC, CIFS, NFS, whatever you need. I plan on switching to OpenIndiana when that is released, combine this with Crashplan (as a cloud backup provider with Solaris binaries) or a form of local offsite backups then you shouldn't ever have an issue with total data loss again.

      I have documented a lot around ZFS and KVM on http://blog.allanglesit.com./ -matt

      --
      $diff terrorists hippies
      $
      $rm -rf *terrorists *hippies
    17. Re:Repeat after me: by Sancho · · Score: 1

      I currently have 8TB of data (16TB raw storage) and am very paranoid about data loss.

      Maybe s/he has backups in place, and maybe not. But clusters and RAID aren't about data loss--they're about continuity when disks go down.

    18. Re:Repeat after me: by nedlohs · · Score: 1

      Clustered file systems are also about redundancy. Sure it's nice that your RAID system will survive a HDD failure and allow access to your files without interruption while it spins up the spare. But what if the motherboard fried or the PSU or whatever. A clustered file system lets you handle that without interrupting access (with the same caveats - if you lose more than N you are down, etc).

      But yes, redundancy isn't the be all and end all of preventing/recovering from data loss. rm -rf X/ on a RAID or a clustered file system does just as much damage as to a single disk file system.

      Backups are much more important, and much more boring.

    19. Re:Repeat after me: by Unequivocal · · Score: 1

      I thought raid was about coping with continuity and data loss? Say I do full/diff backups every night. With a raid set up, if a disk fails during the day, I won't have to go back to last night's backup and lose all the data from today. I can pop in a new disk to replace the old one and I get continuity of service and no data loss? Now if the notion is that raid is a replacement for backups, that's crazy on its face, and I fear that is what the OP was implying (not sure - he didn't mention his backup strategy anyway). Are we saying the same thing?

    20. Re:Repeat after me: by Sancho · · Score: 1

      Yeah, I was imprecise, or looking at it too narrowly.

      RAID isn't a solution for the problem of data loss. Exactly for the reasons you state.

      Sorry :)

    21. Re:Repeat after me: by AvitarX · · Score: 1

      Ubuntu one won't protect you from stupidity though, as it syncs in fairly real time.

      And, if for some reason they foul up and delete your stuff, all your local copies are gone, you are adding a potential stupid user to the loss of all your data.

      (note, I say this as a happy Ubuntu One user (both music and files, actually paying even), but it is NOT a backup).

      --
      Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
    22. Re:Repeat after me: by somersault · · Score: 1

      Ubuntu one didn't work very well for me when I first tried it (soon after it came out), though I presume they've made it better now.

      Dropbox does keep track of previous versions, but yes I suppose taking an offline snapshot of your files from time to time is important depending on what kind of data it is.

      --
      which is totally what she said
    23. Re:Repeat after me: by growse · · Score: 1

      I'm thinking along this line, I've got a Nexenta box, but planning on migrating to Solaris 11 / OpenIndiana and running crashplan on it.

      My issue is how to approach the ZFS block devices exposed as LUNs on FC. Solaris can't peek inside these (they're formatted as either VMFS-5 or NTFS), so crashplan can't do file-based backup on them. Short of getting a subscription for every client and the NAS as well, I can't see a better solution :(

      --
      There is nothing interesting going on at my blog
    24. Re:Repeat after me: by Enfixed · · Score: 1

      Yeah, I always hide the partition which offers a minor level of additional safety. http://www.marccizravi.com/2010/remove-wd-smartware/ Granted they should never come with that partition setup in the first place.... but for the price.. eh.

      --
      Sigs are bad for you...
    25. Re:Repeat after me: by Bastardchyld · · Score: 1

      Assuming you are using this at home... You ought to be able to get away with the family plan, which lets you back up unlimited data from "every computer in your house" for $6/month. Though you could also look at using ZFS send and receive to another ZFS box.

      -matt

      --
      $diff terrorists hippies
      $
      $rm -rf *terrorists *hippies
    26. Re:Repeat after me: by jd · · Score: 1

      From a purely technical perspective, what is the actual difference between shoving a mirror RAID copy of a drive into a rack of backups and shoving a mirror image copy of a drive into a rack of backups? Beyond the fact that the first is done in hardware and the second is done in software.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    27. Re:Repeat after me: by jd · · Score: 1

      So had CODA been continued, that couldn't have been used for increased redundancy as well as concurrent server access?

      Seems to me that clustered file systems are INTENDED for concurrent server access and are OPTIMISED for that, but can be used for anything you damn well please especially if they've got the extra facility built-in.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    28. Re:Repeat after me: by jd · · Score: 1

      You're assuming every variant of RAID under the sun is identical. It is NOT. Different RAID schemes do different things. RAID 6 (which allows 2 drives to fail) cannot result in the entire RAID array being lost from a single error. You need at least 3, 2 of which you must have knowingly ignored. Mirror RAID can NEVER have a failure of the RAID array from errors because you aren't striping the data at all. The loss of 1 disk is the loss of 1 disk. The loss of 1 sector is the loss of 1 sector. That is all.

      1 error can certainly shut down a striped array but only people needing extremely high-speed data transfers (think CERN) or damn fools (every other user) use striped arrays without either running them through a mirror (CERN can't afford them, damn fools don't want to afford them) or some other backup solution.

      Google doesn't "backup" anything in the conventional sense - there is no master or slave concept in their system, it's completely emancipated. Data is duplicated, sure, but duplication != backing up, though all backing up is duplication.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  2. Re:You Should... by RobDollar · · Score: 2, Insightful

    If ever, this article is the case for your comment. Dishwasha, what the living fuck are you doing with your life. Answer that and then maybe, just maybe, coherent answers will abound.

  3. Obligatory: RAID is not a backup by Anthony+Mouse · · Score: 5, Insightful

    Is the only reason you're looking at a clustered filesystem that you don't want to lose data? Because if it is, it's probably not what you want. The purpose of a clustered filesystem is to minimize downtime in the face of a hardware failure. You still need a backup in the case of a software failure or in case you fat finger something, because a mass deletion can replicate to all copies.

    1. Re:Obligatory: RAID is not a backup by chrb · · Score: 2, Informative

      If you have more than one server then it's pretty easy to set up rsync with rolling backups (rsnapshot or rdiff-backup or whatever) which is more of a proper backup solution. It's also probably a bit easier to administrate than a clusterfs.

      Having said that, Hadoop's HDFS looks quite good. AFAIK it is pretty robust, and it runs on top of an existing FS so you won't need to repartition, which is useful. FUSE file system driver, and Java, will be a bit slower than in-kernel, but probably not an issue for bulk data storage.

      Oh, and another option is the Distributed Replicated Block Device. Though this is basically network RAID and not replication on a per file basis.

    2. Re:Obligatory: RAID is not a backup by Anonymous Coward · · Score: 1

      HDFS is a really really /really/ bad suggestion considering the user's reqs. There is zero reason to use HDFS if you're not using hadoop itself for processing. Period. The fuse plugin I'm sure has improved, but last I'd used it, it was several levels of hell dealing with; we dealt w/ it purely because it had some benefits as a side band way of injecting data into our hadoop processing.

      Beyond that, HDFS still has a single point of failure: the metadata/name node (the rest are bricks in GlusterFS terminology), which is pretty contrary to what the dude was looking for considering his description above...

    3. Re:Obligatory: RAID is not a backup by SuperQ · · Score: 1

      And of course what the post really wants is a DISTRIBUTED filesystem. Not a clustered filesystem.

    4. Re:Obligatory: RAID is not a backup by Enfixed · · Score: 2

      Totally agree, the clustered approach doesn't seem to solve the problem posed. It's simple, buy a bunch of 2TB drives and set them up with ZFS. Configure a nightly snapshot job to another similar machine and call it a day. You can have a larger storage area with a fully redundant backup for less than 2K in parts.

      --
      Sigs are bad for you...
    5. Re:Obligatory: RAID is not a backup by Demonantis · · Score: 1

      He needs to get his priorities in order. I would say RAID is probably what he wants for the most part, to cover, like you said, hardware failure, plus an online backup service for the stuff he truly needs to back up. I sincerely doubt a single person can amass 8 TB of data that would be critical to have. Having it all is nice, but definitely not realistic.

    6. Re:Obligatory: RAID is not a backup by Anonymous Coward · · Score: 1

      Having said that, Hadoop's HDFS looks quite good. AFAIK it is pretty robust, and it runs on top of an existing FS so you won't need to repartition, which is useful. FUSE file system driver, and Java, will be a bit slower than in-kernel, but probably not an issue for bulk data storage.

      HDFS is not a solution. It doesn't provide POSIX capabilities such as random writes and altering existing files. Although FUSE lets you mount it and make it look like a regular FS, you need to make sure apps that use it only use the features that it supports otherwise, the apps will start getting errors when doing disk operations and potentially going down in flames when they try to save files or some such thing.

    7. Re:Obligatory: RAID is not a backup by allenw · · Score: 2

      The fuse support has likely gotten worse since no one on the core dev team really spends any time with it. I'd be surprised if it still compiles.

    8. Re:Obligatory: RAID is not a backup by TheRaven64 · · Score: 1

      rsync? Ouch! Sounds like a great way to wear out the disks and waste a lot of CPU.

      If you want cheap remote replication, then use ZFS on both ends, and use zfs send / receive to move hourly snapshots to the machine that isn't running any services and isn't accepting any incoming traffic except the ZFS data stream.
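A rough sketch of what one round of that snapshot shipping might look like. It only builds the command strings and does not run them; the dataset names, snapshot names, and host are hypothetical, and it assumes the sending machine pushes the stream over ssh.

```python
import shlex


def replication_commands(dataset, prev_snap, new_snap, remote_host, remote_dataset):
    """Return the shell commands for one incremental replication round:
    take a new snapshot, then send the delta since the previous one."""
    old_name = shlex.quote(f"{dataset}@{prev_snap}")
    new_name = shlex.quote(f"{dataset}@{new_snap}")
    take = f"zfs snapshot {new_name}"
    # -i sends only the blocks that changed between the two snapshots.
    send = f"zfs send -i {old_name} {new_name}"
    # -F rolls the receiving dataset back to its latest snapshot before applying.
    recv = f"ssh {shlex.quote(remote_host)} zfs receive -F {shlex.quote(remote_dataset)}"
    return [take, f"{send} | {recv}"]


for command in replication_commands(
    "tank/data", "hourly-01", "hourly-02", "backuphost", "backup/data"
):
    print(command)
```

Running the printed commands from an hourly cron job would be one way to approximate the setup described; the very first transfer would need a full `zfs send` without `-i`.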

      --
      I am TheRaven on Soylent News
    9. Re:Obligatory: RAID is not a backup by chrb · · Score: 1

      By default rsync just checks modification time and file size to determine whether a file has changed, so it is reasonably quick. ZFS remote snapshots do look cool though.
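The size-and-modification-time test can be sketched in a few lines. This imitates the idea only and is not rsync's actual code; the function name is made up, and timestamps are compared at whole-second resolution as a simplifying assumption.

```python
import os


def quick_check_differs(src: str, dst: str) -> bool:
    """Treat a file as changed when the destination is missing, or when its
    size or modification time differs from the source. File contents are
    never read, which is why the check is cheap."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)
```

The trade-off is that a file rewritten with the same size and timestamp would be missed, which is the case rsync's optional checksum mode exists to catch.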

    10. Re:Obligatory: RAID is not a backup by GameboyRMH · · Score: 1

      I'd go further and say that RAID should *ONLY* be used for high availability (RAID 1/5/6) or high performance (RAID0) or in unusual cases where combinations are required (I don't need to tell you if you do data center/supercomputing stuff). For any other purpose RAID doesn't make sense. Want your data safe? Get a backup drive. Want expandable storage? Use LVM/ZFS/BTRFS or some other filesystem-level solution.

      Even when you do want to use RAID, if performance isn't absolutely critical I recommend software RAID. Not even firmware RAID like Intel Matrix tech, but slow-ass software RAID, because if you use hardware RAID it's a bitch when you change your hardware. If you use software RAID then you just slap new stuff in and continue like nothing happened.

      --
      "When information is power, privacy is freedom" - Jah-Wren Ryel
    11. Re:Obligatory: RAID is not a backup by Slashdot+Parent · · Score: 1

      The purpose of a clustered filesystem is to minimize downtime in the face of a hardware failure. You still need a backup in the case of a software failure or in case you fat finger something, because a mass deletion can replicate to all copies.

      The OP was talking about losing data due to hardware RAID controller failure. To me, that sounds like he is asking how to protect against hardware failure, not against accidental file deletion/destruction.

      Backup is important, no doubt, but it sounds like OP was asking about protecting against hardware faults.

      --
      They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
    12. Re:Obligatory: RAID is not a backup by Anthony+Mouse · · Score: 1

      slow-ass software RAID

      Software RAID isn't even that slow. I mean think about it, which is faster? Your 3GHz Core i7 or whatever five dollar coprocessor they put on your RAID card?

      The only time hardware RAID makes sense anymore is if your workload was already CPU bound. And then the purpose isn't to make disk access faster, it's to offload the parity calculations to free up the CPU for other things.
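For a sense of what those parity calculations involve, here is a minimal sketch of single-parity (RAID 5 style) XOR. The block contents are made up, and real implementations work on large stripes with optimized routines rather than byte-by-byte Python.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte column at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"disk0000", b"disk1111", b"disk2222"]  # made-up stripe contents
parity = xor_blocks(data)                        # what the parity disk would hold

# Lose the second block, then rebuild it from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

The work is one XOR per byte written, which is the kind of load a modern general-purpose CPU handles easily.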

    13. Re:Obligatory: RAID is not a backup by GameboyRMH · · Score: 1

      True, software RAID's speed penalty these days isn't much, that's sort of my point - it's only an issue if performance is super-important.

      --
      "When information is power, privacy is freedom" - Jah-Wren Ryel
    14. Re:Obligatory: RAID is not a backup by Doc+Hopper · · Score: 1

      I'm gonna throw in another vote for ZFS with Remote Replication. I currently manage a few hundred petabytes of storage and we rely on it day-in, day-out for disaster recovery, archival, and site mirroring. Combine regular snapshots with continuous or scheduled remote replication and a decent backup strategy storing tapes off-site and you have a pretty bullet-proof disaster recovery and data integrity plan.

      That last bit is really key. ZFS is much better than plain old RAID for verifying data integrity. It's a huge selling point, and the horror stories of multiple-component-failure we've still recovered data from only because the underlying filesystem is ZFS are legion.

      Throw out the idea of "cluster" filesystems. Kind of pointless for what you're talking about. Set up two ZFS arrays on the two computers you're going to use. I'd recommend mirrored or triple-mirrored vdevs if you're performance-conscious, RAIDZ2 or RAIDZ1 + spare if performance is less critical and you're tight on disks; you want to be able to weather at least two simultaneous disk failures (and preferably a path failure, too) without any issues. Make sure both systems are already set up with the IP address range you expect them to use; moving a remote replica to a new IP is sometimes an exercise in frustration. Or set up an SSH tunnel as described in the FreeNAS documentation.

      Get your initial replica up and running and set up a cron job to kick off replications at regular intervals. You can also write a little daemon to monitor the replication and start the next RR job the moment the one before completes, but that's a bit complicated. We do it all the time, but still, it's a little more complicated than cron.

      One read/write master, and one read-only replica. At any time you can also reverse the relationship if needed. Set up hourly, daily, weekly, and monthly snapshots so you can recover from an "oops".
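A tiered snapshot schedule also needs a rule for which old snapshots to keep. The sketch below shows one possible policy (keep the newest snapshot in each recent hour, day, ISO week, and month); the function name and the default counts are assumptions for illustration, not ZFS behaviour or the commenter's actual settings.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(times, hourly=24, daily=7, weekly=4, monthly=12):
    """Return the set of snapshot times to retain: the newest snapshot in each
    of the most recent `hourly` hours, `daily` days, `weekly` ISO weeks and
    `monthly` months that contain at least one snapshot."""
    tiers = [
        (hourly, lambda t: (t.year, t.month, t.day, t.hour)),
        (daily, lambda t: (t.year, t.month, t.day)),
        (weekly, lambda t: tuple(t.isocalendar())[:2]),
        (monthly, lambda t: (t.year, t.month)),
    ]
    keep = set()
    for limit, bucket_of in tiers:
        seen = []
        for t in sorted(times, reverse=True):  # newest first
            bucket = bucket_of(t)
            if bucket not in seen:
                if len(seen) == limit:
                    break                      # this tier is full
                seen.append(bucket)
                keep.add(t)                    # newest snapshot in this bucket
    return keep


# Three days of hourly snapshots, pruned with a deliberately small policy.
start = datetime(2011, 11, 1, 0)
example = [start + timedelta(hours=h) for h in range(72)]
kept = snapshots_to_keep(example, hourly=2, daily=2, weekly=0, monthly=0)
print(sorted(kept))
```

Anything not in the returned set is a candidate for deletion, and how far back an "oops" can be undone is bounded by the oldest snapshot that survives.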

      Backing up to tape is where you really get hit in the pocketbook. Whether you need tape or not is up to you; for many situations tape makes great sense, for other situations it does not. Many less-critical installations do fine with an outsized area for snapshots (typically we reserve about 25% of the total space for snapshots) and an extended snapshot preservation window. It all really depends on the volatility of your data. If you're like most users, you don't really "churn" your data a lot: things tend to stay where they get put once they are where they're supposed to be. And you flush out old movies or whatever a few times a year.

      The cool thing about ZFS is that it scales very well. Whether you need just a snapshotting filesystem for a single drive in your notebook computer or a 200-spindle half-petabyte array synchronizing data across a continent, it can handle most tasks. There are a few corner cases where I wouldn't use it -- mammoth media farms and OLTP databases requiring huge throughput as well as great transactional performance come to mind -- but for a home user it's easy to use and overkill all at the same time :)

      Disclaimers: Yes, I work for Oracle. Yes, I'm a huge fan of ZFS, and I was exposed to it because I work here. But that's really irrelevant to the fact that it beats the tar out of every home-brew snapshotting/backup/replication system I've tried over the past seventeen years.

    15. Re:Obligatory: RAID is not a backup by Doc+Hopper · · Score: 2

      Mass delete.

      ZFS with a snapshot schedule. Sorted, as long as you catch it within the reach of your oldest snapshot.

      Overwrite with bad data.

      ZFS with a snapshot schedule. Sorted.

      Silent filesystem corruption.

      ZFS. Sorted.

      Batches of disks at one end of the bathtub curve.

      ZFS verifies the data, and when your disks poop out the data is rendered read-only long before just about anything else would have realized there's a problem.

      Trees going through your roof.

      ZFS scheduled remote replication to a second array at your buddy's house. All your data remains intact, including snapshots to protect against all the above issues.

      Bets are off if the tree hits you, though.

    16. Re:Obligatory: RAID is not a backup by Atzanteol · · Score: 1

      Ummm. What? Wear out disks? Waste CPU? I'm pretty sure you know nothing about rsync...

      --
      "Ignorance more frequently begets confidence than does knowledge"

      - Charles Darwin
  4. PronFS by igny · · Score: 2

    Where is PronFS when we desperately need one?

    --
    In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
    1. Re:PronFS by Jeremi · · Score: 2

      Where is PronFS when we desperately need one?

      It's widely available... these days it goes by the name "the Internet".

      --
      I don't care if it's 90,000 hectares. That lake was not my doing.
  5. Nagios Monitoring by Anonymous Coward · · Score: 1

    Would recommend you look at nagios monitoring - you can monitor your raid with that. Has saved me a number of times (always nice to be notified when something fails).

  6. I know this isn't what you asked but... by KendyForTheState · · Score: 5, Interesting

    20 disks seems like overkill for your storage needs. Seems like the more disks you use the greater the risk of failure of one or more of them. Also, your electricity bill must be through the roof. I have 4 3TB drives with a 3Ware controller in a RAID5 array, which gives me the same storage capacity with 1/5th the drives. Aren't you making this more complicated than it needs to be? ...Maybe that's the point?

    --
    ...I just came for the free beer.
    1. Re:I know this isn't what you asked but... by Kagetsuki · · Score: 1

      Please, if you're going to make such good posts don't do it as AC - you deserve the karma. It just so happens we're looking at a scheme almost exactly like what you outline, we've currently got a dual server "humming" configuration with on and off site backups but we need something more serious after getting an influx of customers.

    2. Re:I know this isn't what you asked but... by doodleboy · · Score: 2

      I also have a 3ware card and four 1 TB drives in RAID 5 in my 10.04 desktop PC at home. Some of that space is exported via iSCSI to a couple of Windows boxes. Then I back the RAID array up with a couple of external SATA drives. My wife thinks this is excessive, but I lost a lot of data, once, nothing critical but stuff I cared about, emails and papers from college, pics of friends and family, etc. But when the drive started throwing SMART errors I thought, yup, better go pick up a new drive soon... 3 days later, it was dead.

      The irony is that one of my main responsibilities at work is backups, mostly with shell scripts I wrote myself.

      Many of you probably have most of your important stuff on one drive that you don't back up. At the very least, pick up an external USB drive and schedule backups for anything you care about.

    3. Re:I know this isn't what you asked but... by rwa2 · · Score: 1

      Yeah, if it were me, instead of a RAID10 with exotic hardware, I'd split it across a few cheap servers and run software RAID6 for a more hardware-agnostic approach. Then use something like OpenAFS (which I unfortunately have 0 actual experience with) to make those servers look like one filesystem to clients. That should get you a good bang for the buck, since motherboards and tower chassis that can fit 6 disks and gigabit networking hardware are relatively cheap compared to JBODs and junk.

      Lustre and OCFS2 are more suited for homogeneous cluster performance, so accessing the data wouldn't be very convenient. At least with OpenAFS you can run clients on Windows and OSX as well. With the cluster FS's I don't think it's even safe to run different kernels on the nodes.

      I've read unflattering things about the performance of GlusterFS, even if you do have exotic multi-homed SAN fabrics to run it on. Never heard of subby's last option. I had also tried to get CODA working for the longest time, but it still seems too complicated and experimental compared to *AFS.

      On my own home system, for most of the past 8 years I've been running a hybrid software RAID of 4x 250GB disks, with one set of partitions in RAID0 for /tmp , one set in RAID10 for performance, and one set in RAID5 for maximum storage. (And my important dirs rsync'd offsite to a friend's server which I donated hardware towards). This setup has survived about 2 disk failures over the years. The oldest file in my home dir goes back to 1998 or so.

      But if he really insists on lots of on-line storage, check out this custom box linked from slashdot a few months back:
      http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/

    4. Re:I know this isn't what you asked but... by dbIII · · Score: 1

      I've recently ditched RAID10 and gone for RAID6. With a decent controller and a lot of drives it's not much slower and you can lose ANY two drives. With RAID10 you may be able to lose up to 7 drives if it's one from each pair - but if you lose two from the same pair you could have a big hole in every single large file on the array. The only multiple disk failures I've ever had were adjacent overheating drives anyway.
      Tape or USB storage is good - you can't overwrite something that is in a box in another building. I just recovered a mailbox today from tape because the original and the day-old offsite mirror had been overwritten with a nearly empty mailbox.
      Some sort of live mirror of spinning disks that is kept synchronised every now and again is nice and handy in a lot of situations but is no more a backup than RAID is. For that you need something that isn't easy to overwrite.
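
      To put rough numbers on the RAID10 vs. RAID6 trade-off above, here is a minimal sketch. It assumes a 14-drive array (7 mirrored pairs, inferred from the "up to 7 drives" remark) and assumes failures strike drives uniformly at random, which is optimistic if the drives that fail together are adjacent ones overheating. The function names are made up for illustration.

```python
from math import comb


def raid10_survival(pairs: int, failures: int) -> float:
    """Chance that `failures` simultaneous random drive losses never take
    out both halves of the same mirrored pair."""
    if failures > pairs:
        return 0.0  # pigeonhole: some pair must have lost both drives
    drives = 2 * pairs
    # Pick which pairs are hit, then which half of each hit pair fails.
    safe_outcomes = comb(pairs, failures) * 2 ** failures
    return safe_outcomes / comb(drives, failures)


def raid6_survival(failures: int) -> float:
    """A single RAID6 set tolerates any two lost drives, and no more."""
    return 1.0 if failures <= 2 else 0.0


if __name__ == "__main__":
    for k in range(1, 5):
        print(k, round(raid10_survival(7, k), 3), raid6_survival(k))
```

      Under those assumptions RAID10 survives a random double failure 84 times out of 91, while RAID6 always does; from the third failure on, RAID6 is gone and RAID10 still has a chance.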

    5. Re:I know this isn't what you asked but... by GameboyRMH · · Score: 1

      HA RAID is a waste for home systems. Is it unacceptable if the home server goes down for a few hours? No? Then don't waste drives on live redundancy, just use backup drives. If you want easily expandable storage use a filesystem-level solution like LVM, ZFS or BTRFS - no live redundancy, just use all the drive space and back it all up to external disks or another server or NAS box.

      --
      "When information is power, privacy is freedom" - Jah-Wren Ryel
    6. Re:I know this isn't what you asked but... by KendyForTheState · · Score: 1

      Backups!

      --
      ...I just came for the free beer.
    7. Re:I know this isn't what you asked but... by Doc+Hopper · · Score: 1

      ZFS with a RAIDZ2 VDEV. 3 disks of data, 2 disks of parity, 1 disk spare for resilvering when one of your cheap-ass 3TB disks eats it (and it will!). If it were me and I were hard up against a budget, that's the way I'd go. Decent performance and 9TB storage with all the data integrity, variable block size, compression, encryption, and deduplication benefits of ZFS, but more spindles would be better.

      If performance is what you want, a triple mirror is hard to beat. You can pick up 7200RPM drives for dirt cheap and high capacity. Good data redundancy and performance, at just three times the price :)
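
      The capacity arithmetic above can be sketched as follows. This is a back-of-the-envelope calculation only: it ignores filesystem metadata overhead and the TB/TiB difference, and the function names are hypothetical.

```python
def raidz2_usable_tb(total_disks: int, spares: int, disk_tb: float) -> float:
    """Usable space of one RAIDZ2 vdev: two disks' worth goes to parity,
    and hot spares hold no data until they are resilvered in."""
    data_disks = total_disks - spares - 2
    if data_disks < 1:
        raise ValueError("need at least one data disk after parity and spares")
    return data_disks * disk_tb


def mirror_usable_tb(total_disks: int, copies: int, disk_tb: float) -> float:
    """Usable space when disks are grouped into N-way mirrors."""
    return (total_disks // copies) * disk_tb


if __name__ == "__main__":
    print(raidz2_usable_tb(6, 1, 3.0))  # 3 data + 2 parity + 1 spare -> 9.0
    print(mirror_usable_tb(6, 3, 3.0))  # two triple mirrors -> 6.0
```

      With six 3TB disks, the RAIDZ2 layout gives 9TB usable, while the same six disks as triple mirrors give 6TB, which is the "three times the price" part.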

    8. Re:I know this isn't what you asked but... by Doc+Hopper · · Score: 1

      Snapshots are read-only. Budget a little excess capacity to hold snapshot churn and an off-site replication setup using a filesystem that supports snapshots. It's an unconventional backup, but satisfactory for many uses.

    9. Re:I know this isn't what you asked but... by g00ey · · Score: 1

      I think it is appropriate to post the following information about file systems and the risk of data corruption:

      (the information within this post is derived from a forum discussion with a user named "Kebabbert" so credits should go to him(/her never met him irl) for the excellent information on this post)

      Regarding shortcomings in hardware RAID, here is a whole PhD dissertation showing that normal file systems are unreliable:

      http://www.zdnet.com/blog/storage/ [...] t-risk/169

      Dr. Prabhakaran found that ALL the file systems shared

      ...ad hoc failure handling and a great deal of illogical inconsistency in failure policy...such inconsistency leads to substantially different detection and recovery strategies under similar fault scenarios, resulting in unpredictable and often undesirable fault-handling strategies.

      We observe little tolerance to transient failures;...none of the file systems can recover from partial disk failures, due to a lack of in-disk redundancy.


      Regarding shortcomings in hardware RAID:

      http://www.cs.wisc.edu/adsl/Public [...] fast08.pdf

      "Detecting and recovering from data corruption requires protection techniques beyond those provided by the disk drive. In fact, basic protection schemes such as RAID [13] may also be unable to detect these problems.
      ..
      As we discuss later, checksums do not protect against all forms of corruption"


      http://www.cs.wisc.edu/adsl/Public [...] icde10.pdf

      "Recent work has shown that even with sophisticated RAID protection strategies, the "right" combination of a single fault and certain repair activities (e.g., a parity scrub) can still lead to data loss [19]."

      CERN discusses how their data was corrupted in spite of hardware RAID:

      http://storagemojo.com/2007/09/19/ [...] -research/

      Here is a whole site that only talks about the lacks and shortcomings in RAID-5:

      http://www.baarf.com

      Lacks and shortcomings in RAID-6:

      http://kernel.org/pub/linux/kernel [...] /raid6.pdf

      "The paper explains that the best RAID-6 can do is use probabilistic methods to distinguish between single and dual-disk corruption, eg. "there are 95% chances it is single-disk corruption so I am going to fix it assuming that, but there are 5% chances I am going to actually corrupt more data, I just can't tell". I wouldn't want to rely on a RAID controller that takes gambles :-)"

      In other words, RAID-5 and RAID-6 are not safe at all and if you care about your data you should migrate to other solutions. In the past the disks were small and you were much less likely to run into problems. Today when the hard drives are big and RAID clusters are even bigger you are much more likely to run into problems. Assume that there is a 0.00001% chance of a problem on any given operation; if the hard drives are large and fast enough, you will run into problems quite frequently.
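
      A minimal sketch of why a tiny chance adds up, treating the 0.00001% figure as an independent per-operation probability (that interpretation is an assumption, and so is the operation count used below):

```python
from math import expm1, log1p


def chance_of_any_problem(p_per_op: float, operations: int) -> float:
    """P(at least one problem) = 1 - (1 - p)^n, computed via log1p/expm1
    so that very small p does not get lost to rounding."""
    return -expm1(operations * log1p(-p_per_op))


if __name__ == "__main__":
    p = 0.00001 / 100  # 0.00001% expressed as a probability
    for n in (10_000, 10_000_000, 100_000_000):
        print(n, round(chance_of_any_problem(p, n), 4))
```

      At ten million operations the chance of at least one problem is already about 63%, and it approaches certainty soon after.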

  7. ZFS by Anonymous Coward · · Score: 4, Informative

    LVM, mdadm & Ext4 or ZFS seems like it would be more than adequate for this. A 2U server can hold 36TB of raw data with software raid and consumer disks. 2.5" would be preferable for home use considering power usage unless you're a fellow Canadian; in which case servers make great space heaters.

    1. Re:ZFS by DarkDust · · Score: 1

      I do have a ZFS setup of currently 6 disks and I really recommend buying server-grade HDDs, unless you have set up a monitoring system that tells you whenever a HDD is failing so you can buy a new one.

      Until half a year ago I used normal USB HDDs that you can buy everywhere. My experience was that they simply aren't meant to be always on and fail pretty soon. I usually had a failed HDD once every quarter year. It drove me mad. Almost one year ago I started using these HDD docks where you can put two 2.5" or 3.5" HDDs into and bought HDDs that were labeled for server use. After half a year they still ran fine, so each time a normal USB HDD failed I replaced it with another dock. Haven't had a single failure since then. Nice bonus: double the amount of HDDs I can connect to the server (speed isn't so much an issue as space in my case). The solution with these docks with better HDDs costs more at first but turns out to be cheaper in the long run.

    2. Re:ZFS by AliasMarlowe · · Score: 1

      Until half a year ago I used normal USB HDDs that you can buy everywhere. My experience was that they simply aren't meant to be always on and fail pretty soon. I usually had a failed HDD once every quarter year.

      It seems to depend on the disk manufacturer.

      I have had similar poor experiences to you, but only with two Seagate 2TB "green" USB drives, where one failed without a SMART warning (its SMART data became suddenly spattered with badness) and another reached the "Failure Imminent" status and was promptly retired. They were both in the same environment as two other 2TB USB disks which have been in use for slightly longer without any issues - one is WD, the other is a Buffalo with a WD disk inside. I've replaced the two dead disks with older 1TB USB drives. They're all used for backup of our two home servers, and are cycled between working in a well-ventilated 20C low humidity environment and storage in a dry 0C to 20C enclosure in another building.

      --
      Those who can make you believe absurdities can make you commit atrocities. - Voltaire
    3. Re:ZFS by KendyForTheState · · Score: 1

      I had a server that had been running for 8 years on the same mirrored desktop-quality 40GB drives, 24/7, without a hiccup. I replaced the power supply and fans more than once during that period, but the drives never failed. Finally replaced the server with new hardware and virtualized the old server on it. The point is that you never can tell with hard drives. I've had them fail soon after installation, and I've had them last for years. I do tend to always use enterprise-rated drives now in servers, but it's no guarantee they'll last longer.

      --
      ...I just came for the free beer.
    4. Re:ZFS by c6gunner · · Score: 1

      I do have a ZFS setup of currently 6 disks and I really recommend buying server-grade HDDs, unless you have set up a monitoring system that tells you whenever a HDD is failing so you can buy a new one.

      Sounds like you're just unlucky, or USB disks really suck.

      I first built my Solaris/ZFS server in 2007, using 5x500gb WD disks. I added 1tb disks in early 2009. I finally swapped out the original 500 gb disks in late 2010, replacing them with 2tb drives. So the original 500 gig drives ran for 3+ years without an issue, and the 1tb drives have been going almost 3 years now as well. This is in a 24/7 system, of course.

      BTW, the actual system drive is an old Maxtor 120 gig IDE that was manufactured in 2003. It's now been running for almost 5 years 24/7, plus another 4 years of on-and-off use prior to that. It's still humming along just fine, but I recently used one of those 500 gig drives to mirror it, just in case.

      So I guess YMMV. I've only seen 3 consumer drives fail in my entire life - only one of which was mine - and I've never lost an array to a disk failure. As for monitoring, you can just set up a shell script that runs every half hour and sends you an e-mail if it finds any corruption. It's not hard to do.

      I also looked at a USB solution like the one you're talking about, but decided to just buy a big case instead. Each of those docks costs $50-$60 - assuming I use 4 drives internally, I'd still need to spend $200+ for docks. Instead, I picked up a Cooler-Master HAF for $140, and a couple drive-bay conversion brackets for $20 each. For the same price as the USB docks, I get a kick-ass case that can house 11 drives internally. Of course, I then had to blow another $180 to get a decent controller card for them, but the speed difference made it well worth the money.
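
      For the monitoring mentioned above, here is a minimal sketch in Python rather than shell. It relies on `zpool status -x`, which prints "all pools are healthy" when nothing is wrong; the mail addresses, the local SMTP relay on localhost, and the function names are all assumptions for illustration.

```python
import smtplib
import subprocess
from email.message import EmailMessage

HEALTHY_MARKER = "all pools are healthy"


def pools_need_attention(status_output: str) -> bool:
    """True unless `zpool status -x` reported that every pool is healthy."""
    return HEALTHY_MARKER not in status_output


def send_alert(body: str, sender: str, recipient: str, host: str = "localhost") -> None:
    """Mail the raw zpool report through an SMTP relay (assumed to be local)."""
    msg = EmailMessage()
    msg["Subject"] = "ZFS pool needs attention"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)


def check_and_alert(sender: str = "zfs@example.org", recipient: str = "me@example.org") -> None:
    """Run the health check once and send mail only if something looks wrong."""
    result = subprocess.run(["zpool", "status", "-x"], capture_output=True, text=True)
    report = result.stdout + result.stderr
    if pools_need_attention(report):
        send_alert(report, sender, recipient)
```

      Calling `check_and_alert()` from a cron entry such as `*/30 * * * *` would give the half-hourly check. Note that this sketch also sends mail when no pools exist at all, since that output lacks the healthy marker.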

  8. Rsync + VPN by GeneralTurgidson · · Score: 1

    Set up a mirrored server at a parent's/relative's house that's preseeded and run rsync jobs to it. Add more storage and you can do generations too.
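
    One way the "generations" idea could look, as a minimal sketch: each run writes into a dated directory and uses rsync's `--link-dest` so unchanged files are hard-linked against the previous generation instead of being stored again. The host name, paths, and function name are hypothetical, and this only builds the command line; running it is left to `subprocess.run` or a cron job.

```python
from typing import List, Optional


def rsync_generation_cmd(source: str, dest_root: str, today: str,
                         previous: Optional[str] = None) -> List[str]:
    """Build an rsync command that writes today's generation under dest_root.

    When a previous generation is named, unchanged files are hard-linked
    against it, so each generation only costs the space of what changed.
    """
    cmd = ["rsync", "-a", "--delete"]
    if previous is not None:
        # A relative --link-dest path is resolved against the destination dir.
        cmd.append(f"--link-dest=../{previous}")
    cmd.append(source.rstrip("/") + "/")
    cmd.append(f"{dest_root.rstrip('/')}/{today}/")
    return cmd


if __name__ == "__main__":
    print(" ".join(rsync_generation_cmd(
        "/data", "mirror.example.org:/srv/backup", "2012-01-02", "2012-01-01")))
```

    The first run has no previous generation and copies everything; later runs pass the name of the most recent dated directory.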

    1. Re:Rsync + VPN by imemyself · · Score: 1

      Yep...I do this with Unison so writes on both sides can be replicated. Granted - I'm not replicating significant amounts of data, I've heard Unison may have problems with large volumes of data. But I think the Internet connection would be more of an issue than that.

      --
      Every time you post an article on Slashdot, I kill a server. Think of the servers!
    2. Re:Rsync + VPN by hawkinspeter · · Score: 1

      I used to use Unison for that, but it's a bit sensitive to having the versions the same. I switched back to rsync as that allows me to upgrade one side and still be able to replicate.

      --
      You're a temporary arrangement of matter sliding towards oblivion in a cold, uncaring universe
    3. Re:Rsync + VPN by GameboyRMH · · Score: 1

      You could encrypt the storage on the AWS server. Yeah encryption without physical security always means they can rip the key from the RAM, but it's an acceptable level of security for many uses.

      --
      "When information is power, privacy is freedom" - Jah-Wren Ryel
  9. Re:ReiserFS by KendyForTheState · · Score: 3, Insightful

    Uh... he DID confess to the crime AND lead the cops to his wife's body. I know...sarcasm, right?

    --
    ...I just came for the free beer.
  10. You still need to make a decision by 93+Escort+Wagon · · Score: 4, Insightful

    You ask about the technical specifications; but, when commenting regarding the three likely candidates you found, you've put philosophical objections first and foremost. I think you first need to figure out which factor is more important to you - specs, or philosophy. Otherwise you're probably going to waste a lot of time arguing in circles.

    --
    #DeleteChrome
    1. Re:You still need to make a decision by SoupIsGood+Food · · Score: 2

      Philosophical objections are valid. It's why people decided to go with Open Source solutions in the first place... choose the right philosophy, and you're buying into a system that will have developer and user support for a long time, and pay off in more features implemented in satisfactory ways.

      If a tech has the right vision, it will go a long, long way, where pure technical excellence on its own is no guarantee the tech will grow with the user.

  11. Production ready... by Anonymous Coward · · Score: 1

    We've had a few problems with Gluster (nodes getting out of sync and corrupting data - despite following the docs to the letter). Very nice in theory, and will be great if the stability gets a bit of work, but until then I'm hesitant to recommend it. We've also found the performance a bit lacking.

    1. Re:Production ready... by Marillion · · Score: 1

      I'm working on a multi-institution team doing biomedical research and one of the team members is using Gluster. It's 200TB of high resolution microscopy spread across six bricks (aka nodes). I don't know if the vendor misconfigured it, but it is a complete pig of a system. It's slow. Painfully slow. We ended up copying active data to a small 12TB consumer NAS for analysis and leave the Gluster as the permanent archive.

      --
      This is a boring sig
  12. No ZFS? by theskipper · · Score: 4, Interesting

    How about ZFS with your RAID controllers in single drive mode (or worst case JBOD)? Let ZFS handle the vdevs as mirrors or raidz1/2 as you wish. ZFSforLinux is rapidly maturing and definitely stable enough for a home nas. Or go the OpenIndiana route if that's what you're comfortable with.

    My 4TB setup has actually been a joy to maintain since committing to ZFS, with BTRFS waiting in the wings. The only downside is biting the bullet and using modern CPUs and 4-8GB memory. Recommissioning old hardware isn't the ideal way to go, ymmv.

    Just a thought.

    1. Re:No ZFS? by JoeMerchant · · Score: 1

      ZFSforLinux is rapidly maturing and definitely stable enough for a home nas.

      ZFS has been maturing rapidly for the last 6 years... Didn't it almost make its way into OS-X at one point? I'm not sure I'd put all of my eggs in that particular basket (or any single system, really).

      If it's backup you want, I'd look into a system that copies off of one type of file-system into another. Ever since my QNAP TS-109 took a dump, and with it my data because of their proprietary "Linux" partition formatting, I've stuck to nice simple low performance solutions like 2TB USB drives straight out of the box. They are readable on Win/Lin/OSX as well as being plug-and-play on anything that has called itself a personal computer in the last 10 years. If you need something esoteric like a single 20TB volume, then this isn't the way, but I think the wise course is to find a way to not need something esoteric.

    2. Re:No ZFS? by hjf · · Score: 3, Insightful

      ZFS isn't free anymore. It's all commercial and proprietary and no bugfixes or anything get released outside a big bad support contract with Oracle.

      If you want the free version you can still use v28 on FreeBSD and Solaris Express (no upgrades in over 1 year). Works great, the only thing you don't get is ZFS crypto (transparent encryption).

    3. Re:No ZFS? by bill_mcgonigle · · Score: 1

      If you want the free version you can still use v28 on FreeBSD and Solaris Express (no upgrades in over 1 year).

      or linux.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    4. Re:No ZFS? by Marsell · · Score: 1

      Odd. The company I work for uses ZFS on many thousands of disks, we don't pay Oracle a dime, and we shovel code back to illumos.

      Most of the top Solaris talent jumped the Oracle ship long ago. A lot of them are committing code to illumos as part of their jobs.

    5. Re:No ZFS? by Zemplar · · Score: 2

      Go read that "new" Oracle license and you'll realize Solaris isn't nearly as free as it once was.

      Too bad, Solaris was gaining more momentum while it was available for free for any purpose, not just "...only for the purpose of developing, testing, prototyping and demonstrating your applications, and not for any other purpose."

    6. Re:No ZFS? by GameboyRMH · · Score: 1

      I'm going to switch some of my backup drives to BTRFS real soon now. The filesystem itself is stable and from what I've read, if your filesystem has a problem that the current btrfsck tool can't fix, it's pretty much FUBAR'ed anyways. Waiting time's over.

      --
      "When information is power, privacy is freedom" - Jah-Wren Ryel
    7. Re:No ZFS? by GameboyRMH · · Score: 1

      For encryption it may be better to run your filesystem of choice on a drive encrypted with dm-crypt & LUKS. It's totally transparent and can work with any app or filesystem. If you use a filesystem's encryption features it could make recovery more difficult vs. a lower-level transparent encryption system; with dm-crypt & LUKS, recovery is no different than with an unencrypted drive.

      --
      "When information is power, privacy is freedom" - Jah-Wren Ryel
    8. Re:No ZFS? by c6gunner · · Score: 1

      But BTRFS doesn't support any kind of parity yet, right?

    9. Re:No ZFS? by GameboyRMH · · Score: 1
      --
      "When information is power, privacy is freedom" - Jah-Wren Ryel
    10. Re:No ZFS? by c6gunner · · Score: 1

      Oh, sweet! I think the last time I really looked into it, even that wasn't available.

      I'm a big fan of ZFS, but from what I've read on BTRFS it seems to have the potential to be far better as a home file-system. The ability to shrink volumes is a big one. Unfortunately, it's been stalled for a long time now, and even when they finally finish development (2032?) I'd be hesitant to adopt it for at least a year or two, until I'm sure all the kinks have been worked out.

    11. Re:No ZFS? by Doc+Hopper · · Score: 1

      Most of the top Solaris talent jumped the Oracle ship long ago.

      I beg to differ. I work at Oracle, and there's plenty of amazing ZFS & Solaris development talent everywhere you look. And you can scarcely throw a rock around here without thunking an Open Source or Free Software enthusiast in the head. Including yours truly.

    12. Re:No ZFS? by hjf · · Score: 1

      I think ZFS crypto can reclaim unused blocks. If you use dynamically sized zvols and crypto inside them, when the encrypted FS grows, so does the zvol. But when you delete stuff, the zvol remains the same size. If you use zfs-crypto, as you delete stuff from the zvol (i.e.: enable compression and write zeros) the zvol should "deflate". Again, I THINK that's the case. I didn't have a chance to try it since I moved away from S11X a while ago.

  13. Thoughts on OCFS by trawg · · Score: 3, Interesting

    We have been using OCFS (Oracle Cluster File System) for some time in production between a few different servers.

    Now, I am not a sysadmin so can't comment on that aspect. I'm like a product manager type, so I only really see two sides of it: 1) when it is working normally and everything is fine 2) when it stops working and everything is broken.

    Overall from my perspective, I would rate it as "satisfactory". The "working normally" aspect is most of the time; everything is relatively seamless - we add new content to our servers using a variety of techniques (HTTP uploads, FTP uploads, etc) and they are all magically distributed to the nodes.

    Unfortunately we have had several problems where something happens to the node and it seems to lose contact with the filesystem or something. At that point the node pretty much becomes worthless and needs to be rebooted, which seems to fix the problem (there might be other less drastic measures but this seems to be all we have at the moment).

    So far this has JUST been not annoying enough for us to look at alternatives. Downtime hasn't been too bad overall; now we know what to look for we have alarming and stuff set up so we can catch failures a little bit sooner before things spiral out of control.

    I have very briefly looked at the alternatives listed in the OP and look forward to reading what other readers' experiences are like with them.

    1. Re:Thoughts on OCFS by afabbro · · Score: 3, Insightful

      Thank you, this is one of the few valid answers to my primary question which is of actual experience with clustered file systems. I don't think most of the responders got the clue that I'm looking for a solution that will hopefully scale over a decade's worth of time.

      There is a question of missing clues, but I don't think it's in the responders. You either asked your question poorly or you don't understand your problem. Your question centers on being "paranoid about data loss" and yet you're discussing technologies designed to manage concurrent access to a filesystem. Do you put in gigabit ethernet when you want faster USB performance?

      I'll likely be upgrading to a Super Micro 2U Twin with QDR Infiniband

      Give me a break...

      --
      Advice: on VPS providers
    2. Re:Thoughts on OCFS by Anonymous Coward · · Score: 1

      I must first off get this out of the way: There is no alternative to actual backups. If you don't think you can scale your backups to match your storage growth, then you'll have to deal with that deficiency, and the potential for data loss that it provides. Snapshots help, but don't go the whole distance. Be aware that most of the time, snapshots are not whole copies of the volume they've snapshotted, merely a diff to the live version - one must copy the snapshot through some other method before a full copy exists (although it is conceivable that the snapshot mechanism will handle this automatically). Ultimately, though, backups to some kind of storage outside of this lovely system are necessary to cover the possibility of configuration corruption, or something. It might be more likely than you think.

      Now, to my experience with clustered file systems. IMHO, don't bother. It's quite unlikely that you'll need such a thing, especially in a home setting. I well understand the usefulness of shared storage for HA VMs and the like, but mostly, you'd be better served by ensuring that VMs running on different hypervisors with access to the same shared storage don't try to access the same filesystem at the same time. Think of it as an active-passive HA setup, rather than active-active. You shouldn't need several hosts accessing the same thing for load reasons, unless you're doing some pretty *interesting* stuff. Clustered file systems sound like a great idea, but they involve lots of time and effort to set up correctly, and lots of learning to get to know where the failures can strike and how to rectify them. Some are better than others, of course. The only thing that they provide (that other solutions don't) is concurrent access from several different hosts.

      As for maintaining redundancy at the file system layer? Again, it sounds great, but I honestly can't think of a time when you wouldn't be better served by block-layer redundancy.

      Finally, I would recommend avoiding choosing any kind of proprietary system for storing data in - you'll be at the mercy of the vendor forever, and if you're not a large corporation, they will likely not care about any problems you have. In fact, avoid anything that you'd be able to describe as a black box.

      There is, however, something that doesn't seem to have been addressed: clustered file systems, while not terribly *useful* in this setting, perhaps, are nonetheless a worthy challenge for a hobbyist to set up. Using things at home means you don't have to base your decision on business reasons, and can just do things that are awesome/fun/challenging. It's certainly something I'd do for kicks.

      If this is the case, I can only recommend that you do a decent amount of testing of any alternative that seems to suit your needs. And remember that if you plan on sticking with it for a decade, you'll be stuck maintaining it for a decade. This might sound obvious, but it's worth noting that there are often choices you make at file system creation time that can't be changed later - if you're going to migrate data to a new instance of the file system, with updated settings, then the whole investment was moot.

      I'm certainly interested to hear what others suggest :)

    3. Re:Thoughts on OCFS by Macka · · Score: 2

      How is this an answer to your question? You identified 3 cluster filesystem types that protect against hardware loss by distributing the data over a cluster of systems - but OCFS2 isn't like that. It's a filesystem that's designed to provide concurrent shared access to a filesystem by a cluster of servers, which in combination with a HA framework can provide a platform that applications can use to protect against node failure, not disk failure. With OCFS2 you still have to make the storage highly available with a RAID solution plus manage concurrent connectivity via a SAN, iSCSI, etc. So unfortunately this is not an answer to your question at all. The filesystem types you've identified would do what you want, but they're also expensive for a home solution because you have to throw more computers at the problem to increase redundancy and performance.

      Have you considered using software RAID (mdadm) on Linux instead of a hardware RAID controller? It has a useful feature that allows you to grow existing raid volumes by adding more disks. Maybe combine that with a small UPS to allow your system to shut down gracefully in the event of a power failure. Alternatively, if you want to stick with a hardware solution have you taken a look at Drobo? I have no personal experience with Drobo, but from what I've read their proprietary RAID solution allows you to grow your array by just popping new disks in or increase capacity by replacing existing disks with larger ones on the fly. They have a couple of different models that can scale to 16TB. Best of luck with your search.

    4. Re:Thoughts on OCFS by Dishwasha · · Score: 1

      Since you're such a low UID I'll bother answering your question.

      Thank you, this is one of the few valid answers to my primary question which is of actual experience with clustered file systems.

      I had already thrown out OCFS2 and GFS2 as possible candidates, but that was irrelevant to my reply. Also currently I am unaware of any non-proprietary hardware or software RAID (mdadm in particular) that supports active/active or active/passive on a shared backplane at any RAID level other than 1 or 0 (i.e. DRBD) and rather expensive and not yet released Areca external RAID controllers. Also I'm looking for whitebox OSS solutions.

    5. Re:Thoughts on OCFS by drsmithy · · Score: 1

      I had already thrown out OCFS2 and GFS2 as possible candidates, but that was irrelevant to my reply. Also currently I am unaware of any non-proprietary hardware or software RAID (mdadm in particular) that supports active/active or active/passive on a shared backplane at any RAID level other than 1 or 0 (i.e. DRBD) and rather expensive and not yet released Areca external RAID controllers [areca.com.tw]. Also I'm looking for whitebox OSS solutions.

      This helps to explain more about what you want to do, but doesn't really help to explain why you want to do it (ie: is it the right solution).

      I can tell you about a solution we used to use at a previous job for bulk, cheap storage. This scaled from ~12T at the start of its life through to ~192T at the end, and was built using Linux cLVM and Promise vTrak disk shelves. It briefly also featured GFS2.

      The primary objective was lots of cheap, redundant space. Availability, and especially performance, were not high priorities. With that said, it was a (mostly) highly-available solution.

      At the base storage layer were FC-attached Promise vTrak disk shelves. These were attached via two FC fabrics to two "controller nodes". The disks in the shelves were configured in single RAID6s of 15 drives + 1HS and presented to the two controller nodes as a single (multipathed) LUN.

      These LUNs were aggregated into a single logical storage pool using Clustered LVM, then redivided into LVs and controlled by the automounter.

      This system was used as an archival storage target for image files. It originally ran some proprietary software which would "receive" collections of image files, then derive some metadata about that collection and store them together. At the end of each day a consolidation process would create an appropriately-sized LV and move the last 24 hours' worth of images onto it. Initially, since the "receiving" process was done on the controller nodes, GFS2 was used so the load could be balanced. As the load grew, however, we decided to split off the receiving function onto dedicated machines, and with that the need for load-balancing and a clustered FS disappeared. So, we switched to ext3, reconfigured the controller nodes in an active/passive failover using heartbeat, and shared the storage to the new receiving nodes via NFS (with some SSH scripts to handle the daily LV creation on the controller nodes). Whenever a disk shelf would fill up, we'd simply buy another one, stuff it full of COTS drives, configure it up as a big RAID6, present the LUN to the controller units and add it to the VG. With that system we went from a single shelf and 12T all the way up to eight shelves and ~192T, with something like 800 LVs averaging around 200G each.

      The weakness in this design is, obviously, the single set of disks on the back end that represent a SPOF. However, since the images were regularly rolled off to tape, and both the RTO and RPO were relatively high, it was considered a reasonable tradeoff.

      When originally conceived, Linux RAID in the CentOS of the day (4.something) was relatively immature, and in particular it lacked the bitmap capability present now. As such, the decision was made to go with the hardware RAID in the vTraks. Were I to do it again, I would use Linux software RAID, though this would necessitate some slightly more involved failover scripting to re-assemble the RAID arrays on the second host as part of a failover event. Obviously the array couldn't be active on both hosts simultaneously (though there was no real need for that to be possible anyway).

      Additionally, the storage bus in our case was FC, which makes sharing physical devices to multiple servers trivial. However, anything capable of having multiple hosts (eg: SAS) would work just as well.

      A second, similar, project was to have a more highly-available image store, with no SPOFs. For that we used pairs of servers, Linux software RAID and LVM on SAS shelves (for sufficient spindles), each shelf dedi

    6. Re:Thoughts on OCFS by dbIII · · Score: 1

      I'll likely be upgrading to a Super Micro 2U Twin with QDR Infiniband which none of the mentioned solutions have support for

      I think you are wrong, just about anything will run on those things, and although I don't have Infiniband I've noticed there are drivers for a lot of platforms.

    7. Re:Thoughts on OCFS by Kz · · Score: 1

      First, you're mixing cluster file systems (like GFS, OCFS, CXFS) and distributed file systems (Lustre, GlusterFS, Ceph). Second, without backup hardware, you don't have backup; you'll lose data. At least use a second offline or near-line array to copy to. Third: "snapshots are effective backups". Wrong. Fourth: "none of the mentioned solutions have support for (infiniband)". Wrong.

      --
      -Kz-
    8. Re:Thoughts on OCFS by afabbro · · Score: 1

      First, you're mixing cluster file systems (like GFS, OCFS, CXFS) and distributed file systems (Lustre, GlusterFS, Ceph). Second, without backup hardware, you don't have backup; you'll lose data. At least use a second offline or near-line array to copy to. Third: "snapshots are effective backups". Wrong. Fourth: "none of the mentioned solutions have support for (infiniband)". Wrong.

      Give him a break - he's a manager.

      --
      Advice: on VPS providers
    9. Re:Thoughts on OCFS by Macka · · Score: 1

      What you actually said was:

      Has anybody here on Slashdot had any experience with one or more of these clustered file systems?

      .. and OCFS2 was not on that list. From the rest of your reply it seemed to me that you were confused about the capabilities of OCFS2. My apologies.

      The rest of my comments, WRT mdadm etc, were not related to clustering, or sharing direct attached RAID devices between systems - because as others have also said, I don't think that a cluster best suits your requirements. Unless of course you just want to do it for the fun of it, in which case go for it.

    10. Re:Thoughts on OCFS by Doc+Hopper · · Score: 1

      That's the exact same approach we (those in my storage department) follow in an awful lot of development, test, & staging environments: snapshots for primary backup, and physical backups only upon specific request.

      The strategy works, as long as you are fully aware of the window of loss you're looking at. My home backup strategy has me off-site important documents to a lockbox at a friend's house once every six months. Other than that it's just snapshots. I could tolerate losing six months of data, although it would be far from ideal.

    11. Re:Thoughts on OCFS by Doc+Hopper · · Score: 1

      Be aware that most of the time, snapshots are not whole copies of the volume they've snapshotted, merely a diff to the live version - one must copy the snapshot through some other method before a full copy exists (although it is conceivable that the snapshot mechanism will handle this automatically).

      Not quite. Imagine a snapshot this way. You have a file "A". It takes up blocks 1, 2, and 3 on your filesystem. You then snapshot the filesystem containing "A" with the name "first". You lose about 32kbytes (or so, depends on your structure) of data space to store the block layout.

      Then you lengthen file "A" into revision "B". "B" is twice as big, but the first half of the file is unchanged. It now occupies blocks 1, 2, 3, 4, 5, and 6. You snapshot again with the name "second".

      Then you alter file "B" into revision "C". This version also takes six blocks, but the changes are in blocks 2 and 3. A snapshotting filesystem cannot change those existing blocks! It has to allocate new blocks because the old blocks are marked as parts of snapshot "first" and "second". So your file now occupies blocks 1, 7, 8, 4, 5, and 6. You snapshot again with the name "third".

      Then you don't need the file anymore and delete it. Then you take a snapshot of the filesystem called "fourth".

      The full copy of the file exists. If you delete snapshot "first", blocks 1, 2, and 3 are still not available for overwrite because snapshot "second" also owns them.

      So basically, you got snapshots backward :) It's a block-level operation that flags existing blocks as always and forever -- until the snap is deleted -- frozen in time until they are freed by the snapshot being deleted. Subsequent revisions to a file & snapshots of the filesystem can re-use the data blocks, but only if an individual block is unchanged. It is in no way a "diff" from the live filesystem!
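      The A/B/C walk-through above can be replayed with a toy block map. This is only a sketch of the idea, not how any particular filesystem implements it: the `SnapshotFS` class and all its method names are made up for illustration, and the per-snapshot metadata cost is ignored.

```python
class SnapshotFS:
    """Toy copy-on-write block map (hypothetical, for illustration only)."""

    def __init__(self):
        self.live = {}        # file name -> list of block numbers
        self.snapshots = {}   # snapshot name -> frozen copy of the block map
        self.next_block = 1

    def _alloc(self, count):
        blocks = list(range(self.next_block, self.next_block + count))
        self.next_block += count
        return blocks

    def create(self, name, nblocks):
        self.live[name] = self._alloc(nblocks)

    def append(self, name, nblocks):
        self.live[name] = self.live[name] + self._alloc(nblocks)

    def pinned(self):
        """Blocks referenced by at least one snapshot."""
        return {b for snap in self.snapshots.values()
                for blocks in snap.values() for b in blocks}

    def overwrite(self, name, positions):
        """Rewrite blocks at 0-based positions; frozen blocks get new homes."""
        blocks = list(self.live[name])
        frozen = self.pinned()
        for pos in positions:
            if blocks[pos] in frozen:
                blocks[pos] = self._alloc(1)[0]
        self.live[name] = blocks

    def delete(self, name):
        del self.live[name]

    def snapshot(self, snap_name):
        self.snapshots[snap_name] = {f: list(b) for f, b in self.live.items()}

    def delete_snapshot(self, snap_name):
        del self.snapshots[snap_name]

    def in_use(self):
        """Blocks that cannot be reused: live data plus anything pinned."""
        live = {b for blocks in self.live.values() for b in blocks}
        return live | self.pinned()


fs = SnapshotFS()
fs.create("A", 3)            # blocks 1, 2, 3
fs.snapshot("first")
fs.append("A", 3)            # revision "B": blocks 1..6
fs.snapshot("second")
fs.overwrite("A", [1, 2])    # revision "C": old blocks 2 and 3 are frozen
fs.snapshot("third")
third_layout = fs.snapshots["third"]["A"]
fs.delete("A")
fs.snapshot("fourth")
fs.delete_snapshot("first")
still_in_use = fs.in_use()   # "second" still pins blocks 1, 2 and 3
fs.delete_snapshot("second")
after_second = fs.in_use()   # only now are old blocks 2 and 3 released
```

      In this model a block only becomes reusable once the live filesystem and every snapshot have let go of it, which is the point being made above.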

      If this explanation is confusing, feel free to email me directly: matthew@barnson.org. I'm glad to clear up misconceptions about what a snapshot is or is not, because it's a far more robust and reliable system than you suggested!

  14. AWS EBS by curmudgeon99 · · Score: 1

    Why are you spending your money like that? Sneaker-net your drives to AWS EBS. It's a no-brainer.

    1. Re:AWS EBS by Enfixed · · Score: 2

      Why are you spending your money like that? Sneaker-net your drives to AWS EBS. It's a no-brainer.

      AWS EBS = $0.10 per allocated GB per month or $102.40 per TB..... I doubt power and hardware is costing him > $819.20 a month.
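      The arithmetic behind those figures, as a quick sketch. The price is the one quoted in this comment (not a current AWS price), and 1 TB is counted as 1024 GB because that is what the $102.40 figure implies.

```python
PRICE_PER_GB_MONTH = 0.10   # USD, as quoted in the comment above
GB_PER_TB = 1024            # binary terabytes, implied by $102.40 per TB


def monthly_cost(terabytes, price_per_gb=PRICE_PER_GB_MONTH):
    """Monthly storage cost in USD for the given number of terabytes."""
    return terabytes * GB_PER_TB * price_per_gb


print(f"1 TB: ${monthly_cost(1):.2f}/month")
print(f"8 TB: ${monthly_cost(8):.2f}/month")
```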

      --
      Sigs are bad for you...
    2. Re:AWS EBS by afabbro · · Score: 1

      CrashPlan or a similar online backup service is what the questioner needs. But it sounds so much cooler to be discussing clustered filesystems.

      --
      Advice: on VPS providers
  15. I was going to say Lustre, but... by Anonymous Coward · · Score: 3, Insightful

    I was going to say Lustre, but then I saw that you only have 16TB. 15 years ago that would have been impressive, but these days, those supercomputers you mention probably have that much in DRAM, and their file storage is in the multi-petabyte range. Lustre is optimized for large scale clusters, in which you have entire nodes (a node is a computer, here) dedicated to I/O - bringing external data into the in-cluster network fabric, while other nodes are compute nodes - they don't talk to the outside world, except by getting data via the I/O nodes.

    That's why you'll see all this talk of OSSs and OSTs, as though they'd be distinct systems - on a large scale cluster they are.

    For only 16TB, what you want is a SAN, or maybe even a NAS.

    If you want open source, then go with openfiler. It supports pretty much everything. I haven't stress tested it, but it seems to work well for that order of magnitude of data.

  16. Tahoe-LAFS by the_brobdingnagian · · Score: 1

    Try Tahoe-LAFS.

    1. Re:Tahoe-LAFS by Dishwasha · · Score: 1

      Not a bad suggestion and more helpful than most. Thanks for the input!

  17. rm by kbrint · · Score: 1

    /bin/rm

  18. huh by madcat2c · · Score: 1

    I wonder how long it would take to back up 8TB to carbonite dot com?

  19. LTO4 by hawguy · · Score: 1

    I think the best disk-hardware agnostic solution for preventing filesystem data loss is an LTO-4 autoloader and regular tape backups (hopefully taken off site regularly). They are pretty cheap; a SuperLoader 3 with an 8-tape (6TB/12TB) capacity is less than $3000. Or buy a refurb LTO3 autoloader for a third the price and half the capacity.

  20. Bad Dog. Wrong Tree! by SmurfButcher+Bob · · Score: 3, Insightful

    You will spend all this effort to build this solution... and then your house will catch fire.

    On the good side, the fire department WILL manage to save the basement by filling it with 80,000 gallons of water at 2,000GPM per fire engine.

    Or, you'll be wiped out by a flood. Or a drunk will drive through the side of your house. Or you'll have a gas leak and the house will detonate. Or carpenter ants will eat away the floor joists.

    RAID is not a backup solution. Neither is replication... if you whack the data, it'll likely be replicated. If you get a compromised machine somewhere, files they touch will likely be replicated. The only thing you're creating is an overly complex hardware mitigation. If THAT is how you define "data preservation"... you're doing it wrong.

    Look more for a solution to move stuff offsite - a cheap pair of N routers running Tomato or OpenWRT, to a neighbor's house, and you reciprocate with each other. Bonus points if you use versions, transaction logs, journals, etc.

    --

    help me i've cloned myself and can't remember which one I am

  21. Re:You Should... by slaker · · Score: 2

    As someone with considerably more than 8TB of porn (and a similarly vast quantity of non-porn content, handily digitized and indexed), until recently I used paired servers each holding 12TB of drives in RAID6 with 2 drives as hot spares (64 physical drives on four machines). I used rsync to maintain a second copy of all my data. I've decided that's insane, and I've moved to using a single 36TB FreeBSD server (running zfs for my storage pools) that has enough internal expansion to accommodate another 36TB without getting into external expanders. I've paired that with an LTO4 changer that I bought off Craigslist for around $1900. At the moment I have just enough tapes to have two complete copies of my data. I'd like to get another hundred tapes so I can comfortably manage grandfather-father-son backups and have some spares in reserve.

    I really don't have any confidence in common RAID with large arrays of large drives, since the possibility of a hard error during a rebuild or resync is too high for comfort. Large data sets really need to be mirrored and if at all possible stored in some offline fashion. That's really the only path to reliable storage.
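    A back-of-the-envelope sketch of that rebuild risk. The error rate used here (one unrecoverable read error per 10^14 bits) is an assumption borrowed from typical consumer drive datasheets, not a figure from this thread, and the model treats errors as independent, which real drives do not strictly obey.

```python
import math


def p_read_error(terabytes_read, ure_per_bit=1e-14):
    """Chance of at least one unrecoverable read error while reading
    `terabytes_read` decimal terabytes, assuming independent bit errors."""
    bits = terabytes_read * 1e12 * 8
    # 1 - (1 - p)^bits, computed in a numerically stable way
    return -math.expm1(bits * math.log1p(-ure_per_bit))


for tb in (2, 6, 12):
    print(f"reading {tb:>2} TB: {p_read_error(tb):.0%} chance of a read error")
```

    Under those assumptions, a rebuild that has to read every surviving drive in a large array is more likely than not to hit at least one bad sector, which is the argument for a second parity drive or a full mirror.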

    --
    -- I wanna decide who lives and who dies - Crow T. Robot, MST3K
  22. Obilg. by jampola · · Score: 1

    "For over a decade I have had arrays of 10-20 disks providing larger than normal storage at home"

    At home?? I've met some people pretty fanatical about their porn collections but this hits some new highs! Kudos to you, Sir!

  23. Paranoid about your data? Do off site backups! by Anonymous Coward · · Score: 1

    Take it from someone who has been there: there is no better resource than IRON MOUNTAIN to store a backup copy of your data.

    Send it offsite every day or once a week, depending on how important your data is. Do fulls and deltas.

  24. Hadoop HDFS by mrcheesyfart · · Score: 1

    You can use Apache Hadoop's HDFS. http://hadoop.apache.org/hdfs/ It is fairly simple to set up, very scalable, and it is very easy to set a replication factor so that all your data is replicated 2, 3 or even more times across your cluster. It is used at many places for distributed computing, but I see no reason that it couldn't serve you well as a large personal file service.

    1. Re:Hadoop HDFS by diekhans · · Score: 1

      it's not a POSIX file system

    2. Re:Hadoop HDFS by allenw · · Score: 1

      but I see no reason that it couldn't serve you well as a large personal file service.

      HDFS is not POSIX or mountable. So actually using the data from something that is expecting POSIX is going to be painful. "But there is a FUSE plug-in!" Yes, there is, but you'll take a 60% perf hit using it, assuming that it still works in newer versions of Hadoop. See, none of the hardcore devs actually use it, so there is a very good chance it is completely busted.

      In any case, there are still problems around losing the fsimage and having no real HA for the NN, and needing quite a bit of RAM for any significant number of files. And don't forget that 8TB now turns into at least 24TB counting the 3x replication factor, etc, etc, etc.

      So no, really this isn't a solution for this particular problem.
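      The 8TB-to-24TB point is plain multiplication, sketched here so other replication factors can be tried. The function name is made up for illustration; the factor of 3 is the one mentioned above.

```python
def raw_tb_needed(logical_tb, replication_factor):
    """Raw disk needed when every block is stored `replication_factor` times.
    Ignores filesystem overhead and free-space headroom."""
    return logical_tb * replication_factor


print(raw_tb_needed(8, 3))  # the 3x replication case discussed above
print(raw_tb_needed(8, 2))  # a plain two-way mirror, like the asker's RAID-10
```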

  25. unRAID by aarongadberry · · Score: 1

    Unraid works well for a home solution. I had 2x2 TB drives fail within a one month period of time and lost no data.

    1. Re:unRAID by Jumperalex · · Score: 1

      Ditto for unRAID. Simplest description, JBOD with a parity disc. It is "more" than that but not much. Then again, that makes it:

      -simple
      -hardware agnostic
      -cheap
      -minimum hardware requirements
      -low power
      -easily expandable (think heterogeneous drives that can also be swapped with a larger drive when needed)
      -excellent community support
      -files on drives are independently readable even if the array is broken
      -non-proprietary ReiserFS, readable even on a windows machine with free driver
      -expandable to 20 data drive array (read 40 TB using 2tb drives)

      Cons:
      - not amazingly fast but WAY faster than a DROBO (saturate a 1gbit line) ... fast enough to serve multiple HD video streams
      - only beta supports 3TB drives at the moment

      --
      If you can't be good, be good at it!
  26. Drobo? by varmittang · · Score: 1, Informative

    Drobo Pro with 3 TB drives set up with dual redundancy will get you 18 TB of drive space. In the future, just swap out drives as drive sizes get larger and you can continue to expand. www.drobo.com

    --
    -----BEGIN PGP SIGNATURE-----
    12345
    -----END PGP SIGNATURE-----
    1. Re:Drobo? by speedingant · · Score: 2

      Slow as molasses though. Way slower than any other solution out there..

    2. Re:Drobo? by Ralphus+Maximus · · Score: 2

      Stay away from Drobo. I just bought a new unit, and the only way to see how much "real" free space you have is to use their Windows-based dashboard program. I have five 1TB drives in mine, and the dashboard reports 3.5TB, while the filesystem mounted on both Windows and Linux reports I have 17TB of space on the drives. I contacted their support, and they say it's as designed.

      Cheers,
      RM

      --
      Nobody's as dumb, as I appear to be
    3. Re:Drobo? by LoRdTAW · · Score: 3, Interesting

      STAY AWAY FROM DROBO!

      I had a client ask me to set one up for them. You don't partition it like a standard RAID array, you format it to some predetermined size that may be larger than the physical disk space in the machine (through their drobo dashboard). If you have three 1 TB disks you will have around 2TB of actual storage but you can format it for 16TB under Win 7. This is achieved via their "beyond raid" technology which fools the OS into thinking there is more disk space than there actually is. This lets the user make one large volume now and then add disks in the future; even disks of different sizes can be mixed and matched. If you start to go beyond the physical capacity, the array degrades and goes offline until you add another disk and wait hours or days for the disks to reorganize. My client was consolidating her photography library to the drobo when it just crapped out. Turns out she ran over the physical limit.

      Then, if you're lucky, your computer, be it Apple or Windows, will take upward of 30 to 45 minutes to boot and shut down if the fucking thing is plugged in and powered on during either of those two procedures. Drobo recommends you move your data to another set of disks and re-format your drobo. As if people have a few spare TB of disk capacity just sitting around; that's the reason they bought your shit box to begin with, assholes. It's a known issue.

      I have personally used hardware RAID 5, software block level RAID 5 and ZFS. If you ask me, I'd rather have the file system do the RAID work at the file system level, not the block level where the file system is ignorant of what lies beneath. ZFS is the way to go until BTRFS is fully stable and feature competitive with ZFS. Then you do incremental backups offsite, either to a family member's or friend's house or to a commercial off site backup provider.

      And what does your 16 TB consist of? If it's movies and the like then don't bother spending money backing it up. If it's self-made video and other personal large files then that makes sense. I know of people who spent oodles of cash to back up silly crap like downloaded movies that can easily be replaced or rented from Netflix.

      With the way things are going in the storage world, SSDs will eclipse mechanical disks at the desktop level and mechanical disks will be relegated to backup duty where they far outstrip SSDs in capacity. It reminds me of when tape drives were the king of capacity; often tapes were several orders of magnitude larger than current hard disks and tapes were cheap. They were slow but my god did they have capacity. Now it looks like SSDs will assume the role of desktop storage and to some degree server storage while mechanical disks will be used for large backup systems and file servers. Mechanical hard drives of today will be tomorrow's tape drives and then obsolete when SSDs begin to overtake them in capacity. By then we might have something even higher in capacity like holographic or some other sci-fi sounding storage.

    4. Re:Drobo? by horza · · Score: 1

      That sounds about right. What exactly is the problem?

      Phillip.

    5. Re:Drobo? by Ralphus+Maximus · · Score: 1

      Mount the share, type df -h, get back from the OS: size 16T Used 2.0T Avail 14T Use% 13%. Start windows, load the dashboard program, click on free space, get Used 1.99T, Free 1.59T, Total 3.58T.

      See the problem?

      Hint: It lies to the OS on capacity and free space.

      Cheers,
      RM

      --
      Nobody's as dumb, as I appear to be
    6. Re:Drobo? by horza · · Score: 1

      No that seems correct to me. The dashboard is showing the real free space excluding redundancy, ie 3.58TB. This is because data is repeated across the drives to give redundancy, so you don't lose any data if any one drive fails. It's like a custom raid 5. However the free space is shown as 17TB as you have to format the drive to a certain size so this is the default, though you can probably change it. This way when you add another drive you don't have to reformat the whole RAID array. If you added another 1TB drive the dashboard space would change to eg 4.2TB, but the OS would still see 17TB. If you go over the 3.58TB then it will warn you that you are losing redundancy and need to add another drive.

      I know all this and don't even own a Drobo. I got it from the first online review I read. Did you read the manual?

      Phillip.

    7. Re:Drobo? by Ralphus+Maximus · · Score: 1

      A lie is still a lie. Once I reach 3.58 TB, I run out of space even though the NAS reports plenty. And where do I put a new drive? I already have all 5 slots filled. Even if I filled all 5 slots with 3TB drives, which is the largest it supports (for now), I still won't get the advertised 17TB. 3+3+3+3+3=15TB, minus 3 for the "raid 5", leaves 12TB. If I can't trust a NAS to do math correctly on the file system size, how can I trust it to keep my data safe?
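      The capacity sums above, as a small sketch. It uses the simple "drop the largest drive(s)" rule, which matches RAID 5 or RAID 6 for equal-size drives; whether Drobo's proprietary layout follows the same rule for mixed sizes is an assumption, and `usable_tb` is a made-up helper name.

```python
def usable_tb(drive_sizes_tb, redundancy=1):
    """Rough usable capacity: total minus the `redundancy` largest drives.
    Ignores filesystem and controller overhead."""
    if redundancy >= len(drive_sizes_tb):
        raise ValueError("need more drives than redundancy")
    sizes = sorted(drive_sizes_tb)
    return sum(sizes[:len(sizes) - redundancy])


REPORTED_TO_OS_TB = 16  # what df shows in this thread, regardless of drives

for drives in ([1] * 5, [3] * 5):
    real = usable_tb(drives)
    print(f"{len(drives)} x {drives[0]} TB: about {real} TB usable, "
          f"OS is told {REPORTED_TO_OS_TB} TB")
```

      For five 1TB drives this gives 4 TB before overhead, in the same ballpark as the 3.58 TB the dashboard reports, and nowhere near the size the OS is shown.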

      Every raid I admin, with the exception of the Drobo, has the ability to accurately report the volume size.

      As far as reformatting the raid when adding a drive, I can resize all day long with LVM and not lose data.

      And yes, I did read the manual. It mentions nothing about the volume size problem. The only thing it mentions is if you don't want to use the dashboard, map the drive in windows just like any other NAS.

      Cheers,
      RM

      --
      Nobody's as dumb, as I appear to be
  27. Stop with experimental shit by ArchieBunker · · Score: 1, Insightful

    Seriously stop with the experimental and filesystem projects still in beta. You need one that is matured and time tested. Do a bit of research. I don't even run RAID and have yet to permanently lose anything in probably 20 years.

    --
    Only the State obtains its revenue by coercion. - Murray Rothbard
    1. Re:Stop with experimental shit by QuantumRiff · · Score: 1

      People sure seem to think clustering is the key to everything..

      I'm with you, tried and tested.. I like when a client mentions how they have 2 standby database servers in remote locations, with almost live replication, so they can survive everything... I ask them what happens if someone types "drop table user; commit;" in oracle.. Sure enough, it replicates to the standbys, just like it's designed to... (they really get upset when I point that out too)

      Same thing with files.. I've seen way too often the mirrored site just replicate the deletes ;)
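      A toy illustration of the difference between replication and backup. Both classes are hypothetical stand-ins, not any real database or replication product: the mirror applies every operation to its replica, while the backup keeps point-in-time copies.

```python
import copy


class MirroredStore:
    """Primary plus a replica that receives every operation immediately."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def put(self, key, value):
        self.primary[key] = value
        self.replica[key] = value

    def delete(self, key):
        self.primary.pop(key, None)
        self.replica.pop(key, None)   # the mistake is replicated too


class VersionedBackup:
    """Keeps point-in-time copies, so older states stay retrievable."""

    def __init__(self):
        self.versions = []

    def take(self, store):
        self.versions.append(copy.deepcopy(store.primary))

    def restore(self, index):
        return copy.deepcopy(self.versions[index])


store = MirroredStore()
backup = VersionedBackup()
store.put("users", ["alice", "bob"])
backup.take(store)
store.delete("users")                    # the accidental "drop table"
lost_on_replica = "users" not in store.replica
recovered = backup.restore(0)["users"]   # only the backup still has it
```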

      --

      What are we going to do tonight Brain?
  28. Oracle Cluster File System or Global File System by xose · · Score: 1

    http://en.wikipedia.org/wiki/Global_File_System
    http://en.wikipedia.org/wiki/OCFS

  29. Lustre by JerkBoB · · Score: 3, Informative

    Lustre is pretty cool, but it's not magic pixie dust. It won't break the laws of physics and somehow make a single node faster than it would be as a NFS server. It's for situations when a single file server doesn't have the bandwidth to handle lots of simultaneous readers and writers. A "small" Lustre filesystem these days usually has 8-16 object storage servers serving mid-high tens of TB. The high end filesystems have literally hundreds of OSSes and multiple PB served. The largest I know of right now is the 5PB Spider filesystem at Oak Ridge National Labs.

    One nice thing about Lustre on the low end is that you can grow it... Start out small and add new OSSes and OSTs as you need them. This often makes sense in Life Sciences and digital animation scenarios where the initial fast storage needs are unknown or the initial budget is limited (but expected to grow). But if you're never planning to get beyond the capacity of a single node or two, Lustre is just going to be overhead. I don't know much about the other clustered filesystem options.

    --
    A host is a host from coast to coast...
    Unless it's down, or slow, or fails to POST!
    1. Re:Lustre by Dishwasha · · Score: 1

      Yeah, the complication is why I'm leaning more towards GlusterFS, yet so far Lustre is more proven. Unless I get some useful anecdotal experience here I'll probably model out all three solutions with VMs and do my own comparisons and performance analysis. Maybe I'll even post my experiences and results here afterwards.

  30. Fix the machines first... by k9mach3 · · Score: 2

    Lustre - no replication (it's on the roadmap for sometime in the next few years), and it relies on access to shared storage (read: FC/iSCSI disk array, and if that fails you lose your data).
    OCFS - no replication, designed for multiple servers accessing one array.
    Ceph - has replication, but still in active development, and somewhat complex. Good if you don't mind losing your data (it's in alpha... if it breaks, you get to keep both pieces...)
    GlusterFS - I have no experience with it, but it seems to be pretty stable at this point. And it has some degree of replication, which is a plus.

    If all you're going for is replicated storage across two systems I'd recommend just setting them up separately and rsync'ing from one to the other. Otherwise, one filesystem crash will take out all your data - parallel filesystems can buy you some reliability, but still can't be considered "backup" strategies. And you still need to pay attention to things like RAID (at least RAID6! RAID5 is likely to fall apart after one disk failure with >2 TB disks),

    1. Re:Fix the machines first... by Dishwasha · · Score: 1

      http://wiki.lustre.org/index.php/Lustre_2.0_Features lists filesystem replication as a benefit of Lustre 2.0 back in November 2009. I won't be running any RAID since my requirement isn't really to reduce number of disks used by relying on parity. One or more replication partners/mirrors will handle that function. Rsync won't work for the aforementioned clustered virtualization needs.

  31. Performance by speedingant · · Score: 3, Informative

    What kind of performance are you after? If you're not after anything over 40MB/S, I'd go for unRAID. I use this at home and it's brilliant. I've replaced many drives over the years, and I've had two hard drives fail with no massive consequences (data isn't striped). Plus, many many plugins are now available. SimpleFeatures (replacement gui), Plex Media Server, SQL, Email notifications with APCUPSD support etc etc.

  32. Two servers using ZFS by drsmithy · · Score: 1

    One as the primary, sharing space via NFS for your VMs and whatever else. Throw a couple of SSDs in there for caching.

    The second replicating from the first (via ZFS send/receive, or just simple rsync) with snapshotting for backups and regular syncs to some off-site data store for truly irreplaceable data.

    This is the setup I use at home, and it sits behind a 3-node VMware cluster, several desktop PCs (one of which boots from the main server over iSCSI), and a couple of media PCs.

    Other than that, your requirements seem a bit confused. "Cluster filesystem" looks to be a buzzword being thrown out there without any actual need for same. "the cluster filesystem should work well with hosting virtual machines in a high-available fashion thereby supporting guest migrations" is a non-sequitur as neither a cluster filesystem, nor high-availability are a necessity for "guest migration".

    What are your key requirements here ? Data reliability is a lot easier (=cheaper) to achieve than high availability, and it's a struggle to see how real high availability could be any sort of requirement in a home server scenario.

    1. Re:Two servers using ZFS by Dishwasha · · Score: 1

      Hmmm.....I'll have to look more deeply into ZFS as I keep hearing it thrown out there. I should have probably qualified my statement as "thereby supporting live guest migrations". The non-sequitur was basically a hint thrown in to suggest what I meant by high-availability for those less likely to catch or understand the subtle distinction of what high-availability typically means. Just like most other things in life, key requirements may be more basic than what I've described, but if the car salesman throws in the air freshener free with the car, I'll take it as long as it doesn't stink.

    2. Re:Two servers using ZFS by stewartjm · · Score: 1

      I'll second ZFS. Set up a 4, 6, or 10 disk raidz2(comparable to raid6) or a 5, 7, or 11 disk raidz3(3 "parity" drives). Raidz1(1 "parity" drive), and the similar raid5, are both too fragile for today's huge disks.

      ZFS does checksumming of all sectors it stores, and you can have it verify those checksums to test data integrity at any time by running a scrub, it's a good idea to run 1-4 scrubs a month.
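      The idea behind a scrub can be sketched in a few lines: keep a checksum next to every block at write time, then periodically re-read everything and compare. This is a simplified model, not ZFS's actual on-disk format; the `ChecksummedStore` class is hypothetical and SHA-256 is used here only as a convenient stand-in.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ChecksummedStore:
    """Toy block store that records a checksum with every block."""

    def __init__(self):
        self.blocks = {}   # block id -> (data, checksum recorded at write)

    def write(self, block_id, data: bytes):
        self.blocks[block_id] = (bytes(data), checksum(data))

    def rot(self, block_id, data: bytes):
        """Simulate silent corruption: data changes, stored checksum doesn't."""
        _, recorded = self.blocks[block_id]
        self.blocks[block_id] = (bytes(data), recorded)

    def scrub(self):
        """Re-read every block and return the ids whose checksum mismatches."""
        return sorted(block_id
                      for block_id, (data, recorded) in self.blocks.items()
                      if checksum(data) != recorded)


store = ChecksummedStore()
store.write(0, b"hello")
store.write(1, b"world")
clean_result = store.scrub()    # nothing wrong yet
store.rot(1, b"w0rld")          # one block silently goes bad
dirty_result = store.scrub()    # the scrub names the damaged block
```

      A real scrub goes one step further and repairs the bad block from parity or a mirror copy; the sketch only covers detection.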

      I currently run a FreeBSD 8.2, 10 disc raidz2 array in one box, which is backed up over rsync to a linux box onto standalone xfs and/or ext3 formatted disks. The filesystems are served to various windows/linux/etc boxes and VMs using Samba and NFS. After having zero problems with the zfs box over the past year, I'm planning to build a 2nd ZFS box (probably FreeBSD again, though I might try out a Solaris based distro first). This'll let me start using ZFS syncs, which will let me keep the full snapshotting history on both machines. If that works for a year or more, I'll retire and re-use the linux box components.

      If possible use server grade hardware to build said boxes: supermicro/tyan/etc. motherboards, hot swap bays (Norco cases are a good way to get a lot of bays cheap), etc. Or at the very least run an AMD AM3 system that supports ECC ram (that's what my Linux box currently is). And put the box(es) on a UPS (Uninterruptible Power Supply), and configure auto-shutdown at low battery.

      When you run ZFS you do not need or want a hardware RAID controller; it'll just get in the way. I've had good luck with various LSI HBAs. The older 1068 based cards (br10i, 3081-8, 3082-8, etc.) are cheaper but don't support 2.5TB and larger drives. The newer 92xx cards (9240-8, 9211-8, m1015, etc.) are a bit more but they have full support for 3-4+ TB HDDs.

    3. Re:Two servers using ZFS by drsmithy · · Score: 2

      I'll have to look more deeply in to ZFS as I keep hearing it thrown out there. I should have probably qualified my statement as "thereby supporting live guest migrations".

      Well, you still don't need a clustered filesystem for that. Or are you using file-backed virtual disks and using your data storage servers as virtualisation hosts as well?

      If you are, my advice would be to split out your data storage to a separate set of machines. So:

      Two data storage servers, software RAID6 (or RAIDZ2 if ZFS), replicating, serving NFS, CIFS, iSCSI, etc. You can set them up as a failover pair as well if you really want HA, but you'll need to either a) be prepared for slow NFS performance (sync mount) or b) get some hardware that will let you make a caching SSD visible to both hosts.

      X virtualisation hosts, mounting the data store via NFS. You will be able to live migrate VMs between them regardless of whether the NFS server is a pair of HA machines, standalone, or requires manual intervention in case of a failure. Obviously in the latter two cases there is the potential for data loss within the VMs for the data they keep on virtual disks, should the storage server suffer a hard failure - but your need to store important, up-to-the-minute data on those VMs should be fairly low, if it exists at all.

      This is basically the system I have at home, and also nearly identical to the production system at my last employer (albeit with a much higher-end, fully redundant NetApp NAS rather than DIYed).

    4. Re:Two servers using ZFS by Anomalyst · · Score: 1

      Throw a couple of SSDs in there for caching.

      Expounding upon that for the ZFS neophytes.
      L2ARC sits in-between, extending the main memory cache using fast storage devices - such as flash memory based SSDs
      http://www.zfsbuild.com/2010/04/15/explanation-of-arc-and-l2arc/

      --
      There is no right to feel safe thru security vaudeville at the expense of everyone's freedom, privacy and tax money.
  33. Re:You Should... by JWSmythe · · Score: 2

        I have to ask, what the hell are you going to do with 8TB of porn? What's the total runtime of all of that?

        Consider, the whole Doctor Who series. 202GB is almost 11 days, 20 hours of runtime. Assuming roughly the same size, which may allow for higher resolution video with better compression, and rounding 202GB at 11 days 20 hrs down to 11 days (giving you bigger files per hour) you'd be looking at roughly 435 days.

        If you beat your meat for an hour a day, every day, you'd have 10,455 days (or 28.6 years) of masturbation material.

        If you're just a perv, and for some reason like to have porn playing to enhance the ambiance of your home (which may be a bit funky with the aroma of semen and lube), assuming you sleep for 8 hours a day and don't need to have the porn playing while you sleep, you could leave it playing for every waking hour for 1.8 years before you ever watched the same smut twice.

        Based on those numbers, you haven't viewed all the videos to even ensure you downloaded what you think you did. You most likely have a significant number of malware-infested decoy videos. Well, unless you believe that all those DRM signups and codec suggestions are legitimate.

        So, based on this, why the hell do you need, or think you need, all that stuff and storage? There is a word for it. "hoarder". You should consider asking your shrink about disposophobia, and hypersexuality through masturbation. You can get help. It will save you a fortune in lube and unnecessary computer gear.

       

    --
    Serious? Seriousness is well above my pay grade.
  34. cheap NAS by borgasm · · Score: 1

    go buy 2 or 3 cheap 8-10TB NAS devices

    cycle one of them through every few months for a backup, and then store it at another physical location

    that will run you less than $3000 total and a lot fewer headaches

    1. Re:cheap NAS by carnivore302 · · Score: 1

      This is the best advice I've seen so far.

      --
      Please login to access my lawn
  35. Wow... by RecoveredMarketroid · · Score: 1

    You're serious about protecting your porn...

  36. MooseFS is Solid and mature by Anonymous Coward · · Score: 1

    I have been using MooseFS for over a year now, it has proven to be amazingly solid, and very easy to set up and manage. I am running a 600 TB install that is maintaining over 40 million files for a large music service.
    Check out the MooseFS website:
    http://moosefs.org

    Moose can also run on any Unix-like system, so you are not restricted to Linux. I have connected Linux, FreeBSD and Mac OS X systems to it. It also scales very cleanly and was much faster in our initial tests than GlusterFS. I highly recommend it!

  37. what are you trying to fix? by dameon · · Score: 1

    Is online redundancy (i.e. availability) your concern? Or is it recoverability?

    If your concern is the ability to recover in the event of hardware failure, you are overcomplicating the situation. I have about 1.5 TB of "data" between pictures of the family, movies, music, games, configs, documentation, and the list goes on. So, my primary storage server at home has 2x 2TB Western Digital Green drives that are just in a simple Linux software mirror. I also have two more disks that alternate between my house and a safe deposit box at the bank. About once a month (or more frequently if I add files to my server), I rsync my data to the disk at home, and take it to the bank.

    The script that syncs does a simple rsync --delete -avx /blah/ /backup/. I also mount /blah (the source) read-only while I do the rsync to prevent something stupid from happening.
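
    For anyone who wants to script that, here is a minimal sketch, not my actual script. The paths are the placeholders from above, and the helper name and the dry-run default are just illustrative assumptions. Since --delete removes files on the backup side that no longer exist on the source, previewing with rsync's --dry-run flag first is cheap insurance.

```python
import shlex


def build_rsync_cmd(src, dst, dry_run=True):
    """Build the rsync command line described above.

    dry_run=True adds --dry-run so rsync only reports what it would
    copy or delete, without touching the backup disk.
    """
    cmd = ["rsync", "--delete", "-avx"]
    if dry_run:
        cmd.append("--dry-run")
    # Trailing slashes make rsync copy the directory contents, not the directory itself.
    cmd.append(src.rstrip("/") + "/")
    cmd.append(dst.rstrip("/") + "/")
    return cmd


if __name__ == "__main__":
    # Print the commands instead of running them; pass the list to
    # subprocess.run(cmd, check=True) once the preview looks right.
    print(shlex.join(build_rsync_cmd("/blah", "/backup")))
    print(shlex.join(build_rsync_cmd("/blah", "/backup", dry_run=False)))
```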

    Now, you mentioned you had a large array, and that's fine. I'd buy a few 3TB drives and create a volume group with them, create your /backup on that volume group, and do the same thing. These are backup disks, they don't need to be fast.

    I don't trust hardware raid (specialized controller raid), and while I am a unix admin, and manage large GPFS, Ibrix, and GFS clusters at work, I think that simplicity is always better.

    The safe deposit box costs me about $25 / year, and keeps me safe in the event of a fire, theft, meteor, zombie invasion, etc.

    A friend suggested that I just put a few drives in one of his servers, and rsync via ssh to his box. I don't want to do this for two reasons.
    1) I don't have a lot to hide, but I don't really want everyone poking through all my pictures and whatnot
    2) I'm lazy, so I'd probably script it up and I wouldn't think about it until I needed it. So, it wouldn't prevent me accidentally blowing data away on the replica before I noticed I blew something up.

    --
    Remember, a truly wise man never plays leapfrog with a unicorn
  38. Re:You Should... by NotQuiteReal · · Score: 3, Interesting

    I have to ask...

    I'll go out on a limb and say it is just hoarding behavior. I wouldn't be surprised if slaker (53818) has a whole bunch of other stuff, besides data, but at least the data hoarding takes up less room than books, and isn't as sick as animal hoarding...

    Having observed some hoarders, first hand, I think something goes off in their head that is like a "gotta collect them all" flag. It usually is concentrated on a favorite subject, but it could even be set off with garbage, like tearing open a package and setting down the wrapper... one is trash, but, if it is not discarded, the second one is the "start of a collection", and off they go.

    --
    This issue is a bit more complicated than you think.
  39. Re:Bad Dog. Wrong Tree! by neonsignal · · Score: 1

    I guess you're not a smurf.

  40. Re:You Should... by CheshireDragon · · Score: 1, Insightful

    With as cunty as women are these days, why would anyone want a GF?

    --
    "That's right...I said it."
  41. Re:You Should... by slaker · · Score: 1

    Digital hoarding. Yes. It's a terrible disease barely kept in check by the constant threat of all the newspapers and empty cereal boxes that you apparently think occupy the remainder of volume in my home.

    That was sarcasm.

    No, I really don't have abnormally large collections of anything else. I have a half-dozen long boxes of comic books and perhaps a dozen full bookshelves. My home is actually quite tidy. I just have an odd hobby, which is far off-topic anyway. I note that no one has commented on the technical merits of my storage strategy.

    --
    -- I wanna decide who lives and who dies - Crow T. Robot, MST3K
  42. Not a viable solution by pavera · · Score: 3

    If you're looking to have any kind of decent performance in your VMs this just won't work.
    I've worked with VMs on all different kinds of storage (fiber channel SAN, local disk, iSCSI SAN (over 1Gb and 10Gb ethernet), local hardware RAID, NFS file shares, GFS2 (as in the RedHat cluster file system), and MooseFS and GlusterFS). All of these have been either in large test labs or in production cloud deployments. I've never had a cluster file system get close to passing muster as a storage medium for VM usage. IO is the number 1 bottleneck in virtualized environments, and these schemes just add completely unacceptable latency and bandwidth restrictions.

    The only way to really run VMs is fiber channel SAN, local disk (or hardware RAID), or iSCSI with 10GbE (on the storage server side). Even iSCSI with 2GbE (2x1GbE bonded) is not speedy enough to support more than 5-10 VMs running concurrently. You'll start to see problems at 5 VMs if the VMs are Windows... For whatever reason Windows really likes to write to the disk. Currently I have 4 servers in my basement: a single storage server (6 2TB drives in a RAID6, giving 8TB of usable disk) and 3 VM servers (2 2TB drives each, in hardware RAID1). I run the VMs locally and back them up to the storage machine over iSCSI nightly. I also have a shared volume on the storage system that all VMs and my household computers can access. I use Openfiler for my storage system. If I had the money it would be nice to get a second storage server and replicate it (which Openfiler supports), but I don't have that cash just sitting around right now.

    Backing up 8TB of data (ok, so I have about 5TB used) is basically impossible offsite, so we have a "special" folder on the shared drive that is backed up using CrashPlan. It's about 600GB, and the first backup took nearly 3 months over a 5Mbps upload.

    The above setup is the only one I've found that is both a) somewhat affordable, and b) performs well enough to do actual work in the VMs. It provides for some mobility in the event of a hardware failure. If a VM server crashes, I can run the crashed VMs via iSCSI on another server (from the day-old backup). If the storage server crashes, the only "important" data is the 600GB in the special folder, which would take 2 months to download over my home connection but could be downloaded in stages, i.e. the most important stuff first. If both a VM server and the storage server crash, I'm out the VMs that were running on the VM server, but again the important data is off-site, and the VMs can be rebuilt in a day or less.

    1. Re:Not a viable solution by liquidweaver · · Score: 1

      Have you heard of ATA over Ethernet? It's the bee's knees. For VMs, it's pretty much better than iSCSI in every way I have seen. It scales out in a peered fashion. It's really damn efficient, and last time I set one up the total configuration process took me around 10 min, given support is already in the kernel. Want to go faster? Grab a 10G switch - no Fibre Channel required. Bottlenecks? Only if you design it that way - you could just have a "switch full of HDs" if you like. I may or may not have set up a 2 PB AoE installation for the Marines... just saying... :)

      --
      mov ah, 4ch
      int 21h
    2. Re:Not a viable solution by drsmithy · · Score: 1

      I've worked with VMs on all different kinds of storage (fiber channel SAN, local disk, iSCSI SAN (over 1Gb and 10Gb ethernet), local hardware RAID, NFS file shares, GFS2 (as in the RedHat cluster file system), and MooseFS and GlusterFS). All of these have been either in large test labs or in production cloud deployments. I've never had a cluster file system get close to passing muster as a storage medium for VM usage. IO is the number 1 bottleneck in virtualized environments, and these schemes just add completely unacceptable latency and bandwidth restrictions.

      Not to put too fine a point on it, but VMFS is a cluster filesystem and runs VMs exceptionally well.

      Also, NetApp (and EMC, and others) have demonstrated multiple times that the performance difference between FC, NFS and iSCSI is negligible (single-digit percentages). Certainly not enough to be a dealbreaker when it comes to choosing which one to use.

      And in terms of convenience, cost and manageability, NFS destroys them all.

      The only way to really run VMs is fiber channel SAN, local disk (or hardware RAID), or iSCSI with 10GbE (on the storage server side). Even iSCSI with 2GbE (2x1GbE bonded) is not speedy enough to support more than 5-10 VMs running concurrently.

      Maybe if your VMs have high IO bandwidth needs, but most do not. In the same benchmarks I mentioned above, so long as there isn't a bandwidth constriction (ie: you're interested in IOPS) 1GbE iSCSI/NFS is just as fast as 4/8Gb FC or 10GbE iSCSI/NFS (and probably faster than your backing storage). 100MB/sec is a *lot* of data to sustain and there are few needs for it even in commercial environments, let alone the home server scenario being discussed here.

    3. Re:Not a viable solution by pavera · · Score: 1

      I point out that my solution is affordable... How many NICs are on your SAN? 15k rpm SAS drives are more than twice as expensive as 7200rpm SATA. It may well be that the iSCSI hardware we were using didn't have the IOPS to handle the load, but it really felt like a bandwidth issue. The same iSCSI SAN with 10Gb vs 1Gb NICs performed well enough.

    4. Re:Not a viable solution by pavera · · Score: 1

      I use a script I found on the VMware forums that takes a snapshot and then backs up the snapshot... It's not foolproof, but it's crash consistent at least. Yeah, no live migrations...

    5. Re:Not a viable solution by Matheus · · Score: 1

      Somewhat off topic but... At pure transfer speeds (if you are actually getting that 5Mbit) you can send 600GB in roughly 11 days. Granted, this is the real world, but seriously: if it takes you 3 months to transfer 600GB you really need to have a conversation with your ISP or whomever you're sending that data to about some SLA details, use a different method of transfer, OR stop streaming pr0n and hosting a game server while you're doing your backup ;-)
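
      For anyone who wants to check the arithmetic, a quick sketch. It assumes decimal units (1 GB = 10^9 bytes, 1 Mbit = 10^6 bits), no protocol overhead, and treats "3 months" as 90 days; the function names are just mine.

```python
SECONDS_PER_DAY = 86400


def transfer_days(size_gb, rate_mbps):
    """Days needed to move size_gb gigabytes at a sustained rate_mbps megabits/s."""
    bits = size_gb * 1e9 * 8
    return bits / (rate_mbps * 1e6) / SECONDS_PER_DAY


def effective_mbps(size_gb, days):
    """Average throughput actually achieved if size_gb took `days` days to move."""
    bits = size_gb * 1e9 * 8
    return bits / (days * SECONDS_PER_DAY) / 1e6


if __name__ == "__main__":
    print(f"600GB at 5Mbit: {transfer_days(600, 5):.1f} days")
    print(f"600GB in 90 days: {effective_mbps(600, 90):.2f} Mbit/s average")
```

      In other words, a 3 month upload works out to an average of well under 1Mbit actually being used for the backup.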

  43. Why bother? by guruevi · · Score: 1

    Simply use ZFS across your drives. There is no way you can use all your resources (network bandwidth, disk bandwidth) even on a low-end machine unless you get to ~50-200TB and require more than ~100,000 IOPS (which is doable on a single machine loaded with SSD, memory and 10GbE). There are setups that offer 1PB with 1M IOPS running on 2 very beefy (failover) hosts, only after that, distributed becomes necessary (unless of course you need geographical distribution).

    Distributed file systems are nice if you know what to use them for. If you don't (as you already admit being lackluster with eg. your RAID setup), you'll risk losing more data to it than it will ever help you. Yes, doing it wrong has a much higher chance of your data getting lost than simply going for a single machine.

    --
    Custom electronics and digital signage for your business: www.evcircuits.com
  44. Re:The Cloud, obviously. by crudd · · Score: 1

    Or the 'internet'...

    --
    I only post when im drunk.
  45. Re:You Should... by Artifex · · Score: 1

    As someone with considerably more than 8TB of porn[..]

    Can't you just afford to lease a girlfriend, boyfriend, or goat at this point?
    The goat even has a pretty good shredder built in.

    --
    Get off my launchpad!
  46. Best License by SpaceLifeForm · · Score: 1

    Quickly reviewing, I would go with GlusterFS. GlusterFS is free software, licensed under GNU GPL v3 license. Lustre Filesystem is GPL, but tainted by Oracle as you noted. Ceph is LGPL. I would go with the license you are most comfortable with. OrangeFS is also LGPL which you may wish to check out.

    --
    You are being MICROattacked, from various angles, in a SOFT manner.
  47. 3 disks are just as vulnerable by dutchwhizzman · · Score: 1

    Because you get the same amount of single sector failures, no matter what the capacity of your discs is. As soon as they can slam more data on the same surface, they will do so, because the commercial threshold for data loss seems to be the chance of single sector failure.

    Also, if I had only 10 SATA discs for virtual machine image storage, I'd be really unhappy, let alone three. The summary clearly states that hosting VMs for HA is a requirement. Judging by the number of disks without looking at the requirements is bad, m'kay?

    Because you need the VMs with HA, I'd really be looking at enterprise level storage with decent backups. Distributed filesystems will, as far as I know, not grant you transparent failover for your hypervisor. You'll still need some server to centralize your storage requests on a block device level, making the distribution layer invisible to your hypervisors.

    --
    I was promised a flying car. Where is my flying car?
  48. response to OP, please read parent as well by dutchwhizzman · · Score: 2

    You don't seem to understand a few basics about storage, so let me explain them briefly:

    Backup is a method of storing your data in a safe place, so if you accidentally or purposefully delete it, or if you have a (severe) hardware failure, you still have your data. This automatically means you'll want to store your backup data on a totally, physically separated medium. If someone wants to destroy your data, a distributed filesystem won't do you any good. Taking one way snapshots over a network link to a remote location, for instance using rsync and a remote filesystem that supports snapshots, can be a viable solution for short term backups, but if you want longer term retention, "old hat" backup equipment still is a viable solution. How are you planning to restore from data corruption that happened 2 weeks ago? How do you protect against single sector failure? I have yet to see consumer grade raid controllers that actually do a read-verify on every read, so you're depending on raid-scrubbing to detect failures, with the setups you're looking at. A backup is for recovery of data lost on your primary storage system. You can make your primary storage system resilient with distributing and snapshotting it to an inch of its life, but it's not a backup. If you don't make backups, your data obviously isn't worth it, so why bother making your primary storage resilient in the first place?

    A "Super blahblahbla" or whatever hardware you are planning to buy now, will not give you "a decade's worth of time". Look at 10, 20 and 30 years ago. Would you honestly say you'd want to store all your data on a state of the art 7*40GB RAID5 system, as was the bees' knees in 2001? Or how about a pristine 40MB IDE hard drive, the best you could buy in 1991? I think 1981 was still cassette or single sided floppy disc territory.... Seriously, never look forward more than 3 years with setups like this.

    --
    I was promised a flying car. Where is my flying car?
    1. Re:response to OP, please read parent as well by Doc+Hopper · · Score: 1

      Taking one way snapshots over a network link to a remote location, for instance using rsync and a remote filesystem that supports snapshots, can be a viable solution for short term backups, but if you want longer term retention, "old hat" backup equipment still is a viable solution.

      I agree, for varying values of "short term". "Short term" can mean *years* for low-variability filesystems.

      If you plan to drop data somewhere and not change it very often after that, snapshots offer a great long-term storage option as long as you have some sort of off-site replication taking place.

      That said, I administer a gigantic SL8500 tape library for exactly those cases where disks won't do.

  49. Re:Bad Dog. Wrong Tree! by Macka · · Score: 1

    Or alternatively you back everything off to tape (rotating sets) and store them in a fireproof safe.

  50. rsync by tapanitarvainen · · Score: 1

    Taking one way snapshots over a network link to a remote location, for instance using rsync and a remote filesystem that supports snapshots, can be a viable solution for short term backups, but if you want longer term retention, "old hat" backup equipment still is a viable solution. How are you planning to restore from data corruption that happened 2 weeks ago?

    It's easy enough to keep any desired schedule of incremental backups with rsync - search for rsnapshot for example, or BackupPC if you want a fancy web-based interface.

    Otherwise, 100% agreement: backups should be physically separated from the primary data, preferably by significant geographical distance (think about fire) and duplicated on several locations.

  51. Re:Bad Dog. Wrong Tree! by houghi · · Score: 1

    I hear all the time this "must move stuff off site" and that is in theory a good practice. In reality it is overkill for the home user for the majority of his downloaded movies.

    When I looked at the data that I REALLY needed to keep, I came to "not very much". Nothing that I could not host (encrypted) at my provider. I am talking less than 20MB in data.

    When my house burns down, I have other worries than my MP3 collection or my movies. I have not taken copies of all the books I have in the house either, even though that is technically possible.

    Remember: this is a home solution. Now if it were a business solution, then off-site backup that you pay for must be an option. If that is too expensive, then the data is not worth saving.

    --
    Don't fight for your country, if your country does not fight for you.
  52. He did not ask for a backup solution, calm down. by Barryke · · Score: 2

    Stop bashing on the RAID!=Backup thing; we all know, and it's irrelevant to the question.
    I believe his main concern is having one giant volume (say 30 TB) to store data, and not about using it as a backup solution. (He did not even use the word.)
    A backup for that volume would simply be duplicating the setup offsite, possibly offline archiving the cloned disks or (what I'd do) the complete hardware setup.

    I once investigated GlusterFS too, was impressed, and decided that it's for larger scale projects and for me only overcomplicates things.
    I ultimately solved this by buying several cheap QNAP TS410s and giving up on single volumes over 6TB in size, and mounting those separately on the machines that use that data.

    I'm still interested, however, in the possibility of running GlusterFS on QNAP products.

    --
    Hivemind harvest in progress..
  53. Re:ReiserFS by pntkl · · Score: 1

    I nearly choked on a muffin, when I read this thread.

  54. Re:You Should... by Sparx139 · · Score: 2

    You can't mention as a passing fact that you have 8TB worth of porn and not expect people to respond with "wait, what?"

    --
    Our culture doesn't get smarter, it just finds new ways of being retarded.
  55. I'll add more by dbIII · · Score: 1

    A mirror of live spinning disks updated at intervals is icing on the cake if you can afford it after you have real backups - doing it instead can look extremely stupid when things go wrong.
    A web hosting company near me failed spectacularly due to that mistake - their mirror was mirroring garbage and they lost all of their clients' files. Of course it made it even into the print media and it made them look very stupid.

  56. Offsite backups by mrbill1234 · · Score: 1

    I'm surprised that removable backup media has not caught up with the speed of change in hard disk sizes. In the olden days we used to back up with a couple of QIC-60's and we were happy. Later it was DAT backups. What inexpensive tape backup technologies are available today? It would seem that the best alternative is to use a drive itself as a backup medium and take it off-site.

    1. Re:Offsite backups by mprinkey · · Score: 1

      Truecrypting external USB/eSATA drives are by far the better option. We also use normal 3.5" drives with external USB/eSATA docks. There are NO cheap tape solutions anymore. I'd further argue that what tape solutions exist are trumped by hard drive backup solutions for on-site backup--far slower and no more reliable than hard drives. Tape is dead. Anyone still using them is either leveraging a 5-to-10-year-old investment in a tape robot or is being sold a bill of goods by a vendor.

  57. Re:Anonymous has done this. by StarHeart · · Score: 1

    I am doing very much the same thing. I have six 1tb hard drives in my main desktop, and five 1.5tb in a iSCSI server. I then combine them with mhddfs. It is slow, but I only use it for big files that I am not going to be rewriting. I use linux software raid5 for the big filesystems, and linux software raid10 for my /home.

    I am excited to see 4-5tb drives coming down the pipe. With just four 5tb drives I could replace all my hard drives, and remove the need for the iSCSI server.

    I have seen the same errors with iSCSI and ext4.

    [2687538.144009] EXT4-fs (sdi): error count: 54
    [2687538.144012] EXT4-fs (sdi): initial error at 1309736118: ext4_journal_start_sb:260
    [2687538.144016] EXT4-fs (sdi): last error at 1309761117: ext4_put_super:737: inode 8194
    [2774045.664009] EXT4-fs (sdi): error count: 54
    [2774045.664013] EXT4-fs (sdi): initial error at 1309736118: ext4_journal_start_sb:260
    [2774045.664016] EXT4-fs (sdi): last error at 1309761117: ext4_put_super:737: inode 8194
    [2860553.184009] EXT4-fs (sdi): error count: 54
    [2860553.184012] EXT4-fs (sdi): initial error at 1309736118: ext4_journal_start_sb:260
    [2860553.184015] EXT4-fs (sdi): last error at 1309761117: ext4_put_super:737: inode 8194
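
    If you want to know when those errors actually happened, the "initial error at" and "last error at" values look like Unix epoch seconds, and the bracketed numbers are seconds since boot. A small sketch to decode them, with the values copied from the log above (the function and variable names are just illustrative):

```python
from datetime import datetime, timezone


def decode_epoch(ts):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


# Values copied from the EXT4-fs messages above.
initial_error = decode_epoch(1309736118)
last_error = decode_epoch(1309761117)

# Bracketed kernel timestamps of two consecutive "error count" reports.
report_gap_hours = (2774045.664009 - 2687538.144009) / 3600

if __name__ == "__main__":
    print("initial error:", initial_error.isoformat())
    print("last error:   ", last_error.isoformat())
    print("time between first and last error:", last_error - initial_error)
    print(f"gap between reports: {report_gap_hours:.1f} hours")
```

    The error count stays at 54 in every report, and the reports come roughly a day apart, so these lines look like the same old errors being repeated rather than new ones piling up.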

    --
    Havoc Penington, the bane of my Linux desktop.
  58. Also ACFS (next generation of OCFS...) by Meetch · · Score: 2

    Firstly, no I don't work for Oracle, and never have, and I know how hard it can be to justify using their products, especially the ones you pay for(!) considering some of the things I've seen, but credit where credit's due...

    OCFS was originally designed specifically for storing Oracle datafiles, in a cluster, in a non-POSIX fashion. After that came OCFS2, which is POSIX compliant, but can deadlock when NFS exported due to the way NFS handles locking, in a way that can be worked around with the "nordirplus" NFS mount option (not available on all OSes, but Linux is ok). They since developed ASM (Automatic Storage Management), which threw away the traditional filesystem presentation of your Oracle datafiles, and subsequently bundled that into the release of 11gR2 clusterware and extended the functionality to give us ACFS - ASM Cluster File System.

    11gR2 clusterware is designed to be clustered with shared storage, and depending on the options when created will happily give you a POSIX compliant clustered filesystem for any occasion - datafiles, regular files - whatever. It is Oracle's implementation of their "best practice" Stripe And Mirror Everything methodology with the aim of not only high availability, but consistently high performance, through spreading all your data across all your disks, and implementing mirroring in a sane way too: split your disks into two (or three!) failure groups, and the software will ensure there are 2 (or 3!) copies of each block. All you do is add disks to the pool(s), and if you have the space you can dynamically remove disks from the pool too. You can fsck, mkfs, mount and unmount it, take snapshots (!), and the lead-up to all that is not much of a stretch from LVM. Google for Oracle ACFS and see the "Basic Steps to Manage Oracle ACFS Systems" section.

    OCFS was only ever available for Linux, but ACFS now supports other platforms... probably doesn't matter to you. The one catch I've found so far is the ~1GB RAM overhead to run the clusterware PER NODE. There's other reasonable stuff, like you need the network layer to be up in order to start the ACFS supporting services, so you can't put anything related to the basic boot process on those volumes.

    The cost of 11gR2 clusterware? ... nothing. I think it's one of very few "free" (as in beer) products they do. It will work on anything they've compiled it for though - generally means your Enterprise OS like RHEL5 (and should be easy to shoehorn onto CentOS), a recent SuSE release, and of course their own Oracle Enterprise Linux - which I believe is also free to use, but pay through the nose if you want them to support your implementation. Remember that this system is the platform for some very expensive Oracle products, but at the same time it is perhaps a younger product than some you'll have already looked at.

    As for the fencing method, it all works via heartbeat to disks in your ACFS pool. If the clusterware can't "ping" the disk within the threshold, it forces the system that's having the issue to reboot. Such is the nature of ensuring sanity when using shared disk. I suggest looking at it if your boxen can spare the RAM and you're happy to accept their OTN license agreement, as it really does seem to be one of Oracle's better products at an amazing price for what you get.

    1. Re:Also ACFS (next generation of OCFS...) by trawg · · Score: 1

      Great info dude, thanks!

  59. Re:You Should... by DarkDust · · Score: 1

    Maybe he has a girlfriend and doesn't want to lose his homemade porn. I like mine and want them to be safe, too ;-)

  60. Re:The Cloud, obviously. by jareth-0205 · · Score: 4, Insightful

    I would be grateful if this bit of 'humour' could not be posted to *every single vaguely cloud-related post*.

    http://linux.slashdot.org/comments.pl?sid=2356014&cid=36928876

    http://tech.slashdot.org/comments.pl?sid=1683582&cid=32542918

    http://tech.slashdot.org/comments.pl?sid=2499970&cid=37882212

    http://it.slashdot.org/comments.pl?sid=2489600&cid=37805882

    Christ. It was only mildly amusing to begin with, let it go.

  61. Tahoe-lafs by Anonymous Coward · · Score: 1

    Take a look at Tahoe-LAFS, the Tahoe Least-Authority File System.

    It is intended to be used on systems that are distributed in a network (like the internet) for secure and failsafe storage of data. But no one prevents you from running multiple instances of this software on a single system.
    Data is encrypted on the client (so the storage servers know nothing about the data) and distributed in a configurable way. If you have 10 disks you can configure it so that all data is distributed, for example, to at least 8 disks, and 5 (configurable) disks are enough to recover the data. So even if 3 random disks fail, your data is still safe!
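
    The trade-off behind those numbers is plain k-of-n arithmetic. Here is a rough sketch with the 5-of-8 example from above; the function is purely illustrative and is not Tahoe's API or configuration format.

```python
def erasure_profile(needed, total):
    """Summarize a k-of-n scheme: any `needed` of `total` shares can rebuild the data."""
    if not 1 <= needed <= total:
        raise ValueError("need 1 <= needed <= total")
    return {
        # How many share-holding disks can die before data is lost.
        "tolerated_failures": total - needed,
        # Raw space consumed per byte of data stored.
        "expansion": total / needed,
    }


if __name__ == "__main__":
    profile = erasure_profile(needed=5, total=8)
    print(f"5-of-8: survives {profile['tolerated_failures']} lost disks, "
          f"uses {profile['expansion']:.1f}x the raw space")
    mirror = erasure_profile(needed=1, total=2)
    print(f"plain mirror (1-of-2): survives {mirror['tolerated_failures']} lost disk, "
          f"uses {mirror['expansion']:.1f}x the raw space")
```

    So compared with a simple mirror, the 5-of-8 layout survives more failed disks while using less raw space per byte stored.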

    take a look at it on https://tahoe-lafs.org/trac/tahoe-lafs

    Did I mention that it is free and open source?

  62. xtreemfs by marcello_dl · · Score: 2

    http://www.xtreemfs.org/ is a distributed fs with no single point of failure (I guess, depending on the configuration), aimed at high-latency networks, if you want to put nodes on a WAN. It's fairly easy to set up, and it now also replicates mutable files. I dunno about its performance or reliability.

    --
    ---- MISSING MISCELLANEOUS DATA SEGMENT --- [sigdash] trolololol
  63. What do you want by Stonefish · · Score: 1

    You talk about VMs, distributed data... but what do you actually want?
    Have you heard of the CAP theorem? Does it apply to what you're doing?
    How dynamic is your data?
    How granular are updates?
    Is it of a transactional nature?
    Is it your DVD/Blu-ray collection?
    From what I'm hearing, well-managed software RAID would suit, with ATAoE at 10Gbps+. There's no need for more complexity. Of course you could use a rack-aware file system, but first you'd need to provide a problem that it actually solves.

  64. Amazon S3 by qualityassurancedept · · Score: 1

    In terms of just backing it up... you can use Amazon S3... and you can just mail them hard drives instead of uploading over the internetorium. Of course, they mail your hard drives back just as soon as they have sucked all of the data off them and put it in your S3 account. Then you can start EC2 instances and do all of the supercomputing you want. I can't imagine what you would have AT HOME that takes up 5 TB of space though. I suppose you could be running your own version of pirate Netflix or something, but even so, a few hundred movies would, if ripped at the full 8 GB per film, take up only about 3 TB of space... so you might consider cramming all of those movies down to .avi files of 1 GB or so and thereby freeing up 4 TB of disk space, which will of course save you a lot of money when you upload it all to your S3 account.

    --
    if your life is such a big joke then why should I care?
  65. Why use a cluster FS anyhow? by Anarke_Incarnate · · Score: 1

    Why not mirror them to other nodes using DRBD, or use something like Hadoop, where copies of parts of the data are distributed across a larger subset of machines?

  66. Re:Bad Dog. Wrong Tree! by Skater · · Score: 1

    My wife and I thought through this, and the only thing we felt we HAD to put offsite was our pictures. So we have an account with a backup provider that allows rsync, and I have it set up to update nightly. Works great so far.

    We also discussed building three 'backup boxes' that we could place at some relatives' houses...then everyone with one of the backup boxes could back up to the other two. We decided not to do it because of the expense, though, and we didn't think the relatives would be that interested.

  67. Ill fit... by Junta · · Score: 3, Interesting

    Those filesystems are not designed primarily with your scenario in mind. If you want a hardware-agnostic approach, use software RAID or a non-cluster filesystem like ZFS.

    Distributing your storage will probably not enhance your ability to survive a mishap. In fact, the complexity of the situation probably increases your risk of messing up your data (I have heard more than a couple of instances of someone accidentally destroying all the contents of a distributed filesystem, but in those professional contexts they have a real backup strategy). You'll be pissing away money on power to drive multiple computers that you really don't need to power.

    If you care about catastrophic recovery, you need a real backup solution. This may mean identifying what's "important" from a practical home situation. If you don't mind downtime so long as your data is accessible in a day or two (e.g. time to get replacement parts) without going to your backup media and without suffering the loss of non-critical data, then also having a software RAID or ZFS is the way to go. If you want to avoid downtime (within reason), get yourself a box with basic redundancy designed into it, like a tower server from Dell/HP/IBM. If Intel, you would sadly want to go Xeon to get ECC; on AMD you can get ECC cheaper. In terms of drive count, I'd dial it back to 4 3TB drives in a RAID5 (or 5 in RAID6 if you wanted), to save on power and reduce risk in the system.

    --
    XML is like violence. If it doesn't solve the problem, use more.
  68. Enterprise Storage Option by ebursley · · Score: 1

    Home brew solutions are good for a small business, but once you move into multi-terabyte solutions, you should consider a more Enterprise ready solution. If I were in your position, I would consider a dedicated storage area network device such as an EMC VNX or NetApp storage array. Both handle multi-terabyte solutions well. Both are also easy to manage and integrate well into most network environments (CIFS / NFS / FC / FCoE / iSCSI). If you are looking for just NFS / CIFS, Isilon also makes a very fast and scalable NAS device that is super easy to manage.

    --
    Eric Bursley
  69. Small herd of AFS servers? by vlm · · Score: 1

    I have a small herd of AFS servers at home, sounds like it would meet your needs.
    One RW and a herd of RO replicants, at least for the important stuff. The RO replicants are updated automatically every day or so by a script I wrote, I can also run it manually. I believe you are limited to 6 RO replicants for each RW volume and I'm bumping up against that limit at home, don't know how big installations survive that limitation.
    If the RW blows up, which hasn't happened, supposedly it's trivial to make one of the RO a live RW.
    I also take a snapshot backup each night at 2am and have the daily snapshots mounted on ~/backup to make it easy to correct accidental deletion errors.

    If you use AFS be prepared for an avalanche of people who have never used it, or haven't used it since 1996, or tried to use it without reading any docs or howtos or tutorials, telling you it's impossible and too complicated and too difficult and should never be attempted and it'll never work. On the other hand, I just used some simple tutorials and walkthrus found via google, practically screencasts in terms of level of detail, and found it to be quite trivial, like a couple hours work, which isn't bad for all it does. I (almost) feel sorry for the haters. Sucks to be them, I guess.

    --
    "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
  70. rsync, with a buncha drives? by Prof.Phreak · · Score: 1

    Stick six 3T drives into a box, run rsync between'em daily (cron!). Build identical box in a different room/house, run rsync between first box and 2nd box daily/weekly (depending if local network or not)? each box, maybe ~$2k. That's 18T of space per box, with maybe 9T of it if you want to keep two copies of everything on each box.

    Monitor disks, and as soon as OS tells ya the disk is going bad, replace it. It's amazing what modern disks are capable of.

    Unless data changes rapidly, there's no need for replication that happens more than daily (e.g. typical home server box for whatever it is you're doing). Worst case scenario, you lose updates for that date. Have a tiny raid for critical stuff if that's not acceptable.

    Easy to recover, 'cause, you have all the files right there, just swap disk, rsync will ensure all is good. No weird formats to deal with, no configurations to fiddle with (not something you want to be doing while recovering from a bad disk).

    --

    "If anything can go wrong, it will." - Murphy

    1. Re:rsync, with a buncha drives? by ansible · · Score: 1

      Rsync is not a backup solution, though you can build a backup solution with it.

      We've been using rsnapshot to back up data to removable drives (carried offsite every day) and also to a local server with a lot of storage. The local server helps with "oops, I just deleted a file" moments. It uses hard links for identical files to save space.

      There is no special recovery software needed. Just go into the directory and look at the files. Simple, easy, and no lock-in.
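
      A rough sketch of the hard-link idea described above, for anyone curious how it saves space. This only illustrates the technique and is not how rsnapshot itself is implemented; the function name and directory layout are made up for the example.

```python
import filecmp
import os
import shutil
from typing import Optional


def snapshot(source: str, previous: Optional[str], target: str) -> None:
    """Copy `source` into `target`, hard-linking files unchanged since `previous`.

    Unchanged files share disk blocks with the previous snapshot, so each
    snapshot looks like a full copy but only costs the space of what changed.
    """
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        dst_dir = os.path.normpath(os.path.join(target, rel))
        os.makedirs(dst_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(dst_dir, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)        # identical content: reuse the old blocks
            else:
                shutil.copy2(src, dst)   # new or changed: store a fresh copy
```

      Calling `snapshot(src, None, "daily.1")` and later `snapshot(src, "daily.1", "daily.0")` leaves two plain directories you can browse with ordinary tools, which is the "no special recovery software" point.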

  71. Something Else by Nite_Hawk · · Score: 2

    Hi,

    I work for a supercomputing center and am the maintainer of our 1/2 PB Lustre deployment. I also hang out on the GlusterFS and Ceph IRC channels and mailing lists and have spent some time looking at both solutions for some of our other systems.

    For what you want, Lustre isn't really the right answer. It's very fast for large transfer (though slow for small ones). On our storage I'm getting about 12GB/s under ideal conditions and that's totally uninteresting as far as Lustre goes. There are very few other options out there that are competitive at the ultra-high-end (ie PBs of storage at 100+ GB/s). On the other hand you *really* need to understand the intricacies of how it works to properly maintain it. It doesn't handle hardware failures very gracefully and there are still numerous bugs in production releases. A lot of progress has been made since the Oracle acquisition, but it's going to be a while before I'd consider Lustre mainstream. I wouldn't use it for anything other than scratch (ie temporary data) storage space on a top500 cluster.

    GlusterFS and Ceph are both interesting. GlusterFS is pretty easy to set up and has a replication mode, but last I heard there were some issues enabling striping and replication at the same time. Now that Red Hat is backing it I imagine it's going to pick up in popularity really fast. Also, having the metadata distributed on the storage servers eliminates a major problem that Lustre still has: a single centralized metadata server. Having said this, it's still pretty young as far as these kinds of filesystems go, and it's not immune from problems either. Read through the mailing list.

    Ceph is also very interesting, but you should really run it on btrfs and that's just not there yet. You can also run it on XFS but there have been some bugs (see the mailing list). Ceph is really neat but I wouldn't consider it production ready. Rumors abound though that dreamhost is going to be making some announcements soon. Watch this space.

    Ok, if you are still reading, here's what I would do if I were you:

    If you are running on straight up gigabit ethernet you basically have no reason to bother with distributed storage from a performance perspective. 10GE is a cheap upgrade path and a single server will easily be able to handle the number of clients you'll have on a home network. From a reliability standpoint I've personally found that something like 70-80% of the hardware problems I have are with hardware RAID controllers. I'd stick with something like ZFS on BSD (or Nexenta if you don't mind staying under 18TB for the free license). Then export via NFS or iSCSI depending on your needs. If you want HA across multiple servers, here's what people are doing on BSD with ZFS:

    http://blather.michaelwlucas.com/archives/221

  72. Keep it simple, stupid. by jimicus · · Score: 3, Informative

    Two issues here:

    1. You're approaching the problem from the wrong angle. IMV, the angle you take should be "how long can I afford to be without this data and how much money am I prepared to throw at a solution?" rather than "what technology exists that I can use to make the system more reliable?". Taking the former approach allows you to plan exactly how you'd deal with data loss - whether it's through human error, software/hardware failure, fire, theft, flood or what have you. Taking the latter approach tends to result in some whacking great Heath Robinson (or if you're American, Rube Goldberg) of a solution that still has a whacking great hole in it somewhere.

    2. 8TB of data is not an enormous amount by any modern standard. You can buy a NAS box off-the-shelf today that will take 12x3TB hard disks for 36TB (18TB if you've got the good sense to run them in a RAID 1+0 configuration) of storage; at this level they typically have replication built right into them so you can buy two and replicate one to the other (though like all replication-type solutions, it's not a form of backup and you mustn't treat it as such). If that doesn't appeal, simply put a couple of SATA controllers in a cheap box and run OpenFiler. Anything you cobble together yourself based on the latest clustered filesystem du jour will suffer from one huge flaw - a system that's designed to be highly-available is frequently less reliable than one that isn't, simply because you're making it that much more complicated that there's a lot more to go wrong.

  73. Re:You Should... by GameboyRMH · · Score: 1

    Sounds to me like it was some kind of work project.

    --
    "When information is power, privacy is freedom" - Jah-Wren Ryel
  74. Re:You Should... by Karzz1 · · Score: 1

    Many thanks for the new sig. Unfortunately with the sig there is not enough space left to attribute it to you!

    --
    Beware of he who would deny you access to information, for in his heart he dreams himself your master.
  75. Performance, not recovery by riley · · Score: 2

    Clustered filesystems are not designed to make your data safer, or to provide ease of recovery. In fact, they make both of those things a bit more difficult. In the case of Lustre, the point is performance -- I have N servers that I am willing to dedicate to serving the filesystem, I can therefore get N times the throughput for large distributed jobs.

    File systems that provide replication help, but unless it is copy on write (COW), it does not take the place of backups.

    If you are paranoid about data safety, invest in a backup solution. The only reason to use a distributed file system is for increased performance.

  76. Re:You Should... by GameboyRMH · · Score: 1

    8TB of porn is completely insane. That amount of video is hard to fathom, but I'll try to put it into context.

    One of the many, many complete series I have on my home server's 2TB hard drive (among many other things) is every episode of Bleach from 1 to 345. Each episode is 20 minutes long, plus some 40 minute specials but we'll discount those. Let's call the average episode size about 150MB. 150MB*345=51750MB, 51750MB/1024=50GB which sounds about correct (my Mythbusters collection is around 35GB). How long would it take to watch all that? (20*345)/60/24 = 4.8 days (less than I thought - time well spent I say).

    Let's assume his porn is stored at a roughly similar quality, 7.42 megs per minute (that's fairly high actually). (8*1024^2)/7.42=1130540.16, /60/24/365 = 2.15 YEARS OF PORN D8

    --
    "When information is power, privacy is freedom" - Jah-Wren Ryel
  77. Re:The Cloud, obviously. by sentimental.bryan · · Score: 1

    Wait a minute. I'm a manager, and I've been reading a lot of case studies and watching a lot of webcasts about The Cloud. Based on all of this glorious marketing literature, I, as a manager, have absolutely no reason to doubt the safety of any data put in The Cloud.

    The case studies all use words like "secure", "MD5", "RSS feeds" and "encryption" to describe the security of The Cloud. I don't know about you, but that sounds damn secure to me! Some Clouds even use SSL and HTTP. That's rock solid in my book.

    And don't forget that you have to use Web Services to access The Cloud. Nothing is more secure than SOA and Web Services, with the exception of perhaps SaaS. But I think that Cloud Services 2.0 will combine the tiers into an MVC-compliant stack that uses SaaS to increase the security and partitioning of the data.

    My main concern isn't with the security of The Cloud, but rather with getting my Indian team to learn all about it so we can deploy some first-generation The Cloud applications and Web Services to provide the ultimate platform upon which we can layer our business intelligence and reporting, because there are still a few verticals that we need to leverage before we can move to The Cloud 2.0.

    Bullshit bingo!! Bingo!!

  78. Re:You Should... by TangoMargarine · · Score: 1

    Starting to see why they complain about the Extravagant Western Infidels...

    --
    Unity? Screw that: XFCE. Slashdot Beta? Screw that: SoylentNews. Australis? Screw that: Pale Moon. UX developers DIAF
  79. Re:You Should... by TangoMargarine · · Score: 1

    slaker

    Well, there you go. It sounds to me like there's quite a lot of slaking going on, eh?

    --
    Unity? Screw that: XFCE. Slashdot Beta? Screw that: SoylentNews. Australis? Screw that: Pale Moon. UX developers DIAF
  80. Re:You Should... by slaker · · Score: 1

    I don't think it would be that interesting to anyone else.

    --
    -- I wanna decide who lives and who dies - Crow T. Robot, MST3K
  81. Re:You Should... by slaker · · Score: 1

    I have a similarly vast collection of non-adult content, yet that always passes without comment. They're just bits and bytes. I'm not assigning any particular value to one sort of content versus another.

    --
    -- I wanna decide who lives and who dies - Crow T. Robot, MST3K
  82. Re:You Should... by slaker · · Score: 1

    Also, don't you, or didn't you at one time work with the folks who run Voyeurweb?

    --
    -- I wanna decide who lives and who dies - Crow T. Robot, MST3K
  83. Other Options by Salamander · · Score: 1

    Disclaimer: I'm the project lead for HekaFS, which is based on GlusterFS.

    If you're concerned about data protection, you'll want to worry about node as well as disk failures. Some distributed filesystems, including Lustre and PVFS*, take a rather old-school "use RAID and implement your own heartbeat/failover between server pairs" approach, and that just sucks. GlusterFS and Ceph don't have that wart; neither do MooseFS or XtreemFS, which I would consider the other alternatives. They all have their own forms of replication built into the filesystem, so you don't need to set up and maintain another layer for them. Unfortunately, neither MooseFS nor Ceph survived even simple tests - write a few files in parallel, flush caches, read them back in parallel - when I ran those tests on the same hardware as GlusterFS and XtreemFS which did fine. That was a while ago, though, so take that with a grain of salt. Ceph in particular has a lot of awesome technology and has a very bright future IMO, but it's taking a while for it to realize that potential.

    Out of GlusterFS and XtreemFS, the choice has a lot to do with your exact use case. XtreemFS has a pretty strong focus on wide-area replication, so if that's part of your need now or likely to be in the future then it's probably a bit stronger. GlusterFS does have some wide-area replication, but I consider it rather weak. Within a single data center, I'd give GlusterFS the edge. It has better local performance than XtreemFS in my tests, and it has what I consider by far the best setup/management interface.

    The one caveat I'd offer is that all of the filesystems I've mentioned excel at sequential access to large files. For random access, and especially for metadata-heavy workloads, they all suck to some degree. As others have mentioned, you might very well be better off with a simple NFS server pair with cheap shared storage and heartbeat/failover to ensure availability.

    --
    Slashdot - News for Herds. Stuff that Splatters.
  84. RAID5 considered harmful by swillden · · Score: 1

    20 disks seems like overkill for your storage needs. Seems like the more disks you use the greater the risk of failure of one or more of them. Also, your electricity bill must be through the roof. I have 4 3TB drives with a 3Ware controller in RAID5 array which gives me the same storage capacity with 1/5th the drives.

    You should seriously consider adding another drive and migrating to RAID6, because RAID5 has a fatal flaw that may cost you all of your data (or at least force you to restore from backup, but, seriously, keeping 9 TB backed up isn't easy).

    The problem with RAID5 is that if you lose one of your drives it leaves your array in a very risky state. This seems obvious, since it's clear that the failure of any drive at that point will lose all of your data, but it's actually at least an order of magnitude worse than it appears. Why? Because the failure of a second drive at that point is actually quite likely. When you install a replacement drive, the array has to resync to incorporate the new drive and get back to a healthy state. To do this, the resync operation has to read every single block of every remaining drive. This means that if there are any other latent failures, unrecoverable blocks that just haven't been noticed yet, the resync will find them and the resulting failure will lose all of your data.
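
    To put a rough number on that resync risk, here is a back-of-the-envelope sketch. It assumes an unrecoverable read error (URE) rate of 1 per 10^14 bits read, a figure often quoted on consumer drive spec sheets, and it assumes errors are independent; both are simplifying assumptions, not measurements, and the function name is made up for the example.

```python
import math


def rebuild_failure_probability(surviving_drives: int,
                                drive_bytes: float,
                                ure_per_bit: float) -> float:
    """Chance of hitting at least one URE while reading every surviving drive.

    Models each bit read as an independent trial: P = 1 - (1 - p)^n.
    log1p/expm1 keep the result accurate when p is tiny and n is huge.
    """
    bits_read = surviving_drives * drive_bytes * 8
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    # Four 3TB drives in RAID5, one failed: the resync reads the other three.
    p = rebuild_failure_probability(3, 3e12, ure_per_bit=1e-14)
    print(f"{p:.2f}")  # about 0.51 under the assumed URE rate
```

    Under that assumed rate, a resync of the 4x3TB array from the parent post would have roughly a coin-flip chance of tripping over a bad block, which is why a second parity drive is attractive.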

    In fact, even a transient failure can lose all of your data. I was actually able to recover mine once, due to the fact that I was using software RAID rather than hardware. Linux mdraid allowed me to "forcibly" restart my degraded array (carefully specifying the order of my disks exactly as they had been; which information I had thanks to the e-mails md had sent me), at which point I ran out and bought enough big disks that I could back the entire set up. The backup succeeded.

    After a similar experience which was even more harrowing because the failure wasn't transient, I abandoned RAID5 for data I care about and switched to RAID6.

    My current approach is:

    1. Unimportant data goes on RAID0.
    2. Replaceable data goes on RAID5 (this is mostly movies ripped from DVD, in my case -- I have the DVDs and could re-rip if needed).
    3. Important data goes on RAID6, with a hot spare. Since my RAID6 array has six disks (including the spare), this is as inefficient as RAID10 would be, but with better survivability (and worse performance, but that doesn't matter to me).
    4. Irreplaceable, critically important data goes on RAID6 and gets backed up to a Tahoe-LAFS distributed grid, which will ensure that it will survive even in the event my house burns down. My portion of the LAFS grid resides on RAID5 at present, though I'd also be okay with keeping it on RAID0.
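
    The "as inefficient as RAID10" remark in item 3 is easy to check with a quick sketch. The function below is only an illustration of the usual capacity rules (one parity disk for RAID5, two for RAID6, half the disks for RAID10) and assumes equally sized disks.

```python
def usable_fraction(level: str, disks: int, hot_spares: int = 0) -> float:
    """Fraction of purchased disk capacity that ends up holding data."""
    active = disks - hot_spares
    if level == "raid0":
        data = active
    elif level == "raid5":
        data = active - 1
    elif level == "raid6":
        data = active - 2
    elif level == "raid10":
        data = active / 2
    else:
        raise ValueError(f"unknown RAID level: {level}")
    return data / disks


if __name__ == "__main__":
    # Six disks: RAID6 on five plus one hot spare, versus plain RAID10.
    print(usable_fraction("raid6", 6, hot_spares=1))  # 0.5
    print(usable_fraction("raid10", 6))               # 0.5
```

    Both layouts give half the raw capacity, but the RAID6 set survives any two disk failures, while RAID10 only survives a second failure if it lands in a different mirror pair.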

    In addition, I also run regular surface scans on all of my drives. In theory this should make RAID5 acceptable since it should identify any waiting problems before the array is degraded. In practice, I still don't trust RAID5.

    --
    Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
  85. Re:Bad Dog. Wrong Tree! by swillden · · Score: 1

    When I looked at the data that I REALLY needed to keep, I came to "not very much". Nothing that I could not host (encrypted) at my provider. I am talking less then 20MB in data.

    Don't have kids?

    For those of us that do, we typically end up with a lot of photos and video that we would really, really hate to lose.

    My solution for this is high-volume off-site backup using Tahoe LAFS. I have about 200 GB backed up now, and will have 400 GB within a couple of months.

    --
    Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
  86. What you want is a remote mirror, not a clusterfs by Mysticalfruit · · Score: 1

    I'm hearing you say "clusterfs" but what I'm reading from your post is "remotely recoverable filesystem". A cluster filesystem makes lots and lots of sense if you've got 100 nodes that need high speed access to a single piece of storage.... this doesn't sound like that kind of application...

    What you should set up is a ZFS box (I'm a fan of raid10, but pick the raid of your choice); we'll call this machine "A".
    Now go build an identical box named "B".

    Now, what you'll want to do is set up an rsync process that does the following...
    1. Send a zfs command from A to B creating a snapshot of the appropriate file systems.
    2. rsync said filesystem from A to B.
    3. Sleep some amount of time, goto step 1.
    Now at some point you'll want to clean up all the snapshots on B, but that's an exercise I'll leave to the reader.

    Another option is to take the snapshot on A and then use zfs send to send the snapshots as well.
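
    The three-step loop could be sketched roughly like this, run from A. The host name "B", the dataset "tank/data" and its mountpoint are placeholders, and the sketch assumes passwordless ssh from A to B and the same dataset layout on both boxes.

```python
import subprocess
import time
from datetime import datetime, timezone


def build_commands(dataset: str, mountpoint: str, remote: str, stamp: str):
    """Return the two commands for one pass: snapshot on B, then rsync A -> B."""
    path = mountpoint.rstrip("/")
    snapshot_cmd = ["ssh", remote, "zfs", "snapshot", f"{dataset}@sync-{stamp}"]
    rsync_cmd = ["rsync", "-a", "--delete", f"{path}/", f"{remote}:{path}/"]
    return [snapshot_cmd, rsync_cmd]


def sync_forever(dataset: str, mountpoint: str, remote: str, interval_seconds: int) -> None:
    """Step 1: snapshot B. Step 2: rsync. Step 3: sleep, then repeat."""
    while True:
        stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
        for cmd in build_commands(dataset, mountpoint, remote, stamp):
            subprocess.run(cmd, check=True)  # stop loudly if either step fails
        time.sleep(interval_seconds)


if __name__ == "__main__":
    # Placeholder names; adjust to the real pool and host.
    sync_forever("tank/data", "/tank/data", "B", interval_seconds=3600)
```

    Taking the snapshot on B before each rsync means a bad sync (or a file deleted by mistake on A) can still be rolled back from B's previous snapshot. Pruning old snapshots is left out here as well.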

    --
    Yes Francis, the world has gone crazy.
  87. Re:The Cloud, obviously. by tomhudson · · Score: 1

    Why not just recommend he continue to use the ClusterF$ck file system and be done with it?

    After all, the OP is using RAID as a backup ("I have suffered twice through complete loss of data once due to accidentally not re-enabling the notification on my hardware RAID and having an array power supply fail and the RAID controller was unable to recover half of the entire array").

  88. Re:You Should... by pnutjam · · Score: 1

    I would like to comment on the technical merits. I am impressed. You are definitely large enough that tapes start to make sense, however, I'm on the fence about tapes vs. disk on a cost / recovery time analysis.
    How many tapes are you currently using? Have you looked at the rsync hard-linking solutions you can use with disk? You could have 20 x 2TB disks and probably maintain a complete backup set w/ incrementals. You would ideally need 2 sets of 20 to rotate off-site, but I think that would be competitive with your tape solution from a cost perspective and it would blow it away when you consider backup and restore time.

    Care to discuss? I agree that RAID is a waste for most people and introduces an unnecessary point of failure.

  89. moose ? by bmimatt · · Score: 1

    I've put mooseFS through its paces with good results on FreeBSD and a couple of MacbookPro's.  The easy configuration, real-time stats, self-healing and the ability to quickly add more instances to increase throughput are just a few highlights.  Documentation is a bit terse, but complete enough for anyone with a few hours to spare to get it up and running.  There are a few quite large companies using it in production in Europe.

    Gluster - up and down all the time in heavy LAMP production for about a year.  Ended up replacing it with Netapp.

    P.S. I am in no way affiliated with any of the products/companies mentioned in my post.

  90. Re:He did not ask for a backup solution, calm down by SmurfButcher+Bob · · Score: 1

    Most certainly is very relevant to the question. His entire premise is NOT losing data. He went out of his way to recite two separate anecdotes to that effect. And none of his solutions resolve that goal.

    --

    help me i've cloned myself and can't remember which one I am

  91. Re:You Should... by slaker · · Score: 1

    The total size of data that actively needs to be backed up is a hair over 30TB at this point. Yes, I could move to a system of using some number of hard disk drives, but there are some mitigating factors there:

    1. Drive mechanics are excessively delicate, especially for the low-cost "green" drives. Given that engineers spend man-weeks trying to shave fractions of a cent off the per-unit cost of each drive, what are they doing to make "green" drives significantly cheaper than their full-speed siblings?

    I'm using Hitachi 3TB 0S03230 drives, which ARE 5400rpm "green" drives in my main storage server now, but had I not had access to my tape library, I don't think I would have felt comfortable enough to buy them in the first place.

    2. Drives in quantity aren't really all that portable. They're not going off site. My second set of tapes lives at my office. I run an incremental backup job every weekend and swap full sets every couple months, so I'm more or less carting backup media around all the time. I wouldn't want to do that with hard drives.

    3. One of the reasons I moved to tape was to dramatically reduce my power consumption. Moving from four file servers to one-and-a-half (I have one other machine that's been repurposed for non-media needs) cut my power bill by around 40%. And no, I don't ever bother to heat my home in the winter.

    At the moment, I need 38 tapes to get a full backup. I don't bother with hardware compression since my data is practically all video files. The tapes are around $20 apiece. I have 86 of them, most of which came with my autoloader, which covers me for incremental backups and a few spares in case I have a bad tape (I've only had one so far). I haven't tested a full restore due to the massive volume of data involved, but I was able to spot-recover 5TB from a full backup set without any issues. I don't really pay all that much attention to how long backups or restores take (though, gotta say, LTO is surprisingly fast). I basically load in my 16 tapes and come back in 12 hours to load in 16 more.

    --
    -- I wanna decide who lives and who dies - Crow T. Robot, MST3K
  92. Re:Bad Dog. Wrong Tree! by freshlimesoda · · Score: 1

    "help me i've cloned myself and can't remember which one I am" ----> Kill the less powerful one.

    --
    I come to Slashdot only to read sigs. One you are reading is mine.
  93. Why the fuss?? by freshlimesoda · · Score: 1

    Just use ZFS / SAN... why not ???

    --
    I come to Slashdot only to read sigs. One you are reading is mine.
  94. No one mentioned DRBD... by axelabs · · Score: 1
    1. Re:No one mentioned DRBD... by bt00x · · Score: 1

      Maybe because DRBD is not a Clustered Filesystem? http://www.drbd.org/docs/about/ states: "The Distributed Replicated Block Device (DRBD) is a software-based, shared-nothing, replicated storage solution mirroring the content of block devices (hard disks, partitions, logical volumes etc.) between hosts."

  95. Re:You Should... by JWSmythe · · Score: 1

        I swear, some of you people have memory like a steel trap.

        Yup, I designed, built, and ran the network for 8 years. When they had problems a few years ago, that was about 6 months after they kicked me to the curb for my dedicated service. It was the day before Thanksgiving, 2006. Family had come out to visit, and by the end of November 22 I had the official word that I was fired. That made for a very depressing family gathering. But hey, who am I to hold a little resentment.

        Back in 2004, This was the biggest, baddest machine on the network. 1.15TB storage each. The actual configured array size was smaller. Part was RAID5, part was RAID0 (if I remember right). I remember transporting and racking those. They were damned heavy.

        We had two more machines (SuperMicro 1u, nothing fancy), each with the same array chassis, with 250GB drives. At first, I had them connected to one machine as 2 RAID5's, and RAID0 across them (6.4TB). I also did it as a single RAID 5 (7.2TB). Those Promise arrays would go funky on occasion, and just stop working, so we split them between their two machines. If you dig back far enough in my journal archives on here, you can read about me talking about issues formatting them. ext3 wouldn't do it then. It does now though.

        I found the posts.
    15 November 2004
    16 November 2004

        We could back up all of the VW servers (voyeurweb, redclouds, homeclips, funbags), plus all the free hosting servers, and ancillary servers (mail, dns, etc) several times over on those drives. Even with all that, we *still* had room left over.

        It's amazing what we can have in our home machines now. I have more storage than that, between a few machines here. Heck, why not buy a 2TB drive, they're cheap. I'll keep trying to fill them, but... :)

    --
    Serious? Seriousness is well above my pay grade.
  96. Re:You Should... by g00ey · · Score: 1

    I took the "porn" part as a joke and didn't raise a brow about it, but I guess that we're all different. I wouldn't call 8TB of storage "hoarding"; it's actually kind of nice to have a large redundant array of storage to work with. You don't have to worry about running out of space anytime soon, and duplicating hard disk images for virtual machines without considering storage limitations is really nice. Then we have people working with video editing, which takes a lot of storage space uncompressed, especially when it is in Hi-Def, so there are indeed legit reasons for using such a large storage capacity. Maybe 36TB is kind of pushing it but then again, who are we to judge if someone decides to get that sort of equipment for his or her own money?

  97. Re:You Should... by pnutjam · · Score: 1

    Makes sense, the up front cost for the drive is painful, and would create a single point of failure on my budget. I only have a terabyte or so of data. I use a pelican style case to transport hard drives and have never had a problem.

    I would consider filling the other 36TB, buying a spare 36TB and rotating them periodically. I would script them to mount before backup so they aren't spinning when they aren't in use. I personally do my backups weekly since my data doesn't change. If I throw some new pics on the server, I will kick it off manually.

  98. Re:Bad Dog. Wrong Tree! by g00ey · · Score: 1

    Yeah, when looking at his post it seems that things can never get boring around him;)

  99. Re:rdiff-backup by hendrikboom · · Score: 1

    rdiff-backup keeps old versions around via backwards differences. And if you use it right, the most recent version of a file can just be read from the backup drive without using rdiff-backup.

    And it can operate over a network.

    According to the docs, it's supposed to be available for Mac and Windows, too.

    -- hendrik

  100. Re:You Should... by JWSmythe · · Score: 1

        Well, it sounds like you have and will continue to spend an unhealthy amount of time collecting. It may not be stacks of newspapers waiting to crush you, but it's an obsessive behavior, and you should seek help. Hell, we could be wrong, but none of us are qualified to say that.

        But since you're asking for a review of your methodology, let's have a look.

        You have 64 drives in 4 machines. That's 16 drives per machine. What are the machines that you're using? 16 drives each is an awful lot. Like, my 4u raid chassis held 15 drives each. I have seen some machines like this that hold 16 drives or more in the same chassis as the motherboard. It's only about $1,000 for just the chassis. A bit pricy for a hobby.

        You say you have 4 machines with 12 TB each, and 16 drives each. That's 16 750GB drives, and I'd guess you spent over $6,000 for them. Probably a lot more, as you have already collected 8TB of porn, and vast quantities of other things.

        You say you spent $1,900 on the tape drive. That would be about 30 tapes (LTO4 800/1600), at about $30/ea, so another $900 on tapes. But you say you want another 100 tapes, so you can have one more generation of backups?

        You mentioned using rsync between the machines. rsync is great, but as I discovered with huge filesystems, the memory consumption is likewise huge. That, and the fingerprinting and comparison of the files would take roughly ... forever and a day. Been there, done that, suffered the pain of it all. So you'd need a robust ordering system, and do them in pieces. That's how we did one of our huge sites. That's usually one of those things that people mention when talking about the wonders of a system they put together.
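        A rough sketch of what "do them in pieces" could look like, assuming one rsync run per top-level directory. The function names and the example destination are made up for illustration and are not the setup described above; only the standard `rsync -a` flag is used.

```python
import os
import subprocess


def plan_rsync_batches(src_root, dest_root):
    """Return one rsync argv per top-level subdirectory of src_root, sorted by name.

    Loose files directly under src_root are skipped in this sketch.
    """
    batches = []
    for name in sorted(os.listdir(src_root)):
        src = os.path.join(src_root, name)
        if not os.path.isdir(src):
            continue
        # Trailing slashes: copy the contents of src into dest_root/name/
        batches.append(["rsync", "-a", src + "/", dest_root.rstrip("/") + "/" + name + "/"])
    return batches


def run_batches(batches, dry_run=True):
    """Run the batches one at a time so each rsync only walks a slice of the tree."""
    for argv in batches:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
```

        Something like `run_batches(plan_rsync_batches("/srv/data", "backup:/srv/data"))` would print the planned commands first; both paths are placeholders. Each run builds a file list for one subtree only, which is the point of splitting.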

        I'd even expect you to talk about the extra power consumption and cooling requirements. I would strongly suspect that kind of load would require at least two power circuits, but probably 3 15A to 20A circuits. I would assume you did this at home, and not in a commercial building. Most residential rooms have one, maybe two, circuits. This is usually due to lazy contractors only wanting to run one cable for adjoining walls. You didn't even mention the power supplies per machine, much less these larger issues.

        And what about your network? Syncing terabytes of stuff between multiple machines is rather bandwidth hungry. Well, unless you expect it to be done this year. You wouldn't be using a regular consumer linksys/belkin/netgear switch. You'd want GigE ports on a real managed switch. That has a pretty healthy price tag attached.
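        For a sense of scale, here is a back-of-the-envelope estimate of how long a full copy of 8TB would take at raw line rate. It assumes decimal units and ignores protocol overhead and disk speed, so real transfers would be slower.

```python
TB = 10**12  # decimal terabyte, in bytes


def transfer_hours(data_bytes, link_bits_per_second):
    """Hours needed to move data_bytes over a link at its raw line rate."""
    return data_bytes * 8 / link_bits_per_second / 3600


for label, rate in [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)]:
    print(f"8 TB over {label}: {transfer_hours(8 * TB, rate):.1f} hours")
# prints:
# 8 TB over 100 Mbit/s: 177.8 hours
# 8 TB over 1 Gbit/s: 17.8 hours
```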

        If you had such a huge investment in large storage, I'd believe you'd have a home theater system to match. You wouldn't be watching that porn on a 15" CRT.

        Really, the 8TB of porn question that I raised, although questioning why you'd possibly want that, was really a question of whether you had actually done it.

        I'd say that you're thinking and dreaming of having such a system. I seriously doubt that you've spent over $13,000 for your home porn store. ($12,800 outlined above, not including motherboard, processor, memory, cables, network gear, portable air conditioner, electrical work, etc).
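        The $12,800 figure can be reproduced from the numbers quoted earlier in this post. The chassis line is an inference (4 machines at about $1,000 per chassis); the other three amounts are stated directly.

```python
# Rough cost estimate in US dollars, using the figures from the post above.
estimate = {
    "drives (64 x 750GB)": 6000,
    "chassis (4 x ~$1,000, inferred)": 4000,
    "tape drive": 1900,
    "tapes (30 x $30)": 900,
}

total = sum(estimate.values())
print(f"Estimated total: ${total:,}")  # prints: Estimated total: $12,800
```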

        This is the point that you'll argue, insist, call me a liar, probably with many profanities either written or implied.

        I could be wrong, but that would lead us down the road towards unhealthy obsessions.

    --
    Serious? Seriousness is well above my pay grade.
  101. Re:You Should... by slaker · · Score: 1

    Your estimates are off somewhat, but yes, it's not a cheap hobby to have. I've probably put about $18,000 in hardware over the last 5 years, with the first of my older systems being by far the most expensive. Most of it is sitting in or on top of an APC Netshelter in a closet in my extra bedroom. It all runs on one 20A circuit, albeit one that really isn't used for much of anything else.

    My server systems are built with commodity parts other than disk controllers (Dell Perc5s or IBM ServeRAID M1015s) and chassis (generic Norco 20-bay units that have a SATA/SAS backplane and usually run $350 or so), rather than the much more expensive first tier OEM machines or even barebones Supermicro or Tyan rigs. This keeps cost and more importantly noise down to manageable levels compared to some of the things you're suggesting.

    For what it's worth, I really don't have much else going on in my life, which makes this sort of thing possible if not exactly practical. I think it's deeply cool that I have a personal application for this kind of equipment, and it gives me something to do as a techie hobbyist besides building a newer, faster desktop every six months. I suppose the comment about the affordability of the equipment is a little bit off-side. Plenty of reasonably well paid, unmarried IT guys and engineers buy themselves impractical $50,000 cars and no one bats an eye at that or casts aspersions at the habits of their consumption.

    --
    -- I wanna decide who lives and who dies - Crow T. Robot, MST3K
  102. Re:The Cloud, obviously. by jd · · Score: 1

    Maybe they would if the managers stopped moderating it +5 insightful at board meetings.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  103. Re:You Should... by jd · · Score: 1

    I'm shocked. You did all that analysis and did not ONCE offer to help in a backup in case of a fire, earthquake or terrorist attack! How would this person feel, if all that data was lost? That is so incredibly thoughtless of you! As a more civilized member of the Slashdot community, I think it only fair that we offer to help preserve this unique collection of alternative art in the event of disaster.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  104. Re:You Should... by JWSmythe · · Score: 1

        Well.. It's not my data. It's not any data that I'm paid to protect. What do you want me to do, save the world? It's the US Government's job to protect everyone else in the world, not mine. :)

        Wait.. what? Oh, my friend just told me, the US Gov't only cares if it involves oil. :)

    --
    Serious? Seriousness is well above my pay grade.
  105. Re:You Should... by saleenS281 · · Score: 1

    If they're only 150MB, they are extremely compressed or low resolution. A standard DVD comes in around 5GB. A blu-ray rip is 20-50GB depending on who ripped it and what settings they used. Seinfeld ran for 9 seasons, and had 180 episodes. The complete boxed set is 33 discs. If your 345 episodes were a high quality blu-ray rip, you would need a lot more than a 2TB HD to save them.
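    A quick check of the numbers, assuming decimal units: the total for 345 episodes at 150MB each, and the per-episode size at which the same 345 episodes would no longer fit on a 2TB drive.

```python
MB, GB, TB = 10**6, 10**9, 10**12

episodes = 345

# Total size if every episode is about 150MB.
total_at_150mb = episodes * 150 * MB

# Per-episode size at which 345 episodes exactly fill a 2TB drive.
break_even = 2 * TB / episodes

print(f"345 episodes at 150MB each: {total_at_150mb / GB:.2f} GB")
print(f"Per-episode size that fills 2TB: {break_even / GB:.2f} GB")
# prints:
# 345 episodes at 150MB each: 51.75 GB
# Per-episode size that fills 2TB: 5.80 GB
```

    So anything much above the roughly 5GB quoted for a standard DVD, per episode, would overflow the 2TB drive.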

  106. MooseFS by int19h · · Score: 1

    I tried Ceph on around 5 computers for two weeks and experienced data loss (reported the bug). They have probably fixed it by now.
    After that, I switched to MooseFS, which works great (although it's not as advanced as Ceph). MooseFS has done a great job for a year now.

  107. Re:You Should... by jd · · Score: 1

    The US Government already backs up all the porn on the Internet, except for NASA Ames which backs up all the movies.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)