Slashdot Mirror


Ask Slashdot: How Do You Manage Your Personal Data?

New submitter multimediavt writes "Ok, here's my problem. I have a lot of personal data! (And, no, it's not pr0n, warez, or anything the MPAA or RIAA would be concerned about.) I am realizing that I need to keep at least one spare drive the same size as my largest drive around in case of failure, or the need to reformat a drive due to corrupt file system issues. In my particular case I have a few external drives ranging in size from 200 GB to 2 TB (none with any more than 15 available), and the 2 TB drive is giving me fits at the moment so I need to move the data off and reformat the drive to see if it's just a file system issue or a component issue. I don't have 1.6 TB of free space anywhere and came to the above realization that an empty spare drive the size of my largest drive was needed. If I had a RAID I would have the same needs should a drive fail for some reason and the file system needed rebuilding. I am hitting a wall, and I am guessing that I am not the only one reaching this conclusion. This is my personal data and it is starting to become unbelievably unruly to deal with as far as data integrity and security are concerned. This problem is only going to get worse, and I'm sorry, 'The Cloud' is neither an acceptable nor a practical solution. Tape for an individual as a backup mechanism is economically not feasible. Blu-ray Disc only holds 50 GB in the best case and takes forever to back up any large amount of data, along with a great deal of human intervention in the process. So, as an individual with a large data collection and not a large budget, what do you see as options for now (other than keeping a spare blank drive around), and what do you see down the road that might help us deal with issues like this?"

7 of 414 comments

  1. Buddy NAS by Anonymous Coward · · Score: 5, Interesting

    I have a solution I call the "Buddy NAS". Go out and get two cheap computers. It could be a PC or a mini-NAS or a low-end server. Anything that will hold multiple hard drives. You jam both full of hard disks and use them as a backup/NAS server. One PC is kept at your place, the other at your friend's house.

    Both computers have an account for you and an account for your friend (it helps if your friend is nerdy and "gets" backup solutions). Both of you now have a backup solution in your own home and a remote backup server at a friend's place. Two copies of your data, one remote. Basically it's like having local and cloud storage for you and your friend, and it'll cost less than a grand if you shop around. If neither of you has a static IP you can use dyndns.org to connect to the remote boxes. Bandwidth shouldn't be an issue if you use rsync to back up only the changed files nightly.
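
    A minimal sketch of the nightly rsync push described above, assuming SSH access to the friend's box. The function name, host name, user, and paths are hypothetical placeholders, not anything from the original setup; the script only builds and prints the command so it can be inspected before being scheduled.

```python
import shlex


def build_rsync_command(source_dir, remote_user, remote_host, remote_dir, dry_run=False):
    """Return the rsync argv that pushes source_dir to the remote box over SSH.

    -a preserves permissions and timestamps, -z compresses data in transit,
    and --partial keeps partially transferred files so a dropped connection
    does not restart a large file from zero.
    """
    cmd = ["rsync", "-az", "--partial", "-e", "ssh"]
    if dry_run:
        cmd.append("--dry-run")  # report what would be sent without sending it
    # A trailing slash tells rsync to copy the directory's contents,
    # not the directory itself.
    cmd.append(source_dir.rstrip("/") + "/")
    cmd.append(f"{remote_user}@{remote_host}:{remote_dir}")
    return cmd


if __name__ == "__main__":
    # Hypothetical values: substitute your own account, dynamic-DNS name and paths.
    command = build_rsync_command("/home/me/data", "me", "buddy.example.org", "/backups/me")
    print(shlex.join(command))
```

    The printed command could be run from a nightly scheduled job; because rsync only sends files that changed, the first run is the slow one.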

  2. Re:RAID array on a spare box by swalve · · Score: 4, Interesting

    That's not a bad idea. I started with the OP's problem, trying to keep data from multiple machines in sync and backed up and with enough room to spare. After spending too many weekends copying data back and forth to clear out a drive in order to replace it, I decided to go to the fileserver paradigm. I built a machine with three 40 GB drives RAIDed together and made that the only place useful data would be stored. I've since expanded it up to 3 TB in various increments, and it has worked well. It has saved tons of time and money by allowing my computers to use whatever cheap hard drive was available and just restore from backup when it went TU. But with the need for increased data availability outside my house (i.e., making my notebook my main computer), I'm starting to reverse course and move to your idea. Using robocopy on the clients and shell scripts + hard links on the server, I've set up a workable versioning backup system that doesn't take up too much space.

    I also use Dropbox for some stuff.

  3. Data Hoarding and my solution by IndustrialComplex · · Score: 3, Interesting

    First, let's look at your problem: You are gathering too much data. Either the data is 100% needed and irreplaceable, or it isn't. If it isn't, your first step is to treat your data just like you would physical junk that accumulates in your house.

    Create three folders.
    1. Critical Keep
    2. Unsure
    3. Toss

    Go through your data and MOVE it to one of those three folders. If it is critically important data that you would be upset to lose and that can't be recreated (wedding videos, etc.), it goes in the Critical Keep folder. If you aren't sure about it right now, but you can't declare it for folder 1, put it in 2. Anything else (old install files, backup data from a Windows 98 machine, etc.) can be deleted. Be harsh with yourself. Think of it like moving from house to house: if you haven't opened that box by your third move, just toss it in folder 3.

    Repeat the process until you have everything in either your Critical Keep folder or your Toss folder.

    Now, hopefully you have reduced the size of the data you are using to something marginally manageable. I'm a data hoarder, and I've managed to keep the rate of growth of my data to lag behind the general rate of growth of HDD capacity. Now for the fun stuff:

    Two things you want to avoid.

    1. Loss due to a dying disk
    2. Loss due to a destroyed home (fire, theft, etc)

    Here was my budget solution that resulted in a fire and forget backup system that is suitable for a home user and is about as minimal as you can get for cost.

    3 Disk Drives.

    A primary drive to run the operating system and hold installed programs and two LARGE data drives in a RAID1 configuration.

    Static data files (Video, pictures, etc) get stored on the RAID1

    A scheduled process (once per month for me) backs up the OS drive to a virtual HD file on the RAID1. The files on the RAID1 are then backed up to a cloud storage service (Carbonite in my case).

    So, what is the result of this?

    My operating environment is backed up monthly. The only thing I lose here is configuration changes or programs installed since the last backup (less than 30 days for me)

    The RAID1 ensures that my personal/static data is protected from a single disk failure, and helps a bit with read performance for the static (and large) files.
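
    To put a number on what the mirroring costs in space, here is a minimal sketch of usable capacity for the common RAID levels. It assumes all drives are the same size and ignores filesystem and metadata overhead; the function name is made up for illustration.

```python
def usable_capacity_tb(level, drive_count, drive_size_tb):
    """Usable capacity in TB for equal-sized drives at a given RAID level.

    RAID0 stripes with no redundancy, RAID1 mirrors (one drive's worth of
    space), RAID5 spends one drive on parity, RAID6 spends two.
    """
    minimum_drives = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in minimum_drives:
        raise ValueError(f"unsupported RAID level: {level}")
    if drive_count < minimum_drives[level]:
        raise ValueError(f"RAID{level} needs at least {minimum_drives[level]} drives")
    if level == 0:
        return drive_count * drive_size_tb
    if level == 1:
        return drive_size_tb
    if level == 5:
        return (drive_count - 1) * drive_size_tb
    return (drive_count - 2) * drive_size_tb  # RAID6


if __name__ == "__main__":
    for level, count in [(1, 2), (5, 3), (6, 4)]:
        print(f"RAID{level} with {count} x 2 TB drives:",
              usable_capacity_tb(level, count, 2.0), "TB usable")
```

    With two 2 TB drives, RAID1 leaves 2 TB usable out of 4 TB raw; a third drive in RAID5 raises that to 4 TB out of 6 TB while still surviving a single disk failure.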

    Should a cataclysmic failure occur and my entire computer is lost to something like a fire, remember that I've been sending what is on the RAID1 out to the cloud (Carbonite), so when I can rebuild a computer I can just download the (very large) offsite backup from the cloud to my new machine.

    The downsides I have right now:
    1. I maintain the Windows backup as a VHD file because it allows me to ensure that the backup data is 'packaged'. I don't know the exact details of Windows Backup, but given that Carbonite sometimes excludes system files, I didn't want to risk an important hidden/system file being missed in the backup. In addition, I didn't like how it could only back up to the root folder of a drive. The downside is that the resulting 100 GB file is a pain to back up, which is why I restrict the backup hysteresis to 30 days (previously I had it back up every 3 days). This keeps it from continually uploading the VHD file to Carbonite.

    2. The HDDs for the RAID1 lose half their total capacity in that configuration. I used it because it needs only 2 drives and gives a performance boost. If you can afford 3 drives, go for a RAID5.

    3. Most motherboards support RAID natively now. However, I understand that you can run into issues with hardware RAID if you have to switch to a different hardware solution. I haven't tested this, but it could potentially be an issue if you use a RAID5 from hardware and your motherboard fails and you can't replace it with an exact model. The good news here, though, is that if you have been backing up to the cloud, it's typically done on a per-file basis, and thus you don't have to worry about this. Just download your stuff ba

    --
    Out of modpoints but really liked a post? 1BDkF6TtmmeZ3yqXbz9yhdYVqRYnwFoXDj

  4. Re:Solution.. buy hard drives! by AngryDeuce · · Score: 4, Interesting

    Honestly, it's my Western Digital drives that have lasted the longest. My dad is still rocking several single digit GB capacity WD drives actively in his legacy tower, and I've yet to have one die on me. Not to say I haven't replaced them as their capacity becomes outdated, but I've had much better luck with them than Maxtor (the worst brand I've ever used), which is now a part of Seagate, which I've also had a couple fail on me (but nowhere near as bad as Maxtor).

    I've never used Hitachi or Samsung or any other brand that I know of, so I can't speak as to their quality, but I'm sticking with Western Digital.

  5. Use a NAS with backup by AliasMarlowe · · Score: 5, Interesting

    What I did some years ago was recognize that "manual backups" were not done often enough, and important stuff was scattered around a few PCs. So I got a NAS, stuck a pair of disks into it (RAID 0 for speed), and set up its automated incremental backup to run 3 times per week to an external USB drive. The PCs now mount the NAS at login, and that's where all data files are stored by default (even the kids use it).

    We're up to 2 NAS units now, with 7TB[*] of disk space between them, all backed up on schedule. The USB backup drives are rotated every few weeks with another set kept in a secure place in the garage.

    [*] One NAS unit doubles up as media server, so it's got a load of movies & music in addition to user files in its 6TB. The other one is our web server and email server with only 1TB of disk space.

    --
    Those who can make you believe absurdities can make you commit atrocities. - Voltaire

  6. rsnapshot + raid6 on server in basement by Janek+Kozicki · · Score: 3, Interesting

    My solution to this problem is painfully simple: about 5 years ago I bought five 500 GB drives. I put a server (made from old parts, like a Pentium 4 and so on) in the basement (where nobody hears it, and it can be as noisy as hell). I installed Debian on it and configured cron to call rsnapshot three times per day to do automatic backups of all the PCs in my family. I haven't touched this machine since then.

    With one exception: 3 years ago I started to run out of space, so I bought two 2 TB HDDs and reconfigured the RAID6 array, which was extremely easy because I use mdadm for RAID, and it supports such operations online. I also picked up a few more spare drives over the years, so I kept adding them to the array, and currently there are 9 HDDs in this PC. It is very noisy, but nobody cares about that.

    It runs flawlessly, untouched for years, and nobody cares about it, except when somebody in my family accidentally loses or deletes a file. Then suddenly the backup comes in very handy.

    Rsnapshot is especially good, because it keeps hard-linked copies of data from last week, 2 weeks ago, last month, and much more, depending on how you configure /etc/rsnapshot.conf. Currently I have backups dating back about 2 years, with a granularity of 1 month. Thanks to hard links, it only uses the disk space needed for the changes between snapshots.
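
    The hard-link trick can be sketched in a few lines. This is not rsnapshot's own code, just a minimal illustration of the same idea under the assumption that snapshots live on one filesystem that supports hard links; the function name and directory layout are made up.

```python
import filecmp
import os
import shutil


def take_snapshot(source, previous, target):
    """Copy `source` into `target`, hard-linking files unchanged since `previous`.

    `previous` is the path of the last snapshot, or None for the first run.
    Unchanged files share their data blocks with the previous snapshot, so
    each new snapshot only costs the space of files that changed.
    """
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        dest_dir = os.path.normpath(os.path.join(target, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            new = os.path.join(dest_dir, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, new)       # unchanged: reuse the existing data
            else:
                shutil.copy2(src, new)  # new or changed: store a fresh copy
```

    Deleting an old snapshot directory later is safe: the data of a hard-linked file stays on disk until the last snapshot that references it is removed.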

    So my RAID6 array has a total size of about 4 TB with 500 GB still free. And I feel this will last at least a year or two. In case of problems I can start deleting copies that are more than 1 year old, while the most recent snapshot uses about 2 TB or so.

    Rsnapshot can also back up Windows machines, so you don't need to worry about compatibility. Though I don't have Windows machines and haven't tested that in practice ;)
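
    Picking which copies to delete when space runs short is simple date arithmetic. A minimal sketch, assuming each snapshot is identified by the date it was taken; the function name and the 365-day default are illustrative, not rsnapshot settings.

```python
from datetime import date, timedelta


def snapshots_to_prune(snapshot_dates, today, max_age_days=365):
    """Return the snapshot dates older than max_age_days, oldest first."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(d for d in snapshot_dates if d < cutoff)


if __name__ == "__main__":
    taken = [date(2010, 6, 1), date(2011, 1, 1), date(2011, 12, 1)]
    for old in snapshots_to_prune(taken, today=date(2012, 1, 15)):
        print("candidate for deletion:", old.isoformat())
```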

    --
    #
    #\ @ ? Colonize Mars
    #
  7. Data integrity by thereitis · · Score: 3, Interesting

    This is my personal data and it is starting to become unbelievably unruly to deal with as far as data integrity and security are concerned.

    Keep all your important files in a version control system. Personally, I use Perforce (it's free for 2 users or less). That gives you: multi-revision history and checkin comments, an easy way to pull a subset of files to any computer in your house, and peace of mind that you don't need to worry about kids deleting anything important as it's all stored on the server with history. Also easy to see what has changed on any computer and check those files in. And there's a big win for data integrity checks: Perforce stores the checksum of all files (and revisions) and can easily check that every file still matches the checksum in the central database. If you have any disk corruption, you'll know about it when you run 'p4 verify -q //...'. You can store files of several gigabytes each with no problem.
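
    For anyone who wants the integrity check without a version control server, the same idea can be sketched with the Python standard library: record a checksum per file once, then re-check later. This is not how Perforce implements it; the function names are made up, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so multi-gigabyte files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def verify(root, manifest):
    """Return the sorted relative paths that are missing or no longer match."""
    damaged = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_digest(full) != expected:
            damaged.append(rel)
    return sorted(damaged)
```

    The manifest would need to be stored somewhere other than the disk being checked, and rebuilt whenever files are changed on purpose; otherwise a legitimate edit looks the same as corruption.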

    On top of this, I use rsync to copy the server data onto backup drives. I'm also looking at storing backups online, but haven't taken that step yet.

    I've been using this system for years and I couldn't imagine being without it. It's so easy to find and retrieve exactly what I want - my resume 5 revisions ago, my tax return, photos from 2003. Even without that, the data integrity checks give a lot of peace of mind.