Easy, Reliable Distributed Storage and Backup?
RichiH writes "Most of you are the free IT staff of friends and family, just as I am. One of my largest headaches is backing up their data. What I am looking for allows for off-site storage on multiple server machines running Linux, has Linux & Windows clients that Just Work and require zero everyday effort (although a large-ish effort to set them up is just fine), allows for granular access control, is versioned and will, ideally, allow me to grab data automagically (think photo pool for your family where your mother, sister, etc., share each other's photos). This is something I've been trying to find for years, but I've never seen anything even closely resembling what I want. With the Wall Street Journal handing out its Technology Innovation Award to Cleversafe recently, I was once again reminded of this particular itch which needs scratching. Before I deploy it, I want to ask the Slashdot community for its opinion on that piece of software, and on potential alternatives. How do you solve this problem?"
Rename your data to 'Barely legal college girls having first time sex - XXX Vol1/256.r001' and use p2p to spread them all over the world!
I can tell you how I solve it in a business context, but whether or not it could be scaled down to personal I'm not sure.
The problem: 2 sites, each with 70-100GB of data, need offsite backup with similar criteria to your own. Bandwidth available to these sites is 2-4Mbps. The only OS involved is Linux, though I'm sure Windows could be shoehorned in somehow. A third site which has a tape streamer and someone to take tapes offsite is available. Data protection legislation means that storing it with a hosted service is illegal unless I encrypt it myself before sending it offsite - I'm only aware of one tool which claims to be able to do this and still send data as a binary delta (it uses the rsync library) and that tool is still not particularly common in Linux distributions and not very widely used. I'm nervous of trusting my backups to a tool that isn't in heavy use, particularly if strong encryption is being employed.
The Solution: A server in the third site and some judicious scripting with rsync allows it to mirror the data in the other two sites. The first sync is fairly painful, of course, but provided you don't have too much data regularly changing, subsequent syncs aren't too bad. The server is backed up to tape, which provides versioning capability, so if someone only realises that they lost a file a week after the fact it can still be restored.
Initial effort to set up was pretty great but now it's done it JFW and requires no brain power whatsoever to run on a daily basis. I can make the data available over the VPN (of course the access speed will be dog slow) more-or-less immediately and I can make it available at LAN speed by copying it to a hard disk and couriering it to the remote office in under 48 hours. A full restore of 100GB across a 2Mbps connection will take at least 4-5 days.
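For anyone who wants to sanity-check that restore estimate, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second) and, by default, a link running flat out with no protocol overhead, which is why the real-world figure is "at least" this. The function name and the efficiency parameter are made up for illustration.

```python
def transfer_days(size_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move size_gb gigabytes over a link_mbps link.

    efficiency is the fraction of the nominal link rate actually achieved
    (1.0 is the ideal case; this parameter is an illustrative assumption).
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = size_gb * 1e9 * 8                      # payload size in bits
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86400                        # 86400 seconds per day


if __name__ == "__main__":
    for mbps in (2, 4):
        print(f"100 GB at {mbps} Mbps: {transfer_days(100, mbps):.1f} days")
```

On an ideal 2Mbps link this comes out a bit over four and a half days, so the 4-5 day figure is the optimistic end once overhead and other traffic are counted.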
Yes, because storing thousands of jpg images and other binary data is exactly what git was intended for. Get people to store their data on Samba fileservers. Set up home directories in their name as well as shared directories accessible by everybody or Samba groups. Use ACL if you need to. To back up, use rsync and OpenSSH, write a few batch scripts and hey - presto! Instant solution that'll even work with cheapo webhosts and your home Linux box as backup servers. Versioning can be done for any amount of time by using rsync's backup feature, and you can allow people to browse old versions within Windows Explorer connected to a Samba share in that way.
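A minimal sketch of what "rsync's backup feature" could look like in such a script, assuming rsync and OpenSSH are installed on both ends. The host name, paths and function name are hypothetical; the rsync options used (-a, -z, --delete, --backup, --backup-dir, -e) are standard ones. The function only builds the command, so it can be inspected before anything is run.

```python
import shlex
from datetime import date


def rsync_backup_command(source, dest, backup_root, day=None):
    """Build an rsync command that mirrors source to dest over SSH.

    Files that would be overwritten or deleted on the destination are moved
    into a dated directory under backup_root instead of being lost.
    """
    day = day or date.today()
    return [
        "rsync",
        "-az",                                   # archive mode, compress in transit
        "--delete",                              # mirror deletions too...
        "--backup",                              # ...but keep the replaced copies
        f"--backup-dir={backup_root}/{day.isoformat()}",
        "-e", "ssh",                             # tunnel through OpenSSH
        source,
        dest,
    ]


if __name__ == "__main__":
    # Hypothetical host and paths, for illustration only.
    cmd = rsync_backup_command(
        "/home/mum/",
        "backup@example.org:/srv/backup/mum/current/",
        "/srv/backup/mum/old",
    )
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each day's replaced files end up in their own dated directory on the backup server, which is what makes the old versions browsable from a Samba share.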
You're asking two questions. The first is that you want backup, so that all their data just gets thrown somewhere and they lose at most the last few days' work if their hard drive dies. You don't even necessarily want this on the network; just back up to a DVD-R every so often, and take every month's DVD-R offsite (a friend's house, a bank's vault, whatever). There's lots of backup software for this. Most can do fancy stuff like incremental backups. You can probably find something opensource you can host for your friends and family on a decently-available server.
The second question is networked file storage, where you don't care about automatically archiving files, but you do want frequent access and a good UI. For this I recommend something like Dropbox, which has good support for OS integration and a web interface.
Ars Technica did a nice review of Dropbox, titled, "How Dropbox ended my search for seamless sync on Linux" (but it works on OS X & Windows too) http://arstechnica.com/news.ars/post/20080914-how-dropbox-ended-my-search-for-seamless-sync-on-linux.html
what's wrong with getting an account with Connected/Iron Mountain - easy to use intelligent online storage that doesn't cost a lot - saved my bacon many a time
Have you considered the JungleDisk client that works with the Amazon S3 storage cloud? This has backup clients for Windows, Linux, and Mac and with suitable configuration of 'buckets' would allow you to do most of what you are trying to achieve. Okay, so it's a pay-for service (albeit cheap) but it does provide the all-important off-siting, strong security/encryption and unlimited capacity.
"Only wimps use backup. Real men just upload their important stuff on ftp, and let the rest of the world mirror it."
God
I looked at Cleversafe, trying to get through the PR bubblespeak. It seems they are emulating disks, not offering integrated _backup_. As saving from my mom's SD card to a distributed online disk via a DSL line is not feasible, I will most likely need to scratch that idea.
Backup isn't the same as sharing. And do you want actual replication or merely fault tolerance to node failure? Actual n-fold replication means you're going to pay n times the amount of money for storage. And why do you insist on one application to do everything?
My suggestion: set up automatic backups to one of the many backup services on the net. They worry about how to replicate your data, you don't have to. For the same service to support both backup and sharing is hard and it's probably a bad idea. It's much easier if you know that the backup service simply cannot access the contents of any of your files.
For sharing, use services designed for that: Flickr Pro, Picasa, Google Docs, whatever. They are designed for sharing, they know about users and permissions, and they can only publish what you actually upload to them.
As for Cleversafe, the idea is as old as forward error correction, but the economics and management never seem to quite work out. And basically, you're getting the same functionality from hosted storage: Amazon, Google, Box.NET, etc. are already figuring out how to keep your data available and secure, and are probably doing a better job than you could do with a homebrew system.
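To make the forward-error-correction comparison concrete, here is a toy sketch of the simplest possible scheme: split the data into k chunks plus one XOR parity chunk, so that any single lost chunk can be rebuilt from the rest. This is not Cleversafe's actual algorithm (real dispersal systems use stronger codes that survive several losses at once), and the function names are invented for illustration.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal chunks (zero-padded) plus one XOR parity chunk."""
    if k < 1:
        raise ValueError("k must be at least 1")
    chunk_len = -(-len(data) // k) or 1          # ceiling division, at least 1 byte
    padded = data.ljust(chunk_len * k, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    return chunks + [reduce(xor_bytes, chunks)]


def rebuild_missing(chunks: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the single chunk marked None by XOR-ing all surviving chunks."""
    missing = [i for i, chunk in enumerate(chunks) if chunk is None]
    if len(missing) != 1:
        raise ValueError("this scheme can repair exactly one lost chunk")
    survivors = [chunk for chunk in chunks if chunk is not None]
    repaired = list(chunks)
    repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired
```

With k=4 the five pieces take 1.25 times the original space, against 2 times for a plain mirror, but only one piece may be lost. That trade-off between overhead and the number of failures survived is where the economics get decided.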
No Linux client, AFAIK (though I do run it on my MBP). It's become rather impractical for me as a photographer though, as sometimes I'll shoot enough photos that my internet connection would be completely maxed out for days on end trying to sync up the new data - and I have a decent-for-cable 1Mbps upload rate.
rsync to Amazon S3 might be an option, if only for cross-platform capabilities. No versioning though, but outside of Apple's Time Machine (obviously useless for Windows and Linux), you're not going to get that without some major headache. Any remote system is going to be horribly slow for the first sync with any typical internet connection, and quite possibly problematically slow for photographers, media hoarders, and in general people with big hard drives.
How are sites slashdotted when nobody reads TFAs?
The subject says it all:
- rdiff-backup to back up your data to one backup server.
- chironfs to clone the file system to another remote server.
rdiff-backup runs on *nix and windows (with the help of Cygwin).
Once set up, rdiff-backup needs virtually no maintenance. If needed, set up Nagios to warn you if things run afoul.
Used this for years, never disappointed me so far!
Get 4 x 1TB disks and minimum RAID 6. Install Linux. Install rsnapshot, which offers:
* Filesystem snapshot - for local or remote systems.
* Database backup - MySQL backup
* Secure - Traffic to and from the remote backup server is always encrypted using OpenSSH
* Full backup - plus incrementals
* Easy to restore - Files can be restored by the users who own them, without the root user getting involved.
* Automated backup - Runs in background via cron.
* Bandwidth friendly - rsync used to save bandwidth
You may also find a CentOS or Debian tutorial useful.
Good luck!
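rsnapshot gets its "full backup plus incrementals" by hard-linking unchanged files between snapshots, so every snapshot looks complete but only changed files take up new space. Here is a stripped-down sketch of that idea in Python. It is not rsnapshot's code: it compares files by size and modification time only, ignores symlinks and ownership, and every name in it is hypothetical.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def _unchanged(a: Path, b: Path) -> bool:
    """Cheap change check: same size and same whole-second mtime."""
    sa, sb = a.stat(), b.stat()
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def take_snapshot(source: Path, snap_dir: Path, previous: Optional[Path] = None) -> None:
    """Copy source into snap_dir, hard-linking files unchanged since previous."""
    for root, _dirs, files in os.walk(source):
        rel = Path(root).relative_to(source)
        (snap_dir / rel).mkdir(parents=True, exist_ok=True)
        for name in files:
            src = Path(root) / name
            dst = snap_dir / rel / name
            old = previous / rel / name if previous else None
            if old is not None and old.is_file() and _unchanged(src, old):
                os.link(old, dst)         # same data, no extra disk space
            else:
                shutil.copy2(src, dst)    # new or modified: store a real copy
```

Restoring is then just copying a file back out of whichever snapshot directory has the version you want, which is why users can do it themselves without root.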
Being easy to use (as in, not more than 3-5 mouse clicks, total) is one of my main concerns. Git definitely fails in this regard.
http://www.bacula.org/
Runs pretty tight (low bandwidth), supports channel encryption and datastore encryption, can even create Bare Metal Recovery disks. I have a server room with LTO3 tape drives that I use to back up my clients' incremental data changes nightly, including Linux, Mac and Windows clients and servers. I have VPNs out to each client, so don't use the built-in channel encryption, but I maintain a keypair for each client.
Backup only, but I /could/ present a maintained volume as a share over the VPN. Bacula supports disk and tape volumes as backup stores. I've personally had no need to do that to date.
We're not talking terabytes here - my ISP would pwn me if that was going on, but I do circa 20G of data changes every night from clients. Some of them are laptops that are not always on or connected. Most are friends and family PCs, so it backs up when it can. I have to do almost no maintenance apart from changing a tape occasionally. The backup client is tiny and unobtrusive, even when running. On Windows it uses VSS, so it is reliable.
I have had a number of panic phone calls (esp from my kids at Uni) who have lost a thesis or the like and are utterly amazed when, after a few clicks over the phone they look at their webmail and yesterday's version is in their inbox. That's what it's all about! I am the god of lost data! Which, of course, works for me.
actually, for my own digital assets repo - see signature - i see two features of git which might be handy, atomicity of commits and hashes which avoid storing duplicates. git has "plumbing" commands which might help. Still haven't explored it.
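The duplicate-avoiding part is easy to see in miniature. Git names each file's content by the SHA-1 of a short "blob" header plus the bytes, so two identical photos get the same name and are stored once. Below is a small sketch of that: git_blob_id follows git's blob naming for SHA-1 repositories, while the DedupStore class is a made-up in-memory toy, not anything git ships.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID git gives a file with this content (SHA-1 repos)."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class DedupStore:
    """Toy in-memory content-addressed store: identical content is kept once."""

    def __init__(self):
        self.objects = {}   # object ID -> content bytes
        self.names = {}     # file name -> object ID

    def add(self, name: str, content: bytes) -> str:
        oid = git_blob_id(content)
        self.objects.setdefault(oid, content)   # no-op if already stored
        self.names[name] = oid
        return oid
```

Adding the same picture under two different file names leaves one entry in objects and two in names, which is the whole trick.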
BTW if you have enough band you could do away with a doxroom instance on a host, don't forget to backup files and db and remember it's alpha quality.
---- MISSING MISCELLANEOUS DATA SEGMENT --- [sigdash] trolololol
For storing permissions and such, are you using a .tar container? My biggest stumbling block with my backup scheme is storing ACLs and permissions.
I've got a few ideas about doing it, but they're all kludgy or force me to walk away from my rsync scripts which are really fairly mature at this point. Furthermore, I need to get deltas downstream and packing everything into one file pretty much defeats that purpose at the several gig level unless I'm running an rsync server to calculate the diffs. These kinds of things become problematic due to the infrastructure I'm working with.
I'm really starting to lean towards running everything over iSCSI, but then I've got to get the VPN thing going which could require some re-subnetting at either end of the tunnel. Needless to say, I'd prefer to avoid that or any other solution that requires messing with stuff that Works Right Now.
Have you dealt with these issues at all, or at least know what won't work? I'd appreciate any insights before I use a brute force method.
If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
Check out S3Backer. It lets you mount an Amazon S3 bucket to your Linux/Mac/BSD/*NIX box. GPL F/OSS as icing on the cake.
If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
As well as all of the standard things you'd expect from a networked filesystem (ACLs, authentication, and so on).
If you set up an AFS cell with your volumes replicated across a few remote servers and get your clients to connect to this cell then it should be fine. Set a cron job to take regular snapshots, and dump them to some offline medium periodically.
I am TheRaven on Soylent News
cough ... JungleDisk ... cough
I think that the issue is faced by far more people than is readily apparent... it's the need for a VERY easy to use tool to share Our Stuff with Our Family. If my Mom and sisters were able to share all their photos with each other by carrying a USB drive around when they see each other... the most important thing they have on their computers would be backed up... the need for social file sharing is huge... we just don't have the tools to do it well yet. Something that does auto-discovery of stuff, remembers previous decisions, and just goes to work making copies in the right directions is what we need.
I just didn't want to deal with it. I use cloudbackup.openrsm.com and have them buy an account. It can do a whole network of Linux, Mac, and Windows machines with one account, or just a laptop. The client software is free and can mount the backup space as a network drive too. I figure easy and my friends paying for it works. It's saved my butt a couple times too.
I'm surprised no one has mentioned Wuala - www.wua.la - which is a distributed online storage system. You agree to store (encrypted) bits of others' files in exchange for the ability to do so on others' machines across the wuala network. It's free and pretty damn cool. They can explain it better than I can: http://wua.la/en/learn/why
ObStdDisc: I work for the company I mention here... but suffice it to say that I left a very stable job to do so - so's to indicate that I do actually believe in the excellence of the product.
Keep an eye on Rebit. It doesn't do what you're asking about as of this moment... but (without treading into realms of "I'm not allowed to talk about that") I can safely say that the future holds some interesting things along this sort of direction.
Sig broken, watch for
I'd be happy to write a script that will handle that concern, but somebody else would have to do the UI unless you want it looking like it escaped from Windows 95.
If you are being serious..:
Afaik, Git supports Meta/recursive repos where I have one master repo with many subrepos. Thus, it would be best to have a master repo that contains all other repos. That will make replication easier.
The only other requirements would be that it adds all files in a given directory to repo foo and pulls repos bar, baz, quux. Preferably, it would happen automagically & regularly with a throttled connection. Requiring them to click a button in a butt-ugly app is fine, as well.
If Windows had cronjobs or I knew VB, I would do it myself, but..
_If_ you decide to do something like this, I will definitely give it a try. And it will finally give me a reason to poke Git :)
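For what it's worth, the core of such a script is small. Here is a rough sketch, assuming git is installed and on the PATH and that each repo already has a remote configured to pull from. The repo names and function names are placeholders, throttling is not handled, and scheduling is left to cron on Linux or Task Scheduler on Windows.

```python
import subprocess


def sync_plan(own_repo, other_repos):
    """Return the git commands for one sync run as (working dir, argv) pairs."""
    plan = [
        (own_repo, ["git", "add", "--all"]),
        (own_repo, ["git", "commit", "--quiet", "-m", "automatic snapshot"]),
    ]
    for repo in other_repos:
        plan.append((repo, ["git", "pull", "--quiet"]))
    return plan


def run_sync(own_repo, other_repos):
    """Run the plan, reporting failures instead of stopping at the first one."""
    for cwd, argv in sync_plan(own_repo, other_repos):
        # git commit exits non-zero when there is nothing new to commit,
        # so a failure here is reported rather than treated as fatal.
        result = subprocess.run(argv, cwd=cwd)
        if result.returncode != 0:
            print(f"{' '.join(argv)} in {cwd} exited with {result.returncode}")


if __name__ == "__main__":
    for cwd, argv in sync_plan("foo", ["bar", "baz", "quux"]):
        print(cwd, " ".join(argv))
```

Keeping the plan separate from the runner means the commands can be printed and checked before anything touches a real repository.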
I use Carbonite. Small app, I can have multiple machines within the same account, unlimited data for something like $49/year. I got it for a work machine - and it has already been used to retrieve deleted files (very painless process), liked it so much that I got it for a couple of the family machines that I support. I set it up for them and the only instruction they have to remember is "don't save tax returns under c:\windows\system32, save them under My Documents".
If the g'vt kept the data on you that google does you'd better believe you'd be calling it "doing evil"
I watched their CTO's Google Talks presentation and it was really interesting. I got all excited, joined their beta only to realize that they - IMO - misused the technology they had and designed a rather mediocre product. Wuala wants to be a backup tool, a sharing tool, a social networking medium as well as a few other things. In other words it lacks focus and wants to do everything - an approach that rarely works.
3.243F6A8885A308D313
I'm perfectly serious. It's a useful app and a pretty easy problem. If you'll email me at CTO@Openmigration.net and let me know more about your specific requirements (number of remote hosts, total archive size, etc) I can start figuring out what the best way to do this is. Also, I'll need to know all of the platforms you're running on (will you need support on cell phones? Xbox?), the level of redundancy you're comfortable with, will you need a web interface, etc.
I would code it in Perl, but I do _not_ want to expose my mom to a CLI interface.
why not make a CGI and use the web as a GUI?! :-)
Even if your parents are non techy, they've probably had some experience browsing the web, so a web GUI shouldn't be too foreign to them.
couple that with something like CGI::Application and you've got a great framework in which to build and expand what you/your parents want/need.
-- $_='ab-bc ratvarre';tr"'a-z'"'n-za-m'";print