Easy, Reliable Distributed Storage and Backup?
RichiH writes "Most of you are the free IT staff of friends and family, just as I am. One of my largest headaches is backing up their data. What I am looking for allows for off-site storage on multiple server machines running Linux, has Linux & Windows clients that Just Work and require zero everyday effort (although a large-ish effort to set them up is just fine), allows for granular access control, is versioned and will, ideally, allow me to grab data automagically (think photo pool for your family where your mother, sister, etc., share each other's photos). This is something I've been trying to find for years, but I've never seen anything even closely resembling what I want. With the Wall Street Journal handing out its Technology Innovation Award to Cleversafe recently, I was once again reminded of this particular itch which needs scratching. Before I deploy it, I want to ask the Slashdot community for its opinion on that piece of software, and on potential alternatives. How do you solve this problem?"
Have a look at http://allmydata.org/trac/tahoe which might provide what you're looking for while being way simpler to set up than Cleversafe.
What's wrong with getting an account with Connected/Iron Mountain? It's easy-to-use, intelligent online storage that doesn't cost a lot. It has saved my bacon many a time.
Get 4 x 1TB disks and set them up as RAID 6 at minimum. Install Linux. Install rsnapshot, which offers:
* Filesystem snapshot - for local or remote systems.
* Database backup - MySQL backup
* Secure - Traffic to and from the remote backup server is always encrypted using OpenSSH
* Full backup - plus incrementals
* Easy to restore - Files can be restored by the users who own them, without the root user getting involved.
* Automated backup - Runs in background via cron.
* Bandwidth friendly - rsync used to save bandwidth
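For anyone wondering how snapshot tools in this family keep many "full" backups without eating the whole disk: unchanged files are hard-linked to the copy in the previous snapshot, and only new or changed files take fresh space. Here is a rough Python sketch of that idea. It is not rsnapshot's actual code, the function name `take_snapshot` and its parameters are made up for illustration, and it assumes all snapshots live on one filesystem that supports hard links.

```python
import filecmp
import os
import shutil
from pathlib import Path
from typing import Optional


def take_snapshot(source: Path, snapshot_root: Path, name: str,
                  previous: Optional[str] = None) -> Path:
    """Copy `source` into snapshot_root/name (hypothetical helper).

    Files whose content is identical to the copy in the `previous`
    snapshot are hard-linked instead of copied, so they use no extra space.
    """
    dest_root = snapshot_root / name
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = Path(dirpath).relative_to(source)
        (dest_root / rel).mkdir(parents=True, exist_ok=True)
        for filename in filenames:
            src = Path(dirpath) / filename
            dst = dest_root / rel / filename
            old = (snapshot_root / previous / rel / filename) if previous else None
            if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)       # unchanged: share the existing copy
            else:
                shutil.copy2(src, dst)  # new or changed: store a fresh copy
    return dest_root
```

Each snapshot directory looks like a complete copy of the source, which is why restoring is just a plain file copy out of whichever snapshot you want.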
You may also find a CentOS or Debian tutorial useful.
Good luck!
Actually, for my own digital assets repo (see signature) I see two features of Git which might be handy: atomicity of commits, and hashes which avoid storing duplicates. Git has "plumbing" commands which might help. I still haven't explored it.
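To illustrate the duplicate-avoidance point: Git names each file's content (a "blob") by the SHA-1 of a short header plus the bytes, so two identical files map to the same object. A minimal sketch, where the `ContentStore` class is a hypothetical in-memory stand-in for an object store and not anything Git ships:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID Git assigns to a blob with this content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ContentStore:
    """Hypothetical in-memory store keyed by blob ID (illustration only)."""

    def __init__(self) -> None:
        self.objects = {}

    def put(self, content: bytes) -> str:
        oid = git_blob_id(content)
        # Identical content always hashes to the same ID, so it is stored once.
        self.objects.setdefault(oid, content)
        return oid


store = ContentStore()
first = store.put(b"holiday photo bytes")
second = store.put(b"holiday photo bytes")  # same content, e.g. a copy under another name
print(first == second, len(store.objects))
```

The IDs should match what the plumbing command `git hash-object` reports for a file with the same bytes.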
BTW, if you have enough bandwidth you could make do with a doxroom instance on a host. Don't forget to back up the files and the DB, and remember it's alpha quality.
---- MISSING MISCELLANEOUS DATA SEGMENT --- [sigdash] trolololol
As well as all of the standard things you'd expect from a networked filesystem (ACLs, authentication, and so on).
If you set up an AFS cell with your volumes replicated across a few remote servers and get your clients to connect to this cell then it should be fine. Set a cron job to take regular snapshots, and dump them to some offline medium periodically.
I am TheRaven on Soylent News
Datto's new line uses ZFS with snapshots. If they're willing to spend a couple hundred bucks, it's a really easy (and foolproof) solution.
I'm surprised no one has mentioned Wuala - www.wua.la - which is a distributed online storage system. You agree to store (encrypted) bits of others' files in exchange for the ability to do so on others' machines across the wuala network. It's free and pretty damn cool. They can explain it better than I can: http://wua.la/en/learn/why
Data protection legislation means that storing it with a hosted service is illegal unless I encrypt it myself before sending it offsite. I'm only aware of one tool which claims to be able to do this and still send data as a binary delta (it uses the rsync library), and that tool is still not particularly common in Linux distributions and not very widely used. Based on my limited understanding of crypto, when you encrypt data it should turn into pseudo-random noise, so if *any* bits change the whole thing changes (unless you're using a block cipher, but if the blocks are chained then every block *after* the change will also change). So for large files, this seems like the delta would end up being practically the entire file, wouldn't it?
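The chaining effect described above can be shown with a toy. This sketch is an assumption-laden illustration, not real crypto: `toy_block_encrypt` uses a truncated HMAC as a stand-in for a block cipher (it is not invertible and not secure), the block size, key and IV are arbitrary, and the same key and IV are reused for both runs so that the outputs are comparable at all.

```python
import hashlib
import hmac
from typing import List

BLOCK = 16  # block size in bytes (arbitrary choice for the toy)


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for a block cipher: keyed and deterministic, NOT secure."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def chained_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> List[bytes]:
    """Encrypt block by block, mixing each block with the previous output."""
    out = []
    prev = iv
    for i in range(0, len(plaintext), BLOCK):
        chunk = plaintext[i:i + BLOCK].ljust(BLOCK, b"\0")
        mixed = bytes(a ^ b for a, b in zip(chunk, prev))
        prev = toy_block_encrypt(key, mixed)
        out.append(prev)
    return out


def changed_blocks(old: List[bytes], new: List[bytes]) -> List[int]:
    """Indices of output blocks that differ between two runs."""
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]


key, iv = b"k" * 16, b"\0" * BLOCK
original = bytes(8 * BLOCK)          # eight all-zero blocks
edited = bytearray(original)
edited[3 * BLOCK] ^= 1               # flip one bit in block 3
print(changed_blocks(chained_encrypt(key, iv, original),
                     chained_encrypt(key, iv, bytes(edited))))
```

In this toy, the blocks before the edit come out identical and every block from the edit onward differs, so a binary delta only helps for changes near the end of a file. With a fresh random IV per upload, which is the usual practice, even the leading blocks would differ.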
ObStdDisc: I work for the company I mention here... but suffice it to say that I left a very stable job to do so - so's to indicate that I do actually believe in the excellence of the product.
Keep an eye on Rebit. It doesn't do what you're asking about as of this moment... but (without treading into realms of "I'm not allowed to talk about that") I can safely say that the future holds some interesting things along this sort of direction.
Sig broken, watch for
I'd be happy to write a script that will handle that concern, but somebody else would have to do the UI unless you want it looking like it escaped from Windows 95.
If you are being serious..:
Afaik, Git supports Meta/recursive repos where I have one master repo with many subrepos. Thus, it would be best to have a master repo that contains all other repos. That will make replication easier.
The only other requirements would be that it adds all files in a given directory to repo foo and pulls repos bar, baz, quux. Preferably, it would happen automagically & regularly with a throttled connection. Requiring them to click a button in a butt-ugly app is fine, as well.
If Windows had cronjobs or I knew VB, I would do it myself, but..
_If_ you decide to do something like this, I will definitely give it a try. And it will finally give me a reason to poke Git :)
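For what it's worth, the core of such a script could look roughly like the Python sketch below. It only covers the "add everything to foo, pull bar, baz, quux" part. The function names `plan_sync` and `run_sync` are made up, the push step is an assumption (the data has to leave the machine somehow), and throttling and scheduling are left out entirely.

```python
import subprocess
from typing import List


def plan_sync(own_repo: str, other_repos: List[str],
              message: str = "automatic backup") -> List[List[str]]:
    """Build the git command lines for one sync run (hypothetical helper)."""
    commands = [
        ["git", "-C", own_repo, "add", "-A"],              # stage every change in the directory
        ["git", "-C", own_repo, "commit", "-m", message],  # exits non-zero if nothing changed
        ["git", "-C", own_repo, "push"],                   # assumption: a remote is already configured
    ]
    for repo in other_repos:
        commands.append(["git", "-C", repo, "pull", "--ff-only"])
    return commands


def run_sync(commands: List[List[str]]) -> List[int]:
    """Run each command in order and collect exit codes without aborting."""
    return [subprocess.run(cmd, check=False).returncode for cmd in commands]


for cmd in plan_sync("foo", ["bar", "baz", "quux"]):
    print(" ".join(cmd))
```

Keeping the plan separate from the runner means the same command list could be driven by cron on Linux or by a scheduled task or a button on Windows.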
I'm perfectly serious. It's a useful app and a pretty easy problem. If you'll email me at CTO@Openmigration.net and let me know more about your specific requirements (number of remote hosts, total archive size, etc) I can start figuring out what the best way to do this is. Also, I'll need to know all of the platforms you're running on (will you need support on cell phones? Xbox?), the level of redundancy you're comfortable with, will you need a web interface, etc.